waxing--artistic--statistic - Tumblr blog

waxing--artistic--statistic · 9 years ago


Missing That Data

In the most simplistic sense: MCAR, MAR, NMAR

-Running fictional study of herpes among D-list celebrities. predictor: # of current jobs. outcome: seropositive test result

MCAR- missing completely at random

- Paris submits her samples to the clinic, but they accidentally get lost in the lab chute {mechanism: random, unrelated to the predictor or the outcome}

MAR- missing at random

- Trump, Lindsay, and Charlie are currently working in the industry and don’t want to ruin their ‘reps’, thus refrain from submitting the samples {mechanism: random once you account for an observed variable, here # of current jobs}

NMAR- not missing at random

- Natalie and Bella are enrolled in the study, but refuse to submit their samples because they already know they are positive {mechanism: depends on the unobserved outcome itself}
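a rough simulation of the three mechanisms, as a sketch only: the sample size, the rates, and every variable name below are made up for illustration, not taken from any real study

```r
set.seed(1)                      # fixed seed so the fake study is reproducible
n <- 1000                        # hypothetical number of enrolled celebrities
jobs <- rpois(n, lambda = 2)     # observed predictor: # of current jobs
positive <- rbinom(n, 1, plogis(-1 + 0.3 * jobs))  # outcome: 1 = seropositive

# MCAR: every sample has the same 20% chance of getting lost
miss_mcar <- rbinom(n, 1, 0.2) == 1

# MAR: chance of a missing sample depends only on the observed predictor
miss_mar <- rbinom(n, 1, plogis(-2 + 0.5 * jobs)) == 1

# NMAR: chance of a missing sample depends on the unseen outcome itself
miss_nmar <- rbinom(n, 1, ifelse(positive == 1, 0.6, 0.1)) == 1

# share of positives among the samples that were actually returned
observed_rate <- function(miss) mean(positive[!miss])

rates <- c(full = mean(positive),
           mcar = observed_rate(miss_mcar),
           mar  = observed_rate(miss_mar),
           nmar = observed_rate(miss_nmar))
print(round(rates, 3))
```

with these made-up rates the NMAR value should land well below the full-sample rate, since positives are the ones dropping out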

#missing values #fake

waxing--artistic--statistic · 10 years ago

lapply() and sapply()

lapply will return a list of the same length as the object (x)

basic argument structure: lapply(x, FUN, ...)

wherein:

x= vector (list/atomic) or an expression

FUN= function, for ex: mean

...= optional arguments passed on to FUN

sapply() more ‘user-friendly’ mode of applying a function to each element of an object--- by default (simplify=TRUE), returns a vector or matrix where possible; returns an array if simplify=‘array’

basic argument structure: sapply(x, FUN, ..., simplify=TRUE, USE.NAMES=TRUE)

wherein:

x= vector (atom/list)

FUN= function

...= additional arguments passed on to FUN

simplify= logical/character string

(can be TRUE/FALSE-- TRUE returns a simplified vector/matrix where possible, FALSE returns a list; or can be =“array” and return an array)

USE.NAMES= logical (i.e. TRUE/FALSE); if TRUE and x is a character vector without names, x is used as the names of the result
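a small side-by-side of the two, as a sketch: the list (scores) and the character vector are made up for illustration

```r
scores <- list(a = c(2, 4, 6), b = c(14, 23, 5, 0))   # hypothetical input

as_list <- lapply(scores, mean)   # always a list, same length as the input
as_vec  <- sapply(scores, mean)   # simplified to a named numeric vector

# USE.NAMES: a character input supplies the names of the result
named   <- sapply(c("x", "yy"), nchar)
unnamed <- sapply(c("x", "yy"), nchar, USE.NAMES = FALSE)

str(as_list)
print(as_vec)
print(named)
print(unnamed)
```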

_____________________________________________________________

IN ACTION:

if there is a data frame (af), then: af<-data.frame(x=c(2,4,6,8,NA),y=c(14,23,5,0,1))

lapply() and coercion

(1) run a structure test-- although entered as a data frame, its mode is list; str() still reports a data frame

> str(af)
'data.frame': 5 obs. of 2 variables:
 $ x: num 2 4 6 8 NA
 $ y: num 14 23 5 0 1

(2) apply a coercion function with lapply(), reassign, and rerun the structure test-- the result is now a plain list

> af<-lapply(af,as.integer)
> str(af)
List of 2
 $ x: int [1:5] 2 4 6 8 NA
 $ y: int [1:5] 14 23 5 0 1

OR

> af<-lapply(af,as.character)
> str(af)
List of 2
 $ x: chr [1:5] "2" "4" "6" "8" ...
 $ y: chr [1:5] "14" "23" "5" "0" ...

(3) reset af, this time use lapply() with [] to preserve original class/object

> af[]<-lapply(af,as.integer)
> str(af)
'data.frame': 5 obs. of 2 variables:
 $ x: int 2 4 6 8 NA
 $ y: int 14 23 5 0 1

> af[]<-lapply(af,as.character)
> str(af)
'data.frame': 5 obs. of 2 variables:
 $ x: chr "2" "4" "6" "8" ...
 $ y: chr "14" "23" "5" "0" ...

**Note: the structural difference when using [] --- assigning into af[] keeps the data frame structure, plain assignment replaces af with a list

lapply() using other functions (non-coercive)

(1) start with freshly reset data frame (af)

(2) add function of choice, print new object (af)

> af<-lapply(af,mean)
> af
$x
[1] NA

$y ---(i.e. 14+23+5+0+1=43, 43/5=8.6)
[1] 8.6

note: remember, mean() (like sum(), max(), etc.) returns NA if any element is NA, unless you pass na.rm=TRUE

(3) reset af again, this time perform expressions with []
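one way step (3) might look, as a sketch only: centering each column is my own choice of expression, not something from the original notes

```r
af <- data.frame(x = c(2, 4, 6, 8, NA), y = c(14, 23, 5, 0, 1))  # fresh reset

# extra arguments after FUN are passed on to it, here na.rm = TRUE
means <- lapply(af, mean, na.rm = TRUE)

# assigning into af[] keeps the data frame; each column is centered on its mean
af[] <- lapply(af, function(col) col - mean(col, na.rm = TRUE))

str(means)
str(af)
```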

sapply()

*information drawn from cran-r library data base

#cran-r #simplify #vector #function

waxing--artistic--statistic · 10 years ago

Missing Indices

OOB stands for out of bounds (an index past the end of the object)

[[]] and [] differ in their behavior when index is OOB

if a data frame (z) is entered: z<-data.frame(x=1:5,y=6:10)

if an atomic vector (a) is entered: a<-1:15

check the length of your object: (i.e. the # of indices)

> length(z) [1] 2

>length(z$x)

[1] 5

>length(z$y)

[1] 5

>length(c(z$x,z$y))

[1] 10

> length(a) [1] 15

now try to extract an element outside of these bounds using [[]] and []

> z[3] Error in `[.data.frame`(z, 3) : undefined columns selected

> z[[3]] Error in .subset2(x, i, exact = exact) : subscript out of bounds

> a[16] [1] NA

> a[[16]] Error in a[[16]] : subscript out of bounds
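the same four checks can be run without stopping the script; a sketch, where is_error() is a hypothetical helper I made up around tryCatch()

```r
z <- data.frame(x = 1:5, y = 6:10)
a <- 1:15

# hypothetical helper: TRUE if evaluating the expression raises an error
is_error <- function(expr) {
  inherits(tryCatch(expr, error = function(e) e), "error")
}

results <- c(
  "z[3]"    = is_error(z[3]),     # data frame, single bracket
  "z[[3]]"  = is_error(z[[3]]),   # data frame, double bracket
  "a[16]"   = is_error(a[16]),    # atomic vector, single bracket
  "a[[16]]" = is_error(a[[16]])   # atomic vector, double bracket
)
print(results)
```

only a[16] comes back without an error, and what it returns is NA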

#adv-r #wickham #index #subsetting #length #dataframes

waxing--artistic--statistic · 10 years ago

Simplifying vs. Preserving

this table was taken from adv-r, Hadley Wickham

there is a difference between simplifying and preserving subsets

simplifying- returns the simplest data structure that can represent the output

preserving- “keeps the structure of the output the same as the input”-- typically salient with programming (result will always be the same type)

common error when subsetting: omitting drop=FALSE when subsetting matrices and data frames
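a quick comparison of the two behaviors, as a sketch: the matrix, data frame, and list here are made-up examples

```r
m   <- matrix(1:6, nrow = 2)                       # hypothetical 2 x 3 matrix
df  <- data.frame(x = 1:3, y = c("a", "b", "c"))   # hypothetical data frame
lst <- list(a = 1, b = "two")                      # hypothetical list

simp_m  <- m[1, ]                  # simplifying: drops to a plain vector
pres_m  <- m[1, , drop = FALSE]    # preserving: still a 1 x 3 matrix

simp_df <- df[, 1]                 # simplifying: plain integer vector
pres_df <- df[, 1, drop = FALSE]   # preserving: one-column data frame

simp_l  <- lst[[1]]                # simplifying: the contents of the element
pres_l  <- lst[1]                  # preserving: a list of length 1

str(simp_m); str(pres_m)
str(simp_df); str(pres_df)
str(simp_l); str(pres_l)
```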

_______________________________________________________________

#adv-r #simplify #preserve

waxing--artistic--statistic · 10 years ago

Subsetting Operators

So far it’s been established that: [] and $ select columns/rows

[[]] is similar to [], except it can only return a single value-- a single element of a list or a single column of a data frame--**note: when working with lists you need [[]] to get at the contents, because [] applied to a list always returns a list

we will create a variable (a) to understand this [[]] subset

a<-list(1:4,4:8,z=letters[1:4])

structurally:

> str(a)
List of 3
 $  : int [1:4] 1 2 3 4
 $  : int [1:5] 4 5 6 7 8
 $ z: chr [1:4] "a" "b" "c" "d"

when using [[]]

it is most relevant for lists

you index with a single positive integer or a single string (a name); a longer vector built with c() indexes recursively

(1) using a single number or name

> a[[1]] [1] 1 2 3 4

> a[['z']] [1] "a" "b" "c" "d"

> a$z [1] "a" "b" "c" "d"
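the difference between [] and [[]] on the same list, as a sketch; the result names are my own

```r
a <- list(1:4, 4:8, z = letters[1:4])

first_contents <- a[[1]]        # the integer vector itself
first_sublist  <- a[1]          # still a list, holding that vector
nested         <- a[[c(2, 3)]]  # recursive: same as a[[2]][[3]]

str(first_contents)
str(first_sublist)
print(nested)
print(identical(a$z, a[["z"]]))  # $ is shorthand for [[ ]] with a name
```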

#adv-r #subset #operators #lists #data frames

waxing--artistic--statistic · 10 years ago

Subsetting: Data Frames

> DATA FRAMES

(if you subset with a single vector, the data frame behaves like a list--if you subset with two vectors, it behaves like a matrix)

Taken directly from site: How many of your friends complain regularly about their bodies? ~None: 147 ~A few: 411 ~Some: 188 ~Most: 141 ~All: 53

construct an object that reflects it:

> body_complain<-data.frame(x=c('None','A few','Some','Most','All'),y=c(147,411,188,141,53))

object$title of column, e.g. body_complain$y recalls the counts

___________________________________________________________

>DATA FRAMES>STRUCTURAL QUALITIES

the structure of a data frame subset depends on how you subset it

examine the differences using this variable (a)

> a
  x y z
1 1 1 a
2 2 2 b
3 3 3 c
4 4 4 d

> str(a)
'data.frame': 4 obs. of 3 variables:
 $ x: int 1 2 3 4
 $ y: int 1 2 3 4
 $ z: Factor w/ 4 levels "a","b","c","d": 1 2 3 4

Now examine the structural differences which subsetting initiates in data frames

> str(a[c(1)])
'data.frame': 4 obs. of 1 variable:
 $ x: int 1 2 3 4

> str(a[,c(1,3)])
'data.frame': 4 obs. of 2 variables:
 $ x: int 1 2 3 4
 $ z: Factor w/ 4 levels "a","b","c","d": 1 2 3 4

> str(a[c(3),])
'data.frame': 1 obs. of 3 variables:
 $ x: int 3
 $ y: int 3
 $ z: Factor w/ 4 levels "a","b","c","d": 3

To access a single element of a column, combine $ and []

>a$y[3]

[1] 3
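to rerun the examples above, the data frame (a) has to exist first; a sketch, assuming it was built like this (stringsAsFactors=TRUE is my guess, it reproduces the Factor column shown in the str() output)

```r
# assumed construction of a; the original post only shows the printed result
a <- data.frame(x = 1:4, y = 1:4, z = letters[1:4], stringsAsFactors = TRUE)

list_style   <- a[c(1)]        # one index vector: list-style, picks column x
matrix_style <- a[3, c(1, 3)]  # two index vectors: row 3, columns x and z
one_cell     <- a$y[3]         # $ picks the column, [] picks the element

str(list_style)
str(matrix_style)
print(one_cell)
```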

__________________________________________________________

>DATA FRAME > REASSIGNMENT

reassigning data frame elements combines subsets and reassignment and is discussed further in the section: Subsetting and Assignment

#subsetting #data.frames #adv-r

waxing--artistic--statistic · 10 years ago

Object Class

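S3 Objects-- made up of the following, so the same subsetting techniques apply

(1) Atomic Vectors

(2) Arrays

(3) Lists

S4 Objects

More complex, need two additional subsetting operators: @ (equivalent to $) and slot() (equivalent to [[]])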

#object class #adv-r #class

waxing--artistic--statistic · 10 years ago

Subsetting

brackets [] are known as subsetting operators

This is an exciting concept that will require an understanding of:

3 subsetting operators ([], [[]], and $)

the 6 types of subsetting

behavioral differences among different objects

subsetting with assignment

subsetting ranges from simple processes (atomic vectors) to more complicated using S3 objects and assignment operations

#adv-r #subsetting #assignment

waxing--artistic--statistic · 10 years ago

Update

This is an important note: the source of information on this site can be found in Advanced R, an awesome book written by Hadley Wickham. Wickham is a handsome programmer, educator, and Chief Scientist at RStudio

waxing--artistic--statistic · 10 years ago

Misc. & Shortcuts

abs(x)**absolute value of x

sqrt(x)**principal sqrt of x

x can be any numeric or complex vector or array

for complex vectors

abs(x)==Mod(x) and sqrt(x)==x^0.5

$

!

indicative of ‘logical negation, or NOT’

for example: x<-rnorm(10), is.na(x) //should return all FALSE; when you perform y<-x[!is.na(x)], y will hold new values (identical to x) because

! indicates NOT, as in keep all index values where is.na(x) is not TRUE

since there are no NA values, should display all x values

&, &&

signifies ‘logical AND’ (& compares element by element, && compares single TRUE/FALSE values, as in if() conditions)

|,||

signifies OR (| compares element by element, || compares single TRUE/FALSE values)
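these operators in action, as a sketch: the vector x is made up and includes an NA on purpose

```r
x <- c(2, NA, -1, 5)                   # hypothetical vector with one NA

kept   <- x[!is.na(x)]                 # ! flips TRUE/FALSE, so the NA is dropped
both   <- x > 0 & !is.na(x)            # & compares element by element
single <- length(x) > 0 && !anyNA(x)   # && works on one TRUE/FALSE per side
either <- anyNA(x) || all(x > 0)       # || stops once the left side is TRUE

print(kept)
print(both)
print(single)
print(either)

print(abs(-3))                         # absolute value
print(sqrt(16))                        # principal square root
print(abs(3+4i) == Mod(3+4i))          # for complex input abs() is the modulus
```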

#cran.r #r.shortcuts #logic

waxing--artistic--statistic · 10 years ago

Attributes

Attributes and Names

all objects can store attributes or metadata about the object

attributes are essentially named lists

modifying a vector is a reductive process: by default, most attributes are lost

only the 3 most important attributes survive modification

1. names 2. dimensions 3. class

examining/recalling attributes:

attr() for individual recall

attributes() to recall the list at once

assigning attributes

for these three, use names(), dim(), and class(), not attr()

______________________________________________________________

Names

there are 5 ways to name a vector

(1) during assignment | x<-c(a=1,b=2,c=3)

(2) names() modifies an existing vector | x<-1:3; names(x)<-c('a','b','c')

(3) setNames() creates a modified copy of an existing vector | x<-1:3; x<-setNames(x,c('a','b','c'))

(4) colnames(matrix)<- c('name1','name2') or rownames(x)<-

(5) rename a named element with names(object)[indexnumber]<-'new name'

names() to recall the names

if some names are missing, will return an empty string "" in place of the missing elements

if no names, will return NULL

unname(x) creates a new vector without names; names(x)<-NULL removes all names from the object in place
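the naming routes run end to end, as a sketch; the "note" attribute is a made-up example of a custom attribute

```r
x <- c(a = 1, b = 2, c = 3)            # names given during assignment
y <- 1:3
names(y) <- c("a", "b", "c")           # names added to an existing vector
z <- setNames(1:3, c("a", "b", "c"))   # named copy in one step
names(z)[2] <- "beta"                  # rename a single element

partial <- c(a = 1, 2, 3)              # only the first element is named
bare    <- unname(x)                   # copy of x without names

v <- 1:3
attr(v, "note") <- "made-up attribute" # custom attribute set with attr()

print(names(z))
print(names(partial))
print(names(1:3))
print(names(bare))
print(attr(v[1], "note"))              # custom attribute is gone after subsetting
print(names(x[1]))                     # names survive subsetting
```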

#attributes #names #data_structures

waxing--artistic--statistic · 10 years ago

Data Structures

R's base data structures are organized by 2 qualities

1. dimension

2. homogeneous vs. heterogeneous

Dimension

1d, 2d, nd

Homogeneous

atomic vectors, matrix, arrays

all composed of the same type

Heterogeneous

lists, and data frames

contents can be of different types (e.g. a mix of categorical and numerical)

5 major data structures in R and their qualities

atomic vector- 1d, homogeneous

matrix- 2d, homogeneous

array- nd, homogeneous

list- 1d, heterogeneous

data frame- 2d, heterogeneous
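one object of each kind, as a sketch with made-up contents, to check the dimension and type claims

```r
v   <- c(1, 2, 3)                                   # atomic vector: 1d
m   <- matrix(1:6, nrow = 2)                        # matrix: 2d
arr <- array(1:24, dim = c(2, 3, 4))                # array: nd (here 3d)
l   <- list(1, "a", TRUE)                           # list: 1d, mixed types
df  <- data.frame(n = 1:2, s = c("a", "b"),
                  stringsAsFactors = FALSE)         # data frame: 2d, mixed types

mixed <- c(1, "a", TRUE)   # homogeneous: everything is coerced to character

print(dim(v))              # NULL, a plain vector has no dim attribute
print(dim(m))
print(dim(arr))
print(sapply(l, class))    # each list element keeps its own type
print(sapply(df, class))   # each column keeps its own type
print(mixed)
```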

#advancedr

waxing--artistic--statistic · 10 years ago

Complex Matrix Expressions

There are multiple ways to program linear equations

If solving a linear system, where b<-A%*%x and only A and b are known, enter solve(A,b) to recover x; solve(A) on its own returns the inverse of A

for least squares, lm() will be used more often than lsfit() for linear modeling
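both routes in a few lines, as a sketch: the matrix A, the vector x_true, and the perfectly linear toy data are all made up

```r
# linear system: build b from a known x, then recover x with solve()
A      <- matrix(c(2, 1, 1, 3), nrow = 2)   # hypothetical 2 x 2 matrix
x_true <- c(1, 2)
b      <- A %*% x_true
x_hat  <- solve(A, b)                       # solves A %*% x = b for x
A_inv  <- solve(A)                          # inverse of A

# least squares with lm(): toy data lying exactly on y = 3 + 2 * x
x_pred <- 1:5
y_obs  <- 3 + 2 * x_pred
fit    <- lm(y_obs ~ x_pred)

print(x_hat)
print(A %*% A_inv)
print(coef(fit))
```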

Eigenvalues and eigenvectors

If a matrix Sm is symmetric, then eigen(Sm)$values will return the eigenvalues

while eigen(Sm)$vectors returns the matrix of eigenvectors ($vec works too, through partial matching)
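a worked case, as a sketch: Sm here is a made-up 2 x 2 symmetric matrix whose eigenvalues are 3 and 1

```r
Sm <- matrix(c(2, 1, 1, 2), nrow = 2)   # hypothetical symmetric matrix
e  <- eigen(Sm)

vals <- e$values     # eigenvalues, largest first
vecs <- e$vectors    # one eigenvector per column

# check the defining property: Sm %*% V equals V %*% diag(values)
lhs <- Sm %*% vecs
rhs <- vecs %*% diag(vals)

print(vals)
print(all.equal(lhs, rhs))
```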

Partitioned matrices

you can build a matrix from (1) another matrix (2) vectors using cbind and rbind functions

cbind() as in Matrix1<-cbind(vec_1,vec_2,vec_3) will bind the arguments (vec 1-3) together as the columns of Matrix1

rbind() as in Matrix1<-rbind(vec_4,vec_5,vec_6) binds vec4-6 together as the rows of Matrix1
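both functions on the same three vectors, as a sketch; the values and the extra column are made up

```r
vec_1 <- 1:3
vec_2 <- 4:6
vec_3 <- 7:9

by_col <- cbind(vec_1, vec_2, vec_3)     # each vector becomes a column
by_row <- rbind(vec_1, vec_2, vec_3)     # each vector becomes a row
wider  <- cbind(by_col, extra = 10:12)   # a matrix plus a vector also works

print(by_col)
print(by_row)
print(dim(wider))
```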

Coercing Arrays/Matrices/Vectors

c() will clear all dimensional attributes & dimnames

as.vector() will convert an array/matrix into a simple vector

#matrix #expressions #linear #array

waxing--artistic--statistic · 10 years ago

Array() Function

remember how to use the array function with a sequence of numbers? (ex: zeta<-array(1:30,dim=c(2,5,3)) where 2=rows, 5=columns, 3=# of sub-arrays (layers))

array(x,dim=c(rows,columns,layers)) if x is a vector

the array function can also be used with a variable object

it's important to note that arrays must have the same dim in order for an element-by-element arithmetic expression to be performed on them
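a sketch of both points: zeta is the example from above, while h and the mismatched array are made up for illustration

```r
zeta <- array(1:30, dim = c(2, 5, 3))   # 2 rows, 5 columns, 3 layers

h        <- c(10, 20)                   # hypothetical variable object
from_var <- array(h, dim = c(2, 2, 2))  # h is recycled to fill all 8 cells

total <- zeta + zeta                    # same dim: element-by-element sum

# different dim: R refuses; the error is caught and replaced by a label
bad <- tryCatch(zeta + array(1:6, dim = c(2, 3)),
                error = function(e) "non-conformable")

print(dim(total))
print(from_var[, , 2])
print(bad)
```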

#array #vector

waxing--artistic--statistic · 10 years ago

Index Matrix

The role of index vectors (i.e. combining a vector/object, an expression, and brackets--x[0>x])

In the previous section, I highlighted that arrays are salient with large sets of data. Subsetting commands recall individual elements (or groups of elements) of an array.

This is useful because subsetting can be combined with assignment!

By combining these fundamentals, an index matrix expands the ability to extract, examine, or reconstruct elements of the array
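a sketch of an index vector and an index matrix; the objects v, x, and i are made up, and each row of i is one (row, column) pair

```r
v <- c(-2, 3, -1)
negatives <- v[0 > v]                    # logical index vector: keep values below 0

x <- array(1:20, dim = c(4, 5))          # 4 x 5 array filled column by column
i <- array(c(1:3, 3:1), dim = c(3, 2))   # pairs (1,3), (2,2), (3,1)

picked <- x[i]                           # extract those three cells
x[i] <- 0L                               # reassign the same cells in one step

print(negatives)
print(picked)
print(x)
```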

#index #matrix #dimensional #arrays

waxing--artistic--statistic · 10 years ago

Arrays

array- is a “multiply subscripted collection of data entries” i.e. each data entry is located by several subscripts at once (row, column, layer, ...). A 3d array can be read as a stack of sub-arrays (layers), each one a set of rows/columns

dim(obj name)<- c(# of rows, # of columns, # of layers)

array(1:z,dim=c(rows,columns)) (make sure rows*columns=z) or

array(c(1:z,1:b),dim=c(rows,columns)) //1:z fills column 1, 1:b fills column 2, when each has length=rows

dimension vector- vector of non-negative integers, if the dimension vector has length k then the array is k dimensional--(i.e. a 1 dimensional array is basically a vector)

matrix- is a 2 dimensional array, there are multiple ways to construct a matrix

Formulas/Ex for Arrays

if b<-1:160, then dim(b)<-c(2,4,20)---in this example, this will create an object of 160 elements: 20 sets (layers), each with 2 rows and 4 columns

hint: you do not have to include the third dimension at all (leave it out) to create a single 2d array, i.e. a matrix

or you can use the array function

objectname[row#,column#]

using the numeric row/column indices, this recalls one element from the data set

objectname[,column#] or objectname[row#,]

this recalls all of the elements in the specified row or column

objectname[,,#of array]

this recalls a particular layer (sub-array) of data for the object

> b[,,19]
     [,1] [,2] [,3] [,4]
[1,]  145  147  149  151
[2,]  146  148  150  152
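the whole example can be rerun like this; a sketch, assuming the object is the b from above with dim(b)<-c(2,4,20)

```r
b <- 1:160
dim(b) <- c(2, 4, 20)             # 2 rows, 4 columns, 20 layers

layer19  <- b[, , 19]             # one whole layer: a 2 x 4 matrix
one_cell <- b[2, 3, 19]           # row 2, column 3 of layer 19
one_row  <- b[1, , 19]            # all of row 1 in layer 19
kept_3d  <- b[, , 19, drop = FALSE]   # same layer, kept as a 3d array

print(layer19)
print(one_cell)
print(one_row)
print(dim(kept_3d))
```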

#arrays #dim #dimensional #vector #recalling variables
