waxing--artistic--statistic
Artistic Statistic
Art. Stats. Sometimes even music.
Missing That Data
In the most simplistic sense: MCAR, MAR, NMAR
- Running example: a fictional study of herpes among D list celebrities. predictor: # of current jobs; outcome: seropositive test result
MCAR- missing completely at random
    - Paris submits her samples to the clinic, but they are accidentally lost in the lab chute {mechanism: random, unrelated to predictor or outcome}
MAR- missing at random
    - Trump, Lindsay, and Charlie are currently working in the industry and don’t want to ruin their ‘reps’, so they refrain from submitting samples {mechanism: related to the observed predictor (# of current jobs), not to the missing outcome itself}
NMAR- not missing at random
    - Natalie and Bella are enrolled in the study, but refuse to submit their samples because they already know they are positive {mechanism: related to the unobserved outcome itself}
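The three mechanisms can be sketched with a small simulation. This is only an illustration under made-up assumptions: the sample size, the 30% positive rate, and the missingness probabilities are all invented, and the names (jobs, positive, miss_*) are hypothetical.

```r
set.seed(42)                          # reproducible illustration
n        <- 1000
jobs     <- rpois(n, lambda = 2)      # observed predictor: # of current jobs
positive <- rbinom(n, 1, 0.3)         # true outcome (1 = seropositive)

# TRUE means the sample never reaches the analyst
miss_mcar <- runif(n) < 0.2                               # same chance for everyone
miss_mar  <- runif(n) < plogis(-2 + 0.7 * jobs)           # depends on observed jobs only
miss_nmar <- runif(n) < ifelse(positive == 1, 0.6, 0.05)  # depends on the outcome itself

# Replace the outcome with NA wherever the sample is missing
observed <- function(miss) ifelse(miss, NA, positive)
y_mcar <- observed(miss_mcar)
y_mar  <- observed(miss_mar)
y_nmar <- observed(miss_nmar)

# Share of positives among the samples that were actually observed
rates <- c(truth = mean(positive),
           mcar  = mean(y_mcar, na.rm = TRUE),
           mar   = mean(y_mar,  na.rm = TRUE),
           nmar  = mean(y_nmar, na.rm = TRUE))
round(rates, 2)
```

Under NMAR the positives drop out far more often, so the observed rate should land well below the truth, while the MCAR rate should stay close to it.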
waxing--artistic--statistic · 10 years ago
lapply() and sapply()
lapply() returns a list of the same length as the object (X)
basic argument structure: lapply(X, FUN, ...)
wherein:
X= vector (atomic or list) or an expression object
FUN= function, for ex: mean
...= optional arguments passed on to FUN
sapply() is a more ‘user-friendly’ wrapper around lapply() that applies a function to each element of an object--- by default (simplify=TRUE) it returns a vector or matrix where possible, and with simplify=‘array’ it can also return a higher-dimensional array
basic argument structure: sapply(X, FUN, ..., simplify=TRUE, USE.NAMES=TRUE)
wherein:
X= vector (atomic or list)
FUN= function
...= additional arguments passed on to FUN
simplify= logical or character string
(TRUE returns a simplified vector/matrix where possible, FALSE returns a list, and “array” may return an array)
USE.NAMES= logical (i.e. TRUE/FALSE); if TRUE and X is a character vector without names, X itself is used as the names of the result
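A quick sketch of how simplify and USE.NAMES change what comes back. The input vector (words) is made up for illustration.

```r
words <- c("art", "stats", "music")   # hypothetical input

as_list   <- lapply(words, nchar)                      # always a list
as_vector <- sapply(words, nchar)                      # simplified, named by X
no_names  <- sapply(words, nchar, USE.NAMES = FALSE)   # simplified, unnamed

# FUN returns length-2 results: simplify = TRUE stacks them into a 2 x 3 matrix
as_matrix <- sapply(1:3, function(i) c(i, i^2))

# FUN returns 2 x 2 matrices: simplify = "array" keeps the extra dimension
as_array <- sapply(1:3, function(i) matrix(i, 2, 2), simplify = "array")

str(as_list)
print(as_vector)
dim(as_matrix)
dim(as_array)
```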
_____________________________________________________________
IN ACTION: 
if there is a data frame (af), then: af<-data.frame(x=c(2,4,6,8,NA),y=c(14,23,5,0,1))
lapply() and coercion
(1) run a structure test-- a data frame is stored as a list (mode= list), and str() confirms the structure
> str(af)
'data.frame':   5 obs. of  2 variables:
 $ x: num  2 4 6 8 NA
 $ y: num  14 23 5 0 1
(2) coerce each column with lapply(), then run the structure test again-- the result is a plain list, no longer a data frame
> af<-lapply(af,as.integer)
> str(af)
List of 2
 $ x: int [1:5] 2 4 6 8 NA
 $ y: int [1:5] 14 23 5 0 1
OR
> af<-lapply(af,as.character)
> str(af)
List of 2
 $ x: chr [1:5] "2" "4" "6" "8" ...
 $ y: chr [1:5] "14" "23" "5" "0" ...
(3) reset af, this time use lapply() with [] to preserve the original class of the object
> af[]<-lapply(af,as.integer)
> str(af)
'data.frame':   5 obs. of  2 variables:
 $ x: int  2 4 6 8 NA
 $ y: int  14 23 5 0 1
> af[]<-lapply(af,as.character)
> str(af)
'data.frame':   5 obs. of  2 variables:
 $ x: chr  "2" "4" "6" "8" ...
 $ y: chr  "14" "23" "5" "0" ...
**Note: the structural difference comes from [] --- assigning into af[] keeps the data frame structure
lapply() using other functions (non-coercive)
(1) start with freshly reset data frame (af) 
(2) add function of choice, print new object (af)
> af<-lapply(af,mean)
> af
$x
[1] NA

$y
[1] 8.6
(i.e. (14+23+5+0+1)/5 = 43/5 = 8.6)
note: remember, mean() returns NA if any element is NA, unless na.rm=TRUE is passed
(3) reset af again, this time perform expressions with []
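The post stops before showing this step, so here is one possible sketch of it. The choice of expression (centering each column on its mean) is an assumption, picked because it returns a column of the same length and so fits back into af[] cleanly.

```r
# Reset af to the original data frame
af <- data.frame(x = c(2, 4, 6, 8, NA), y = c(14, 23, 5, 0, 1))

# Center every column on its own mean; af[] keeps the data frame structure
af[] <- lapply(af, function(col) col - mean(col, na.rm = TRUE))

str(af)
```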
sapply()
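This section was left empty, so the following is just a sketch of the same column means done with sapply(); passing na.rm=TRUE through ... is an addition to the example above.

```r
af <- data.frame(x = c(2, 4, 6, 8, NA), y = c(14, 23, 5, 0, 1))

# lapply() keeps one list element per column
col_list  <- lapply(af, mean, na.rm = TRUE)

# sapply() simplifies the same result to a named numeric vector
col_means <- sapply(af, mean, na.rm = TRUE)

print(col_means)
```

Here x averages to 5 once the NA is dropped, and y stays at 8.6.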
*information drawn from cran-r library data base
Missing Indices
OOB stands for out of bounds, i.e. an index that falls outside the range of the object
[[]] and [] differ in their behavior when index is OOB
if a data frame (z) is entered: z<-data.frame(x=1:5,y=6:10)
if an atomic vector (a) is entered: a<-1:15
check the length of your object: (i.e. the # of indices)
> length(z)
[1] 2
> length(z$x)
[1] 5
> length(z$y)
[1] 5
> length(c(z$x,z$y))
[1] 10
> length(a)
[1] 15
now try to extract an element outside of these bounds using [[]] and []
> z[3]
Error in `[.data.frame`(z, 3) : undefined columns selected
> z[[3]]
Error in .subset2(x, i, exact = exact) : subscript out of bounds
> a[16]
[1] NA
> a[[16]]
Error in a[[16]] : subscript out of bounds
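The same four lookups can be run in one script without stopping at the first error. The helper try_index() is hypothetical, written only for this sketch.

```r
z <- data.frame(x = 1:5, y = 6:10)
a <- 1:15

# Hypothetical helper: return the value, or the error message if the lookup fails
try_index <- function(expr) {
  tryCatch(expr, error = function(e) paste("error:", conditionMessage(e)))
}

try_index(z[3])      # data frame, []   : error
try_index(z[[3]])    # data frame, [[]] : error
try_index(a[16])     # atomic vector, []   : NA, no error
try_index(a[[16]])   # atomic vector, [[]] : error
```

Only [] on the atomic vector lets the out-of-bounds index through silently, which is why an unexpected NA is worth checking against length().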
Simplifying vs. Preserving
[image: table of simplifying vs. preserving subsetting]
this table was taken from adv-r (Advanced R), Hadley Wickham
there is a difference between simplifying and preserving subsets
simplifying- returns the simplest data structure that can represent the output
preserving- “keeps the structure of the output the same as the input”-- typically better for programming (the result will always be the same type)
common error when subsetting: omitting drop=FALSE
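A small sketch of the drop=FALSE difference on a data frame and a matrix. The objects df and m are made up for illustration.

```r
df <- data.frame(x = 1:3, y = c("a", "b", "c"))
m  <- matrix(1:6, nrow = 2)

simplified_col <- df[, "x"]                 # simplifying: plain integer vector
preserved_col  <- df[, "x", drop = FALSE]   # preserving: still a data frame

simplified_row <- m[1, ]                    # simplifying: dimensions are dropped
preserved_row  <- m[1, , drop = FALSE]      # preserving: still a 1 x 3 matrix

class(simplified_col)
class(preserved_col)
dim(simplified_row)
dim(preserved_row)
```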
_______________________________________________________________
Subsetting Operators
So far it’s been established that: [] and $ select columns/rows
[[]] allows for more selective element operations--it only pulls out a single element, such as one value or one column of a data frame--**note: with lists, [] always returns a list, so you need [[]] (or $) to get at the contents of an element
we will create a variable (a) to understand this [[]] subset
a<-list(1:4,4:8,z=letters[1:4])
structurally:
> str(a)
List of 3
 $  : int [1:4] 1 2 3 4
 $  : int [1:5] 4 5 6 7 8
 $ z: chr [1:4] "a" "b" "c" "d"
when using [[]]
it is most relevant for lists
you can specify a single element, or a vector built with c(), which indexes recursively (a[[c(2,3)]] is the same as a[[2]][[3]])
(1) using a single number or name
> a[[1]]
[1] 1 2 3 4
> a[['z']]
[1] "a" "b" "c" "d"
> a$z
[1] "a" "b" "c" "d"
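To see what [] and [[]] each hand back on the same list, here is a short sketch using the list a from above. The variable names on the left are just labels for this example.

```r
a <- list(1:4, 4:8, z = letters[1:4])

sub_list <- a[1]       # [] keeps the list wrapper: a list of length 1
element  <- a[[1]]     # [[]] pulls out the contents: an integer vector

# A vector inside [[]] indexes recursively: a[[c(2, 3)]] is a[[2]][[3]]
nested <- a[[c(2, 3)]]

# $ is shorthand for [[ ]] with a name
same_z <- identical(a[["z"]], a$z)

str(sub_list)
str(element)
nested
same_z
```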
Subsetting: Data Frames
> DATA FRAMES
(data frames have traits of both lists and matrices: subset with a single index, e.g. df[1], and the df behaves like a list--subset with two indices, e.g. df[1,2], and it behaves like a matrix)
Taken directly from site:  How many of your friends complain regularly about their bodies? ~None: 147 ~A few: 411 ~Some: 188 ~Most: 141 ~All: 53
construct an object that reflects it:
> body_complain<-data.frame(x=c('None','A few','Some','Most','All'),y=c(147,411,188,141,53))
to pull out one column: object$column_title, e.g. body_complain$y
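A few things that can be done with the survey object once it exists. The share column and the 150 cutoff are additions for illustration, not part of the original survey.

```r
body_complain <- data.frame(
  x = c("None", "A few", "Some", "Most", "All"),
  y = c(147, 411, 188, 141, 53)
)

counts <- body_complain$y          # object $ column title
total  <- sum(counts)              # total number of respondents

# Hypothetical extra column: each answer as a share of all respondents
body_complain$share <- round(counts / total, 3)

# Matrix-style subsetting: keep only the rows with more than 150 answers
over_150 <- body_complain[body_complain$y > 150, ]

total
over_150
```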
___________________________________________________________
>DATA FRAMES>STRUCTURAL QUALITIES
the structure of a data frame subset depends on how it was subset
examine the differences using this variable (a)
> a
  x y z
1 1 1 a
2 2 2 b
3 3 3 c
4 4 4 d
> str(a)
'data.frame':   4 obs. of  3 variables:
 $ x: int  1 2 3 4
 $ y: int  1 2 3 4
 $ z: Factor w/ 4 levels "a","b","c","d": 1 2 3 4
Now examine the structural differences that subsetting produces in data frames
> str(a[c(1)])
'data.frame':   4 obs. of  1 variable:
 $ x: int  1 2 3 4
> str(a[,c(1,3)])
'data.frame':   4 obs. of  2 variables:
 $ x: int  1 2 3 4
 $ z: Factor w/ 4 levels "a","b","c","d": 1 2 3 4
> str(a[c(3),])
'data.frame':   1 obs. of  3 variables:
 $ x: int 3
 $ y: int 3
 $ z: Factor w/ 4 levels "a","b","c","d": 3
To access a single element of a data frame column, combine $ and []
>a$y[3]
[1] 3
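The post never shows how a was built, so the constructor below is a guess that reproduces the printed structure (stringsAsFactors=TRUE is assumed in order to get the Factor column on current R versions).

```r
# Assumed construction of a, matching the str() output above
a <- data.frame(x = 1:4, y = 1:4, z = letters[1:4], stringsAsFactors = TRUE)

list_style   <- a[c(1, 3)]        # one index: picks columns, like a list
matrix_style <- a[2:3, c(1, 3)]   # two indices: rows then columns, like a matrix

one_col_df  <- a[1]               # list-style keeps a data frame
one_col_vec <- a[, 1]             # matrix-style with one column simplifies to a vector

single_value <- a$y[3]            # $ picks the column, [] picks the element

str(list_style)
str(matrix_style)
single_value
```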
__________________________________________________________
>DATA FRAME > REASSIGNMENT
reassigning data frame elements combines subsets and reassignment and is discussed further in the section: Subsetting and Assignment
Object Class
S3 Objects (built from:)
(1) Atomic Vectors
(2) Arrays
(3) Lists
S4 Objects
More complex, need two additional subsetting operators: @ (the equivalent of $) and slot() (the equivalent of [[]])
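A minimal sketch of the difference, assuming a made-up class called Record with two fields. Both the class and the field names are hypothetical.

```r
library(methods)

# S3: a list with a class attribute, so the usual operators still work
s3_obj <- structure(list(name = "sample_01", jobs = 2), class = "record")
s3_obj$jobs
s3_obj[["name"]]

# S4: a formal class with declared slots, reached with @ or slot()
setClass("Record", slots = c(name = "character", jobs = "numeric"))
s4_obj <- new("Record", name = "sample_01", jobs = 2)
s4_obj@jobs
slot(s4_obj, "name")
```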
Subsetting
using brackets [] to pull pieces out of an object is known as subsetting
This is an exciting concept that will require an understanding of:
3 subsetting operators
the 6 types of subsetting
behavioral differences among different objects
subsetting with assignment 
subsetting ranges from simple processes (atomic vectors) to more complicated ones involving S3 objects and assignment operations
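For reference, the six types of subsetting on an atomic vector look like this. The vector x follows the kind of example used in Advanced R; the named copy y is added so the character case has something to match.

```r
x <- c(2.1, 4.2, 3.3, 5.4)
y <- setNames(x, c("a", "b", "c", "d"))   # same values, with names

x[c(3, 1)]                       # 1. positive integers: elements at those positions
x[-c(3, 1)]                      # 2. negative integers: everything except those positions
x[c(TRUE, TRUE, FALSE, FALSE)]   # 3. logical: elements where the index is TRUE
x[]                              # 4. nothing: the whole vector
x[0]                             # 5. zero: a zero-length vector
y[c("d", "c", "a")]              # 6. character: elements with matching names
```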
Update
This is an important note: the source of information on this site can be found in Advanced R, an awesome book written by Hadley Wickham. Wickham is a handsome programmer, educator, and Chief Scientist at RStudio.
Misc. & Shortcuts
abs(x)**absolute value of x
sqrt(x)**principal sqrt of x
x can be any numeric or complex vector or array
for complex vectors
abs(x)==Mod(x) and sqrt(x)==x^0.5
!
indicative of ‘logical negation, or not’
for example: x<-rnorm(10); is.na(x) should return all FALSE, so y<-x[!is.na(x)] returns a vector y identical to x, because
! negates the test, i.e. it keeps every index where is.na(x) is not TRUE
since there are no NA values, y should display all x values
&, &&
signify ‘logical AND’; & works element by element, && compares single values and stops as soon as the result is known
|, ||
signify ‘logical OR’, with the same split: | is elementwise, || is for single values
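A short sketch pulling these shortcuts together. The vector x here deliberately contains one NA (unlike the rnorm example) so that ! actually removes something.

```r
x <- c(3, NA, -4)                       # hypothetical vector with one missing value

abs(x)                                  # absolute values, NA stays NA
sqrt(16)                                # principal square root
abs(3 + 4i) == Mod(3 + 4i)              # for complex input abs() is the modulus

keep <- x[!is.na(x)]                    # ! flips is.na(), so only non-missing values stay

elementwise_and <- c(TRUE, FALSE, TRUE) & c(TRUE, TRUE, FALSE)   # compared pair by pair
elementwise_or  <- c(TRUE, FALSE, TRUE) | c(TRUE, TRUE, FALSE)

# && stops at the first FALSE, so stop() on the right is never evaluated
short_circuit <- FALSE && stop("never evaluated")

keep
elementwise_and
elementwise_or
short_circuit
```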
Attributes
Attributes and Names
all objects can store attributes or metadata about the object
attributes are essentially named lists
modifying a vector is a lossy process: by default, most attributes are lost
however 3 attributes usually survive modification
1. names 2. dimensions 3. class
examining/recalling attributes:
attr() for individual recall
attributes() to recall the list at once
assigning attributes
for these three, use names(), dim(), and class(), not attr(x, 'names') and the like
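A sketch of setting, recalling, and losing attributes. The custom attribute "note" and its text are made up for the example.

```r
y <- c(a = 1, b = 2, c = 3)
attr(y, "note") <- "made-up metadata"   # hypothetical custom attribute

attributes(y)        # the whole list at once: names and note
attr(y, "note")      # one attribute at a time

part <- y[1:2]       # subsetting counts as modification
names(part)          # names survive
attr(part, "note")   # the custom attribute is gone (NULL)

m <- 1:6
dim(m) <- c(2, 3)    # dim() is the accessor for the dimensions attribute
class(m)
```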
______________________________________________________________
Names
there are 5 ways to name a vector (or matrix)
(1) during assignment  |  x<-c(a=1,b=2,c=3)
(2) names() modifies an existing vector  |  x<-1:3; names(x)<-c('a','b','c')
(3) setNames() creates a modified copy of an existing vector  |  x<-1:3; x<-setNames(x,c('a','b','c'))
(4) colnames(matrix)<-c('name1','name2') or rownames(matrix)<-
(5) rename a named element with names(object)[indexnumber]<-'new name'
names() to recall the names
if some names are missing, names() will return an empty string "" in place of the missing elements
if there are no names at all, names() will return NULL
unname(x) creates a new vector without names; names(x)<-NULL removes all names from the object in place
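All five naming routes, plus the missing-name cases, in one runnable sketch. The object names (x1, x2, x3, m, partly) are just labels for this example.

```r
x1 <- c(a = 1, b = 2, c = 3)                 # (1) during assignment

x2 <- 1:3
names(x2) <- c("a", "b", "c")                # (2) modify an existing vector

x3 <- setNames(1:3, c("a", "b", "c"))        # (3) modified copy

m <- matrix(1:4, nrow = 2)
colnames(m) <- c("name1", "name2")           # (4) matrix column and row names
rownames(m) <- c("r1", "r2")

names(x3)[2] <- "new name"                   # (5) rename one element by position

partly <- c(a = 1, 2, 3)
names(partly)                                # "a" "" "" : empty strings for missing names
names(1:3)                                   # NULL : no names at all

bare <- unname(x1)                           # copy without names; x1 itself is unchanged
names(x2) <- NULL                            # strips names from x2 in place
```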
Data Structures
R's base data structures can be organized by 2 qualities
1. dimension
2. homogeneous vs. heterogeneous
Dimension
1d, 2d, nd
Homogeneous
atomic vectors, matrix, arrays
all composed of the same type
Heterogeneous
lists, and data frames
contents can be a mix of types (e.g. categorical and numerical)
5 major data structures in R and their qualities
atomic vector- 1d, homogeneous
matrix- 2d, homogeneous
array- nd, homogeneous
list- 1d, heterogeneous
data frame- 2d, heterogeneous
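One object of each kind, with a check of its dimensions and contents. All values are made up for illustration.

```r
v   <- c(1.5, 2, 3)                                      # atomic vector: 1d, one type
m   <- matrix(1:6, nrow = 2)                             # matrix: 2d, one type
arr <- array(1:24, dim = c(2, 3, 4))                     # array: nd, one type
l   <- list(1, "a", TRUE)                                # list: 1d, mixed types
df  <- data.frame(id = 1:3, label = c("a", "b", "c"))    # data frame: 2d, mixed columns

# Homogeneous structures coerce mixed input to a single type
mixed <- c(1, "a", TRUE)
class(mixed)

# Heterogeneous structures keep each element's own type
sapply(l, class)

dim(v)      # NULL: a plain vector has no dim attribute
dim(m)
dim(arr)
dim(df)
```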
Complex Matrix Expressions
There are multiple ways to program linear equations
For a linear system b<-A%*%x, solve(A,b) recovers x, and solve(A) on its own returns the inverse of A
For least squares fitting, lm() will be used more often than lsfit()
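A small sketch of both ideas: solving a linear system with solve(), and fitting a line with lm() and lsfit(). The matrix and the five data points (pred, resp) are invented for the example.

```r
# Linear system: build b from a known x, then recover x
A <- matrix(c(2, 1, 1, 3), nrow = 2)
x <- c(1, 2)
b <- A %*% x

x_back <- solve(A, b)     # solves A %*% x = b for x
A_inv  <- solve(A)        # with one argument, returns the inverse of A

# Least squares on made-up data
pred <- 1:5
resp <- c(5.1, 6.9, 9.2, 10.8, 13.1)

fit_lm <- lm(resp ~ pred)       # formula interface
fit_ls <- lsfit(pred, resp)     # matrix interface, intercept added by default

coef(fit_lm)
fit_ls$coefficients
```

Both fits should report the same intercept and slope; lm() is usually preferred because it returns a model object that works with summary(), predict(), and friends.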
Eigenvalues and eigenvectors
If Sm is a symmetric matrix, then eigen(Sm)$values will return the eigenvalues
 while eigen(Sm)$vectors returns the eigenvectors (the short form $vec works through partial matching)
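A quick check on a 2 x 2 symmetric matrix chosen so the eigenvalues come out as whole numbers. The matrix is made up for the example.

```r
Sm <- matrix(c(2, 1, 1, 2), nrow = 2)   # symmetric

e <- eigen(Sm)
e$values      # eigenvalues, largest first
e$vectors     # one eigenvector per column

# Verify the defining property for the first pair: Sm %*% v = lambda * v
v1     <- e$vectors[, 1]
lhs    <- as.vector(Sm %*% v1)
rhs    <- e$values[1] * v1
all.equal(lhs, rhs)
```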
Partitioned matrices
you can build a matrix from (1) another matrix (2) vectors using cbind and rbind functions
cbind() as in Matrix1<-cbind(vec_1,vec_2,vec_3) binds the arguments (vec 1-3) together side by side, so each one forms a column of Matrix1
rbind() as in Matrix1<-rbind(vec_4,vec_5,vec_6) stacks vec 4-6 on top of each other, so each one forms a row of Matrix1
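The same three vectors bound both ways, plus one extra column added to an existing matrix. The values and the total column are made up for the sketch.

```r
vec_1 <- c(1, 2)
vec_2 <- c(3, 4)
vec_3 <- c(5, 6)

by_col <- cbind(vec_1, vec_2, vec_3)   # 2 x 3: each vector is a column
by_row <- rbind(vec_1, vec_2, vec_3)   # 3 x 2: each vector is a row

# Building from another matrix: append a column of row sums
wider <- cbind(by_col, total = rowSums(by_col))

by_col
by_row
wider
```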
Coercing Arrays/Matrices/Vectors
c() will clear all dimensional attributes & dimnames
as.vector() will convert an array/matrix into a simple vector
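Both routes flatten a matrix column by column. A sketch with a small named matrix (made up for the example):

```r
m <- matrix(1:6, nrow = 2,
            dimnames = list(c("r1", "r2"), c("a", "b", "c")))

flat_c <- c(m)           # drops dim and dimnames
flat_v <- as.vector(m)   # same result, intent is more explicit

attributes(m)            # dim and dimnames are present
attributes(flat_c)       # NULL: nothing left
flat_v
```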
Array() Function
remember how to use the array function with a sequence of numbers? (ex: zeta<-array(1:30,dim=c(2,5,3)) where 2=rows, 5=columns, 3=# of layers or sub-arrays)
array(x,dim=c(rows,columns,layers)) if x is a vector
the array function can also be used with an existing object (variable) in place of x
it's important to note that arrays in an arithmetic expression must all have the same dim (and so the same length) for the operation to be performed element by element
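A sketch of array() on a literal sequence and on an existing vector, followed by arithmetic on two arrays with matching dim. The vector values and the array eta are made up for the example.

```r
zeta <- array(1:30, dim = c(2, 5, 3))      # 2 rows, 5 columns, 3 layers

values <- seq(10, 300, by = 10)            # hypothetical existing vector, length 30
eta    <- array(values, dim = c(2, 5, 3))  # same shape as zeta

total <- zeta + eta                        # element by element, dims match

dim(total)
zeta[1, 1, 2]    # first entry of the second layer
total[2, 5, 3]   # last entry: 30 + 300
```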
Index Matrix
The role of index vectors (i.e. combining a vector/object, an expression, and brackets--x[0>x])
In the previous section, I highlighted that arrays are useful with large sets of data. Indexing commands recall individual elements (or groups of elements) of an array.
This is useful because of the assignment operator!
By combining these fundamentals, an index matrix expands the ability to extract, examine, or reconstruct elements of the array
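A sketch of an index matrix at work: each row of i holds one (row, column) pair, so x[i] pulls out scattered elements in one step and x[i]<- reassigns them. The 4 x 5 array and the three positions are made up for the example.

```r
x <- array(1:20, dim = c(4, 5))          # 4 x 5 array filled column by column

# Index matrix: rows are the positions (1,3), (2,2), (3,1)
i <- array(c(1:3, 3:1), dim = c(3, 2))

picked <- x[i]        # extract those three elements
x[i]   <- 0           # reassign the same three elements

# A logical index vector works on arrays too
big <- x[x > 15]

picked
x
big
```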
Arrays
array- is a “multiply subscripted collection of data entries”, i.e. every entry is addressed by several subscripts (row, column, layer, ...). A data set can be arranged into several 2d layers (sub-arrays), each picked out by the last subscript
dim(obj name)<- c(# of rows, # of columns, # of layers)
array(1:z,dim=c(rows,columns)) (make sure rows*columns=z) or
array(c(1:z,1:b),dim=c(rows,columns)) where 1:z fills column 1 and 1:b fills column 2
dimension vector- a vector of non-negative integers; if it has length k then the array is k-dimensional--(i.e. a 1 dimensional array is basically a vector)
matrix- is a 2 dimensional array, there are multiple ways to construct a matrix
Formulas/Ex for Arrays
if b<-1:160, then dim(b)<-c(2,4,20)---in this example, this will turn b into an array of 160 elements arranged as 20 layers, each with 2 rows and 4 columns
hint: you can leave out the third dimension (dim=c(rows,columns)) to create a single 2d array
or you can use the array function
objectname[row#,column#]
using the numeric row/column indices, this recalls one element from the data set
objectname[,column#] or objectname[row#,]
this recalls all the elements in the specified column or row
objectname[,,# of layer]
this recalls a particular layer (sub-array) of the object
> z[,,19]
     [,1] [,2] [,3] [,4]
[1,]  145  147  149  151
[2,]  146  148  150  152
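The indexing patterns above, run on the 2 x 4 x 20 array b from the formulas section. The printed z[,,19] output appears to come from an object built the same way, though that is an assumption.

```r
b <- 1:160
dim(b) <- c(2, 4, 20)        # 20 layers of 2 rows x 4 columns

one_element <- b[1, 2, 19]   # row 1, column 2, layer 19
one_column  <- b[, 3, 1]     # all rows of column 3 in layer 1
one_row     <- b[2, , 1]     # all columns of row 2 in layer 1
layer_19    <- b[, , 19]     # the whole 19th layer, a 2 x 4 matrix

one_element
one_column
one_row
layer_19
```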