r-haus
r-haus
Fun with data analysis using R
127 posts
Don't wanna be here? Send us removal request.
r-haus · 5 years ago
Text
Indicators
Risky sexual behaviour: Early sexual initiation, 5+ lifetime sexual partners, 2+ recent sexual partners (Risky sex behaviours among college students: The psychosocial profile)Early sexual initiation, 5+ lifetime sexual partners, 2+ recent sexual partners (Risky sex behaviours among college students: The psychosocial profile)IMR, CMR
infant mortality rates (IMR) for children aged 0–11 months, child mortality rates (CMR) for children aged 12–59 months, and under-five mortality rates (U5MR) for children aged 0–59 months,
0 notes
r-haus · 5 years ago
Text
Measurement of alcohol use: research
‘low risk’ drinkers, defined as only 1–2 drinks per day over the last 7 days and no more than 7 in total for women and 14 in total for men; and, ‘at risk’ drinkers, defined as either four or more drinks on any one day in the previous seven days for women and five or more drinks for men, or eight drinks in total for women and 15 for men over the previous seven days. Men and women who consumed 3 drinks per day and no more than 7 in total over the previous 7 days for women and 14 in total for men were not included in the “low risk” and “at risk” categories and were thus not included in the analysis. These definitions are based on the guide- lines from the US National Institute of Alcohol Abuse and Alcoholism (NIAAA) for adult men and women.
0 notes
r-haus · 5 years ago
Text
STATA META-ANALYSIS
https://www.statalist.org/forums/forum/general-stata-discussion/general/1302978-metaprop-by
metaprop num denom, random by(tgroup) label(namevar=author, yearvar=year)
// suppose you have 4 studies clear input study cases total 1 20 1000 2 40 5000 3 30 1500 4 25 3300 end gen p = . gen se = . // get proportions and std errors forv i =1(1)4 { cii total[`i’] cases[`i’] qui replace p = r(mean) in `i’ qui replace se = r(se) in `i’ } // get the inverse variance-weighted proportion // use the official Stata -vwls- command gen cons =1 vwls p cons, sd(se) // use the user written -metan- command // for fixed-effects meta-analysis metan p se, nograph fixed // for random-effects meta-analysis metan p se, nograph random
http://statalist.1588530.n2.nabble.com/st-meta-analysis-question-td3196024.html
0 notes
r-haus · 5 years ago
Text
HFIAS
Calculation of FS score: Calculate the Household Food Insecurity Access category for each household. 1 = Food Secure, 2=Mildly Food Insecure Access, 3=Moderately Food Insecure Access, 4=Severely Food Insecure Access HFIA category = 1 if [(Q1a=0 or Q1a=1) and Q2=0 and Q3=0 and Q4=0 and Q5=0 and Q6=0 and Q7=0 and Q8=0 and Q9=0] HFIA category = 2 if [(Q1a=2 or Q1a=3 or Q2a=1 or Q2a=2 or Q2a=3 or Q3a=1 or Q4a=1) and Q5=0 and Q6=0 and Q7=0 and Q8=0 and Q9=0] HFIA category = 3 if [(Q3a=2 or Q3a=3 or Q4a=2 or Q4a=3 or Q5a=1 or Q5a=2 or Q6a=1 or Q6a=2) and Q7=0 and Q8=0 and Q9=0] HFIA category = 4 if [Q5a=3 or Q6a=3 or Q7a=1 or Q7a=2 or Q7a=3 or Q8a=1 or Q8a=2 or Q8a=3 or Q9a=1 or Q9a=2 or Q9a=3].
g fs=.
replace fs=1 if (a==0 | a==1) & b==0 & c==0 & d==0 & e==0 & f==0 & g==0 & h==0 & i==0
replace fs=2 if [(a==2 | a==3 | b==1 | b==2 | b==3 | c==1 | d==1) & e==0 & f==0 & g==0 & h==0 & i==0]
replace fs=3 if [(c==2 | c==3 | d==2 | d==3 | e==1 | e==2 | f==1 | f==2) & g==0 & h==0 & i==0]
replace fs=4 if [e==3 | f==3 | g==1 | g==2 | g==3 | h==1 | h==2 | h==3 | i==1 | i==2 | i==3]
0 notes
r-haus · 5 years ago
Text
Forest plot R
1.
library(metafor) data(dat.bcg) dat <- escalc(measure="RR", ai=tpos, bi=tneg, ci=cpos, di=cneg, data=dat.bcg)
par(mfrow=c(1,2)) par(mar=c(5,4,1,1)) forest(dat$yi, dat$vi, annotate=FALSE, cex=.8, at=seq(-3,2,1), digits=1, xlim=c(-6,2)) text(0, 15, "Figure 1", cex=.8, font=2) par(mar=c(5,3,1,2)) forest(dat$yi, dat$vi, annotate=FALSE, slab=rep("",length(dat$yi)), cex=.8, at=seq(-3,2,1), digits=1, xlim=c(-5,3)) text(0, 15, "Figure 2", cex=.8, font=2)
...................................................................
2 .
library(forestplot) # Cochrane data from the 'rmeta'-package cochrane_from_rmeta <-  structure(list(    mean  = c(NA, NA, 0.578, 0.165, 0.246, 0.700, 0.348, 0.139, 1.017, NA, 0.531),    lower = c(NA, NA, 0.372, 0.018, 0.072, 0.333, 0.083, 0.016, 0.365, NA, 0.386),    upper = c(NA, NA, 0.898, 1.517, 0.833, 1.474, 1.455, 1.209, 2.831, NA, 0.731)),    .Names = c("mean", "lower", "upper"),    row.names = c(NA, -11L),    class = "data.frame") tabletext<-cbind(  c("", "Study", "Auckland", "Block",    "Doran", "Gamsu", "Morrison", "Papageorgiou",    "Tauesch", NA, "Summary"),  c("Deaths", "(steroid)", "36", "1",    "4", "14", "3", "1",    "8", NA, NA),  c("Deaths", "(placebo)", "60", "5",    "11", "20", "7", "7",    "10", NA, NA),  c("", "OR", "0.58", "0.16",    "0.25", "0.70", "0.35", "0.14",    "1.02", NA, "0.53")) forestplot(tabletext,           cochrane_from_rmeta,new_page = TRUE,           is.summary=c(TRUE,TRUE,rep(FALSE,8),TRUE),           clip=c(0.1,2.5),           xlog=TRUE,           col=fpColors(box="royalblue",line="darkblue", summary="royalblue"))
.............................................................
2.
test_data <- data.frame(coef1=c(1, 1.59, 1.3, 1.24, 1.4),                        coef2=c(1, 1.7, 1.4, 1.04, 1.5),                        low1=c(1, 1.3, 1.1, 0.99. 1.2),                        low2=c(1, 1.6, 1.2, 0.7, 1.3),                        high1=c(1, 1.94, 1.6, 1.55. 1,6),                        high2=c(1, 1.8, 1.55, 1.33, 1.9))
col_no <- grep("coef", colnames(test_data)) row_names <- list(    list("Category 1", "Category 2",  "Category 3", "Category 4", expression(Category >= 5)),    list("ref",         substitute(expression(bar(x) == val),                    list(val = round(rowMeans(test_data[2, col_no]), 2))),         substitute(expression(bar(x) == val),                    list(val = round(rowMeans(test_data[3, col_no]), 2))),
        substitute(expression(bar(x) == val),                    list(val = round(rowMeans(test_data[4, col_no]), 2))),        
        substitute(expression(bar(x) == val),                    list(val = round(rowMeans(test_data[5, col_no]), 2)))) ) coef <- with(test_data, cbind(coef1, coef2)) low <- with(test_data, cbind(low1, low2)) high <- with(test_data, cbind(high1, high2)) forestplot(row_names, coef, low, high,           title="Cool study",           zero = c(0.98, 1.02),           grid = structure(c(2^-.5, 2^.5), gp = gpar(col = "steelblue", lty=2)),           boxsize=0.25,           col=fpColors(box=c("royalblue", "gold"),                        line=c("darkblue", "orange"),                        summary=c("darkblue", "red")),           xlab="The estimates",           new_page = TRUE,           legend=c("Treatment", "Placebo"),           legend_args = fpLegend(pos = list("topright"),                                  title="Group",                                  r = unit(.1, "snpc"),                                  gp = gpar(col="#CCCCCC", lwd=1.5)))
3.
test_data <- data.frame(coef=c(2.45, 0.43), low=c(1.5, 0.25), high=c(4, 0.75), boxsize=c(0.5, 0.5)) row_names <- cbind(c(“Region”, “urban”, “rural”), c(“OR”, test_data$coef)) test_data <- rbind(rep(NA, 3), test_data)
forestplot(labeltext = row_names, test_data[,c(“coef”, “low”, “high”)], is.summary=c(TRUE, FALSE, FALSE), boxsize = test_data$boxsize, zero = 1, xlog = TRUE, col = fpColors(lines=”red”, box=”darkred”))
4.
Sweden <-  structure(            c(0.0408855062954068, -0.0551574080806885,              -0.0383305964199184, -0.0924757229652802,              0.0348395599810297, -0.0650808763059716,              -0.0472794647337126, -0.120200006386798,              0.046931452609784, -0.0452339398554054,              -0.0293817281061242, -0.0647514395437626),            .Dim = c(4L, 3L),            .Dimnames = list(c("Males vs Female", "85 vs 65 years",                              "Charlsons Medium vs Low", "Charlsons High vs Low"),                             c("coef", "lower", "upper"))) Denmark <-  structure(            c(0.0346284183072541, -0.0368279085760325,              -0.0433553672510346, -0.0685734649940999,              0.00349437418972517, -0.0833673052667752,              -0.0903366633240568, -0.280756832078775,              0.065762462424783, 0.00971148811471034,              0.00362592882198759, 0.143609902090575),            .Dim = c(4L, 3L),            .Dimnames = list(c("Males vs Female", "85 vs 65 years",                               "Charlsons Medium vs Low", "Charlsons High vs Low"),                             c("coef", "lower", "upper")))
0 notes
r-haus · 5 years ago
Text
dataframe preparation:
R
location <- c(“ontario”, “yukon”, “BC”, “alberta”) alcol <- c(4.2, 3.6, 6, 4) data.alcol <- data.frame(region,location, alcol)
load data:
newdata <- read.csv(file.choose(), header=T)
copy data from clipboard:
Sepal.Length Sepal.Width Petal.Length Petal.Width Species 1          5.1         3.5          1.4         0.2  setosa 2          4.9         3.0          1.4         0.2  setosa 3          4.7         3.2          1.3         0.2  setosa 4          4.6         3.1          1.5         0.2  setosa 5          5.0         3.6          1.4         0.2  setosa 6          5.4         3.9          1.7         0.4  setosa
Now if I copy that data directly, and then run the following
library(overflow) soread() # data.frame “mydf” created in your workspace #   Sepal.Length Sepal.Width Petal.Length Petal.Width Species # 1          5.1         3.5          1.4         0.2  setosa # 2          4.9         3.0          1.4         0.2  setosa # 3          4.7         3.2          1.3         0.2  setosa # 4          4.6         3.1          1.5         0.2  setosa # 5          5.0         3.6          1.4         0.2  setosa # 6          5.4         3.9          1.7         0.4  setosa
…………………………………………………………………………………
https://www.r-graph-gallery.com/303-lollipop-plot-with-2-values/
DUMBBELL CHART: https://stackoverflow.com/questions/40011005/adding-a-legend-to-a-dumbbell-chart-in-r
(ggplot2) Exercising with (ggalt) dumbbells I follow the most excellent Pew Research folks on Twitter to stay in tune with what's happening (statistically…rud.is
Illustrating the Arc of European Colonialism Using a Dot Plot A while back I was thinking about European colonialism and the enormous impact it's had on world history. Wouldn't it…geovisualist.com
0 notes
r-haus · 5 years ago
Text
Basic commands 1
## NAMING COLUMNS fruit <- c(5, 10, 1, 20) names(fruit) <- c(“orange”, “banana”, “apple”, “peach”)
##renaming vars using strngr library(stringr) names(iris) #[1] “Sepal.Length” “Sepal.Width” “Petal.Length” “Petal.Width” “Species” names(iris) <- str_replace_all(names(iris), “[.]”, “_”) names(iris) #[1] “Sepal_Length” “Sepal_Width” “Petal_Length” “Petal_Width” “Species”
df$var1=str_to_title(df$var1)
##scan B<- scan() 80.02 79.94 79.98 79.97 79.97 80.03 79.95 79.97
0 notes
r-haus · 5 years ago
Text
r-Haus
r-haus
##read.dta dat <- read.dta13(“E:/DHS2020/mydownload/ASIA 20-IR/AMIR72FL.DTA”)
##rename library(Hmisc)
var.labels = c(v013=”Age groups”, v025=”residency”)
label(df) = as.list(var.labels[match(names(df), names(var.labels))])
label(df)
## setwd setwd(“P:/R”)
##clear environment rm(list = ls())
##show column names names(df) colnames(df)
##missing vals mv[mv==””] <-NA
##counting missings vars sapply(df, function(x) sum(is.na(x))) colSums(is.na(df))
##show variable types str(df)
##drop missing vars dd<- Filter(function(x)!all(is.na(x)), df)
##rename dd<-rename(dd, c(Occ=”Occupation”, Residency=”Res”, Land=”Lan”))
##reorder dd <- dd[c(“v3”, “v1”, “v7”)]
## reorder & drop vars df=dd[c(“v4”, “v3”, “v6”, “v1”, “v7”)] df=df[c(“v1”, “v7”, “v6”)]
## factorise df[,] <- lapply(df[,] , factor)
##dummify df$v1 = dummy(df$v1)
##duplicate var
v2<-v1
## labels library(sjlabelled) library(sjmisc)
get_labels(v1) get_labels(list(v1, v2, v3))
fss<- set_labels(fs, labels = c(“Not-insecure” = 1, “Insecure” = 0)) frq(fss)
set missing:
set_na(x, na = c(9998))
test <- set_labels( dummies, dummy1, dummy2, labels = c(“very low”, “low”, “mid”, “hi”))
**** Age<- set_labels(Age, labels = c(“15–19” = 1, “20–24” =2 , “25–29” =3 , “30–34” =4 , “35–39” =5 , “40–44” = 6, “45–49” = 7)) Education<- set_labels(Education, labels = c(“None” = 0, “Primary” =1 , “Secondary” =2)) Residency<- set_labels(Residency, labels = c(“Urban” =1 , “Rural” =2)) Wealth<- set_labels(Age, labels = c(“Poorest(Q1)” = 1, “Poorer(Q2)” =2 , “Middle(Q3)” =3 , “Richer(Q4)” =4 , “Richest(Q5)” =5)) HHsex<- set_labels(HHsex, labels = c(“Male” =1 , “Female” =2))
— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — —
##glm ml<- glm(v025 ~ v016 + v190 + v013, data = mv, family = “binomial”(link=”logit”))
0R
library(epiDisplay) ml<- glm(v025 ~ v016 + v190 + v013, data = mv, family = “binomial”(link=”logit”)) logistic.display(ml)
— — — — — — — — — — — — — — — — — — — ## R2wd packages:
devtools::install_github(“dkyleward/RDCOMClient”)
— — — — — — — — — — — — — — — — — — — — — —
web analysis: https://dash.datascienceplus.com/
— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — - <<<Example 1>>> ##packages library(finalfit) library(dplyr)
##Selecting variables png<-PNG_FS[c(“Age”, “Education”,”Occ”,”Residency”, “Region”,”Wealth”,”Land”,”HHsex”,”fs2", “Spending”)]
##vars dv=”fs2" iv=c(“Age”, “Education”,”Occ”,”Residency”, “Region”,”Wealth”,”Land”,”HHsex”)
##Check data png %>% ff_glimpse(dv, iv)
## (so many mising cases) drop those: pp<- na.omit(png)
##reselecting variables after dropping missings png<-pp[c(“Age”, “Education”,”Occ”,”Residency”, “Region”,”Wealth”,”Land”,”HHsex”,”fs2", “Spending”)]
##factorise
png[,] <- lapply(png[,] , factor)
## Crosstab png %>% summary_factorlist(dv, iv, p=TRUE, na_include=TRUE, add_dependent_label = T)->table1
## glm png %>% finalfit(dv, iv, dependent_label_prefix = “”)->table2
## glm plot png %>% or_plot(dv, iv, breaks=c(0, 0.01, .1, .2, .3 , 1, 2, 3))
0 notes
r-haus · 5 years ago
Text
assigning value labels
https://scriptsandstatistics.wordpress.com/2017/11/29/how-to-assign-variable-labels-in-r/
0 notes
r-haus · 5 years ago
Text
ggplot bar charts
https://uc-r.github.io/barcharts
0 notes
r-haus · 5 years ago
Text
age pyramid
set.seed(1) age <- rep(1:90, 2) sex <- rep(c('Monsieur', 'Madame'), each = 90) pop <- rep(seq(100,11),2) + runif(180,0,10) df <- data.frame(age, sex, pop) %>%  mutate(abs_pop = pop) %>%  mutate(pop =ifelse(sex=='Monsieur',-pop,pop)) df %>%  plot_ly(x= ~pop, y=~age,color=~sex) %>%  add_bars(orientation = 'h', hoverinfo = 'text', text = ~abs_pop) %>%  layout(bargap = 0.1, barmode = 'overlay',         xaxis = list(tickmode = 'array', tickvals = c(-1000, -500, 0, 500, 1000),                      ticktext = c('1000', '500', '0', '500', '1000')))
0 notes
r-haus · 5 years ago
Text
file conversion/export
# create file to convert export(mtcars, "mtcars.dta")
# convert Stata to SPSS convert("mtcars.dta", "mtcars.sav")
https://cran.r-project.org/web/packages/rio/vignettes/rio.html
0 notes
r-haus · 5 years ago
Text
cleaning colnames
library(janitor)
#can be done by simply ctm2 <- clean_names(ctm2)
#or piping through `dplyr` ctm2 <- ctm2 %>%        clean_names()
0 notes
r-haus · 5 years ago
Text
delete word from colnames
names(df)<-str_replace_all(names(df), c("var1" = ""))
https://dnidzgorski.wordpress.com/2017/06/09/r-fix-column-names-spaces/
0 notes
r-haus · 5 years ago
Text
delete columns
df = subset(mydata, select = c(x,z))
0 notes
r-haus · 5 years ago
Text
remove character strings by position
names(df) <- substring(names(df),1,5)
https://stackoverflow.com/questions/34047552/remove-first-two-characters-for-all-column-names-in-a-data-frame-in-r/34047726
0 notes
r-haus · 5 years ago
Text
Tumblr media
https://stackoverflow.com/questions/39183203/grouped-bar-chart-with-grouping-in-plotly
0 notes