stata-haus
Fun with data analysis using STATA
stata-haus · 5 years ago
split variable with multiple values
https://www.statalist.org/forums/forum/general-stata-discussion/general/1566170-how-do-you-deal-with-multi-value-cells-in-data-cleaning
split country, gen(land) parse(,)
OR
preserve
split COUNTRY, parse(",") gen(c)
keep c* name
reshape long c, i(name) j(_j)
tab c
restore
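A minimal sketch of the split-then-reshape idea on made-up data. The variable names name and COUNTRY follow the snippet; the values and the index name pos are hypothetical.

```stata
* Toy data (hypothetical values): one row per person, several countries per cell
clear
input str10 name str30 COUNTRY
"anna"  "Kenya,Ghana"
"ben"   "Ghana"
"carla" "Kenya,Mali,Ghana"
end

* One new variable per comma-separated value: c1, c2, c3
split COUNTRY, parse(",") gen(c)

* Stack c1-c3 into a single column, one row per (name, position)
keep name c*
reshape long c, i(name) j(pos)

* Drop the empty slots of people who listed fewer countries
drop if c == ""
tab c
```

Dropping the empty strings before tabulating is optional, since tab skips empty strings by default, but it leaves one clean row per listed value.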
stata-haus · 5 years ago
community variable
egen hhwealth= xtile(v191), by(psu) nq(3)
stata-haus · 5 years ago
replacing/deleting parts of colname
# R, not Stata: str_replace_all() comes from the stringr package
library(stringr)
names(x) <- str_replace_all(names(x), c(" " = ".", "," = ""))
names(df) <- str_replace_all(names(df), c("neighbourhood" = ""))
stata-haus · 5 years ago
batch append file
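clear
* dir returns bare file names, so prepend the folder; forward slashes keep macro expansion safe
local folder "C:/Users/Administrator/Desktop/gam"
local filelist : dir "`folder'" files "*.dta"
foreach file of local filelist {
    append using "`folder'/`file'", keep(v000)
}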
------------------------------------------------
use data1, clear
foreach num of numlist 2/30 {
    append using data`num'
}
stata-haus · 5 years ago
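saving coefficients for each id
statsby bL=_b[v201], by(id v190): regress v106 v190 v201 v151, vce(robust)
Note: regress stores linear coefficients, not ORs; for ORs fit logit and save exp(_b[v201]). With v190 in by(), v190 is constant within each group and drops out of the model.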
https://stackoverflow.com/questions/51657443/regression-loop-and-store-specific-coefficient-in-new-dataset-stata
stata-haus · 5 years ago
loop regression
foreach x in v1 v2 v3 v4 {
    logit `x' i.v5
}
stata-haus · 5 years ago
GLM for ORs and RRs
ORs (same as logit):
glm DV i.V1, family(binomial) link(logit) eform 
RRs:
glm DV i.V1, family(binomial) link(log) eform
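A quick check of the "same as logit" claim, sketched on Stata's built-in auto data. The outcome foreign and predictor mpg are stand-ins chosen for illustration, not the DV and V1 of the post. The log-link (RR) model is left out here because log-binomial models do not always converge.

```stata
sysuse auto, clear

* Logistic regression reported as odds ratios
logistic foreign mpg
scalar or_logistic = exp(_b[mpg])

* Same model through glm; eform exponentiates the coefficients
glm foreign mpg, family(binomial) link(logit) eform
scalar or_glm = exp(_b[mpg])

display "logistic OR = " or_logistic "   glm OR = " or_glm
```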
stata-haus · 5 years ago
DV recode
g ev=.
replace ev=0 if d103a==0 & d103b==0 & d103c==0 & d104==0
replace ev=1 if d103a>=1 | d103b>=1 | d103c>=1 | d104>=1
replace ev=. if d103a==. | d103b==. | d103c==. | d104==.

g pv=.
replace pv=0 if d105a==0 & d105b==0 & d105c==0 & d105d==0 & d105e==0 & d105f==0 & d105j==0
replace pv=1 if d105a>=1 | d105b>=1 | d105c>=1 | d105d>=1 | d105e>=1 | d105f>=1 | d105j>=1
replace pv=. if d105a==. | d105b==. | d105c==. | d105d==. | d105e==. | d105f==. | d105j==.

g sv=.
replace sv=0 if d105h==0 & d108==0
replace sv=1 if d105h>=1 | d108>=1
replace sv=. if d105h==. | d108==.

g ipv=.
replace ipv=0 if ev==0 & pv==0 & sv==0
replace ipv=1 if ev==1 | pv==1 | sv==1
replace ipv=. if ev==. | pv==. | sv==.

lab def ipv 0 "No" 1 "Yes"
lab val ipv ipv
ta ipv
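The three-step pattern (all zero, any positive, any missing) can be collapsed into one line per indicator. A sketch on made-up data, assuming the items are coded 0 or a positive number; the values below are hypothetical.

```stata
* Toy data (hypothetical values) for the four emotional-violence items
clear
input d103a d103b d103c d104
0 0 0 0
1 0 0 0
0 2 0 0
0 0 . 0
1 . 0 0
end

* 1 if any item is >= 1, 0 if all are 0, missing if any item is missing.
* The if-qualifier matters: missing counts as larger than any number,
* so d103a >= 1 alone would be true for a missing d103a.
gen byte ev = (d103a >= 1 | d103b >= 1 | d103c >= 1 | d104 >= 1) ///
    if !missing(d103a, d103b, d103c, d104)
list
```

This is also why the original snippet needs its final replace-to-missing line: without it, rows with a missing item would be counted as 1.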
stata-haus · 5 years ago
DDS
recode v410-v414v (8=0)
g grn_rt_tbr=.
replace grn_rt_tbr=1 if v412a==1 | v414e==1 | v414f==1
replace grn_rt_tbr=0 if v412a==0 & v414e==0 & v414f==0

g legm_nut=.
replace legm_nut=1 if v414o==1
replace legm_nut=0 if v414o==0

g dairy=.
replace dairy=1 if v411==1 | v411a==1 | v414v==1 | v414p==1
replace dairy=0 if v411==0 & v411a==0 & v414v==0 & v414p==0

g flsh=.
replace flsh=1 if v414h==1 | v414m==1 | v414n==1
replace flsh=0 if v414h==0 & v414m==0 & v414n==0

g egg=.
replace egg=1 if v414g==1
replace egg=0 if v414g==0

g vt_A_frt=.
replace vt_A_frt=1 if v414i==1 | v414j==1 | v414k==1
replace vt_A_frt=0 if v414i==0 & v414j==0 & v414k==0

g fnv=.
replace fnv=1 if v414l==1
replace fnv=0 if v414l==0

g dd= grn_rt_tbr+legm_nut+dairy+flsh+egg+vt_A_frt+fnv
ta dd

recode dd (min/3=0 "No")(4/max=1 "Yes"), g(dds)
ta dds
stata-haus · 5 years ago
 blabel(group, position(base) color(bg))
-------------------------------------------------
stata-haus · 5 years ago
Age standardization
Age-adjusted prevalence of certain diseases using the direct method:

. tab agestd

        Age |
Groups-Stan |
dardization |      Freq.     Percent        Cum.
------------+-----------------------------------
  18-24 yrs |     85,585       11.98       11.98
  25-44 yrs |    252,055       35.29       47.28
  45-64 yrs |    246,928       34.57       81.85
    65+ yrs |    129,619       18.15      100.00
------------+-----------------------------------
      Total |    714,187      100.00

The second variable, std_wt, contains standard weights based on the 2000 US population.

svy, subpop(if occp==1 & working==1): prop copd, stdize(agestd) stdweight(std_wt)
svy, subpop(if occp==1 & working==0): prop copd, stdize(agestd) stdweight(std_wt)
---------------------
To compare the two prevalences:

svy, subpop(if occp==1): prop copd, stdize(agestd) stdweight(std_wt) over(working)
nlcom _b[_subpop_2]/_b[_subpop_1] // Prevalence Ratio

. tab std_wt

   Standard |
     weight |      Freq.     Percent        Cum.
------------+-----------------------------------
     .12881 |     85,585       11.98       11.98
    .170271 |    129,619       18.15       30.13
    .299194 |    246,928       34.57       64.71
    .401725 |    252,055       35.29      100.00
------------+-----------------------------------
      Total |    714,187      100.00
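What the direct method computes is a weighted sum of age-specific prevalences, using the standard weights. A hand calculation as a sketch: the weights are the four std_wt values from the post, matched to age groups by their frequencies; the age-specific prevalences p1-p4 are hypothetical numbers for illustration only.

```stata
clear

* Standard weights from the post's std_wt tabulation (2000 US population)
scalar w1 = .12881     // 18-24 yrs
scalar w2 = .401725    // 25-44 yrs
scalar w3 = .299194    // 45-64 yrs
scalar w4 = .170271    // 65+ yrs
scalar w_sum = w1 + w2 + w3 + w4

* Age-specific prevalences: hypothetical numbers, for illustration only
scalar p1 = .02
scalar p2 = .04
scalar p3 = .08
scalar p4 = .12

* Directly standardized prevalence = sum over age groups of weight * prevalence
scalar p_std = w1*p1 + w2*p2 + w3*p3 + w4*p4

display "weights sum to " w_sum
display "age-standardized prevalence = " %6.4f p_std
```

The stdize() and stdweight() options do the same weighting inside svy, with standard errors that account for the survey design.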
stata-haus · 5 years ago
standardising vars
foreach var of varlist v1 v2 v3 {
    egen std`var'=std(`var')
}
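A runnable version of the same loop on Stata's built-in auto data, as a sketch; price, mpg and weight stand in for v1 v2 v3.

```stata
* Built-in example data; price, mpg and weight stand in for v1 v2 v3
sysuse auto, clear

foreach var of varlist price mpg weight {
    egen std`var' = std(`var')
}

* Each standardized variable should have mean 0 and SD 1
summarize stdprice stdmpg stdweight
```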
lincom [q75]weight-[q25]weight
stata-haus · 6 years ago
mypkg
ssc inst adjprop
ssc inst arrowplot
ssc inst asciiplot
ssc inst asdoc
ssc inst asrol
ssc inst
ssc inst astx
ssc inst balancetable
ssc inst basetable
ssc inst betafit
ssc inst bihist
ssc inst
ssc inst biplotvlab
ssc inst blindschemes
ssc inst bmjcip
ssc inst boottest
ssc inst brewscheme
ssc inst
ssc inst carryforward
ssc inst cart
ssc inst catplot
ssc inst cf2
ssc inst cf3
ssc inst
ssc inst cibar
ssc inst ciplot
ssc inst ciw
ssc inst codebook_ripper
ssc inst coefplot
ssc inst
ssc inst collin
ssc inst colorscatter
ssc inst combineplot
ssc inst concindc
ssc inst corrtable
ssc inst
ssc inst cpcorr
ssc inst crossplot
ssc inst dag
ssc inst dataex
ssc inst designplot
ssc inst
ssc inst devnplot
ssc inst diffpi
ssc inst diplot
ssc inst distinct
ssc inst dm0085_1
ssc inst
ssc inst dm88_1
ssc inst dm89_2
ssc inst dmerge
ssc inst dmout
ssc inst dta2sav
ssc inst
ssc inst eclplot
ssc inst egenmore
ssc inst ereplace
ssc inst expgen
ssc inst factortest
ssc inst
ssc inst filesearch
ssc inst findname
ssc inst fitstat
ssc inst fre
ssc inst fs
ssc inst
ssc inst ftest
ssc inst ftools
ssc inst full_palette
ssc inst g538schemes
ssc inst gciget
ssc inst
ssc inst gcode
ssc inst geivars
ssc inst genfreq
ssc inst genqreg
ssc inst genscore
ssc inst
ssc inst genstack
ssc inst geochart
ssc inst getmstatistic
ssc inst gformula
ssc inst ginidesc
ssc inst
ssc inst gllamm
ssc inst gmci
ssc inst gologit
ssc inst gologit2
ssc inst gologit29
ssc inst
ssc inst gr0001_3
ssc inst gr0002_3
ssc inst gr0033_1
ssc inst gr0054
ssc inst gr0065
ssc inst
ssc inst gr0066_1
ssc inst graph3d
ssc inst graphbinary
ssc inst grby
ssc inst grc1leg
ssc inst
ssc inst grcomb
ssc inst grcompare
ssc inst grep
ssc inst grfreq
ssc inst grlogit
ssc inst
ssc inst group_twoway
ssc inst grouplabs
ssc inst grqreg
ssc inst grstyle
ssc inst grtext
ssc inst
ssc inst gsreg
ssc inst gtools
ssc inst heckroc
ssc inst icio
ssc inst ip29_1
ssc inst
ssc inst ipdmetan
ssc inst isvar
ssc inst joinvars
ssc inst jrule
ssc inst khb
ssc inst
ssc inst kmatch
ssc inst kountry
ssc inst labsumm
ssc inst labutil
ssc inst labutil2
ssc inst
ssc inst listtab
ssc inst lrdrop1
ssc inst lstrfun
ssc inst maptile
ssc inst margprev
ssc inst
ssc inst mat2txt
ssc inst matrixtools
ssc inst metaan
ssc inst metabias
ssc inst metacum
ssc inst
ssc inst metafunnel
ssc inst metan
ssc inst metaprop
ssc inst metaprop_one
ssc inst metareg
ssc inst
ssc inst mif2dta
ssc inst moremata
ssc inst mrtab
ssc inst mulogit
ssc inst multibar
ssc inst
ssc inst multiline
ssc inst multimport
ssc inst muxplot
ssc inst muxyplot
ssc inst mvmeta
ssc inst
ssc inst mvnxpb
ssc inst mvsampsi
ssc inst mylabels
ssc inst mypkg
ssc inst nbvargr
ssc inst
ssc inst oaxaca
ssc inst outreg2
ssc inst pairplot
ssc inst palette_all
ssc inst palettes
ssc inst
ssc inst paramed
ssc inst parmest
ssc inst parplot
ssc inst partchart
ssc inst pdplot
ssc inst
ssc inst pieplot
ssc inst polychoric
ssc inst ppmlhdfe
ssc inst profileplot
ssc inst proprcspline
ssc inst
ssc inst ptrend
ssc inst putdocxcrosstab
ssc inst pvenn
ssc inst pyramid
ssc inst qcount
ssc inst
ssc inst qenv
ssc inst qic
ssc inst qregpd
ssc inst r2_mz
ssc inst rangestat
ssc inst
ssc inst reghdfe
ssc inst relyplot
ssc inst rowranks
ssc inst rsource
ssc inst rtfutil
ssc inst
ssc inst runmlwin
ssc inst runmplus
ssc inst safedrop
ssc inst savesome
ssc inst savespss
ssc inst
ssc inst sbe16_1
ssc inst sbe36_1
ssc inst sbplot5
ssc inst scandata
ssc inst scdensity
ssc inst
ssc inst scenreg
ssc inst scheme-burd
ssc inst scheme-mrc
ssc inst scheme-tfl
ssc inst scheme_scientific
ssc inst
ssc inst scheme_tufte
ssc inst scsomersd
ssc inst sencode
ssc inst sendtoslack
ssc inst seqlogit
ssc inst
ssc inst shapley
ssc inst shp2dta
ssc inst sigcoef
ssc inst simpplot
ssc inst sixplot
ssc inst
ssc inst sliceplot
ssc inst slideplot
ssc inst smileplot
ssc inst smithwelch
ssc inst smrtbl
ssc inst
ssc inst somersd
ssc inst sortl
ssc inst sortobs
ssc inst sparkline
ssc inst spikeplt
ssc inst
ssc inst spineplot
ssc inst spmap
ssc inst sqr
ssc inst st0045_2
ssc inst st0085_2
ssc inst
ssc inst st0182_1
ssc inst st0238
ssc inst st0243_1
ssc inst st0309
ssc inst st0427_1
ssc inst
ssc inst stack
ssc inst stat2data
ssc inst statacmds
ssc inst statflow
ssc inst statplot
ssc inst
ssc inst statsbyfast
ssc inst stcascoh
ssc inst stcmd
ssc inst std_beta
ssc inst stddiff
ssc inst
ssc inst storecmd
ssc inst strip
ssc inst stripplot
ssc inst subsave
ssc inst subsetplot
ssc inst
ssc inst sum2docx
ssc inst summout
ssc inst summtab
ssc inst sumstats
ssc inst sumup
ssc inst
ssc inst surloads
ssc inst surrog
ssc inst svvarlbl
ssc inst swapval
ssc inst swboot
ssc inst
ssc inst sxpose
ssc inst synth
ssc inst tab_chi
ssc inst tabcount
ssc inst table1
ssc inst
ssc inst tabout
ssc inst tabplot
ssc inst trellis
ssc inst ttable
ssc inst ttable2
ssc inst
ssc inst twitter2stata
ssc inst twoway_parea
ssc inst ulogit
ssc inst uninstall_asdoc
ssc inst unitab
ssc inst
ssc inst univar
ssc inst usesas
ssc inst usespss
ssc inst venndiagram
ssc inst vgsg
ssc inst
ssc inst wbopendata
ssc inst webimage
ssc inst wgttest
ssc inst wid
ssc inst worldstat
ssc inst
ssc inst wosload
ssc inst wtp
ssc inst xmiss
ssc inst xtdcce2134
ssc inst zscore06
stata-haus · 6 years ago
concat by var labels
egen NEWVA= concat(VAR1 VAR2), decode p(" ")
VAR1 VAR2
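A sketch of what the decode option changes, on Stata's built-in auto data. The variables rep78 and foreign stand in for VAR1 and VAR2, and the new variable names are hypothetical.

```stata
* Built-in data; foreign carries a value label (Domestic/Foreign), rep78 does not
sysuse auto, clear

* Without decode, concat() pastes the numeric codes
egen cat_code = concat(rep78 foreign), p(" ")

* With decode, labelled variables contribute their label text instead
egen cat_label = concat(rep78 foreign), decode p(" ")

list rep78 foreign cat_code cat_label in 1/5
```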
stata-haus · 6 years ago
Population attributable fraction
regpar   punaf punafcc 
regpar, at (smoke=0)
In the real world (Scenario 0), 31.2% of babies are expected to have a low birthweight, but in the dream scenario where no mothers smoke and their races stay the same (Scenario 1), only 22.9% of babies are expected to have a low birthweight. The difference between these scenario percentages (PAR) is 8.4%, with confidence limits from 3.2% to 13.5%. The PAR can be interpreted as the proportion of all babies that have low birthweight because they were born in scenario 0 instead of in scenario 1.
Interpretation: 8.4% of the disease burden of low birthweight might be eliminated by eliminating maternal smoking, assuming that the racial mix stays the same, with confidence limits from 3.2% to 13.5%.
Alternatively, we might want to communicate our message to an audience of smoking mothers, who might want to know how much they could do for their children if only they quit smoking before pregnancy. To answer this, we might use regpar with a subpop() option to compute an exposed-population attributable risk for the subpopulation of smoking mothers:
regpar, at(smoke=0) subpop(if smoke==1)
This time, the option subpop(if smoke==1) restricts the prediction to the subpopulation of smoking mothers, but scenarios 0 and 1 are defined as before. Once again, regpar displays the incomprehensible symmetric confidence intervals for the transformed parameters followed by the asymmetric confidence intervals for the untransformed parameters, which are probably more easily explained to smoking mothers. We see that the children of smoking mothers have a 40.1% prevalence of low birthweight, which might be reduced to 19.2% if their mothers quit smoking before pregnancy, while their racial mix remained the same. The difference is 21.3% with confidence limits from 7.8% to 34.1%.
Another possibility is to compare our zero-smoking dream scenario not with the intermediate world in which we live but with the nightmare scenario where all mothers started smoking. This is done by using the atzero() option, which can be used to reset scenario 0, as follows:
regpar, at(smoke=0) atzero(smoke=1)
 We see that scenario 0 is set by the atzero() option to smoke=1, while scenario 1 is still smoke=0. Once again, regpar displays the symmetric confidence intervals for the transformed parameters followed by the asymmetric confidence intervals for the untransformed parameters. We see that if all mothers smoked and the racial mix stayed the same, then 45.8% of children might have low birthweight. The dream scenario prevalence, where no mothers smoke and the racial mix stays the same, is still 22.9%, as before. The difference in prevalence between the nightmare scenario 0 and the dream scenario 1 is 22.9% with confidence limits from 8.4% to 36.4%.
stata-haus · 6 years ago
Stata essential
drop *_02 *_03 *_04 *_05 *_06 *_07 *_08 *_09 *_10 *_11 *_12 *_13 *_14 *_15 *_16 *_17 *_18 *_19 *_20
------------------------
tabout IV1-IV5 DV using ta.xls, c(col ci) svy stats(chi2) percent layout(row) npos(lab) f(2) mi append
----------------------
foreach v of var * {
    drop if missing(`v')
}
--------------------------------------------------
drop if mi(v1, v2)
------------------------------------------------
labvalch3 *, strfcn(proper(`"@"'))
--------------------------------------------------------
capture rm "tables.xls"
toxl(tmp.xls, Table 1, replace)
--------------------------------------------------------
egen both = group( Age Residency ), label
-----------------------------------------------------------------
egen x=group( Age Residency )
grouplabs Age Residency, groupvar(x) val

15-19 Urban |      1,939        3.28        3.28
15-19 Rural |      4,978        8.42       11.70
20-24 Urban |      3,798        6.42       18.12
20-24 Rural |      7,878       13.33       31.45
25-29 Urban |      3,882        6.57       38.02
25-29 Rural |      7,650       12.94       50.96
30-34 Urban |      3,232        5.47       56.42
30-34 Rural |      6,209       10.50       66.92
35-39 Urban |      2,699        4.57       71.49
35-39 Rural |      4,983        8.43       79.92
40-44 Urban |      2,317        3.92       83.84
40-44 Rural |      4,150        7.02       90.86
45-49 Urban |      1,927        3.26       94.12
45-49 Rural |      3,478        5.88      100.00

egen float zbmi = std(bmi), mean(0) std(1)
----------------------------------------------------------------------------
blabel(bar, position(center) format(%3.1f))
--------------------------------------------------------------------
by id_firm id_molecule (Year), sort: gen byte wanted = (_n == 1)
------------------------------------------------------------------
graph bar (percent) yvar, by(byvar) over(groupvar, sort(order))
graph hbar (asis) v102 , over( v774b ) over( v774a ) asyvars scheme(mrc) aspect(1) blabel(bar) yla(, nogrid)
--------------------------------------------------------------------------------------------------
catplot ChildAge if Fever ==1, by(y) percent(y) blabel(bar, position(center) format(%3.1f) color(white)) yla(, nogrid) scheme(mrc)
catplot we y, by(pod) percent(y) asyvars blabel(bar, position(center) format(%3.1f) color(white)) yla(, nogrid) scheme(burd4) name(anct)
-----------------------------------------------
capture drop A*
. capture drop v??
. capture drop v???
. capture drop v????
. capture drop v?????
. capture drop d?????
. capture drop d????
. capture drop d???
. capture keep h???
-------------------------------------------------------------------------------------------------------------
http://repec.sowi.unibe.ch/stata/coefplot/markers.html
Finally used:
coefplot (fvm, label(Male)) (fvf, label(Female)), drop(_cons) xline(1) eform mlabposition(1) mlabel(cond(@pval>.05, "+", cond(@pval<.001, "***", cond(@pval<.01, "**", cond(@pval<.05, "*", cond(@pval<.1, "+", "")))))) note("+ p > .05, * p < .05, ** p < .01, *** p < .001") name(f)
. marginsplot, horizontal xline(0) yscale(reverse) recast(scatter)
catplot fever y, by(cs) blabel(bar, format(%4.1f) pos(top)) name(d, replace)
catplot fever, by(y) blabel(bar, format(%4.1f) pos(top)) name(d1) vertical
gr combine d d1

catplot cough y, by(cs) blabel(bar, format(%4.1f) pos(top)) name(dd, replace)
catplot cough, by(y) blabel(bar, format(%4.1f) pos(top)) name(dd1) vertical
gr combine dd dd1

catplot rb y, by(cs) blabel(bar, format(%4.1f) pos(top)) name(ddd, replace)
catplot rb, by(y) blabel(bar, format(%4.1f) pos(top)) name(ddd1) vertical
gr combine ddd ddd1

. ta fever y if cs==1, col nofreq
. ta fever y if cs==2, col nofreq
. ta fever y, col nofreq

. ta rb y if cs==1, col nofreq
. ta rb y if cs==2, col nofreq
. ta rb y, col nofreq
logistic fever i.y i.chage i.bo i.bfeed i.mage i.edu i.occ i.bmi i.lcwanted i.parity if cs==1
eststo fvm

logistic cough i.y if cs==1
eststo cm
coefplot, xline(0) drop(_cons) omitted baselevels   graphregion(margin(l=65)) yscale(alt noline) coeflabels(, labgap(-125) notick)   headings(1995.y= "Year"  2.chage= "Age of Child" 0.bfeed= "Being Breastfed" 2.mage= "Mother's age"  0.edu= "Mother's education" 1.occ= "Mother's occupation"  0.bmi= "Maternal BMI" 1.lcwanted= "Child was wanted" 2.parity= "Parity"  0.bmi = "bf:Body mass Index"  , labcolor(orange) labgap(-130)) name (fvm)
logistic fever i.y i.chage i.bo i.bfeed i.mage i.edu i.occ i.bmi i.lcwanted i.parity if cs==2
eststo fvf
coefplot, xline(0) drop(_cons) omitted baselevels   graphregion(margin(l=65)) yscale(alt noline) coeflabels(, labgap(-125) notick)   headings(1995.y= "Year"  2.chage= "Age of Child" 0.bfeed= "Being Breastfed" 2.mage= "Mother's age"  0.edu= "Mother's education" 1.occ= "Mother's occupation"  0.bmi= "Maternal BMI" 1.lcwanted= "Child was wanted" 2.parity= "Parity"  0.bmi = "bf:Body mass Index"  , labcolor(orange) labgap(-130)) name (fvf)
coefplot, xline(0) drop(_cons) omitted baselevels   graphregion(margin(l=65)) yscale(alt noline) coeflabels(, labgap(-125) notick)   headings(1995.y= "Year"  2.chage= "Age of Child" 0.bfeed= "Being Breastfed" 2.mage="Mother's age"   0.edu= "Mother's education" 1.occ= "Mother's occupation"  0.bmi= "Maternal BMI" 1.lcwanted="Child was wanted" 2.parity= "Parity"  0.bmi = "bf:Body mass Index", labsize(vsmall)   labcolor(orange) labgap(-130)) ylab(, labs(vsmall))
coefplot (fvm, label(Male)) (fvf, label(Female)), drop(_cons) xline(1) eform name(fv)
coefplot, xline(0) drop(headroom _cons) omitted baselevels   graphregion(margin(l=65)) yscale(alt noline) coeflabels(, labgap(-125) notick)   headings(1.age = "{bf:MATERNAL age}"   2.parity = "{bf:Total births}" 3.edu = "{bf:MEDU}" 0.bmi = "{bf:BODY mass}"  , labcolor(orange) labgap(-130))
coefplot, xline(0) drop(_cons) omitted baselevels   graphregion(margin(l=65)) yscale(alt noline) coeflabels(, labgap(-125) notick)   headings(1995.y= "Year"  2.chage= "Age of Child" 0.bfeed= "Being Breastfed" 2.mage= "Mother's age"  0.edu= "Mother's education" 1.occ= "Mother's occupation"  0.bmi= "Maternal BMI" 1.lcwanted= "Child was wanted" 2.parity= "Parity"  0.bmi = "bf:Body mass Index"  , labcolor(orange) labgap(-130))  
With p-values as numbers
coefplot, xline(0) mlabposition(1) mlabgap(*2)  mlabel("{it:p} = " + string(@pval,"%9.3f"))
With p-values as stars
coefplot, xline(1) eform mlabposition(1) mlabel(cond(@pval>.05, "+", cond(@pval<.001, "***", cond(@pval<.01, "**", cond(@pval<.05, "*", cond(@pval<.1, "+", "")))))) note("+ p > .05, * p < .05, ** p < .01, *** p < .001")
With p-value as +/-/No association
weight:
coefplot, weight(1/@se) ms(oh) drop(_cons) xline(0)
stata-haus · 6 years ago
string to numeric
egen newvar = group(var)
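Which command fits depends on what the string holds. A sketch on made-up data: the variable names and values are hypothetical, and encode and destring are offered as alternatives to egen group(), not as what the post used.

```stata
* Toy data (hypothetical values)
clear
input str6 region str4 income
"north" "1200"
"south" "950"
"north" "1800"
end

* Categories stored as text: encode keeps the text as value labels
encode region, gen(region_n)

* egen group() also numbers the categories; the label option keeps the text
egen region_g = group(region), label

* Numbers stored as text: destring converts the digits themselves
destring income, gen(income_n)

list
```

Both encode and group() number the categories in alphabetical order, so neither should be used on strings that are really numbers.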