stata-haus - Tumblr blog

stata-haus · 5 years ago

split variable with multiple values

https://www.statalist.org/forums/forum/general-stata-discussion/general/1566170-how-do-you-deal-with-multi-value-cells-in-data-cleaning

split country, gen(land) parse(,)

OR

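preserve
split COUNTRY, parse(",") gen(c)     // c1, c2, ... hold one value each
keep c* name
reshape long c, i(name) j(_j)        // one row per name-value pair
tab c                                // frequency of each single value
restore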
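Here is a rough Python sketch of the same split-then-reshape-long idea, for illustration only. The rows, the `name` id and the `country` field are hypothetical. Each comma-separated value gets its own row with a running index `_j`, and the values are then counted the way `tab c` counts them.

```python
from collections import Counter

def split_long(rows, id_var="name", multi_var="country", sep=","):
    """One output row per value in a multi-value cell (split + reshape long)."""
    long_rows = []
    for row in rows:
        # split on the separator, trim surrounding spaces, drop empty pieces
        pieces = [p.strip() for p in row[multi_var].split(sep) if p.strip()]
        for j, value in enumerate(pieces, start=1):
            long_rows.append({id_var: row[id_var], "_j": j, "c": value})
    return long_rows

# hypothetical example data
rows = [
    {"name": "A", "country": "France, Spain"},
    {"name": "B", "country": "Spain"},
    {"name": "C", "country": "Italy,France,Spain"},
]
long_rows = split_long(rows)
counts = Counter(r["c"] for r in long_rows)  # like `tab c`
print(counts.most_common())
```

There is one difference from the Stata version. `reshape long` also keeps the empty c2, c3 and later slots of rows with fewer values, storing them as empty strings. `tab c` then leaves those out, whereas the sketch drops them up front.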

stata-haus · 5 years ago

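community variable (household wealth tertiles within each psu)

* xtile() is an egen function from egenmore (ssc inst egenmore)
egen hhwealth = xtile(v191), by(psu) nq(3)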
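This Python sketch shows what `xtile(), by()` does. Cut points are computed separately within each psu, and each household is placed in group 1 to 3. The cut-point rule works as follows: when n·q/nq is a whole number, the two neighbouring values are averaged; otherwise the next value up is taken. That rule is my assumption about Stata's default percentile definition. Missing values are not handled either, so treat this as an approximation of egenmore's `xtile()` rather than a copy of it. The data are made up.

```python
from bisect import bisect_left
from collections import defaultdict

def cutpoints(values, nq):
    """Cut points for nq quantile groups.
    Assumed rule: if n*q/nq is a whole number k, average the k-th and (k+1)-th
    smallest values; otherwise take the next smallest value up."""
    xs = sorted(values)
    n = len(xs)
    cuts = []
    for q in range(1, nq):
        if (n * q) % nq == 0:
            k = n * q // nq
            cuts.append((xs[k - 1] + xs[k]) / 2)
        else:
            k = -(-(n * q) // nq)  # ceiling of n*q/nq
            cuts.append(xs[k - 1])
    return cuts

def xtile_by(values, groups, nq=3):
    """Quantile group (1..nq) of each value, computed separately within each group."""
    members = defaultdict(list)
    for v, g in zip(values, groups):
        members[g].append(v)
    cuts = {g: cutpoints(vs, nq) for g, vs in members.items()}
    # category k means cut[k-1] < value <= cut[k]
    return [bisect_left(cuts[g], v) + 1 for v, g in zip(values, groups)]

# hypothetical wealth scores in two clusters (psu)
wealth = [1, 2, 3, 4, 5, 6, 10, 20, 30]
psu = [1, 1, 1, 1, 1, 1, 2, 2, 2]
print(xtile_by(wealth, psu))  # [1, 1, 2, 2, 3, 3, 1, 2, 3]
```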

stata-haus · 5 years ago

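replacing/deleting parts of column names (R, stringr)

library(stringr)
# spaces become ".", commas are removed
names(x) <- str_replace_all(names(x), c(" " = ".", "," = ""))

# remove the text "neighbourhood" from every column name
names(df) <- str_replace_all(names(df), c("neighbourhood" = ""))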
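For comparison, here is the same renaming with Python's standard `re` module. This is an illustrative sketch, not part of the R workflow. As with `str_replace_all()` and a named vector, each pattern is a regular expression, and the pairs are applied one after another.

```python
import re

def clean_names(names, replacements):
    """Apply each regex pattern -> replacement pair, in order, to every name."""
    cleaned = []
    for name in names:
        for pattern, repl in replacements.items():
            name = re.sub(pattern, repl, name)
        cleaned.append(name)
    return cleaned

print(clean_names(["first name", "age, years"], {" ": ".", ",": ""}))
# ['first.name', 'age.years']
print(clean_names(["neighbourhood_group", "price"], {"neighbourhood": ""}))
# ['_group', 'price']
```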

stata-haus · 5 years ago

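batch append files

clear
* dir returns bare file names, so prefix the folder when appending;
* forward slashes avoid a Windows problem: a backslash right before a local macro stops it from expanding
local folder "C:/Users/Administrator/Desktop/gam"
local filelist : dir "`folder'" files "*.dta"
foreach file of local filelist {
    append using "`folder'/`file'", keep(v000)
}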

------------------------------------------------

use data1, clear
foreach num of numlist 2/30 {
    append using data`num'
}
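Here is a Python sketch of the same loop-and-append pattern. It assumes the inputs are CSV files with a header row, whereas the Stata version reads .dta files. It keeps only v000, as `keep(v000)` does. The two demo files are hypothetical.

```python
import csv
import glob
import os
import tempfile

def append_files(folder, keep, pattern="*.csv"):
    """Stack rows from every matching file in folder, keeping only the listed columns."""
    rows = []
    for path in sorted(glob.glob(os.path.join(folder, pattern))):
        with open(path, newline="") as f:
            for row in csv.DictReader(f):
                rows.append({k: row.get(k) for k in keep})
    return rows

# demo with two hypothetical files, data1.csv and data2.csv, in a temporary folder
with tempfile.TemporaryDirectory() as folder:
    for i, values in enumerate([["1", "2"], ["3"]], start=1):
        with open(os.path.join(folder, f"data{i}.csv"), "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["v000", "other"])
            writer.writerows([[v, "x"] for v in values])
    stacked = append_files(folder, keep=["v000"])

print(stacked)  # [{'v000': '1'}, {'v000': '2'}, {'v000': '3'}]
```

Note that `sorted()` orders file names as text, so data10.csv would come before data2.csv. The numlist loop avoids that problem because it builds the names itself.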

stata-haus · 5 years ago

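saving a coefficient for each id (statsby)

* one row per id-v190 group holding the v201 coefficient; clear lets statsby replace the data in memory
* note: v190 is constant within each by(id v190) group, so regress omits it; with regress, bL is a linear coefficient, not an OR
statsby bL=_b[v201], by(id v190) clear: regress v106 v190 v201 v151, vce(robust)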

https://stackoverflow.com/questions/51657443/regression-loop-and-store-specific-coefficient-in-new-dataset-stata
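To see what statsby collects, here is a minimal Python sketch that fits one regression per id and keeps a single coefficient. For brevity it uses one predictor plus an intercept, with plain OLS and no robust SEs, unlike the multi-predictor model above. The data are invented.

```python
from collections import defaultdict

def slope(xs, ys):
    """OLS slope of y on x (single predictor plus intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

def statsby_slope(ids, xs, ys):
    """Fit the regression separately for each id; return {id: slope}."""
    groups = defaultdict(lambda: ([], []))
    for i, x, y in zip(ids, xs, ys):
        groups[i][0].append(x)
        groups[i][1].append(y)
    return {i: slope(gx, gy) for i, (gx, gy) in groups.items()}

# invented data: id "a" follows y = 2x + 1, id "b" follows y = -x
ids = ["a", "a", "a", "b", "b", "b"]
x = [0, 1, 2, 0, 1, 2]
y = [1, 3, 5, 0, -1, -2]
print(statsby_slope(ids, x, y))  # {'a': 2.0, 'b': -1.0}
```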

stata-haus · 5 years ago

loop regression

foreach x in v1 v2 v3 v4 {
    logit `x' i.v5
}

stata-haus · 5 years ago

GLM for ORs and RRs

ORs (same as logit):

glm DV i.V1, family(binomial) link(logit) eform

RRs:

glm DV i.V1, family(binomial) link(log) eform
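With a single binary exposure, the model is saturated, so the eform estimates from the two glm calls above should reproduce these crude 2×2 ratios. Once covariates are added, the estimates are adjusted and will differ. The Python sketch below uses made-up counts. It also shows that the OR exceeds the RR when the outcome is common.

```python
def odds_ratio(a, b, c, d):
    """a, b: with / without the outcome among exposed; c, d: among unexposed."""
    return (a / b) / (c / d)

def risk_ratio(a, b, c, d):
    """Ratio of outcome risks, exposed vs unexposed."""
    return (a / (a + b)) / (c / (c + d))

# hypothetical 2x2 table: 30/100 exposed and 15/100 unexposed have the outcome
a, b, c, d = 30, 70, 15, 85
print(round(odds_ratio(a, b, c, d), 2), round(risk_ratio(a, b, c, d), 2))  # 2.43 2.0
```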

stata-haus · 5 years ago

DV recode

* ev (emotional violence): 1 if any item >= 1, 0 if all items are 0.
* Stata treats missing (.) as larger than any number, so ">=1" is also true
* for missing values; the last replace sets those cases back to missing.
g ev=.
replace ev=0 if d103a==0 & d103b==0 & d103c==0 & d104==0
replace ev=1 if d103a>=1 | d103b>=1 | d103c>=1 | d104>=1
replace ev=. if missing(d103a, d103b, d103c, d104)

* pv (physical violence)
g pv=.
replace pv=0 if d105a==0 & d105b==0 & d105c==0 & d105d==0 & d105e==0 & d105f==0 & d105j==0
replace pv=1 if d105a>=1 | d105b>=1 | d105c>=1 | d105d>=1 | d105e>=1 | d105f>=1 | d105j>=1
replace pv=. if missing(d105a, d105b, d105c, d105d, d105e, d105f, d105j)

* sv (sexual violence)
g sv=.
replace sv=0 if d105h==0 & d108==0
replace sv=1 if d105h>=1 | d108>=1
replace sv=. if missing(d105h, d108)

* ipv: any of ev, pv, sv
g ipv=.
replace ipv=0 if ev==0 & pv==0 & sv==0
replace ipv=1 if ev==1 | pv==1 | sv==1
replace ipv=. if missing(ev, pv, sv)

lab def ipv 0 "No" 1 "Yes"
lab val ipv ipv
ta ipv
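The recode rule shared by ev, pv, sv and ipv can be sketched in Python, with `None` standing in for Stata's missing value. The result is missing if any item is missing, otherwise 1 if any item is at least 1, else 0. The answers below are hypothetical.

```python
def any_positive(*items):
    """Missing (None) if any item is missing, else 1 if any item >= 1, else 0."""
    if any(v is None for v in items):
        return None
    return 1 if any(v >= 1 for v in items) else 0

# hypothetical answers for one woman
ev = any_positive(0, 0, 0, 0)            # d103a, d103b, d103c, d104
pv = any_positive(0, 2, 0, 0, 0, 0, 0)   # d105a-d105f, d105j
sv = any_positive(0, None)               # d105h answered, d108 missing
ipv = any_positive(ev, pv, sv)
print(ev, pv, sv, ipv)                   # 0 1 None None
```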

stata-haus · 5 years ago

DDS (dietary diversity score)

* code 8 (don't know) treated as 0 (not given)
recode v410-v414v (8=0)

* grains, roots and tubers
g grn_rt_tbr=.
replace grn_rt_tbr=1 if v412a==1 | v414e==1 | v414f==1
replace grn_rt_tbr=0 if v412a==0 & v414e==0 & v414f==0

* legumes and nuts
g legm_nut=.
replace legm_nut=1 if v414o==1
replace legm_nut=0 if v414o==0

* dairy
g dairy=.
replace dairy=1 if v411==1 | v411a==1 | v414v==1 | v414p==1
replace dairy=0 if v411==0 & v411a==0 & v414v==0 & v414p==0

* flesh foods
g flsh=.
replace flsh=1 if v414h==1 | v414m==1 | v414n==1
replace flsh=0 if v414h==0 & v414m==0 & v414n==0

* eggs
g egg=.
replace egg=1 if v414g==1
replace egg=0 if v414g==0

* vitamin A-rich fruits and vegetables
g vt_A_frt=.
replace vt_A_frt=1 if v414i==1 | v414j==1 | v414k==1
replace vt_A_frt=0 if v414i==0 & v414j==0 & v414k==0

* other fruits and vegetables
g fnv=.
replace fnv=1 if v414l==1
replace fnv=0 if v414l==0

* score = number of food groups (0-7); missing if any group is missing
g dd = grn_rt_tbr+legm_nut+dairy+flsh+egg+vt_A_frt+fnv
ta dd

* 0-3 groups = No, 4-7 groups = Yes
recode dd (min/3=0 "No") (4/max=1 "Yes"), g(dds)
ta dds
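The scoring step can be sketched in Python, using a hypothetical child record in which the keys are the group variables created above. The sketch adds up the seven 0/1 indicators and leaves the score missing if any group is missing, as Stata's `+` does. It then splits the score at 4 or more groups.

```python
FOOD_GROUPS = ["grn_rt_tbr", "legm_nut", "dairy", "flsh", "egg", "vt_A_frt", "fnv"]

def diversity_score(child):
    """Sum of the seven 0/1 food-group indicators; None if any is missing."""
    values = [child.get(group) for group in FOOD_GROUPS]
    if any(v is None for v in values):
        return None
    return sum(values)

def dds_indicator(score):
    """Like recode dd (min/3=0) (4/max=1): 4 or more groups -> 1, else 0."""
    if score is None:
        return None
    return 1 if score >= 4 else 0

# hypothetical child: given grains, dairy, eggs and vitamin A-rich foods
child = {"grn_rt_tbr": 1, "legm_nut": 0, "dairy": 1, "flsh": 0,
         "egg": 1, "vt_A_frt": 1, "fnv": 0}
score = diversity_score(child)
print(score, dds_indicator(score))  # 4 1
```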

stata-haus · 5 years ago

blabel(group, position(base) color(bg))

-------------------------------------------------

stata-haus · 5 years ago

Age standardization

Age-adjusted prevalence of a disease (COPD here) using the direct method:

. tab agestd

        Age |
Groups-Stan |
dardization |      Freq.     Percent        Cum.
------------+-----------------------------------
  18-24 yrs |     85,585       11.98       11.98
  25-44 yrs |    252,055       35.29       47.28
  45-64 yrs |    246,928       34.57       81.85
    65+ yrs |    129,619       18.15      100.00
------------+-----------------------------------
      Total |    714,187      100.00

The second variable, std_wt, holds the standard weight for each age group (tabulated below), based on the 2000 US standard population.

svy, subpop(if occp==1 & working==1): prop copd, stdize(agestd) stdweight(std_wt)
svy, subpop(if occp==1 & working==0): prop copd, stdize(agestd) stdweight(std_wt)

---------------------

To compare the two prevalences (prevalence ratio):

svy, subpop(if occp==1): prop copd, stdize(agestd) stdweight(std_wt) over(working)
nlcom _b[_subpop_2]/_b[_subpop_1]   // prevalence ratio; replay with the coeflegend option to confirm the coefficient names

. tab std_wt

   Standard |
     weight |      Freq.     Percent        Cum.
------------+-----------------------------------
     .12881 |     85,585       11.98       11.98
    .170271 |    129,619       18.15       30.13
    .299194 |    246,928       34.57       64.71
    .401725 |    252,055       35.29      100.00
------------+-----------------------------------
      Total |    714,187      100.00
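Direct standardization is a weighted average of the age-specific prevalences, with the weights fixed by the standard population. The Python sketch below computes the point estimate only, with no survey design and no standard errors. The weights are matched to age groups through the frequencies in the two tables above. The age-specific prevalences are invented.

```python
# std_wt values, matched to age groups via the frequencies in the two tables above
STD_WT = {"18-24": 0.12881, "25-44": 0.401725, "45-64": 0.299194, "65+": 0.170271}

def direct_standardize(stratum_prev, weights=STD_WT):
    """Weighted average of age-specific prevalences using fixed standard weights."""
    total = sum(weights.values())  # about 1 here, but normalise anyway
    return sum(w * stratum_prev[g] for g, w in weights.items()) / total

# invented age-specific COPD prevalences for one subpopulation
prev = {"18-24": 0.02, "25-44": 0.04, "45-64": 0.08, "65+": 0.15}
print(round(direct_standardize(prev), 4))  # 0.0681
```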

stata-haus · 5 years ago

standardising vars

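foreach var of varlist v1 v2 v3 {
    egen std`var' = std(`var')   // z-score: (x - mean) / sd
}

* e.g. after sqreg ..., q(.25 .75): difference between the 75th- and 25th-percentile coefficients on weight
lincom [q75]weight-[q25]weight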
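`egen std()` computes (x - mean) / sd. The small Python check below applies that formula using the sample standard deviation (n - 1 denominator), which is what summarize reports.

```python
from statistics import mean, stdev

def zscores(xs):
    """(x - mean) / sd, with the sample SD (n - 1 denominator)."""
    m, s = mean(xs), stdev(xs)
    return [(x - m) / s for x in xs]

print(zscores([2, 4, 6]))  # [-1.0, 0.0, 1.0]
```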

stata-haus · 6 years ago

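mypkg

Note: names such as dm88_1, gr0001_3, ip29_1, sbe16_1 and st0045_2 look like Stata Journal / STB package names. Those are not hosted on SSC, so ssc inst will not find them; locate them with search and install them with net install.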

ssc inst adjprop

ssc inst arrowplot

ssc inst asciiplot

ssc inst asdoc

ssc inst asrol

ssc inst astx

ssc inst balancetable

ssc inst basetable

ssc inst betafit

ssc inst bihist

ssc inst biplotvlab

ssc inst blindschemes

ssc inst bmjcip

ssc inst boottest

ssc inst brewscheme

ssc inst carryforward

ssc inst cart

ssc inst catplot

ssc inst cf2

ssc inst cf3

ssc inst cibar

ssc inst ciplot

ssc inst ciw

ssc inst codebook_ripper

ssc inst coefplot

ssc inst collin

ssc inst colorscatter

ssc inst combineplot

ssc inst concindc

ssc inst corrtable

ssc inst cpcorr

ssc inst crossplot

ssc inst dag

ssc inst dataex

ssc inst designplot

ssc inst devnplot

ssc inst diffpi

ssc inst diplot

ssc inst distinct

ssc inst dm0085_1

ssc inst dm88_1

ssc inst dm89_2

ssc inst dmerge

ssc inst dmout

ssc inst dta2sav

ssc inst eclplot

ssc inst egenmore

ssc inst ereplace

ssc inst expgen

ssc inst factortest

ssc inst filesearch

ssc inst findname

ssc inst fitstat

ssc inst fre

ssc inst fs

ssc inst ftest

ssc inst ftools

ssc inst full_palette

ssc inst g538schemes

ssc inst gciget

ssc inst gcode

ssc inst geivars

ssc inst genfreq

ssc inst genqreg

ssc inst genscore

ssc inst genstack

ssc inst geochart

ssc inst getmstatistic

ssc inst gformula

ssc inst ginidesc

ssc inst gllamm

ssc inst gmci

ssc inst gologit

ssc inst gologit2

ssc inst gologit29

ssc inst gr0001_3

ssc inst gr0002_3

ssc inst gr0033_1

ssc inst gr0054

ssc inst gr0065

ssc inst gr0066_1

ssc inst graph3d

ssc inst graphbinary

ssc inst grby

ssc inst grc1leg

ssc inst grcomb

ssc inst grcompare

ssc inst grep

ssc inst grfreq

ssc inst grlogit

ssc inst group_twoway

ssc inst grouplabs

ssc inst grqreg

ssc inst grstyle

ssc inst grtext

ssc inst gsreg

ssc inst gtools

ssc inst heckroc

ssc inst icio

ssc inst ip29_1

ssc inst ipdmetan

ssc inst isvar

ssc inst joinvars

ssc inst jrule

ssc inst khb

ssc inst kmatch

ssc inst kountry

ssc inst labsumm

ssc inst labutil

ssc inst labutil2

ssc inst listtab

ssc inst lrdrop1

ssc inst lstrfun

ssc inst maptile

ssc inst margprev

ssc inst mat2txt

ssc inst matrixtools

ssc inst metaan

ssc inst metabias

ssc inst metacum

ssc inst metafunnel

ssc inst metan

ssc inst metaprop

ssc inst metaprop_one

ssc inst metareg

ssc inst mif2dta

ssc inst moremata

ssc inst mrtab

ssc inst mulogit

ssc inst multibar

ssc inst multiline

ssc inst multimport

ssc inst muxplot

ssc inst muxyplot

ssc inst mvmeta

ssc inst mvnxpb

ssc inst mvsampsi

ssc inst mylabels

ssc inst mypkg

ssc inst nbvargr

ssc inst oaxaca

ssc inst outreg2

ssc inst pairplot

ssc inst palette_all

ssc inst palettes

ssc inst paramed

ssc inst parmest

ssc inst parplot

ssc inst partchart

ssc inst pdplot

ssc inst pieplot

ssc inst polychoric

ssc inst ppmlhdfe

ssc inst profileplot

ssc inst proprcspline

ssc inst ptrend

ssc inst putdocxcrosstab

ssc inst pvenn

ssc inst pyramid

ssc inst qcount

ssc inst qenv

ssc inst qic

ssc inst qregpd

ssc inst r2_mz

ssc inst rangestat

ssc inst reghdfe

ssc inst relyplot

ssc inst rowranks

ssc inst rsource

ssc inst rtfutil

ssc inst runmlwin

ssc inst runmplus

ssc inst safedrop

ssc inst savesome

ssc inst savespss

ssc inst sbe16_1

ssc inst sbe36_1

ssc inst sbplot5

ssc inst scandata

ssc inst scdensity

ssc inst scenreg

ssc inst scheme-burd

ssc inst scheme-mrc

ssc inst scheme-tfl

ssc inst scheme_scientific

ssc inst scheme_tufte

ssc inst scsomersd

ssc inst sencode

ssc inst sendtoslack

ssc inst seqlogit

ssc inst shapley

ssc inst shp2dta

ssc inst sigcoef

ssc inst simpplot

ssc inst sixplot

ssc inst sliceplot

ssc inst slideplot

ssc inst smileplot

ssc inst smithwelch

ssc inst smrtbl

ssc inst somersd

ssc inst sortl

ssc inst sortobs

ssc inst sparkline

ssc inst spikeplt

ssc inst spineplot

ssc inst spmap

ssc inst sqr

ssc inst st0045_2

ssc inst st0085_2

ssc inst st0182_1

ssc inst st0238

ssc inst st0243_1

ssc inst st0309

ssc inst st0427_1

ssc inst stack

ssc inst stat2data

ssc inst statacmds

ssc inst statflow

ssc inst statplot

ssc inst statsbyfast

ssc inst stcascoh

ssc inst stcmd

ssc inst std_beta

ssc inst stddiff

ssc inst storecmd

ssc inst strip

ssc inst stripplot

ssc inst subsave

ssc inst subsetplot

ssc inst sum2docx

ssc inst summout

ssc inst summtab

ssc inst sumstats

ssc inst sumup

ssc inst surloads

ssc inst surrog

ssc inst svvarlbl

ssc inst swapval

ssc inst swboot

ssc inst sxpose

ssc inst synth

ssc inst tab_chi

ssc inst tabcount

ssc inst table1

ssc inst tabout

ssc inst tabplot

ssc inst trellis

ssc inst ttable

ssc inst ttable2

ssc inst twitter2stata

ssc inst twoway_parea

ssc inst ulogit

ssc inst uninstall_asdoc

ssc inst unitab

ssc inst univar

ssc inst usesas

ssc inst usespss

ssc inst venndiagram

ssc inst vgsg

ssc inst wbopendata

ssc inst webimage

ssc inst wgttest

ssc inst wid

ssc inst worldstat

ssc inst wosload

ssc inst wtp

ssc inst xmiss

ssc inst xtdcce2134

ssc inst zscore06

stata-haus · 6 years ago

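concatenating variables using their value labels

egen NEWVA = concat(VAR1 VAR2), decode p(" ")

VAR1 and VAR2 are the labelled variables to join. With decode, concat() uses their value labels instead of the numeric codes, and p(" ") (punct()) puts a space between them.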
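Here is a Python sketch of what decode does inside concat(), using made-up labels. Labelled values are replaced by their label text before being joined with the separator. Showing unlabelled values as plain numbers is an assumption of this sketch.

```python
def concat_decoded(row, varnames, value_labels, sep=" "):
    """Join the variables as text, using value labels where a variable has them."""
    parts = []
    for v in varnames:
        val = row[v]
        if v in value_labels:
            # assumption: a code with no label is shown as the number itself
            parts.append(value_labels[v].get(val, str(val)))
        else:
            parts.append(str(val))
    return sep.join(parts)

# hypothetical value labels for VAR1 and VAR2
labels = {"VAR1": {1: "Urban", 2: "Rural"}, "VAR2": {0: "Poor", 1: "Rich"}}
print(concat_decoded({"VAR1": 2, "VAR2": 1}, ["VAR1", "VAR2"], labels))  # Rural Rich
```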

stata-haus · 6 years ago

Population attributable fraction

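Commands used: regpar, punaf and punafcc (SSC packages; e.g. ssc inst regpar)

regpar is run after fitting a regression model; here the outcome is low birthweight and the model includes maternal smoking (smoke) and race.

regpar, at(smoke=0)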

In the real world (scenario 0), 31.2% of babies are expected to have low birthweight, but in the dream scenario where no mothers smoke and their races stay the same (scenario 1), only 22.9% are. The difference between these scenario percentages, the population attributable risk (PAR), is 8.4%, with confidence limits from 3.2% to 13.5%. The PAR can be interpreted as the proportion of all babies who have low birthweight because they were born in scenario 0 instead of scenario 1.

Interpretation: eliminating maternal smoking, with the racial mix unchanged, might reduce the prevalence of low birthweight by 8.4 percentage points, with confidence limits from 3.2 to 13.5 percentage points.

Alternatively, we might want to communicate our message to an audience of smoking mothers, who might want to know how much they could do for their children if only they quit smoking before pregnancy. To answer this, we might use regpar with a subpop() option to compute an exposed-population attributable risk for the subpopulation of smoking mothers:

regpar, at(smoke=0) subpop(if smoke==1)

This time, the option subpop(if smoke==1) restricts the prediction to the subpopulation of smoking mothers, but scenarios 0 and 1 are defined as before. Once again, regpar displays the hard-to-interpret symmetric confidence intervals for the transformed parameters, followed by the asymmetric confidence intervals for the untransformed parameters, which are probably easier to explain to smoking mothers. The children of smoking mothers have a 40.1% prevalence of low birthweight, which might be reduced to 19.2% if their mothers quit smoking before pregnancy, with their racial mix unchanged. The difference is 21.3%, with confidence limits from 7.8% to 34.1%.

Another possibility is to compare our zero-smoking dream scenario not with the intermediate world in which we live but with the nightmare scenario where all mothers smoke. The atzero() option resets scenario 0 for this:

regpar, at(smoke=0) atzero(smoke=1)

We see that scenario 0 is set by the atzero() option to smoke=1, while scenario 1 is still smoke=0. Once again, regpar displays the symmetric confidence intervals for the transformed parameters followed by the asymmetric confidence intervals for the untransformed parameters. We see that if all mothers smoked and the racial mix stayed the same, then 45.8% of children might have low birthweight. The dream scenario prevalence, where no mothers smoke and the racial mix stays the same, is still 22.9%, as before. The difference in prevalence between the nightmare scenario 0 and the dream scenario 1 is 22.9% with confidence limits from 8.4% to 36.4%.
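The scenario arithmetic behind these numbers can be sketched in Python, using the rounded point estimates quoted above. regpar also gives confidence intervals, which this sketch does not attempt. The attributable fraction is the PAR divided by the scenario-0 prevalence, which equals 1 minus the ratio of scenario 1 to scenario 0. As I understand the punaf documentation, that ratio is what punaf calls the population unattributable fraction.

```python
def compare_scenarios(prev0, prev1):
    """Point estimates comparing scenario 0 with scenario 1 (prevalences in %)."""
    par = prev0 - prev1   # population attributable risk: difference in prevalence
    puf = prev1 / prev0   # population unattributable fraction: ratio of prevalences
    paf = 1 - puf         # population attributable fraction, equal to par / prev0
    return par, puf, paf

# rounded prevalences quoted in the text
for label, p0, p1 in [("real world vs no smoking", 31.2, 22.9),
                      ("all smoke vs no smoking", 45.8, 22.9)]:
    par, puf, paf = compare_scenarios(p0, p1)
    print(f"{label}: PAR = {par:.1f} points, PAF = {paf:.1%}")
```

With rounded inputs, the first PAR comes out as 8.3 rather than regpar's 8.4, probably because regpar works with unrounded prevalences.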

stata-haus · 6 years ago

Stata essentials

drop *_02 *_03 *_04 *_05 *_06 *_07 *_08 *_09 *_10 *_11 *_12 *_13 *_14 *_15 *_16 *_17 *_18 *_19 *_20
------------------------
tabout IV1-IV5 DV using ta.xls, c(col ci) svy stats(chi2) percent layout(row) npos(lab) f(2) mi append
----------------------
* drop every observation that is missing on any variable
foreach v of var * {
    drop if missing(`v')
}
--------------------------------------------------
* drop observations missing on v1 or v2
drop if mi(v1, v2)
------------------------------------------------
labvalch3 *, strfcn(proper(`"@"'))
--------------------------------------------------------
capture rm "tables.xls"
toxl(tmp.xls, Table 1, replace)
--------------------------------------------------------

egen both = group( Age Residency ), label

-----------------------------------------------------------------
egen x = group(Age Residency)
grouplabs Age Residency, groupvar(x) val

egen float zbmi = std(bmi), mean(0) std(1)
----------------------------------------------------------------------------

blabel(bar, position(center) format(%3.1f))

--------------------------------------------------------------------
* flag the first (earliest Year) observation of each id_firm-id_molecule pair
by id_firm id_molecule (Year), sort: gen byte wanted = (_n == 1)
------------------------------------------------------------------

graph bar (percent) yvar, by(byvar) over(groupvar, sort(order))

graph hbar (asis) v102 , over( v774b ) over( v774a ) asyvars scheme(mrc) aspect(1) blabel(bar) yla(, nogrid)

--------------------------------------------------------------------------------------------------
catplot ChildAge if Fever==1, by(y) percent(y) blabel(bar, position(center) format(%3.1f) color(white)) yla(, nogrid) scheme(mrc)
catplot we y, by(pod) percent(y) asyvars blabel(bar, position(center) format(%3.1f) color(white)) yla(, nogrid) scheme(burd4) name(anct)

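-----------------------------------------------
* ? matches exactly one character, so each line drops names of a different length
capture drop A*
capture drop v??
capture drop v???
capture drop v????
capture drop v?????
capture drop d?????
capture drop d????
capture drop d???
* note: keep (not drop) retains only the h??? variables and removes all others
capture keep h???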

-------------------------------------------------------------------------------------------------------------
http://repec.sowi.unibe.ch/stata/coefplot/markers.html

Finally used:

coefplot (fvm, label(Male)) (fvf, label(Female)), drop(_cons) xline(1) eform mlabposition(1) mlabel(cond(@pval>.05, "+", cond(@pval<.001, "***", cond(@pval<.01, "**", cond(@pval<.05, "*", cond(@pval<.1, "+", "")))))) note("+ p > .05, * p < .05, ** p < .01, *** p < .001") name(f)

marginsplot, horizontal xline(0) yscale(reverse) recast(scatter)

catplot fever y, by(cs) blabel(bar, format(%4.1f) pos(top)) name(d, replace)
catplot fever, by(y) blabel(bar, format(%4.1f) pos(top)) name(d1) vertical
gr combine d d1

catplot cough y, by(cs) blabel(bar, format(%4.1f) pos(top)) name(dd, replace)
catplot cough, by(y) blabel(bar, format(%4.1f) pos(top)) name(dd1) vertical
gr combine dd dd1

catplot rb y, by(cs) blabel(bar, format(%4.1f) pos(top)) name(ddd, replace)
catplot rb, by(y) blabel(bar, format(%4.1f) pos(top)) name(ddd1) vertical
gr combine ddd ddd1

ta fever y if cs==1, col nofreq
ta fever y if cs==2, col nofreq
ta fever y, col nofreq

ta rb y if cs==1, col nofreq
ta rb y if cs==2, col nofreq
ta rb y, col nofreq

logistic fever i.y i.chage i.bo i.bfeed i.mage i.edu i.occ i.bmi i.lcwanted i.parity if cs==1
eststo fvm

logistic cough i.y if cs==1
eststo cm

coefplot, xline(0) drop(_cons) omitted baselevels graphregion(margin(l=65)) yscale(alt noline) coeflabels(, labgap(-125) notick) headings(1995.y= "Year" 2.chage= "Age of Child" 0.bfeed= "Being Breastfed" 2.mage= "Mother's age" 0.edu= "Mother's education" 1.occ= "Mother's occupation" 0.bmi= "Maternal BMI" 1.lcwanted= "Child was wanted" 2.parity= "Parity", labcolor(orange) labgap(-130)) name(fvm)

logistic fever i.y i.chage i.bo i.bfeed i.mage i.edu i.occ i.bmi i.lcwanted i.parity if cs==2
eststo fvf

coefplot, xline(0) drop(_cons) omitted baselevels graphregion(margin(l=65)) yscale(alt noline) coeflabels(, labgap(-125) notick) headings(1995.y= "Year" 2.chage= "Age of Child" 0.bfeed= "Being Breastfed" 2.mage= "Mother's age" 0.edu= "Mother's education" 1.occ= "Mother's occupation" 0.bmi= "Maternal BMI" 1.lcwanted= "Child was wanted" 2.parity= "Parity", labcolor(orange) labgap(-130)) name(fvf)

coefplot, xline(0) drop(_cons) omitted baselevels graphregion(margin(l=65)) yscale(alt noline) coeflabels(, labgap(-125) notick) headings(1995.y= "Year" 2.chage= "Age of Child" 0.bfeed= "Being Breastfed" 2.mage= "Mother's age" 0.edu= "Mother's education" 1.occ= "Mother's occupation" 0.bmi= "Maternal BMI" 1.lcwanted= "Child was wanted" 2.parity= "Parity", labsize(vsmall) labcolor(orange) labgap(-130)) ylab(, labs(vsmall))

coefplot (fvm, label(Male)) (fvf, label(Female)), drop(_cons) xline(1) eform name(fv)

coefplot, xline(0) drop(headroom _cons) omitted baselevels graphregion(margin(l=65)) yscale(alt noline) coeflabels(, labgap(-125) notick) headings(1.age = "{bf:MATERNAL age}" 2.parity = "{bf:Total births}" 3.edu = "{bf:MEDU}" 0.bmi = "{bf:BODY mass}" , labcolor(orange) labgap(-130))

coefplot, xline(0) drop(_cons) omitted baselevels graphregion(margin(l=65)) yscale(alt noline) coeflabels(, labgap(-125) notick) headings(1995.y= "Year" 2.chage= "Age of Child" 0.bfeed= "Being Breastfed" 2.mage= "Mother's age" 0.edu= "Mother's education" 1.occ= "Mother's occupation" 0.bmi= "Maternal BMI" 1.lcwanted= "Child was wanted" 2.parity= "Parity", labcolor(orange) labgap(-130))

With p-values shown as numbers:

coefplot, xline(0) mlabposition(1) mlabgap(*2) mlabel("{it:p} = " + string(@pval,"%9.3f"))

With p-values shown as stars:

coefplot, xline(1) eform mlabposition(1) mlabel(cond(@pval>.05, "+", cond(@pval<.001, "***", cond(@pval<.01, "**", cond(@pval<.05, "*", cond(@pval<.1, "+", "")))))) note("+ p > .05, * p < .05, ** p < .01, *** p < .001")
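The nested cond() above can be written as a Python function, which makes the order of the tests easier to read. It follows the Stata expression exactly. Every estimate with p > .05 gets "+", and the inner p < .1 branch is reached only when p equals .05.

```python
def stars(p):
    """Same tests, in the same order, as the nested cond() in mlabel()."""
    if p > .05:
        return "+"
    if p < .001:
        return "***"
    if p < .01:
        return "**"
    if p < .05:
        return "*"
    if p < .1:
        return "+"   # only reached when p == .05
    return ""

for p in (0.2, 0.05, 0.03, 0.004, 0.0002):
    print(p, stars(p))
```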

With p-values shown as +/-/no association

Marker size weighted by precision (1/SE):

coefplot, weight(1/@se) ms(oh) drop(_cons) xline(0)

stata-haus · 6 years ago

string to numeric

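egen newvar = group(var), label   // codes 1, 2, ... in sorted order of var; label stores the original strings as value labels
* or: encode var, gen(newvar)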
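Here is a Python sketch of what `egen group(), label` does to a string variable. Distinct non-empty strings get codes 1, 2 and so on in sorted order, and the strings become the value labels. encode also codes in alphabetical order. Empty strings are treated as missing, and the data are made up.

```python
def group_codes(values):
    """Code distinct non-empty strings 1..k in sorted order; "" becomes None (missing).
    Returns the codes and the code -> string label map."""
    levels = sorted({v for v in values if v != ""})
    code_of = {v: i for i, v in enumerate(levels, start=1)}
    codes = [code_of.get(v) for v in values]
    labels = {i: v for v, i in code_of.items()}
    return codes, labels

codes, labels = group_codes(["urban", "rural", "urban", ""])
print(codes, labels)  # [2, 1, 2, None] {1: 'rural', 2: 'urban'}
```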
