phillust-blog
Data Analysis Course
8 posts
Data Analysis Tools Week 4 Assignment
I'm looking into whether the region someone lives in moderates the association between depression and thoughts of suicide.  The output shows that the association between depression and suicidal ideation is significant in every region (all p-values are significant), so region acts as a significant moderator of that relationship.
Code:
# -*- coding: utf-8 -*-
"""
@author: Phil Lust
"""

import pandas
import numpy
import scipy.stats
import seaborn
import matplotlib.pyplot as plt
import statsmodels.formula.api as smf
import statsmodels.stats.multicomp as multi

#Data read in
data = pandas.read_csv('nesarc_pds.csv', low_memory=False)

#set pandas to show all columns in dataframe
pandas.set_option('display.max_columns', None)

#set pandas to show all rows in dataframe
pandas.set_option('display.max_rows', None)

# sets strings to uppercase in all Dataframe column names
data.columns = map(str.upper, data.columns)

# bug fix for display formats to avoid runtime errors
pandas.set_option('display.float_format', lambda x:'%f'%x)

#set variables of interest to numeric
data['S4AQ4A17'] = pandas.to_numeric(data['S4AQ4A17'], errors='coerce')
data['S1Q213'] = pandas.to_numeric(data['S1Q213'], errors='coerce')
data['REGION'] = pandas.to_numeric(data['REGION'], errors='coerce')

#Replacing unknown values in the dataframe
data['S4AQ4A17'] = data['S4AQ4A17'].replace(9, numpy.nan)
data['S1Q213'] = data['S1Q213'].replace(9, numpy.nan)

#Recode of values for logical consistency
recode1 = {1:1, 2:0}
data['S4AQ4A17'] = data['S4AQ4A17'].map(recode1)
recode2 = {1:4, 2:3, 3:2, 4:1, 5:0}
data['S1Q213'] = data['S1Q213'].map(recode2)

#data subsets for moderator exploration of Region
sub1 = data[(data['REGION']==1)]
sub2 = data[(data['REGION']==2)]
sub3 = data[(data['REGION']==3)]
sub4 = data[(data['REGION']==4)]
#crosstab table of observed counts
ct1 = pandas.crosstab(sub1['S4AQ4A17'], sub1['S1Q213'])
print (ct1)

#column percentages
colsum1 = ct1.sum(axis=0)
colpct1 = ct1/colsum1
print (colpct1)

#chi-square on Northeast moderator
print('association between depression and suicidal thoughts in the Northeast')
cs1 = scipy.stats.chi2_contingency(ct1)
print(cs1)

#crosstab table of observed counts
ct2 = pandas.crosstab(sub2['S4AQ4A17'], sub2['S1Q213'])
print (ct2)

#column percentages
colsum2 = ct2.sum(axis=0)
colpct2 = ct2/colsum2
print (colpct2)

#chi-square on Midwest moderator
print('association between depression and suicidal thoughts in the Midwest')
cs2 = scipy.stats.chi2_contingency(ct2)
print(cs2)

#crosstab table of observed counts
ct3 = pandas.crosstab(sub3['S4AQ4A17'], sub3['S1Q213'])
print (ct3)

#column percentages
colsum3 = ct3.sum(axis=0)
colpct3 = ct3/colsum3
print (colpct3)

#chi-square on South moderator
print('association between depression and suicidal thoughts in the South')
cs3 = scipy.stats.chi2_contingency(ct3)
print(cs3)

#crosstab table of observed counts
ct4 = pandas.crosstab(sub4['S4AQ4A17'], sub4['S1Q213'])
print (ct4)

#column percentages
colsum4 = ct4.sum(axis=0)
colpct4 = ct4/colsum4
print (colpct4)

#chi-square on West moderator
print('association between depression and suicidal thoughts in the West')
cs4 = scipy.stats.chi2_contingency(ct4)
print(cs4)
Output:
S1Q213    0.000000  1.000000  2.000000  3.000000  4.000000
S4AQ4A17
0.000000       672       616       448       152        53
1.000000       131       236       179       104        49

S1Q213    0.000000  1.000000  2.000000  3.000000  4.000000
S4AQ4A17
0.000000  0.836862  0.723005  0.714514  0.593750  0.519608
1.000000  0.163138  0.276995  0.285486  0.406250  0.480392

association between depression and suicidal thoughts in the Northeast
(95.327124345434868, 9.7091644408750795e-20, 4, array([[ 590.3875    ,  626.41363636,  460.9875    ,  188.21818182,   74.99318182],
       [ 212.6125    ,  225.58636364,  166.0125    ,   67.78181818,   27.00681818]]))

S1Q213    0.000000  1.000000  2.000000  3.000000  4.000000
S4AQ4A17
0.000000       852       759       426       181        59
1.000000       160       277       216       105        43

S1Q213    0.000000  1.000000  2.000000  3.000000  4.000000
S4AQ4A17
0.000000  0.841897  0.732625  0.663551  0.632867  0.578431
1.000000  0.158103  0.267375  0.336449  0.367133  0.421569

association between depression and suicidal thoughts in the Midwest
(105.24586010821909, 7.507621975326203e-22, 4, array([[ 748.64327485,  766.39766082,  474.92982456,  211.57309942,   75.45614035],
       [ 263.35672515,  269.60233918,  167.07017544,   74.42690058,   26.54385965]]))

S1Q213    0.000000  1.000000  2.000000  3.000000  4.000000
S4AQ4A17
0.000000      1358      1062       796       341       155
1.000000       233       330       298       188       115

S1Q213    0.000000  1.000000  2.000000  3.000000  4.000000
S4AQ4A17
0.000000  0.853551  0.762931  0.727605  0.644612  0.574074
1.000000  0.146449  0.237069  0.272395  0.355388  0.425926

association between depression and suicidal thoughts in the South
(173.07239474939345, 2.2908395479186054e-36, 4, array([[ 1211.19606235,  1059.70139459,   832.84003281,   402.71698113,   205.54552912],
       [  379.80393765,   332.29860541,   261.15996719,   126.28301887,    64.45447088]]))

S1Q213    0.000000  1.000000  2.000000  3.000000  4.000000
S4AQ4A17
0.000000       734       745       458       165        49
1.000000       194       315       225       115        48

S1Q213    0.000000  1.000000  2.000000  3.000000  4.000000
S4AQ4A17
0.000000  0.790948  0.702830  0.670571  0.589286  0.505155
1.000000  0.209052  0.297170  0.329429  0.410714  0.494845

association between depression and suicidal thoughts in the West
(73.628402483107678, 3.8855318766982616e-15, 4, array([[ 654.8976378 ,  748.0511811 ,  481.99901575,  197.5984252 ,   68.45374016],
       [ 273.1023622 ,  311.9488189 ,  201.00098425,   82.4015748 ,   28.54625984]]))
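The four region-specific blocks above could also be written as a single loop, which avoids the copy-pasted sub1 through sub4 code. A minimal sketch of the same chi-square check, assuming the cleaned dataframe from the code above and the standard Census coding for REGION (1 = Northeast, 2 = Midwest, 3 = South, 4 = West):

# same moderation check written as a loop over the REGION levels
region_names = {1: 'Northeast', 2: 'Midwest', 3: 'South', 4: 'West'}
for region_code, name in region_names.items():
    sub = data[data['REGION'] == region_code]
    ct = pandas.crosstab(sub['S4AQ4A17'], sub['S1Q213'])
    chi2, p, dof, expected = scipy.stats.chi2_contingency(ct)
    print('association between depression and suicidal thoughts in the ' + name)
    print('chi-square = %.2f, p = %.2e, dof = %d' % (chi2, p, dof))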
Data Analysis Tools Week 3 Assignment
I made a small adjustment this week to allow for calculation of a correlation coefficient, since my selected data elements contain only categorical data.  I looked into the relationship between the quantity of alcoholic beverages consumed and reported depression.  The correlation coefficient is .087 with a p-value of .055.
Since the p-value is not below .05, we fail to reject the null hypothesis.
Output:
association between alcohol use and depression
(0.08677304304640121, 0.055418860920723026)
Code Follows:
# -*- coding: utf-8 -*-
"""
Created on Fri Oct 27 19:01:29 2017
@author: Phil Lust
"""
import pandas
import numpy
import scipy.stats
import seaborn
import matplotlib.pyplot as plt
import statsmodels.formula.api as smf
import statsmodels.stats.multicomp as multi

data = pandas.read_csv('nesarc_pds.csv', low_memory=False)

#set pandas to show all columns in dataframe
pandas.set_option('display.max_columns', None)

#set pandas to show all rows in dataframe
pandas.set_option('display.max_rows', None)

# sets strings to uppercase in all Dataframe column names
data.columns = map(str.upper, data.columns)

# bug fix for display formats to avoid runtime errors
pandas.set_option('display.float_format', lambda x:'%f'%x)

# Converts data entries to number format, correct python data read in errors
data["S2AQ22"] = pandas.to_numeric(data["S2AQ22"], errors='coerce')
data["S1Q213"] = pandas.to_numeric(data["S1Q213"], errors='coerce')

#Replacing unknown values in the dataframe
data['S2AQ22'] = data['S2AQ22'].replace(99, numpy.nan)
data['S1Q213'] = data['S1Q213'].replace(9, numpy.nan)

#recode alcohol use from low to high
recode1 = {11:0, 1:1, 2:2, 3:3, 4:4, 5:5, 6:6, 7:7, 8:8, 9:9, 10:10}
data['S2AQ22'] = data['S2AQ22'].map(recode1)

#recode depression from low to high
recode2 = {5:0, 1:1, 2:2, 3:3, 4:4}
data['S1Q213'] = data['S1Q213'].map(recode2)

#drop blank and nan entries to enable correlation calculation
data_clean = data.dropna()

#Correlation calculation command
print ('association between alcohol use and depression')
print (scipy.stats.pearsonr(data_clean['S2AQ22'], data_clean['S1Q213']))
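To put the size of that correlation in context, the coefficient can be squared to get the proportion of shared variance. A minimal sketch, reusing data_clean from the code above:

# coefficient of determination (r-squared) from the Pearson correlation
r, p = scipy.stats.pearsonr(data_clean['S2AQ22'], data_clean['S1Q213'])
print('r = %.3f, p = %.3f, r-squared = %.4f' % (r, p, r**2))

With r of about .087, r-squared is below .01, so even a significant association would explain less than 1% of the variance in reported depression.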
Data Analysis Tools Week 2 Assignment
I'm looking at the relationship between suicidal thoughts [S4AQ4A17] and seeking medical help [S4AQ16] from the NESARC data set; both are two-level categorical variables.  The chi-squared test yielded a p-value of 2.43 x 10^-93, effectively 0.  This means that we reject the null hypothesis that there is no relationship between suicidal ideation and seeking help, and accept the alternative that there is a relationship.  Since both variables have only two levels, there is no need for a post-hoc test to determine pairwise relationships.
Output from code:
S4AQ4A17  1.000000  2.000000
S4AQ16
1.000000      2135      2401
2.000000      1098      3138

S4AQ4A17  1.000000  2.000000
S4AQ16
1.000000  0.660377  0.433472
2.000000  0.339623  0.566528

chi-square value, p value, expected counts
(420.00948971287778, 2.4287382795898561e-93, 1, array([[ 1671.78385773,  2864.21614227],
       [ 1561.21614227,  2674.78385773]]))
The code follows:
import pandas
import numpy
import scipy.stats
import seaborn
import matplotlib.pyplot as plt
import statsmodels.formula.api as smf
import statsmodels.stats.multicomp as multi

data = pandas.read_csv('nesarc_pds.csv', low_memory=False)

#set pandas to show all columns in dataframe
pandas.set_option('display.max_columns', None)

#set pandas to show all rows in dataframe
pandas.set_option('display.max_rows', None)

# sets strings to uppercase in all Dataframe column names
data.columns = map(str.upper, data.columns)

# bug fix for display formats to avoid runtime errors
pandas.set_option('display.float_format', lambda x:'%f'%x)

# Converts data entries to number format, correct python data read in errors
data["S4AQ16"] = pandas.to_numeric(data["S4AQ16"], errors='coerce')
data["S4AQ4A17"] = pandas.to_numeric(data["S4AQ4A17"], errors='coerce')

#Replacing unknown values in the dataframe
data['S4AQ16'] = data['S4AQ16'].replace(9, numpy.nan)
data['S4AQ4A17'] = data['S4AQ4A17'].replace(9, numpy.nan)

#contingency table of observed results
ct1 = pandas.crosstab(data['S4AQ16'], data['S4AQ4A17'])
print (ct1)

#column percentages
colsum = ct1.sum(axis=0)
colpct = ct1/colsum
print(colpct)

#Chi-square
print('chi-square value, p value, expected counts')
cs1 = scipy.stats.chi2_contingency(ct1)
print (cs1)
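For readability, the tuple printed above can also be unpacked into named values. A minimal sketch using the same ct1 table:

# unpack the chi2_contingency result into named parts
chi2, p, dof, expected = scipy.stats.chi2_contingency(ct1)
print('chi-square = %.2f, p = %.2e, degrees of freedom = %d' % (chi2, p, dof))
print(expected)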
Data Analysis Tools Week 1 Assignment
My study question does not have any quantitative variables, so for this assignment I ran an ANOVA on age of alcohol dependence (quantitative response) and household income category (categorical explanatory variable).
The ANOVA showed a relationship between age of alcohol dependence and household income (p=0.0146).  However, the Tukey test showed no differences in any pairwise comparison.
The program I used is as follows:
import pandas
import numpy
import seaborn
import matplotlib.pyplot as plt
import statsmodels.formula.api as smf
import statsmodels.stats.multicomp as multi
data = pandas.read_csv('nesarc_pds.csv', low_memory=False)
#set pandas to show all columns in dataframe
pandas.set_option('display.max_columns', None)

#set pandas to show all rows in dataframe
pandas.set_option('display.max_rows', None)

# sets strings to uppercase in all Dataframe column names
data.columns = map(str.upper, data.columns)

# bug fix for display formats to avoid runtime errors
pandas.set_option('display.float_format', lambda x:'%f'%x)
# Converts data entries to number format, correct python data read in errors
data["S1Q12B"] = pandas.to_numeric(data["S1Q12B"], errors='coerce')
data["S2BQ2D"] = pandas.to_numeric(data["S2BQ2D"], errors='coerce')
data["S2BQ2FR"] = pandas.to_numeric(data["S2BQ2FR"], errors='coerce')
data["ALCABDEPP12DX"] = pandas.to_numeric(data["ALCABDEPP12DX"], errors='coerce')
#Replacing unknown values in the dataframe
data['S2BQ2D'] = data['S2BQ2D'].replace(99, numpy.nan)
data['S2BQ2FR'] = data['S2BQ2FR'].replace(999, numpy.nan)

#subset of data to only include those who have had alcohol dependence in the
#last 12 months
sub1 = data[(data['ALCABDEPP12DX']!=0)]

model1 = smf.ols(formula='S2BQ2D ~ C(S1Q12B)', data=sub1)
results1 = model1.fit()
print (results1.summary())

mc1 = multi.MultiComparison(sub1['S2BQ2D'], sub1['S1Q12B'])
res1 = mc1.tukeyhsd()
print (res1.summary())
The output is:
                            OLS Regression Results
==============================================================================
Dep. Variable:                 S2BQ2D   R-squared:                       0.008
Model:                            OLS   Adj. R-squared:                  0.003
Method:                 Least Squares   F-statistic:                     1.813
Date:                Thu, 02 Nov 2017   Prob (F-statistic):             0.0146
Time:                        18:22:33   Log-Likelihood:                -16996.
No. Observations:                4671   AIC:                         3.403e+04
Df Residuals:                    4650   BIC:                         3.417e+04
Df Model:                          20
Covariance Type:            nonrobust
===================================================================================
                      coef    std err          t      P>|t|      [0.025      0.975]
-----------------------------------------------------------------------------------
Intercept          24.9000      0.780     31.937      0.000      23.372      26.428
C(S1Q12B)[T.2]      1.8708      1.025      1.825      0.068      -0.139       3.881
C(S1Q12B)[T.3]      1.5815      1.113      1.421      0.155      -0.600       3.763
C(S1Q12B)[T.4]      0.4700      1.017      0.462      0.644      -1.523       2.463
C(S1Q12B)[T.5]     -1.0440      1.135     -0.920      0.358      -3.270       1.182
C(S1Q12B)[T.6]      0.0695      0.947      0.073      0.941      -1.787       1.926
C(S1Q12B)[T.7]      0.2514      0.936      0.269      0.788      -1.584       2.087
C(S1Q12B)[T.8]      0.2472      0.932      0.265      0.791      -1.580       2.075
C(S1Q12B)[T.9]     -0.9623      0.916     -1.051      0.293      -2.758       0.833
C(S1Q12B)[T.10]    -0.8889      0.960     -0.926      0.355      -2.771       0.993
C(S1Q12B)[T.11]    -0.3715      0.891     -0.417      0.677      -2.119       1.376
C(S1Q12B)[T.12]    -1.0150      0.890     -1.140      0.254      -2.760       0.730
C(S1Q12B)[T.13]    -1.5657      0.930     -1.684      0.092      -3.388       0.257
C(S1Q12B)[T.14]    -0.4894      0.965     -0.507      0.612      -2.381       1.403
C(S1Q12B)[T.15]    -1.1780      1.011     -1.165      0.244      -3.161       0.805
C(S1Q12B)[T.16]    -0.7991      1.178     -0.678      0.498      -3.109       1.511
C(S1Q12B)[T.17]    -0.0615      1.124     -0.055      0.956      -2.264       2.141
C(S1Q12B)[T.18]    -0.4493      1.344     -0.334      0.738      -3.084       2.186
C(S1Q12B)[T.19]    -0.6170      1.188     -0.519      0.603      -2.945       1.712
C(S1Q12B)[T.20]    -0.1169      1.278     -0.091      0.927      -2.622       2.389
C(S1Q12B)[T.21]     0.3235      1.268      0.255      0.799      -2.163       2.810
==============================================================================
Omnibus:                     1472.119   Durbin-Watson:                   1.987
Prob(Omnibus):                  0.000   Jarque-Bera (JB):             4157.875
Skew:                           1.670   Prob(JB):                         0.00
Kurtosis:                       6.195   Cond. No.                         27.8
==============================================================================
Tukey Test:
Multiple Comparison of Means - Tukey HSD,FWER=0.05 ========================================= group1 group2 meandiff lower upper reject -----------------------------------------   1      2      nan     nan   nan  False   1      3      nan     nan   nan  False   1      4      nan     nan   nan  False   1      5      nan     nan   nan  False   1      6      nan     nan   nan  False   1      7      nan     nan   nan  False   1      8      nan     nan   nan  False   1      9      nan     nan   nan  False   1      10     nan     nan   nan  False   1      11     nan     nan   nan  False   1      12     nan     nan   nan  False   1      13     nan     nan   nan  False   1      14     nan     nan   nan  False   1      15     nan     nan   nan  False   1      16     nan     nan   nan  False   1      17     nan     nan   nan  False   1      18     nan     nan   nan  False   1      19     nan     nan   nan  False   1      20     nan     nan   nan  False   1      21     nan     nan   nan  False   2      3      nan     nan   nan  False   2      4      nan     nan   nan  False   2      5      nan     nan   nan  False   2      6      nan     nan   nan  False   2      7      nan     nan   nan  False   2      8      nan     nan   nan  False   2      9      nan     nan   nan  False   2      10     nan     nan   nan  False   2      11     nan     nan   nan  False   2      12     nan     nan   nan  False   2      13     nan     nan   nan  False   2      14     nan     nan   nan  False   2      15     nan     nan   nan  False   2      16     nan     nan   nan  False   2      17     nan     nan   nan  False   2      18     nan     nan   nan  False   2      19     nan     nan   nan  False   2      20     nan     nan   nan  False   2      21     nan     nan   nan  False   3      4      nan     nan   nan  False   3      5      nan     nan   nan  False   3      6      nan     nan   nan  False   3      7      nan     nan   nan  False   3      8      nan     nan   nan  False   3      9      nan     nan   nan  False   3      10     nan     nan   nan  False   3      11     nan     nan   nan  False   3      12     nan     nan   nan  False   3      13     nan     nan   nan  False   3      14     nan     nan   nan  False   3      15     nan     nan   nan  False   3      16     nan     nan   nan  False   3      17     nan     nan   nan  False   3      18     nan     nan   nan  False   3      19     nan     nan   nan  False   3      20     nan     nan   nan  False   3      21     nan     nan   nan  False   4      5      nan     nan   nan  False   4      6      nan     nan   nan  False   4      7      nan     nan   nan  False   4      8      nan     nan   nan  False   4      9      nan     nan   nan  False   4      10     nan     nan   nan  False   4      11     nan     nan   nan  False   4      12     nan     nan   nan  False   4      13     nan     nan   nan  False   4      14     nan     nan   nan  False   4      15     nan     nan   nan  False   4      16     nan     nan   nan  False   4      17     nan     nan   nan  False   4      18     nan     nan   nan  False   4      19     nan     nan   nan  False   4      20     nan     nan   nan  False   4      21     nan     nan   nan  False   5      6      nan     nan   nan  False   5      7      nan     nan   nan  False   5      8      nan     nan   nan  False   5      9      nan     nan   nan  False   5      10     nan     nan   nan  False   5      11     nan     nan   nan  False   5      12     nan     nan   nan  False   5      13     nan     nan   nan  False   5      14     
nan     nan   nan  False   5      15     nan     nan   nan  False   5      16     nan     nan   nan  False   5      17     nan     nan   nan  False   5      18     nan     nan   nan  False   5      19     nan     nan   nan  False   5      20     nan     nan   nan  False   5      21     nan     nan   nan  False   6      7      nan     nan   nan  False   6      8      nan     nan   nan  False   6      9      nan     nan   nan  False   6      10     nan     nan   nan  False   6      11     nan     nan   nan  False   6      12     nan     nan   nan  False   6      13     nan     nan   nan  False   6      14     nan     nan   nan  False   6      15     nan     nan   nan  False   6      16     nan     nan   nan  False   6      17     nan     nan   nan  False   6      18     nan     nan   nan  False   6      19     nan     nan   nan  False   6      20     nan     nan   nan  False   6      21     nan     nan   nan  False   7      8      nan     nan   nan  False   7      9      nan     nan   nan  False   7      10     nan     nan   nan  False   7      11     nan     nan   nan  False   7      12     nan     nan   nan  False   7      13     nan     nan   nan  False   7      14     nan     nan   nan  False   7      15     nan     nan   nan  False   7      16     nan     nan   nan  False   7      17     nan     nan   nan  False   7      18     nan     nan   nan  False   7      19     nan     nan   nan  False   7      20     nan     nan   nan  False   7      21     nan     nan   nan  False   8      9      nan     nan   nan  False   8      10     nan     nan   nan  False   8      11     nan     nan   nan  False   8      12     nan     nan   nan  False   8      13     nan     nan   nan  False   8      14     nan     nan   nan  False   8      15     nan     nan   nan  False   8      16     nan     nan   nan  False   8      17     nan     nan   nan  False   8      18     nan     nan   nan  False   8      19     nan     nan   nan  False   8      20     nan     nan   nan  False   8      21     nan     nan   nan  False   9      10     nan     nan   nan  False   9      11     nan     nan   nan  False   9      12     nan     nan   nan  False   9      13     nan     nan   nan  False   9      14     nan     nan   nan  False   9      15     nan     nan   nan  False   9      16     nan     nan   nan  False   9      17     nan     nan   nan  False   9      18     nan     nan   nan  False   9      19     nan     nan   nan  False   9      20     nan     nan   nan  False   9      21     nan     nan   nan  False   10     11     nan     nan   nan  False   10     12     nan     nan   nan  False   10     13     nan     nan   nan  False   10     14     nan     nan   nan  False   10     15     nan     nan   nan  False   10     16     nan     nan   nan  False   10     17     nan     nan   nan  False   10     18     nan     nan   nan  False   10     19     nan     nan   nan  False   10     20     nan     nan   nan  False   10     21     nan     nan   nan  False   11     12     nan     nan   nan  False   11     13     nan     nan   nan  False   11     14     nan     nan   nan  False   11     15     nan     nan   nan  False   11     16     nan     nan   nan  False   11     17     nan     nan   nan  False   11     18     nan     nan   nan  False   11     19     nan     nan   nan  False   11     20     nan     nan   nan  False   11     21     nan     nan   nan  False   12     13     nan     nan   nan  False   12     14     nan     nan   nan  False   12     15     nan     nan   nan  False   12     16     nan     nan   nan  False   
12     17     nan     nan   nan  False   12     18     nan     nan   nan  False   12     19     nan     nan   nan  False   12     20     nan     nan   nan  False   12     21     nan     nan   nan  False   13     14     nan     nan   nan  False   13     15     nan     nan   nan  False   13     16     nan     nan   nan  False   13     17     nan     nan   nan  False   13     18     nan     nan   nan  False   13     19     nan     nan   nan  False   13     20     nan     nan   nan  False   13     21     nan     nan   nan  False   14     15     nan     nan   nan  False   14     16     nan     nan   nan  False   14     17     nan     nan   nan  False   14     18     nan     nan   nan  False   14     19     nan     nan   nan  False   14     20     nan     nan   nan  False   14     21     nan     nan   nan  False   15     16     nan     nan   nan  False   15     17     nan     nan   nan  False   15     18     nan     nan   nan  False   15     19     nan     nan   nan  False   15     20     nan     nan   nan  False   15     21     nan     nan   nan  False   16     17     nan     nan   nan  False   16     18     nan     nan   nan  False   16     19     nan     nan   nan  False   16     20     nan     nan   nan  False   16     21     nan     nan   nan  False   17     18     nan     nan   nan  False   17     19     nan     nan   nan  False   17     20     nan     nan   nan  False   17     21     nan     nan   nan  False   18     19     nan     nan   nan  False   18     20     nan     nan   nan  False   18     21     nan     nan   nan  False   19     20     nan     nan   nan  False   19     21     nan     nan   nan  False   20     21     nan     nan   nan  False -----------------------------------------
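The all-nan rows above are most likely a side effect of missing values still present in S2BQ2D within sub1 (the OLS routine drops them, but MultiComparison does not). A minimal sketch of one way to check that, dropping the missing rows before running the post-hoc test; the variable names follow the code above:

#drop rows with missing values in the two variables before the post-hoc test
sub2 = sub1[['S2BQ2D', 'S1Q12B']].dropna()
mc2 = multi.MultiComparison(sub2['S2BQ2D'], sub2['S1Q12B'])
res2 = mc2.tukeyhsd()
print (res2.summary())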
Data Analysis Course Week 4 Assignment
I created 4 bar charts for my variables.  The first shows the number of service members who reported being depressed in the last 4 weeks.  It is a categorical variable and only takes the values of 1-4.  I removed those individuals who did not report being depressed. 
The chart shows a downward trend in the number of people reporting being depressed in the last 4 weeks: counts fall as the reported level of depression increases.
The second chart is similar to the first, except it shows the remaining records for people who are not service members.
This chart shows the same trend as that of the service members.
The next two charts are bivariate charts.  The first shows the percentage of service members who reported being depressed and also had suicidal ideation.
There is no strong trend in the data, as there are only 4 categories, although the chart does show that half or more of the service members with more severe depression thought about suicide.
The second chart shows the percentage of service members who reported being depressed and sought medical help.
It is positive to note that service members with more severe depression sought medical help at a 50% or higher rate, although these numbers can be misleading because of the low populations at the more severe levels.
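As a rough idea of how a bivariate chart like these can be built, here is a minimal sketch with seaborn; the variable names (4wkDprsd for the recoded depression level, S4AQ4A17 recoded to 0/1 for suicidal ideation) and the service-member subset sub1 are assumptions carried over from my other posts, not the exact code used for the charts.

import seaborn
import matplotlib.pyplot as plt

#bar chart of the proportion reporting suicidal ideation at each depression level
#(assumes sub1 holds the service-member records with the recoded variables)
seaborn.factorplot(x='4wkDprsd', y='S4AQ4A17', data=sub1, kind='bar', ci=None)
plt.xlabel('Depression level in the last 4 weeks (0 = none, 4 = all of the time)')
plt.ylabel('Proportion with suicidal ideation')
plt.show()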
Data Analysis Course Week 3 Assignment
I adjusted my script to make three changes to the data structure:
1.  Replaced unknown values in all data fields of interest with a missing-data identifier (NaN).
2.  Recoded S1Q213 (reports of depression) so that higher levels of reported depression correspond to a higher number.  The original coding ran from 1 (high) to 5 (low); it is now 4 (high) to 0 (low).
3.  Mapped S4AQ16 & S4AQ4A17 from 1 (Yes) and 2 (No) to 1 (Yes) and 0 (No), creating binary variables that, when multiplied together, identify those who sought medical help and had suicidal thoughts.
I would also like to bin my results in the two secondary variables, combining those who sought help and those who didn't, but I was getting a syntax error even though the code mirrored the example.
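Since the script itself isn't shown in this post, here is a minimal sketch of the three changes plus one way the binning could be done with pandas.cut. The column names follow the NESARC codes used elsewhere on this blog; the derived variables (4wkDprsd, DprsdSawDoc, SuicideSawDoc, SoughtHelp) are one plausible reconstruction for illustration, not the exact script.

import pandas
import numpy

data = pandas.read_csv('nesarc_pds.csv', low_memory=False)

#1. replace unknown codes with NaN
data['S1Q213'] = data['S1Q213'].replace(9, numpy.nan)
data['S4AQ16'] = data['S4AQ16'].replace(9, numpy.nan)
data['S4AQ4A17'] = data['S4AQ4A17'].replace(9, numpy.nan)

#2. recode reported depression so higher numbers mean more depression (4 = high, 0 = low)
data['4wkDprsd'] = data['S1Q213'].map({1:4, 2:3, 3:2, 4:1, 5:0})

#3. map Yes/No from 1/2 to 1/0 so the binary flags can be multiplied together
data['S4AQ16'] = data['S4AQ16'].map({1:1, 2:0})
data['S4AQ4A17'] = data['S4AQ4A17'].map({1:1, 2:0})

#combine: depression level for those who saw a doctor / who also had suicidal thoughts
data['DprsdSawDoc'] = data['4wkDprsd'] * data['S4AQ16']
data['SuicideSawDoc'] = data['4wkDprsd'] * data['S4AQ16'] * data['S4AQ4A17']

#binning: two groups, sought help vs did not
data['SoughtHelp'] = pandas.cut(data['S4AQ16'], bins=[-0.5, 0.5, 1.5], labels=['no', 'yes'])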
Frequency Distributions
Counts for 4wkDprsd for all individuals who answered being depressed a little or more in the last 4 weeks
2.000000     49
0.000000    258
1.000000     97
4.000000      8
3.000000     15
nan           4
Name: 4wkDprsd, dtype: int64

% for 4wkDprsd for all individuals who answered being depressed a little or more in the last 4 weeks
2.000000   0.113689
0.000000   0.598608
1.000000   0.225058
4.000000   0.018561
3.000000   0.034803
nan        0.009281
Name: 4wkDprsd, dtype: float64
Discussion: ~60% of the surveyed military populace did not report being depressed at all during the past 4 weeks, with up to ~5% reporting being depressed most or all of the time.
Counts for DprsdSawDoc for all individuals who answered being depressed a little or more in the last 4 weeks and saw a doctor
4.000000     2
3.000000     7
0.000000    25
2.000000     4
1.000000     3
Name: DprsdSawDoc, dtype: int64

% for DprsdSawDoc for all individuals who answered being depressed a little or more in the last 4 weeks and saw a doctor
4.000000   0.048780
3.000000   0.170732
0.000000   0.609756
2.000000   0.097561
1.000000   0.073171
Name: DprsdSawDoc, dtype: float64
Discussion:  ~39% of those who stated they were depressed at least a little sought medical help.  Later I will probably explore the ratio of those who sought help within each level of reported depression.
Counts for SuicideSawDoc for all individuals who answered being depressed a little or more in the last 4 weeks, thought of suicide, and saw a doctor
0.000000    63
1.000000     9
3.000000     6
4.000000     2
2.000000     2
Name: SuicideSawDoc, dtype: int64

% for SuicideSawDoc for all individuals who answered being depressed a little or more in the last 4 weeks, thought of suicide, and saw a doctor
0.000000   0.768293
1.000000   0.109756
3.000000   0.073171
4.000000   0.024390
2.000000   0.024390
Name: SuicideSawDoc, dtype: float64
Discussion:  ~24% of those who stated they thought of suicide sought medical help.  Later I will probably explore the ratio of those who sought help within each level of reported depression for those who thought about suicide.
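A minimal sketch of that follow-up, assuming sub1 is the subset who reported any depression and S4AQ16 is the 0/1 help-seeking flag described above:

#proportion seeking medical help within each level of reported depression
helprate = pandas.crosstab(sub1['4wkDprsd'], sub1['S4AQ16'], normalize='index')
print(helprate)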
Military Service and depression frequency distributions
Code:
"""
Spyder Editor
"""
import pandas
import numpy
data = pandas.read_csv('nesarc_pds.csv', low_memory=False)
#print(len(data)) #number of observations (rows)
#print(len(data.columns)) #number of variables (columns)
# sets strings to uppercase all Dataframe column names
data.columns = map(str.upper, data.columns)

# bug fix for display formats to avoid runtime errors
pandas.set_option('display.float_format', lambda x:'%f'%x)
# Converts data entries to number format, correct python data read in errors
data["S1Q9A"] = data["S1Q9A"].convert_objects(convert_numeric=True) data["S1Q9B"] = data["S1Q9B"].convert_objects(convert_numeric=True) data["S1Q9C"] = data["S1Q9C"].convert_objects(convert_numeric=True) data["S1Q213"] = data["S1Q213"].convert_objects(convert_numeric=True) data["S4AQ1"] = data["S4AQ1"].convert_objects(convert_numeric=True) data["S4AQ2"] = data["S4AQ2"].convert_objects(convert_numeric=True) data["S4AQ4A12"] = data["S4AQ4A12"].convert_objects(convert_numeric=True) data["S4AQ4A17"] = data["S4AQ4A17"].convert_objects(convert_numeric=True) data["S4AQ4A18"] = data["S4AQ4A18"].convert_objects(convert_numeric=True) data["S4AQ51"] = data["S4AQ51"].convert_objects(convert_numeric=True) data["S4AQ16"] = data["S4AQ16"].convert_objects(convert_numeric=True)
# Frequency and normalized counts for variables
print("counts S1Q9A - Business or Industry: Current or most recent job, 14 = Armed Forces")
c1 = data["S1Q9A"].value_counts(sort=False)
print(c1)

print("% for S1Q9A - Business or Industry: Current or most recent job, 14 = Armed Forces")
p1 = data["S1Q9A"].value_counts(sort=False, normalize=True)
print(p1)

print("counts S1Q9B - Occupation: Current or most recent job, 14 = Military")
c2 = data["S1Q9B"].value_counts(sort=False)
print(c2)

print("% S1Q9B - Occupation: Current or most recent job, 14 = Military")
p2 = data["S1Q9B"].value_counts(sort=False, normalize=True)
print(p2)

print("counts S1Q9C - Type of Employer: Current or most recent job, 6 = Armed Forces")
c3 = data["S1Q9C"].value_counts(sort=False)
print (c3)

print("% S1Q9C - Type of Employer: Current or most recent job, 6 = Armed Forces")
p3 = data["S1Q9C"].value_counts(sort=False, normalize=True)
print (p3)

print("counts S1Q213 - During past 4 weeks, how often felt downhearted or depressed, 1 - 3 = All to Some of the time")
c4 = data["S1Q213"].value_counts(sort=False)
print (c4)

print("% S1Q213 - During past 4 weeks, how often felt downhearted or depressed, 1 - 3 = All to Some of the time")
p4 = data["S1Q213"].value_counts(sort=False, normalize=True)
print (p4)

print("counts S4AQ1 - Ever had 2-week period felt sad, blue, depressed or down most of the time, 1 = Yes")
c5 = data["S4AQ1"].value_counts(sort=False)
print (c5)

print("% S4AQ1 - Ever had 2-week period felt sad, blue, depressed or down most of the time, 1 = Yes")
p5 = data["S4AQ1"].value_counts(sort=False, normalize=True)
print (p5)

print("counts S4AQ2 - Ever had 2-week period when didn't care about things, 1 = Yes")
c6 = data["S4AQ2"].value_counts(sort=False)
print (c6)

print("% S4AQ2 - Ever had 2-week period when didn't care about things, 1 = Yes")
p6 = data["S4AQ2"].value_counts(sort=False, normalize=True)
print (p6)

print("counts S4AQ4A12 - Felt worthless most of the time for 2+ weeks, 1 = Yes")
c7 = data["S4AQ4A12"].value_counts(sort=False)
print (c7)

print("% S4AQ4A12 - Felt worthless most of the time for 2+ weeks, 1 = Yes")
p7 = data["S4AQ4A12"].value_counts(sort=False, normalize=True)
print (p7)

print("counts S4AQ4A17 - Thought about committing suicide, 1 = Yes")
c8 = data["S4AQ4A17"].value_counts(sort=False)
print (c8)

print("% S4AQ4A17 - Thought about committing suicide, 1 = Yes")
p8 = data["S4AQ4A17"].value_counts(sort=False, normalize=True)
print (p8)

print("counts S4AQ4A18 - Felt like wanted to die, 1 = Yes")
c9 = data["S4AQ4A18"].value_counts(sort=False)
print (c9)

print("% S4AQ4A18 - Felt like wanted to die, 1 = Yes")
p9 = data["S4AQ4A18"].value_counts(sort=False, normalize=True)
print (p9)

print("counts S4AQ51 - Felt uncomfortable or upset by low mood, 1 = Yes")
c10 = data["S4AQ51"].value_counts(sort=False)
print (c10)

print("% S4AQ51 - Felt uncomfortable or upset by low mood, 1 = Yes")
p10 = data["S4AQ51"].value_counts(sort=False, normalize=True)
print (p10)

print("counts S4AQ16 - Went to counselor/therapist/doctor to help mood, 1 = Yes")
c11 = data["S4AQ16"].value_counts(sort=False)
print (c11)

print("% S4AQ16 - Went to counselor/therapist/doctor to help mood, 1 = Yes")
p11 = data["S4AQ16"].value_counts(sort=False, normalize=True)
print (p11)
OUTPUTS
counts S1Q213 - During past 4 weeks, how often felt downhearted or depressed, 1 - 3 = All to Some of the time
1      907
2     2051
3     6286
4    11305
5    22127
9      417
Name: S1Q213, dtype: int64

% S1Q213 - During past 4 weeks, how often felt downhearted or depressed, 1 - 3 = All to Some of the time
1   0.021048
2   0.047595
3   0.145871
4   0.262340
5   0.513471
9   0.009677
Name: S1Q213, dtype: float64

counts S4AQ1 - Ever had 2-week period felt sad, blue, depressed or down most of the time, 1 = Yes
1    12785
2    29416
9      892
Name: S4AQ1, dtype: int64

% S4AQ1 - Ever had 2-week period felt sad, blue, depressed or down most of the time, 1 = Yes
1   0.296684
2   0.682617
9   0.020699
Name: S4AQ1, dtype: float64

counts S4AQ2 - Ever had 2-week period when didn't care about things, 1 = Yes
1    10533
2    31618
9      942
Name: S4AQ2, dtype: int64

% S4AQ2 - Ever had 2-week period when didn't care about things, 1 = Yes
1   0.244425
2   0.733715
9   0.021860
Name: S4AQ2, dtype: float64
The output shows the counts and normalized data for 3 questions from the NESARC data set.  The final value (9) in each output set marks missing data.  For S1Q213, the scale is a Likert scale, with 1-3 being the values of interest.  S4AQ1 and S4AQ2 are yes/no questions.  There is not much to say about the distribution of the data until it is charted.
Week 1 Assignment
Data Set:  National Epidemiologic Survey on Alcohol and Related Conditions (NESARC)
Topic of Interest:  What is the association between military service and depression in comparison to the general population?
Secondary Topic:  Does the association change based upon the age of the service member?
The second topic's variables are the same as the first topic's.
Literature Search:
                 Terms:  Military service and depression (2003-2010)
The literature search revealed that most efforts have tried to measure the impact of Global War on Terror (OIF & OEF) deployments on service member mental health.  There were no direct comparisons of military service to the general US population.  The closest parallel is a 21-year study started in 2003 to understand the impact of service on mental health (Millennium Cohort: enrollment begins a 21-year contribution to understanding the impact of military service - https://doi.org/10.1016/j.jclinepi.2006.05.009).
Hypothesis:  There is no significant difference in depression rates between service members and the general population, as the services are a reflection of the population due to the all-volunteer force.  The lack of association may not hold across ages.