phillust-blog
Data Analysis Course
8 posts
Data Analysis Tools Week 4 Assignment
I'm looking into whether the region someone lives in moderates the association between depression and thoughts of suicide.  The output shows that the association between depression and suicidal ideation is significant in every region (all p-values are significant), so region acts as a significant moderator of that relationship.
Code:
# -*- coding: utf-8 -*-
"""
@author: Phil Lust
"""

import pandas
import numpy
import scipy.stats
import seaborn
import matplotlib.pyplot as plt
import statsmodels.formula.api as smf
import statsmodels.stats.multicomp as multi

#Data read in
data = pandas.read_csv('nesarc_pds.csv', low_memory=False)

#set pandas to show all columns in dataframe
pandas.set_option('display.max_columns', None)

#set pandas to show all rows in dataframe
pandas.set_option('display.max_rows', None)

# sets strings to uppercase in all Dataframe column names
data.columns = map(str.upper, data.columns)

# bug fix for display formats to avoid runtime errors
pandas.set_option('display.float_format', lambda x:'%f'%x)

#set variables of interest to numeric
data['S4AQ4A17'] = pandas.to_numeric(data['S4AQ4A17'], errors='coerce')
data['S1Q213'] = pandas.to_numeric(data['S1Q213'], errors='coerce')
data['REGION'] = pandas.to_numeric(data['REGION'], errors='coerce')

#Replacing unknown values in the dataframe
data['S4AQ4A17'] = data['S4AQ4A17'].replace(9, numpy.nan)
data['S1Q213'] = data['S1Q213'].replace(9, numpy.nan)

#Recode of values for logical consistency
recode1 = {1:1, 2:0}
data['S4AQ4A17'] = data['S4AQ4A17'].map(recode1)
recode2 = {1:4, 2:3, 3:2, 4:1, 5:0}
data['S1Q213'] = data['S1Q213'].map(recode2)

#data subsets for moderator exploration of Region
sub1 = data[(data['REGION']==1)]
sub2 = data[(data['REGION']==2)]
sub3 = data[(data['REGION']==3)]
sub4 = data[(data['REGION']==4)]
#crosstab table of observed counts
ct1 = pandas.crosstab(sub1['S4AQ4A17'], sub1['S1Q213'])
print (ct1)

#column percentages
colsum1 = ct1.sum(axis=0)
colpct1 = ct1/colsum1
print (colpct1)

#chi-square on Northeast moderator
print('association between depression and suicidal thoughts in the Northeast')
cs1 = scipy.stats.chi2_contingency(ct1)
print(cs1)

#crosstab table of observed counts
ct2 = pandas.crosstab(sub2['S4AQ4A17'], sub2['S1Q213'])
print (ct2)

#column percentages
colsum2 = ct2.sum(axis=0)
colpct2 = ct2/colsum2
print (colpct2)

#chi-square on Midwest moderator
print('association between depression and suicidal thoughts in the Midwest')
cs2 = scipy.stats.chi2_contingency(ct2)
print(cs2)

#crosstab table of observed counts
ct3 = pandas.crosstab(sub3['S4AQ4A17'], sub3['S1Q213'])
print (ct3)

#column percentages
colsum3 = ct3.sum(axis=0)
colpct3 = ct3/colsum3
print (colpct3)

#chi-square on South moderator
print('association between depression and suicidal thoughts in the South')
cs3 = scipy.stats.chi2_contingency(ct3)
print(cs3)

#crosstab table of observed counts
ct4 = pandas.crosstab(sub4['S4AQ4A17'], sub4['S1Q213'])
print (ct4)

#column percentages
colsum4 = ct4.sum(axis=0)
colpct4 = ct4/colsum4
print (colpct4)

#chi-square on West moderator
print('association between depression and suicidal thoughts in the West')
cs4 = scipy.stats.chi2_contingency(ct4)
print(cs4)
Output:
S1Q213    0.000000  1.000000  2.000000  3.000000  4.000000
S4AQ4A17
0.000000       672       616       448       152        53
1.000000       131       236       179       104        49

S1Q213    0.000000  1.000000  2.000000  3.000000  4.000000
S4AQ4A17
0.000000  0.836862  0.723005  0.714514  0.593750  0.519608
1.000000  0.163138  0.276995  0.285486  0.406250  0.480392

association between depression and suicidal thoughts in the Northeast
(95.327124345434868, 9.7091644408750795e-20, 4, array([[ 590.3875    ,  626.41363636,  460.9875    ,  188.21818182,   74.99318182],
       [ 212.6125    ,  225.58636364,  166.0125    ,   67.78181818,   27.00681818]]))

S1Q213    0.000000  1.000000  2.000000  3.000000  4.000000
S4AQ4A17
0.000000       852       759       426       181        59
1.000000       160       277       216       105        43

S1Q213    0.000000  1.000000  2.000000  3.000000  4.000000
S4AQ4A17
0.000000  0.841897  0.732625  0.663551  0.632867  0.578431
1.000000  0.158103  0.267375  0.336449  0.367133  0.421569

association between depression and suicidal thoughts in the Midwest
(105.24586010821909, 7.507621975326203e-22, 4, array([[ 748.64327485,  766.39766082,  474.92982456,  211.57309942,   75.45614035],
       [ 263.35672515,  269.60233918,  167.07017544,   74.42690058,   26.54385965]]))

S1Q213    0.000000  1.000000  2.000000  3.000000  4.000000
S4AQ4A17
0.000000      1358      1062       796       341       155
1.000000       233       330       298       188       115

S1Q213    0.000000  1.000000  2.000000  3.000000  4.000000
S4AQ4A17
0.000000  0.853551  0.762931  0.727605  0.644612  0.574074
1.000000  0.146449  0.237069  0.272395  0.355388  0.425926

association between depression and suicidal thoughts in the South
(173.07239474939345, 2.2908395479186054e-36, 4, array([[ 1211.19606235,  1059.70139459,   832.84003281,   402.71698113,   205.54552912],
       [  379.80393765,   332.29860541,   261.15996719,   126.28301887,    64.45447088]]))

S1Q213    0.000000  1.000000  2.000000  3.000000  4.000000
S4AQ4A17
0.000000       734       745       458       165        49
1.000000       194       315       225       115        48

S1Q213    0.000000  1.000000  2.000000  3.000000  4.000000
S4AQ4A17
0.000000  0.790948  0.702830  0.670571  0.589286  0.505155
1.000000  0.209052  0.297170  0.329429  0.410714  0.494845

association between depression and suicidal thoughts in the West
(73.628402483107678, 3.8855318766982616e-15, 4, array([[ 654.8976378 ,  748.0511811 ,  481.99901575,  197.5984252 ,   68.45374016],
       [ 273.1023622 ,  311.9488189 ,  201.00098425,   82.4015748 ,   28.54625984]]))
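The four region-specific blocks above could also be written as a single loop, which avoids the copy-pasted sub1 through sub4 code. A minimal sketch of the same chi-square check, assuming the cleaned dataframe from the code above and the standard Census coding for REGION (1 = Northeast, 2 = Midwest, 3 = South, 4 = West):

# same moderation check written as a loop over the REGION levels
region_names = {1: 'Northeast', 2: 'Midwest', 3: 'South', 4: 'West'}
for region_code, name in region_names.items():
    sub = data[data['REGION'] == region_code]
    ct = pandas.crosstab(sub['S4AQ4A17'], sub['S1Q213'])
    chi2, p, dof, expected = scipy.stats.chi2_contingency(ct)
    print('association between depression and suicidal thoughts in the ' + name)
    print('chi-square = %.2f, p = %.2e, dof = %d' % (chi2, p, dof))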
Data Analysis Tools Week 3 Assignment
I made a small adjustment this week to allow for calculation of a correlation coefficient, since my selected data elements contain only categorical data.  I looked into the relationship between the quantity of alcoholic beverages consumed and reported depression.  The correlation coefficient is .087 with a p-value of .055.
Since the p-value is not below .05, we fail to reject the null hypothesis.
Output:
association between alcohol use and depression
(0.08677304304640121, 0.055418860920723026)
Code Follows:
# -*- coding: utf-8 -*-
"""
Created on Fri Oct 27 19:01:29 2017
@author: Phil Lust
"""
import pandas
import numpy
import scipy.stats
import seaborn
import matplotlib.pyplot as plt
import statsmodels.formula.api as smf
import statsmodels.stats.multicomp as multi

data = pandas.read_csv('nesarc_pds.csv', low_memory=False)

#set pandas to show all columns in dataframe
pandas.set_option('display.max_columns', None)

#set pandas to show all rows in dataframe
pandas.set_option('display.max_rows', None)

# sets strings to uppercase in all Dataframe column names
data.columns = map(str.upper, data.columns)

# bug fix for display formats to avoid runtime errors
pandas.set_option('display.float_format', lambda x:'%f'%x)

# Converts data entries to number format, correct python data read in errors
data["S2AQ22"] = pandas.to_numeric(data["S2AQ22"], errors='coerce')
data["S1Q213"] = pandas.to_numeric(data["S1Q213"], errors='coerce')

#Replacing unknown values in the dataframe
data['S2AQ22'] = data['S2AQ22'].replace(99, numpy.nan)
data['S1Q213'] = data['S1Q213'].replace(9, numpy.nan)

#recode alcohol use from low to high
recode1 = {11:0, 1:1, 2:2, 3:3, 4:4, 5:5, 6:6, 7:7, 8:8, 9:9, 10:10}
data['S2AQ22'] = data['S2AQ22'].map(recode1)

#recode depression from low to high
recode2 = {5:0, 1:1, 2:2, 3:3, 4:4}
data['S1Q213'] = data['S1Q213'].map(recode2)

#drop blank and nan entries to enable correlation calculation
data_clean = data.dropna()

#Correlation calculation command
print ('association between alcohol use and depression')
print (scipy.stats.pearsonr(data_clean['S2AQ22'], data_clean['S1Q213']))
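To put the size of that correlation in context, the coefficient can be squared to get the proportion of shared variance. A minimal sketch, reusing data_clean from the code above:

# coefficient of determination (r-squared) from the Pearson correlation
r, p = scipy.stats.pearsonr(data_clean['S2AQ22'], data_clean['S1Q213'])
print('r = %.3f, p = %.3f, r-squared = %.4f' % (r, p, r**2))

With r of about .087, r-squared is below .01, so even a significant association would explain less than 1% of the variance in reported depression.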
Data Analysis Tools Week 2 Assignment
I'm looking at the relationship between suicidal thoughts [S4AQ4A17] and seeking medical help [S4AQ16] from the NESARC data set; both are two-level categorical variables.  The chi-squared test yielded a p-value of 2.43 x 10^-93, effectively 0.  This means that we reject the null hypothesis that there is no relationship between suicidal ideation and seeking help, and accept the alternative that there is a relationship.  Since both variables have only two levels, there is no need for a post-hoc test to determine pairwise relationships.
Output from code:
S4AQ4A17  1.000000  2.000000
S4AQ16
1.000000      2135      2401
2.000000      1098      3138

S4AQ4A17  1.000000  2.000000
S4AQ16
1.000000  0.660377  0.433472
2.000000  0.339623  0.566528

chi-square value, p value, expected counts
(420.00948971287778, 2.4287382795898561e-93, 1, array([[ 1671.78385773,  2864.21614227],
       [ 1561.21614227,  2674.78385773]]))
The code follows:
import pandas
import numpy
import scipy.stats
import seaborn
import matplotlib.pyplot as plt
import statsmodels.formula.api as smf
import statsmodels.stats.multicomp as multi

data = pandas.read_csv('nesarc_pds.csv', low_memory=False)

#set pandas to show all columns in dataframe
pandas.set_option('display.max_columns', None)

#set pandas to show all rows in dataframe
pandas.set_option('display.max_rows', None)

# sets strings to uppercase in all Dataframe column names
data.columns = map(str.upper, data.columns)

# bug fix for display formats to avoid runtime errors
pandas.set_option('display.float_format', lambda x:'%f'%x)

# Converts data entries to number format, correct python data read in errors
data["S4AQ16"] = pandas.to_numeric(data["S4AQ16"], errors='coerce')
data["S4AQ4A17"] = pandas.to_numeric(data["S4AQ4A17"], errors='coerce')

#Replacing unknown values in the dataframe
data['S4AQ16'] = data['S4AQ16'].replace(9, numpy.nan)
data['S4AQ4A17'] = data['S4AQ4A17'].replace(9, numpy.nan)

#contingency table of observed results
ct1 = pandas.crosstab(data['S4AQ16'], data['S4AQ4A17'])
print (ct1)

#column percentages
colsum = ct1.sum(axis=0)
colpct = ct1/colsum
print(colpct)

#Chi-square
print('chi-square value, p value, expected counts')
cs1 = scipy.stats.chi2_contingency(ct1)
print (cs1)
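For readability, the tuple printed above can also be unpacked into named values. A minimal sketch using the same ct1 table:

# unpack the chi2_contingency result into named parts
chi2, p, dof, expected = scipy.stats.chi2_contingency(ct1)
print('chi-square = %.2f, p = %.2e, degrees of freedom = %d' % (chi2, p, dof))
print(expected)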
Data Analysis Tools Week 1 Assignment
My study question does not have any quantitative variables, so for this assignment I ran an ANOVA on age of alcohol dependence (quantitative response) and household income category (categorical explanatory variable).
The ANOVA showed a relationship between age of alcohol dependence and household income (p=0.0146).  However, the Tukey test showed no differences in any pairwise comparison.
The program I used is as follows:
import pandas
import numpy
import seaborn
import matplotlib.pyplot as plt
import statsmodels.formula.api as smf
import statsmodels.stats.multicomp as multi
data = pandas.read_csv('nesarc_pds.csv', low_memory=False)
#set pandas to show all columns in dataframe
pandas.set_option('display.max_columns', None)

#set pandas to show all rows in dataframe
pandas.set_option('display.max_rows', None)

# sets strings to uppercase in all Dataframe column names
data.columns = map(str.upper, data.columns)

# bug fix for display formats to avoid runtime errors
pandas.set_option('display.float_format', lambda x:'%f'%x)
# Converts data entries to number format, correct python data read in errors
data["S1Q12B"] = pandas.to_numeric(data["S1Q12B"], errors='coerce')
data["S2BQ2D"] = pandas.to_numeric(data["S2BQ2D"], errors='coerce')
data["S2BQ2FR"] = pandas.to_numeric(data["S2BQ2FR"], errors='coerce')
data["ALCABDEPP12DX"] = pandas.to_numeric(data["ALCABDEPP12DX"], errors='coerce')
#Replacing unknown values in the dataframe
data['S2BQ2D'] = data['S2BQ2D'].replace(99, numpy.nan)
data['S2BQ2FR'] = data['S2BQ2FR'].replace(999, numpy.nan)

#subset of data to only include those who have had alcohol dependence in the
#last 12 months
sub1 = data[(data['ALCABDEPP12DX']!=0)]

model1 = smf.ols(formula='S2BQ2D ~ C(S1Q12B)', data=sub1)
results1 = model1.fit()
print (results1.summary())

mc1 = multi.MultiComparison(sub1['S2BQ2D'], sub1['S1Q12B'])
res1 = mc1.tukeyhsd()
print (res1.summary())
The output is:
                            OLS Regression Results
==============================================================================
Dep. Variable:                 S2BQ2D   R-squared:                       0.008
Model:                            OLS   Adj. R-squared:                  0.003
Method:                 Least Squares   F-statistic:                     1.813
Date:                Thu, 02 Nov 2017   Prob (F-statistic):             0.0146
Time:                        18:22:33   Log-Likelihood:                -16996.
No. Observations:                4671   AIC:                         3.403e+04
Df Residuals:                    4650   BIC:                         3.417e+04
Df Model:                          20
Covariance Type:            nonrobust
===================================================================================
                      coef    std err          t      P>|t|      [0.025      0.975]
-----------------------------------------------------------------------------------
Intercept          24.9000      0.780     31.937      0.000      23.372      26.428
C(S1Q12B)[T.2]      1.8708      1.025      1.825      0.068      -0.139       3.881
C(S1Q12B)[T.3]      1.5815      1.113      1.421      0.155      -0.600       3.763
C(S1Q12B)[T.4]      0.4700      1.017      0.462      0.644      -1.523       2.463
C(S1Q12B)[T.5]     -1.0440      1.135     -0.920      0.358      -3.270       1.182
C(S1Q12B)[T.6]      0.0695      0.947      0.073      0.941      -1.787       1.926
C(S1Q12B)[T.7]      0.2514      0.936      0.269      0.788      -1.584       2.087
C(S1Q12B)[T.8]      0.2472      0.932      0.265      0.791      -1.580       2.075
C(S1Q12B)[T.9]     -0.9623      0.916     -1.051      0.293      -2.758       0.833
C(S1Q12B)[T.10]    -0.8889      0.960     -0.926      0.355      -2.771       0.993
C(S1Q12B)[T.11]    -0.3715      0.891     -0.417      0.677      -2.119       1.376
C(S1Q12B)[T.12]    -1.0150      0.890     -1.140      0.254      -2.760       0.730
C(S1Q12B)[T.13]    -1.5657      0.930     -1.684      0.092      -3.388       0.257
C(S1Q12B)[T.14]    -0.4894      0.965     -0.507      0.612      -2.381       1.403
C(S1Q12B)[T.15]    -1.1780      1.011     -1.165      0.244      -3.161       0.805
C(S1Q12B)[T.16]    -0.7991      1.178     -0.678      0.498      -3.109       1.511
C(S1Q12B)[T.17]    -0.0615      1.124     -0.055      0.956      -2.264       2.141
C(S1Q12B)[T.18]    -0.4493      1.344     -0.334      0.738      -3.084       2.186
C(S1Q12B)[T.19]    -0.6170      1.188     -0.519      0.603      -2.945       1.712
C(S1Q12B)[T.20]    -0.1169      1.278     -0.091      0.927      -2.622       2.389
C(S1Q12B)[T.21]     0.3235      1.268      0.255      0.799      -2.163       2.810
==============================================================================
Omnibus:                     1472.119   Durbin-Watson:                   1.987
Prob(Omnibus):                  0.000   Jarque-Bera (JB):             4157.875
Skew:                           1.670   Prob(JB):                         0.00
Kurtosis:                       6.195   Cond. No.                         27.8
==============================================================================
Tukey Test:
Multiple Comparison of Means - Tukey HSD,FWER=0.05 ========================================= group1 group2 meandiff lower upper reject -----------------------------------------   1      2      nan     nan   nan  False   1      3      nan     nan   nan  False   1      4      nan     nan   nan  False   1      5      nan     nan   nan  False   1      6      nan     nan   nan  False   1      7      nan     nan   nan  False   1      8      nan     nan   nan  False   1      9      nan     nan   nan  False   1      10     nan     nan   nan  False   1      11     nan     nan   nan  False   1      12     nan     nan   nan  False   1      13     nan     nan   nan  False   1      14     nan     nan   nan  False   1      15     nan     nan   nan  False   1      16     nan     nan   nan  False   1      17     nan     nan   nan  False   1      18     nan     nan   nan  False   1      19     nan     nan   nan  False   1      20     nan     nan   nan  False   1      21     nan     nan   nan  False   2      3      nan     nan   nan  False   2      4      nan     nan   nan  False   2      5      nan     nan   nan  False   2      6      nan     nan   nan  False   2      7      nan     nan   nan  False   2      8      nan     nan   nan  False   2      9      nan     nan   nan  False   2      10     nan     nan   nan  False   2      11     nan     nan   nan  False   2      12     nan     nan   nan  False   2      13     nan     nan   nan  False   2      14     nan     nan   nan  False   2      15     nan     nan   nan  False   2      16     nan     nan   nan  False   2      17     nan     nan   nan  False   2      18     nan     nan   nan  False   2      19     nan     nan   nan  False   2      20     nan     nan   nan  False   2      21     nan     nan   nan  False   3      4      nan     nan   nan  False   3      5      nan     nan   nan  False   3      6      nan     nan   nan  False   3      7      nan     nan   nan  False   3      8      nan     nan   nan  False   3      9      nan     nan   nan  False   3      10     nan     nan   nan  False   3      11     nan     nan   nan  False   3      12     nan     nan   nan  False   3      13     nan     nan   nan  False   3      14     nan     nan   nan  False   3      15     nan     nan   nan  False   3      16     nan     nan   nan  False   3      17     nan     nan   nan  False   3      18     nan     nan   nan  False   3      19     nan     nan   nan  False   3      20     nan     nan   nan  False   3      21     nan     nan   nan  False   4      5      nan     nan   nan  False   4      6      nan     nan   nan  False   4      7      nan     nan   nan  False   4      8      nan     nan   nan  False   4      9      nan     nan   nan  False   4      10     nan     nan   nan  False   4      11     nan     nan   nan  False   4      12     nan     nan   nan  False   4      13     nan     nan   nan  False   4      14     nan     nan   nan  False   4      15     nan     nan   nan  False   4      16     nan     nan   nan  False   4      17     nan     nan   nan  False   4      18     nan     nan   nan  False   4      19     nan     nan   nan  False   4      20     nan     nan   nan  False   4      21     nan     nan   nan  False   5      6      nan     nan   nan  False   5      7      nan     nan   nan  False   5      8      nan     nan   nan  False   5      9      nan     nan   nan  False   5      10     nan     nan   nan  False   5      11     nan     nan   nan  False   5      12     nan     nan   nan  False   5      13     nan     nan   nan  False   5      14     
nan     nan   nan  False   5      15     nan     nan   nan  False   5      16     nan     nan   nan  False   5      17     nan     nan   nan  False   5      18     nan     nan   nan  False   5      19     nan     nan   nan  False   5      20     nan     nan   nan  False   5      21     nan     nan   nan  False   6      7      nan     nan   nan  False   6      8      nan     nan   nan  False   6      9      nan     nan   nan  False   6      10     nan     nan   nan  False   6      11     nan     nan   nan  False   6      12     nan     nan   nan  False   6      13     nan     nan   nan  False   6      14     nan     nan   nan  False   6      15     nan     nan   nan  False   6      16     nan     nan   nan  False   6      17     nan     nan   nan  False   6      18     nan     nan   nan  False   6      19     nan     nan   nan  False   6      20     nan     nan   nan  False   6      21     nan     nan   nan  False   7      8      nan     nan   nan  False   7      9      nan     nan   nan  False   7      10     nan     nan   nan  False   7      11     nan     nan   nan  False   7      12     nan     nan   nan  False   7      13     nan     nan   nan  False   7      14     nan     nan   nan  False   7      15     nan     nan   nan  False   7      16     nan     nan   nan  False   7      17     nan     nan   nan  False   7      18     nan     nan   nan  False   7      19     nan     nan   nan  False   7      20     nan     nan   nan  False   7      21     nan     nan   nan  False   8      9      nan     nan   nan  False   8      10     nan     nan   nan  False   8      11     nan     nan   nan  False   8      12     nan     nan   nan  False   8      13     nan     nan   nan  False   8      14     nan     nan   nan  False   8      15     nan     nan   nan  False   8      16     nan     nan   nan  False   8      17     nan     nan   nan  False   8      18     nan     nan   nan  False   8      19     nan     nan   nan  False   8      20     nan     nan   nan  False   8      21     nan     nan   nan  False   9      10     nan     nan   nan  False   9      11     nan     nan   nan  False   9      12     nan     nan   nan  False   9      13     nan     nan   nan  False   9      14     nan     nan   nan  False   9      15     nan     nan   nan  False   9      16     nan     nan   nan  False   9      17     nan     nan   nan  False   9      18     nan     nan   nan  False   9      19     nan     nan   nan  False   9      20     nan     nan   nan  False   9      21     nan     nan   nan  False   10     11     nan     nan   nan  False   10     12     nan     nan   nan  False   10     13     nan     nan   nan  False   10     14     nan     nan   nan  False   10     15     nan     nan   nan  False   10     16     nan     nan   nan  False   10     17     nan     nan   nan  False   10     18     nan     nan   nan  False   10     19     nan     nan   nan  False   10     20     nan     nan   nan  False   10     21     nan     nan   nan  False   11     12     nan     nan   nan  False   11     13     nan     nan   nan  False   11     14     nan     nan   nan  False   11     15     nan     nan   nan  False   11     16     nan     nan   nan  False   11     17     nan     nan   nan  False   11     18     nan     nan   nan  False   11     19     nan     nan   nan  False   11     20     nan     nan   nan  False   11     21     nan     nan   nan  False   12     13     nan     nan   nan  False   12     14     nan     nan   nan  False   12     15     nan     nan   nan  False   12     16     nan     nan   nan  False   
12     17     nan     nan   nan  False   12     18     nan     nan   nan  False   12     19     nan     nan   nan  False   12     20     nan     nan   nan  False   12     21     nan     nan   nan  False   13     14     nan     nan   nan  False   13     15     nan     nan   nan  False   13     16     nan     nan   nan  False   13     17     nan     nan   nan  False   13     18     nan     nan   nan  False   13     19     nan     nan   nan  False   13     20     nan     nan   nan  False   13     21     nan     nan   nan  False   14     15     nan     nan   nan  False   14     16     nan     nan   nan  False   14     17     nan     nan   nan  False   14     18     nan     nan   nan  False   14     19     nan     nan   nan  False   14     20     nan     nan   nan  False   14     21     nan     nan   nan  False   15     16     nan     nan   nan  False   15     17     nan     nan   nan  False   15     18     nan     nan   nan  False   15     19     nan     nan   nan  False   15     20     nan     nan   nan  False   15     21     nan     nan   nan  False   16     17     nan     nan   nan  False   16     18     nan     nan   nan  False   16     19     nan     nan   nan  False   16     20     nan     nan   nan  False   16     21     nan     nan   nan  False   17     18     nan     nan   nan  False   17     19     nan     nan   nan  False   17     20     nan     nan   nan  False   17     21     nan     nan   nan  False   18     19     nan     nan   nan  False   18     20     nan     nan   nan  False   18     21     nan     nan   nan  False   19     20     nan     nan   nan  False   19     21     nan     nan   nan  False   20     21     nan     nan   nan  False -----------------------------------------
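The all-nan rows above are most likely a side effect of missing values still present in S2BQ2D within sub1 (the OLS routine drops them, but MultiComparison does not). A minimal sketch of one way to check that, dropping the missing rows before running the post-hoc test; the variable names follow the code above:

#drop rows with missing values in the two variables before the post-hoc test
sub2 = sub1[['S2BQ2D', 'S1Q12B']].dropna()
mc2 = multi.MultiComparison(sub2['S2BQ2D'], sub2['S1Q12B'])
res2 = mc2.tukeyhsd()
print (res2.summary())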
Data Analysis Course Week 4 Assignment
I created 4 bar charts for my variables.  The first shows the number of service members who reported being depressed in the last 4 weeks.  It is a categorical variable and only takes the values of 1-4.  I removed those individuals who did not report being depressed. 
The chart shows a downward trend in the number of people reporting being depressed in the last 4 weeks: counts fall as the reported level of depression increases.
The second chart is similar to the first, except it shows the remaining records for people who are not service members.
This chart shows the same trend as that of the service members.
The next two charts are bivariate charts.  The first shows the percentage of service members who reported being depressed and also had suicidal ideation.
There is no strong trend in the data, as there are only 4 categories, although the chart does show that half or more of the service members with more severe depression thought about suicide.
The second chart shows the percentage of service members who reported being depressed and sought medical help.
It is positive to note that service members with more severe depression sought medical help at a 50% or higher rate, although these numbers can be misleading because of the low populations at the more severe levels.
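As a rough idea of how a bivariate chart like these can be built, here is a minimal sketch with seaborn; the variable names (4wkDprsd for the recoded depression level, S4AQ4A17 recoded to 0/1 for suicidal ideation) and the service-member subset sub1 are assumptions carried over from my other posts, not the exact code used for the charts.

import seaborn
import matplotlib.pyplot as plt

#bar chart of the proportion reporting suicidal ideation at each depression level
#(assumes sub1 holds the service-member records with the recoded variables)
seaborn.factorplot(x='4wkDprsd', y='S4AQ4A17', data=sub1, kind='bar', ci=None)
plt.xlabel('Depression level in the last 4 weeks (0 = none, 4 = all of the time)')
plt.ylabel('Proportion with suicidal ideation')
plt.show()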
Data Analysis Course Week 3 Assignment
I adjusted my script to make three changes to the data structure:
1.  Replaced unknown values in all data fields of interest with a missing-data identifier (NaN).
2.  Recoded S1Q213 (reports of depression) so that higher levels of reported depression correspond to a higher number.  The original coding ran from 1 (high) to 5 (low); it is now 4 (high) to 0 (low).
3.  Mapped S4AQ16 & S4AQ4A17 from 1 (Yes) and 2 (No) to 1 (Yes) and 0 (No), creating binary variables that, when multiplied together, identify those who sought medical help and had suicidal thoughts.
I would also like to bin my results in the two secondary variables, combining those who sought help and those who didn't, but I was getting a syntax error even though the code mirrored the example.
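Since the script itself isn't shown in this post, here is a minimal sketch of the three changes plus one way the binning could be done with pandas.cut. The column names follow the NESARC codes used elsewhere on this blog; the derived variables (4wkDprsd, DprsdSawDoc, SuicideSawDoc, SoughtHelp) are one plausible reconstruction for illustration, not the exact script.

import pandas
import numpy

data = pandas.read_csv('nesarc_pds.csv', low_memory=False)

#1. replace unknown codes with NaN
data['S1Q213'] = data['S1Q213'].replace(9, numpy.nan)
data['S4AQ16'] = data['S4AQ16'].replace(9, numpy.nan)
data['S4AQ4A17'] = data['S4AQ4A17'].replace(9, numpy.nan)

#2. recode reported depression so higher numbers mean more depression (4 = high, 0 = low)
data['4wkDprsd'] = data['S1Q213'].map({1:4, 2:3, 3:2, 4:1, 5:0})

#3. map Yes/No from 1/2 to 1/0 so the binary flags can be multiplied together
data['S4AQ16'] = data['S4AQ16'].map({1:1, 2:0})
data['S4AQ4A17'] = data['S4AQ4A17'].map({1:1, 2:0})

#combine: depression level for those who saw a doctor / who also had suicidal thoughts
data['DprsdSawDoc'] = data['4wkDprsd'] * data['S4AQ16']
data['SuicideSawDoc'] = data['4wkDprsd'] * data['S4AQ16'] * data['S4AQ4A17']

#binning: two groups, sought help vs did not
data['SoughtHelp'] = pandas.cut(data['S4AQ16'], bins=[-0.5, 0.5, 1.5], labels=['no', 'yes'])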
Frequency Distributions
Counts for 4wkDprsd for all individuals who answered being depressed a little or more in the last 4 weeks
2.000000     49
0.000000    258
1.000000     97
4.000000      8
3.000000     15
nan           4
Name: 4wkDprsd, dtype: int64

% for 4wkDprsd for all individuals who answered being depressed a little or more in the last 4 weeks
2.000000   0.113689
0.000000   0.598608
1.000000   0.225058
4.000000   0.018561
3.000000   0.034803
nan        0.009281
Name: 4wkDprsd, dtype: float64
Discussion: ~60% of the surveyed military populace did not report being depressed at all during the past 4 weeks, with up to ~5% reporting being depressed most or all of the time.
Counts for DprsdSawDoc for all individuals who answered being depressed a little or more in the last 4 weeks and saw a doctor
4.000000     2
3.000000     7
0.000000    25
2.000000     4
1.000000     3
Name: DprsdSawDoc, dtype: int64

% for DprsdSawDoc for all individuals who answered being depressed a little or more in the last 4 weeks and saw a doctor
4.000000   0.048780
3.000000   0.170732
0.000000   0.609756
2.000000   0.097561
1.000000   0.073171
Name: DprsdSawDoc, dtype: float64
Discussion:  ~39% of those who stated they were depressed at least a little sought medical help.  Later I will probably explore the ratio of those who sought help within each level of reported depression.
Counts for SuicideSawDoc for all individuals who answered being depressed a little or more in the last 4 weeks, thought of suicide, and saw a doctor
0.000000    63
1.000000     9
3.000000     6
4.000000     2
2.000000     2
Name: SuicideSawDoc, dtype: int64

% for SuicideSawDoc for all individuals who answered being depressed a little or more in the last 4 weeks, thought of suicide, and saw a doctor
0.000000   0.768293
1.000000   0.109756
3.000000   0.073171
4.000000   0.024390
2.000000   0.024390
Name: SuicideSawDoc, dtype: float64
Discussion:  ~24% of those who stated they thought of suicide sought medical help.  Later I will probably explore the ratio of those who sought help within each level of reported depression for those who thought about suicide.
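A minimal sketch of that follow-up, assuming sub1 is the subset who reported any depression and S4AQ16 is the 0/1 help-seeking flag described above:

#proportion seeking medical help within each level of reported depression
helprate = pandas.crosstab(sub1['4wkDprsd'], sub1['S4AQ16'], normalize='index')
print(helprate)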
Military Service and depression frequency distributions
Code:
"""
Spyder Editor
"""
import pandas
import numpy
data = pandas.read_csv('nesarc_pds.csv', low_memory=False)
#print(len(data)) #number of observations (rows)
#print(len(data.columns)) #number of variables (columns)
# sets strings to uppercase all Dataframe column names
data.columns = map(str.upper, data.columns)

# bug fix for display formats to avoid runtime errors
pandas.set_option('display.float_format', lambda x:'%f'%x)
# Converts data entries to number format, correct python data read in errors
data["S1Q9A"] = data["S1Q9A"].convert_objects(convert_numeric=True) data["S1Q9B"] = data["S1Q9B"].convert_objects(convert_numeric=True) data["S1Q9C"] = data["S1Q9C"].convert_objects(convert_numeric=True) data["S1Q213"] = data["S1Q213"].convert_objects(convert_numeric=True) data["S4AQ1"] = data["S4AQ1"].convert_objects(convert_numeric=True) data["S4AQ2"] = data["S4AQ2"].convert_objects(convert_numeric=True) data["S4AQ4A12"] = data["S4AQ4A12"].convert_objects(convert_numeric=True) data["S4AQ4A17"] = data["S4AQ4A17"].convert_objects(convert_numeric=True) data["S4AQ4A18"] = data["S4AQ4A18"].convert_objects(convert_numeric=True) data["S4AQ51"] = data["S4AQ51"].convert_objects(convert_numeric=True) data["S4AQ16"] = data["S4AQ16"].convert_objects(convert_numeric=True)
# Frequency and normalized counts for variables
print("counts S1Q9A - Business or Industry: Current or most recent job, 14 = Armed Forces")
c1 = data["S1Q9A"].value_counts(sort=False)
print(c1)

print("% for S1Q9A - Business or Industry: Current or most recent job, 14 = Armed Forces")
p1 = data["S1Q9A"].value_counts(sort=False, normalize=True)
print(p1)

print("counts S1Q9B - Occupation: Current or most recent job, 14 = Military")
c2 = data["S1Q9B"].value_counts(sort=False)
print(c2)

print("% S1Q9B - Occupation: Current or most recent job, 14 = Military")
p2 = data["S1Q9B"].value_counts(sort=False, normalize=True)
print(p2)

print("counts S1Q9C - Type of Employer: Current or most recent job, 6 = Armed Forces")
c3 = data["S1Q9C"].value_counts(sort=False)
print (c3)

print("% S1Q9C - Type of Employer: Current or most recent job, 6 = Armed Forces")
p3 = data["S1Q9C"].value_counts(sort=False, normalize=True)
print (p3)

print("counts S1Q213 - During past 4 weeks, how often felt downhearted or depressed, 1 - 3 = All to Some of the time")
c4 = data["S1Q213"].value_counts(sort=False)
print (c4)

print("% S1Q213 - During past 4 weeks, how often felt downhearted or depressed, 1 - 3 = All to Some of the time")
p4 = data["S1Q213"].value_counts(sort=False, normalize=True)
print (p4)

print("counts S4AQ1 - Ever had 2-week period felt sad, blue, depressed or down most of the time, 1 = Yes")
c5 = data["S4AQ1"].value_counts(sort=False)
print (c5)

print("% S4AQ1 - Ever had 2-week period felt sad, blue, depressed or down most of the time, 1 = Yes")
p5 = data["S4AQ1"].value_counts(sort=False, normalize=True)
print (p5)

print("counts S4AQ2 - Ever had 2-week period when didn't care about things, 1 = Yes")
c6 = data["S4AQ2"].value_counts(sort=False)
print (c6)

print("% S4AQ2 - Ever had 2-week period when didn't care about things, 1 = Yes")
p6 = data["S4AQ2"].value_counts(sort=False, normalize=True)
print (p6)

print("counts S4AQ4A12 - Felt worthless most of the time for 2+ weeks, 1 = Yes")
c7 = data["S4AQ4A12"].value_counts(sort=False)
print (c7)

print("% S4AQ4A12 - Felt worthless most of the time for 2+ weeks, 1 = Yes")
p7 = data["S4AQ4A12"].value_counts(sort=False, normalize=True)
print (p7)

print("counts S4AQ4A17 - Thought about committing suicide, 1 = Yes")
c8 = data["S4AQ4A17"].value_counts(sort=False)
print (c8)

print("% S4AQ4A17 - Thought about committing suicide, 1 = Yes")
p8 = data["S4AQ4A17"].value_counts(sort=False, normalize=True)
print (p8)

print("counts S4AQ4A18 - Felt like wanted to die, 1 = Yes")
c9 = data["S4AQ4A18"].value_counts(sort=False)
print (c9)

print("% S4AQ4A18 - Felt like wanted to die, 1 = Yes")
p9 = data["S4AQ4A18"].value_counts(sort=False, normalize=True)
print (p9)

print("counts S4AQ51 - Felt uncomfortable or upset by low mood, 1 = Yes")
c10 = data["S4AQ51"].value_counts(sort=False)
print (c10)

print("% S4AQ51 - Felt uncomfortable or upset by low mood, 1 = Yes")
p10 = data["S4AQ51"].value_counts(sort=False, normalize=True)
print (p10)

print("counts S4AQ16 - Went to counselor/therapist/doctor to help mood, 1 = Yes")
c11 = data["S4AQ16"].value_counts(sort=False)
print (c11)

print("% S4AQ16 - Went to counselor/therapist/doctor to help mood, 1 = Yes")
p11 = data["S4AQ16"].value_counts(sort=False, normalize=True)
print (p11)
OUTPUTS
counts S1Q213 - During past 4 weeks, how often felt downhearted or depressed, 1 - 3 = All to Some of the time
1      907
2     2051
3     6286
4    11305
5    22127
9      417
Name: S1Q213, dtype: int64

% S1Q213 - During past 4 weeks, how often felt downhearted or depressed, 1 - 3 = All to Some of the time
1   0.021048
2   0.047595
3   0.145871
4   0.262340
5   0.513471
9   0.009677
Name: S1Q213, dtype: float64

counts S4AQ1 - Ever had 2-week period felt sad, blue, depressed or down most of the time, 1 = Yes
1    12785
2    29416
9      892
Name: S4AQ1, dtype: int64

% S4AQ1 - Ever had 2-week period felt sad, blue, depressed or down most of the time, 1 = Yes
1   0.296684
2   0.682617
9   0.020699
Name: S4AQ1, dtype: float64

counts S4AQ2 - Ever had 2-week period when didn't care about things, 1 = Yes
1    10533
2    31618
9      942
Name: S4AQ2, dtype: int64

% S4AQ2 - Ever had 2-week period when didn't care about things, 1 = Yes
1   0.244425
2   0.733715
9   0.021860
Name: S4AQ2, dtype: float64
The output shows the counts and normalized data for 3 questions from the NESARC data set.  The final value (9) in each output set marks missing data.  For S1Q213, the scale is a Likert scale, with 1-3 being the values of interest.  S4AQ1 and S4AQ2 are yes/no questions.  There is not much to say about the distribution of the data until it is charted.
Week 1 Assignment
Data Set:  National Epidemiologic Survey on Alcohol and Related Conditions (NESARC)
Topic of Interest:  What is the association between military service and depression in comparison to the general population?
Secondary Topic:  Does the association change based upon the age of the service member?
The second topic's variables are the same as the first topic's.
Literature Search:
                 Terms:  Military service and depression (2003-2010)
The literature search revealed that most efforts have tried to measure the impact of Global War on Terror (OIF & OEF) deployments on service member mental health.  There were no direct comparisons of military service to the general US population.  The closest parallel is a 21-year study started in 2003 to understand the impact of service on mental health (Millennium Cohort: enrollment begins a 21-year contribution to understanding the impact of military service - https://doi.org/10.1016/j.jclinepi.2006.05.009).
Hypothesis:  There is no significant difference in depression rates between service members and the general population, as the services are a reflection of the population due to the all-volunteer force.  The lack of association may not hold across ages.