Running an analysis of variance
Is there any difference in age between people in their 30s who have attempted suicide and those who have not?
import pandas as pd
import statsmodels.formula.api as smf
import statsmodels.stats.multicomp as multi

data = pd.read_csv('week_1/nesarc_pds.csv', low_memory=False)
print(data.head())

# setting variables you will be working with to numeric
data['AGE'] = pd.to_numeric(data['AGE'], errors='coerce')
data['S4AQ4A16'] = pd.to_numeric(data['S4AQ4A16'], errors='coerce')

# subset data to adults aged 30 to 40 (S4AQ4A16 is the suicide-attempt variable)
# .copy() gives an independent frame instead of a view of data
sub1 = data[(data['AGE'] >= 30) & (data['AGE'] <= 40)].copy()

# SETTING MISSING DATA
# the original call sub1['AGE'].replace(np.nan) gave no replacement value;
# the age filter above already excludes rows with a missing AGE, so it is omitted

# number of respondents at each age
ct1 = sub1.groupby('AGE').size()
print(ct1)

# using ols function for calculating the F-statistic and associated p value
model1 = smf.ols(formula='AGE ~ C(S4AQ4A16)', data=sub1)
results1 = model1.fit()
print(results1.summary())

sub2 = sub1[['AGE', 'S4AQ4A16']].dropna()

print('means for AGE by SUICIDE ATTEMPT')
m1 = sub2.groupby('S4AQ4A16').mean()
print(m1)

print('standard deviations for AGE by SUICIDE ATTEMPT')
sd1 = sub2.groupby('S4AQ4A16').std()
print(sd1)

# i will call it sub3: AGE by MARITAL status, a variable with more than two levels
sub3 = sub1[['AGE', 'MARITAL']].dropna()

model2 = smf.ols(formula='AGE ~ C(MARITAL)', data=sub3).fit()
print(model2.summary())

# the two labels below were reused from the first model;
# the values printed are AGE statistics by MARITAL status
print('means for AGE by SUICIDE ATTEMPT')
m2 = sub3.groupby('MARITAL').mean()
print(m2)

print('standard deviations for AGE by SUICIDE ATTEMPT')
sd2 = sub3.groupby('MARITAL').std()
print(sd2)

# post hoc pairwise comparisons between marital-status groups
mc1 = multi.MultiComparison(sub3['AGE'], sub3['MARITAL'])
res1 = mc1.tukeyhsd()
print(res1.summary())
AGE
30    869
31    861
32    873
33    846
34    843
35    861
36    885
37    991
38    989
39    924
40    992
dtype: int64

                            OLS Regression Results
==============================================================================
Dep. Variable:                    AGE   R-squared:                       0.000
Model:                            OLS   Adj. R-squared:                 -0.000
Method:                 Least Squares   F-statistic:                    0.6121
Date:                Wed, 25 Aug 2021   Prob (F-statistic):              0.542
Time:                        16:07:36   Log-Likelihood:                -8038.3
No. Observations:                3121   AIC:                         1.608e+04
Df Residuals:                    3118   BIC:                         1.610e+04
Df Model:                           2
Covariance Type:            nonrobust
======================================================================================
                         coef    std err          t      P>|t|      [0.025      0.975]
--------------------------------------------------------------------------------------
Intercept             35.0321      0.190    184.299      0.000      34.659      35.405
C(S4AQ4A16)[T.2.0]     0.2039      0.199      1.023      0.306      -0.187       0.595
C(S4AQ4A16)[T.9.0]     0.4942      0.754      0.655      0.512      -0.984       1.973
==============================================================================
Omnibus:                     2984.560   Durbin-Watson:                   2.013
Prob(Omnibus):                  0.000   Jarque-Bera (JB):              197.038
Skew:                          -0.099   Prob(JB):                     1.64e-43
Kurtosis:                       1.785   Cond. No.                         18.1
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

means for AGE by SUICIDE ATTEMPT
                AGE
S4AQ4A16
1.0       35.032143
2.0       35.236003
9.0       35.526316

standard deviations for AGE by SUICIDE ATTEMPT
               AGE
S4AQ4A16
1.0       3.114719
2.0       3.187001
9.0       3.203616
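The F-statistic and Prob (F-statistic) lines in the summary above are the one-way ANOVA result. As a rough sketch of what those numbers mean, here is the same calculation done by hand. The ages are made up for illustration (not NESARC data), the helper name `one_way_anova` is hypothetical, and it uses `scipy`, which the script above does not import.

```python
import numpy as np
from scipy import stats


def one_way_anova(groups):
    """Return (F, p) for a one-way ANOVA over a list of 1-D samples."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    all_values = np.concatenate(groups)
    grand_mean = all_values.mean()
    k = len(groups)          # number of groups
    n = all_values.size      # total number of observations

    # Between-group sum of squares: distance of each group mean from the grand mean
    ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of observations around their own group mean
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

    df_between = k - 1       # "Df Model" in the OLS summary
    df_within = n - k        # "Df Residuals" in the OLS summary
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    # p value: upper tail of the F distribution beyond the observed statistic
    p_value = stats.f.sf(f_stat, df_between, df_within)
    return f_stat, p_value


# Made-up ages for three hypothetical groups
toy_groups = [
    [31, 33, 35, 37, 39],
    [30, 34, 35, 36, 40],
    [32, 33, 36, 38, 40],
]
f_stat, p_value = one_way_anova(toy_groups)
print(f"F = {f_stat:.4f}, p = {p_value:.4f}")
```

With three groups and 3121 observations, the same degrees of freedom work out to 2 and 3118, which matches Df Model and Df Residuals in the summary.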
The p-value is 0.542, so we fail to reject the null hypothesis.

                            OLS Regression Results
==============================================================================
Dep. Variable:                    AGE   R-squared:                       0.017
Model:                            OLS   Adj. R-squared:                  0.016
Method:                 Least Squares   F-statistic:                     33.33
Date:                Wed, 25 Aug 2021   Prob (F-statistic):           7.45e-34
Time:                        16:07:36   Log-Likelihood:                -25516.
No. Observations:                9934   AIC:                         5.104e+04
Df Residuals:                    9928   BIC:                         5.109e+04
Df Model:                           5
Covariance Type:            nonrobust
===================================================================================
                      coef    std err          t      P>|t|      [0.025      0.975]
-----------------------------------------------------------------------------------
Intercept          35.2382      0.042    834.773      0.000      35.155      35.321
C(MARITAL)[T.2]    -0.6069      0.164     -3.695      0.000      -0.929      -0.285
C(MARITAL)[T.3]     0.9449      0.377      2.505      0.012       0.206       1.684
C(MARITAL)[T.4]     0.7070      0.102      6.957      0.000       0.508       0.906
C(MARITAL)[T.5]    -0.0567      0.151     -0.376      0.707      -0.353       0.239
C(MARITAL)[T.6]    -0.6478      0.079     -8.189      0.000      -0.803      -0.493
==============================================================================
Omnibus:                     8112.149   Durbin-Watson:                   1.981
Prob(Omnibus):                  0.000   Jarque-Bera (JB):              596.251
Skew:                          -0.066   Prob(JB):                    3.35e-130
Kurtosis:                       1.807   Cond. No.                         12.4
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

means for AGE by SUICIDE ATTEMPT
               AGE
MARITAL
1        35.238163
2        34.631313
3        36.183099
4        35.945159
5        35.181435
6        34.590399

standard deviations for AGE by SUICIDE ATTEMPT
              AGE
MARITAL
1        3.145777
2        3.112316
3        3.352227
4        3.065115
5        3.218734
6        3.224892

Multiple Comparison of Means - Tukey HSD, FWER=0.05
====================================================
group1 group2 meandiff  p-adj   lower   upper reject
----------------------------------------------------
     1      2  -0.6069  0.003 -1.0749 -0.1388   True
     1      3   0.9449 0.1228 -0.1301  2.0199  False
     1      4    0.707  0.001  0.4173  0.9967   True
     1      5  -0.0567    0.9 -0.4873  0.3739  False
     1      6  -0.6478  0.001 -0.8732 -0.4223   True
     2      3   1.5518 0.0019  0.3917  2.7119   True
     2      4   1.3138  0.001  0.7904  1.8373   True
     2      5   0.5501 0.1078 -0.0627  1.1629  False
     2      6  -0.0409    0.9 -0.5318    0.45  False
     3      4  -0.2379    0.9 -1.3382  0.8623  False
     3      5  -1.0017  0.126 -2.1471  0.1438  False
     3      6  -1.5927  0.001 -2.6778 -0.5076   True
     4      5  -0.7637  0.001  -1.254 -0.2735   True
     4      6  -1.3548  0.001   -1.68 -1.0295   True
     5      6   -0.591  0.003 -1.0463 -0.1358   True
----------------------------------------------------
We fail to reject the null hypothesis: among respondents aged 30 to 40, mean age does not differ significantly between the suicide-attempt response groups (F = 0.6121, p = 0.542), so there is no evidence of an association between age and suicide attempts in this age range.
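One thing worth noticing in the first model's output is the 9.0 group next to 1.0 and 2.0. If 9 is an "unknown" code rather than a real answer (an assumption here; the NESARC codebook is the place to check), it could be recoded as missing before the comparison. A minimal sketch on made-up rows, not the real data:

```python
import numpy as np
import pandas as pd

# Made-up rows for illustration only (not NESARC data)
toy = pd.DataFrame({
    "AGE":      [30, 32, 35, 38, 40, 33, 36],
    "S4AQ4A16": [1, 2, 2, 9, 1, 2, 9],
})

# Assumption: 9 is an "unknown" code, so treat it as missing
toy["S4AQ4A16"] = toy["S4AQ4A16"].replace(9, np.nan)

# dropna() removes the recoded rows, leaving only groups 1 and 2
clean = toy[["AGE", "S4AQ4A16"]].dropna()
print(clean.groupby("S4AQ4A16")["AGE"].agg(["count", "mean"]))
```

With only two groups left, the model would compare the two real answers directly.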
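But there is a difference by marital status: mean age differs significantly across the marital-status categories (F = 33.33, p = 7.45e-34), so we reject the null hypothesis of equal mean ages. Note that this second model compares age across marital status for all respondents aged 30 to 40, not only those who attempted suicide. The Tukey HSD multiple comparison shows which pairs of groups differ: 1 vs 2, 1 vs 4, 1 vs 6, 2 vs 3, 2 vs 4, 3 vs 6, 4 vs 5, 4 vs 6, and 5 vs 6.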
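Reading the rejected pairs off the Tukey table by eye gets tedious with many groups. A small sketch of pulling them out in code, using made-up ages and hypothetical group labels "a", "b" and "c" (not NESARC data), and assuming the `reject` and `meandiffs` attributes of the statsmodels Tukey result are in the same pair order as the printed table:

```python
from itertools import combinations

import numpy as np
import statsmodels.stats.multicomp as multi

# Made-up ages for three hypothetical groups:
# "a" and "b" hold identical values, "c" is shifted up by 5 years
base = np.linspace(-1.0, 1.0, 20)
ages = np.concatenate([30 + base, 30 + base, 35 + base])
labels = np.repeat(["a", "b", "c"], base.size)

mc = multi.MultiComparison(ages, labels)
res = mc.tukeyhsd()

# Pairs are every combination of two sorted group labels,
# in the same order as the rows of res.summary()
pairs = list(combinations(mc.groupsunique, 2))
significant = [
    (str(g1), str(g2), round(float(diff), 4))
    for (g1, g2), diff, rej in zip(pairs, res.meandiffs, res.reject)
    if rej
]
print(significant)
```

Each tuple holds the two group labels and the mean difference (second group minus first), the same sign convention as the meandiff column above.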