macluriv
macluriv · 4 years ago
Running an analysis of variance
Among people in their 30s, is there any difference in age between those who have and have not attempted suicide?
import pandas as pd
import statsmodels.formula.api as smf
import statsmodels.stats.multicomp as multi

data = pd.read_csv('week_1/nesarc_pds.csv', low_memory=False)
print(data.head())

# set the variables we will be working with to numeric
data['AGE'] = pd.to_numeric(data['AGE'], errors='coerce')
data['S4AQ4A16'] = pd.to_numeric(data['S4AQ4A16'], errors='coerce')

# subset data to respondents aged 30 to 40 (.copy() avoids SettingWithCopyWarning)
sub1 = data[(data['AGE'] >= 30) & (data['AGE'] <= 40)].copy()

# missing data: rows with NaN are dropped for each model below;
# S4AQ4A16 code 9 is kept as its own group
ct1 = sub1.groupby('AGE').size()
print(ct1)

# use the ols function to calculate the F-statistic and associated p-value
model1 = smf.ols(formula='AGE ~ C(S4AQ4A16)', data=sub1)
results1 = model1.fit()
print(results1.summary())

sub2 = sub1[['AGE', 'S4AQ4A16']].dropna()

print('means for AGE by SUICIDE ATTEMPT')
m1 = sub2.groupby('S4AQ4A16').mean()
print(m1)

print('standard deviations for AGE by SUICIDE ATTEMPT')
sd1 = sub2.groupby('S4AQ4A16').std()
print(sd1)

# second model: AGE by marital status, on a subset I will call sub3
sub3 = sub1[['AGE', 'MARITAL']].dropna()
model2 = smf.ols(formula='AGE ~ C(MARITAL)', data=sub3).fit()
print(model2.summary())

# grouped by MARITAL here (the printed label still says SUICIDE ATTEMPT)
print('means for AGE by SUICIDE ATTEMPT')
m2 = sub3.groupby('MARITAL').mean()
print(m2)

print('standard deviations for AGE by SUICIDE ATTEMPT')
sd2 = sub3.groupby('MARITAL').std()
print(sd2)

# Tukey HSD post hoc comparisons of AGE across MARITAL groups
mc1 = multi.MultiComparison(sub3['AGE'], sub3['MARITAL'])
res1 = mc1.tukeyhsd()
print(res1.summary())
AGE
30    869
31    861
32    873
33    846
34    843
35    861
36    885
37    991
38    989
39    924
40    992
dtype: int64

                            OLS Regression Results
==============================================================================
Dep. Variable:                    AGE   R-squared:                       0.000
Model:                            OLS   Adj. R-squared:                 -0.000
Method:                 Least Squares   F-statistic:                    0.6121
Date:                Wed, 25 Aug 2021   Prob (F-statistic):              0.542
Time:                        16:07:36   Log-Likelihood:                -8038.3
No. Observations:                3121   AIC:                         1.608e+04
Df Residuals:                    3118   BIC:                         1.610e+04
Df Model:                           2
Covariance Type:            nonrobust
======================================================================================
                         coef    std err          t      P>|t|      [0.025      0.975]
--------------------------------------------------------------------------------------
Intercept             35.0321      0.190    184.299      0.000      34.659      35.405
C(S4AQ4A16)[T.2.0]     0.2039      0.199      1.023      0.306      -0.187       0.595
C(S4AQ4A16)[T.9.0]     0.4942      0.754      0.655      0.512      -0.984       1.973
==============================================================================
Omnibus:                     2984.560   Durbin-Watson:                   2.013
Prob(Omnibus):                  0.000   Jarque-Bera (JB):              197.038
Skew:                          -0.099   Prob(JB):                     1.64e-43
Kurtosis:                       1.785   Cond. No.                         18.1
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

means for AGE by SUICIDE ATTEMPT
                AGE
S4AQ4A16
1.0       35.032143
2.0       35.236003
9.0       35.526316

standard deviations for AGE by SUICIDE ATTEMPT
               AGE
S4AQ4A16
1.0       3.114719
2.0       3.187001
9.0       3.203616
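The 9.0 group in the results above suggests that one response code is not a real yes/no answer. Assuming 9 marks an "unknown" response (an assumption worth checking against the NESARC codebook), a minimal sketch of recoding it to missing before fitting the model could look like this. The toy data frame and the `recode_unknown` helper are hypothetical and only stand in for the real columns.

```python
import numpy as np
import pandas as pd

# Hypothetical toy rows standing in for the NESARC columns used above.
toy = pd.DataFrame({
    "AGE": [30, 33, 35, 38, 40],
    "S4AQ4A16": [1, 2, 9, 2, 1],
})


def recode_unknown(frame, column, unknown_code=9):
    """Return a copy of frame with unknown_code in column set to NaN.

    Treating 9 as "unknown" is an assumption, not something the post states.
    """
    out = frame.copy()
    out[column] = out[column].replace(unknown_code, np.nan)
    return out


cleaned = recode_unknown(toy, "S4AQ4A16")
# Show how many rows fall in each response group, including missing ones.
print(cleaned["S4AQ4A16"].value_counts(dropna=False))
```

With the unknown code set to NaN, the statsmodels formula interface drops those rows, so the model would compare groups 1 and 2 only.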
The p-value is 0.542, so we fail to reject the null hypothesis.

                            OLS Regression Results
==============================================================================
Dep. Variable:                    AGE   R-squared:                       0.017
Model:                            OLS   Adj. R-squared:                  0.016
Method:                 Least Squares   F-statistic:                     33.33
Date:                Wed, 25 Aug 2021   Prob (F-statistic):           7.45e-34
Time:                        16:07:36   Log-Likelihood:                -25516.
No. Observations:                9934   AIC:                         5.104e+04
Df Residuals:                    9928   BIC:                         5.109e+04
Df Model:                           5
Covariance Type:            nonrobust
===================================================================================
                      coef    std err          t      P>|t|      [0.025      0.975]
-----------------------------------------------------------------------------------
Intercept          35.2382      0.042    834.773      0.000      35.155      35.321
C(MARITAL)[T.2]    -0.6069      0.164     -3.695      0.000      -0.929      -0.285
C(MARITAL)[T.3]     0.9449      0.377      2.505      0.012       0.206       1.684
C(MARITAL)[T.4]     0.7070      0.102      6.957      0.000       0.508       0.906
C(MARITAL)[T.5]    -0.0567      0.151     -0.376      0.707      -0.353       0.239
C(MARITAL)[T.6]    -0.6478      0.079     -8.189      0.000      -0.803      -0.493
==============================================================================
Omnibus:                     8112.149   Durbin-Watson:                   1.981
Prob(Omnibus):                  0.000   Jarque-Bera (JB):              596.251
Skew:                          -0.066   Prob(JB):                    3.35e-130
Kurtosis:                       1.807   Cond. No.                         12.4
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

means for AGE by SUICIDE ATTEMPT
               AGE
MARITAL
1        35.238163
2        34.631313
3        36.183099
4        35.945159
5        35.181435
6        34.590399

standard deviations for AGE by SUICIDE ATTEMPT
              AGE
MARITAL
1        3.145777
2        3.112316
3        3.352227
4        3.065115
5        3.218734
6        3.224892

Multiple Comparison of Means - Tukey HSD, FWER=0.05
====================================================
group1 group2 meandiff p-adj   lower   upper  reject
----------------------------------------------------
     1      2  -0.6069  0.003 -1.0749 -0.1388   True
     1      3   0.9449 0.1228 -0.1301  2.0199  False
     1      4    0.707  0.001  0.4173  0.9967   True
     1      5  -0.0567    0.9 -0.4873  0.3739  False
     1      6  -0.6478  0.001 -0.8732 -0.4223   True
     2      3   1.5518 0.0019  0.3917  2.7119   True
     2      4   1.3138  0.001  0.7904  1.8373   True
     2      5   0.5501 0.1078 -0.0627  1.1629  False
     2      6  -0.0409    0.9 -0.5318    0.45  False
     3      4  -0.2379    0.9 -1.3382  0.8623  False
     3      5  -1.0017  0.126 -2.1471  0.1438  False
     3      6  -1.5927  0.001 -2.6778 -0.5076   True
     4      5  -0.7637  0.001  -1.254 -0.2735   True
     4      6  -1.3548  0.001   -1.68 -1.0295   True
     5      6   -0.591  0.003 -1.0463 -0.1358   True
----------------------------------------------------
We fail to reject the null hypothesis: among respondents aged 30 to 40, mean age does not differ significantly across the suicide-attempt response groups (F = 0.6121, p = 0.542), so there is no evidence of an association between age and suicide attempts in this subsample.
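As a rough illustration of where the F-statistic in the first model comes from, here is a minimal sketch of the one-way ANOVA calculation in plain Python. The ages and group names are made up for illustration and are not NESARC data, and `one_way_f` is a hypothetical helper name. F is the between-group mean square divided by the within-group mean square, with k - 1 and N - k degrees of freedom (2 and 3118 in the first model's output, for 3 groups and 3121 observations).

```python
def one_way_f(groups):
    """Return (F, df_between, df_within) for a dict of group name -> list of values."""
    all_values = [v for values in groups.values() for v in values]
    n_total = len(all_values)
    k = len(groups)
    grand_mean = sum(all_values) / n_total

    ss_between = 0.0  # variation of group means around the grand mean
    ss_within = 0.0   # variation of observations around their own group mean
    for values in groups.values():
        group_mean = sum(values) / len(values)
        ss_between += len(values) * (group_mean - grand_mean) ** 2
        ss_within += sum((v - group_mean) ** 2 for v in values)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Made-up ages for three response groups (illustrative only, not NESARC data).
toy_ages = {
    "yes": [31, 35, 38, 33],
    "no": [30, 36, 39, 34, 37],
    "unknown": [32, 40, 35],
}

f_stat, df_b, df_w = one_way_f(toy_ages)
print(f"F = {f_stat:.4f} on ({df_b}, {df_w}) degrees of freedom")
```

A small F, as in the first model, means the group means vary about as much as would be expected from the spread within the groups alone.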
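There is, however, a significant association between marital status and age (F = 33.33, p = 7.45e-34), so we reject the null hypothesis that mean age is equal across marital-status groups. Note that this model covers all 9934 respondents aged 30 to 40, not only those who answered the suicide-attempt question. The Tukey HSD multiple comparison of means shows which groups differ: the pairs 1-2, 1-4, 1-6, 2-3, 2-4, 3-6, 4-5, 4-6 and 5-6 have significantly different mean ages, while 1-3, 1-5, 2-5, 2-6, 3-4 and 3-5 do not.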
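To make the Tukey table easier to read, the short sketch below recomputes its meandiff column from the group means printed for AGE by MARITAL. Each meandiff is the mean of group2 minus the mean of group1, and the differences against group 1 match the OLS coefficients (for example C(MARITAL)[T.2] = -0.6069) because group 1 is the reference level. The `pairwise_mean_diffs` helper is a hypothetical name used only for this illustration.

```python
from itertools import combinations

# Group means for AGE by MARITAL, copied from the printed output.
marital_means = {
    1: 35.238163,
    2: 34.631313,
    3: 36.183099,
    4: 35.945159,
    5: 35.181435,
    6: 34.590399,
}


def pairwise_mean_diffs(means):
    """Return {(group1, group2): mean(group2) - mean(group1)} for every pair."""
    return {
        (g1, g2): means[g2] - means[g1]
        for g1, g2 in combinations(sorted(means), 2)
    }


diffs = pairwise_mean_diffs(marital_means)
for (g1, g2), diff in diffs.items():
    print(f"{g1} vs {g2}: meandiff = {diff:.4f}")
```

This only reproduces the differences themselves. The adjusted p-values and confidence intervals in the table also depend on the group sizes and the pooled variance, which is what `tukeyhsd()` handles.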