Exploring Statistical Interactions
Table of TAB12MDX by USQUAN
TAB12MDX(Tobacco Dependence Past 12 Months)

TAB12MDX   USQUAN   Frequency   Percent   Row Pct   Col Pct
0          0        630         50.00     91.84     52.72
0          3        56          4.44      8.16      86.15
0          Total    686         54.44
1          0        565         44.84     98.43     47.28
1          3        9           0.71      1.57      13.85
1          Total    574         45.56
Total      0        1195        94.84
Total      3        65          5.16
Total      Total    1260        100.00

Statistics for Table of TAB12MDX by USQUAN

Statistic                     DF   Value     Prob
Chi-Square                    1    27.7842   <.0001
Likelihood Ratio Chi-Square   1    31.3968   <.0001
Continuity Adj. Chi-Square    1    26.4525   <.0001
Mantel-Haenszel Chi-Square    1    27.7621   <.0001
Phi Coefficient                    -0.1485
Contingency Coefficient            0.1469
Cramer's V                         -0.1485

Fisher's Exact Test
Cell (1,1) Frequency (F)   630
Left-sided Pr <= F         <.0001
Right-sided Pr >= F        1.0000
Table Probability (P)      <.0001
Two-sided Pr <= P          <.0001

Sample Size = 1260
Table of TAB12MDX by USQUAN
TAB12MDX(Tobacco Dependence Past 12 Months)

TAB12MDX   USQUAN   Frequency   Percent   Row Pct   Col Pct
0          0        110         24.66     88.71     25.70
0          3        14          3.14      11.29     77.78
0          Total    124         27.80
1          0        318         71.30     98.76     74.30
1          3        4           0.90      1.24      22.22
1          Total    322         72.20
Total      0        428         95.96
Total      3        18          4.04
Total      Total    446         100.00

Statistics for Table of TAB12MDX by USQUAN

Statistic                     DF   Value     Prob
Chi-Square                    1    23.3380   <.0001
Likelihood Ratio Chi-Square   1    20.3350   <.0001
Continuity Adj. Chi-Square    1    20.8157   <.0001
Mantel-Haenszel Chi-Square    1    23.2856   <.0001
Phi Coefficient                    -0.2288
Contingency Coefficient            0.2230
Cramer's V                         -0.2288

Fisher's Exact Test
Cell (1,1) Frequency (F)   110
Left-sided Pr <= F         <.0001
Right-sided Pr >= F        1.0000
Table Probability (P)      <.0001
Two-sided Pr <= P          <.0001

Sample Size = 446
2 Variables: urbanrate internetuserate

Simple Statistics
Variable          N    Mean       Std Dev    Sum         Minimum    Maximum    Label
urbanrate         47   33.50553   12.83453   1575        10.40000   66.60000   urbanrate
internetuserate   45   8.22063    8.70764    369.92856   0.21007    40.12223   INTERNETUSERATE

Pearson Correlation Coefficients
Prob > |r| under H0: Rho=0
Number of Observations

Variable pair                      r         Prob > |r|   N
urbanrate, urbanrate               1.00000                47
urbanrate, internetuserate         0.11558   0.4496       45
internetuserate, internetuserate   1.00000                45
2 Variables: urbanrate internetuserate

Simple Statistics
Variable          N    Mean       Std Dev    Sum    Minimum    Maximum    Label
urbanrate         94   56.58000   18.64588   5319   12.54000   93.32000   urbanrate
internetuserate   92   30.76527   19.76304   2830   1.28005    79.88978   INTERNETUSERATE

Pearson Correlation Coefficients
Prob > |r| under H0: Rho=0
Number of Observations

Variable pair                      r         Prob > |r|   N
urbanrate, urbanrate               1.00000                94
urbanrate, internetuserate         0.32516   0.0017       91
internetuserate, internetuserate   1.00000                92
2 Variables: urbanrate internetuserate

Simple Statistics
Variable          N    Mean       Std Dev    Sum    Minimum    Maximum     Label
urbanrate         48   78.20417   19.82159   3754   13.22000   100.00000   urbanrate
internetuserate   46   70.58821   15.94746   3247   36.00033   95.63811    INTERNETUSERATE

Pearson Correlation Coefficients
Prob > |r| under H0: Rho=0
Number of Observations

Variable pair                      r         Prob > |r|   N
urbanrate, urbanrate               1.00000                48
urbanrate, internetuserate         0.07514   0.6197       46
internetuserate, internetuserate   1.00000                46
Now, let's evaluate third variables as potential moderators in the context of the chi-square test of independence. For this, we're going to return to our original SAS program using the NESARC data and ask the question: is smoking associated with nicotine dependence? We're going to create another smoking variable for this purpose, reflecting how many cigarettes each young adult smoker smokes per day. 0 indicates non-daily smokers, 3 indicates those smoking 1 to 5 cigarettes per day, 8 indicates 6 to 10 cigarettes per day, 13 indicates 11 to 15 cigarettes per day, 18 indicates 16 to 20 cigarettes per day, and 37 indicates more than 20 cigarettes per day.

In the graphing syntax, the categorical explanatory variable USQUAN/discrete tells SAS that we want the levels of our categorical explanatory variable represented on the x-axis, with TAB12MDX displayed as a mean on the y-axis. This gives us a graphic representation of the positive linear relationship: as smoking quantity increases, so does the proportion of individuals with nicotine dependence. This finding is accurate with regard to the larger population of young adult smokers.

But might a third variable moderate the relationship between smoking quantity and nicotine dependence? Put another way, might there be a statistical interaction between a third variable and smoking behavior in predicting our response variable, nicotine dependence? We're going to evaluate major depressive disorder as the third variable. Our question will be: does major depression affect either the strength or the direction of the relationship between smoking and nicotine dependence? Is smoking related to nicotine dependence for each level of this third variable, that is, for those with major depression and those without major depression?

Similar to our ANOVA example, the syntax to be added to the PROC FREQ code is circled here in red. We first need to sort the data according to the categorical third variable, then include a BY statement telling SAS to run a chi-square test for each level of the third variable separately. The specific syntax for this example is: PROC SORT; BY MAJORDEPLIFE; PROC FREQ; TABLES TAB12MDX*USQUAN/CHISQ; BY MAJORDEPLIFE;.

When this syntax is added to the SAS program, here are the results. You can see the cross-tabulation table of usual quantity by tobacco dependence in the past 12 months. First, for major depression equal to 0, that is, those without major depression, the chi-square value is large and the p-value is quite small. In addition, the column percents reveal what seems to be a positive linear relationship, with rates of nicotine dependence increasing from lower levels of smoking to higher levels of smoking. So we can say that this is a statistically significant relationship for those without major depression.

For those with major depression, we also find a large chi-square value and a small p-value, so the relationship is statistically significant here too. These column percents also reveal what seems to be a positive linear relationship, with rates of nicotine dependence increasing from lower levels of smoking to higher levels of smoking.

Using a line graph to examine the rates of nicotine dependence by level of smoking, it seems that both the direction and the size of the relationship between smoking and nicotine dependence are similar for those with major depression and for those without, although those with major depression show higher rates of nicotine dependence at every level of smoking quantity. In this case, we would say that a diagnosis of major depression does not moderate the relationship between smoking and nicotine dependence. For both young adult smokers with major depression and those without, higher levels of smoking behavior are associated with higher rates of nicotine dependence.
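
The stratified test described above can also be checked by hand. What follows is a minimal Python sketch, not the SAS program itself, that recomputes the Pearson chi-square for the two 2x2 tables printed at the top of this post. The helper name chi_square_2x2 is an illustrative assumption, and because the output does not label which table belongs to which level of MAJORDEPLIFE, the strata are simply called the first and second table here.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) and p-value for a 2x2 table.

    Table layout: first row (a, b), second row (c, d).
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom, the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p

# Cell frequencies copied from the two tables at the top of this post.
# Which table is which depression group is NOT stated in the output (assumption-free labels).
strata = {
    "first table (n = 1260)": (630, 56, 565, 9),
    "second table (n = 446)": (110, 14, 318, 4),
}

results = {name: chi_square_2x2(*cells) for name, cells in strata.items()}
for name, (chi2, p) in results.items():
    print(f"{name}: chi-square = {chi2:.4f}, p = {p:.2g}")
```

Running the same test once per stratum is what the BY statement asks SAS to do. A moderator would show up as a relationship that changes in strength or direction from one stratum to the other.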
3 Variables: urbanrate incomeperperson internetuserate

Simple Statistics
Variable          N     Mean       Std Dev    Sum       Minimum     Maximum     Label
urbanrate         203   56.76936   23.84493   11524     10.40000    100.00000   urbanrate
incomeperperson   190   8741       14263      1660784   103.77586   105147      INCOMEPERPERSON
internetuserate   192   35.63272   27.78028   6841      0.21007     95.63811    INTERNETUSERATE

Pearson Correlation Coefficients
Prob > |r| under H0: Rho=0
Number of Observations

Variable pair                      r         Prob > |r|   N
urbanrate, urbanrate               1.00000                203
urbanrate, incomeperperson         0.49009   <.0001       189
urbanrate, internetuserate         0.61395   <.0001       190
incomeperperson, incomeperperson   1.00000                190
incomeperperson, internetuserate   0.75094   <.0001       183
internetuserate, internetuserate   1.00000                192
We're going to be looking at two different correlations, that between Internet use and urban rate, and between Internet use and income per person. We can actually list these together.
To locate the correlation coefficients of interest and the associated p values, we need to examine the Pearson Correlation Coefficient table here, and find the row and column where our two variables of interest intersect.
For the association between urbanrate and internetuserate, the correlation coefficient is approximately 0.61 with a p-value of less than .0001. This tells us that the relationship is statistically significant.
For the association between incomeperperson and internetuserate, the correlation coefficient is approximately 0.75 and also has a significant p-value. Now we can actually interpret the scatter plots and the coefficients together.
The association between internetuserate and income is fairly strong and it's also positive, as the scatter plot had already shown us. The association between internetuserate and urbanrate is also positive but slightly more modest at 0.61. Both are statistically significant. That is, for both associations, it's highly unlikely that a relationship of this magnitude would be due to chance alone.
Here's some good news. Post hoc tests are not necessary when conducting Pearson correlation. Post hoc tests are needed only when your research question includes a categorical explanatory variable with more than two levels. Because our explanatory variable in the context of the correlation coefficient is quantitative, there's never a need to perform a post hoc test.
Another interesting and useful aspect of the correlation coefficient is what happens if we square it. That is, if we multiply it by itself, we get a value that also helps our understanding of the association between the two quantitative variables.
Small r squared is the fraction of the variability of one variable that can be predicted by the other. For example, when looking at the relationship between urban rate and Internet use rate, if we square our correlation coefficient of 0.61, we get 0.37. This could be interpreted the following way. If we know the urban rate, we can predict 37% of the variability we will see in the rate of Internet use. Of course, that also means that 63% of the variability is unaccounted for.
If we square the correlation coefficient for income per person and Internet use rate we get a value of 0.56. This suggests, if we know income per person, we can predict 56% of the variability we'll see in the rate of Internet use.
This is a little bit more impressive, because we can predict over half the variability.
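
The squaring step is easy to check. Here is a small Python sketch, assuming the two coefficients are read from the Pearson table above and rounded to two decimals before squaring, as the text does. The function name shared_variability is hypothetical.

```python
# Correlations with internetuserate, copied from the Pearson table above.
correlations = {
    "urbanrate": 0.61395,
    "incomeperperson": 0.75094,
}

def shared_variability(r, digits=2):
    """Square r after rounding it to `digits` decimals, mirroring the text."""
    return round(r, digits) ** 2

r_squared = {name: shared_variability(r) for name, r in correlations.items()}
for name, r2 in r_squared.items():
    print(f"{name}: r^2 = {r2:.2f} "
          f"({r2:.0%} predictable, {1 - r2:.0%} unaccounted for)")
```

Squaring the unrounded coefficients instead gives slightly larger values (roughly 0.38 and 0.56), so the rounding choice matters a little at the second decimal.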
Again, correlation coefficients are commonly denoted with a lowercase r, and they're squared to determine the amount of variability that can be predicted. You might be wondering how much variability in Internet use rates can be predicted if we consider both urban rate and income per person. A multivariate inferential tool called multiple regression can be used to answer this question, and we'll discuss that in the future.
Table of TAB12MDX by USFREQMO
TAB12MDX(Tobacco Dependence Past 12 Months)

TAB12MDX   USFREQMO   Frequency   Percent   Row Pct   Col Pct
0          1          64          3.76      7.93      90.14
0          2.5        53          3.11      6.57      81.54
0          5          69          4.05      8.55      78.41
0          14         59          3.46      7.31      64.84
0          22         41          2.41      5.08      60.29
0          30         521         30.59     64.56     39.47
0          Total      807         47.39
1          1          7           0.41      0.78      9.86
1          2.5        12          0.70      1.34      18.46
1          5          19          1.12      2.12      21.59
1          14         32          1.88      3.57      35.16
1          22         27          1.59      3.01      39.71
1          30         799         46.92     89.17     60.53
1          Total      896         52.61
Total      1          71          4.17
Total      2.5        65          3.82
Total      5          88          5.17
Total      14         91          5.34
Total      22         68          3.99
Total      30         1320        77.51
Total      Total      1703        100.00

Frequency Missing = 3

Statistics for Table of TAB12MDX by USFREQMO

Statistic                     DF   Value      Prob
Chi-Square                    5    165.2732   <.0001
Likelihood Ratio Chi-Square   5    176.1834   <.0001
Mantel-Haenszel Chi-Square    1    162.8952   <.0001
Phi Coefficient                    0.3115
Contingency Coefficient            0.2974
Cramer's V                         0.3115

Sample Size = 1703
Frequency Missing = 3
Table of TAB12MDX by USFREQMO
TAB12MDX(Tobacco Dependence Past 12 Months)

TAB12MDX   USFREQMO   Frequency   Percent   Row Pct   Col Pct
0          1          64          47.06     54.70     90.14
0          2.5        53          38.97     45.30     81.54
0          Total      117         86.03
1          1          7           5.15      36.84     9.86
1          2.5        12          8.82      63.16     18.46
1          Total      19          13.97
Total      1          71          52.21
Total      2.5        65          47.79
Total      Total      136         100.00

Statistics for Table of TAB12MDX by USFREQMO

Statistic                     DF   Value    Prob
Chi-Square                    1    2.0893   0.1483
Likelihood Ratio Chi-Square   1    2.1023   0.1471
Continuity Adj. Chi-Square    1    1.4349   0.2310
Mantel-Haenszel Chi-Square    1    2.0740   0.1498
Phi Coefficient                    0.1239
Contingency Coefficient            0.1230
Cramer's V                         0.1239

Fisher's Exact Test
Cell (1,1) Frequency (F)   64
Left-sided Pr <= F         0.9553
Right-sided Pr >= F        0.1154
Table Probability (P)      0.0707
Two-sided Pr <= P          0.2153

Sample Size = 136
Table of TAB12MDX by USFREQMO
TAB12MDX(Tobacco Dependence Past 12 Months)

TAB12MDX   USFREQMO   Frequency   Percent   Row Pct   Col Pct
0          1          64          90.14     100.00    90.14
0          Total      64          90.14
1          1          7           9.86      100.00    9.86
1          Total      7           9.86
Total      1          71          100.00
Total      Total      71          100.00
Table of TAB12MDX by USFREQMO
TAB12MDX(Tobacco Dependence Past 12 Months)

TAB12MDX   USFREQMO   Frequency   Percent   Row Pct   Col Pct
0          1          64          39.51     52.03     90.14
0          14         59          36.42     47.97     64.84
0          Total      123         75.93
1          1          7           4.32      17.95     9.86
1          14         32          19.75     82.05     35.16
1          Total      39          24.07
Total      1          71          43.83
Total      14         91          56.17
Total      Total      162         100.00

Statistics for Table of TAB12MDX by USFREQMO

Statistic                     DF   Value     Prob
Chi-Square                    1    13.9727   0.0002
Likelihood Ratio Chi-Square   1    15.0854   0.0001
Continuity Adj. Chi-Square    1    12.6226   0.0004
Mantel-Haenszel Chi-Square    1    13.8865   0.0002
Phi Coefficient                    0.2937
Contingency Coefficient            0.2818
Cramer's V                         0.2937

Fisher's Exact Test
Cell (1,1) Frequency (F)   64
Left-sided Pr <= F         1.0000
Right-sided Pr >= F        0.0001
Table Probability (P)      <.0001
Two-sided Pr <= P          0.0002

Sample Size = 162
Table of TAB12MDX by USFREQMO
TAB12MDX(Tobacco Dependence Past 12 Months)

TAB12MDX   USFREQMO   Frequency   Percent   Row Pct   Col Pct
0          1          64          46.04     60.95     90.14
0          22         41          29.50     39.05     60.29
0          Total      105         75.54
1          1          7           5.04      20.59     9.86
1          22         27          19.42     79.41     39.71
1          Total      34          24.46
Total      1          71          51.08
Total      22         68          48.92
Total      Total      139         100.00

Statistics for Table of TAB12MDX by USFREQMO

Statistic                     DF   Value     Prob
Chi-Square                    1    16.7459   <.0001
Likelihood Ratio Chi-Square   1    17.5739   <.0001
Continuity Adj. Chi-Square    1    15.1695   <.0001
Mantel-Haenszel Chi-Square    1    16.6254   <.0001
Phi Coefficient                    0.3471
Contingency Coefficient            0.3279
Cramer's V                         0.3471

Fisher's Exact Test
Cell (1,1) Frequency (F)   64
Left-sided Pr <= F         1.0000
Right-sided Pr >= F        <.0001
Table Probability (P)      <.0001
Two-sided Pr <= P          <.0001

Sample Size = 139
Table of TAB12MDX by USFREQMO
TAB12MDX(Tobacco Dependence Past 12 Months)

TAB12MDX   USFREQMO   Frequency   Percent   Row Pct   Col Pct
0          1          64          4.60      10.94     90.14
0          30         521         37.46     89.06     39.47
0          Total      585         42.06
1          1          7           0.50      0.87      9.86
1          30         799         57.44     99.13     60.53
1          Total      806         57.94
Total      1          71          5.10
Total      30         1320        94.90
Total      Total      1391        100.00

Statistics for Table of TAB12MDX by USFREQMO

Statistic                     DF   Value     Prob
Chi-Square                    1    70.9888   <.0001
Likelihood Ratio Chi-Square   1    76.4339   <.0001
Continuity Adj. Chi-Square    1    68.9247   <.0001
Mantel-Haenszel Chi-Square    1    70.9378   <.0001
Phi Coefficient                    0.2259
Contingency Coefficient            0.2204
Cramer's V                         0.2259

Fisher's Exact Test
Cell (1,1) Frequency (F)   64
Left-sided Pr <= F         1.0000
Right-sided Pr >= F        <.0001
Table Probability (P)      <.0001
Two-sided Pr <= P          <.0001

Sample Size = 1391
Table of TAB12MDX by USFREQMO
TAB12MDX(Tobacco Dependence Past 12 Months)

TAB12MDX   USFREQMO   Frequency   Percent   Row Pct   Col Pct
0          2.5        53          81.54     100.00    81.54
0          Total      53          81.54
1          2.5        12          18.46     100.00    18.46
1          Total      12          18.46
Total      2.5        65          100.00
Total      Total      65          100.00
Table of TAB12MDX by USFREQMO
TAB12MDX(Tobacco Dependence Past 12 Months)

TAB12MDX   USFREQMO   Frequency   Percent   Row Pct   Col Pct
0          2.5        53          33.97     47.32     81.54
0          14         59          37.82     52.68     64.84
0          Total      112         71.79
1          2.5        12          7.69      27.27     18.46
1          14         32          20.51     72.73     35.16
1          Total      44          28.21
Total      2.5        65          41.67
Total      14         91          58.33
Total      Total      156         100.00

Statistics for Table of TAB12MDX by USFREQMO

Statistic                     DF   Value    Prob
Chi-Square                    1    5.2241   0.0223
Likelihood Ratio Chi-Square   1    5.4011   0.0201
Continuity Adj. Chi-Square    1    4.4318   0.0353
Mantel-Haenszel Chi-Square    1    5.1906   0.0227
Phi Coefficient                    0.1830
Contingency Coefficient            0.1800
Cramer's V                         0.1830

Fisher's Exact Test
Cell (1,1) Frequency (F)   53
Left-sided Pr <= F         0.9939
Right-sided Pr >= F        0.0166
Table Probability (P)      0.0105
Two-sided Pr <= P          0.0299

Sample Size = 156
Table of TAB12MDX by USFREQMO
TAB12MDX(Tobacco Dependence Past 12 Months)

TAB12MDX   USFREQMO   Frequency   Percent   Row Pct   Col Pct
0          2.5        53          39.85     56.38     81.54
0          22         41          30.83     43.62     60.29
0          Total      94          70.68
1          2.5        12          9.02      30.77     18.46
1          22         27          20.30     69.23     39.71
1          Total      39          29.32
Total      2.5        65          48.87
Total      22         68          51.13
Total      Total      133         100.00

Statistics for Table of TAB12MDX by USFREQMO

Statistic                     DF   Value    Prob
Chi-Square                    1    7.2372   0.0071
Likelihood Ratio Chi-Square   1    7.3891   0.0066
Continuity Adj. Chi-Square    1    6.2484   0.0124
Mantel-Haenszel Chi-Square    1    7.1827   0.0074
Phi Coefficient                    0.2333
Contingency Coefficient            0.2272
Cramer's V                         0.2333

Fisher's Exact Test
Cell (1,1) Frequency (F)   53
Left-sided Pr <= F         0.9982
Right-sided Pr >= F        0.0059
Table Probability (P)      0.0041
Two-sided Pr <= P          0.0080

Sample Size = 133
Table of TAB12MDX by USFREQMO
TAB12MDX(Tobacco Dependence Past 12 Months)

TAB12MDX   USFREQMO   Frequency   Percent   Row Pct   Col Pct
0          2.5        53          3.83      9.23      81.54
0          30         521         37.62     90.77     39.47
0          Total      574         41.44
1          2.5        12          0.87      1.48      18.46
1          30         799         57.69     98.52     60.53
1          Total      811         58.56
Total      2.5        65          4.69
Total      30         1320        95.31
Total      Total      1385        100.00

Statistics for Table of TAB12MDX by USFREQMO

Statistic                     DF   Value     Prob
Chi-Square                    1    45.1777   <.0001
Likelihood Ratio Chi-Square   1    46.1611   <.0001
Continuity Adj. Chi-Square    1    43.4608   <.0001
Mantel-Haenszel Chi-Square    1    45.1451   <.0001
Phi Coefficient                    0.1806
Contingency Coefficient            0.1777
Cramer's V                         0.1806

Fisher's Exact Test
Cell (1,1) Frequency (F)   53
Left-sided Pr <= F         1.0000
Right-sided Pr >= F        <.0001
Table Probability (P)      <.0001
Two-sided Pr <= P          <.0001

Sample Size = 1385
Table of TAB12MDX by USFREQMO
TAB12MDX(Tobacco Dependence Past 12 Months)

TAB12MDX   USFREQMO   Frequency   Percent   Row Pct   Col Pct
0          14         59          64.84     100.00    64.84
0          Total      59          64.84
1          14         32          35.16     100.00    35.16
1          Total      32          35.16
Total      14         91          100.00
Total      Total      91          100.00

Table of TAB12MDX by USFREQMO
TAB12MDX(Tobacco Dependence Past 12 Months)

TAB12MDX   USFREQMO   Frequency   Percent   Row Pct   Col Pct
0          22         41          60.29     100.00    60.29
0          Total      41          60.29
1          22         27          39.71     100.00    39.71
1          Total      27          39.71
Total      22         68          100.00
Total      Total      68          100.00

Table of TAB12MDX by USFREQMO
TAB12MDX(Tobacco Dependence Past 12 Months)

TAB12MDX   USFREQMO   Frequency   Percent   Row Pct   Col Pct
0          30         521         39.47     100.00    39.47
0          Total      521         39.47
1          30         799         60.53     100.00    60.53
1          Total      799         60.53
Total      30         1320        100.00
Total      Total      1320        100.00
Table of TAB12MDX by USFREQMO
TAB12MDX(Tobacco Dependence Past 12 Months)

TAB12MDX   USFREQMO   Frequency   Percent   Row Pct   Col Pct
0          14         59          37.11     59.00     64.84
0          22         41          25.79     41.00     60.29
0          Total      100         62.89
1          14         32          20.13     54.24     35.16
1          22         27          16.98     45.76     39.71
1          Total      59          37.11
Total      14         91          57.23
Total      22         68          42.77
Total      Total      159         100.00

Statistics for Table of TAB12MDX by USFREQMO

Statistic                     DF   Value    Prob
Chi-Square                    1    0.3439   0.5576
Likelihood Ratio Chi-Square   1    0.3432   0.5580
Continuity Adj. Chi-Square    1    0.1768   0.6741
Mantel-Haenszel Chi-Square    1    0.3417   0.5588
Phi Coefficient                    0.0465
Contingency Coefficient            0.0465
Cramer's V                         0.0465

Fisher's Exact Test
Cell (1,1) Frequency (F)   59
Left-sided Pr <= F         0.7743
Right-sided Pr >= F        0.3365
Table Probability (P)      0.1108
Two-sided Pr <= P          0.6198

Sample Size = 159
Table of TAB12MDX by USFREQMO
TAB12MDX(Tobacco Dependence Past 12 Months)

TAB12MDX   USFREQMO   Frequency   Percent   Row Pct   Col Pct
0          14         59          4.18      10.17     64.84
0          30         521         36.92     89.83     39.47
0          Total      580         41.11
1          14         32          2.27      3.85      35.16
1          30         799         56.63     96.15     60.53
1          Total      831         58.89
Total      14         91          6.45
Total      30         1320        93.55
Total      Total      1411        100.00

Statistics for Table of TAB12MDX by USFREQMO

Statistic                     DF   Value     Prob
Chi-Square                    1    22.6255   <.0001
Likelihood Ratio Chi-Square   1    22.2336   <.0001
Continuity Adj. Chi-Square    1    21.5899   <.0001
Mantel-Haenszel Chi-Square    1    22.6095   <.0001
Phi Coefficient                    0.1266
Contingency Coefficient            0.1256
Cramer's V                         0.1266

Fisher's Exact Test
Cell (1,1) Frequency (F)   59
Left-sided Pr <= F         1.0000
Right-sided Pr >= F        <.0001
Table Probability (P)      <.0001
Two-sided Pr <= P          <.0001

Sample Size = 1411
Table of TAB12MDX by USFREQMO
TAB12MDX(Tobacco Dependence Past 12 Months)

TAB12MDX   USFREQMO   Frequency   Percent   Row Pct   Col Pct
0          22         41          2.95      7.30      60.29
0          30         521         37.54     92.70     39.47
0          Total      562         40.49
1          22         27          1.95      3.27      39.71
1          30         799         57.56     96.73     60.53
1          Total      826         59.51
Total      22         68          4.90
Total      30         1320        95.10
Total      Total      1388        100.00

Statistics for Table of TAB12MDX by USFREQMO

Statistic                     DF   Value     Prob
Chi-Square                    1    11.6386   0.0006
Likelihood Ratio Chi-Square   1    11.3718   0.0007
Continuity Adj. Chi-Square    1    10.7904   0.0010
Mantel-Haenszel Chi-Square    1    11.6302   0.0006
Phi Coefficient                    0.0916
Contingency Coefficient            0.0912
Cramer's V                         0.0916

Fisher's Exact Test
Cell (1,1) Frequency (F)   41
Left-sided Pr <= F         0.9998
Right-sided Pr >= F        0.0006
Table Probability (P)      0.0003
Two-sided Pr <= P          0.0009

Sample Size = 1388
To determine which groups are different from the others, we will again need to perform a post hoc test. By conducting post hoc comparisons between pairs of rates in a way that avoids excessive type 1 error (in other words, avoids rejecting the null hypothesis when the null hypothesis is true), we will be much better able to describe which population rates are different from the others.
If we reject the null hypothesis, we need to perform comparisons for each pair of nicotine dependence rates across the six smoking frequency categories. In the case of 6 groups, we actually need to perform 15 pairwise comparisons.
The goal of using the Bonferroni Adjustment is to control the family-wise error rate, also known as the maximum overall type 1 error rate, so that we can evaluate which pairs of nicotine dependence rates are different from one another. Briefly, the process is to conduct each of the 15 paired comparisons, but rather than evaluating significance at the p = .05 level, we adjust the p value to make it more difficult to reject the null hypothesis. The adjusted p value is calculated by dividing .05 by the number of comparisons that we plan to make. So, if we make 3 comparisons, we would reject the null hypothesis only if the p value were .017 or less. For the 15 paired comparisons that we plan to make to better understand the association between smoking frequency and nicotine dependence, our adjusted p value is .003.
Adjusting the p value is definitely the easy part of the process. Now, for the more challenging piece. For the actual post hoc testing, we need to run a Chi-Square test for each of the 15 paired comparisons. To do this, I can add syntax to my program at the end of the data step, just before the PROC SORT statement, where I choose two smoking frequency groups at a time. So, I'm going to start by comparing usual smoking frequency per month group 1 and usual smoking frequency per month group 2.5. If I save and run this program, I get a new Chi-Square table that includes only those two frequency groups by the presence or absence of nicotine dependence.
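
The Bonferroni arithmetic described above can be sketched in a few lines of Python. This is only an illustration of the calculation, and the function name bonferroni_alpha is a hypothetical helper rather than part of the SAS program.

```python
from math import comb

def bonferroni_alpha(n_groups, alpha=0.05):
    """Return (number of pairwise comparisons, Bonferroni-adjusted alpha)."""
    n_comparisons = comb(n_groups, 2)  # every unordered pair of groups
    return n_comparisons, alpha / n_comparisons

# Three groups give 3 comparisons; six groups give 15.
n_pairs_3, alpha_3 = bonferroni_alpha(3)
n_pairs_6, alpha_6 = bonferroni_alpha(6)

print(f"3 groups: {n_pairs_3} comparisons, adjusted p = {alpha_3:.3f}")
print(f"6 groups: {n_pairs_6} comparisons, adjusted p = {alpha_6:.3f}")
```

The number of comparisons grows quickly with the number of groups, which is why the adjusted threshold for six groups is so much stricter than .05.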
Again, I want to focus here on the column percentages, 9.86 and 18.46. Are these two rates significantly different from one another? If I look down at my Chi-Square value and probability, I can see that they aren't.
So, I fail to reject the null hypothesis, since this probability value is not less than 0.05, and it is definitely not less than my Bonferroni Adjusted p value of .003. This is just the first step in our post hoc analysis. Now, we need to run two by two Chi-Squares for each of the remaining 14 paired comparisons. To do this, we could continue one comparison at a time, choosing two frequency groups by adding syntax to the program at the end of the data step, just before the PROC SORT statement. Here we use the syntax requesting a Chi-Square analysis comparing those smoking one day per month and those smoking six days per month.
From our code, we can see that only smoking frequency groups equal to 1 and equal to 6 are included in the Chi-Square table and analysis.
The nicotine dependence rates are 9.86 and 21.59. The p value associated with the Chi-Square statistic is .0468. Initially, we might want to say that this is a significant finding because it's less than a p value of .05. Remember, though, that the adjusted p value for these comparisons is .003. So, this is not significant.
Going back to the graph of nicotine dependence rates, we now know that frequency group equal to 1 and equal to 6, do not have significantly different rates of nicotine dependence.
This can get pretty tedious and, of course, it would be very easy to get a bit confused and to skip one or more paired comparisons. Let's look at how to use syntax that you already know to do this systematically, and to get results for all of the paired comparisons simultaneously.
First, we'll get rid of the lines of syntax that subset our data to specific smoking frequency groups. Now, we'll create a series of new data sets and associated Chi-Square tables that call in the original working data set and select each of the various combinations of smoking frequency groups. We've just added quite a bit of syntax to this program.
You'll see however, that it's repetitive, and that in each group of syntax, only the logic statements that select the specific frequency groups and the working data set name need to change.
Following the data step and the request for output that we've been working with, we're going to add a new data step and call it COMPARISON1.
We set the original working data set, which was called NEW above, and then subset the data to the two smoking frequency categories, smoking frequency categories 1 and 2.5. We end this data step by sorting by the unique identifier and requesting a Chi-Square analysis with a PROC FREQ procedure. Then we end this new syntax group with a RUN statement.
Next, we repeat those same lines of syntax for each of the remaining 14 paired comparisons, changing only the name of the new working data set and the selected smoking frequency groups.
Here, we call the data COMPARISON2, and select smoking frequency group 1 compared to smoking frequency group 6. Here, the data is COMPARISON3, and we're comparing smoking frequency group 1 to smoking frequency group 14. COMPARISON4 is comparing smoking frequency group 1 and smoking frequency group 22. We see these lines of syntax repeated with new data names and the additional paired comparisons for the smoking frequency groups, all the way down to COMPARISON15, smoking frequency group 22 versus smoking frequency group 30. When we run this program, the results include the overall Chi-Square table, that is, the six-level smoking frequency variable by the nicotine dependence response variable, and then Chi-Square tables for each of the paired comparisons: 1 versus 2.5, 1 versus 6, 1 versus 14, 1 versus 22, 1 versus 30, 2.5 versus 6, and so on.
The goal is to examine the p value for each of the paired comparisons, and to use the adjusted Bonferroni p value of .003 to evaluate significance. Here, we've created a table that shows the p values of each of the paired comparisons from the output. Obviously, there are several that are less than p = .05. Here are the p values that are less than .003. As we can see, smoking frequency group 30, that is, those who smoke 30 days in a usual month, is significantly different from each of the other smoking frequency levels. In addition, smoking frequency group 1 has significantly different nicotine dependence rates than smoking frequency groups 14 and 22.
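
As a cross-check on the post hoc procedure, here is a minimal Python sketch, separate from the SAS program, that runs all 15 pairwise chi-square tests from the cell counts in the overall six-level table and applies the Bonferroni threshold. The level labels are copied from that table's column headings (the third level appears there as 5), and the helper name chi_square_p is an illustrative assumption.

```python
import math
from itertools import combinations

# (not dependent, dependent) counts per USFREQMO level, from the overall table.
counts = {
    "1": (64, 7),
    "2.5": (53, 12),
    "5": (69, 19),
    "14": (59, 32),
    "22": (41, 27),
    "30": (521, 799),
}

def chi_square_p(group_a, group_b):
    """p-value of the Pearson chi-square test (1 df) comparing two groups."""
    (a, c), (b, d) = group_a, group_b  # 2x2 table: rows (a, b) and (c, d)
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper-tail probability of a chi-square variable with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2))

pairs = list(combinations(counts, 2))
adjusted_alpha = 0.05 / len(pairs)  # Bonferroni: .05 / 15

p_values = {pair: chi_square_p(counts[pair[0]], counts[pair[1]]) for pair in pairs}
significant = [pair for pair, p in p_values.items() if p < adjusted_alpha]

for pair, p in p_values.items():
    flag = "significant" if p < adjusted_alpha else "not significant"
    print(f"{pair[0]} vs {pair[1]}: p = {p:.4f} ({flag})")
```

Generating the pairs with combinations, rather than typing each subset by hand, removes the risk of skipping or mislabeling a comparison.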
Class Level Information
Class          Levels   Values
MAJORDEPLIFE   2        0 1

Number of Observations Read   1706
Number of Observations Used   1697

Source            DF     Sum of Squares   Mean Square   F Value   Pr > F
Model             1      266974.1         266974.1      3.55      0.0597
Error             1695   127468189.9      75202.5
Corrected Total   1696   127735164.0

R-Square   Coeff Var   Root MSE   NUMCIGMO_EST Mean
0.002090   85.61566    274.2307   320.3044

Source         DF   Anova SS      Mean Square   F Value   Pr > F
MAJORDEPLIFE   1    266974.1235   266974.1235   3.55      0.0597

Level of MAJORDEPLIFE   N      NUMCIGMO_EST Mean   NUMCIGMO_EST Std Dev
0                       1253   312.837989          269.002344
1                       444    341.375000          288.495118
Class Level Information
Class       Levels   Values
ETHRACE2A   5        1 2 3 4 5

Number of Observations Read   1706
Number of Observations Used   1697

Source            DF     Sum of Squares   Mean Square   F Value   Pr > F
Model             4      6965167.9        1741292.0     24.40     <.0001
Error             1692   120769996.1      71377.1
Corrected Total   1696   127735164.0

R-Square   Coeff Var   Root MSE   NUMCIGMO_EST Mean
0.054528   83.40969    267.1649   320.3044

Source      DF   Anova SS      Mean Square   F Value   Pr > F
ETHRACE2A   4    6965167.919   1741291.980   24.40     <.0001

Note: This test controls the Type I comparisonwise error rate, not the experimentwise error rate.

Alpha                         0.05
Error Degrees of Freedom      1692
Error Mean Square             71377.07
Harmonic Mean of Cell Sizes   100.4731

Note: Cell sizes are not equal.

Number of Means   2       3       4       5
Critical Range    73.93   77.84   80.46   82.39
PROC ANOVA first displays a table that includes the following: the name of the variable in the CLASS statement, the number of different values or levels of the class variable, the values of the class variable, the number of observations in the data set, and the number of observations excluded from the analysis because of missing data.
So here we see that our categorical explanatory variable, MAJORDEPLIFE, has two levels.
The values are 0 and 1. Of the 1706 observations, 1697 were included in the analysis.
PROC ANOVA then displays an analysis of variance table for the response variable, also known as the dependent variable, from the MODEL statement. In this case, our response or dependent variable is NUMCIGMO_EST.
Our calculated F statistic, called the F Value in this output, is 3.55. The significance probability, or P value, associated with this F statistic is labeled Pr > F. And as you can see, the P value is .0597, just over our P value cut point of .05. If we look at the means table, we see that young adult smokers without major depression, as indicated by a value of 0, smoke an average of about 313 cigarettes per month, and that those with major depression, indicated by a value of 1, smoke on average about 341 cigarettes per month. Because the P value is greater than 0.05, actually about 0.06, we fail to reject the null hypothesis: we do not have evidence that these means differ, or that there is an association between the presence or absence of major depression and the number of cigarettes smoked per month among young adult smokers. If I chose to reject the null hypothesis, I would be wrong about six out of 100 times. And again, by normal scientific standards, this is not adequate certainty to reject the null hypothesis and say that there is an association.
Instead, we fail to reject the null hypothesis and conclude that we have no evidence of an association. Had the P value been less than .05, I would know that there was a significant association, and to interpret it I would look at the means table. There I can see that individuals with major depression smoke more than individuals without. With a significant P value, I could have said that young adult smokers with major depression smoke significantly more cigarettes per month than young adult smokers without major depression. So, we've shown you the ropes in terms of a categorical explanatory variable that has two levels, as it did here with depression. For this interpretation, all we need to know is the P value and the means for each of the two groups.
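
To see where the F value in the ANOVA table comes from, here is a short Python sketch that rebuilds it from the group sizes, means, and standard deviations printed in the means table. It is an illustration under the assumption that those summary statistics are sufficient (they are for a one-way ANOVA), and the function name one_way_anova is hypothetical.

```python
# (N, mean, standard deviation) of NUMCIGMO_EST for each MAJORDEPLIFE level,
# copied from the means table in the PROC ANOVA output.
groups = {
    0: (1253, 312.837989, 269.002344),
    1: (444, 341.375000, 288.495118),
}

def one_way_anova(groups):
    """One-way ANOVA F value and R-square from per-group summary statistics."""
    total_n = sum(n for n, _, _ in groups.values())
    grand_mean = sum(n * mean for n, mean, _ in groups.values()) / total_n
    # Between-group (model) and within-group (error) sums of squares.
    ss_between = sum(n * (mean - grand_mean) ** 2 for n, mean, _ in groups.values())
    ss_within = sum((n - 1) * sd ** 2 for n, _, sd in groups.values())
    df_between = len(groups) - 1
    df_within = total_n - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    r_square = ss_between / (ss_between + ss_within)
    return f_value, r_square, df_between, df_within

f_value, r_square, df_between, df_within = one_way_anova(groups)
print(f"F({df_between}, {df_within}) = {f_value:.2f}, R-square = {r_square:.6f}")
```

The very small R-square shows the same thing as the non-significant P value: depression status accounts for almost none of the variability in cigarettes smoked per month.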