waitarajo-blog - Tumblr blog

waitarajo-blog · 5 years ago

Exploring Statistical Interactions

Table of TAB12MDX by USQUAN
TAB12MDX = Tobacco Dependence Past 12 Months
Cell contents: Frequency / Percent / Row Pct / Col Pct

TAB12MDX   USQUAN (1st level)             USQUAN (2nd level)          Total
0          630 / 50.00 / 91.84 / 52.72    56 / 4.44 / 8.16 / 86.15    686 / 54.44
1          565 / 44.84 / 98.43 / 47.28    9 / 0.71 / 1.57 / 13.85     574 / 45.56
Total      1195 / 94.84                   65 / 5.16                   1260 / 100.00

Statistics for Table of TAB12MDX by USQUAN

Statistic                      Value      Prob
Chi-Square                     27.7842    <.0001
Likelihood Ratio Chi-Square    31.3968    <.0001
Continuity Adj. Chi-Square     26.4525    <.0001
Mantel-Haenszel Chi-Square     27.7621    <.0001
Phi Coefficient                -0.1485
Contingency Coefficient        0.1469
Cramer's V                     -0.1485

Fisher's Exact Test
Cell (1,1) Frequency (F)       630
Left-sided Pr <= F             <.0001
Right-sided Pr >= F            1.0000
Table Probability (P)          <.0001
Two-sided Pr <= P              <.0001

Sample Size = 1260

Table of TAB12MDX by USQUAN
TAB12MDX = Tobacco Dependence Past 12 Months
Cell contents: Frequency / Percent / Row Pct / Col Pct

TAB12MDX   USQUAN (1st level)             USQUAN (2nd level)           Total
0          110 / 24.66 / 88.71 / 25.70    14 / 3.14 / 11.29 / 77.78    124 / 27.80
1          318 / 71.30 / 98.76 / 74.30    4 / 0.90 / 1.24 / 22.22      322 / 72.20
Total      428 / 95.96                    18 / 4.04                    446 / 100.00

Statistics for Table of TAB12MDX by USQUAN

Statistic                      Value      Prob
Chi-Square                     23.3380    <.0001
Likelihood Ratio Chi-Square    20.3350    <.0001
Continuity Adj. Chi-Square     20.8157    <.0001
Mantel-Haenszel Chi-Square     23.2856    <.0001
Phi Coefficient                -0.2288
Contingency Coefficient        0.2230
Cramer's V                     -0.2288

Fisher's Exact Test
Cell (1,1) Frequency (F)       110
Left-sided Pr <= F             <.0001
Right-sided Pr >= F            1.0000
Table Probability (P)          <.0001
Two-sided Pr <= P              <.0001

Sample Size = 446

2 Variables: urbanrate internetuserate

Simple Statistics

Variable          Mean       Std Dev    Sum         Minimum    Maximum    Label
urbanrate         33.50553   12.83453   1575        10.40000   66.60000   urbanrate
internetuserate   8.22063    8.70764    369.92856   0.21007    40.12223   INTERNETUSERATE

Pearson Correlation Coefficients
Prob > |r| under H0: Rho=0
Number of Observations

                  urbanrate                     internetuserate
urbanrate         1.00000 (n=47)                0.11558 (p=0.4496, n=45)
internetuserate   0.11558 (p=0.4496, n=45)      1.00000 (n=45)

2 Variables: urbanrate internetuserate

Simple Statistics

Variable          Mean       Std Dev    Sum    Minimum    Maximum    Label
urbanrate         56.58000   18.64588   5319   12.54000   93.32000   urbanrate
internetuserate   30.76527   19.76304   2830   1.28005    79.88978   INTERNETUSERATE

Pearson Correlation Coefficients
Prob > |r| under H0: Rho=0
Number of Observations

                  urbanrate                     internetuserate
urbanrate         1.00000 (n=94)                0.32516 (p=0.0017, n=91)
internetuserate   0.32516 (p=0.0017, n=91)      1.00000 (n=92)

2 Variables: urbanrate internetuserate

Simple Statistics

Variable          Mean       Std Dev    Sum    Minimum    Maximum     Label
urbanrate         78.20417   19.82159   3754   13.22000   100.00000   urbanrate
internetuserate   70.58821   15.94746   3247   36.00033   95.63811    INTERNETUSERATE

Pearson Correlation Coefficients
Prob > |r| under H0: Rho=0
Number of Observations

                  urbanrate                     internetuserate
urbanrate         1.00000 (n=48)                0.07514 (p=0.6197, n=46)
internetuserate   0.07514 (p=0.6197, n=46)      1.00000 (n=46)

Now, let's evaluate third variables as potential moderators in the context of the chi-square test of independence. For this, we're going to return to our original SAS program using the NESARC data and ask the question: is smoking associated with nicotine dependence? We're going to create another smoking variable for this purpose, reflecting how many cigarettes each young adult smoker smokes per day. 0 indicates non-daily smokers, 3 indicates those smoking 1 to 5 cigarettes per day, 8 indicates 6 to 10 cigarettes per day, 13 indicates 11 to 15 cigarettes per day, 18 indicates 16 to 20 cigarettes per day, and 37 indicates more than 20 cigarettes per day.

In the graphing syntax, USQUAN/discrete tells SAS that we want the levels of our categorical explanatory variable to be represented on the x-axis, with TAB12MDX displayed as a mean on the y-axis. This gives us a graphic representation of the positive linear relationship: as smoking quantity increases, so does the proportion of individuals with nicotine dependence. This finding is accurate with regard to the larger population of young adult smokers.

But might a third variable moderate the relationship between smoking quantity and nicotine dependence? Put another way, might there be a statistical interaction between a third variable and smoking behavior in predicting our response variable, nicotine dependence? We're going to evaluate major depressive disorder as the third variable. Our question will be: does major depression affect either the strength or the direction of the relationship between smoking and nicotine dependence? Is smoking related to nicotine dependence for each level of this third variable, that is, for those with major depression and those without?

As in our ANOVA example, a few lines of syntax are added to the PROC FREQ code. We first need to sort the data according to the categorical third variable, then include a BY statement telling SAS to run a chi-square test for each level of the third variable separately. The specific syntax for this example is PROC SORT; BY MAJORDEPLIFE; PROC FREQ; TABLES TAB12MDX*USQUAN/CHISQ; BY MAJORDEPLIFE;.

When this syntax is added to the SAS program, here are the results. You can see the cross tabulation table, looking at usual quantity by tobacco dependence in the past 12 months. First, for major depression equal to 0, that is, those without major depression, the chi-square value is large and the p value is quite small. In addition, the column percents reveal what seems to be a positive linear relationship, with percentages of nicotine dependence increasing between lower levels of smoking and higher levels of smoking. So we can say that this is a statistically significant relationship for those without major depression.

For those with major depression, we also find a large chi-square value and a small p value, which is statistically significant. These column percents also reveal what seems to be a positive linear relationship, with percentages of nicotine dependence increasing between lower levels of smoking and higher levels of smoking.

Using a line graph to examine the rates of nicotine dependence by different levels of smoking, it seems that both the direction and the size of the relationship between smoking and nicotine dependence are similar for those with major depression and for those without, although those with major depression show higher rates of nicotine dependence at every level of smoking quantity. In this case, we would say a diagnosis of major depression does not moderate the relationship between smoking and nicotine dependence. For both young adult smokers with major depression and those without, higher levels of smoking behavior are associated with higher rates of nicotine dependence.
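
As a rough sketch, the syntax described in this post could be laid out as follows. The PROC SORT and PROC FREQ lines are the ones quoted in the text. The PROC GCHART step is an assumption: the text only mentions USQUAN/discrete and a mean on the y-axis, so the TYPE=MEAN and SUMVAR= options here are one plausible way to get that graph, not necessarily what the original program used. Both steps assume the working data set is the most recently created one.

```sas
/* Assumed graphing step: one bar per USQUAN level,                */
/* bar height = mean of TAB12MDX (the proportion with dependence). */
PROC GCHART; VBAR USQUAN/DISCRETE TYPE=MEAN SUMVAR=TAB12MDX;
RUN;

/* Sort by the third variable, then run one chi-square test */
/* per level of MAJORDEPLIFE (0 = no major depression, 1 = major depression). */
PROC SORT; BY MAJORDEPLIFE;
PROC FREQ; TABLES TAB12MDX*USQUAN/CHISQ;
BY MAJORDEPLIFE;
RUN;
```

The BY statement in PROC FREQ only works on data already sorted by the same variable, which is why the PROC SORT step comes first.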

waitarajo-blog · 5 years ago

3 Variables: urbanrate incomeperperson internetuserate

Simple Statistics

Variable          N     Mean       Std Dev    Sum       Minimum     Maximum     Label
urbanrate         203   56.76936   23.84493   11524     10.40000    100.00000   urbanrate
incomeperperson   190   8741       14263      1660784   103.77586   105147      INCOMEPERPERSON
internetuserate   192   35.63272   27.78028   6841      0.21007     95.63811    INTERNETUSERATE

Pearson Correlation Coefficients
Prob > |r| under H0: Rho=0
Number of Observations

                  urbanrate                    incomeperperson              internetuserate
urbanrate         1.00000 (n=203)              0.49009 (p<.0001, n=189)     0.61395 (p<.0001, n=190)
incomeperperson   0.49009 (p<.0001, n=189)     1.00000 (n=190)              0.75094 (p<.0001, n=183)
internetuserate   0.61395 (p<.0001, n=190)     0.75094 (p<.0001, n=183)     1.00000 (n=192)

We're going to be looking at two different correlations, that between Internet use and urban rate, and between Internet use and income per person. We can actually list these together.
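
A minimal sketch of how both correlations can be requested in a single step. The variable list matches the "3 Variables" heading in the output above; the data set is assumed to be the most recently created one, since the original program is not shown in this post.

```sas
/* One PROC CORR call returns every pairwise Pearson correlation */
/* among the listed variables, with p values and pairwise n.     */
PROC CORR; VAR urbanrate incomeperperson internetuserate;
RUN;
```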

To locate the correlation coefficients of interest and the associated p values, we need to examine the Pearson Correlation Coefficient table here, and find the row and column where our two variables of interest intersect.

For the association between urbanrate and internetuserate, the correlation coefficient is approximately 0.61 with a p-value of less than 0.0001. This tells us that the relationship is statistically significant.

For the association between incomeperperson and internetuserate, the correlation coefficient is approximately 0.75 and also has a significant p-value.

Now we can actually interpret the scatter plots and the coefficients together.

The association between internetuserate and income is fairly strong and it's also positive, as the scatter plot had already shown us. The association between internetuserate and urbanrate is also positive but slightly more modest at 0.61. Both are statistically significant. That is, for both associations, it's highly unlikely that a relationship of this magnitude would be due to chance alone.

Here's some good news. Post hoc tests are not necessary when conducting a Pearson correlation. Post hoc tests are needed only when your research question includes a categorical explanatory variable with more than two levels. Because our explanatory variable in the context of the correlation coefficient is quantitative, there's never a need to perform a post hoc test.

Another interesting and useful aspect of the correlation coefficient is what happens if we square it. That is, if we multiply it by itself, we get a value that also helps our understanding of the association between the two quantitative variables.

Small r squared is the fraction of the variability of one variable that can be predicted by the other. For example, when looking at the relationship between urban rate and Internet use rate, if we square our correlation coefficient of 0.61, we get 0.37. This could be interpreted the following way. If we know the urban rate, we can predict 37% of the variability we will see in the rate of Internet use. Of course, that also means that 63% of the variability is unaccounted for.

If we square the correlation coefficient for income per person and Internet use rate we get a value of 0.56. This suggests, if we know income per person, we can predict 56% of the variability we'll see in the rate of Internet use.

This is a little bit more impressive, because we can predict over half the variability.
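
The squaring described above is easy to check with a throwaway DATA step. This is only an illustrative sketch using the rounded coefficients quoted in the text (0.61 and 0.75); the variable names are made up for the example. Using the unrounded 0.61395 from the output instead gives roughly 0.377 for urban rate.

```sas
DATA _NULL_;
  r_urban  = 0.61;               /* urbanrate vs internetuserate       */
  r_income = 0.75;               /* incomeperperson vs internetuserate */
  r2_urban  = r_urban**2;        /* 0.3721, about 37% of variability   */
  r2_income = r_income**2;       /* 0.5625, about 56% of variability   */
  PUT r2_urban= r2_income=;      /* written to the SAS log             */
RUN;
```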

Again, correlation coefficients are commonly denoted with a lowercase r, and they're squared to determine the amount of variability that can be predicted.

You might be wondering how much variability in Internet use rates can be predicted if we consider both urban rate and income per person. A multivariate inferential tool called multiple regression can be used to answer this question, and we'll discuss that in the future.

waitarajo-blog · 5 years ago

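Table of TAB12MDX by USFREQMO
TAB12MDX = Tobacco Dependence Past 12 Months
Cell contents: Frequency / Percent / Row Pct / Col Pct

TAB12MDX = 0
  USFREQMO = 1      64 / 3.76 / 7.93 / 90.14
  USFREQMO = 2.5    53 / 3.11 / 6.57 / 81.54
  USFREQMO = 6      69 / 4.05 / 8.55 / 78.41
  USFREQMO = 14     59 / 3.46 / 7.31 / 64.84
  USFREQMO = 22     41 / 2.41 / 5.08 / 60.29
  USFREQMO = 30     521 / 30.59 / 64.56 / 39.47
  Total             807 / 47.39

TAB12MDX = 1
  USFREQMO = 1      7 / 0.41 / 0.78 / 9.86
  USFREQMO = 2.5    12 / 0.70 / 1.34 / 18.46
  USFREQMO = 6      19 / 1.12 / 2.12 / 21.59
  USFREQMO = 14     32 / 1.88 / 3.57 / 35.16
  USFREQMO = 22     27 / 1.59 / 3.01 / 39.71
  USFREQMO = 30     799 / 46.92 / 89.17 / 60.53
  Total             896 / 52.61

Column totals (Frequency / Percent)
  USFREQMO = 1      71 / 4.17
  USFREQMO = 2.5    65 / 3.82
  USFREQMO = 6      88 / 5.17
  USFREQMO = 14     91 / 5.34
  USFREQMO = 22     68 / 3.99
  USFREQMO = 30     1320 / 77.51
  Total             1703 / 100.00

Frequency Missing = 3

Statistics for Table of TAB12MDX by USFREQMO

Statistic                      Value       Prob
Chi-Square                     165.2732    <.0001
Likelihood Ratio Chi-Square    176.1834    <.0001
Mantel-Haenszel Chi-Square     162.8952    <.0001
Phi Coefficient                0.3115
Contingency Coefficient        0.2974
Cramer's V                     0.3115

Sample Size = 1703
Frequency Missing = 3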

Table of TAB12MDX by USFREQMO
TAB12MDX = Tobacco Dependence Past 12 Months
Cell contents: Frequency / Percent / Row Pct / Col Pct

TAB12MDX   USFREQMO = 1                  USFREQMO = 2.5                Total
0          64 / 47.06 / 54.70 / 90.14    53 / 38.97 / 45.30 / 81.54    117 / 86.03
1          7 / 5.15 / 36.84 / 9.86       12 / 8.82 / 63.16 / 18.46     19 / 13.97
Total      71 / 52.21                    65 / 47.79                    136 / 100.00

Statistics for Table of TAB12MDX by USFREQMO

Statistic                      Value     Prob
Chi-Square                     2.0893    0.1483
Likelihood Ratio Chi-Square    2.1023    0.1471
Continuity Adj. Chi-Square     1.4349    0.2310
Mantel-Haenszel Chi-Square     2.0740    0.1498
Phi Coefficient                0.1239
Contingency Coefficient        0.1230
Cramer's V                     0.1239

Fisher's Exact Test
Cell (1,1) Frequency (F)       64
Left-sided Pr <= F             0.9553
Right-sided Pr >= F            0.1154
Table Probability (P)          0.0707
Two-sided Pr <= P              0.2153

Sample Size = 136

Table of TAB12MDX by USFREQMO
TAB12MDX = Tobacco Dependence Past 12 Months
Cell contents: Frequency / Percent / Row Pct / Col Pct

TAB12MDX   USFREQMO = 1                   Total
0          64 / 90.14 / 100.00 / 90.14    64 / 90.14
1          7 / 9.86 / 100.00 / 9.86       7 / 9.86
Total      71 / 100.00                    71 / 100.00

Table of TAB12MDX by USFREQMO
TAB12MDX = Tobacco Dependence Past 12 Months
Cell contents: Frequency / Percent / Row Pct / Col Pct

TAB12MDX   USFREQMO = 1                  USFREQMO = 14                 Total
0          64 / 39.51 / 52.03 / 90.14    59 / 36.42 / 47.97 / 64.84    123 / 75.93
1          7 / 4.32 / 17.95 / 9.86       32 / 19.75 / 82.05 / 35.16    39 / 24.07
Total      71 / 43.83                    91 / 56.17                    162 / 100.00

Statistics for Table of TAB12MDX by USFREQMO

Statistic                      Value      Prob
Chi-Square                     13.9727    0.0002
Likelihood Ratio Chi-Square    15.0854    0.0001
Continuity Adj. Chi-Square     12.6226    0.0004
Mantel-Haenszel Chi-Square     13.8865    0.0002
Phi Coefficient                0.2937
Contingency Coefficient        0.2818
Cramer's V                     0.2937

Fisher's Exact Test
Cell (1,1) Frequency (F)       64
Left-sided Pr <= F             1.0000
Right-sided Pr >= F            0.0001
Table Probability (P)          <.0001
Two-sided Pr <= P              0.0002

Sample Size = 162

Table of TAB12MDX by USFREQMO
TAB12MDX = Tobacco Dependence Past 12 Months
Cell contents: Frequency / Percent / Row Pct / Col Pct

TAB12MDX   USFREQMO = 1                  USFREQMO = 22                 Total
0          64 / 46.04 / 60.95 / 90.14    41 / 29.50 / 39.05 / 60.29    105 / 75.54
1          7 / 5.04 / 20.59 / 9.86       27 / 19.42 / 79.41 / 39.71    34 / 24.46
Total      71 / 51.08                    68 / 48.92                    139 / 100.00

Statistics for Table of TAB12MDX by USFREQMO

Statistic                      Value      Prob
Chi-Square                     16.7459    <.0001
Likelihood Ratio Chi-Square    17.5739    <.0001
Continuity Adj. Chi-Square     15.1695    <.0001
Mantel-Haenszel Chi-Square     16.6254    <.0001
Phi Coefficient                0.3471
Contingency Coefficient        0.3279
Cramer's V                     0.3471

Fisher's Exact Test
Cell (1,1) Frequency (F)       64
Left-sided Pr <= F             1.0000
Right-sided Pr >= F            <.0001
Table Probability (P)          <.0001
Two-sided Pr <= P              <.0001

Sample Size = 139

Table of TAB12MDX by USFREQMO
TAB12MDX = Tobacco Dependence Past 12 Months
Cell contents: Frequency / Percent / Row Pct / Col Pct

TAB12MDX   USFREQMO = 1                 USFREQMO = 30                  Total
0          64 / 4.60 / 10.94 / 90.14    521 / 37.46 / 89.06 / 39.47    585 / 42.06
1          7 / 0.50 / 0.87 / 9.86       799 / 57.44 / 99.13 / 60.53    806 / 57.94
Total      71 / 5.10                    1320 / 94.90                   1391 / 100.00

Statistics for Table of TAB12MDX by USFREQMO

Statistic                      Value      Prob
Chi-Square                     70.9888    <.0001
Likelihood Ratio Chi-Square    76.4339    <.0001
Continuity Adj. Chi-Square     68.9247    <.0001
Mantel-Haenszel Chi-Square     70.9378    <.0001
Phi Coefficient                0.2259
Contingency Coefficient        0.2204
Cramer's V                     0.2259

Fisher's Exact Test
Cell (1,1) Frequency (F)       64
Left-sided Pr <= F             1.0000
Right-sided Pr >= F            <.0001
Table Probability (P)          <.0001
Two-sided Pr <= P              <.0001

Sample Size = 1391

Table of TAB12MDX by USFREQMO
TAB12MDX = Tobacco Dependence Past 12 Months
Cell contents: Frequency / Percent / Row Pct / Col Pct

TAB12MDX   USFREQMO = 2.5                 Total
0          53 / 81.54 / 100.00 / 81.54    53 / 81.54
1          12 / 18.46 / 100.00 / 18.46    12 / 18.46
Total      65 / 100.00                    65 / 100.00

Table of TAB12MDX by USFREQMO
TAB12MDX = Tobacco Dependence Past 12 Months
Cell contents: Frequency / Percent / Row Pct / Col Pct

TAB12MDX   USFREQMO = 2.5                USFREQMO = 14                 Total
0          53 / 33.97 / 47.32 / 81.54    59 / 37.82 / 52.68 / 64.84    112 / 71.79
1          12 / 7.69 / 27.27 / 18.46     32 / 20.51 / 72.73 / 35.16    44 / 28.21
Total      65 / 41.67                    91 / 58.33                    156 / 100.00

Statistics for Table of TAB12MDX by USFREQMO

Statistic                      Value     Prob
Chi-Square                     5.2241    0.0223
Likelihood Ratio Chi-Square    5.4011    0.0201
Continuity Adj. Chi-Square     4.4318    0.0353
Mantel-Haenszel Chi-Square     5.1906    0.0227
Phi Coefficient                0.1830
Contingency Coefficient        0.1800
Cramer's V                     0.1830

Fisher's Exact Test
Cell (1,1) Frequency (F)       53
Left-sided Pr <= F             0.9939
Right-sided Pr >= F            0.0166
Table Probability (P)          0.0105
Two-sided Pr <= P              0.0299

Sample Size = 156

Table of TAB12MDX by USFREQMO
TAB12MDX = Tobacco Dependence Past 12 Months
Cell contents: Frequency / Percent / Row Pct / Col Pct

TAB12MDX   USFREQMO = 2.5                USFREQMO = 22                 Total
0          53 / 39.85 / 56.38 / 81.54    41 / 30.83 / 43.62 / 60.29    94 / 70.68
1          12 / 9.02 / 30.77 / 18.46     27 / 20.30 / 69.23 / 39.71    39 / 29.32
Total      65 / 48.87                    68 / 51.13                    133 / 100.00

Statistics for Table of TAB12MDX by USFREQMO

Statistic                      Value     Prob
Chi-Square                     7.2372    0.0071
Likelihood Ratio Chi-Square    7.3891    0.0066
Continuity Adj. Chi-Square     6.2484    0.0124
Mantel-Haenszel Chi-Square     7.1827    0.0074
Phi Coefficient                0.2333
Contingency Coefficient        0.2272
Cramer's V                     0.2333

Fisher's Exact Test
Cell (1,1) Frequency (F)       53
Left-sided Pr <= F             0.9982
Right-sided Pr >= F            0.0059
Table Probability (P)          0.0041
Two-sided Pr <= P              0.0080

Sample Size = 133

Table of TAB12MDX by USFREQMO
TAB12MDX = Tobacco Dependence Past 12 Months
Cell contents: Frequency / Percent / Row Pct / Col Pct

TAB12MDX   USFREQMO = 2.5               USFREQMO = 30                  Total
0          53 / 3.83 / 9.23 / 81.54     521 / 37.62 / 90.77 / 39.47    574 / 41.44
1          12 / 0.87 / 1.48 / 18.46     799 / 57.69 / 98.52 / 60.53    811 / 58.56
Total      65 / 4.69                    1320 / 95.31                   1385 / 100.00

Statistics for Table of TAB12MDX by USFREQMO

Statistic                      Value      Prob
Chi-Square                     45.1777    <.0001
Likelihood Ratio Chi-Square    46.1611    <.0001
Continuity Adj. Chi-Square     43.4608    <.0001
Mantel-Haenszel Chi-Square     45.1451    <.0001
Phi Coefficient                0.1806
Contingency Coefficient        0.1777
Cramer's V                     0.1806

Fisher's Exact Test
Cell (1,1) Frequency (F)       53
Left-sided Pr <= F             1.0000
Right-sided Pr >= F            <.0001
Table Probability (P)          <.0001
Two-sided Pr <= P              <.0001

Sample Size = 1385

Table of TAB12MDX by USFREQMO
TAB12MDX = Tobacco Dependence Past 12 Months
Cell contents: Frequency / Percent / Row Pct / Col Pct

TAB12MDX   USFREQMO = 14                  Total
0          59 / 64.84 / 100.00 / 64.84    59 / 64.84
1          32 / 35.16 / 100.00 / 35.16    32 / 35.16
Total      91 / 100.00                    91 / 100.00

Table of TAB12MDX by USFREQMO
TAB12MDX = Tobacco Dependence Past 12 Months
Cell contents: Frequency / Percent / Row Pct / Col Pct

TAB12MDX   USFREQMO = 22                  Total
0          41 / 60.29 / 100.00 / 60.29    41 / 60.29
1          27 / 39.71 / 100.00 / 39.71    27 / 39.71
Total      68 / 100.00                    68 / 100.00

Table of TAB12MDX by USFREQMO
TAB12MDX = Tobacco Dependence Past 12 Months
Cell contents: Frequency / Percent / Row Pct / Col Pct

TAB12MDX   USFREQMO = 30                   Total
0          521 / 39.47 / 100.00 / 39.47    521 / 39.47
1          799 / 60.53 / 100.00 / 60.53    799 / 60.53
Total      1320 / 100.00                   1320 / 100.00

Table of TAB12MDX by USFREQMO
TAB12MDX = Tobacco Dependence Past 12 Months
Cell contents: Frequency / Percent / Row Pct / Col Pct

TAB12MDX   USFREQMO = 14                 USFREQMO = 22                 Total
0          59 / 37.11 / 59.00 / 64.84    41 / 25.79 / 41.00 / 60.29    100 / 62.89
1          32 / 20.13 / 54.24 / 35.16    27 / 16.98 / 45.76 / 39.71    59 / 37.11
Total      91 / 57.23                    68 / 42.77                    159 / 100.00

Statistics for Table of TAB12MDX by USFREQMO

Statistic                      Value     Prob
Chi-Square                     0.3439    0.5576
Likelihood Ratio Chi-Square    0.3432    0.5580
Continuity Adj. Chi-Square     0.1768    0.6741
Mantel-Haenszel Chi-Square     0.3417    0.5588
Phi Coefficient                0.0465
Contingency Coefficient        0.0465
Cramer's V                     0.0465

Fisher's Exact Test
Cell (1,1) Frequency (F)       59
Left-sided Pr <= F             0.7743
Right-sided Pr >= F            0.3365
Table Probability (P)          0.1108
Two-sided Pr <= P              0.6198

Sample Size = 159

Table of TAB12MDX by USFREQMO
TAB12MDX = Tobacco Dependence Past 12 Months
Cell contents: Frequency / Percent / Row Pct / Col Pct

TAB12MDX   USFREQMO = 14                USFREQMO = 30                  Total
0          59 / 4.18 / 10.17 / 64.84    521 / 36.92 / 89.83 / 39.47    580 / 41.11
1          32 / 2.27 / 3.85 / 35.16     799 / 56.63 / 96.15 / 60.53    831 / 58.89
Total      91 / 6.45                    1320 / 93.55                   1411 / 100.00

Statistics for Table of TAB12MDX by USFREQMO

Statistic                      Value      Prob
Chi-Square                     22.6255    <.0001
Likelihood Ratio Chi-Square    22.2336    <.0001
Continuity Adj. Chi-Square     21.5899    <.0001
Mantel-Haenszel Chi-Square     22.6095    <.0001
Phi Coefficient                0.1266
Contingency Coefficient        0.1256
Cramer's V                     0.1266

Fisher's Exact Test
Cell (1,1) Frequency (F)       59
Left-sided Pr <= F             1.0000
Right-sided Pr >= F            <.0001
Table Probability (P)          <.0001
Two-sided Pr <= P              <.0001

Sample Size = 1411

Table of TAB12MDX by USFREQMO
TAB12MDX = Tobacco Dependence Past 12 Months
Cell contents: Frequency / Percent / Row Pct / Col Pct

TAB12MDX   USFREQMO = 22               USFREQMO = 30                  Total
0          41 / 2.95 / 7.30 / 60.29    521 / 37.54 / 92.70 / 39.47    562 / 40.49
1          27 / 1.95 / 3.27 / 39.71    799 / 57.56 / 96.73 / 60.53    826 / 59.51
Total      68 / 4.90                   1320 / 95.10                   1388 / 100.00

Statistics for Table of TAB12MDX by USFREQMO

Statistic                      Value      Prob
Chi-Square                     11.6386    0.0006
Likelihood Ratio Chi-Square    11.3718    0.0007
Continuity Adj. Chi-Square     10.7904    0.0010
Mantel-Haenszel Chi-Square     11.6302    0.0006
Phi Coefficient                0.0916
Contingency Coefficient        0.0912
Cramer's V                     0.0916

Fisher's Exact Test
Cell (1,1) Frequency (F)       41
Left-sided Pr <= F             0.9998
Right-sided Pr >= F            0.0006
Table Probability (P)          0.0003
Two-sided Pr <= P              0.0009

Sample Size = 1388

To determine which groups are different from the others, we will again need to perform a post hoc test. By conducting post hoc comparisons between pairs of rates in a way that avoids excessive Type I error (in other words, avoids rejecting the null hypothesis when the null hypothesis is true), we will be much better able to describe which population rates are different from the others.

If we reject the null hypothesis, we need to perform comparisons for each pair of nicotine dependence rates across the six smoking frequency categories. In the case of 6 groups, we actually need to perform 15 pairwise comparisons.

The goal of using the Bonferroni adjustment is to control the family-wise error rate, also known as the maximum overall Type I error rate, so that we can evaluate which pairs of nicotine dependence rates are different from one another.

Briefly, the process is to conduct each of the 15 paired comparisons. But rather than evaluating significance at the p = .05 level, we adjust the p value to make it more difficult to reject the null hypothesis. The adjusted p value is calculated by dividing .05 by the number of comparisons that we plan to make. So, if we made 3 comparisons, we would only reject the null hypothesis if the p value were .017 or less. For the 15 paired comparisons that we plan to make to better understand the association between smoking frequency and nicotine dependence, our adjusted p value is .003.

Adjusting the p value is definitely the easy part of the process. Now for the more challenging piece. For the actual post hoc testing, we need to run a chi-square test for each of the 15 paired comparisons. To do this, I can add syntax to my program at the end of the data step, just before the PROC SORT statement, where I choose two smoking frequency groups at a time. I'm going to start by comparing usual smoking frequency per month group 1 and usual smoking frequency per month group 2.5. If I save and run this program, I get a new chi-square table that includes only those two frequency groups by the presence or absence of nicotine dependence.
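
The Bonferroni arithmetic above can be checked with a short DATA step. This is only an illustrative sketch; the variable names are invented for the example, and the results are written to the SAS log.

```sas
DATA _NULL_;
  groups    = 6;                 /* smoking frequency categories      */
  pairs     = COMB(groups, 2);   /* 6 choose 2 = 15 paired comparisons */
  alpha_15  = 0.05 / pairs;      /* about .0033, reported as .003     */
  alpha_3   = 0.05 / 3;          /* about .0167, reported as .017     */
  PUT pairs= alpha_15= alpha_3=;
RUN;
```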

Again, I want to focus here on the column percentages, 9.86 and 18.46. Are these two rates significantly different from one another? If I look down at my chi-square value and probability, I can see that they aren't.

So, I fail to reject the null hypothesis: this probability value is not less than 0.05, and it is definitely not less than my Bonferroni-adjusted p value of .003. This is just the first step in our post hoc analysis. Now, we need to run two-by-two chi-square tests for each of the remaining 14 paired comparisons.

To do this, we could continue one comparison at a time, choosing two frequency groups by adding syntax to the program at the end of the data step, just before the PROC SORT statement. Next, we use syntax requesting a chi-square analysis comparing those smoking one day per month and those smoking six days per month.

For our code, we can see that only smoking frequency groups equal to 1 and equal to 6 are included in the Chi-Square table and analysis.

The nicotine dependence rates are 9.86 and 21.59. The p value associated with the chi-square statistic is .0468. Initially, we might want to say that this is a significant finding because it's less than a p value of .05. Remember, though, that the adjusted p value for these comparisons is .003. So, this is not significant.

Going back to the graph of nicotine dependence rates, we now know that frequency group equal to 1 and equal to 6, do not have significantly different rates of nicotine dependence.

This can get pretty tedious and, of course, it would be very easy to get a bit confused and to skip one or more paired comparisons. Let's look at how to use syntax that you already know to do this systematically, and to get results for all of the paired comparisons simultaneously.

First, we'll get rid of the lines of syntax that subset our data to specific smoking frequency groups. Now, we'll create a series of new data sets and associated chi-square tables that call in the original working data set and select each of the various combinations of smoking frequency groups. We've just added quite a bit of syntax to this program.

You'll see however, that it's repetitive, and that in each group of syntax, only the logic statements that select the specific frequency groups and the working data set name need to change.

Following the data step and the request for output that we've been working with, we're going to add a new data step and call it COMPARISON1.

We set the original working data set, which was called NEW above, and then subset the data to two smoking frequency categories: smoking frequency category 1 and category 2.5. We follow this data step by sorting by the unique identifier and requesting a chi-square analysis with the PROC FREQ procedure. Then we end this new syntax group with a RUN statement.
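
A sketch of what the COMPARISON1 block described above might look like. The data set name NEW and the variables USFREQMO and TAB12MDX come from the text and output; IDNUM as the unique identifier is an assumption, since the identifier is not named in this post.

```sas
/* Assumes the working data set NEW already exists from the main DATA step. */
DATA COMPARISON1; SET NEW;
/* Subsetting IF: keep only the two frequency groups being compared. */
IF USFREQMO=1 OR USFREQMO=2.5;
PROC SORT; BY IDNUM;   /* IDNUM is an assumed name for the unique identifier */
PROC FREQ; TABLES TAB12MDX*USFREQMO/CHISQ;
RUN;
```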

Next, we repeat those same lines of syntax for each of the remaining 14 paired comparisons. Changing only the name of the new working data set, and the selected smoking frequency groups.

Here, we call the data set COMPARISON2 and select smoking frequency group 1, compared to smoking frequency group 6. Here, the data set is COMPARISON3, and we're comparing smoking frequency group 1 to smoking frequency group 14. COMPARISON4 compares smoking frequency group 1 and smoking frequency group 22. We see these lines of syntax repeated with new data set names and the additional paired comparisons for the smoking frequency groups, all the way down to COMPARISON15, smoking frequency group 22 versus smoking frequency group 30.

When we run this program, the results include the overall chi-square table, that is, the six-level smoking frequency variable by the nicotine dependence response variable, and then chi-square tables for each of the paired comparisons: 1 versus 2.5, 1 versus 6, 1 versus 14, 1 versus 22, 1 versus 30, 2.5 versus 6, and so on.
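
The text repeats the same block 15 times by hand. As an alternative that is not part of the original program, the repetition could be wrapped in a small SAS macro. This is a hedged sketch: the macro name PAIR and its parameters are hypothetical, IDNUM is an assumed identifier name, and NEW is the working data set named in the text.

```sas
/* Hypothetical helper: one subset + chi-square per pair of groups. */
%MACRO PAIR(name, a, b);
DATA &name; SET NEW;
IF USFREQMO=&a OR USFREQMO=&b;
PROC SORT; BY IDNUM;
PROC FREQ; TABLES TAB12MDX*USFREQMO/CHISQ;
RUN;
%MEND PAIR;

%PAIR(COMPARISON1, 1, 2.5)
%PAIR(COMPARISON2, 1, 6)
%PAIR(COMPARISON3, 1, 14)
/* ...continue through the remaining pairs... */
%PAIR(COMPARISON15, 22, 30)
```

Writing the pairs as a list of macro calls makes it easier to see at a glance whether any of the 15 comparisons has been skipped.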

The goal is to examine the p value for each of the paired comparisons, and to use the Bonferroni-adjusted p value of .003 to evaluate significance. Here, we've created a table that shows the p values of each of the paired comparisons from the output. Obviously, there are several that are less than .05. Here are the p values that are less than .003. As we can see, smoking frequency group 30, that is, those who smoke 30 days in a usual month, is significantly different from each of the other smoking frequency levels. In addition, smoking frequency group 1 has significantly different nicotine dependence rates than smoking frequency groups 14 and 22.
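
As an illustration of that last step, the chi-square p values printed in the pairwise output above can be typed into a small data set and flagged against the adjusted threshold. This is a sketch, not part of the original program: the data set name POSTHOC and its variables are invented, values printed as <.0001 are entered as 0.0001 (an upper bound), and only the ten pairs that have a chi-square statistic in the output above are listed.

```sas
DATA POSTHOC;
  LENGTH pair $ 10;
  INPUT pair $ p;
  significant = (p < 0.05/15);   /* 1 if below the Bonferroni threshold */
  DATALINES;
1_v_2.5 0.1483
1_v_14 0.0002
1_v_22 0.0001
1_v_30 0.0001
2.5_v_14 0.0223
2.5_v_22 0.0071
2.5_v_30 0.0001
14_v_22 0.5576
14_v_30 0.0001
22_v_30 0.0006
;
PROC PRINT; RUN;
```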

waitarajo-blog · 5 years ago

Class Level Information

Class          Levels   Values
MAJORDEPLIFE   2        0 1

Number of Observations Read    1706
Number of Observations Used    1697

Source            DF     Sum of Squares   Mean Square   F Value   Pr > F
Model             1      266974.1         266974.1      3.55      0.0597
Error             1695   127468189.9      75202.5
Corrected Total   1696   127735164.0

R-Square   Coeff Var   Root MSE   NUMCIGMO_EST Mean
0.002090   85.61566    274.2307   320.3044

Source         DF   Anova SS      Mean Square   F Value   Pr > F
MAJORDEPLIFE   1    266974.1235   266974.1235   3.55      0.0597

Level of MAJORDEPLIFE   N      NUMCIGMO_EST Mean   Std Dev
0                       1253   312.837989          269.002344
1                       444    341.375000          288.495118

Class Level Information

Class       Levels   Values
ETHRACE2A   5        1 2 3 4 5

Number of Observations Read    1706
Number of Observations Used    1697

Source            DF     Sum of Squares   Mean Square   F Value   Pr > F
Model             4      6965167.9        1741292.0     24.40     <.0001
Error             1692   120769996.1      71377.1
Corrected Total   1696   127735164.0

R-Square   Coeff Var   Root MSE   NUMCIGMO_EST Mean
0.054528   83.40969    267.1649   320.3044

Source      DF   Anova SS      Mean Square   F Value   Pr > F
ETHRACE2A   4    6965167.919   1741291.980   24.40     <.0001

Note: This test controls the Type I comparisonwise error rate, not the experimentwise error rate.

Alpha                          0.05
Error Degrees of Freedom       1692
Error Mean Square              71377.07
Harmonic Mean of Cell Sizes    100.4731

Note: Cell sizes are not equal.

Number of Means   2       3       4       5
Critical Range    73.93   77.84   80.46   82.39

Proc ANOVA first displays a table that includes the following: the name of the variable in the CLASS statement, the number of different values or levels of the class variable, the values of the class variable, the number of observations in the data set, and the number of observations excluded from the analysis because of missing data.

So here we see that our categorical explanatory variable, MAJORDEPLIFE, has two levels, and the values are 0 and 1. Of the 1706 observations, 1697 were included in the analysis.

Proc ANOVA then displays an analysis of variance table for the response variable, also known as the dependent variable, from the MODEL statement. In this case, our response or dependent variable was NUMCIGMO_EST.
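
A minimal sketch of PROC ANOVA syntax that would produce output of this shape. The variable names come from the output above; the data set is assumed to be the most recently created one. The second step is an assumption based on the comparisonwise error-rate note in the ETHRACE2A output, which is consistent with a Duncan multiple range test having been requested.

```sas
/* Two-level explanatory variable: ANOVA table plus group means. */
PROC ANOVA; CLASS MAJORDEPLIFE;
MODEL NUMCIGMO_EST=MAJORDEPLIFE;
MEANS MAJORDEPLIFE;
RUN;

/* Five-level explanatory variable with an assumed Duncan post hoc test. */
PROC ANOVA; CLASS ETHRACE2A;
MODEL NUMCIGMO_EST=ETHRACE2A;
MEANS ETHRACE2A/DUNCAN;
RUN;
```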

Our calculated F statistic, called the F Value in this output, is 3.55. The significance probability, or P value, associated with this F statistic is labeled Pr > F. As you can see, the P value is .0597, just over our P value cut point of .05. If we look at the means table, we see that young adult smokers without major depression, as indicated by a value of 0, smoke an average of about 313 cigarettes per month, and that those with major depression, indicated by a value of 1, smoke on average about 341 cigarettes per month. Because the P value is greater than 0.05 (it is about 0.06), we fail to reject the null hypothesis: we do not have evidence of an association between the presence or absence of major depression and the number of cigarettes smoked per month among young adult smokers. If I chose to reject the null hypothesis, I would be wrong about six out of 100 times. By normal scientific standards, this is not adequate certainty to reject the null hypothesis and say that there is an association.

Had the P value been less than .05, I would know that there was a significant association, and to interpret it I would look at the means table. There I can see that individuals with major depression smoke more than individuals without. With a significant P value, I could have said that young adult smokers with major depression smoke significantly more cigarettes per month than young adult smokers without major depression.

So, we've shown you the ropes in terms of a categorical explanatory variable that has two levels, as it did here with depression. For this interpretation, all we need to know is the P value and the means for each of the two groups.
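
To see where the F Value and Pr > F come from, the ratio can be recomputed from the sums of squares in the MAJORDEPLIFE table. This is an illustrative sketch with made-up variable names; the numbers are taken from the output above, and the results written to the log should land close to the printed F Value and Pr > F.

```sas
DATA _NULL_;
  ms_model = 266974.1 / 1;            /* model SS / model DF         */
  ms_error = 127468189.9 / 1695;      /* error SS / error DF         */
  f = ms_model / ms_error;            /* F = MS model / MS error     */
  p = 1 - PROBF(f, 1, 1695);          /* upper-tail F probability    */
  PUT f= p=;
RUN;
```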
