Week 4 Assignment
A k-means cluster analysis was conducted to identify underlying subgroups of respondents based on the similarity of their responses on 4 variables representing characteristics that could affect whether a person has hangovers. Clustering variables included two binary variables, measuring whether or not the person was born in the USA and whether or not the person worked full time, as well as two quantitative variables: age and a scale measuring overall health (Excellent, Very Good, Good, Fair, Poor). All clustering variables were standardized to have a mean of 0 and a standard deviation of 1.
Data were randomly split into a training set that included 70% of the observations and a test set that included 30% of the observations. A series of k-means cluster analyses were conducted on the training data specifying k=1-9 clusters, using Euclidean distance. The variance in the clustering variables that was accounted for by the clusters (r-square) was plotted for each of the nine cluster solutions in an elbow curve to provide guidance for choosing the number of clusters to interpret.
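The standardization, 70/30 split and elbow-curve r-square described above can be sketched as follows. The post's own code is not reproduced here, so this is a numpy-only illustration under stated assumptions: the data are synthetic (not NESARC), k-means is a plain Lloyd's algorithm rather than any particular library's implementation (scikit-learn's `KMeans` would be the usual choice), and r-square is computed as 1 minus the ratio of the within-cluster to the total sum of squares.

```python
import numpy as np

def standardize(X):
    """Rescale each column to mean 0 and standard deviation 1."""
    return (X - X.mean(axis=0)) / X.std(axis=0)

def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm with Euclidean distance; returns labels and centers."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign every observation to its nearest center
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each center to the mean of its members (keep it if the cluster is empty)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

def r_square(X, labels, centers):
    """Proportion of total variance accounted for by the clusters."""
    ss_total = ((X - X.mean(axis=0)) ** 2).sum()
    ss_within = ((X - centers[labels]) ** 2).sum()
    return 1 - ss_within / ss_total

# Synthetic stand-in for the 4 clustering variables (NOT the NESARC data)
rng = np.random.default_rng(42)
raw = rng.normal(size=(500, 4)) + 3 * rng.integers(0, 3, size=(500, 1))
X = standardize(raw)

# Random 70/30 split; clusters are fit on the training portion only
idx = rng.permutation(len(X))
train, test = X[idx[:350]], X[idx[350:]]

# r-square for each of the nine cluster solutions, for the elbow curve
r2_by_k = {}
for k in range(1, 10):
    labels, centers = kmeans(train, k)
    r2_by_k[k] = r_square(train, labels, centers)
```

Plotting `r2_by_k` against k gives the elbow curve. Because r-square generally keeps rising as k grows, the choice rests on where the gains level off rather than on the maximum.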
Figure 1. Elbow curve of r-square values for the nine cluster solutions
The elbow curve was inconclusive, suggesting that the 2, 3 & 4 cluster solutions might be interpreted. The results below are for an interpretation of the 3-cluster solution.
Canonical discriminant analysis was used to reduce the 4 clustering variables down to a few variables that accounted for most of the variance in the clustering variables. A scatterplot of the first two canonical variables by cluster (Figure 2, shown below) indicated that the observations in the purple cluster did not overlap very much with the other clusters but were widely spread, suggesting high within-cluster variance. The green and yellow clusters were densely packed, with relatively low within-cluster variance, although the yellow cluster tended to overlap slightly with the others. The results of this plot suggest that the best cluster solution may have been 2 clusters.
Figure 2. Plot of the first two canonical variables for the clustering variables by cluster.
The means on the clustering variables showed that, compared to the other clusters, people in cluster 0 were mostly not born in the USA and were younger. Cluster 1 was moderate compared to the other clusters in all categories, with higher values for being born in the USA and working full time, better health, and a lower age. Cluster 2 had worse health, was less likely to work full time, and was older; this group may represent retirees.
To externally validate the clusters, an Analysis of Variance (ANOVA) was conducted to test for significant differences between the clusters on the number of hangover symptoms. A Tukey test was used for post hoc comparisons between the clusters. Results indicated significant differences between the clusters on hangover symptoms (F(3)=236.0, p<.0001). The Tukey post hoc comparisons showed significant differences between the clusters. People in cluster 1 had the greatest number of hangover symptoms (mean=0.79, sd=1.04), and cluster 0 had the fewest (mean=0.42, sd=0.86).
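The F statistic behind a one-way ANOVA can be computed directly from between-group and within-group sums of squares. The sketch below uses toy numbers, not the cluster data. Note that with k groups the between-groups degrees of freedom are k - 1, so a comparison of three clusters has 2 numerator degrees of freedom. For the Tukey post hoc step, statsmodels provides `pairwise_tukeyhsd` in `statsmodels.stats.multicomp`; it is not used here.

```python
import numpy as np

def one_way_anova(*groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = np.concatenate(groups)
    grand_mean = all_values.mean()
    k, n = len(groups), len(all_values)
    # Between-groups: how far each group mean sits from the grand mean
    ss_between = sum(len(g) * (np.mean(g) - grand_mean) ** 2 for g in groups)
    # Within-groups: spread of observations around their own group mean
    ss_within = sum(((np.asarray(g) - np.mean(g)) ** 2).sum() for g in groups)
    df_between, df_within = k - 1, n - k
    F = (ss_between / df_between) / (ss_within / df_within)
    return F, df_between, df_within

# Three toy "clusters"
F, df_b, df_w = one_way_anova([1, 2, 3], [4, 5, 6], [7, 8, 9])
print(F, df_b, df_w)  # prints: 27.0 2 6
```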
Below is the full code & output.
OUTPUT
Week 3 Assignment
A lasso regression analysis was conducted to identify a subset of variables, from a pool of 4 categorical and quantitative predictor variables, that best predicted a quantitative response variable measuring physical hangover symptoms among those who have used alcohol. Categorical predictors included health (1 = Excellent, Very Good or Good; 0 = Fair or Poor), born in the USA, and working full time. A quantitative predictor variable, age, was also included. All predictor variables were standardized to have a mean of zero and a standard deviation of one.
Data were randomly split into a training set that included 70% of the observations and a test set that included 30% of the observations. The least angle regression algorithm with 10-fold cross-validation was used to estimate the lasso regression model in the training set, and the model was validated using the test set. The change in the cross-validation average (mean) squared error at each step was used to identify the best subset of predictor variables.
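The post fit the lasso with least angle regression (scikit-learn's `LassoLarsCV(cv=10)` is the usual off-the-shelf equivalent). As a dependency-free illustration of the same idea, the sketch below swaps in cyclic coordinate descent, a different algorithm that minimizes the same lasso objective, picks the penalty by 10-fold cross-validated mean squared error on the training set, and reports r-square on the test set. The data, penalty grid and iteration count are assumptions chosen for illustration.

```python
import numpy as np

def soft_threshold(z, gamma):
    """Shrink z toward zero by gamma (the lasso's L1 proximal step)."""
    return np.sign(z) * max(abs(z) - gamma, 0.0)

def lasso_cd(X, y, alpha, n_iter=200):
    """Cyclic coordinate descent for (1 / (2n)) * ||y - X b||^2 + alpha * ||b||_1.
    Assumes the columns of X and y are already centered (no intercept)."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            # Partial residual: remove every feature's fit except feature j's
            r_j = y - X @ b + X[:, j] * b[j]
            rho = X[:, j] @ r_j / n
            b[j] = soft_threshold(rho, alpha) / col_sq[j]
    return b

def cv_mse(X, y, alpha, k=10, seed=0):
    """Mean squared error averaged over k cross-validation folds."""
    idx = np.random.default_rng(seed).permutation(len(y))
    errors = []
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)
        x_mean, y_mean = X[train].mean(axis=0), y[train].mean()
        b = lasso_cd(X[train] - x_mean, y[train] - y_mean, alpha)
        pred = (X[fold] - x_mean) @ b + y_mean
        errors.append(np.mean((y[fold] - pred) ** 2))
    return np.mean(errors)

# Synthetic stand-in for 4 standardized predictors (NOT the NESARC data)
rng = np.random.default_rng(1)
n = 300
X_raw = rng.normal(size=(n, 4))
y = X_raw @ np.array([1.5, -1.0, 0.0, 0.5]) + rng.normal(size=n)
X = (X_raw - X_raw.mean(axis=0)) / X_raw.std(axis=0)

# 70/30 train/test split
idx = rng.permutation(n)
tr, te = idx[:210], idx[210:]

# Pick the penalty with the lowest 10-fold CV error on the training set
alphas = [0.001, 0.01, 0.05, 0.1, 0.2, 0.5]
cv_errors = {a: cv_mse(X[tr], y[tr], a) for a in alphas}
best_alpha = min(cv_errors, key=cv_errors.get)

# Refit on the whole training set and score r-square on the test set
x_mean, y_mean = X[tr].mean(axis=0), y[tr].mean()
coef = lasso_cd(X[tr] - x_mean, y[tr] - y_mean, best_alpha)
pred = (X[te] - x_mean) @ coef + y_mean
test_r2 = 1 - np.sum((y[te] - pred) ** 2) / np.sum((y[te] - y[te].mean()) ** 2)
```

A large penalty drives every coefficient to exactly zero, while a tiny one reproduces ordinary least squares; cross-validation chooses a point between the two.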
The change in the validation mean squared error at each step is shown below. The shape is fairly consistent across all 10 folds.
Of the 4 predictor variables, all 4 were retained in the selected model. During the estimation process, age and born in the USA were most strongly associated with having experienced a hangover, followed by working full time and overall health. Age and overall health were negatively associated with having experienced a hangover, while born in the USA and working full time were positively associated with it. Together, these 4 variables accounted for 43.7% of the variance in the hangover response variable.
The entire output, and the code is included below:
Week 2 Assignment
Random forest analysis was performed to evaluate the importance of a series of explanatory variables in predicting a binary, categorical response variable. The following explanatory variables were included as possible contributors to a random forest evaluating presence of a hangover among drinkers (my response variable): Overall health, born in the USA, working full time, and age.
The explanatory variables with the highest relative importance scores were age (71.2%), followed by born in the USA (20.1%). The accuracy of the random forest was 61%. Across 25 trees, peak accuracy was achieved with 2 trees, but accuracy was very similar with only one tree, suggesting that interpretation of a single decision tree may be appropriate.

The Python code is included below.
Machine Learning Week 1
Decision tree analysis was performed to test nonlinear relationships among a series of explanatory variables and a binary, categorical response variable.
Age & Overall Health were the explanatory variables included as possible contributors to a classification tree model evaluating having experienced a hangover symptom (my response variable).
Age, a quantitative variable that was recoded into a binary variable (30 or older = 1, under 30 = 0), was the first variable to separate the sample into two subgroups. A further subdivision was made on overall health, which was classified as 1 = Excellent, Very Good or Good Health, and 0 = Fair or Poor Health. Respondents who were 30 or older and healthy formed the largest group. Regardless of health rating, respondents who were 30 or older had a greater chance of having hangover symptoms. The total model classified 62.3% of the sample correctly.
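A classification tree chooses each split by how much it reduces node impurity. The sketch below uses a small hypothetical data set (not the NESARC sample) to compute the Gini impurity of splitting on the binary age variable versus the binary health variable, pick the better split, and report the accuracy of predicting each side's majority class. scikit-learn's `DecisionTreeClassifier` uses Gini impurity by default; this is only a hand-rolled illustration of a single split.

```python
import numpy as np

def gini(y):
    """Gini impurity of a 0/1 label array."""
    if len(y) == 0:
        return 0.0
    p = y.mean()
    return 1.0 - p ** 2 - (1.0 - p) ** 2

def split_impurity(x, y):
    """Size-weighted Gini impurity after splitting on a binary feature x."""
    left, right = y[x == 0], y[x == 1]
    return (len(left) * gini(left) + len(right) * gini(right)) / len(y)

def split_accuracy(x, y):
    """Accuracy when each side of the split predicts its majority class."""
    correct = 0
    for side in (y[x == 0], y[x == 1]):
        if len(side):
            correct += max(side.sum(), len(side) - side.sum())
    return correct / len(y)

# Hypothetical toy data (NOT the NESARC sample)
age30    = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 1])  # 1 = age 30 or older
healthy  = np.array([1, 0, 1, 1, 1, 1, 0, 1, 0, 1])  # 1 = Excellent/Very Good/Good
hangover = np.array([0, 0, 0, 1, 1, 1, 0, 1, 1, 1])  # 1 = any hangover symptom

features = {"age30": age30, "healthy": healthy}
best_feature = min(features, key=lambda f: split_impurity(features[f], hangover))
print(best_feature, split_accuracy(features[best_feature], hangover))  # prints: age30 0.8
```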
Code is included below:
Week 4 Assignment
My categorical explanatory variable, overall health, has more than two categories. Using 1 (Excellent Health) as the reference group, all other health responses showed a statistically significant, positive association with having a hangover. As the health rating gets worse (responses 2 to 5), the association gets slightly stronger.
To prepare the data for this exercise, I took my response variable, number of hangover symptoms, which is categorical with 5 categories, and binned it into two categories: hangover observed (1) or no hangover observed (0).
After adjusting for the potential confounding effect of age, the odds of having a hangover were 0.84 times as high (about 16% lower) for participants who are healthy as for those who are unhealthy (OR=0.84, 95% CI = 0.79-0.90, p<.001). Age was also significantly associated with having a hangover, such that older participants were significantly less likely to have a hangover (OR= 0.81, 95% CI=0.40-0.93, p=.041).
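An odds ratio is the exponentiated logistic regression coefficient, and its Wald 95% confidence interval is exp(beta ± 1.96 × SE). An OR of 0.84 therefore means the odds are 0.84 times as high, or about 16% lower. The coefficient and standard error below are hypothetical values chosen to give OR = 0.84; they are not taken from the model output.

```python
import math

def odds_ratio_ci(beta, se, z=1.96):
    """Odds ratio and Wald 95% CI from a logistic-regression coefficient."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)

# Hypothetical coefficient and standard error (NOT taken from the model output)
beta_healthy = math.log(0.84)
se_healthy = 0.034

or_, lo, hi = odds_ratio_ci(beta_healthy, se_healthy)
pct_change = (or_ - 1) * 100  # negative means lower odds

print(f"OR = {or_:.2f}, 95% CI {lo:.2f}-{hi:.2f}, change in odds = {pct_change:.0f}%")
```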
This does not support the original hypothesis that unhealthy people would have more hangovers. That said, although age does not technically confound the association, adding it to the model changes the result from an odds ratio greater than 1 to an odds ratio less than 1. Other potential confounding factors will be explored in future work.
Week 3 Assignment
Like last week, my categorical response variable, number of hangover symptoms, will be used because it can take 5 different values, as allowed in the assignment description.
We begin with a multiple regression analysis.
Overall health of the individual (Beta=-.16, p<.001) was significantly and negatively associated with the number of physical hangover symptoms observed. Age was also significantly and negatively associated with the number of physical hangover symptoms observed, such that younger participants reported more hangover symptoms (Beta=-.01, p<.001). Both variables are statistically significant.
It is interesting to note that adding age to the regression analysis changed the p-value for health from .011 to less than .001. This suggests that age acts as a confounding variable, although its effect is small.
Even though both explanatory variables are statistically significantly associated with the response variable, the association was expected to be positive but turned out to be negative. The R-squared value suggests that other variables not included in the model, possibly including other confounders, account for much of the variation in hangover symptoms.
The Q-Q plot shows that the residuals are certainly not normally distributed.
The plot of standardized residuals for all observations shows that the level of error is unacceptable. This is partly because, as noted at the start of this post, the response variable has only 5 possible values, one of which accounts for most responses. This also explains why the plots show values along only 5 lines.
The leverage plot is interesting because it shows that each observation has very little leverage; note that the scale on the x-axis is entirely below .002. Therefore there are no high-leverage observations, which would be expected given the setup of this study.
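Leverage is the diagonal of the hat matrix H = X(X'X)^-1 X', and the leverages always sum to the number of model parameters p, so their mean is p/n. With an intercept, health and age (p = 3) and tens of thousands of observations, values below .002 are what one would expect. The sketch below checks this on simulated predictors (hypothetical, not NESARC). Low leverage rules out influential points in predictor space; on its own it does not rule out observations with large residuals.

```python
import numpy as np

def leverage(X):
    """Diagonal of the hat matrix H = X (X'X)^-1 X' (one value per observation)."""
    XtX_inv = np.linalg.inv(X.T @ X)
    # x_i' (X'X)^-1 x_i for every row i, without forming the full n x n matrix
    return np.einsum("ij,jk,ik->i", X, XtX_inv, X)

# Simulated predictors (hypothetical, NOT NESARC): intercept, 0/1 health, age
rng = np.random.default_rng(0)
n = 5000
health = rng.integers(0, 2, size=n)
age = rng.uniform(18, 90, size=n)
X = np.column_stack([np.ones(n), health, age])

h = leverage(X)
print(h.sum())   # equals the number of columns of X (3), up to rounding
print(h.mean())  # 3 / n
print(h.max())   # still far below .01
```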
Week 2 Regression Assignment
For this week's assignment, my explanatory variable is a binary categorical variable of overall health, with 0 being unhealthy and 1 being healthy. My response variable is actually categorical, but it has enough response options to use for this week's exercise: I added up the number of hangover symptoms observed, so it can take the values 0, 1, 2, 3 or 4.
Using the ols function, we find the equation for the linear regression to be Number of Hangover Symptoms = -0.0387*(Health) + .7156. This means that unhealthy people have a mean of .7156 hangover symptoms, while healthy people have a mean of .7156 - .0387 = .6769 hangover symptoms. The p-value is .01 and the R-squared is 0 at the reported precision. So although the association is statistically significant, health explains essentially none of the variation in the number of hangover symptoms.
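With a single 0/1 predictor, the ordinary least squares intercept equals the mean of the 0 group and the slope equals the difference between the two group means, which is why .7156 and .6769 are the group means above. The toy data below are hypothetical, and the sketch uses numpy's least-squares solver rather than the statsmodels ols formula interface.

```python
import numpy as np

# Hypothetical data (NOT NESARC): health 0 = unhealthy, 1 = healthy
health = np.array([0, 0, 0, 1, 1, 1, 1])
symptoms = np.array([1, 0, 2, 0, 1, 1, 0])

# Design matrix with an intercept column and the 0/1 health indicator
X = np.column_stack([np.ones(len(health)), health])
(intercept, slope), *_ = np.linalg.lstsq(X, symptoms, rcond=None)

mean_unhealthy = symptoms[health == 0].mean()
mean_healthy = symptoms[health == 1].mean()

# intercept == mean of the unhealthy group; slope == difference in group means
print(intercept, slope, mean_unhealthy, mean_healthy)
```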
The program output includes a frequency plot and linear analysis output.
Program output:
Program Code:
Week 1 Intro to Regression Assignment
Sample
The sample is from the first wave of the National Epidemiologic Survey on Alcohol and Related Conditions (NESARC), the largest and most ambitious survey of alcohol and drug use and associated psychiatric and medical comorbidities. Participants (N=43,093) represent the civilian, noninstitutionalized adult population of the United States, including residents of the District of Columbia, Alaska, and Hawaii. It includes people living in households, military personnel living off base, and people residing in the following group quarters: boarding or rooming houses, nontransient hotels and motels, shelters, facilities for housing workers, college quarters, and group homes. The NESARC included oversampling of Blacks, Hispanics and young adults aged 18 to 24 years. Data were analyzed at the individual level. The data analytic sample for this study included participants who did not report abstaining from alcohol (N=34,365).
Procedure
Data were collected by trained U.S. Census Bureau Field Representatives during 2001–2002 through computer-assisted personal interviews (CAPI). One adult was selected for interview in each household, and interviews were conducted in respondents’ homes following informed consent procedures. The original purpose of the data collection was to study the occurrence of more than one psychological disorder or substance use disorder in the same person.
Measures
Alcohol experiences (effects and consequences of drinking, development of tolerance, attempts to stop drinking) were a major part of the survey, which used the NIAAA Alcohol Use Disorder and Associated Disabilities Interview Schedule – DSM-IV (AUDADIS-IV) (Grant et al., 2003; Grant, Harford, Dawson, & Chou, 1995).
My explanatory variable, measuring the self-perceived health of the individual, was a categorical variable with the possible responses to the prompt (“Self-Perceived Current Health”) of Excellent, Very Good, Good, Fair, Poor or Unknown. My response variables, also categorical, all measured the presence of a physical hangover symptom, with the same possible responses (Yes, No, Unknown, or NA for lifetime abstainers), to the following series of questions: “Ever shake when the effects of alcohol are wearing off”, “Ever have nausea when effects of alcohol are wearing off”, “Ever sweat or heart beat fast when effects of alcohol are wearing off” and “Ever have very bad headaches when effects of alcohol are wearing off”.
To manage the data, I first excluded all abstainer and unknown data. I then created a new response variable that counts how many of the 4 physical hangover symptom variables the person answered yes to, so this variable has 5 categories [0,1,2,3,4]. I then had to reduce my response variable to two categories: I recoded any value (1-4) indicating at least one positive response to a physical hangover symptom as 1, and no positive responses as 0. I grouped the explanatory variable into two categories: healthy (Excellent, Very Good & Good) and unhealthy (Fair & Poor).
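The two recodes described above can be sketched in pandas as follows. Column names other than `S1Q16` (`hangover_count`, `any_hangover`, `healthy`) are hypothetical, the rows are made up, and mapping the Unknown health code (9) to missing is an assumption consistent with excluding unknown data.

```python
import pandas as pd

# Hypothetical rows: 'hangover_count' is the 0-4 symptom count and
# 'S1Q16' is self-perceived health (1=Excellent ... 5=Poor, 9=Unknown)
df = pd.DataFrame({
    "hangover_count": [0, 1, 4, 0, 2],
    "S1Q16": [1, 5, 3, 9, 4],
})

# 1 if at least one physical hangover symptom was reported, else 0
df["any_hangover"] = (df["hangover_count"] >= 1).astype(int)

# Healthy (1) = Excellent, Very Good, Good; Unhealthy (0) = Fair, Poor.
# Codes missing from the mapping (9 = Unknown) become NaN.
df["healthy"] = df["S1Q16"].map({1: 1, 2: 1, 3: 1, 4: 0, 5: 0})

print(df)
```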
Week 4 Assignment
First I looked at my variables to determine what type they were. The recorded health of the person is the explanatory variable, categorical with two categories. The response variable is a new variable I generated that counts how many of the 4 physical hangover symptom variables the person answered yes to, so it has 5 categories [0,1,2,3,4].
I ran describe() on the number of hangover symptoms recorded and found a mean of .68 and a standard deviation of about 1.
I created univariate graphs of each variable individually, but I didn't find them very useful. Since the variables are categorical with 2 categories, I didn't find measures of center or spread meaningful.
Finally, I created two bivariate plots. To do this, I needed to reduce my response variable to two categories: I recoded any value (1-4) that indicated at least one positive response to a physical hangover symptom as 1, and no positive responses as 0.
My first bivariate plot is the percentage of people that had at least 1 physical hangover symptom vs the raw response of overall health.
My second bivariate plot is the percentage of people who had at least 1 physical hangover symptom vs health grouped into two categories: healthy (Excellent, Very Good & Good) and unhealthy (Fair & Poor).
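Each bar in plots like these is the mean of the 0/1 hangover indicator within a health category, expressed as a percentage. A sketch with made-up rows (the column names `any_hangover`, `health_group` and `S1Q16` usage here are assumptions for illustration):

```python
import pandas as pd

# Hypothetical respondent-level data (NOT NESARC)
df = pd.DataFrame({
    "S1Q16":        [1, 1, 2, 3, 3, 4, 4, 5],
    "any_hangover": [1, 0, 1, 0, 1, 0, 0, 1],
})

# Percentage with at least one symptom, by raw health response
pct_by_raw = df.groupby("S1Q16")["any_hangover"].mean() * 100

# Same, after collapsing to healthy (Excellent/Very Good/Good) vs unhealthy (Fair/Poor)
df["health_group"] = df["S1Q16"].map({1: "healthy", 2: "healthy", 3: "healthy",
                                       4: "unhealthy", 5: "unhealthy"})
pct_by_group = df.groupby("health_group")["any_hangover"].mean() * 100

print(pct_by_raw)
print(pct_by_group)
```

Assuming matplotlib is installed, `pct_by_group.plot(kind="bar")` would draw the grouped version as a bar chart.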
In conclusion, the final bivariate plot does not support my original hypothesis: a larger percentage of healthy people experienced at least one physical hangover symptom.
Code Below:
Output:
Week 3 Assignment
In this week's assignment I was able to reduce over 135 lines of code to just 63 while producing output that is much easier to draw conclusions from.
First, I removed the missing data for non-drinkers, which was not relevant to my study. I then recoded variables. Since the codebook responses were 1=yes and 2=no, and I wanted to count the yes responses, I found it very useful to change the no responses to 0. I could then add up the yes responses with a simple sum and report how many hangover symptoms each person in the study reported. Using this simple math operation, I created a secondary variable that added value to my study. I didn't need to group any variables. Lastly, I renamed the variables to be more human-readable.
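The recode-and-sum step can be sketched as follows. The symptom column names come from the codebook used in this blog; the rows, the new column names, and the choice to leave the count missing when any answer is Unknown (9) are assumptions for illustration.

```python
import numpy as np
import pandas as pd

symptom_cols = ["S2BQ1A9B", "S2BQ1A9D", "S2BQ1A9F", "S2BQ1A9I"]

# Hypothetical raw rows: 1=Yes, 2=No, 9=Unknown, NaN=lifetime abstainer
data = pd.DataFrame({
    "S2BQ1A9B": [1, 2, 2, np.nan],
    "S2BQ1A9D": [1, 2, 1, np.nan],
    "S2BQ1A9F": [2, 2, 9, np.nan],
    "S2BQ1A9I": [1, 2, 1, np.nan],
})

# Drop abstainers (every symptom answer missing)
data = data.dropna(subset=symptom_cols, how="all")

# Recode: Yes -> 1, No -> 0, anything else (9 = Unknown) -> NaN
raw = data[symptom_cols]
valid = (raw == 1) | (raw == 2)
recoded = (raw == 1).astype(float).where(valid)

# Count of Yes answers; min_count=4 leaves the total missing if any answer is unknown
data["hangover_count"] = recoded.sum(axis=1, min_count=4)

# Human-readable names (hypothetical)
data = data.rename(columns={"S2BQ1A9B": "shakes", "S2BQ1A9D": "nausea",
                            "S2BQ1A9F": "sweat_heartbeat", "S2BQ1A9I": "headache"})
print(data)
```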
In the output below, I provide a snapshot of the data (first 20 rows) to verify that my data management worked successfully.
The final output from the crosstab function shows the frequency percentages for healthy and unhealthy respondents. Contrary to my hypothesis, 63% of unhealthy people reported no hangovers, vs. 61% of healthy people. It was also interesting that more unhealthy people than healthy people reported having all 4 of the physical hangover symptoms I tracked (3.9% vs. 1.7%).
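`pandas.crosstab` with `normalize="columns"` produces this kind of table, in which each health group's column sums to 100%. The rows below are made up, and `health_group` and `hangover_count` are hypothetical column names.

```python
import pandas as pd

# Hypothetical rows (NOT NESARC)
df = pd.DataFrame({
    "health_group":   ["healthy", "healthy", "healthy", "healthy", "unhealthy", "unhealthy"],
    "hangover_count": [0, 0, 1, 4, 0, 4],
})

# Column percentages: within each health group, the share reporting each count
table = pd.crosstab(df["hangover_count"], df["health_group"], normalize="columns") * 100
print(table)
```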
Output from program:
Code is below:
Week 2 Assignment: Running my first program
For this week's assignment, I successfully ran my first program. After having problems getting Python to install on my machine, I learned that I needed to do part of the installation off my company's network.
From the data I found that 43,093 people were surveyed, and only 0.6% of them responded that their health was ‘unknown’. I then split the data into a ‘healthy’ data set of people who responded that their health was Excellent, Very Good or Good, and an ‘unhealthy’ data set of people who responded that their health was Fair or Poor. I also excluded from my study anyone who responded that they are a lifelong abstainer from alcohol; these were the blank “nan” responses.
I analyzed 4 different responses for these two subgroups: whether the subject had ever experienced body shakes, nausea, sweating or a high heart beat, or headaches with a hangover.
Two of the four comparisons supported my hypothesis that unhealthy people will experience more physical symptoms with hangovers. I found that 9.0% of unhealthy people experienced body shakes, compared to only 4.4% of healthy people. In addition, 11.0% of unhealthy people experienced sweating or a high heart beat vs. only 7.3% of healthy people.
The other two comparisons did not support my hypothesis. I found that 28.9% of unhealthy people experienced nausea, compared to 30.5% of healthy people. In addition, 22.7% of unhealthy people experienced headaches vs. 25.4% of healthy people. The output directly from Python for these four comparisons is shown below.
Counts for S2BQ1A9B Unhealthy Only – Have you ever had Body Shakes from a Hangover, 1=Yes, 2=No, 9=Unknown, NaN=N/A Abstainer from Alcohol
1.000000     499
2.000000    5007
9.000000      46
Name: S2BQ1A9B, dtype: int64

Percentages for S2BQ1A9B Unhealthy Only – Have you ever had Body Shakes from a Hangover, 1=Yes, 2=No, 9=Unknown, NaN=N/A Abstainer from Alcohol
1.000000   0.089878
2.000000   0.901837
9.000000   0.008285
Name: S2BQ1A9B, dtype: float64

Counts for S2BQ1A9B Healthy Only – Have you ever had Body Shakes from a Hangover, 1=Yes, 2=No, 9=Unknown, NaN=N/A Abstainer from Alcohol
2.000000    27734
1.000000     1276
9.000000      185
Name: S2BQ1A9B, dtype: int64

Percentages for S2BQ1A9B Healthy Only – Have you ever had Body Shakes from a Hangover, 1=Yes, 2=No, 9=Unknown, NaN=N/A Abstainer from Alcohol
2.000000   0.949957
1.000000   0.043706
9.000000   0.006337
Name: S2BQ1A9B, dtype: float64

Counts for S2BQ1A9D Unhealthy Only – Have you ever had Nausea from a Hangover, 1=Yes, 2=No, 9=Unknown, NaN=N/A Abstainer from Alcohol
1.000000    1604
2.000000    3901
9.000000      47
Name: S2BQ1A9D, dtype: int64

Percentages for S2BQ1A9D Unhealthy Only – Have you ever had Nausea from a Hangover, 1=Yes, 2=No, 9=Unknown, NaN=N/A Abstainer from Alcohol
1.000000   0.288905
2.000000   0.702630
9.000000   0.008465
Name: S2BQ1A9D, dtype: float64

Counts for S2BQ1A9D Healthy Only – Have you ever had Nausea from a Hangover, 1=Yes, 2=No, 9=Unknown, NaN=N/A Abstainer from Alcohol
2.000000    20095
1.000000     8916
9.000000      184
Name: S2BQ1A9D, dtype: int64

Percentages for S2BQ1A9D Healthy Only – Have you ever had Nausea from a Hangover, 1=Yes, 2=No, 9=Unknown, NaN=N/A Abstainer from Alcohol
2.000000   0.688303
1.000000   0.305395
9.000000   0.006302
Name: S2BQ1A9D, dtype: float64

Counts for S2BQ1A9F Unhealthy Only – Have you ever had Sweating or High Heart Beat from a Hangover, 1=Yes, 2=No, 9=Unknown, NaN=N/A Abstainer from Alcohol
1.000000     612
2.000000    4864
9.000000      76
Name: S2BQ1A9F, dtype: int64

Percentages for S2BQ1A9F Unhealthy Only – Have you ever had Sweating or High Heart Beat from a Hangover, 1=Yes, 2=No, 9=Unknown, NaN=N/A Abstainer from Alcohol
1.000000   0.110231
2.000000   0.876081
9.000000   0.013689
Name: S2BQ1A9F, dtype: float64

Counts for S2BQ1A9F Healthy Only – Have you ever had Sweating or High Heart Beat from a Hangover, 1=Yes, 2=No, 9=Unknown, NaN=N/A Abstainer from Alcohol
2.000000    26827
1.000000     2118
9.000000      250
Name: S2BQ1A9F, dtype: int64

Percentages for S2BQ1A9F Healthy Only – Have you ever had Sweating or High Heart Beat from a Hangover, 1=Yes, 2=No, 9=Unknown, NaN=N/A Abstainer from Alcohol
2.000000   0.918890
1.000000   0.072547
9.000000   0.008563
Name: S2BQ1A9F, dtype: float64

Counts for S2BQ1A9I Unhealthy Only – Have you ever had a Headache from a Hangover, 1=Yes, 2=No, 9=Unknown, NaN=N/A Abstainer from Alcohol
1.000000    1260
2.000000    4240
9.000000      52
Name: S2BQ1A9I, dtype: int64

Percentages for S2BQ1A9I Unhealthy Only – Have you ever had a Headache from a Hangover, 1=Yes, 2=No, 9=Unknown, NaN=N/A Abstainer from Alcohol
1.000000   0.226945
2.000000   0.763689
9.000000   0.009366
Name: S2BQ1A9I, dtype: float64

Counts for S2BQ1A9I Healthy Only – Have you ever had a Headache from a Hangover, 1=Yes, 2=No, 9=Unknown, NaN=N/A Abstainer from Alcohol
2.000000    21592
1.000000     7418
9.000000      185
Name: S2BQ1A9I, dtype: int64

Percentages for S2BQ1A9I Healthy Only – Have you ever had a Headache from a Hangover, 1=Yes, 2=No, 9=Unknown, NaN=N/A Abstainer from Alcohol
2.000000   0.739579
1.000000   0.254085
9.000000   0.006337
Name: S2BQ1A9I, dtype: float64
The program code is shown here:
# -*- coding: utf-8 -*-
"""
Spyder Editor

This is a temporary script file.
"""

import pandas
import numpy

data = pandas.read_csv('nesarc_pds.csv', low_memory=False)

# bug fix for display formats to avoid run time errors
pandas.set_option('display.float_format', lambda x: '%f' % x)

print(len(data))          # number of observations (rows)
print(len(data.columns))  # number of variables (columns)

# Converts to numeric
data['S1Q16'] = pandas.to_numeric(data['S1Q16'], errors='coerce')
data['S2BQ1A9B'] = pandas.to_numeric(data['S2BQ1A9B'], errors='coerce')
data['S2BQ1A9D'] = pandas.to_numeric(data['S2BQ1A9D'], errors='coerce')
data['S2BQ1A9F'] = pandas.to_numeric(data['S2BQ1A9F'], errors='coerce')
data['S2BQ1A9I'] = pandas.to_numeric(data['S2BQ1A9I'], errors='coerce')

# Old way to convert to numeric, but I was getting a FutureWarning with this code
# data['S1Q16'] = data['S1Q16'].convert_objects(convert_numeric=True)
# data['S2BQ1A9B'] = data['S2BQ1A9B'].convert_objects(convert_numeric=True)
# data['S2BQ1A9D'] = data['S2BQ1A9D'].convert_objects(convert_numeric=True)
# data['S2BQ1A9F'] = data['S2BQ1A9F'].convert_objects(convert_numeric=True)
# data['S2BQ1A9I'] = data['S2BQ1A9I'].convert_objects(convert_numeric=True)
# This section of code displays counts and percentages of raw data
print('')
print('Counts for S1Q16 – Self-Perceived Current Health, 1=Excellent, 2= Very Good, 3=Good, 4=Fair, 5=Poor, 9=Unknown')
cHealth = data["S1Q16"].value_counts(sort=False)
print(cHealth)
print('')
print('Percentages for S1Q16 – Self-Perceived Current Health, 1=Excellent, 2= Very Good, 3=Good, 4=Fair, 5=Poor, 9=Unknown')
pHealth = data["S1Q16"].value_counts(sort=False, normalize=True)
print(pHealth)
print('')

print('Counts for S2BQ1A9B – Have you ever had Body Shakes from a Hangover, 1=Yes, 2=No, 9=Unknown, NaN=N/A Abstainer from Alcohol')
cShake = data["S2BQ1A9B"].value_counts(sort=False, dropna=False)
print(cShake)
print('')
print('Percentages for S2BQ1A9B – Have you ever had Body Shakes from a Hangover, 1=Yes, 2=No, 9=Unknown, NaN=N/A Abstainer from Alcohol')
pShake = data["S2BQ1A9B"].value_counts(sort=False, dropna=False, normalize=True)
print(pShake)
print('')

print('Counts for S2BQ1A9D – Have you ever had Nausea from a Hangover, 1=Yes, 2=No, 9=Unknown, NaN=N/A Abstainer from Alcohol')
cPuke = data["S2BQ1A9D"].value_counts(sort=False, dropna=False)
print(cPuke)
print('')
print('Percentages for S2BQ1A9D – Have you ever had Nausea from a Hangover, 1=Yes, 2=No, 9=Unknown, NaN=N/A Abstainer from Alcohol')
pPuke = data["S2BQ1A9D"].value_counts(sort=False, dropna=False, normalize=True)
print(pPuke)
print('')

print('Counts for S2BQ1A9F – Have you ever had Sweating or High Heart Beat from a Hangover, 1=Yes, 2=No, 9=Unknown, NaN=N/A Abstainer from Alcohol')
cSweat = data["S2BQ1A9F"].value_counts(sort=False, dropna=False)
print(cSweat)
print('')
print('Percentages for S2BQ1A9F – Have you ever had Sweating or High Heart Beat from a Hangover, 1=Yes, 2=No, 9=Unknown, NaN=N/A Abstainer from Alcohol')
pSweat = data["S2BQ1A9F"].value_counts(sort=False, dropna=False, normalize=True)
print(pSweat)
print('')

print('Counts for S2BQ1A9I – Have you ever had a Headache from a Hangover, 1=Yes, 2=No, 9=Unknown, NaN=N/A Abstainer from Alcohol')
cHeadache = data["S2BQ1A9I"].value_counts(sort=False, dropna=False)
print(cHeadache)
print('')
print('Percentages for S2BQ1A9I – Have you ever had a Headache from a Hangover, 1=Yes, 2=No, 9=Unknown, NaN=N/A Abstainer from Alcohol')
pHeadache = data["S2BQ1A9I"].value_counts(sort=False, dropna=False, normalize=True)
print(pHeadache)
print('')
# Subset of data that is only people with Poor or Fair Health who do not abstain from alcohol. I will call this Unhealthy
sub1 = data[(data['S1Q16'] >= 4) & (data['S1Q16'] <= 5) & (data['S2BQ1A9B'] > 0)]
Unhealthy = sub1.copy()

# Subset of data that is only people with Excellent, Very Good & Good Health who do not abstain from alcohol. I will call this Healthy
sub2 = data[(data['S1Q16'] < 4) & (data['S2BQ1A9B'] > 0)]
Healthy = sub2.copy()
# This section of code will print out results for each of the hangover responses, first for the unhealthy data set and second for the healthy one
print('Counts for S2BQ1A9B Unhealthy Only – Have you ever had Body Shakes from a Hangover, 1=Yes, 2=No, 9=Unknown, NaN=N/A Abstainer from Alcohol')
cShake = Unhealthy["S2BQ1A9B"].value_counts(sort=False, dropna=False)
print(cShake)
print('')
print('Percentages for S2BQ1A9B Unhealthy Only – Have you ever had Body Shakes from a Hangover, 1=Yes, 2=No, 9=Unknown, NaN=N/A Abstainer from Alcohol')
pShake = Unhealthy["S2BQ1A9B"].value_counts(sort=False, dropna=False, normalize=True)
print(pShake)
print('')

print('Counts for S2BQ1A9B Healthy Only – Have you ever had Body Shakes from a Hangover, 1=Yes, 2=No, 9=Unknown, NaN=N/A Abstainer from Alcohol')
cShake = Healthy["S2BQ1A9B"].value_counts(sort=False, dropna=False)
print(cShake)
print('')
print('Percentages for S2BQ1A9B Healthy Only – Have you ever had Body Shakes from a Hangover, 1=Yes, 2=No, 9=Unknown, NaN=N/A Abstainer from Alcohol')
pShake = Healthy["S2BQ1A9B"].value_counts(sort=False, dropna=False, normalize=True)
print(pShake)
print('')

print('Counts for S2BQ1A9D Unhealthy Only – Have you ever had Nausea from a Hangover, 1=Yes, 2=No, 9=Unknown, NaN=N/A Abstainer from Alcohol')
cPuke = Unhealthy["S2BQ1A9D"].value_counts(sort=False, dropna=False)
print(cPuke)
print('')
print('Percentages for S2BQ1A9D Unhealthy Only – Have you ever had Nausea from a Hangover, 1=Yes, 2=No, 9=Unknown, NaN=N/A Abstainer from Alcohol')
pPuke = Unhealthy["S2BQ1A9D"].value_counts(sort=False, dropna=False, normalize=True)
print(pPuke)
print('')

print('Counts for S2BQ1A9D Healthy Only – Have you ever had Nausea from a Hangover, 1=Yes, 2=No, 9=Unknown, NaN=N/A Abstainer from Alcohol')
cPuke = Healthy["S2BQ1A9D"].value_counts(sort=False, dropna=False)
print(cPuke)
print('')
print('Percentages for S2BQ1A9D Healthy Only – Have you ever had Nausea from a Hangover, 1=Yes, 2=No, 9=Unknown, NaN=N/A Abstainer from Alcohol')
pPuke = Healthy["S2BQ1A9D"].value_counts(sort=False, dropna=False, normalize=True)
print(pPuke)
print('')

print('Counts for S2BQ1A9F Unhealthy Only – Have you ever had Sweating or High Heart Beat from a Hangover, 1=Yes, 2=No, 9=Unknown, NaN=N/A Abstainer from Alcohol')
cSweat = Unhealthy["S2BQ1A9F"].value_counts(sort=False, dropna=False)
print(cSweat)
print('')
print('Percentages for S2BQ1A9F Unhealthy Only – Have you ever had Sweating or High Heart Beat from a Hangover, 1=Yes, 2=No, 9=Unknown, NaN=N/A Abstainer from Alcohol')
pSweat = Unhealthy["S2BQ1A9F"].value_counts(sort=False, dropna=False, normalize=True)
print(pSweat)
print('')

print('Counts for S2BQ1A9F Healthy Only – Have you ever had Sweating or High Heart Beat from a Hangover, 1=Yes, 2=No, 9=Unknown, NaN=N/A Abstainer from Alcohol')
cSweat = Healthy["S2BQ1A9F"].value_counts(sort=False, dropna=False)
print(cSweat)
print('')
print('Percentages for S2BQ1A9F Healthy Only – Have you ever had Sweating or High Heart Beat from a Hangover, 1=Yes, 2=No, 9=Unknown, NaN=N/A Abstainer from Alcohol')
pSweat = Healthy["S2BQ1A9F"].value_counts(sort=False, dropna=False, normalize=True)
print(pSweat)
print('')

print('Counts for S2BQ1A9I Unhealthy Only – Have you ever had a Headache from a Hangover, 1=Yes, 2=No, 9=Unknown, NaN=N/A Abstainer from Alcohol')
cHeadache = Unhealthy["S2BQ1A9I"].value_counts(sort=False, dropna=False)
print(cHeadache)
print('')
print('Percentages for S2BQ1A9I Unhealthy Only – Have you ever had a Headache from a Hangover, 1=Yes, 2=No, 9=Unknown, NaN=N/A Abstainer from Alcohol')
pHeadache = Unhealthy["S2BQ1A9I"].value_counts(sort=False, dropna=False, normalize=True)
print(pHeadache)
print('')

print('Counts for S2BQ1A9I Healthy Only – Have you ever had a Headache from a Hangover, 1=Yes, 2=No, 9=Unknown, NaN=N/A Abstainer from Alcohol')
cHeadache = Healthy["S2BQ1A9I"].value_counts(sort=False, dropna=False)
print(cHeadache)
print('')
print('Percentages for S2BQ1A9I Healthy Only – Have you ever had a Headache from a Hangover, 1=Yes, 2=No, 9=Unknown, NaN=N/A Abstainer from Alcohol')
pHeadache = Healthy["S2BQ1A9I"].value_counts(sort=False, dropna=False, normalize=True)
print(pHeadache)
print('')
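The eight near-identical print blocks in the subgroup section could be condensed into a loop over the symptom columns and the two subsets. This is a hedged refactoring sketch, not the code that produced the output: a few hypothetical rows stand in for `nesarc_pds.csv`, the `symptoms`, `subsets` and `results` names are illustrative, and the printed headers omit the response-code legend for brevity.

```python
import pandas as pd

# Hypothetical stand-in rows (the real script reads nesarc_pds.csv)
data = pd.DataFrame({
    "S1Q16":    [1, 2, 4, 5, 3, 9],
    "S2BQ1A9B": [1, 2, 2, 1, None, 2],
    "S2BQ1A9D": [2, 1, 1, 2, None, 2],
    "S2BQ1A9F": [2, 2, 9, 2, None, 2],
    "S2BQ1A9I": [1, 1, 2, 2, None, 2],
})

# Symptom column -> wording used in the printed headers
symptoms = {
    "S2BQ1A9B": "Body Shakes",
    "S2BQ1A9D": "Nausea",
    "S2BQ1A9F": "Sweating or High Heart Beat",
    "S2BQ1A9I": "a Headache",
}
# Same subset rules as the script: drinkers with Fair/Poor vs Excellent/Very Good/Good health
subsets = {
    "Unhealthy": data[data["S1Q16"].between(4, 5) & (data["S2BQ1A9B"] > 0)],
    "Healthy":   data[(data["S1Q16"] < 4) & (data["S2BQ1A9B"] > 0)],
}

results = {}
for col, label in symptoms.items():
    for group, subset in subsets.items():
        counts = subset[col].value_counts(sort=False, dropna=False)
        pcts = subset[col].value_counts(sort=False, dropna=False, normalize=True)
        results[(col, group)] = (counts, pcts)
        print(f"Counts for {col} {group} Only – Have you ever had {label} from a Hangover")
        print(counts)
        print(f"Percentages for {col} {group} Only – Have you ever had {label} from a Hangover")
        print(pcts)
        print()
```

Looping keeps the Unhealthy-then-Healthy ordering of the original script while making it harder for a copy-paste typo to slip into one of the eight blocks.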
My FULL output from my code is included here:
runfile('C:/Users/210041621/Desktop/Week 2 Assignment.py', wdir='C:/Users/210041621/Desktop/Python Files')
43093
3008

Counts for S1Q16 – Self-Perceived Current Health, 1=Excellent, 2= Very Good, 3=Good, 4=Fair, 5=Poor, 9=Unknown
1    12316
2    12424
3    10649
4     5219
5     2219
9      266
Name: S1Q16, dtype: int64

Percentages for S1Q16 – Self-Perceived Current Health, 1=Excellent, 2= Very Good, 3=Good, 4=Fair, 5=Poor, 9=Unknown
1    0.285800
2    0.288307
3    0.247117
4    0.121110
5    0.051493
9    0.006173
Name: S1Q16, dtype: float64

Counts for S2BQ1A9B – Have you ever had Body Shakes from a Hangover, 1=Yes, 2=No, 9=Unknown, NaN=N/A Abstainer from Alcohol
nan          8266
2.000000    32785
1.000000     1777
9.000000      265
Name: S2BQ1A9B, dtype: int64

Percentages for S2BQ1A9B – Have you ever had Body Shakes from a Hangover, 1=Yes, 2=No, 9=Unknown, NaN=N/A Abstainer from Alcohol
nan         0.191818
2.000000    0.760796
1.000000    0.041236
9.000000    0.006149
Name: S2BQ1A9B, dtype: float64

Counts for S2BQ1A9D – Have you ever had Nausea from a Hangover, 1=Yes, 2=No, 9=Unknown, NaN=N/A Abstainer from Alcohol
nan          8266
2.000000    24031
1.000000    10529
9.000000      267
Name: S2BQ1A9D, dtype: int64

Percentages for S2BQ1A9D – Have you ever had Nausea from a Hangover, 1=Yes, 2=No, 9=Unknown, NaN=N/A Abstainer from Alcohol
nan         0.191818
2.000000    0.557654
1.000000    0.244332
9.000000    0.006196
Name: S2BQ1A9D, dtype: float64

Counts for S2BQ1A9F – Have you ever had Sweating or High Heart Beat from a Hangover, 1=Yes, 2=No, 9=Unknown, NaN=N/A Abstainer from Alcohol
nan          8266
2.000000    31732
1.000000     2732
9.000000      363
Name: S2BQ1A9F, dtype: int64

Percentages for S2BQ1A9F – Have you ever had Sweating or High Heart Beat from a Hangover, 1=Yes, 2=No, 9=Unknown, NaN=N/A Abstainer from Alcohol
nan         0.191818
2.000000    0.736361
1.000000    0.063398
9.000000    0.008424
Name: S2BQ1A9F, dtype: float64

Counts for S2BQ1A9I – Have you ever had a Headache from a Hangover, 1=Yes, 2=No, 9=Unknown, NaN=N/A Abstainer from Alcohol
nan          8266
2.000000    25867
1.000000     8686
9.000000      274
Name: S2BQ1A9I, dtype: int64

Percentages for S2BQ1A9I – Have you ever had a Headache from a Hangover, 1=Yes, 2=No, 9=Unknown, NaN=N/A Abstainer from Alcohol
nan         0.191818
2.000000    0.600260
1.000000    0.201564
9.000000    0.006358
Name: S2BQ1A9I, dtype: float64

Counts for S2BQ1A9B Unhealthy Only – Have you ever had Body Shakes from a Hangover, 1=Yes, 2=No, 9=Unknown, NaN=N/A Abstainer from Alcohol
1.000000     499
2.000000    5007
9.000000      46
Name: S2BQ1A9B, dtype: int64

Percentages for S2BQ1A9B Unhealthy Only – Have you ever had Body Shakes from a Hangover, 1=Yes, 2=No, 9=Unknown, NaN=N/A Abstainer from Alcohol
1.000000    0.089878
2.000000    0.901837
9.000000    0.008285
Name: S2BQ1A9B, dtype: float64

Counts for S2BQ1A9B Healthy Only – Have you ever had Body Shakes from a Hangover, 1=Yes, 2=No, 9=Unknown, NaN=N/A Abstainer from Alcohol
2.000000    27734
1.000000     1276
9.000000      185
Name: S2BQ1A9B, dtype: int64

Percentages for S2BQ1A9B Healthy Only – Have you ever had Body Shakes from a Hangover, 1=Yes, 2=No, 9=Unknown, NaN=N/A Abstainer from Alcohol
2.000000    0.949957
1.000000    0.043706
9.000000    0.006337
Name: S2BQ1A9B, dtype: float64

Counts for S2BQ1A9D Unhealthy Only – Have you ever had Nausea from a Hangover, 1=Yes, 2=No, 9=Unknown, NaN=N/A Abstainer from Alcohol
1.000000    1604
2.000000    3901
9.000000      47
Name: S2BQ1A9D, dtype: int64

Percentages for S2BQ1A9D Unhealthy Only – Have you ever had Nausea from a Hangover, 1=Yes, 2=No, 9=Unknown, NaN=N/A Abstainer from Alcohol
1.000000    0.288905
2.000000    0.702630
9.000000    0.008465
Name: S2BQ1A9D, dtype: float64

Counts for S2BQ1A9D Healthy Only – Have you ever had Nausea from a Hangover, 1=Yes, 2=No, 9=Unknown, NaN=N/A Abstainer from Alcohol
2.000000    20095
1.000000     8916
9.000000      184
Name: S2BQ1A9D, dtype: int64

Percentages for S2BQ1A9D Healthy Only – Have you ever had Nausea from a Hangover, 1=Yes, 2=No, 9=Unknown, NaN=N/A Abstainer from Alcohol
2.000000    0.688303
1.000000    0.305395
9.000000    0.006302
Name: S2BQ1A9D, dtype: float64

Counts for S2BQ1A9F Unhealthy Only – Have you ever had Sweating or High Heart Beat from a Hangover, 1=Yes, 2=No, 9=Unknown, NaN=N/A Abstainer from Alcohol
1.000000     612
2.000000    4864
9.000000      76
Name: S2BQ1A9F, dtype: int64

Percentages for S2BQ1A9F Unhealthy Only – Have you ever had Sweating or High Heart Beat from a Hangover, 1=Yes, 2=No, 9=Unknown, NaN=N/A Abstainer from Alcohol
1.000000    0.110231
2.000000    0.876081
9.000000    0.013689
Name: S2BQ1A9F, dtype: float64

Counts for S2BQ1A9F Healthy Only – Have you ever had Sweating or High Heart Beat from a Hangover, 1=Yes, 2=No, 9=Unknown, NaN=N/A Abstainer from Alcohol
2.000000    26827
1.000000     2118
9.000000      250
Name: S2BQ1A9F, dtype: int64

Percentages for S2BQ1A9F Healthy Only – Have you ever had Sweating or High Heart Beat from a Hangover, 1=Yes, 2=No, 9=Unknown, NaN=N/A Abstainer from Alcohol
2.000000    0.918890
1.000000    0.072547
9.000000    0.008563
Name: S2BQ1A9F, dtype: float64

Counts for S2BQ1A9I Unhealthy Only – Have you ever had a Headache from a Hangover, 1=Yes, 2=No, 9=Unknown, NaN=N/A Abstainer from Alcohol
1.000000    1260
2.000000    4240
9.000000      52
Name: S2BQ1A9I, dtype: int64

Percentages for S2BQ1A9I Unhealthy Only – Have you ever had a Headache from a Hangover, 1=Yes, 2=No, 9=Unknown, NaN=N/A Abstainer from Alcohol
1.000000    0.226945
2.000000    0.763689
9.000000    0.009366
Name: S2BQ1A9I, dtype: float64

Counts for S2BQ1A9I Healthy Only – Have you ever had a Headache from a Hangover, 1=Yes, 2=No, 9=Unknown, NaN=N/A Abstainer from Alcohol
2.000000    21592
1.000000     7418
9.000000      185
Name: S2BQ1A9I, dtype: int64

Percentages for S2BQ1A9I Healthy Only – Have you ever had a Headache from a Hangover, 1=Yes, 2=No, 9=Unknown, NaN=N/A Abstainer from Alcohol
2.000000    0.739579
1.000000    0.254085
9.000000    0.006337
Name: S2BQ1A9I, dtype: float64
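The grouped tables are easier to read against the hypothesis if, for each symptom, the share of Yes answers is compared between the two groups. The sketch below hard-codes the Unhealthy Only and Healthy Only counts printed above. It assumes that Unhealthy means Fair or Poor health. It also drops 9=Unknown from the denominator, which the original code does not do. The names `COUNTS`, `yes_share` and `compare` are illustrative.

```python
# Counts copied from the "Unhealthy Only" / "Healthy Only" output.
# Each tuple is (Yes, No, Unknown) for codes 1, 2 and 9.
COUNTS = {
    "Body shakes (S2BQ1A9B)": {"Unhealthy": (499, 5007, 46), "Healthy": (1276, 27734, 185)},
    "Nausea (S2BQ1A9D)": {"Unhealthy": (1604, 3901, 47), "Healthy": (8916, 20095, 184)},
    "Sweating/fast heartbeat (S2BQ1A9F)": {"Unhealthy": (612, 4864, 76), "Healthy": (2118, 26827, 250)},
    "Headache (S2BQ1A9I)": {"Unhealthy": (1260, 4240, 52), "Healthy": (7418, 21592, 185)},
}


def yes_share(yes, no, _unknown):
    """Share answering Yes among those who answered Yes or No (9=Unknown excluded)."""
    return yes / (yes + no)


def compare(counts):
    """Return (symptom, unhealthy share, healthy share, difference) rows."""
    rows = []
    for symptom, groups in counts.items():
        unhealthy = yes_share(*groups["Unhealthy"])
        healthy = yes_share(*groups["Healthy"])
        rows.append((symptom, unhealthy, healthy, unhealthy - healthy))
    return rows


if __name__ == "__main__":
    for symptom, unhealthy, healthy, diff in compare(COUNTS):
        print(f"{symptom:36s} unhealthy {unhealthy:6.1%}  healthy {healthy:6.1%}  diff {diff:+.1%}")
```

On these counts the Yes share is higher in the Unhealthy group for body shakes (about 9.1% vs 4.4%) and sweating (about 11.2% vs 7.3%), but slightly lower for nausea and headaches. A significance test would still be needed before drawing conclusions.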
Week 1 Assignment
After reviewing all of the data sets, I found the NESARC dataset the most interesting. It is the most versatile, so I thought it offered the best chance for me to do original work. The only drawback is that the data appear to date from the early 2000s, so I will be careful to pick a topic that is more timeless.
Next I started paging through the codebook to find the first topic I was interested in. On an initial scan I saw that there was a lot of information concerning alcohol use. I have always found this topic interesting because, especially in the USA, alcohol is a major part of college/university life and also (right or wrong!) tends to be a big part of the social experience with work colleagues. To start my codebook, I picked the section containing information about the physical effects of alcohol wearing off (hangovers).
Next I searched for an association I found interesting to study. I found it early in the codebook, in the section dealing with overall health, specifically self-perceived health. This generated my research question and hypothesis:
I believe there is an association between having fair or poor health and a greater occurrence of physical hangover symptoms: shaking, nausea, sweating or a fast heartbeat, or headaches.
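One way this hypothesis could eventually be tested is a chi-square test of independence between health group and each hangover symptom. Below is a minimal pure-Python sketch for a single 2x2 table. It uses the body-shakes counts from the Week 2 output above, with 9=Unknown dropped and Unhealthy taken to mean Fair or Poor; both are assumptions. The function name `chi_square_2x2` is illustrative. In practice, `scipy.stats.chi2_contingency(table, correction=False)` should give the same statistic. Its default applies Yates' continuity correction to 2x2 tables.

```python
import math


def chi_square_2x2(yes_a, no_a, yes_b, no_b):
    """Pearson chi-square test of independence (no continuity correction).

    Rows are group A (yes_a, no_a) and group B (yes_b, no_b).
    Returns (statistic, p_value) with 1 degree of freedom.
    """
    table = [[yes_a, no_a], [yes_b, no_b]]
    n = yes_a + no_a + yes_b + no_b
    row_totals = [sum(row) for row in table]
    col_totals = [yes_a + yes_b, no_a + no_b]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            stat += (table[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, P(X > stat) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


if __name__ == "__main__":
    # Body shakes (S2BQ1A9B): Unhealthy (Yes, No) vs Healthy (Yes, No).
    stat, p = chi_square_2x2(499, 5007, 1276, 27734)
    print(f"chi-square = {stat:.1f}, p = {p:.3g}")
```

The same call can be repeated for nausea, sweating and headaches. With four tests, a multiple-comparison adjustment would be worth considering.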
Literature Review
Search Terms: Hangover Cause Health Related
Alcohol Hangover: Mechanisms and Mediators; Robert Swift, M.D., Ph.D., and Dena Davidson, Ph.D.
“Generally, the greater the amount and duration of alcohol consumption, the more prevalent is the hangover, although some people report experiencing a hangover after drinking low levels of alcohol (i.e., one to three alcoholic drinks), and some heavy drinkers do not report experiencing hangovers at all.”
The article discusses the effects of personality on hangovers, but not the effects of overall health. I found it very interesting because it hints at the general difficulty of finding associations between hangovers and other factors.
Alcohol Consumption and Blood Pressure — Kaiser-Permanente Multiphasic Health Examination Data; Arthur L. Klatsky, M.D., Gary D. Friedman, M.D., M.S., Abraham B. Siegelaub, M.S., and Marie J. Gérard, M.D.
A study of 83,947 men and women examined the association between alcohol use and high blood pressure. The general conclusion was that high alcohol consumption contributes to higher blood pressure; however, the findings were ultimately considered inconclusive due to a combination of the small number of users, the small magnitude of blood pressure elevation in drinkers, and limited assessment of confounding factors.
To summarize prior work, there has been limited research correlating hangovers with overall health. Some studies have been inconclusive, and although there is plenty of discussion on the topic, little conclusive scientific research has been completed.
Codebook
Page 23
Self-Perceived Health
Pages 51-54
EVER SHAKE WHEN EFFECTS OF ALCOHOL WERE WEARING OFF
EVER HAVE NAUSEA WHEN EFFECTS OF ALCOHOL WERE WEARING OFF
EVER SWEAT OR HEART BEAT FAST WHEN EFFECTS OF ALCOHOL WERE WEARING OFF
EVER HAVE VERY BAD HEADACHES WHEN EFFECTS OF ALCOHOL WERE WEARING OFF
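The codebook entries above correspond to the variable names used in the Week 2 code, and each of them uses 9 for Unknown. Below is a minimal data-management sketch, which the original code does not do. It records that mapping and sets 9 to missing so unknown answers can be left out of percentage calculations. The names `CODEBOOK` and `recode_unknown` are hypothetical.

```python
import pandas as pd

# Variable names for the codebook entries listed above.
CODEBOOK = {
    "S1Q16": "Self-perceived current health (1=Excellent, 2=Very Good, 3=Good, 4=Fair, 5=Poor, 9=Unknown)",
    "S2BQ1A9B": "Ever shake when effects of alcohol were wearing off",
    "S2BQ1A9D": "Ever have nausea when effects of alcohol were wearing off",
    "S2BQ1A9F": "Ever sweat or heart beat fast when effects of alcohol were wearing off",
    "S2BQ1A9I": "Ever have very bad headaches when effects of alcohol were wearing off",
}


def recode_unknown(frame, columns, unknown_code=9):
    """Return a copy of frame with unknown_code replaced by NaN in the given columns."""
    out = frame.copy()
    for col in columns:
        out[col] = pd.to_numeric(out[col], errors="coerce")
        out[col] = out[col].mask(out[col] == unknown_code)
    return out
```

Calling `recode_unknown(data, list(CODEBOOK))` before `value_counts(normalize=True)` (with the default `dropna=True`) would give percentages among known answers only.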