Assignment 4 - Test a Logistic Regression Model
The program was written in Spyder (images 1 and 2). The research question asks whether receiving food stamps in the last 12 months (categorical response variable, 0 = No, 1 = Yes) is associated with a depression diagnosis in the last 12 months (categorical explanatory variable, 0 = No, 1 = Yes), with the total number of beers consumed on days when beer is drunk as a potential confounding variable (quantitative explanatory variable). A subset of ages 25 to 45 was pulled from the dataset to focus on a smaller population of adults.
The output from Spyder shows 2 sets of logistic regression results and odds ratios with confidence intervals (images 3 and 4). The first model uses receiving food stamps in the last 12 months as the categorical response variable and depression diagnosis in the last 12 months as the categorical explanatory variable. The second model uses the same response and explanatory variables and adds the total number of beers consumed on days when beer is drunk as a quantitative confounding variable. The confounding variable was centered on its mean.
In the first logistic regression, the p-value is less than 0.0001 (displayed as p = 0.000), so the result is statistically significant and the null hypothesis of no association can be rejected. The odds ratio is 2.98, which is greater than 1, so the odds of receiving food stamps are higher among individuals diagnosed with depression. The results also show a 95% confidence interval of 2.37 to 3.75.
In the second logistic regression, which takes into account the confounding variable of total number of beers consumed on days when beer is drunk, the p-value is also less than 0.0001 (p = 5.425e-20). This shows that even with the confounding variable in the model, there is an association between receiving food stamps and depression diagnosis. The result is statistically significant and the null hypothesis of no association can be rejected. With the confounding variable taken into account, major depression has an odds ratio of 3.08, whereas total number of beers consumed on days when beer is drunk has an odds ratio of 1.07. Both odds ratios are greater than 1, showing that the odds of receiving food stamps are higher among those diagnosed with depression and among those who drink a higher number of beers. The confidence intervals for depression diagnosis and total number of beers consumed on days when beer is drunk are 2.38 to 3.77 and 1.04 to 1.10 respectively.
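The link between a logistic regression coefficient and the reported odds ratio can be sketched as follows. This is only an illustration, not the program shown in the images: the helper name `odds_ratio_ci` is hypothetical, and the coefficient and standard error are back-calculated from the reported odds ratio (2.98) and 95% confidence interval (2.37 to 3.75) of the first model, assuming the usual normal-approximation interval.

```python
import math

def odds_ratio_ci(coef, se, z=1.96):
    """Turn a logit coefficient and its standard error into an
    odds ratio with a normal-approximation confidence interval."""
    return (math.exp(coef),
            math.exp(coef - z * se),
            math.exp(coef + z * se))

# Assumption: values recovered from the reported summary, not read
# from the original model output.
coef = math.log(2.98)                                  # log-odds coefficient
se = (math.log(3.75) - math.log(2.37)) / (2 * 1.96)    # approximate SE

odds_ratio, lower, upper = odds_ratio_ci(coef, se)
print(round(odds_ratio, 2), round(lower, 2), round(upper, 2))
```

An odds ratio above 1 with a confidence interval that excludes 1 matches the interpretation given above.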
Assignment 3 - Test a Multiple Regression Model
The program was written in Spyder (images 1 and 2). The research question looks at the association between personal income in the last 12 months (quantitative response variable) and age at onset of first depression episode (quantitative explanatory variable), with the total number of beers consumed on days when beer is drunk as a confounding variable (quantitative explanatory variable). A subset of ages 25 to 45 was pulled from the dataset to focus on a smaller population of adults.
The output from Spyder shows the OLS regression results for personal income in the last 12 months and age at onset of first depression episode, as well as the OLS regression results for the same two variables with the confounding variable of total number of beers consumed on days when beer is drunk (images 3, 4, 5, 6 and 7). The QQ plot, standardized residual plot, regression plots and leverage plot are also shown. Both quantitative explanatory variables were centered on their means.
Based on the OLS regression results above, the association between personal income in the last 12 months and age at onset of first depression episode is statistically significant both with and without the confounding variable of total number of beers consumed on days when beer is drunk (p = 0.000822 and p = 2.78e-86 respectively). Because the association remains significant after the confounding variable is added, there is no evidence of confounding, and the null hypothesis of no association between personal income in the last 12 months and age at onset of first depression episode can be rejected.
The QQ plot shows that the points follow the best-fit line fairly well. In the standardized residual plot, the distribution is somewhat skewed. The leverage plot shows leverage values close to 0, even though it includes data points that may be far from the mean.
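Centering a quantitative variable simply means subtracting its mean from every observation, so that the centered variable has a mean of 0. A minimal sketch, assuming a hypothetical list of beer counts (the values and the helper name `center` are made up for illustration and do not come from NESARC):

```python
def center(values):
    """Subtract the mean so the centered values average to zero."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]

# Hypothetical beers-per-drinking-day values, for illustration only.
beers_per_day = [1, 2, 2, 3, 6, 10]
centered = center(beers_per_day)

print(centered)
print(sum(centered) / len(centered))  # mean of the centered variable
```

With centered predictors, the regression intercept is the predicted response at the mean of each predictor rather than at 0.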
Assignment 2 - Test a Basic Linear Regression Model
The program was written in Spyder (image 1). The research question looks at the association between depression diagnosis in the last 12 months (categorical explanatory variable, 0 = No, 1 = Yes) and personal income in the last 12 months (quantitative response variable) among adults aged 25 to 45. This subset was pulled from the dataset to focus on a smaller population of adults.
The output from Spyder shows the mean personal income in the last 12 months for those with and without depression, the OLS regression model and the linear regression model (images 2, 3 and 4). The F-statistic, p-value, intercept and slope are also displayed in the models listed.
Based on the bar plot of mean personal income by depression diagnosis, the mean personal income of those not diagnosed with depression ($32,310.71) is higher than that of those diagnosed with depression ($25,613.75). The OLS regression model shows a high F-statistic of 40.08 and a p-value of 2.49e-10, which means the result is statistically significant and the null hypothesis of no association between personal income and depression diagnosis can be rejected. In the linear regression model, the intercept and slope equal 3.23e+4 and -6696.96 respectively. The R-squared value in this model is 0.002, which means that only a small amount of the variability in personal income is captured by the model.
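With a single 0/1 explanatory variable, the OLS intercept is the mean of the group coded 0 and the slope is the difference between the two group means. The reported numbers are consistent with this, as the short check below shows. It uses only the group means quoted above; `predicted_income` is a hypothetical helper, not part of the original program.

```python
# Group means reported in the post.
mean_no_depression = 32310.71   # depression diagnosis = 0
mean_depression = 25613.75      # depression diagnosis = 1

# For a binary predictor, OLS gives:
intercept = mean_no_depression
slope = mean_depression - mean_no_depression

def predicted_income(depressed):
    """Predicted income for depressed = 0 or 1 (hypothetical helper)."""
    return intercept + slope * depressed

print(round(intercept, 2), round(slope, 2))
print(round(predicted_income(1), 2))
```

The slope reproduces the reported -6696.96, and the intercept matches the reported 3.23e+4 after rounding.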
Assignment 1 - Writing About Your Data
SAMPLE
The sample data was drawn from the National Epidemiologic Survey on Alcohol and Related Conditions (NESARC), a study conducted by the National Institute on Alcohol Abuse and Alcoholism (NIAAA) in 2001/2002. The data were collected through surveys of N = 43,093 participants. The population represents the civilian, non-institutionalized adult population of the United States, including residents of the District of Columbia, Alaska and Hawaii. Persons living in households, military personnel living off base, and persons residing in boarding/rooming houses, non-transient hotels/motels, shelters, facilities for housing workers, college quarters and group homes were also included. Blacks, Hispanics and young adults aged 18 to 24 years were oversampled to better represent minority groups.
The sample of focus in this study includes participants aged 25-35 and looks into the association between depression diagnosis and how often beer was drunk in the last 12 months.
PROCEDURE
The purpose of collecting the NESARC dataset was to determine the prevalence, incidence, stability and recurrence of alcohol use disorders and their associated disabilities in the general population of the United States. The data were originally collected in 2001 to 2002 through face-to-face, computer-assisted personal interviews in the respondent's home. In each household, one adult was randomly selected for interview and gave written informed consent. The response rate of the NESARC survey was 81%.
VARIABLE
Major depression among those diagnosed in the last 12 months was evaluated with the NIAAA Alcohol Use Disorder and Associated Disabilities Interview Schedule - DSM-IV (AUDADIS-IV) (Grant et al., 2002; Grant, Harford, Dawson, & Chou, 1995). The alcohol section of AUDADIS-IV provides information on alcohol consumption, including the type of alcohol, the frequency and quantity of consumption, and DSM-IV alcohol dependence.
Current alcohol consumption was evaluated with a binary item (whether alcohol was drunk in the last 12 months, coded 1 for Yes and 2 for No) and through alcohol consumption frequency (how often alcohol was drunk in the last 12 months, subcategorized into 10 groups). A quantitative variable, the largest number of drinks consumed on days when alcohol was drunk, ranging from 1 to 98, was also examined.
Assignment 4 - Testing a Potential Moderator
The program was written in Spyder (images 1 and 2). The research question looks at the association between 2 quantitative variables, age and income per person in the last 12 months, with depression diagnosis in the last 12 months as a moderator. A subset of ages 25 to 35 with income per person of less than $500,000.00 was pulled from the dataset to focus on the young adult population.
The output from Spyder shows 3 scatterplots of the association between age and income: for everyone regardless of depression diagnosis, for those without a depression diagnosis, and for those with a depression diagnosis respectively. The Pearson correlation for each is also presented (images 3 and 4).
From the 3 scatterplots alone, it is hard to conclude whether there is an association between age and personal income regardless of depression diagnosis, in those without depression, or in those with depression. The Pearson correlation coefficients (r) and p-values, however, show that all three associations are statistically significant and that each is a somewhat weak positive relationship.
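Testing a moderator in this way amounts to computing the same Pearson correlation once for the whole sample and once within each level of the moderator. The sketch below illustrates the idea with a hand-written `pearson_r` and a few made-up (age, income, depression) records; the data and the helper names are hypothetical and are not the NESARC values or the original program.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical records: (age, income, depression diagnosis 0/1).
records = [
    (25, 20000, 0), (27, 26000, 0), (30, 31000, 0), (35, 40000, 0),
    (25, 18000, 1), (28, 17000, 1), (31, 23000, 1), (34, 21000, 1),
]

def r_for(group=None):
    """Correlation of age and income, optionally within one moderator level."""
    rows = [r for r in records if group is None or r[2] == group]
    return pearson_r([r[0] for r in rows], [r[1] for r in rows])

r_all, r_no, r_yes = r_for(), r_for(0), r_for(1)
print(r_all, r_no, r_yes)
```

If the correlation differs clearly in size or direction between the two subgroups, the third variable is acting as a moderator.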
Assignment 3 - Generating a Correlation Coefficient
The program was written in Spyder (image 1). The research question looks at the association between age and income per person in the last 12 months. A subset of ages 20 to 30 with income per person of less than $500,000.00 was pulled from the dataset to narrow the survey population to young adults.
The output from Spyder displays the scatterplot of the association between age and income per person, along with the Pearson correlation coefficient and p-value for the analysis (image 2).
From the scatterplot, it is hard to tell whether there is a negative or positive association between age and income per person in the last 12 months. The Pearson correlation coefficient of 0.305 (fairly close to 0), however, indicates that although there is a positive relationship between the 2 quantitative variables, it is a weak one. The p-value of 3.96e-10 (which is less than 0.05) means that the result is statistically significant and the null hypothesis of no association between the 2 variables can be rejected.
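One way to put the size of this correlation in context is to square it, which gives the fraction of variability in one variable that a linear relationship with the other accounts for. The snippet below is a small worked calculation using only the reported coefficient; it is an added illustration, not part of the original output.

```python
r = 0.305                 # reported Pearson correlation coefficient
r_squared = r ** 2        # fraction of variability accounted for

print(round(r_squared, 3))
print(f"{r_squared:.1%} of the variability in income is associated with age")
```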
Assignment 2 - Running a Chi-Square Test of Independence
Spyder was used to write the program (images 1, 2, 3 and 4). The association between being diagnosed with depression and alcohol dependence is examined using the NESARC study.
The output in Spyder displays the chi-square test of independence, followed by pairwise comparisons of the alcohol dependence groups, giving a total of 6 comparisons (0 to 1, 0 to 2, 0 to 3, 1 to 2, 1 to 3 and 2 to 3). Each comparison shows the contingency table, the percentages and the chi-square test calculations.
The Bonferroni-adjusted significance level in this case is 0.008 (0.05 divided by 6 comparisons). To reject the null hypothesis of no association between depression diagnosis and alcohol dependence, the p-value therefore has to be less than 0.008, not just less than 0.05. For the 6 comparisons of 0 to 1, 0 to 2, 0 to 3, 1 to 2, 1 to 3 and 2 to 3, the calculated p-values are 3.59e-6, 9.39e-28, 1.46e-87, 4.74e-9, 3.55e-24 and 0.024 respectively. It can be concluded that the comparisons 0 to 1, 0 to 2, 0 to 3, 1 to 2 and 1 to 3 are statistically significant (p < 0.008), so the null hypothesis of no association between depression diagnosis and alcohol dependence can be rejected for them. The comparison 2 to 3, with a p-value of 0.024, is below 0.05 but not below 0.008, so it is not statistically significant and the null hypothesis cannot be rejected.
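The Bonferroni decision rule described above can be reproduced in a few lines. The p-values are the ones reported in the post; the dictionary keys and variable names are chosen here for illustration.

```python
alpha = 0.05

# Reported p-values for the 6 pairwise comparisons.
p_values = {
    "0 vs 1": 3.59e-6,
    "0 vs 2": 9.39e-28,
    "0 vs 3": 1.46e-87,
    "1 vs 2": 4.74e-9,
    "1 vs 3": 3.55e-24,
    "2 vs 3": 0.024,
}

# Bonferroni: divide alpha by the number of comparisons.
adjusted_alpha = alpha / len(p_values)

significant = {pair: p < adjusted_alpha for pair, p in p_values.items()}

print(round(adjusted_alpha, 3))
for pair, is_significant in significant.items():
    print(pair, "reject H0" if is_significant else "fail to reject H0")
```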
Assignment 1 - Running An Analysis Of Variance
The program was written in Spyder (image 1). The research question analyzes the association between alcohol dependence and how often beer is drunk among those diagnosed with depression.
The output in Spyder displays the F-statistic and p-value (image 2). The mean and standard deviation of the number of beers drunk in the last year for each alcohol dependence group, among those diagnosed with depression, are also shown.
The null hypothesis (H0) is that there is no association between alcohol dependence and the number of beers drunk within the last year in those diagnosed with depression. The alternative hypothesis (HA) is that there is a relationship between alcohol dependence and the number of beers drunk within the last year in those diagnosed with depression. As seen in the ANOVA test, the p-values for the comparisons of groups 0 to 1, 0 to 2, 0 to 3, 1 to 3 and 2 to 3 are less than 0.05, which means the test is statistically significant and the null hypothesis (H0) can be rejected: there is a relationship between alcohol dependence and the number of beers drunk within the last year in those diagnosed with depression.
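The F-statistic in a one-way ANOVA is the ratio of the variability between group means to the variability within groups. The sketch below computes it by hand on a tiny made-up dataset; the function name `one_way_anova_f` and the numbers are hypothetical and only illustrate the calculation, not the NESARC results.

```python
def one_way_anova_f(groups):
    """F-statistic for a one-way ANOVA over a list of groups."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)

    # Between-group sum of squares: group means around the grand mean.
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups
    )
    # Within-group sum of squares: observations around their group mean.
    ss_within = sum(
        (v - sum(g) / len(g)) ** 2 for g in groups for v in g
    )

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# Hypothetical beer counts for three alcohol dependence groups.
beers_by_group = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
f_stat = one_way_anova_f(beers_by_group)
print(f_stat)
```

A large F-statistic relative to the F distribution with these degrees of freedom yields a small p-value, which is what leads to rejecting H0.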
Assignment 4 - Creating Graphs for Data
The program was written in Spyder, continuing from assignments 2 and 3, and now covers data visualization (images 1, 2 and 3). The research question looks into the association between thoughts of death and depression in relation to age. The question is then refined to focus specifically on young people aged 17-24.
The output in Spyder displays the data visualizations and repeats some counts and percentages from previous assignments to make clear where the interpretation of each graph comes from (images 4, 5, 6 and 7). Bar plots are used, following the flow chart provided in lecture, because both the response and explanatory variables are categorical.
The first 3 graphs in the output are univariate graphs of each variable within the subset of ages 17-24: depression diagnosis (0 = not diagnosed with depression, 1 = diagnosed with depression), thoughts of own death (1 = have thought of own death, 2 = never thought of own death) and categorized age group (17-19, 20-22 and 23-24) respectively.
The univariate graph of depression diagnosis shows that most young people aged 17-24 are not diagnosed with depression. The univariate graph of thoughts of own death shows that most young people have not thought of their own death, since there are more counts of 2 (never thought of own death) than of 1 (have thought of own death). The univariate graph of age group shows that the young people in the subset are mostly in the 20-22 age group.
The following 2 graphs are bivariate graphs. The first looks at the association between being diagnosed with depression and having thought of one's own death. The second looks at having thought of one's own death among those diagnosed with depression, by age group.
The first bivariate graph looks into the association between depression diagnosis and thoughts of own death among young people aged 17-24. It shows that those diagnosed with depression have mostly thought of their own death. This is seen in the mode, where there is a higher number of 1s (have thought of own death).
The second bivariate graph looks into the association between age group (17-19, 20-22 and 23-24) and having thought of own death among those diagnosed with depression. The bar plot is roughly uniform in shape, meaning that the proportion of those diagnosed with depression who have thought of their own death is similar across the age groups.
Assignment 3 - Making Data Management Decisions
The images show my program, written in Spyder and continuing from Assignment 1 (images 1 and 2). The research question analyzes the association between thoughts of death and depression with regard to age. A subset of young people aged 18 to 24 who are diagnosed with depression is used to narrow the research question to a more specific group.
The output in Spyder displays the count and percentage of the subset and then converts missing values to NaN. The images show the conversion of the value 9 (representing unknown) to NaN, as well as the grouping of ages 18 to 24 into smaller subgroups of 18-19, 20-22 and 23-24 (images 3, 4 and 5). The categorization is based on the ages at which young people are fresh out of high school, in their college years, and going into the workforce.
Based on the frequency distribution of the 3 age groups of 18-19, 20-22 and 23-24, 46% of depression diagnoses fall in the 20-22 age group, which roughly corresponds to the college years. What is still missing from the analysis is graphing, which would give a clearer picture of the relationship between age group and depression diagnosis.
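The two data management steps described here, recoding the unknown code 9 as missing and collapsing ages into 3 groups, can be sketched in plain Python as follows. The helper names and the sample values are hypothetical; the program in the images may do this differently.

```python
import math

def recode_unknown(value, unknown_code=9):
    """Replace the 'unknown' code with NaN so it is treated as missing."""
    return math.nan if value == unknown_code else value

def age_group(age):
    """Collapse ages 18 to 24 into the three groups used in the post."""
    if 18 <= age <= 19:
        return "18-19"
    if 20 <= age <= 22:
        return "20-22"
    if 23 <= age <= 24:
        return "23-24"
    return None  # outside the 18 to 24 subset

# Hypothetical responses: 1 = yes, 2 = no, 9 = unknown.
thoughts_of_death = [1, 2, 9, 2, 1]
ages = [18, 20, 22, 23, 24]

cleaned = [recode_unknown(v) for v in thoughts_of_death]
groups = [age_group(a) for a in ages]

print(cleaned)
print(groups)
```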
Assignment 2 - Running First Program
The program was written in Spyder (images 1 and 2). The program starts off the analysis of the association between thoughts of death and depression with regard to age.
The output in Spyder displays my variables (thoughts of death, depression diagnosis and age). The output shows the counts and percentages of each variable (images 3 and 4). I included the subsets that I used to refine my research question, which looks specifically at young people aged 18-24.
When the subset of young people aged 18 to 24 with a depression diagnosis in the last 12 months (non-hierarchical) is pulled out, age 20 has the highest share of diagnoses at 24% (the highest among ages 18 to 24). Comparing the counts of the age variable (before subsetting) with the counts in subset 1, it is clear that only a small number of young people aged 18-24 are both diagnosed with depression in the last 12 months and have thoughts of their own death. Variable 1, thoughts of own death, also contains a small number of unknown observations, which could be classified as missing data.
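Counts and percentages of a categorical variable, like those shown in the output, can be produced with a simple frequency table. The sketch below uses `collections.Counter` on a made-up list of ages; the helper `frequency_table` and the values are hypothetical and do not reproduce the NESARC counts.

```python
from collections import Counter

def frequency_table(values):
    """Return {value: (count, percentage)} sorted by value."""
    counts = Counter(values)
    total = sum(counts.values())
    return {k: (counts[k], 100 * counts[k] / total) for k in sorted(counts)}

# Hypothetical ages in the 18 to 24 subset.
ages = [18, 19, 20, 20, 20, 21, 22, 24]
table = frequency_table(ages)

for age, (count, percent) in table.items():
    print(age, count, f"{percent:.1f}%")
```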
Assignment 1 - Getting Research Project Started
After reading the summaries of each codebook, I chose to continue with the NESARC codebook because it covers topics regarding depression. The variables I decided to look into were "thought a lot about own death" (source code S4AQ4A19) and depression diagnosis in the last 12 months (source code MAJORDEP12).
My research question explores the association between thoughts of death and depression. After reviewing some literature, my hypothesis is that those who suffer from depression are more likely to think about death.
Through Google Scholar, I used the search terms “depression” and “suicidal thoughts” to find past literature published on the topic. The sources below helped in forming the above hypothesis.
- https://onlinelibrary.wiley.com/doi/full/10.1111/j.1943-278X.1979.tb00439.x?casa_token=KD9ikco6QaoAAAAA%3ASUSmVp8nxF2gOeV4l_jfDHbeLzKZh7MJGPa67O4hK4OcMLDpJI7Dxlzs7h3HoLwnxFbW5VyBOPQ2SSm1
- https://www.sciencedirect.com/science/article/pii/S1054139X08003376?casa_token=_gQKpFO7dEgAAAAA:D-LY6W3V_Ysvw-WNH_hgXAvkQhoDGdWUJtmYpbKMtb0Bmdqe2KWBMSuDkuFiXyKPQ9jViFP8BdQ
- https://onlinelibrary.wiley.com/doi/abs/10.1007/BF02506948