yashtron
yashtron · 5 years ago
Visualizing Data
Preview
This final assignment presents visualizations of data from the NESARC codebook, in order to examine the association between cannabis use and mental disorders, namely major depression and general anxiety diagnosed in the last 12 months, in a sample of 9535 U.S. young adults aged 18 to 30. I used the Spyder IDE to create both univariate and bivariate bar charts for the selected variables. More specifically, with the variable ‘AGE’ restricted to 18 through 30, I built univariate graphs for the categorical variables ‘S3BQ1A5’ (cannabis use), ‘S3BD5Q2E’ (frequency of this use), ‘MAJORDEP12’ (major depression diagnosis in the last 12 months) and ‘GENAXDX12’ (general anxiety diagnosis in the same period). There is also a univariate graph for the quantitative variable ‘NUMJOPMOTH_EST’, which I created in my previous assignment by multiplying the frequency of cannabis use by the average quantity of joints smoked, in order to estimate the total number of joints smoked per month by each individual. For the bivariate graphs, I chose to visualize the association between cannabis use (C->C) and both disorders, and additionally the relationship of the frequency (C->C) and quantity (Q->C) of this use with both depression and anxiety. Thus, bar charts were created combining the variables ‘S3BQ1A5’ (cannabis use), ‘S3BD5Q2E’ (frequency of use) and ‘NUMJOPMOTH_EST’ (quantity of joints) with the variables ‘MAJORDEP12’ (major depression) and ‘GENAXDX12’ (general anxiety). Finally, for the quantitative variable both center and spread were measured, and the describe function was used to obtain useful information about the selected categorical variables.
Output
Univariate graphs:
A random sample of 9535 U.S. young adults, aged 18-30, were asked, as part of the NESARC survey, the following question: “Have you ever used cannabis?” A share of 25.29% (about 2400 individuals) answered “Yes”, whereas 73.85% (7042 individuals) answered “No”, which was the most frequent answer. A very small percentage, 0.84%, fell into category 9 (“Unknown”), which is our missing data.
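As a quick consistency check, the reported percentages can be converted back into approximate head counts. This is only a sketch: the sample size and percentages come from the text above, while the variable names are my own.

```python
# Sample size and answer shares as reported in the text above.
sample_size = 9535
shares_percent = {"Yes": 25.29, "No": 73.85, "Unknown": 0.84}

# Convert each percentage into an approximate number of respondents.
approx_counts = {
    answer: round(sample_size * percent / 100)
    for answer, percent in shares_percent.items()
}

for answer, count in approx_counts.items():
    print(answer, count)
```

The 73.85% share corresponds to the 7042 respondents, so that count belongs to the “No” answer, and the “Yes” group is the smaller one.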
To the question “How often did you use cannabis when using the most?”, the top answer was “Every day”, chosen by 534 individuals, followed by “Once a year” with approximately 400 individuals. Fewer than 100 people chose “7-11 times per year”, which was the least frequent answer.
Of the participants (aged 18-30) who answered “Yes” to the question on cannabis use, only those who had been smoking marijuana in the last 12 months and prior were taken into consideration for the next two questions.
To the question “Have you been diagnosed with non-hierarchical major depression in the last 12 months?”, about 660 participants (79.04%) answered “No”, which was the most frequent answer, whereas 175 (20.95%) answered “Yes”.
To the question “Have you been diagnosed with non-hierarchical generalized anxiety in the last 12 months?”, 802 individuals (96.04%) answered “No”, which was the top answer, while only 33 (3.95%) answered “Yes”.
For the estimated number of joints smoked per month by cannabis users aged 18-30, it is noticeable from the graph that the distribution is skewed right. The spread (the standard deviation) of the variable is extremely large, which indicates a wide variety of answers among the participants. The three main numerical measures of the center of a distribution are the mode, the median and the mean. Here the mode is 0.1, the most commonly occurring value in the distribution, which means that the most typical answer corresponded to less than 1 joint per month. The mean is 70.1, which indicates that cannabis users smoked about 70 joints per month on average, and the median, or middle value, is 6. A mean this far above the median is what we expect from a right-skewed distribution.
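To make the three measures of center and the spread concrete, here is a minimal sketch using Python's standard statistics module. The values in the list are hypothetical stand-ins for ‘NUMJOPMOTH_EST’, not the NESARC data, chosen only so that the distribution is skewed right.

```python
import statistics

# Hypothetical, right-skewed monthly joint estimates (NOT the real NESARC values).
joints_per_month = [0.1, 0.1, 0.1, 0.5, 2, 6, 6, 30, 90, 300, 900]

mode = statistics.mode(joints_per_month)      # most frequent value
median = statistics.median(joints_per_month)  # middle value of the sorted data
mean = statistics.mean(joints_per_month)      # arithmetic average
spread = statistics.stdev(joints_per_month)   # sample standard deviation

print("mode:", mode)
print("median:", median)
print("mean:", round(mean, 1))
print("standard deviation:", round(spread, 1))
```

With a long right tail, a few very large values pull the mean well above the median, while the mode stays at the most common small value.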
The estimated number of joints smoked per month, binned into groups as illustrated above, is another way of visualizing the distribution of the variable ‘NUMJOPMOTH_EST’. We can see that most individuals, about 990, smoked less than one joint per month and that the shape of the distribution is right-skewed.
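Binning a quantitative variable into groups can be sketched as follows. The bin edges, labels and data are assumptions made for illustration, since the post does not list the cut points that were actually used. In pandas the same job is usually done with the cut function.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical bin edges and labels; the real cut points are not given in the post.
bin_edges = [1, 5, 10, 30, 100]
bin_labels = ["<1", "1-<5", "5-<10", "10-<30", "30-<100", "100+"]


def bin_label(value):
    """Return the label of the bin that contains value (lower edge inclusive)."""
    return bin_labels[bisect_right(bin_edges, value)]


# Hypothetical monthly joint estimates (NOT the real NESARC values).
joints_per_month = [0.1, 0.1, 0.1, 0.5, 2, 6, 6, 30, 90, 300, 900]

bin_counts = Counter(bin_label(value) for value in joints_per_month)
for label in bin_labels:
    print(label, bin_counts[label])
```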
Bivariate graphs:
In the bar charts above we can see the relationship between the quantity of joints smoked per month by cannabis users, aged 18 to 30, and the diagnoses of major depression (first chart) and general anxiety (second chart) in the last 12 months (Q->C). The explanatory variable is the quantity of joints (quantitative), while the response variables are the depression and anxiety diagnoses (categorical). There is a slightly increasing trend in the first graph, but not in the second.
In the graphs presented above we can see the association between frequency of cannabis use and both major depression and general anxiety (C->C). The explanatory variable is frequency of cannabis use (categorical), while the response variables are the depression and anxiety diagnoses (categorical). Again, the first graph has a right-skewed shape, which suggests that the more frequently an individual smoked cannabis, the higher the chances of being diagnosed with depression. However, we cannot say the same for anxiety, which shows a more irregular pattern.
The graphs presented above illustrate the association between cannabis use and both major depression and general anxiety diagnoses in young adults, aged from 18 to 30 years old, in the last 12 months (C->C). The explanatory variable is cannabis use (categorical) and the response variables are depression and anxiety diagnoses (categorical).
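A C->C bar chart of this kind plots, for each category of the explanatory variable, the percentage of respondents with a positive diagnosis. The sketch below computes those percentages from hypothetical (cannabis use, diagnosis) records. The records and the function name are my own assumptions, not NESARC data or the code used for the graphs.

```python
from collections import defaultdict

# Hypothetical records: (cannabis_use, major_depression), both coded "Yes"/"No".
records = (
    [("Yes", "Yes")] * 1 + [("Yes", "No")] * 3
    + [("No", "Yes")] * 1 + [("No", "No")] * 9
)


def diagnosis_rate_by_group(rows):
    """Percentage of positive diagnoses within each explanatory group."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for group, diagnosed in rows:
        totals[group] += 1
        if diagnosed == "Yes":
            positives[group] += 1
    return {group: 100 * positives[group] / totals[group] for group in totals}


rates = diagnosis_rate_by_group(records)
for group, rate in rates.items():
    print("cannabis use =", group, "->", rate, "% diagnosed")
```

Each bar is a percentage within its own group, so groups of very different sizes can still be compared directly.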
Summary
To sum up, the last graphs show noticeable differences between the percentages for cannabis users and non-users. Major depression cases among young adult cannabis users (20.95%) are more than double those among non-users (8.42%). In addition, general anxiety diagnoses among cannabis users (3.95%) are also more than double those among non-users (1.63%). This suggests an association between cannabis use and these mental disorders, which would be consistent with cannabis use increasing the likelihood of meeting the criteria for depression or general anxiety in the future. However, the sample is small and it is unclear how representative it is, which makes the findings less reliable, since a large amount of error may be involved.
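The “more than double” comparison can be made explicit by dividing the users' percentage by the non-users' percentage. The four percentages are the ones reported above. The dictionary layout and names are just for illustration.

```python
# Percentages reported in the summary above: (cannabis users, non-users).
diagnosis_rates = {
    "major depression": (20.95, 8.42),
    "general anxiety": (3.95, 1.63),
}

# Ratio of the users' rate to the non-users' rate for each diagnosis.
rate_ratios = {
    diagnosis: round(users / non_users, 2)
    for diagnosis, (users, non_users) in diagnosis_rates.items()
}

for diagnosis, ratio in rate_ratios.items():
    print(diagnosis, ratio)
```

Both ratios come out at roughly 2.4 to 2.5. A ratio like this describes an association in the sample and does not by itself show causation.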
yashtron · 5 years ago
Data Management and Visualization - Week 3 - Assignment
SAS code
Tumblr media
Result
Tumblr media
There are three dominant fluidized ejecta morphologies of Martian craters:
·       Single Layer Ejecta (SLE)
·       Double Layer Ejecta (DLE)
·       Multiple Layer Ejecta (MLE)
Nadine G. Barlow and Carola B. Perez (2003) conducted a study of the distribution of the above three morphologies within the ±60° latitude zone on Mars and confirmed that the SLE morphology is the most common, while the DLE and MLE morphologies are much less common.
One goal of my research is to study the distribution of ejecta morphologies again. This week, I collapsed the category responses of the variable Morphology_Ejecta_1 and created four new response groups:
·       EjectaMorphology1: DLE
·       EjectaMorphology2: MLE
·       EjectaMorphology3: SLE
·       EjectaMorphology4: Other
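The collapsing step can be sketched in Python as follows. This is not the SAS code from the post: the rule of grouping by the first three letters of the raw code and the sample codes are assumptions made for illustration.

```python
from collections import Counter


def collapse_ejecta(raw_code):
    """Map a raw ejecta morphology code to DLE, MLE, SLE or Other.

    Assumption: the group is decided by the leading letters of the raw code.
    """
    code = raw_code.strip().upper()
    for prefix in ("DLE", "MLE", "SLE"):
        if code.startswith(prefix):
            return prefix
    return "Other"


# Hypothetical raw codes, including one empty value.
raw_codes = ["SLEPS", "SLERS", "DLERS", "MLERS", "Rd", "SLEPC", ""]

group_counts = Counter(collapse_ejecta(code) for code in raw_codes)
for group in ("DLE", "MLE", "SLE", "Other"):
    print(group, group_counts[group])
```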
The updated frequency result shows that:
1.      For craters located within the ±60° latitude zone and with a diameter of at least 8 km, SLE is the most commonly displayed ejecta morphology if we ignore the other types.
2.      DLE and MLE are less common than SLE.
3.      The main conclusion is in accordance with previous research.
yashtron · 5 years ago
Data Management and Visualization week 2
The second assignment of the course Data Management and Visualization is to write a program in SAS or Python that performs univariate analysis on the variables I chose for my research into the association between CO2 emissions and urbanization.
I wrote two different posts that cover this week's assignment. The first post is Univariate Analysis. There you can find the results of the analysis, including the frequency distributions of the three variables of my study (co2emissions, urban rate and income per person) and the output of my Python program.
The second post, Distributions with Python, explains the Python program I wrote. The complete program is shown below.
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import pandas

# Statics
DATA_SET = 'gapminder.csv'

# GapMinder indicators
GP_COUNTRY = 'country'
GP_INCOMEPERPERSON = 'incomeperperson'
GP_CO2EMISSIONS = 'co2emissions'
GP_URBANRATE = 'urbanrate'


def load_data_set(filename):
    """
    Loads a data set from the file system

    @param filename: the name of the CSV file that contains the data set
    """
    print('Loading data set "' + filename + '"...')
    # low_memory=False reads the file in one pass, which avoids columns with
    # mixed data types caused by chunk-by-chunk type inference
    return pandas.read_csv(filename, low_memory=False)


def load_gapminder_data_set():
    """
    Load the GapMinder data set and prepare the columns needed.
    """
    data = load_data_set(DATA_SET)

    # The number of observations
    print("Number of records: " + str(len(data)))

    # The number of variables
    print("Number of columns: " + str(len(data.columns)))

    # Convert the values of co2emissions, urbanrate and incomeperperson to
    # numeric; values that cannot be parsed become NaN. This replaces
    # convert_objects(), which has been removed from pandas.
    for column in (GP_CO2EMISSIONS, GP_URBANRATE, GP_INCOMEPERPERSON):
        data[column] = pandas.to_numeric(data[column], errors='coerce')

    return data


def groupby(data_set, variables):
    """
    Get the distributed values of a variable of the data_set.

    @param data_set: the data set to examine.
    @param variables: the variable, or list of variables, of interest.
    @return a tuple of 2 pandas.core.series.Series objects where the first
            object is the absolute distribution over the values of
            the given variable(s) and the second is their
            percentages as part of the total number of rows.
    """
    counts = data_set.groupby(variables).size()
    # Divide by the number of rows, not by the number of distinct values
    return counts, counts * 100 / len(data_set)


def print_distributions(data, variable):
    """
    Prints the distribution of the values of a specific variable.

    @param data: the data set to examine.
    @param variable: the variable of interest.
    """
    distribution = groupby(data, variable)

    print("Counts for " + variable + ":")
    print(distribution[0])
    print("Percentages for " + variable + ":")
    print(distribution[1])
    print("----------------------------")


if __name__ == "__main__":
    data = load_gapminder_data_set()

    print_distributions(data, GP_CO2EMISSIONS)
    print_distributions(data, GP_URBANRATE)
    print_distributions(data, GP_INCOMEPERPERSON)
yashtron · 5 years ago
Data set: GapMinder Data.
Research question: Is the number of breast cancer cases associated with the fertility rate?
Items included in the CodeBook:
for fertility rate:
·       Children per woman (total fertility)
·       Children per woman (total fertility), with projections
for breast cancer:
·       Breast cancer, deaths per 100,000 women
·       Breast cancer, new cases per 100,000 women
·       Breast cancer, number of female deaths
·       Breast cancer, number of new female cases
Literature Review:
From original source: http://ww5.komen.org/KomenPerspectives/Does-pregnancy-affect-breast-cancer-risk-and-survival-.html
The more children a woman has given birth to, the lower her risk of breast cancer tends to be. Women who have never given birth have a slightly higher risk of breast cancer compared to women who have had more than one child.
The hypothesis to explore using the GapMinder data set: the higher the fertility rate, the lower the risk of breast cancer.
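One way such a hypothesis might be explored is with a correlation coefficient between the two indicators across countries. The sketch below computes Pearson's r by hand. The five pairs of values are invented purely for illustration and are not GapMinder figures, and the function name is my own.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(var_x * var_y)


# Invented example values (NOT GapMinder data):
# children per woman, and new breast cancer cases per 100,000 women.
fertility_rate = [1.5, 2.0, 3.0, 4.5, 6.0]
new_cases_per_100k = [90.0, 75.0, 50.0, 30.0, 20.0]

r = pearson_r(fertility_rate, new_cases_per_100k)
print("Pearson r:", round(r, 2))
```

A negative r would be in line with the hypothesis, but a correlation across countries would not by itself show that fertility affects breast cancer risk.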