dmv-pinot-blog
Coursera_data management & visualization
dmv-pinot-blog · 8 years ago
Assignment 2: Running Your First Program
PROGRAM:
# -*- coding: utf-8 -*-
"""
Spyder Editor

This is a temporary script file.
"""
# import packages
import pandas
import numpy

# read the raw data
data = pandas.read_csv('gapminder.csv', low_memory=False)
print(len(data))
print(len(data.columns))

# setting variables: convert to numeric, blanks become NaN
# (convert_objects has been removed from pandas; to_numeric replaces it)
data['alcconsumption'] = pandas.to_numeric(data['alcconsumption'], errors='coerce')
data['hivrate'] = pandas.to_numeric(data['hivrate'], errors='coerce')
data['internetuserate'] = pandas.to_numeric(data['internetuserate'], errors='coerce')
data['polityscore'] = pandas.to_numeric(data['polityscore'], errors='coerce')
data['suicideper100th'] = pandas.to_numeric(data['suicideper100th'], errors='coerce')
data['urbanrate'] = pandas.to_numeric(data['urbanrate'], errors='coerce')

# subset data
sub1 = data[(data['suicideper100th'] >= 10) & (data['polityscore'] >= 0) & (data['alcconsumption'] >= 10)]

# counts and percentages for each variable (5 equal-width intervals)
print('\nfrequency distritions on all data:\n')

alcconsumption = pandas.cut(data['alcconsumption'], 5, labels=['very low', 'low', 'medium', 'high', 'very high'])
al = alcconsumption.value_counts(sort=False)
print('counts for alconsumption')
print(al)
p_al = alcconsumption.value_counts(sort=False, normalize=True)
print('percentages for alconsumption')
print(p_al)

hivrate = pandas.cut(data['hivrate'], 5, labels=['very low', 'low', 'medium', 'high', 'very high'])
hr = hivrate.value_counts(sort=False)
print('counts for hivrate')
print(hr)
p_hr = hivrate.value_counts(sort=False, normalize=True)
print('percentages for hivrate')
print(p_hr)

internetuserate = pandas.cut(data['internetuserate'], 5, labels=['very low', 'low', 'medium', 'high', 'very high'])
ir = internetuserate.value_counts(sort=False)
print('counts for internetuserate')
print(ir)
p_ir = internetuserate.value_counts(sort=False, normalize=True)
print('percentages for internetuserate')
print(p_ir)

polityscore = pandas.cut(data['polityscore'], 5, labels=['very low', 'low', 'medium', 'high', 'very high'])
ps = polityscore.value_counts(sort=False)
print('counts for polityscore')
print(ps)
p_ps = polityscore.value_counts(sort=False, normalize=True)
print('percentages for polityscore')
print(p_ps)

urbanrate = pandas.cut(data['urbanrate'], 5, labels=['very low', 'low', 'medium', 'high', 'very high'])
ur = urbanrate.value_counts(sort=False)
print('counts for urbanrate')
print(ur)
p_ur = urbanrate.value_counts(sort=False, normalize=True)
print('percentages for urbanrate')
print(p_ur)

suicideper100th = pandas.cut(data['suicideper100th'], 5, labels=['very low', 'low', 'medium', 'high', 'very high'])
sp = suicideper100th.value_counts(sort=False)
print('counts for suicideper100th')
print(sp)
p_sp = suicideper100th.value_counts(sort=False, normalize=True)
print('percentages for suicideper100th')
print(p_sp)

# make a copy of my new subsetted data
sub2 = sub1.copy()

# frequency distributions on new sub2 data frame
# (cut recomputes the 5 intervals over the subset's own range)
print('\nfrequency distritions on new sub2 data frame:\n')

suicideper100th = pandas.cut(sub2['suicideper100th'], 5, labels=['very low', 'low', 'medium', 'high', 'very high'])
sp = suicideper100th.value_counts(sort=False)
print('counts for suicideper100th')
print(sp)
p_sp = suicideper100th.value_counts(sort=False, normalize=True)
print('percentages for suicideper100th')
print(p_sp)

polityscore = pandas.cut(sub2['polityscore'], 5, labels=['very low', 'low', 'medium', 'high', 'very high'])
ps = polityscore.value_counts(sort=False)
print('counts for polityscore')
print(ps)
p_ps = polityscore.value_counts(sort=False, normalize=True)
print('percentages for polityscore')
print(p_ps)

alconsumption = pandas.cut(sub2['alcconsumption'], 5, labels=['very low', 'low', 'medium', 'high', 'very high'])
al = alconsumption.value_counts(sort=False)
print('counts for alcconsumption')
print(al)
p_al = alconsumption.value_counts(sort=False, normalize=True)
print('percentages for alcconsumption')
print(p_al)
RESULTS:
frequency distritions on all data:
counts for alconsumption
very low     74
low          55
medium       42
high         13
very high     3
Name: alcconsumption, dtype: int64
percentages for alconsumption
very low     0.395722
low          0.294118
medium       0.224599
high         0.069519
very high    0.016043
Name: alcconsumption, dtype: float64
counts for hivrate
very low     134
low            4
medium         5
high           1
very high      3
Name: hivrate, dtype: int64
percentages for hivrate
very low     0.911565
low          0.027211
medium       0.034014
high         0.006803
very high    0.020408
Name: hivrate, dtype: float64
counts for internetuserate
very low     73
low          35
medium       37
high         24
very high    23
Name: internetuserate, dtype: int64
percentages for internetuserate
very low     0.380208
low          0.182292
medium       0.192708
high         0.125000
very high    0.119792
Name: internetuserate, dtype: float64
counts for polityscore
very low     23
low          19
medium       16
high         23
very high    80
Name: polityscore, dtype: int64
percentages for polityscore
very low     0.142857
low          0.118012
medium       0.099379
high         0.142857
very high    0.496894
Name: polityscore, dtype: float64
counts for urbanrate
very low     31
low          39
medium       49
high         49
very high    35
Name: urbanrate, dtype: int64
percentages for urbanrate
very low     0.152709
low          0.192118
medium       0.241379
high         0.241379
very high    0.172414
Name: urbanrate, dtype: float64
counts for suicideper100th
very low     79
low          78
medium       24
high          7
very high     3
Name: suicideper100th, dtype: int64
percentages for suicideper100th
very low     0.413613
low          0.408377
medium       0.125654
high         0.036649
very high    0.015707
Name: suicideper100th, dtype: float64
frequency distritions on new sub2 data frame:
counts for suicideper100th
very low     10
low           7
medium        4
high          1
very high     1
Name: suicideper100th, dtype: int64
percentages for suicideper100th
very low     0.434783
low          0.304348
medium       0.173913
high         0.043478
very high    0.043478
Name: suicideper100th, dtype: float64
counts for polityscore
very low      1
low           0
medium        1
high          6
very high    15
Name: polityscore, dtype: int64
percentages for polityscore
very low     0.043478
low          0.000000
medium       0.043478
high         0.260870
very high    0.652174
Name: polityscore, dtype: float64
counts for alcconsumption
very low     7
low          7
medium       7
high         1
very high    1
Name: alcconsumption, dtype: int64
percentages for alcconsumption
very low     0.304348
low          0.304348
medium       0.304348
high         0.043478
very high    0.043478
Name: alcconsumption, dtype: float64
SUMMARY:
I chose the Gapminder dataset, whose variables are continuous, so a frequency table of the raw values would not be meaningful. I used the cut function in the pandas package to split the values of each variable into 5 equal-width intervals. Then I used these binned data for the rest of the work.
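The binning step can be sketched as a small helper. This is only an illustration: the function name frequency_table and the toy values are made up here and are not part of the assignment script or of gapminder.csv.

```python
import pandas

LABELS = ['very low', 'low', 'medium', 'high', 'very high']


def frequency_table(series, bins=5, labels=LABELS):
    """Bin a numeric series into equal-width intervals and count each bin."""
    binned = pandas.cut(series, bins, labels=labels)
    counts = binned.value_counts(sort=False)
    percentages = binned.value_counts(sort=False, normalize=True)
    return counts, percentages


# Hypothetical values, not taken from gapminder.csv
toy = pandas.Series([0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0])
counts, percentages = frequency_table(toy)
print(counts)
print(percentages)
```

Because the intervals are closed on the right by default, a value that sits exactly on an edge (such as 2.0 here) is counted in the lower interval.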
For alcconsumption, only a small number of countries have high or very high alcohol consumption (16 of 187). Most countries fall into the very low (74), low (55), and medium (42) categories.
For hivrate, a very low HIV rate is by far the most common among these countries (134 of 147), which makes sense. The equal-width intervals produced by cut are probably not ideal for the HIV rate data; the data should not be split into equal intervals.
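One way to avoid equal-width intervals for a skewed variable is to pass explicit bin edges to pandas.cut. The sketch below uses made-up values and edges chosen only for illustration; sensible edges for the real HIV rate data would have to come from the data or the literature.

```python
import pandas

labels = ['very low', 'low', 'medium', 'high', 'very high']

# Hypothetical HIV-rate-like values (percent), heavily skewed toward zero
rates = pandas.Series([0.06, 0.1, 0.1, 0.2, 0.3, 0.4, 0.5,
                       0.9, 1.2, 2.5, 5.0, 13.5, 25.9])

# Equal-width intervals: almost everything lands in the first bin
equal_width = pandas.cut(rates, 5, labels=labels).value_counts(sort=False)

# Assumed, illustrative edges that widen as the values grow
edges = [0, 0.1, 0.5, 2, 10, 100]
custom = pandas.cut(rates, bins=edges, labels=labels).value_counts(sort=False)

print(equal_width)
print(custom)
```

With the explicit edges, the low values are spread over several categories instead of being lumped into one.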
For internetuserate, a relatively large number of countries (73 of 192) have a very low internet use rate. The other four categories are closer in size, with between 23 and 37 countries each.
For polityscore, many more countries have a very high score (80 of 161) than any other score. It means that many countries have a high level of democracy.
For urbanrate, the frequency distribution of urbanization looks close to a normal distribution.
For suicideper100th, which is the variable I’m interested in, the suicide rate is relatively low in most countries. But there are still some countries (3) with a very high suicide rate. This is really meaningful: we could study the conditions in these countries to figure out what is going on. On the other hand, I guess these data should not be split into equal intervals either.
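Then I chose a subset based on suicideper100th, polityscore, and alcconsumption (suicideper100th >= 10, polityscore >= 0, alcconsumption >= 10), which leaves 23 countries. What is really interesting is that, within this subset, the suicide rate is again concentrated in the lowest intervals, much like the whole data. This is because cut recomputes the five intervals over the subset’s own range, so 'very low' in the subset does not cover the same range as 'very low' in the whole data.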
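To compare a subset with the whole data on the same scale, the interval edges can be computed once on the full series and reused for the subset. A minimal sketch with made-up values (not from gapminder.csv), using the retbins option of pandas.cut:

```python
import pandas

labels = ['very low', 'low', 'medium', 'high', 'very high']

# Hypothetical suicide-rate-like values, not taken from gapminder.csv
full = pandas.Series([1.0, 3.0, 5.0, 8.0, 10.0, 12.0, 14.0, 20.0, 26.0, 36.0])
subset = full[full >= 10]

# Intervals recomputed on the subset: the edges span only 10 to 36
recomputed = pandas.cut(subset, 5, labels=labels).value_counts(sort=False)

# Intervals computed once on the full data, then reused for the subset
_, edges = pandas.cut(full, 5, labels=labels, retbins=True)
reused = pandas.cut(subset, bins=edges, labels=labels).value_counts(sort=False)

print(recomputed)
print(reused)
```

With reused edges, no value in the subset can fall into the lowest category of the full data, so the two tables are directly comparable.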
dmv-pinot-blog · 8 years ago
Assignment 1: Getting Your Research Project Started
The first assignment asks me to develop a research question from one of the five code books provided. I need to tell my classmates which data set I have chosen and describe the association I’d like to study. Here we go:
Step 1: I decided to choose the GapMinder data set. I’m interested in the suicide problem. Some factors in the data set relate indirectly to economic development, and others relate to social change. The suicide problem seemingly relates to almost every factor, and I cannot yet tell which variables are critical before doing some data cleaning, so I will include all of the variables in my personal code book at this early stage.
Step 2: There have already been many studies of the relationship between economic development and the suicide problem, so I would especially like to study the relationship between individual and social behaviors (such as alcconsumption, polityscore, urbanrate, etc.) and the suicide problem. Even though individual behaviors are arguably also related to economic development, I’d rather use the direct and obvious factors: alcohol consumption, HIV rate, internet use, polity score, and urban rate.
Step 3: Prepared my own code book.
Step 4: One of the reasons for suicide might be depression. The level of depression might be modeled by these factors.
Step 5: Added my second topic into my code book.
Step 6: Literature review: 
When it comes to alcohol, on the one hand, alcohol abuse and alcohol intoxication are often present in suicidal behavior (Norström & Rossow, 2016). A higher level of depression might lead to alcohol abuse. On the other hand, it is not only alcohol abuse that matters. First, alcohol use is usually associated with impulsive suicide attempts (Bryan, Garland, & Rudd, 2016). Second, higher suicide mortality is also related to a greater frequency and amount of usual alcohol consumption (Yi et al., 2016).
Some studies found that people living with HIV are more vulnerable to mental health problems such as depression (Niu, Luo, Liu, Silenzio, & Xiao, 2016; Quinlivan et al., 2016). This could help explain the prevalence of suicidal ideation among them.
Likewise, people (both adolescents and adults) with Internet addiction (IA) have more sleep difficulties. IA with poor sleep quality was significantly related to suicide attempts (Kim et al., 2017; Lee et al., 2016). Furthermore, people can learn about suicide methods from the internet, which might turn suicidal ideation into action. But the effect is not only negative. People with depression and suicide attempts can also get help from the internet (Thornton, Handley, Kay‐Lambkin, & Baker, 2017).
Some studies in China concluded that further urbanization might induce stress and other issues that could contribute to the suicide problem. What’s more, rural areas with insufficient healthcare and social support might have a higher suicide rate among older adults (Sha, Yip, & Law, 2017). A similar conclusion was reached for Georgia (Mahon, 2017).
The factors above are physical influences. Mental and moral factors, such as religious and political practices, might also influence adult suicide (Ani & Ugwuoke, 2017). Some studies indicated a clear positive relationship between democracy and happiness (Dorn, Fischer, Kirchgässner, & Sousa-Poza, 2005). What’s more, this effect is stronger in countries with an established democratic tradition (Dorn, Fischer, Kirchgässner, & Sousa-Poza, 2007).
However, each factor might influence different people (such as women and men) or different areas (such as developing and developed areas) differently. More conclusive results need more observed data.
References:
Ani, N. R., & Ugwuoke, A. C. (2017). Religious and Political Practices that Influence Adult Suicide. Journal of Science & Computer Education (JOSCED), 2(2), 1–21.
Bryan, C. J., Garland, E. L., & Rudd, M. D. (2016). From impulse to action among military personnel hospitalized for suicide risk: alcohol consumption and the reported transition from suicidal thought to behavior. General Hospital Psychiatry, 41, 13–19.
Dorn, D., Fischer, J. A. V., Kirchgässner, G., & Sousa-Poza, A. (2005). Is it culture or democracy? The impact of democracy, income, and culture on happiness.
Dorn, D., Fischer, J. A. V., Kirchgässner, G., & Sousa-Poza, A. (2007). Is it culture or democracy? The impact of democracy and culture on happiness. Social Indicators Research, 82(3), 505–526.
Kim, K., Lee, H., Hong, J. P., Cho, M. J., Fava, M., Mischoulon, D., … Jeon, H. J. (2017). Poor sleep quality and suicide attempt among adults with internet addiction: A nationwide community sample of Korea. PLoS One, 12(4), e0174619.
Lee, S. Y., Park, E.-C., Han, K.-T., Kim, S. J., Chun, S.-Y., & Park, S. (2016). The association of level of internet use with suicidal ideation and suicide attempts in South Korean adolescents: a focus on family structure and household economic status. The Canadian Journal of Psychiatry, 61(4), 243–251.
Mahon, G. (2017). Urban/Rural Inequalities in Suicide Rates in Georgia, 2008-2013: A county-level analysis.
Niu, L., Luo, D., Liu, Y., Silenzio, V. M. B., & Xiao, S. (2016). The Mental Health of People Living with HIV in China, 1998–2014: A Systematic Review. PLoS One, 11(4), e0153489.
Norström, T., & Rossow, I. (2016). Alcohol consumption as a risk factor for suicidal behavior: a systematic review of associations at the individual and at the population level. Archives of Suicide Research, 20(4), 489–506.
Quinlivan, E. B., Gaynes, B. N., Lee, J. S., Heine, A. D., Shirey, K., Edwards, M., … Pence, B. W. (2016). Suicidal Ideation is Associated with Limited Engagement in HIV Care. AIDS and Behavior, 1–10.
Sha, F., Yip, P. S. F., & Law, Y. W. (2017). Decomposing change in China’s suicide rate, 1990–2010: ageing and urbanisation. Injury Prevention, 23(1), 40–45.
Thornton, L., Handley, T., Kay‐Lambkin, F., & Baker, A. (2017). Is a person thinking about suicide likely to find help on the internet? An evaluation of Google search results. Suicide and Life-Threatening Behavior, 47(1), 48–53.
Yi, S.-W., Jung, M., Kimm, H., Sull, J.-W., Lee, E., Lee, K. O., & Ohrr, H. (2016). Usual alcohol consumption and suicide mortality among the Korean elderly in rural communities: Kangwha Cohort Study. Journal of Epidemiology and Community Health, 70(8), 778–783.
Step 7: Hypothesis: 
According to past experience and studies, depression and suicidal behavior and attempts are related to alcohol consumption, HIV rate, Internet use, political issues, and urbanization. I hypothesize that the level of depression and the suicide rate have a positive relationship with alcconsumption, hivrate, and internetuserate, and a negative relationship with urbanrate and polityscore.
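A first, rough check of the hypothesized directions could compare the sign of the Pearson correlation between each variable and suicideper100th with the expected sign. The sketch below is only an assumption about how such a check might look: the helper check_signs and the toy numbers are invented for illustration, the toy numbers say nothing about real countries, and depression itself is not a variable in the data set.

```python
import pandas

# Hypothesized direction of each relationship with the suicide rate (Step 7)
EXPECTED_SIGN = {
    'alcconsumption': 1,
    'hivrate': 1,
    'internetuserate': 1,
    'urbanrate': -1,
    'polityscore': -1,
}


def check_signs(data, response='suicideper100th', expected=EXPECTED_SIGN):
    """Correlate each variable with the response and flag whether the
    sign of the correlation matches the hypothesized direction."""
    rows = []
    for column, sign in expected.items():
        r = data[column].corr(data[response])  # Pearson; NaN pairs are skipped
        rows.append({'variable': column, 'r': r,
                     'matches': bool((r > 0) == (sign > 0))})
    return pandas.DataFrame(rows).set_index('variable')


# Made-up numbers, only to show the call; polityscore is deliberately
# increasing so that one row does not match the hypothesis
toy = pandas.DataFrame({
    'suicideper100th': [2.0, 4.0, 6.0, 8.0, 10.0],
    'alcconsumption': [1.0, 2.0, 3.0, 4.0, 5.0],
    'hivrate': [0.1, 0.2, 0.3, 0.4, 0.5],
    'internetuserate': [10.0, 20.0, 30.0, 40.0, 50.0],
    'urbanrate': [90.0, 80.0, 70.0, 60.0, 50.0],
    'polityscore': [2.0, 4.0, 6.0, 8.0, 10.0],
})
result = check_signs(toy)
print(result)
```

A matching sign is only a starting point; it does not show that the relationship is significant or causal.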