aroy1994 · 8 years
Creating graphs for your data
[Figures: seven plots produced by the code below, four histograms and three bar charts]
import os
import pandas
import numpy
import seaborn
import matplotlib.pyplot as plt

# Load the Gapminder data set
data = pandas.read_csv('gapminder (2).csv', low_memory=False)

# Number of observations (rows) and variables (columns)
print(len(data))
print(len(data.columns))

# Convert the variables of interest to numeric; entries that cannot be parsed become NaN
data['suicideper100th'] = pandas.to_numeric(data['suicideper100th'], errors='coerce')
data['alcconsumption'] = pandas.to_numeric(data['alcconsumption'], errors='coerce')
data['urbanrate'] = pandas.to_numeric(data['urbanrate'], errors='coerce')
data['employrate'] = pandas.to_numeric(data['employrate'], errors='coerce')
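The numeric conversion can be checked on a small example. This is only a sketch: the sample values are made up, and it assumes the column was read as text because of blank entries, which is a common reason for needing this step.

```python
import pandas

# Hypothetical sample: a numeric column read as strings, with one blank entry
raw = pandas.Series(['13.9', ' ', '7.2', '4.4'])

# errors='coerce' turns anything that cannot be parsed into NaN
clean = pandas.to_numeric(raw, errors='coerce')

print(clean.dtype)
print(clean.isna().sum())
```

After the conversion the column is floating point, and the blank entry is counted as a missing value instead of a string.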
# Frequency counts and proportions for each variable (NaN included where dropna=False)
print("Counts for suicideper100th")
c1 = data["suicideper100th"].value_counts(sort=False, dropna=False)
print(c1)
print("Percentage for suicideper100th")
p1 = data["suicideper100th"].value_counts(sort=False, normalize=True, dropna=False)
print(p1)

print("Counts for alcconsumption")
c2 = data["alcconsumption"].value_counts(sort=False, dropna=False)
print(c2)
print("Percentage for alcconsumption")
p2 = data["alcconsumption"].value_counts(sort=True, normalize=True, dropna=False)
print(p2)

print("Counts for urbanrate")
c3 = data["urbanrate"].value_counts(sort=False, dropna=False)
print(c3)
print("Percentage for urbanrate")
p3 = data["urbanrate"].value_counts(sort=False, normalize=True, dropna=False)
print(p3)

print("Counts for employrate")
c4 = data["employrate"].value_counts(sort=False)
print(c4)
print("Percentage for employrate")
p4 = data["employrate"].value_counts(sort=False, normalize=True)
print(p4)
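Because these variables are continuous, almost every value occurs only once, so the raw counts say little about the shape of the distribution. One option, sketched here with made-up numbers and an arbitrary choice of four bins, is to group the values with pandas.cut before counting.

```python
import pandas

# Hypothetical urban-rate values (percent); not taken from the Gapminder file
urban = pandas.Series([12.5, 38.6, 47.0, 52.7, 65.6, 65.6, 88.9, None])

# Group the values into four equal-width bins covering 0 to 100
bins = [0, 25, 50, 75, 100]
binned = pandas.cut(urban, bins=bins)

# Count values per bin, keeping the missing value as its own row
counts = binned.value_counts(sort=False, dropna=False)
print(counts)
```

The bin edges are an assumption for illustration; quartiles from pandas.qcut would be another reasonable choice.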
# Histogram of each variable, each drawn on its own figure
plt.figure()
seaborn.histplot(data["alcconsumption"].dropna())
plt.xlabel('Alcohol Consumption')

plt.figure()
seaborn.histplot(data["urbanrate"].dropna())
plt.xlabel('Urban Rate')

plt.figure()
seaborn.histplot(data["suicideper100th"].dropna())
plt.xlabel('Suicides per 100,000 people')

plt.figure()
seaborn.histplot(data["employrate"].dropna())
plt.xlabel('Employment Rate')
# Bar charts of each explanatory variable against the suicide rate
seaborn.catplot(x="suicideper100th", y="employrate", data=data, kind="bar", errorbar=None)
plt.xlabel('Suicide Rate')
plt.ylabel('Employment Rate')

seaborn.catplot(x="suicideper100th", y="alcconsumption", data=data, kind="bar", errorbar=None)
plt.xlabel('Suicide Rate')
plt.ylabel('Alcohol Consumption')

seaborn.catplot(x="suicideper100th", y="urbanrate", data=data, kind="bar", errorbar=None)
plt.xlabel('Suicide Rate')
plt.ylabel('Urban Percentage')

# Display all figures
plt.show()
From the graphs, the only apparent relationship is between alcohol consumption and suicide rate. Employment rate and urban percentage do not seem to have much effect on suicide rates. Each variable takes a wide range of values across countries, with a considerable standard deviation.
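A relationship read off a graph can be backed up with a correlation coefficient. The sketch below uses made-up values for five countries, not the Gapminder data, and assumes Pearson correlation is an appropriate measure; the column names simply mirror the ones used in the code.

```python
import pandas

# Hypothetical values for five countries; not taken from the Gapminder file
sample = pandas.DataFrame({
    'alcconsumption': [1.0, 3.5, 6.0, 9.5, 12.0],
    'suicideper100th': [4.0, 6.5, 9.0, 11.0, 15.5],
    'employrate': [62.0, 55.0, 70.0, 58.0, 64.0],
})

# Pairwise Pearson correlation; missing values are skipped pair by pair
corr = sample.corr(method='pearson')
print(corr['suicideper100th'])
```

A coefficient close to 1 or -1 indicates a strong linear relationship, while a value near 0 indicates little linear relationship. Neither shows that one variable causes the other.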
aroy1994 · 8 years
Running my first program
import pandas
import numpy

# Load the Gapminder data set
data = pandas.read_csv('gapminder (2).csv', low_memory=False)

# Number of observations (rows) and variables (columns)
print(len(data))
print(len(data.columns))

# Convert to numeric with pandas.to_numeric, the replacement that the
# FutureWarning in the output below recommends for convert_objects
data['suicideper100th'] = pandas.to_numeric(data['suicideper100th'], errors='coerce')
data['alcconsumption'] = pandas.to_numeric(data['alcconsumption'], errors='coerce')
data['urbanrate'] = pandas.to_numeric(data['urbanrate'], errors='coerce')
data['employrate'] = pandas.to_numeric(data['employrate'], errors='coerce')

# Frequency counts and proportions for each variable (NaN included where dropna=False)
print("Counts for suicideper100th")
c1 = data["suicideper100th"].value_counts(sort=False, dropna=False)
print(c1)
print("Percentage for suicideper100th")
p1 = data["suicideper100th"].value_counts(sort=False, normalize=True, dropna=False)
print(p1)

print("Counts for alcconsumption")
c2 = data["alcconsumption"].value_counts(sort=False, dropna=False)
print(c2)
print("Percentage for alcconsumption")
p2 = data["alcconsumption"].value_counts(sort=False, normalize=True, dropna=False)
print(p2)

print("Counts for urbanrate")
c3 = data["urbanrate"].value_counts(sort=False, dropna=False)
print(c3)
print("Percentage for urbanrate")
p3 = data["urbanrate"].value_counts(sort=False, normalize=True, dropna=False)
print(p3)

print("Counts for employrate")
c4 = data["employrate"].value_counts(sort=False)
print(c4)
print("Percentage for employrate")
p4 = data["employrate"].value_counts(sort=False, normalize=True)
print(p4)
13.905267     1
9.927033      1
15.953850     1
13.716340     1
12.411181     1
15.714571     1
9.257976      1
1.370002      1
4.961071      1
5.554276      1
7.304886      1
4.414990      1
             ..
8.164005      1
0.201449      1
14.537270     1
1.392951      1
20.369590     1
5.213720      1
7.745065      1
8.211067      1
4.119620      1
4.777007      1
1.658908      1
3.374416      1
4.288574      1
1.922485      1
13.239810     1
7.202384      1
4.527852      1
10.645740     1
6.265789      1
8.262893      1
26.219198     1
20.162010     1
14.547167     1
8.210948      1
14.554677     1
13.089616     1
8.081540      1
6.087671      1
22.404560     1
5.362179      1
Name: suicideper100th, dtype: int64
Percentage for suicideper100th
6.684385     0.004695
NaN          0.103286
7.765584     0.004695
15.542603    0.004695
6.519537     0.004695
7.563692     0.004695
7.184853     0.004695
6.597168     0.004695
11.151073    0.004695
17.032646    0.004695
13.094370    0.004695
3.576478     0.004695
6.105282     0.004695
15.538490    0.004695
14.091530    0.004695
10.937718    0.004695
1.380965     0.004695
10.823000    0.004695
13.905267    0.004695
9.927033     0.004695
15.953850    0.004695
13.716340    0.004695
12.411181    0.004695
15.714571    0.004695
9.257976     0.004695
1.370002     0.004695
4.961071     0.004695
5.554276     0.004695
7.304886     0.004695
4.414990     0.004695
               ...
8.164005     0.004695
0.201449     0.004695
14.537270    0.004695
1.392951     0.004695
20.369590    0.004695
5.213720     0.004695
7.745065     0.004695
8.211067     0.004695
4.119620     0.004695
4.777007     0.004695
1.658908     0.004695
3.374416     0.004695
4.288574     0.004695
1.922485     0.004695
13.239810    0.004695
7.202384     0.004695
4.527852     0.004695
10.645740    0.004695
6.265789     0.004695
8.262893     0.004695
26.219198    0.004695
20.162010    0.004695
14.547167    0.004695
8.210948     0.004695
14.554677    0.004695
13.089616    0.004695
8.081540     0.004695
6.087671     0.004695
22.404560    0.004695
5.362179     0.004695
Name: suicideper100th, dtype: float64
Counts for alcconsumption
NaN       26
5.25      1
9.75      1
0.50      1
9.50      1
9.60      1
5.05      1
0.96      1
3.61      1
7.29      1
7.38      1
7.60      1
2.14      1
15.00     1
1.49      1
0.56      1
13.66     1
6.16      1
9.70      1
8.81      1
0.34      2
3.53      1
5.12      1
9.43      1
0.11      1
8.35      1
16.23     1
5.21      1
12.72     1
2.08      1
         ..
4.98      1
12.14     1
13.24     1
2.70      1
2.76      1
0.79      1
12.02     1
19.15     1
4.71      1
13.31     1
3.02      1
3.41      1
3.39      2
12.11     1
13.34     1
7.10      1
3.58      1
4.81      1
3.92      1
9.65      1
0.92      1
1.92      1
10.17     1
3.99      1
10.41     1
12.09     1
4.99      1
1.03      1
3.64      1
3.11      1
Name: alcconsumption, dtype: int64
Percentage for alcconsumption
NaN      0.122066
5.25     0.004695
9.75     0.004695
0.50     0.004695
9.50     0.004695
9.60     0.004695
5.05     0.004695
0.96     0.004695
3.61     0.004695
7.29     0.004695
7.38     0.004695
7.60     0.004695
2.14     0.004695
15.00    0.004695
1.49     0.004695
0.56     0.004695
13.66    0.004695
6.16     0.004695
9.70     0.004695
8.81     0.004695
0.34     0.009390
3.53     0.004695
5.12     0.004695
9.43     0.004695
0.11     0.004695
8.35     0.004695
16.23    0.004695
5.21     0.004695
12.72    0.004695
2.08     0.004695
           ...
4.98     0.004695
12.14    0.004695
13.24    0.004695
2.70     0.004695
2.76     0.004695
0.79     0.004695
12.02    0.004695
19.15    0.004695
4.71     0.004695
13.31    0.004695
3.02     0.004695
3.41     0.004695
3.39     0.009390
12.11    0.004695
13.34    0.004695
7.10     0.004695
3.58     0.004695
4.81     0.004695
3.92     0.004695
9.65     0.004695
0.92     0.004695
1.92     0.004695
10.17    0.004695
3.99     0.004695
10.41    0.004695
12.09    0.004695
4.99     0.004695
1.03     0.004695
3.64     0.004695
3.11     0.004695
Name: alcconsumption, dtype: float64
Counts for urbanrate
C:/SPB_Data/.spyder2/temp.py:12: FutureWarning: convert_objects is deprecated.  Use the data-type specific converters pd.to_datetime, pd.to_timedelta and pd.to_numeric.
  data['suicideper100th']=data['suicideper100th'].convert_objects(convert_numeric=True)
C:/SPB_Data/.spyder2/temp.py:13: FutureWarning: convert_objects is deprecated.  Use the data-type specific converters pd.to_datetime, pd.to_timedelta and pd.to_numeric.
  data['alcconsumption']=data['alcconsumption'].convert_objects(convert_numeric=True)
C:/SPB_Data/.spyder2/temp.py:14: FutureWarning: convert_objects is deprecated.  Use the data-type specific converters pd.to_datetime, pd.to_timedelta and pd.to_numeric.
  data['urbanrate']=data['urbanrate'].convert_objects(convert_numeric=True)
C:/SPB_Data/.spyder2/temp.py:15: FutureWarning: convert_objects is deprecated.  Use the data-type specific converters pd.to_datetime, pd.to_timedelta and pd.to_numeric.
  data['employrate']=data['employrate'].convert_objects(convert_numeric=True)
C:\Python27\lib\site-packages\pandas\core\format.py:2087: RuntimeWarning: invalid value encountered in greater
  has_large_values = (abs_vals > 1e8).any()
C:\Python27\lib\site-packages\pandas\core\format.py:2088: RuntimeWarning: invalid value encountered in less
  has_small_values = ((abs_vals < 10 ** (-self.digits)) &
C:\Python27\lib\site-packages\pandas\core\format.py:2089: RuntimeWarning: invalid value encountered in greater
  (abs_vals > 0)).any()
38.58     1
74.50     1
NaN       10
73.50     1
67.50     1
87.30     1
66.50     1
47.04     1
37.76     1
59.58     1
68.08     1
24.94     1
73.48     1
17.00     1
70.36     1
92.68     1
67.16     1
22.54     1
23.00     1
52.74     1
77.54     1
81.46     1
12.98     1
88.92     1
91.66     1
82.44     1
33.32     1
15.10     1
64.92     1
98.36     1
         ..
60.18     1
80.40     1
28.08     1
67.98     1
57.94     1
65.58     2
53.30     1
51.70     1
17.96     1
56.56     1
56.02     1
66.48     1
43.44     1
65.22     1
73.64     1
27.14     1
86.96     1
54.22     1
18.80     1
47.88     1
12.54     1
97.36     1
29.54     1
51.64     1
94.22     1
25.46     1
89.94     1
51.46     1
54.24     1
83.52     1
Name: urbanrate, dtype: int64
Percentage for urbanrate
38.58    0.004695
74.50    0.004695
NaN      0.046948
73.50    0.004695
67.50    0.004695
87.30    0.004695
66.50    0.004695
47.04    0.004695
37.76    0.004695
59.58    0.004695
68.08    0.004695
24.94    0.004695
73.48    0.004695
17.00    0.004695
70.36    0.004695
92.68    0.004695
67.16    0.004695
22.54    0.004695
23.00    0.004695
52.74    0.004695
77.54    0.004695
81.46    0.004695
12.98    0.004695
88.92    0.004695
91.66    0.004695
82.44    0.004695
33.32    0.004695
15.10    0.004695
64.92    0.004695
98.36    0.004695
           ...
60.18    0.004695
80.40    0.004695
28.08    0.004695
67.98    0.004695
57.94    0.004695
65.58    0.009390
53.30    0.004695
51.70    0.004695
17.96    0.004695
56.56    0.004695
56.02    0.004695
66.48    0.004695
43.44    0.004695
65.22    0.004695
73.64    0.004695
27.14    0.004695
86.96    0.004695
54.22    0.004695
18.80    0.004695
47.88    0.004695
12.54    0.004695
97.36    0.004695
29.54    0.004695
51.64    0.004695
94.22    0.004695
25.46    0.004695
89.94    0.004695
51.46    0.004695
54.24    0.004695
83.52    0.004695
Name: urbanrate, dtype: float64
Counts for employrate
50.500000    1
61.500000    3
64.500000    1
63.500000    1
56.500000    1
53.500000    3
81.500000    1
60.500000    1
64.199997    1
42.500000    1
54.500000    2
71.800003    1
49.500000    1
52.500000    1
58.500000    1
57.500000    2
43.099998    1
63.799999    2
46.400002    1
74.699997    1
54.599998    1
55.400002    1
78.900002    1
58.799999    2
59.700001    1
83.199997    2
48.700001    2
39.000000    1
42.000000    1
42.400002    2
            ..
57.200001    1
44.700001    1
57.299999    1
55.599998    1
65.900002    1
56.400002    1
66.800003    1
63.599998    1
73.599998    1
63.700001    1
72.800003    1
63.900002    1
71.599998    1
57.099998    1
64.900002    1
73.099998    1
81.300003    1
64.300003    1
63.099998    1
52.700001    1
50.700001    1
68.099998    1
61.299999    1
65.699997    1
60.400002    2
68.300003    1
66.900002    1
55.099998    1
46.799999    1
51.400002    1
Name: employrate, dtype: int64
Percentage for employrate
50.500000    0.004695
61.500000    0.014085
64.500000    0.004695
63.500000    0.004695
56.500000    0.004695
53.500000    0.014085
81.500000    0.004695
60.500000    0.004695
64.199997    0.004695
42.500000    0.004695
54.500000    0.009390
71.800003    0.004695
49.500000    0.004695
52.500000    0.004695
58.500000    0.004695
57.500000    0.009390
43.099998    0.004695
63.799999    0.009390
46.400002    0.004695
74.699997    0.004695
54.599998    0.004695
55.400002    0.004695
78.900002    0.004695
58.799999    0.009390
59.700001    0.004695
83.199997    0.009390
48.700001    0.009390
39.000000    0.004695
42.000000    0.004695
42.400002    0.009390
              ...
57.200001    0.004695
44.700001    0.004695
57.299999    0.004695
55.599998    0.004695
65.900002    0.004695
56.400002    0.004695
66.800003    0.004695
63.599998    0.004695
73.599998    0.004695
63.700001    0.004695
72.800003    0.004695
63.900002    0.004695
71.599998    0.004695
57.099998    0.004695
64.900002    0.004695
73.099998    0.004695
81.300003    0.004695
64.300003    0.004695
63.099998    0.004695
52.700001    0.004695
50.700001    0.004695
68.099998    0.004695
61.299999    0.004695
65.699997    0.004695
60.400002    0.009390
68.300003    0.004695
66.900002    0.004695
55.099998    0.004695
46.799999    0.004695
51.400002    0.004695
Name: employrate, dtype: float64
In my program, I find the frequency distributions of alcohol consumption, employment rate, urban rate and suicide rate. I also list the frequency of missing values (NaN). Later on, the variables need to be plotted against each other to show which factor the suicide rate depends on most.
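Missing values can also be counted directly, without reading them off the frequency tables. This is a minimal sketch on a made-up frame whose column names mirror two of the Gapminder variables; the values are assumptions for illustration.

```python
import pandas
import numpy

# Hypothetical stand-in for two of the Gapminder columns used above
sample = pandas.DataFrame({
    'alcconsumption': [5.25, numpy.nan, 0.5, 9.6],
    'employrate': [50.5, 61.5, numpy.nan, numpy.nan],
})

# Number of missing entries in each column
missing = sample.isna().sum()

# Share of rows that are missing, as a fraction between 0 and 1
missing_share = sample.isna().mean()

print(missing)
print(missing_share)
```

This gives one count per column in a single call, which is easier to compare across variables than separate value_counts tables.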
aroy1994 · 8 years
Gapminder
Is unemployment connected to suicide rates? Is this more prevalent in urban areas? Does it also depend indirectly on alcohol consumption?
I have chosen the Gapminder data set. I want to find the correlations, if any, between alcohol consumption and suicide rate around the world, as well as between unemployment rate and alcohol consumption. This would in turn suggest whether the suicide rate rises with the unemployment rate. I will compare suicideper100th against alcconsumption and then alcconsumption against employrate, and then look for correlations, where the variables are:
suicideper100th - the number of suicides per 100,000 people
alcconsumption - alcohol consumption per adult, in litres
employrate - the percentage of the population that is employed
In my own experience, I have seen more suicides among students, due to fear of unemployment, than among any other group of people, which is why this topic is of particular interest to me. Another relationship that could be explored later is per capita income against employrate, which would show how wealth is distributed and whether a higher per capita income helps increase the employment rate.
There have been several studies in this field in recent years, such as (1) http://ije.oxfordjournals.org/content/19/2/412.short, (2) http://jech.bmj.com/content/57/8/594.short, (3) http://www.sciencedirect.com/science/article/pii/S0167487009000361 and (4) http://www.amsciepub.com/doi/abs/10.2466/pr0.1980.47.3f.1095?journalCode=pr0.
Paper (4) shows how suicide rates changed with employment rates from 1962 to 1976 in Canada, France, Germany, Japan, Sweden and the US. Paper (3) tests the hypothesis that the relationship between unemployment rates and suicide rates varies with the level of real per capita GDP. Paper (1) studies data collected on unemployment and suicide rates in 16 developed countries for 1973 and 1983. Paper (2) examines the association of labour force status and socioeconomic position with death by suicide.
However, most of these studies established the relationship in a particular country or a few countries, not on such a large scale. My analysis would therefore examine whether this association extends to every country in the Gapminder data set.