Creating graphs for your data
import pandas
import seaborn
import matplotlib.pyplot as plt
# Load the Gapminder data set
data = pandas.read_csv('gapminder (2).csv', low_memory=False)
print(len(data))          # number of observations (rows)
print(len(data.columns))  # number of variables (columns)
# Convert the variables to numeric; values that cannot be parsed become NaN.
# pandas.to_numeric is the replacement for the deprecated convert_objects.
data['suicideper100th'] = pandas.to_numeric(data['suicideper100th'], errors='coerce')
data['alcconsumption'] = pandas.to_numeric(data['alcconsumption'], errors='coerce')
data['urbanrate'] = pandas.to_numeric(data['urbanrate'], errors='coerce')
data['employrate'] = pandas.to_numeric(data['employrate'], errors='coerce')
print("Counts for suicideper100th")
c1=data["suicideper100th"].value_counts(sort=False, dropna=False)
print(c1)
print("Percentage for suicideper100th")
p1=data["suicideper100th"].value_counts(sort=False, normalize=True, dropna=False)
print(p1)
print("Counts for alcconsumption")
c2=data["alcconsumption"].value_counts(sort=False, dropna=False)
print(c2)
print("Percentage for alcconsumption")
p2=data["alcconsumption"].value_counts(sort=True, normalize=True, dropna=False)
print(p2)
print("Counts for urbanrate")
c3=data["urbanrate"].value_counts(sort=False, dropna=False)
print(c3)
print("Percentage for urbanrate")
p3=data["urbanrate"].value_counts(sort=False, normalize=True, dropna=False)
print(p3)
print("Counts for employrate")
c4=data["employrate"].value_counts(sort=False)
print(c4)
print("Percentage for employrate")
p4=data["employrate"].value_counts(sort=False, normalize=True)
print(p4)
# Univariate histograms; each one gets its own figure so the plots do not overlap.
# (Newer seaborn releases replace distplot with histplot and rename factorplot to catplot.)
plt.figure()
seaborn.distplot(data["alcconsumption"].dropna(), kde=False)
plt.xlabel('Alcohol Consumption')
plt.figure()
seaborn.distplot(data["urbanrate"].dropna(), kde=False)
plt.xlabel('Urban Rate')
plt.figure()
seaborn.distplot(data["suicideper100th"].dropna(), kde=False)
plt.xlabel('Suicides per 100,000 people')
plt.figure()
seaborn.distplot(data["employrate"].dropna(), kde=False)
plt.xlabel('Employment Rate')
# Bivariate bar charts of each factor against the suicide rate (factorplot opens its own figure)
seaborn.factorplot(x="suicideper100th", y="employrate", data=data, kind="bar", ci=None)
plt.xlabel('Suicide Rate')
plt.ylabel('Employment Rate')
seaborn.factorplot(x="suicideper100th", y="alcconsumption", data=data, kind="bar", ci=None)
plt.xlabel('Suicide Rate')
plt.ylabel('Alcohol Consumption')
seaborn.factorplot(x="suicideper100th", y="urbanrate", data=data, kind="bar", ci=None)
plt.xlabel('Suicide Rate')
plt.ylabel('Urban Percentage')
plt.show()
From the graphs, the only relationship that stands out is between alcohol consumption and suicide rate. Employment rate and urban percentage do not appear to have much of an effect on the suicide rate. Each of the individual variables takes a wide range of values, with a considerable standard deviation.
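Because the suicide rate is continuous, a bar chart with one bar per distinct value is hard to read. One possible way to make the comparison clearer, shown here only as a rough sketch, is to split the explanatory variable into quartiles with pandas.qcut and compare the mean suicide rate in each group. The helper name mean_by_quartile and the toy numbers are assumptions made for illustration; they are not taken from the Gapminder file.

```python
import pandas

def mean_by_quartile(data, explanatory, response):
    """Mean of `response` within each quartile of `explanatory` (hypothetical helper)."""
    # Keep only rows where both variables are present
    pair = data[[explanatory, response]].dropna()
    # Split the explanatory variable into four equally sized groups
    quartile = pandas.qcut(pair[explanatory], 4, labels=['Q1', 'Q2', 'Q3', 'Q4'])
    return pair.groupby(quartile, observed=False)[response].mean()

# Made-up values standing in for the real data set
toy = pandas.DataFrame({
    'alcconsumption': [1, 2, 3, 4, 5, 6, 7, 8, None],
    'suicideper100th': [4, 6, 5, 7, 9, 8, 12, 14, 10],
})
means = mean_by_quartile(toy, 'alcconsumption', 'suicideper100th')
print(means)
```

If the group means rise steadily from Q1 to Q4, that supports the impression from the graphs; a flat pattern would suggest little relationship.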
Running my first program-
import pandas
import numpy
data=pandas.read_csv('gapminder (2).csv', low_memory=False)
print(len(data))
print(len(data.columns))
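# Convert the variables to numeric; values that cannot be parsed become NaN.
# pandas.to_numeric is the replacement pandas recommends for the deprecated convert_objects.
data['suicideper100th'] = pandas.to_numeric(data['suicideper100th'], errors='coerce')
data['alcconsumption'] = pandas.to_numeric(data['alcconsumption'], errors='coerce')
data['urbanrate'] = pandas.to_numeric(data['urbanrate'], errors='coerce')
data['employrate'] = pandas.to_numeric(data['employrate'], errors='coerce')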
print("Counts for suicideper100th")
c1=data["suicideper100th"].value_counts(sort=False, dropna=False)
print(c1)
print("Percentage for suicideper100th")
p1=data["suicideper100th"].value_counts(sort=False, normalize=True, dropna=False)
print(p1)
print("Counts for alcconsumption")
c2=data["alcconsumption"].value_counts(sort=False, dropna=False)
print(c2)
print("Percentage for alcconsumption")
p2=data["alcconsumption"].value_counts(sort=False, normalize=True, dropna=False)
print(p2)
print("Counts for urbanrate")
c3=data["urbanrate"].value_counts(sort=False, dropna=False)
print(c3)
print("Percentage for urbanrate")
p3=data["urbanrate"].value_counts(sort=False, normalize=True, dropna=False)
print(p3)
print("Counts for employrate")
c4=data["employrate"].value_counts(sort=False)
print(c4)
print("Percentage for employrate")
p4=data["employrate"].value_counts(sort=False, normalize=True)
print(p4)
13.905267 1
9.927033 1
15.953850 1
13.716340 1
12.411181 1
15.714571 1
9.257976 1
1.370002 1
4.961071 1
5.554276 1
7.304886 1
4.414990 1
..
8.164005 1
0.201449 1
14.537270 1
1.392951 1
20.369590 1
5.213720 1
7.745065 1
8.211067 1
4.119620 1
4.777007 1
1.658908 1
3.374416 1
4.288574 1
1.922485 1
13.239810 1
7.202384 1
4.527852 1
10.645740 1
6.265789 1
8.262893 1
26.219198 1
20.162010 1
14.547167 1
8.210948 1
14.554677 1
13.089616 1
8.081540 1
6.087671 1
22.404560 1
5.362179 1
Name: suicideper100th, dtype: int64
Percentage for suicideper100th
6.684385 0.004695
NaN 0.103286
7.765584 0.004695
15.542603 0.004695
6.519537 0.004695
7.563692 0.004695
7.184853 0.004695
6.597168 0.004695
11.151073 0.004695
17.032646 0.004695
13.094370 0.004695
3.576478 0.004695
6.105282 0.004695
15.538490 0.004695
14.091530 0.004695
10.937718 0.004695
1.380965 0.004695
10.823000 0.004695
13.905267 0.004695
9.927033 0.004695
15.953850 0.004695
13.716340 0.004695
12.411181 0.004695
15.714571 0.004695
9.257976 0.004695
1.370002 0.004695
4.961071 0.004695
5.554276 0.004695
7.304886 0.004695
4.414990 0.004695
...
8.164005 0.004695
0.201449 0.004695
14.537270 0.004695
1.392951 0.004695
20.369590 0.004695
5.213720 0.004695
7.745065 0.004695
8.211067 0.004695
4.119620 0.004695
4.777007 0.004695
1.658908 0.004695
3.374416 0.004695
4.288574 0.004695
1.922485 0.004695
13.239810 0.004695
7.202384 0.004695
4.527852 0.004695
10.645740 0.004695
6.265789 0.004695
8.262893 0.004695
26.219198 0.004695
20.162010 0.004695
14.547167 0.004695
8.210948 0.004695
14.554677 0.004695
13.089616 0.004695
8.081540 0.004695
6.087671 0.004695
22.404560 0.004695
5.362179 0.004695
Name: suicideper100th, dtype: float64
Counts for alcconsumption
NaN 26
5.25 1
9.75 1
0.50 1
9.50 1
9.60 1
5.05 1
0.96 1
3.61 1
7.29 1
7.38 1
7.60 1
2.14 1
15.00 1
1.49 1
0.56 1
13.66 1
6.16 1
9.70 1
8.81 1
0.34 2
3.53 1
5.12 1
9.43 1
0.11 1
8.35 1
16.23 1
5.21 1
12.72 1
2.08 1
..
4.98 1
12.14 1
13.24 1
2.70 1
2.76 1
0.79 1
12.02 1
19.15 1
4.71 1
13.31 1
3.02 1
3.41 1
3.39 2
12.11 1
13.34 1
7.10 1
3.58 1
4.81 1
3.92 1
9.65 1
0.92 1
1.92 1
10.17 1
3.99 1
10.41 1
12.09 1
4.99 1
1.03 1
3.64 1
3.11 1
Name: alcconsumption, dtype: int64
Percentage for alcconsumption
NaN 0.122066
5.25 0.004695
9.75 0.004695
0.50 0.004695
9.50 0.004695
9.60 0.004695
5.05 0.004695
0.96 0.004695
3.61 0.004695
7.29 0.004695
7.38 0.004695
7.60 0.004695
2.14 0.004695
15.00 0.004695
1.49 0.004695
0.56 0.004695
13.66 0.004695
6.16 0.004695
9.70 0.004695
8.81 0.004695
0.34 0.009390
3.53 0.004695
5.12 0.004695
9.43 0.004695
0.11 0.004695
8.35 0.004695
16.23 0.004695
5.21 0.004695
12.72 0.004695
2.08 0.004695
...
4.98 0.004695
12.14 0.004695
13.24 0.004695
2.70 0.004695
2.76 0.004695
0.79 0.004695
12.02 0.004695
19.15 0.004695
4.71 0.004695
13.31 0.004695
3.02 0.004695
3.41 0.004695
3.39 0.009390
12.11 0.004695
13.34 0.004695
7.10 0.004695
3.58 0.004695
4.81 0.004695
3.92 0.004695
9.65 0.004695
0.92 0.004695
1.92 0.004695
10.17 0.004695
3.99 0.004695
10.41 0.004695
12.09 0.004695
4.99 0.004695
1.03 0.004695
3.64 0.004695
3.11 0.004695
Name: alcconsumption, dtype: float64
Counts for urbanrate
38.58 1
74.50 1
NaN 10
73.50 1
67.50 1
87.30 1
66.50 1
47.04 1
37.76 1
59.58 1
68.08 1
24.94 1
73.48 1
17.00 1
70.36 1
92.68 1
67.16 1
22.54 1
23.00 1
52.74 1
77.54 1
81.46 1
12.98 1
88.92 1
91.66 1
82.44 1
33.32 1
15.10 1
64.92 1
98.36 1
..
60.18 1
80.40 1
28.08 1
67.98 1
57.94 1
65.58 2
53.30 1
51.70 1
17.96 1
56.56 1
56.02 1
66.48 1
43.44 1
65.22 1
73.64 1
27.14 1
86.96 1
54.22 1
18.80 1
47.88 1
12.54 1
97.36 1
29.54 1
51.64 1
94.22 1
25.46 1
89.94 1
51.46 1
54.24 1
83.52 1
Name: urbanrate, dtype: int64
Percentage for urbanrate
38.58 0.004695
74.50 0.004695
NaN 0.046948
73.50 0.004695
67.50 0.004695
87.30 0.004695
66.50 0.004695
47.04 0.004695
37.76 0.004695
59.58 0.004695
68.08 0.004695
24.94 0.004695
73.48 0.004695
17.00 0.004695
70.36 0.004695
92.68 0.004695
67.16 0.004695
22.54 0.004695
23.00 0.004695
52.74 0.004695
77.54 0.004695
81.46 0.004695
12.98 0.004695
88.92 0.004695
91.66 0.004695
82.44 0.004695
33.32 0.004695
15.10 0.004695
64.92 0.004695
98.36 0.004695
...
60.18 0.004695
80.40 0.004695
28.08 0.004695
67.98 0.004695
57.94 0.004695
65.58 0.009390
53.30 0.004695
51.70 0.004695
17.96 0.004695
56.56 0.004695
56.02 0.004695
66.48 0.004695
43.44 0.004695
65.22 0.004695
73.64 0.004695
27.14 0.004695
86.96 0.004695
54.22 0.004695
18.80 0.004695
47.88 0.004695
12.54 0.004695
97.36 0.004695
29.54 0.004695
51.64 0.004695
94.22 0.004695
25.46 0.004695
89.94 0.004695
51.46 0.004695
54.24 0.004695
83.52 0.004695
Name: urbanrate, dtype: float64
Counts for employrate
50.500000 1
61.500000 3
64.500000 1
63.500000 1
56.500000 1
53.500000 3
81.500000 1
60.500000 1
64.199997 1
42.500000 1
54.500000 2
71.800003 1
49.500000 1
52.500000 1
58.500000 1
57.500000 2
43.099998 1
63.799999 2
46.400002 1
74.699997 1
54.599998 1
55.400002 1
78.900002 1
58.799999 2
59.700001 1
83.199997 2
48.700001 2
39.000000 1
42.000000 1
42.400002 2
..
57.200001 1
44.700001 1
57.299999 1
55.599998 1
65.900002 1
56.400002 1
66.800003 1
63.599998 1
73.599998 1
63.700001 1
72.800003 1
63.900002 1
71.599998 1
57.099998 1
64.900002 1
73.099998 1
81.300003 1
64.300003 1
63.099998 1
52.700001 1
50.700001 1
68.099998 1
61.299999 1
65.699997 1
60.400002 2
68.300003 1
66.900002 1
55.099998 1
46.799999 1
51.400002 1
Name: employrate, dtype: int64
Percentage for employrate
50.500000 0.004695
61.500000 0.014085
64.500000 0.004695
63.500000 0.004695
56.500000 0.004695
53.500000 0.014085
81.500000 0.004695
60.500000 0.004695
64.199997 0.004695
42.500000 0.004695
54.500000 0.009390
71.800003 0.004695
49.500000 0.004695
52.500000 0.004695
58.500000 0.004695
57.500000 0.009390
43.099998 0.004695
63.799999 0.009390
46.400002 0.004695
74.699997 0.004695
54.599998 0.004695
55.400002 0.004695
78.900002 0.004695
58.799999 0.009390
59.700001 0.004695
83.199997 0.009390
48.700001 0.009390
39.000000 0.004695
42.000000 0.004695
42.400002 0.009390
...
57.200001 0.004695
44.700001 0.004695
57.299999 0.004695
55.599998 0.004695
65.900002 0.004695
56.400002 0.004695
66.800003 0.004695
63.599998 0.004695
73.599998 0.004695
63.700001 0.004695
72.800003 0.004695
63.900002 0.004695
71.599998 0.004695
57.099998 0.004695
64.900002 0.004695
73.099998 0.004695
81.300003 0.004695
64.300003 0.004695
63.099998 0.004695
52.700001 0.004695
50.700001 0.004695
68.099998 0.004695
61.299999 0.004695
65.699997 0.004695
60.400002 0.009390
68.300003 0.004695
66.900002 0.004695
55.099998 0.004695
46.799999 0.004695
51.400002 0.004695
Name: employrate, dtype: float64
>>>
In my program, I find the frequency distributions of alcohol consumption, employment rate, urban rate and suicide rate. I also list the frequency of missing values (NaN) for suicideper100th, alcconsumption and urbanrate. The variables will need to be plotted against each other later on to show which factor the suicide rate depends on the most.
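Since these variables are continuous, nearly every value in the output above occurs only once, so the raw counts say little about the shape of each distribution. A minimal sketch of one alternative is to group the values into ranges with pandas.cut before counting. The helper name binned_counts, the bin edges and the short toy series are assumptions chosen for illustration only.

```python
import pandas

def binned_counts(series, edges):
    """Count how many values fall into each range defined by `edges` (hypothetical helper)."""
    # include_lowest=True makes the first bin include its left edge
    groups = pandas.cut(series, bins=edges, include_lowest=True)
    # dropna=False keeps a separate row for missing values
    return groups.value_counts(sort=False, dropna=False)

# A few illustrative alcohol consumption values plus one missing entry
toy_alc = pandas.Series([0.34, 3.39, 5.25, 9.75, 12.14, 19.15, None])
counts = binned_counts(toy_alc, [0, 5, 10, 15, 20])
print(counts)
```

Passing normalize=True to value_counts in the same way would give the share of observations in each range instead of the count.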
Gapminder
Is unemployment connected to suicide rates? Is this more prevalent in urban areas? Does it indirectly depend on alcohol consumption as well?
I have chosen the Gapminder data set. I want to find the correlations, if any, between alcohol consumption and suicide rate around the world, as well as between unemployment rate and alcohol consumption. If both hold, this would in turn suggest that the unemployment rate is directly related to the suicide rate. I will compare suicideper100th against alcconsumption, then alcconsumption against employrate, and look for correlations, where the variables are-
suicideper100th-The number of suicides per 100,000 people
alcconsumption-The alcohol consumption per adult, in litres
employrate-The percentage of the population that is employed.
In my own experience, I have seen more suicides among students, driven by fear of unemployment, than among any other group of people, and this is why the topic is of particular interest to me. Another relationship that could be explored later on is per capita income against employrate, which would show how wealth is distributed and whether a higher per capita income helps increase the employment rate.
There have been several studies in this field in recent years, such as (1) http://ije.oxfordjournals.org/content/19/2/412.short, (2) http://jech.bmj.com/content/57/8/594.short, (3) http://www.sciencedirect.com/science/article/pii/S0167487009000361 and (4) http://www.amsciepub.com/doi/abs/10.2466/pr0.1980.47.3f.1095?journalCode=pr0.
Paper (4) shows how suicide rates changed with employment rates from 1962 to 1976 in Canada, France, Germany, Japan, Sweden and the US. Paper (3) tests the hypothesis that the relationship between unemployment rates and suicide rates varies with the level of real per capita GDP. Paper (1) collects and studies data on unemployment and suicide rates in 16 developed countries for 1973 and 1983. Paper (2) determines the association of labour force status and socioeconomic position with death by suicide.
However, in most of these studies the relationship was established for one particular country or a few countries, not on such a large scale. My paper will therefore test whether this association extends to every country in the Gapminder data set.
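The correlations described at the start of this post (suicideper100th against alcconsumption, then alcconsumption against employrate) could be computed roughly as follows. This is only a sketch under stated assumptions: it uses the Pearson coefficient from pandas' Series.corr, the helper name pairwise_correlations is hypothetical, and the toy values are made up, not read from the Gapminder file.

```python
import pandas

def pairwise_correlations(data, pairs):
    """Pearson r for each (x, y) pair, using only rows where both values are present."""
    results = {}
    for x, y in pairs:
        complete = data[[x, y]].dropna()
        results[(x, y)] = complete[x].corr(complete[y])
    return results

# Made-up values standing in for the real data set
toy = pandas.DataFrame({
    'suicideper100th': [5.0, 8.0, 12.0, 15.0, None],
    'alcconsumption': [2.0, 4.0, 6.0, 8.0, 3.0],
    'employrate': [60.0, 55.0, 62.0, 58.0, 50.0],
})
pairs = [('suicideper100th', 'alcconsumption'), ('alcconsumption', 'employrate')]
r = pairwise_correlations(toy, pairs)
for pair, value in r.items():
    print(pair, round(value, 3))
```

A coefficient near +1 or -1 indicates a strong linear relationship and a value near 0 indicates a weak one; either way, a correlation alone does not show that one variable causes the other.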