jiteshblog1 - Tumblr blog

jiteshblog1 · 5 years ago

Text

Alcohol Consumption and Suicide rate relation

The following python code was written to add the dataset from GAPMINDER codebook to the python and do the data analysis.

Few observations and data analysis before we go to the code. The GAPMINDER code book was selected to analyse the “alcohol consumption per adult (age 15+), litres Recorded and estimated average alcohol consumption adult (15+) percapita consumption in litres pure alcohol for each country” and the “ 2005 Suicide, age adjusted, per 100 000 Mortality due to self-inflicted injury, per 100 000 standard population, age adjusted”

For this the data was imported from the csv in the python code. There were in total 213 countries which were the part of the survey. The variable that was used was “alcconsumption” and “ suicideper100TH “. The data was first sorted by removing the blank columns with nan so that python can differentiate from string and numeric.

First we will discuss the Alcohol Consumption data analysis:

Out of 213 countries only 187 countries had the data for the variable “alcconsumption” .

The value ranged from 0 to 21.03 ( The max value calculated from the max function in python. The data set was split into 4 sections to see answer the question “ How many countries have the consumption

1. above 0 but less than 5,

2 .above 5 and less than 15,

3. above 15 and less than 20

4. Above 20

To solve this a new variable was created and can been seen in the code.

The below is the code was written and we got the below frequency distributions analysis :

Result for Alcohol Consumption Question:

81 Countries which is 43.33% of total countries of which we had the percapita consumption in litres pure alcohol for adult above 15 years between 0 to 5.

95 Countries which is 50.8% of total countries of which we had the percapita consumption in litres pure alcohol for adult above 15 years between 5 to 15.

81 Countries which is 5.3% of total countries of which we had the percapita consumption in litres pure alcohol for adult above 15 years between 15 to 20.

81 Countries which is 0.053% of total countries of which we had the percapita consumption in litres pure alcohol for adult above 15 years above 20.

The above observations with Counts for Alcohol were plotted as well. As they are categorical we have bar graphs.

Now coming to the Suicide data analysis:

The intention was to have the frequency distribution for the data to see “ how many countries have suicide rate above and below the average suicuide rate.”

For this we created a new variable which has the value 0 for the countries having the suicide rate below the average and 1 for the countries above the average.

The code is very simplified and can be understood easily for this distribution.

Result for suicide question mention in above section:

57.06 Countries have suicide rate below average

42.93 have above average.

The above observations with Counts for Alcohol and Suicide patterns were plotted as well. As they are categorical we have bar graphs.

The above observations with Counts for Alcohol were plotted as well. As they are categorical we have bar graphs.

There was another question that we would like to answer was to have a correlation the alcohol Consumption and the Suicide rate per capita. As both are quantitative variables we saw the spread for same. That was caluclated from the describe function in python.

The output is as follows:

It can be seen that the spread is almost of similar nature for both variables. we then plotted the scatter plot using the seaborn package in python.

It can be seen that the countries which have higher suicide rate are somehow linked with the more suicide happening due to more alcohol consumption.

Python Code:

# -*- coding: utf-8 -*- """ Created on Sat May 2 02:17:20 2020

@author: 212415780 """

import pandas as pd import numpy import seaborn import matplotlib.pyplot as plt # any additional libraries would be imported here

# Import the entire data set to memory from the GAPMINDER CodeBook .csv file data = pd.read_csv('gapminder.csv', low_memory=False)

#upper-case all DataFrame column names - place afer code for loading data aboave #data.columns = map(str.upper, data.columns)

# bug fix for display formats to avoid run time errors - put after code for loading data above pd.set_option('display.float_format', lambda x:'%f'%x) print('Total Number of Samples in the Gapminder Codebook data file') print (len(data)) #number of observations (rows) print (len(data.columns)) # number of variables (columns)

#Getting Data from the csv file into Python with using Pandas data1=pd.DataFrame(data,columns=['country','alcconsumption','suicideper100th'])

#pd.to_numeric(s) data1['alcconsumption']=pd.to_numeric(data1['alcconsumption'],errors='coerce') data1['alcconsumption']=data1['alcconsumption'].sort_values(ascending=True) data1['suicideper100th']=pd.to_numeric(data1['suicideper100th'],errors='coerce') data1['suicideper100th']=data1['suicideper100th'].sort_values(ascending=True)

data1.sort_values(["alcconsumption"], axis=0,ascending=True, inplace=True)

#counts and percentages (i.e. frequency distributions) for each variable

print('Printing the Country List') t1=data1['country']

print ('Printing the 2008 alcohol consumption per adult (age 15+), litres Recorded and estimated average alcohol consumption') print('adult (15+) percapita consumption in litres pure alcohol for each country') c1 = data1['alcconsumption'].value_counts(sort=False) #print (c1)

p1 = data1['alcconsumption'].value_counts(sort=True, normalize=True) #print (p1)

#sFinding the countries which have per capita consumption more the 10 lts sub1=data1[(data1['alcconsumption']>=0)] sub2 = sub1.copy()

# frequency distritions on new sub2 data frame #print 'counts for AGE' c5 = sub2['alcconsumption'].value_counts(sort=False) #print(c5)

from numpy import mean #Caluculating the maximum adult (15+) percapita consumption in litres pure alcohol from the data in whole of data set max_alco=max(data1['alcconsumption']) print('Maximum Alcohol Consumption from Data set') print(max_alco)

#Caluculating the maximum adult (15+) percapita consumption in litres pure alcohol from the data in whole of data set min_alco=min(data1['alcconsumption']) print('Minimum Alcohol Consumption from Data set') print(min_alco)

avg_alco=mean(data1['alcconsumption']) print('Average Alcohol Consumption from Data set') print(avg_alco)

#new alcconsumption_split variable, categorical 1 through 6 def alcconsumption_split (row): if row['alcconsumption'] > 0 and row['alcconsumption']<=5: return 1 if row['alcconsumption'] >5 and row['alcconsumption']<=15 : return 2 if row['alcconsumption'] > 15 and row['alcconsumption']<=20: return 3 if row['alcconsumption'] > 20 : return 4

sub1['alcconsumption_split'] = sub1.apply (lambda row: alcconsumption_split (row),axis=1)

#print(a)

print('counts for alcohol comsumption splits ') c10 = sub1['alcconsumption_split'].value_counts(sort=False)

print(c10)

print('percentages for alcohol comsumption splits') p10 = sub1['alcconsumption_split'].value_counts(sort=False, normalize=True) p11=p10*100 print (p11)

print('2005 Suicide, age adjusted, per 100 000 Mortality due to self-inflicted injury, per 100 000 standard population,age adjusted')

sub3=data1[(data1['suicideper100th']>=0)] sub4 = sub3.copy()

max_suc=max(data1['suicideper100th']) print('Maximum number of suicide case per 100000 from Data set') print(max_suc)

#Caluculating the maximum adult (15+) percapita consumption in litres pure alcohol from the data in whole of data set min_suc=min(data1['suicideper100th']) print('Minimum number of suicide case per 100000 from Data set') print(min_suc)

avg_suc=mean(data1['suicideper100th']) print('Average number of suicide case per 100000 from Data set') print(avg_suc)

print(' The Purpose of code is to see the freqeuncy distribution of no of countries which has this rate belor or above average, for this we need to split the data') print('creating new variable')

def suic_split (row): if row['suicideper100th'] < avg_suc : return 0 if row['suicideper100th'] > avg_suc : return 1

sub3['suic_split'] = sub3.apply (lambda row: suic_split (row),axis=1)

#print(a)

print('Freqeuncy distribution for new variable suic_split which counts 0- no of countries with Suicide rate less than the average and 1 as the -no of countries with Suicide rate greater than the average ') c15 = sub3['suic_split'].value_counts(sort=False)

print(c15)

print('Perecnetage for new variable suic_split which counts 0- no of countries with Suicide rate less than the average and 1 as the -no of countries with Suicide rate greater than the average ') p15 = sub3['suic_split'].value_counts(sort=False, normalize=True) p16=p15*100 print (p16)

fig1 = plt.figure(1) sub1["alcconsumption_split"] = sub1["alcconsumption_split"].astype('category') t=seaborn.countplot(x="alcconsumption_split", data=sub1) plt.xlabel('1- Counteies with Alcohol Consumption more then 0 less than 5, 2- Counteies with Alcohol Consumption more then 5 less than 15,3- Counteies with Alcohol Consumption more then 15 less than 20,4- Counteies with Alcohol Consumption more then 20') plt.title('Alcohol Consumption Bar Graph') fig1.savefig('Plot1.png')

fig2 = plt.figure(2) sub3["suic_split"] = sub3["suic_split"].astype('category') y=seaborn.countplot(x="suic_split", data=sub3) plt.xlabel('Freqeuncy distribution Plot for new variable suic_split which counts 0- no of countries with Suicide rate less than the average and 1 as the -no of countries with Suicide rate greater than the average ') plt.title('Suicide Rate Bar Graph') fig2.savefig('PLot 2.png')

plt.show()

# standard deviation and other descriptive statistics for quantitative variables print ('describe number alcconsumption per capita ') desc1 = data1['alcconsumption'].describe() print(desc1)

print ('describe number Sucide per capita ') desc2 = data1['suicideper100th'].describe() print(desc2)

#basic scatterplot: Q->Q scat1 = seaborn.regplot(x="suicideper100th", y="alcconsumption", fit_reg=False, data=data) plt.xlabel('Suicide Rate') plt.ylabel('Alcohol Consumption') plt.title('Scatterplot for the Association Between Alcohol and Suicide vaiables ')

0 notes

jiteshblog1 · 5 years ago

Text

The following python code was written to add the dataset from GAPMINDER codebook to the python and do the data analysis.

First we will discuss the Alcohol Consumption data analysis:

Out of 213 countries only 187 countries had the data for the variable “alcconsumption” .

1. above 0 but less than 5,

2 .above 5 and less than 15,

3. above 15 and less than 20

4. Above 20

To solve this a new variable was created and can been seen in the code.

The below is the code was written and we got the below frequency distributions analysis :

Result for Alcohol Consumption Question:

81 Countries which is 43.33% of total countries of which we had the percapita consumption in litres pure alcohol for adult above 15 years between 0 to 5.

95 Countries which is 50.8% of total countries of which we had the percapita consumption in litres pure alcohol for adult above 15 years between 5 to 15.

81 Countries which is 5.3% of total countries of which we had the percapita consumption in litres pure alcohol for adult above 15 years between 15 to 20.

81 Countries which is 0.053% of total countries of which we had the percapita consumption in litres pure alcohol for adult above 15 years above 20.

Now coming to the Suicide data analysis:

The intention was to have the frequency distribution for the data to see “ how many countries have suicide rate above and below the average suicuide rate.”

For this we created a new variable which has the value 0 for the countries having the suicide rate below the average and 1 for the countries above the average.

The code is very simplified and can be understood easily for this distribution.

Result for suicide question mention in above section:

57.06 Countries have suicide rate below average

42.93 have above average.

The snapshot of the data table and result is shown below:

Code:

# -*- coding: utf-8 -*- """ Created on Sat May 2 02:17:20 2020

@author: 212415780 """

import pandas as pd import numpy # any additional libraries would be imported here

# Import the entire data set to memory from the GAPMINDER CodeBook .csv file data = pd.read_csv('gapminder.csv', low_memory=False)

#upper-case all DataFrame column names - place afer code for loading data aboave #data.columns = map(str.upper, data.columns)

#Getting Data from the csv file into Python with using Pandas data1=pd.DataFrame(data,columns=['country','alcconsumption','suicideper100th'])

data1.sort_values(["alcconsumption"], axis=0,ascending=True, inplace=True)

#counts and percentages (i.e. frequency distributions) for each variable

print('Printing the Country List') t1=data1['country']

p1 = data1['alcconsumption'].value_counts(sort=True, normalize=True) #print (p1)

#sFinding the countries which have per capita consumption more the 10 lts sub1=data1[(data1['alcconsumption']>=0)] sub2 = sub1.copy()

# frequency distritions on new sub2 data frame #print 'counts for AGE' c5 = sub2['alcconsumption'].value_counts(sort=False) #print(c5)

avg_alco=mean(data1['alcconsumption']) print('Average Alcohol Consumption from Data set') print(avg_alco)

sub1['alcconsumption_split'] = sub1.apply (lambda row: alcconsumption_split (row),axis=1)

#print(a)

print('counts for alcohol comsumption splits ')http://www.mpc.berkeley.edu/mpc-course-material c10 = sub1['alcconsumption_split'].value_counts(sort=False)