ultrafahmina06things-blog
Assignments
5 posts
ultrafahmina06things-blog · 5 years ago
Week4 - Assignment
For this assignment, although three or more variables could be selected, I opted for only two in the interest of clarity and simplicity and to stay focused on the project's hypothesis: life expectancy and alcohol consumption.
Here is my code:
# importing necessary libraries
%matplotlib inline
import pandas as pd
import numpy as np
from collections import OrderedDict
from tabulate import tabulate, tabulate_formats
import seaborn
import matplotlib.pyplot as plt

# bug fix for display formats to avoid run time errors
pd.set_option('display.float_format', lambda x: '%f' % x)

# Load from CSV
data1 = pd.read_csv('D:/CourseEra/gapminder.csv', skip_blank_lines=True,
                    usecols=['country', 'alcconsumption', 'lifeexpectancy'])

# Treat blank cells as missing and drop incomplete rows
data1 = data1.replace(r'^\s*$', np.nan, regex=True)
data1 = data1.dropna(axis=0, how='any')

# Variables Descriptions
ALCOHOL = "2008 alcohol consumption per adult (liters, age 15+)"
LIFE = "2011 life expectancy at birth (years)"

# Convert the text columns to numbers (errors='coerce' turns unparsable values into NaN)
for dt in ('alcconsumption', 'lifeexpectancy'):
    data1[dt] = pd.to_numeric(data1[dt], errors='coerce')

data2 = data1.copy()

# Univariate histogram for alcohol consumption
# (distplot is deprecated since seaborn 0.11; histplot is its replacement)
seaborn.distplot(data1["alcconsumption"].dropna(), kde=False)
plt.xlabel('alcohol consumption (liters)')
plt.title(ALCOHOL)
plt.show()
The univariate graph of alcohol consumption:
Tumblr media
The Univariate Graph of Life Expectancy:
Tumblr media
Scatterplot for the association between Alcohol Consumption and Life Expectancy:
Tumblr media
Univariate bar graph for categorical variable life expectancy:
Tumblr media
Univariate bar graph for categorical variable alcohol consumption:
Tumblr media
Bivariate bar graph
Tumblr media
Judging by the scatterplot alone, there does not seem to be a correlation between the variables, but the bivariate bar graph suggests that moderate alcohol consumption can contribute to higher life expectancy. Of course, this is not scientific work and has value only in this context.
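The two readings above can be put side by side in a few lines. This is only a sketch, not the code behind the graphs in this post: the numbers are made up for illustration, and the bin edges and the column name alcohol_group are my own assumptions. It computes the Pearson correlation (what the scatterplot summarises) and the mean life expectancy per alcohol bin (what a bivariate bar graph displays).

```python
import pandas as pd

# Hypothetical values for illustration only; they are NOT taken from gapminder.csv
sample = pd.DataFrame({
    "alcconsumption": [0.5, 2.0, 4.5, 6.0, 8.5, 11.0, 13.5, 17.0],
    "lifeexpectancy": [58.0, 62.0, 70.0, 76.0, 78.0, 74.0, 72.0, 69.0],
})

# Pearson correlation: a single number summarising the linear trend in a scatterplot
r = sample["alcconsumption"].corr(sample["lifeexpectancy"])

# Collapse alcohol consumption into bins (assumed edges) and average life
# expectancy within each bin, which is what a bivariate bar graph shows
bins = [0, 5, 10, 15, 20]
labels = [">=0 <5", ">=5 <10", ">=10 <15", ">=15 <20"]
sample["alcohol_group"] = pd.cut(sample["alcconsumption"], bins=bins,
                                 labels=labels, right=False)
group_means = sample.groupby("alcohol_group", observed=True)["lifeexpectancy"].mean()

print(f"Pearson r = {r:.2f}")
print(group_means)
```

A relationship that rises and then falls can give a weak linear correlation while the binned means still show a clear peak, which is one way the scatterplot and the bar graph can seem to disagree.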
ultrafahmina06things-blog · 5 years ago
Week3-Assignment
Below is the full code -
# importing necessary libraries
import pandas as pd
import numpy as np
from collections import OrderedDict
from tabulate import tabulate, tabulate_formats

# Dictionaries
counts = OrderedDict()
prcnts = OrderedDict()

# Load from CSV
data1 = pd.read_csv('D:/CourseEra/gapminder.csv', skip_blank_lines=True,
                    usecols=['country', 'incomeperperson',
                             'alcconsumption', 'lifeexpectancy'])

# Treat blank cells as missing and drop incomplete rows
data1 = data1.replace(r'^\s*$', np.nan, regex=True)
data1 = data1.dropna(axis=0, how='any')

ALCOHOL = "2008 alcohol consumption per adult (liters, age 15+)"
INCOME = "2010 Gross Domestic Product per capita in constant 2000 US$"
LIFE = "2011 life expectancy at birth (years)"

# Numeric copy of each variable (errors='coerce' turns unparsable values into NaN)
for dt in ('incomeperperson', 'alcconsumption', 'lifeexpectancy'):
    counts[dt] = pd.to_numeric(data1[dt], errors='coerce')

# absolute Frequency distributions
freq_life_n = data1.lifeexpectancy.value_counts(sort=False)
freq_income_n = data1.incomeperperson.value_counts(sort=False)
freq_alcohol_n = data1.alcconsumption.value_counts(sort=False)

# Relative Frequency distributions
freq_life_r = data1.lifeexpectancy.value_counts(sort=False, normalize=True)
freq_income_r = data1.incomeperperson.value_counts(sort=False, normalize=True)
freq_alcohol_r = data1.alcconsumption.value_counts(sort=False, normalize=True)

print('********************************************************')
print('* Absolute Frequencies original variables (first 5)    *')
print('********************************************************')
print('\nlife variable (' + LIFE + '):')
print(tabulate([freq_life_n.head(5)], tablefmt="fancy_grid",
               headers=([i for i in freq_life_n.index])))
print('\nincome variable (' + INCOME + '):')
print(tabulate([freq_income_n.head(5)], tablefmt="fancy_grid",
               headers=([i for i in freq_income_n.index])))
print('\nalcohol variable (' + ALCOHOL + '):')
# uses freq_alcohol_n; the captured output below shows the life expectancy values for this table
print(tabulate([freq_alcohol_n.head(5)], tablefmt="fancy_grid",
               headers=([i for i in freq_alcohol_n.index])))

print('\n********************************************************')
print('* Relative Frequencies original variables (first 5)    *')
print('********************************************************')
print('\nlife variable (' + LIFE + '):')
print(tabulate([freq_life_r.head(5)], tablefmt="fancy_grid",
               headers=([i for i in freq_life_n.index])))
print('\nincome variable (' + INCOME + '):')
print(tabulate([freq_income_r.head(5)], tablefmt="fancy_grid",
               headers=([i for i in freq_income_n.index])))
print('\nalcohol variable (' + ALCOHOL + '):')
# uses freq_alcohol_r; the captured output below shows the life expectancy values for this table
print(tabulate([freq_alcohol_r.head(5)], tablefmt="fancy_grid",
               headers=([i for i in freq_alcohol_n.index])))
********************************************************
* Absolute Frequencies original variables (first 5)    *
********************************************************

life variable (2011 life expectancy at birth (years)):
╒══════════╤══════════╤══════════╤══════════╤══════════╕
│   74.402 │   73.373 │   54.675 │   81.804 │   76.126 │
╞══════════╪══════════╪══════════╪══════════╪══════════╡
│        1 │        1 │        1 │        1 │        1 │
╘══════════╧══════════╧══════════╧══════════╧══════════╛

income variable (2010 Gross Domestic Product per capita in constant 2000 US$):
╒════════════════════╤════════════════════╤════════════════════╤════════════════════╤════════════════════╕
│   9425.32586978275 │   15313.8593472276 │   389.763634253063 │   268.331790297681 │   11744.8341671737 │
╞════════════════════╪════════════════════╪════════════════════╪════════════════════╪════════════════════╡
│                  1 │                  1 │                  1 │                  1 │                  1 │
╘════════════════════╧════════════════════╧════════════════════╧════════════════════╧════════════════════╛

alcohol variable (2008 alcohol consumption per adult (liters, age 15+)):
╒══════════╤══════════╤══════════╤══════════╤══════════╕
│   74.402 │   73.373 │   54.675 │   81.804 │   76.126 │
╞══════════╪══════════╪══════════╪══════════╪══════════╡
│        1 │        1 │        1 │        1 │        1 │
╘══════════╧══════════╧══════════╧══════════╧══════════╛

********************************************************
* Relative Frequencies original variables (first 5)    *
********************************************************

life variable (2011 life expectancy at birth (years)):
╒════════════╤════════════╤════════════╤════════════╤════════════╕
│     74.402 │     73.373 │     54.675 │     81.804 │     76.126 │
╞════════════╪════════════╪════════════╪════════════╪════════════╡
│ 0.00584795 │ 0.00584795 │ 0.00584795 │ 0.00584795 │ 0.00584795 │
╘════════════╧════════════╧════════════╧════════════╧════════════╛

income variable (2010 Gross Domestic Product per capita in constant 2000 US$):
╒════════════════════╤════════════════════╤════════════════════╤════════════════════╤════════════════════╕
│   9425.32586978275 │   15313.8593472276 │   389.763634253063 │   268.331790297681 │   11744.8341671737 │
╞════════════════════╪════════════════════╪════════════════════╪════════════════════╪════════════════════╡
│         0.00584795 │         0.00584795 │         0.00584795 │         0.00584795 │         0.00584795 │
╘════════════════════╧════════════════════╧════════════════════╧════════════════════╧════════════════════╛

alcohol variable (2008 alcohol consumption per adult (liters, age 15+)):
╒════════════╤════════════╤════════════╤════════════╤════════════╕
│     74.402 │     73.373 │     54.675 │     81.804 │     76.126 │
╞════════════╪════════════╪════════════╪════════════╪════════════╡
│ 0.00584795 │ 0.00584795 │ 0.00584795 │ 0.00584795 │ 0.00584795 │
╘════════════╧════════════╧════════════╧════════════╧════════════╛
# Min and Max continuous variables:
min_max = OrderedDict()

dict1 = OrderedDict()
dict1['min'] = data1.lifeexpectancy.min()
dict1['max'] = data1.lifeexpectancy.max()
min_max['lifeexpectancy'] = dict1

dict2 = OrderedDict()
dict2['min'] = data1.incomeperperson.min()
dict2['max'] = data1.incomeperperson.max()
min_max['incomeperperson'] = dict2

dict3 = OrderedDict()
dict3['min'] = data1.alcconsumption.min()
dict3['max'] = data1.alcconsumption.max()
min_max['alcconsumption'] = dict3

df = pd.DataFrame([min_max['incomeperperson'], min_max['lifeexpectancy'], min_max['alcconsumption']],
                  index=['incomeperperson', 'lifeexpectancy', 'alcconsumption'])
print(tabulate(df.sort_index(axis=1, ascending=False), headers=['Var', 'Min', 'Max']))

data2 = data1.copy()
Var                  Min      Max
---------------  -------  -------
incomeperperson  103.776  952.827
lifeexpectancy    47.794   83.394
alcconsumption     0.05     9.99
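The maxima in this table (952.827 for income, 9.99 for alcohol) are lower than the upper categories used further down, which is what one would expect if the columns were still text when .min() and .max() were called, since text is compared character by character. That is my assumption about the cause, not something confirmed by the data. A minimal sketch with made-up values shows the difference:

```python
import pandas as pd

# Hypothetical income figures stored as text, as a CSV column containing
# blank cells would be read; these are NOT values from gapminder.csv
income_text = pd.Series(["103.776", "952.827", "9425.325", "15313.859", "39972.352"])

# Text is compared character by character, so "952.827" beats "39972.352"
text_min = income_text.min()
text_max = income_text.max()

# Converting to numbers first gives the real minimum and maximum
income_num = pd.to_numeric(income_text, errors="coerce")
numeric_min = income_num.min()
numeric_max = income_num.max()

print("as text:   ", text_min, text_max)
print("as numbers:", numeric_min, numeric_max)
```

Assigning the converted values back to the DataFrame (data1[dt] = pd.to_numeric(...)) before computing the minimum and maximum would avoid the problem.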
# Maps
incomeperperson_map = {1: '>=100  <5k', 2: '>=5k <10k', 3: '>=10k <20k',
                       4: '>=20K <30K', 5: '>=30K <40K', 6: '>=40K <50K'}
lifeexpectancy_map = {1: '>=40 <50', 2: '>=50 <60', 3: '>=60 <70',
                      4: '>=70 <80', 5: '>=80 <90'}
alcconsumption_map = {1: '>=0.5 <5', 2: '>=5 <10', 3: '>=10 <15',
                      4: '>=15 <20', 5: '>=20 <25'}

# absolute Frequency distributions
freq_life_n = data2.lifeexpectancy.value_counts(sort=False)
freq_income_n = data2.incomeperperson.value_counts(sort=False)
freq_alcohol_n = data2.alcconsumption.value_counts(sort=False)

# Relative Frequency distributions
freq_life_r = data2.lifeexpectancy.value_counts(sort=False, normalize=True)
freq_income_r = data2.incomeperperson.value_counts(sort=False, normalize=True)
freq_alcohol_r = data2.alcconsumption.value_counts(sort=False, normalize=True)

print('************************')
print('* Absolute Frequencies *')
print('************************')
print('\nlife variable (' + LIFE + '):')
print(tabulate([freq_life_n], tablefmt="fancy_grid", headers=(lifeexpectancy_map.values())))
print('\nincome variable (' + INCOME + '):')
print(tabulate([freq_income_n], tablefmt="fancy_grid", headers=(incomeperperson_map.values())))
print('\nalcohol variable (' + ALCOHOL + '):')
print(tabulate([freq_alcohol_n], tablefmt="fancy_grid", headers=(alcconsumption_map.values())))

print('\n************************')
print('* Relative Frequencies *')
print('************************')
print('\nlife variable (' + LIFE + '):')
print(tabulate([freq_life_r], tablefmt="fancy_grid", headers=(lifeexpectancy_map.values())))
print('\nincome variable (' + INCOME + '):')
print(tabulate([freq_income_r], tablefmt="fancy_grid", headers=(incomeperperson_map.values())))
print('\nalcohol variable (' + ALCOHOL + '):')
print(tabulate([freq_alcohol_r], tablefmt="fancy_grid", headers=(alcconsumption_map.values())))
************************
* Absolute Frequencies *
************************

life variable (2011 life expectancy at birth (years)):
╒════════════╤════════════╤════════════╤════════════╤════════════╕
│   >=40 <50 │   >=50 <60 │   >=60 <70 │   >=70 <80 │   >=80 <90 │
╞════════════╪════════════╪════════════╪════════════╪════════════╡
│          8 │         28 │         35 │         80 │         20 │
╘════════════╧════════════╧════════════╧════════════╧════════════╛

income variable (2010 Gross Domestic Product per capita in constant 2000 US$):
╒══════════════╤═════════════╤══════════════╤══════════════╤══════════════╤══════════════╕
│   >=100  <5k │   >=5k <10k │   >=10k <20k │   >=20K <30K │   >=30K <40K │   >=40K <50K │
╞══════════════╪═════════════╪══════════════╪══════════════╪══════════════╪══════════════╡
│          110 │          24 │           15 │           12 │            9 │            0 │
╘══════════════╧═════════════╧══════════════╧══════════════╧══════════════╧══════════════╛

alcohol variable (2008 alcohol consumption per adult (liters, age 15+)):
╒════════════╤═══════════╤════════════╤════════════╤════════════╕
│   >=0.5 <5 │   >=5 <10 │   >=10 <15 │   >=15 <20 │   >=20 <25 │
╞════════════╪═══════════╪════════════╪════════════╪════════════╡
│         63 │        56 │         31 │         10 │          1 │
╘════════════╧═══════════╧════════════╧════════════╧════════════╛

************************
* Relative Frequencies *
************************

life variable (2011 life expectancy at birth (years)):
╒════════════╤════════════╤════════════╤════════════╤════════════╕
│   >=40 <50 │   >=50 <60 │   >=60 <70 │   >=70 <80 │   >=80 <90 │
╞════════════╪════════════╪════════════╪════════════╪════════════╡
│  0.0467836 │   0.163743 │   0.204678 │   0.467836 │   0.116959 │
╘════════════╧════════════╧════════════╧════════════╧════════════╛

income variable (2010 Gross Domestic Product per capita in constant 2000 US$):
╒══════════════╤═════════════╤══════════════╤══════════════╤══════════════╤══════════════╕
│   >=100  <5k │   >=5k <10k │   >=10k <20k │   >=20K <30K │   >=30K <40K │   >=40K <50K │
╞══════════════╪═════════════╪══════════════╪══════════════╪══════════════╪══════════════╡
│     0.647059 │    0.141176 │    0.0882353 │    0.0705882 │    0.0529412 │            0 │
╘══════════════╧═════════════╧══════════════╧══════════════╧══════════════╧══════════════╛

alcohol variable (2008 alcohol consumption per adult (liters, age 15+)):
╒════════════╤═══════════╤════════════╤════════════╤════════════╕
│   >=0.5 <5 │   >=5 <10 │   >=10 <15 │   >=15 <20 │   >=20 <25 │
╞════════════╪═══════════╪════════════╪════════════╪════════════╡
│   0.391304 │  0.347826 │   0.192547 │  0.0621118 │ 0.00621118 │
╘════════════╧═══════════╧════════════╧════════════╧════════════╛
Explanation - 
I collapsed the responses for lifeexpectancy, incomeperperson, and alcconsumption to create three new variables: life, income, and alcohol. For life, the most common category was 4 (>=70 <80) with 46.78%, meaning that the largest group of countries has a life expectancy between 70 and 80 years. For income, the most common category was 1 (>=100 <5k) with 64.71%, meaning that more than half of the countries have an income per person between 100 and 5,000 dollars. For alcohol, the most common category was 1 (>=0.5 <5) with 39.13%, meaning that the largest group of countries consumes between 0.5 and 5 liters per adult.
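The collapsing step itself is not shown in the code above. One common way to do it in pandas is pd.cut; the sketch below is an assumption about how it could be done, not necessarily what I ran at the time. The life expectancy values are made up for illustration, and the bin edges are read off the labels in lifeexpectancy_map.

```python
import pandas as pd

# Hypothetical life expectancy values for illustration; NOT read from gapminder.csv
life = pd.Series([47.8, 54.7, 63.1, 73.4, 74.4, 76.1, 81.8, 83.4],
                 name="lifeexpectancy")

# Category labels as in lifeexpectancy_map from the post
lifeexpectancy_map = {1: '>=40 <50', 2: '>=50 <60', 3: '>=60 <70',
                      4: '>=70 <80', 5: '>=80 <90'}

# right=False makes every bin closed on the left and open on the right:
# [40, 50), [50, 60), ... which matches the ">=40 <50" style of the labels
life_group = pd.cut(life, bins=[40, 50, 60, 70, 80, 90],
                    labels=list(lifeexpectancy_map.values()), right=False)

# Absolute and relative frequencies per category, kept in bin order
freq_n = life_group.value_counts(sort=False)
freq_r = life_group.value_counts(sort=False, normalize=True)

print(freq_n)
print(freq_r)
```

The same pattern would apply to income and alcohol with their own bin edges; the columns need to be numeric first, otherwise pd.cut cannot place the values.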
ultrafahmina06things-blog · 5 years ago
Assignment-Week2
Below is my code:
# importing necessary libraries
import pandas as pd
import numpy
from collections import OrderedDict
from tabulate import tabulate, tabulate_formats

# Dictionaries
counts = OrderedDict()
prcnts = OrderedDict()

# Load from CSV
data1 = pd.read_csv('D:/CourseEra/gapminder.csv', skip_blank_lines=True,
                    usecols=['country', 'incomeperperson',
                             'alcconsumption', 'lifeexpectancy'])

# Counts missing entry (blank cells are stored as a single space)
missings = [['Var', 'Missings']]
for var in ('incomeperperson', 'alcconsumption', 'lifeexpectancy'):
    missings.append([var, data1[var].value_counts()[' ']])
print(tabulate(missings, headers="firstrow"))

# Count each variable (errors='coerce' turns unparsable values into NaN)
for dt in ('incomeperperson', 'alcconsumption', 'lifeexpectancy'):
    counts[dt] = pd.to_numeric(data1[dt], errors='coerce')
print(counts['incomeperperson'])
print(counts['alcconsumption'])
print(counts['lifeexpectancy'])

# each variable as percentage
for dt in ('incomeperperson', 'alcconsumption', 'lifeexpectancy'):
    prcnts[dt] = data1[dt].value_counts(sort=False, normalize=True)
print(prcnts['incomeperperson'])
print(prcnts['alcconsumption'])
print(prcnts['lifeexpectancy'])
The output and explanation
The output covers three variables from the dataset: incomeperperson, alcconsumption, and lifeexpectancy.
The following lines use the value_counts() function to show the number of missing entries for each variable:
# Counts missing entry
missings = [['Var', 'Missings']]
for var in ('incomeperperson', 'alcconsumption', 'lifeexpectancy'):
    missings.append([var, data1[var].value_counts()[' ']])
print(tabulate(missings, headers="firstrow"))
Var                Missings
---------------  ----------
incomeperperson          23
alcconsumption           26
lifeexpectancy           22
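Looking up ' ' in value_counts() works only while every column really contains a blank cell; if one does not, the lookup raises a KeyError. As an alternative sketch (the tiny CSV below is made up for illustration and is not the real gapminder.csv), the blanks can be converted to NaN and counted with isna():

```python
import io
import pandas as pd

# Hypothetical three-row CSV with blank (single space) cells; NOT the real gapminder.csv
csv_text = (
    "country,incomeperperson,alcconsumption,lifeexpectancy\n"
    "A,1200.5,3.2,70.1\n"
    "B, ,5.0, \n"
    "C, , ,65.4\n"
)
data = pd.read_csv(io.StringIO(csv_text))

missing = {}
for var in ("incomeperperson", "alcconsumption", "lifeexpectancy"):
    # Blank strings cannot be parsed as numbers, so errors="coerce" turns them into NaN
    numeric = pd.to_numeric(data[var], errors="coerce")
    missing[var] = int(numeric.isna().sum())

print(missing)
```

This version also reports 0 for a column without blanks instead of failing.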
The absolute frequency of each observed value of the variable incomeperperson:
Tumblr media
The absolute frequency of each observed value of the variable alcconsumption:
Tumblr media
The absolute frequency of each observed value of the variable lifeexpectancy:
Tumblr media
The frequency of each observed value of the variable incomeperperson, expressed as a percentage:
Tumblr media
The frequency of each observed value of the variable alcconsumption, expressed as a percentage:
Tumblr media
The frequency of each observed value of the variable lifeexpectancy, expressed as a percentage:
Tumblr media
ultrafahmina06things-blog · 5 years ago
Assignment1
Data Set:
After reviewing the five provided codebooks, I opted for the “portion of GapMinder” data set. My main interest in this data set is its global context, especially regarding health data.
First Topic:
After reviewing the dataset and codebook, I want to investigate the correlation between life expectancy and alcohol consumption.
The variables I want to use are:
incomeperperson, alcconsumption and lifeexpectancy
2nd Topic:
I also want to explore a possible correlation between socioeconomic status and alcohol consumption.
Research Question - Is there a direct relationship between alcohol consumption and life expectancy?
Literature Review:
I searched Google Scholar with the string “alcohol consumption life expectancy”. My original plan was an in-depth investigation of the chosen topics. Because time was short and most articles are available only through paid access, I instead selected a few papers and analyzed their abstracts and open sections (frequently “results” and “conclusions”). Two of these papers are summarized below.
Drinking Pattern and Mortality: The Italian Risk Factor and Life Expectancy Pooling Project [1]
The purpose of this article is to analyze the relationship between a particular aspect of drinking patterns and the risk of all-cause and cause-specific mortality. The results presented in this paper indicate that drinking patterns may have important health implications, with a direct impact on life expectancy.
Alcohol-related mortality by age and sex and its impact on life expectancy. Estimates based on the Finnish death register [2]
This study was conducted in Finland and is based on the “Finnish Death Register”, which includes information on both the underlying and contributory causes of death and yields an individual-level estimate of the contribution of alcohol to mortality. The data for 1987-1993 are used to examine alcohol-related mortality by cause of death. According to the results, 6% of all deaths were alcohol-related. These deaths were responsible for a 2-year loss in life expectancy at age 15 years among men and 0.4 years among women.
Hypotheses -
Primary hypothesis:
The level of alcohol consumption of a country might be directly related to its life expectancy.
Secondary hypothesis:
Socioeconomic status and income levels have a direct correlation with the level of alcohol consumption of a country.
ultrafahmina06things-blog · 5 years ago
dfd