ultrafahmina06things-blog · 5 years ago

Week3-Assignment

Below is the full code -

# importing necessary libraries

import pandas as pd
import numpy as np
from collections import OrderedDict
from tabulate import tabulate, tabulate_formats

# Dictionaries

counts = OrderedDict()
prcnts = OrderedDict()

# Load from CSV

data1 = pd.read_csv('D:/CourseEra/gapminder.csv', skip_blank_lines=True, usecols=['country','incomeperperson', 'alcconsumption', 'lifeexpectancy'])

data1 = data1.replace(r'^\s*$', np.nan, regex=True)
data1 = data1.dropna(axis=0, how='any')

ALCOHOL = "2008 alcohol consumption per adult (liters, age 15+)"
INCOME = "2010 Gross Domestic Product per capita in constant 2000 US$"
LIFE = "2011 life expectancy at birth (years)"

for dt in ('incomeperperson', 'alcconsumption', 'lifeexpectancy'):
    # errors='coerce' turns entries that cannot be parsed into NaN
    counts[dt] = pd.to_numeric(data1[dt], errors='coerce')

# absolute Frequency distributions
freq_life_n = data1.lifeexpectancy.value_counts(sort=False)
freq_income_n = data1.incomeperperson.value_counts(sort=False)
freq_alcohol_n = data1.alcconsumption.value_counts(sort=False)

# Relative Frequency distributions
freq_life_r = data1.lifeexpectancy.value_counts(sort=False, normalize=True)
freq_income_r = data1.incomeperperson.value_counts(sort=False, normalize=True)
freq_alcohol_r = data1.alcconsumption.value_counts(sort=False, normalize=True)

print ('********************************************************')
print ('* Absolute Frequencies original variables (first 5) *')
print ('********************************************************')
print ('\nlife variable ('+LIFE+'):')
print ( tabulate([freq_life_n.head(5)], tablefmt="fancy_grid", headers=([i for i in freq_life_n.index])) )
print ('\nincome variable ('+INCOME+'):')
print ( tabulate([freq_income_n.head(5)], tablefmt="fancy_grid", headers=([i for i in freq_income_n.index])) )
print ('\nalcohol variable ('+ALCOHOL+'):')
# NOTE: this call passes freq_life_n, so the alcohol table repeats the life values; freq_alcohol_n was probably intended
print ( tabulate([freq_life_n.head(5)], tablefmt="fancy_grid", headers=([i for i in freq_life_n.index])) )

print ('\n********************************************************')
print ('* Relative Frequencies original variables (first 5) *')
print ('********************************************************')

print ('\nlife variable ('+LIFE+'):')
print ( tabulate([freq_life_r.head(5)], tablefmt="fancy_grid", headers=([i for i in freq_life_n.index])) )
print ('\nincome variable ('+INCOME+'):')
print ( tabulate([freq_income_r.head(5)], tablefmt="fancy_grid", headers=([i for i in freq_income_n.index])) )
print ('\nalcohol variable ('+ALCOHOL+'):')
# NOTE: this call passes freq_life_r and freq_life_n, so the alcohol table repeats the life values; freq_alcohol_r and freq_alcohol_n were probably intended
print ( tabulate([freq_life_r.head(5)], tablefmt="fancy_grid", headers=([i for i in freq_life_n.index])) )

********************************************************
* Absolute Frequencies original variables (first 5) *
********************************************************

life variable (2011 life expectancy at birth (years)):
╒══════════╤══════════╤══════════╤══════════╤══════════╕
│   74.402 │   73.373 │   54.675 │   81.804 │   76.126 │
╞══════════╪══════════╪══════════╪══════════╪══════════╡
│        1 │        1 │        1 │        1 │        1 │
╘══════════╧══════════╧══════════╧══════════╧══════════╛

income variable (2010 Gross Domestic Product per capita in constant 2000 US$):
╒════════════════════╤════════════════════╤════════════════════╤════════════════════╤════════════════════╕
│   9425.32586978275 │   15313.8593472276 │   389.763634253063 │   268.331790297681 │   11744.8341671737 │
╞════════════════════╪════════════════════╪════════════════════╪════════════════════╪════════════════════╡
│                  1 │                  1 │                  1 │                  1 │                  1 │
╘════════════════════╧════════════════════╧════════════════════╧════════════════════╧════════════════════╛

alcohol variable (2008 alcohol consumption per adult (liters, age 15+)):
╒══════════╤══════════╤══════════╤══════════╤══════════╕
│   74.402 │   73.373 │   54.675 │   81.804 │   76.126 │
╞══════════╪══════════╪══════════╪══════════╪══════════╡
│        1 │        1 │        1 │        1 │        1 │
╘══════════╧══════════╧══════════╧══════════╧══════════╛

********************************************************
* Relative Frequencies original variables (first 5) *
********************************************************

life variable (2011 life expectancy at birth (years)):
╒════════════╤════════════╤════════════╤════════════╤════════════╕
│     74.402 │     73.373 │     54.675 │     81.804 │     76.126 │
╞════════════╪════════════╪════════════╪════════════╪════════════╡
│ 0.00584795 │ 0.00584795 │ 0.00584795 │ 0.00584795 │ 0.00584795 │
╘════════════╧════════════╧════════════╧════════════╧════════════╛

income variable (2010 Gross Domestic Product per capita in constant 2000 US$):
╒════════════════════╤════════════════════╤════════════════════╤════════════════════╤════════════════════╕
│   9425.32586978275 │   15313.8593472276 │   389.763634253063 │   268.331790297681 │   11744.8341671737 │
╞════════════════════╪════════════════════╪════════════════════╪════════════════════╪════════════════════╡
│         0.00584795 │         0.00584795 │         0.00584795 │         0.00584795 │         0.00584795 │
╘════════════════════╧════════════════════╧════════════════════╧════════════════════╧════════════════════╛

alcohol variable (2008 alcohol consumption per adult (liters, age 15+)):
╒════════════╤════════════╤════════════╤════════════╤════════════╕
│     74.402 │     73.373 │     54.675 │     81.804 │     76.126 │
╞════════════╪════════════╪════════════╪════════════╪════════════╡
│ 0.00584795 │ 0.00584795 │ 0.00584795 │ 0.00584795 │ 0.00584795 │
╘════════════╧════════════╧════════════╧════════════╧════════════╛
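Every value in the tables above has a count of 1 and a relative frequency of 0.00584795, which is 1/171: continuous measurements almost never repeat, so a frequency table on the raw values is not informative. A minimal sketch of this behaviour of `value_counts`, using a small made-up Series (only five values, so each share is 1/5 instead of 1/171):

```python
import pandas as pd

# Hypothetical toy data: five distinct continuous values, standing in for a raw column
life = pd.Series([74.402, 73.373, 54.675, 81.804, 76.126])

# Absolute frequencies: every distinct value occurs exactly once
abs_freq = life.value_counts(sort=False)

# Relative frequencies: each count divided by the number of non-missing rows
rel_freq = life.value_counts(sort=False, normalize=True)

print(abs_freq.tolist())  # [1, 1, 1, 1, 1]
print(rel_freq.tolist())  # [0.2, 0.2, 0.2, 0.2, 0.2]
```

This is why the variables are grouped into ranges before the final frequency tables are produced.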

# Min and Max continuous variables:
min_max = OrderedDict()
dict1 = OrderedDict()

dict1['min'] = data1.lifeexpectancy.min()
dict1['max'] = data1.lifeexpectancy.max()
min_max['lifeexpectancy'] = dict1

dict2 = OrderedDict()
dict2['min'] = data1.incomeperperson.min()
dict2['max'] = data1.incomeperperson.max()
min_max['incomeperperson'] = dict2

dict3 = OrderedDict()
dict3['min'] = data1.alcconsumption.min()
dict3['max'] = data1.alcconsumption.max()
min_max['alcconsumption'] = dict3

df = pd.DataFrame([min_max['incomeperperson'], min_max['lifeexpectancy'], min_max['alcconsumption']],
                  index=['incomeperperson', 'lifeexpectancy', 'alcconsumption'])
print (tabulate(df.sort_index(axis=1, ascending=False), headers=['Var','Min','Max']))
data2 = data1.copy()

Var                  Min      Max
---------------  -------  -------
incomeperperson  103.776  952.827
lifeexpectancy    47.794   83.394
alcconsumption     0.05     9.99
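The income maximum in this table (952.827) is lower than the upper income ranges used later in the post, which go up to 40K. One possible explanation, which the post does not confirm, is that the column was still stored as text when `min()` and `max()` ran, so the values were compared character by character. A minimal sketch of that effect, with made-up values apart from the two taken from the table:

```python
import pandas as pd

# Hypothetical values stored as text, as can happen when a CSV column contains blank cells.
# "39972.35" and "8614.12" are invented for illustration.
income_text = pd.Series(["103.776", "952.827", "39972.35", "8614.12"], dtype=object)

# Text is compared character by character, so "952.827" sorts after "39972.35"
print(income_text.max())  # 952.827

# Converting to numbers first gives the numeric maximum;
# errors="coerce" turns entries that cannot be parsed into NaN
income_num = pd.to_numeric(income_text, errors="coerce")
print(income_num.max())   # 39972.35
```

If this is what happened, assigning the converted columns back to the DataFrame before calling `min()` and `max()` would change the reported range.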

# Maps
incomeperperson_map = {1: '>=100 <5k', 2: '>=5k <10k', 3: '>=10k <20k',
                       4: '>=20K <30K', 5: '>=30K <40K', 6: '>=40K <50K' }
lifeexpectancy_map = {1: '>=40 <50', 2: '>=50 <60', 3: '>=60 <70', 4: '>=70 <80', 5: '>=80 <90'}
alcconsumption_map = {1: '>=0.5 <5', 2: '>=5 <10', 3: '>=10 <15', 4: '>=15 <20', 5: '>=20 <25'}

# absolute Frequency distributions

freq_life_n = data2.lifeexpectancy.value_counts(sort=False)
freq_income_n = data2.incomeperperson.value_counts(sort=False)
freq_alcohol_n = data2.alcconsumption.value_counts(sort=False)

# Relative Frequency distributions

freq_life_r = data2.lifeexpectancy.value_counts(sort=False, normalize=True)
freq_income_r = data2.incomeperperson.value_counts(sort=False, normalize=True)
freq_alcohol_r = data2.alcconsumption.value_counts(sort=False, normalize=True)

print ('************************')
print ('* Absolute Frequencies *')
print ('************************')
print ('\nlife variable ('+LIFE+'):')
print( tabulate([freq_life_n], tablefmt="fancy_grid", headers=(lifeexpectancy_map.values())))
print ('\nincome variable ('+INCOME+'):')
print( tabulate([freq_income_n], tablefmt="fancy_grid", headers=(incomeperperson_map.values())))
print ('\nalcohol variable ('+ALCOHOL+'):')
print( tabulate([freq_alcohol_n], tablefmt="fancy_grid", headers=(alcconsumption_map.values())))

print ('\n************************')
print ('* Relative Frequencies *')
print ('************************')
print ('\nlife variable ('+LIFE+'):')
print( tabulate([freq_life_r], tablefmt="fancy_grid", headers=(lifeexpectancy_map.values())))
print ('\nincome variable ('+INCOME+'):')
print( tabulate([freq_income_r], tablefmt="fancy_grid", headers=(incomeperperson_map.values())))
print ('\nalcohol variable ('+ALCOHOL+'):')
print( tabulate([freq_alcohol_r], tablefmt="fancy_grid", headers=(alcconsumption_map.values())))

************************
* Absolute Frequencies *
************************

life variable (2011 life expectancy at birth (years)):
╒════════════╤════════════╤════════════╤════════════╤════════════╕
│   >=40 <50 │   >=50 <60 │   >=60 <70 │   >=70 <80 │   >=80 <90 │
╞════════════╪════════════╪════════════╪════════════╪════════════╡
│          8 │         28 │         35 │         80 │         20 │
╘════════════╧════════════╧════════════╧════════════╧════════════╛

income variable (2010 Gross Domestic Product per capita in constant 2000 US$):
╒══════════════╤═════════════╤══════════════╤══════════════╤══════════════╤══════════════╕
│    >=100 <5k │   >=5k <10k │   >=10k <20k │   >=20K <30K │   >=30K <40K │   >=40K <50K │
╞══════════════╪═════════════╪══════════════╪══════════════╪══════════════╪══════════════╡
│          110 │          24 │           15 │           12 │            9 │            0 │
╘══════════════╧═════════════╧══════════════╧══════════════╧══════════════╧══════════════╛

alcohol variable (2008 alcohol consumption per adult (liters, age 15+)):
╒════════════╤═══════════╤════════════╤════════════╤════════════╕
│   >=0.5 <5 │   >=5 <10 │   >=10 <15 │   >=15 <20 │   >=20 <25 │
╞════════════╪═══════════╪════════════╪════════════╪════════════╡
│         63 │        56 │         31 │         10 │          1 │
╘════════════╧═══════════╧════════════╧════════════╧════════════╛

************************
* Relative Frequencies *
************************

life variable (2011 life expectancy at birth (years)):
╒════════════╤════════════╤════════════╤════════════╤════════════╕
│   >=40 <50 │   >=50 <60 │   >=60 <70 │   >=70 <80 │   >=80 <90 │
╞════════════╪════════════╪════════════╪════════════╪════════════╡
│  0.0467836 │   0.163743 │   0.204678 │   0.467836 │   0.116959 │
╘════════════╧════════════╧════════════╧════════════╧════════════╛

income variable (2010 Gross Domestic Product per capita in constant 2000 US$):
╒══════════════╤═════════════╤══════════════╤══════════════╤══════════════╤══════════════╕
│    >=100 <5k │   >=5k <10k │   >=10k <20k │   >=20K <30K │   >=30K <40K │   >=40K <50K │
╞══════════════╪═════════════╪══════════════╪══════════════╪══════════════╪══════════════╡
│     0.647059 │    0.141176 │    0.0882353 │    0.0705882 │    0.0529412 │            0 │
╘══════════════╧═════════════╧══════════════╧══════════════╧══════════════╧══════════════╛

alcohol variable (2008 alcohol consumption per adult (liters, age 15+)):
╒════════════╤═══════════╤════════════╤════════════╤════════════╕
│   >=0.5 <5 │   >=5 <10 │   >=10 <15 │   >=15 <20 │   >=20 <25 │
╞════════════╪═══════════╪════════════╪════════════╪════════════╡
│   0.391304 │  0.347826 │   0.192547 │  0.0621118 │ 0.00621118 │
╘════════════╧═══════════╧════════════╧════════════╧════════════╛
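The post does not show the step that collapses the raw values into the ranges counted above. A minimal sketch of one way to do it, assuming `pd.cut` with bin edges inferred from the labels in `lifeexpectancy_map`. The sample values and the names `edges`, `life_binned`, `life_counts`, and `life_shares` are illustrative and are not the author's code:

```python
import pandas as pd

# Hypothetical life-expectancy values; the real column comes from gapminder.csv
life = pd.Series([47.794, 54.675, 62.1, 73.373, 74.402, 76.126, 81.804, 83.394])

# Labels copied from lifeexpectancy_map in the post; the edges are inferred from those labels
labels = ['>=40 <50', '>=50 <60', '>=60 <70', '>=70 <80', '>=80 <90']
edges = [40, 50, 60, 70, 80, 90]

# right=False closes each bin on the left, matching the ">= lower, < upper" labels
life_binned = pd.cut(life, bins=edges, labels=labels, right=False)

# On a categorical result, sort=False keeps the categories in bin order
life_counts = life_binned.value_counts(sort=False)
life_shares = life_binned.value_counts(sort=False, normalize=True)

print(life_counts.tolist())  # [1, 1, 1, 3, 2]
```

The same pattern would apply to income and alcohol consumption, each with its own edges and labels.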

Explanation -

I collapsed the responses for lifeexpectancy, incomeperperson, and alcconsumption into ranges to create three new variables: life, income, and alcohol. For life, the most common category was 4 (>=70 <80), with 46.78% of countries, meaning that the largest group of countries has a life expectancy between 70 and 80 years. For income, the most common category was 1 (>=100 <5k), with 64.7% of countries, meaning that more than half of the countries have an income per person between 100 and 5,000 dollars. For alcohol, the most common category was 1 (>=0.5 <5), with 39.13% of countries (63 of 161), meaning that the largest group of countries consumes between 0.5 and 5 liters per adult.
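The percentages in the explanation can be recomputed from the absolute frequencies in the tables. A small sketch in plain Python; the helper name `modal_share` is hypothetical and only used for this check:

```python
# Absolute frequencies copied from the tables in this post
life_counts = {'>=40 <50': 8, '>=50 <60': 28, '>=60 <70': 35, '>=70 <80': 80, '>=80 <90': 20}
income_counts = {'>=100 <5k': 110, '>=5k <10k': 24, '>=10k <20k': 15,
                 '>=20K <30K': 12, '>=30K <40K': 9, '>=40K <50K': 0}
alcohol_counts = {'>=0.5 <5': 63, '>=5 <10': 56, '>=10 <15': 31, '>=15 <20': 10, '>=20 <25': 1}

def modal_share(counts):
    """Return the most frequent category and its share of the total, in percent."""
    total = sum(counts.values())
    label = max(counts, key=counts.get)
    return label, round(100 * counts[label] / total, 2)

print(modal_share(life_counts))     # ('>=70 <80', 46.78)
print(modal_share(income_counts))   # ('>=100 <5k', 64.71)
print(modal_share(alcohol_counts))  # ('>=0.5 <5', 39.13)
```

Note that the totals differ between variables (171, 170, and 161 countries), so each share uses its own denominator.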
