gapminderdataanal-blog - Tumblr blog

gapminderdataanal-blog · 4 years


Assignment 4

Below is the code for Assignment 4:

# -*- coding: utf-8 -*-
"""
Spyder Editor

"""

import pandas
import numpy
import seaborn
import matplotlib.pyplot as plt

data = pandas.read_csv("gapminder.csv", low_memory=False)

# setting variables to numeric; entries that cannot be parsed (such as blanks) become NaN
data["polityscore"] = data["polityscore"].apply(pandas.to_numeric, errors='coerce')
data["incomeperperson"] = data["incomeperperson"].apply(pandas.to_numeric, errors='coerce')
data["femaleemployrate"] = data["femaleemployrate"].apply(pandas.to_numeric, errors='coerce')

# blanks are already NaN after the coercion above; this replace is kept as a safeguard
data['polityscore'] = data['polityscore'].replace(' ', numpy.nan)
data['incomeperperson'] = data['incomeperperson'].replace(' ', numpy.nan)
data['femaleemployrate'] = data['femaleemployrate'].replace(' ', numpy.nan)

desc1 = data['polityscore'].describe()
print(desc1)

desc2 = data['incomeperperson'].describe()
print(desc2)

desc3 = data['femaleemployrate'].describe()
print(desc3)

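# Polity score distribution graph
# (seaborn 0.11 and later deprecate distplot in favor of histplot)
seaborn.distplot(data["polityscore"].dropna(), kde=False)
plt.xlabel('Polity Score')
plt.title('Estimated Number of countries per Polity Score')
plt.show()  # draw this figure before starting the next one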

# Quartile Income distribution graph
print('Income per person - 4 categories - quartiles')
data['INCOMEGRP4'] = pandas.qcut(data.incomeperperson, 4, labels=["1=25th%tile", "2=50%tile", "3=75%tile", "4=100%tile"])
c1 = data['INCOMEGRP4'].value_counts(sort=False, dropna=True)
data["INCOMEGRP4"] = data["INCOMEGRP4"].astype('category')
seaborn.countplot(x="INCOMEGRP4", data=data)
plt.xlabel('Income categories')
plt.title('Number of countries in each income quartile')
plt.show()  # draw this figure before starting the next one

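# Female Employ rate distribution graph
# (seaborn 0.11 and later deprecate distplot in favor of histplot)
seaborn.distplot(data["femaleemployrate"].dropna(), kde=False)
plt.xlabel('Female Employ Rate %')
plt.title('Estimated Number of countries per Female Employ Rate')
plt.show()  # draw this figure before starting the next one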

# Scatter plot comparing polity score with income per person
scat1 = seaborn.regplot(x="polityscore", y="incomeperperson", fit_reg=True, data=data)
plt.xlabel('Polity Score')
plt.ylabel('Income Per Person')
plt.title('Scatterplot for the Association Between Polity Score and Income Per Person')
plt.show()  # draw this figure before starting the next one

# Scatter plot comparing polity score with female employ rate
scat2 = seaborn.regplot(x="polityscore", y="femaleemployrate", fit_reg=True, data=data)
plt.xlabel('Polity Score')
plt.ylabel('Female Employ Rate')
plt.title('Scatterplot for the Association Between Polity Score and Female Employ Rate')
plt.show()
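As a quick check of the cleaning step in the listing above, the following minimal sketch shows how pandas.to_numeric with errors='coerce' treats blank or non-numeric entries. The sample values are made up for illustration and are not taken from gapminder.csv.

```python
import pandas

# Hypothetical raw column: numbers stored as text, plus a blank and a stray word
raw = pandas.Series(["3.5", " ", "abc", "10"])

# Vectorized form of the conversion; entries that cannot be parsed become NaN
clean = pandas.to_numeric(raw, errors="coerce")

print(clean)
print("missing values:", clean.isna().sum())
```

Because the blanks are already NaN after this step, describe() and the plots simply skip them.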

Interpretation of Gapminder variables:

Polityscore variable interpretation. The following are the summary statistics of the variable:

count    161.000000
mean       3.689441
std        6.314899
min      -10.000000
25%       -2.000000
50%        6.000000
75%        9.000000
max       10.000000
Name: polityscore, dtype: float64

Polityscore measures the level of democracy in 161 countries, with the lowest level at -10 and the highest at +10. The mean is around 3.7 with a standard deviation of about 6.3, which is a wide spread on a scale that only runs from -10 to +10. The median of 6 and the 25th percentile of -2 also show that the majority of countries are more democratic than not. This is further substantiated by the following histogram of polityscore.
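The summary statistics above can be cross-checked against the polityscore frequency counts printed in the Assignment 2 post. A minimal sketch, assuming those counts are typed in by hand rather than read from gapminder.csv:

```python
import numpy
import pandas

# Polity score frequency counts as printed in the Assignment 2 output
# (the 52 missing values are left out)
scores = list(range(-10, 11))
counts = [2, 4, 2, 12, 3, 2, 6, 6, 5, 4, 6, 3, 3, 2, 4, 7, 10, 13, 19, 15, 33]

# Expand the table back into one row per country
polity = pandas.Series(numpy.repeat(scores, counts))

print(polity.describe())

# Share of countries with a non-negative score
share_democratic = (polity >= 0).mean()
print("share with polityscore >= 0:", round(share_democratic, 3))
```

Rebuilding the column this way gives the same count, mean and median as the describe() output, which is a cheap consistency check between the two assignments.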

Incomeperperson variable interpretation. The following are the summary statistics of the variable:

count       190.000000
mean       8740.966076
std       14262.809083
min         103.775857
25%         748.245151
50%        2553.496056
75%        9379.891166
max      105147.437700
Name: incomeperperson, dtype: float64

The incomeperperson variable shows huge variability across countries (the standard deviation of about 14,263 is larger than the mean of about 8,741), and little insight can be gained from the above description alone. In the following graph I divide income into four quartiles and show the number of countries in each quartile:

This shows that the 190 countries are evenly distributed across the four quartiles, which is expected because quartile cut points split the data into groups of equal size.
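The even split is a property of quartile binning itself and not of the income data. A minimal sketch, assuming synthetic right-skewed values in place of the real incomeperperson column (the labels Q1 to Q4 are also just placeholders):

```python
import numpy
import pandas

# Hypothetical right-skewed incomes for 190 countries (not the Gapminder values)
rng = numpy.random.default_rng(0)
income = pandas.Series(rng.lognormal(mean=8, sigma=1.5, size=190))

# qcut places the bin edges at the quartiles, so each bin gets about n/4 rows
groups = pandas.qcut(income, 4, labels=["Q1", "Q2", "Q3", "Q4"])
sizes = groups.value_counts(sort=False)
print(sizes)
```

However skewed the input is, the four group sizes differ by at most one, so the information in a quartile split lies in where the cut points fall and not in the bar heights.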

Femaleemployrate variable interpretation. The following are the summary statistics of the variable:

count    178.000000
mean      47.549438
std       14.625743
min       11.300000
25%       38.725000
50%       47.549999
75%       55.875000
max       83.300003
Name: femaleemployrate, dtype: float64

The above distribution shows that the spread around the mean of 47.5% is relatively small (a standard deviation of about 14.6 percentage points), as evidenced by the following graph:

Bi-variate Analysis:

The following scatter plot shows the relationship between the independent variable polityscore and the dependent variable incomeperperson. It shows a fairly strong relationship: per capita income rises with the level of democracy, so people in more democratic countries tend to have higher incomes than those in less democratic countries.

Finally, the following scatter plot shows the relationship between the independent variable polityscore and the dependent variable femaleemployrate. It shows that femaleemployrate is roughly the same across all levels of democracy.
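The strength of the two relationships described above could be quantified with a Pearson correlation coefficient. A minimal sketch, using made-up values for ten hypothetical countries rather than the Gapminder data, so the printed numbers say nothing about the real result:

```python
import pandas

# Made-up values for ten hypothetical countries, for illustration only
toy = pandas.DataFrame({
    "polityscore": [-10, -7, -4, -1, 0, 3, 6, 8, 9, 10],
    "incomeperperson": [9000, 1500, 800, 2500, 600, 1200, 3000, 5000, 20000, 35000],
    "femaleemployrate": [40, 52, 47, 55, 50, 45, 49, 48, 51, 46],
})

# Pearson correlation of polityscore with each outcome;
# values near 0 indicate a weak linear relationship
r_income = toy["polityscore"].corr(toy["incomeperperson"])
r_female = toy["polityscore"].corr(toy["femaleemployrate"])

print("r(polityscore, incomeperperson): ", round(r_income, 2))
print("r(polityscore, femaleemployrate):", round(r_female, 2))
```

Running the same two corr calls on the cleaned data frame from the listing would put a number on "fairly strong" and "roughly the same".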


gapminderdataanal-blog · 4 years


Assignment 3

Below is the code:

# -*- coding: utf-8 -*-
"""
Spyder Editor

"""

import pandas
import numpy
import os

data = pandas.read_csv("gapminder.csv", low_memory=False)

# polityscore analysis
print ("The polityscore ranging between -10 to 10 is a summary measure of a country's democratic and free nature.\n")
print ("-10 represents most authoritarian regimes and 10 represents mature democracies.\n")
data["polityscore"] = data["polityscore"].apply(pandas.to_numeric, errors='coerce')

ps = data["polityscore"].value_counts(dropna=False).reset_index()
ps.columns = ["polityscore", "Count"]
ps = ps.sort_values(by=["polityscore"])
#print (ps.to_string(index=False))

print ("In the following after removing the missing data I aggregate or bin polityscores (ps) into following categories for easy classification of countries.")

# Mature democracies if ps >= 9
md = ps[(ps["polityscore"] >= 9)]
print ("\nNumber of Mature Democracies if ps >= 9: \t\t\t", md["Count"].sum())
# Moderately mature democracies if ps is between 5 and 8
mmd = ps[(ps["polityscore"] >= 5) & (ps["polityscore"] <= 8)]
print ("Number of Moderately Mature Democracies if ps between 5 and 8: \t", mmd["Count"].sum())
# Immature or new democracies if ps is between 0 and 4
nd = ps[(ps["polityscore"] >= 0) & (ps["polityscore"] <= 4)]
print ("Number of Immature or New Democracies if ps between 0 and 4 \t", nd["Count"].sum())
# Slightly authoritarian regimes if ps is between -1 and -4
sd = ps[(ps["polityscore"] >= -4) & (ps["polityscore"] <= -1)]
print ("Number of Slightly Authoritarian Regimes if ps between -1 and -4", sd["Count"].sum())
# Moderately authoritarian regimes if ps is between -5 and -8
ar = ps[(ps["polityscore"] >= -8) & (ps["polityscore"] <= -5)]
print ("Number of Moderately Authoritarian Regimes: ps between -5 and -8", ar["Count"].sum())
# Strictly authoritarian if ps <= -9
sar = ps[(ps["polityscore"] <= -9)]
print ("Number of Strictly Authoritarian regimes if ps <= -9 \t\t", sar["Count"].sum())
print ("\nIn general number of democratic countries", md["Count"].sum()+mmd["Count"].sum()+nd["Count"].sum(), "is more than number of authoritarian regimes", sd["Count"].sum()+ar["Count"].sum()+sar["Count"].sum())

# incomeperperson analysis
print ("\nFor the above categories I compare the average incomeperperson for people.")
print ("The incomeperperson is 2010 Gross Domestic Product per capita in constant 2000 US$.\n")
data["incomeperperson"] = data["incomeperperson"].apply(pandas.to_numeric, errors='coerce')
md = data[(data["polityscore"] >= 9)]
print ("Average incomeperperson in Mature Democracies:\t\t\t$", int(md["incomeperperson"].mean()))
mmd = data[(data["polityscore"] >= 5) & (data["polityscore"] <= 8)]
print ("Average incomeperperson in Moderately Mature Democracies:\t$", int(mmd["incomeperperson"].mean()))
nd = data[(data["polityscore"] >= 0) & (data["polityscore"] <= 4)]
print ("Average incomeperperson in Immature or New Democracies:\t\t$", int(nd["incomeperperson"].mean()))
sd = data[(data["polityscore"] >= -4) & (data["polityscore"] <= -1)]
print ("Average incomeperperson in Slightly Authoritarian Regimes:\t$", int(sd["incomeperperson"].mean()))
ar = data[(data["polityscore"] >= -8) & (data["polityscore"] <= -5)]
print ("Average incomeperperson in Moderately Authoritarian Regimes:\t$", int(ar["incomeperperson"].mean()))
sar = data[(data["polityscore"] <= -9)]
print ("Average incomeperperson in Strictly Authoritarian Regimes:\t$", int(sar["incomeperperson"].mean()))
print ("There is a strong evidence from above analysis that people in Mature Democracies enjoy much higher incomeperperson than those from other forms of government.")

# femaleemployrate analysis
print ("\nFor the above categories I also compare the average femaleemployrate for people.")
print ("The femaleemployerate is the percentage of female population, age above 15, that has been employed during the given year.\n")
data["femaleemployrate"] = data["femaleemployrate"].apply(pandas.to_numeric, errors='coerce')
md = data[(data["polityscore"] >= 9)]
print ("Average femaleemployrate in Mature Democracies:\t\t\t", int(md["femaleemployrate"].mean()),"\b%")
mmd = data[(data["polityscore"] >= 5) & (data["polityscore"] <= 8)]
print ("Average femaleemployrate in Moderately Mature Democracies:\t", int(mmd["femaleemployrate"].mean()),"\b%")
nd = data[(data["polityscore"] >= 0) & (data["polityscore"] <= 4)]
print ("Average femaleemployrate in Immature or New Democracies:\t", int(nd["femaleemployrate"].mean()),"\b%")
sd = data[(data["polityscore"] >= -4) & (data["polityscore"] <= -1)]
print ("Average femaleemployrate in Slightly Authoritarian Regimes:\t", int(sd["femaleemployrate"].mean()),"\b%")
ar = data[(data["polityscore"] >= -8) & (data["polityscore"] <= -5)]
print ("Average femaleemployrate in Moderately Authoritarian Regimes:\t", int(ar["femaleemployrate"].mean()),"\b%")
sar = data[(data["polityscore"] <= -9)]
print ("Average femaleemployrate in Strictly Authoritarian Regimes:\t", int(sar["femaleemployrate"].mean()),"\b%")
print ("There is a slim evidence from above analysis that female employment is slightly higher in Democracies than in Authoritarian regimes.")

Below is the output of the above code:

The polityscore ranging between -10 to 10 is a summary measure of a country's democratic and free nature.

-10 represents most authoritarian regimes and 10 represents mature democracies.

In the following after removing the missing data I aggregate or bin polityscores (ps) into following categories for easy classification of countries.

Number of Mature Democracies if ps >= 9: 48
Number of Moderately Mature Democracies if ps between 5 and 8: 49
Number of Immature or New Democracies if ps between 0 and 4 18
Number of Slightly Authoritarian Regimes if ps between -1 and -4 21
Number of Moderately Authoritarian Regimes: ps between -5 and -8 19
Number of Strictly Authoritarian regimes if ps <= -9 6

In general number of democratic countries 115 is more than number of authoritarian regimes 46

For the above categories I compare the average incomeperperson for people.
The incomeperperson is 2010 Gross Domestic Product per capita in constant 2000 US$.

Average incomeperperson in Mature Democracies: $ 14528
Average incomeperperson in Moderately Mature Democracies: $ 2750
Average incomeperperson in Immature or New Democracies: $ 1179
Average incomeperperson in Slightly Authoritarian Regimes: $ 2662
Average incomeperperson in Moderately Authoritarian Regimes: $ 4888
Average incomeperperson in Strictly Authoritarian Regimes: $ 9636
There is a strong evidence from above analysis that people in Mature Democracies enjoy much higher incomeperperson than those from other forms of government.

For the above categories I also compare the average femaleemployrate for people.
The femaleemployerate is the percentage of female population, age above 15, that has been employed during the given year.

Average femaleemployrate in Mature Democracies: 47%
Average femaleemployrate in Moderately Mature Democracies: 48%
Average femaleemployrate in Immature or New Democracies: 51%
Average femaleemployrate in Slightly Authoritarian Regimes: 49%
Average femaleemployrate in Moderately Authoritarian Regimes: 45%
Average femaleemployrate in Strictly Authoritarian Regimes: 44%
There is a slim evidence from above analysis that female employment is slightly higher in Democracies than in Authoritarian regimes.
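The six-way binning in the code above repeats the same boolean masks for every variable. A possible alternative, sketched here with pandas.cut and groupby, assigns each polity score to a category once. It is not the code that produced the output above; the bin edges and the column name regime are choices made for this sketch, and the counts are the polityscore frequencies printed in the Assignment 2 post.

```python
import pandas

# Polity score frequency counts as printed in the Assignment 2 output
ps = pandas.DataFrame({
    "polityscore": list(range(-10, 11)),
    "Count": [2, 4, 2, 12, 3, 2, 6, 6, 5, 4, 6, 3, 3, 2, 4, 7, 10, 13, 19, 15, 33],
})

# Right-inclusive bin edges matching the six categories used above
edges = [-11, -9, -5, -1, 4, 8, 10]
labels = [
    "Strictly Authoritarian",
    "Moderately Authoritarian",
    "Slightly Authoritarian",
    "Immature or New Democracy",
    "Moderately Mature Democracy",
    "Mature Democracy",
]
ps["regime"] = pandas.cut(ps["polityscore"], bins=edges, labels=labels)

# One groupby replaces the six separate boolean masks
regime_counts = ps.groupby("regime", observed=False)["Count"].sum()
print(regime_counts)
```

With a regime column on the full data frame, the group averages for incomeperperson and femaleemployrate would each reduce to a single groupby followed by mean().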


gapminderdataanal-blog · 4 years


Assignment 2

Below is the code to generate the frequency count for three variables: polityscore, incomeperperson, femaleemployrate:

# -*- coding: utf-8 -*-
import pandas
import numpy
import os

os.chdir('../Kris Stuff/Online Courses/Data Analysis')
code_path = os.getcwd()
gapminderfile = code_path+"/gapminder.csv"

data = pandas.read_csv(gapminderfile, low_memory=False)

# polityscore frequency count
print ("Frequency count for polityscore.")
print ("--------------------------------")
print ("The polityscore is a summary measure of a country's democratic and free nature.")
print ("-10 is the lowest value, 10 the highest.")
print ("In the following the frequency count is sorted from lowest to highest polityscore.")
data["polityscore"] = data["polityscore"].apply(pandas.to_numeric, errors='coerce')
c1 = data["polityscore"].value_counts(dropna=False).reset_index()
c1.columns = ["polityscore", "Count"]
c1 = c1.sort_values(by=["polityscore"])
print (c1.to_string(index=False))

# incomeperperson frequency count
print ("\nFrequency count for incomeperperson.")
print ("------------------------------------")
print ("The incomeperperson is 2010 Gross Domestic Product per capita in constant 2000 US$.")
print ("In the following the frequency count is sorted from lowest to highest incomeperperson.")
data["incomeperperson"] = data["incomeperperson"].apply(pandas.to_numeric, errors='coerce')
c2 = data["incomeperperson"].value_counts(dropna=False).reset_index()
c2.columns = ["incomeperperson", "Count"]
c2 = c2.sort_values(by=["incomeperperson"])
print (c2.to_string(index=False))

# femaleemployrate frequency count
print ("\nFrequency count for femaleemployrate.")
print ("---------------------------------------")
print ("The femaleemployrate is the percentage of female population, age above 15, that has been employed during the given year.")
print ("In the following the frequency count is sorted from lowest to highest femaleemployrate.")
data["femaleemployrate"] = data["femaleemployrate"].apply(pandas.to_numeric, errors='coerce')
c3 = data["femaleemployrate"].value_counts(dropna=False).reset_index()
c3.columns = ["femaleemployrate", "Count"]
c3 = c3.sort_values(by=["femaleemployrate"])
print (c3.to_string(index=False))

Below is the output of the above code showing frequency counts for three variables: polityscore, incomeperperson, femaleemployrate:

Frequency count for polityscore.
------------------------------------------
The polityscore is a summary measure of a country's democratic and free nature.
-10 is the lowest value, 10 the highest.
In the following the frequency count is sorted from lowest to highest polityscore.
polityscore Count
-10.0 2
-9.0 4
-8.0 2
-7.0 12
-6.0 3
-5.0 2
-4.0 6
-3.0 6
-2.0 5
-1.0 4
0.0 6
1.0 3
2.0 3
3.0 2
4.0 4
5.0 7
6.0 10
7.0 13
8.0 19
9.0 15
10.0 33
NaN 52

Frequency count for incomeperperson.
---------------------------------------------------
The incomeperperson is 2010 Gross Domestic Product per capita in constant 2000 US$.
In the following the frequency count is sorted from lowest to highest incomeperperson.
incomeperperson Count
103.775857 1
115.305996 1
131.796207 1
155.033231 1
161.317137 1
180.083376 1
184.141797 1
220.891248 1
239.518749 1
242.677534 1
268.259450 1
268.331790 1
269.892881 1
275.884287 1
276.200413 1
279.180453 1
285.224449 1
320.771890 1
336.368750 1
338.266391 1
354.599726 1
358.979540 1
369.572954 1
371.424197 1
372.728414 1
377.039699 1
377.421113 1
389.763634 1
411.501447 1
432.226337 1
456.385712 1
468.696044 1
495.734247 1
523.950151 1
544.599477 1
554.879840 1
557.947513 1
558.062877 1
561.708585 1
591.067944 1
595.874534 1
609.131206 1
610.357367 1
668.547943 1
713.639303 1
722.807559 1
736.268054 1
744.239413 1
760.262365 1
772.933345 1
786.700098 1
895.318340 1
948.355952 1
952.827261 1
1036.830725 1
1143.831514 1
1144.102193 1
1194.711433 1
1200.652075 1
1232.794137 1
1253.292015 1
1258.762596 1
1295.742686 1
1324.194906 1
1326.741757 1
1381.004268 1
1383.401869 1
1392.411829 1
1525.780116 1
1543.956457 1
1621.177078 1
1714.942890 1
1728.020976 1
1784.071284 1
1810.230533 1
1844.351028 1
1860.753895 1
1914.996551 1
1959.844472 1
1975.551906 1
2025.282665 1
2062.125152 1
2146.358593 1
2161.546510 1
2183.344867 1
2221.185664 1
2222.335052 1
2230.676374 1
2231.993335 1
2344.896916 1
2425.471293 1
2437.282445 1
2481.718918 1
2534.000380 1
2549.558474 1
2557.433638 1
2636.787800 1
2667.246710 1
2668.020519 1
2712.517199 1
2737.670379 1
2923.144355 1
3164.927693 1
3180.430612 1
3233.423780 1
3545.652174 1
3665.348369 1
3745.649852 1
4038.857818 1
4049.169629 1
4180.765821 1
4189.436587 1
4495.046262 1
4699.411262 1
4885.046701 1
5011.219456 1
5182.143721 1
5184.709328 1
5188.900935 1
5248.582321 1
5330.401612 1
5332.238591 1
5348.597192 1
5528.363114 1
5634.003948 1
5900.616944 1
6105.280743 1
6147.779610 1
6238.537506 1
6243.571318 1
6334.105194 1
6338.494668 1
6575.745044 1
6746.612632 1
7381.312751 1
7885.468037 1
8445.526689 1
8614.120219 1
8654.536845 1
9106.327234 1
9175.796015 1
9243.587053 1
9425.325870 1
10480.817200 1
10749.419240 1
11066.784140 1
11191.811010 1
11744.834170 1
11894.464070 1
12505.212540 1
12729.454400 1
13577.879890 1
14778.163930 1
15313.859350 1
15461.758370 1
15822.112140 1
16372.499780 1
17092.460000 1
18982.269290 1
19630.540550 1
20751.893420 1
21087.394120 1
21943.339900 1
22275.751660 1
22878.466570 1
24496.048260 1
25249.986060 1
25306.187190 1
25575.352620 1
26551.844240 1
26692.984110 1
27110.731590 1
27595.091350 1
28033.489280 1
30532.277040 1
31993.200690 1
32292.482980 1
32535.832510 1
33923.313870 1
33931.832080 1
33945.314420 1
35536.072470 1
37491.179520 1
37662.751250 1
39309.478860 1
39972.352770 1
52301.587180 1
62682.147010 1
81647.100030 1
105147.437700 1
NaN 23

Frequency count for femaleemployrate.
----------------------------------------------------
The femaleemployrate is the percentage of female population, age above 15, that has been employed during the given year.
In the following the frequency count is sorted from lowest to highest femaleemployrate.
femaleemployrate Count
11.300000 1
12.400000 1
13.000000 1
16.700001 1
17.700001 1
18.200001 1
19.000000 1
20.299999 1
21.400000 1
21.900000 1
22.200001 1
22.299999 1
22.600000 1
23.200001 1
25.600000 1
26.799999 1
26.900000 1
27.900000 1
30.100000 1
30.200001 1
30.400000 1
31.700001 1
32.299999 1
34.200001 2
34.299999 1
34.400002 1
34.599998 1
34.900002 1
35.400002 1
35.500000 1
35.799999 1
36.000000 1
36.299999 1
36.500000 1
36.799999 1
37.299999 2
37.799999 1
37.900002 1
38.000000 1
38.099998 1
38.200001 1
38.299999 1
38.700001 1
38.799999 1
39.200001 1
39.400002 1
39.599998 3
39.900002 1
40.099998 1
40.299999 1
40.500000 1
41.099998 1
41.700001 2
41.799999 1
42.000000 1
42.099998 4
43.099998 1
43.400002 1
43.599998 2
43.700001 1
43.799999 1
44.000000 1
44.099998 1
44.799999 1
45.299999 2
45.500000 1
45.599998 1
45.799999 1
45.900002 2
46.000000 1
46.200001 1
46.400002 1
46.799999 2
47.099998 3
47.500000 1
47.599998 1
48.000000 1
48.400002 1
48.500000 1
48.599998 1
48.799999 2
49.000000 1
49.400002 2
49.700001 1
49.799999 1
49.900002 1
50.099998 1
50.400002 1
50.500000 1
50.599998 1
50.700001 2
50.900002 1
51.000000 1
51.299999 3
51.599998 1
51.700001 1
52.099998 1
52.299999 1
52.599998 2
53.099998 1
53.200001 1
53.299999 1
53.400002 1
53.500000 1
53.599998 1
53.799999 1
53.900002 1
54.299999 1
54.599998 2
54.700001 1
54.900002 1
55.500000 1
56.000000 2
56.200001 1
56.700001 1
56.900002 1
57.000000 1
57.500000 1
58.099998 2
58.200001 2
58.299999 1
58.900002 1
59.299999 1
59.799999 1
60.299999 1
60.700001 1
60.900002 1
61.599998 1
62.900002 1
63.400002 1
64.099998 1
65.000000 1
65.300003 1
65.699997 1
66.300003 1
66.500000 1
66.599998 1
67.599998 1
68.900002 2
69.000000 1
69.400002 1
69.599998 1
73.000000 1
73.400002 1
75.800003 1
76.099998 1
77.599998 1
78.099998 1
79.199997 1
80.000000 1
80.500000 1
82.199997 1
83.300003 1
NaN 35

Summary:

With the polityscore frequency distribution, if we ignore the large number (52) of missing values, we see that there are more democratic countries than undemocratic countries.

With the incomeperperson and femaleemployrate frequency distributions it is hard to glean any insight, as almost every value occurs only once.
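One common way around this is to group a continuous variable into bins before counting. A minimal sketch, assuming ten of the incomeperperson values from the output above and bin edges that are an arbitrary choice made for illustration:

```python
import pandas

# Ten of the incomeperperson values printed in the frequency output above
income = pandas.Series([
    103.775857, 523.950151, 1036.830725, 2534.000380, 5011.219456,
    9106.327234, 15313.859350, 25249.986060, 39972.352770, 105147.437700,
])

# Bin edges are an arbitrary choice made for this sketch
edges = [0, 1000, 5000, 20000, float("inf")]
binned = pandas.cut(income, bins=edges)

# Counting bins instead of raw values gives a readable frequency table
bin_counts = binned.value_counts(sort=False)
print(bin_counts)
```

The same cut call applied to the full column would replace 190 rows of count 1 with a handful of income bands.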


gapminderdataanal-blog · 5 years


Assignment 1

I'll be using the Gapminder data for my project.

Topic 1

Does type of government influence the wealth of people?

Variables: incomeperperson and polityscore

Literature Review

Does High Income Promote Democracy?
Author(s): John B. Londregan and Keith T. Poole
Source: World Politics, Vol. 49, No. 1 (Oct., 1996), pp. 1-30
Published by: Cambridge University Press
Stable URL: https://www.jstor.org/stable/25053987

Below is the summary of this article:

"In this analysis we examine the empirical regularity that countries with high incomes are more likely to enjoy democratic political institutions than their low-income counterparts. Some argue that economic development promotes democracy, while others claim this regularity is simply a chance by-product of the fact that countries with democratic political cultures industrialized first. Using techniques that correct for an array of measures of the institutional context, for idiosyncratic features of individual country histories, and for the potential simultaneity of the processes of leadership change and regime change, we test whether the democratizing effects of income are a mere by-product of failing to account for political and historical context.

We find that even after correcting for many features of the political and historical context, the democratizing effect of income remains as a significant factor promoting the emergence of democratic political institutions."

Based on this article my hypothesis is, "Wealthier people (incomeperperson) live in more democratic countries (polityscore)."

Topic 2

Does type of government influence female employment?

Variables: femaleemployrate and polityscore

Literature Review

Democracy and Female Labor Force Participation: An Empirical Examination
Author(s): Ghazal Bayanpourtehrani and Kevin Sylwester
Source: Social Indicators Research, Vol. 112, No. 3 (July 2013), pp. 749-762
Published by: Springer
Stable URL: https://www.jstor.org/stable/24719397

Below is the summary of this article:

“Our results provide for some interesting findings and implications. The first is that democracy is not associated with higher Female Labor Force Participation (FLFP). No where do we find strong evidence that FLFP is higher under democracy. To the extent that more countries become democratic in the future, our results give pause to views that such political changes will greatly increase women's participation in the formal marketplace. If anything, FLFP appears to be lower in democracies.”

Based on this article my hypothesis is, "Female employment rate (femaleemployrate) is high in less democratic countries (polityscore)."
