#CO2emmision | Explore Tumblr Posts and Blogs

dheedataanalysis · 4 years

Data Management for the GapMinder Dataset

The data management steps taken in this study are described in this post.

Summary of Data Management Steps

For all the variables used in this research, missing data (stored as blank strings) are replaced by NaN.

Frequency tables are drawn to compare the datasets before and after the data management steps.

In each table, the NaN row takes over the frequency of the missing-data entries it replaced, and the frequencies of all other observations remain the same.

‘urbanrate’ is categorized into five bins of equal width (the ranges listed under Cell 10), and ‘incomeperperson’ is categorized using quartiles.

Other data management processes, such as creating secondary variables, are not used because the main variables in this study (‘incomeperperson’, ‘urbanrate’, and ‘relectricperperson’) are quantitative and have distinct observations.

Outline of Code

The code cells are explained step by step here.

Cell 1: Import the modules needed for Data Management

Cell 2: Load the GapMinder dataset into the programming environment as data, then create working datasets containing only the variables relevant to the study (data_new, data_nan).
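As a rough sketch of what Cell 2 might look like, the snippet below reads a tiny made-up CSV from memory in place of the real GapMinder file and builds the two working copies. The sample values, the `columns` list, and the idea that the real file would be read with something like `pd.read_csv("gapminder.csv")` are assumptions for illustration, not the original notebook code.

```python
import io
import pandas as pd

# Tiny stand-in for the GapMinder CSV; the values are made up.
# Missing observations are stored as a single blank string " ".
csv_text = """country,incomeperperson,urbanrate,employrate,co2emissions,relectricperperson
A,1000.5,24.0,55.1,120000," "
B," ",46.7,60.2,98000,500.3
C,8500.2," ",48.9," ",1200.8
"""

# With the real file this would be a path instead of an in-memory buffer
data = pd.read_csv(io.StringIO(csv_text))

# Keep only the variables relevant to the study
columns = ["incomeperperson", "urbanrate", "employrate",
           "co2emissions", "relectricperperson"]
data_new = data[columns].copy()  # left untouched, used for the "before" tables
data_nan = data[columns].copy()  # will receive the NaN replacements
```

Using `.copy()` keeps the two datasets independent, so changes made to data_nan do not leak back into data_new.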

Cell 3: Replacing the missing data “ “ in the ‘incomeperperson’ variable with np.nan.

The output of this cell shows that the 23 missing-data entries in the first dataframe are replaced by 23 NaN values in the second dataframe.
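The replacement step for a single variable can be sketched as follows. The five sample values are hypothetical; only the blank-string placeholder, np.nan, and value_counts with dropna=False come from the post.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: " " marks a missing observation, as in the raw file
data_new = pd.DataFrame(
    {"incomeperperson": ["1000.5", " ", "8500.2", " ", "320.7"]}
)
data_nan = data_new.copy()

# Replace the blank-string placeholder with NaN
data_nan["incomeperperson"] = data_nan["incomeperperson"].replace(" ", np.nan)

# Frequency tables before and after; dropna=False keeps NaN as its own row
before = data_new["incomeperperson"].value_counts(sort=False, dropna=False)
after = data_nan["incomeperperson"].value_counts(sort=False, dropna=False)

# The number of blanks before should equal the number of NaN values after
n_blank_before = (data_new["incomeperperson"] == " ").sum()
n_missing_after = data_nan["incomeperperson"].isna().sum()
print(n_blank_before, n_missing_after)
```

The same pattern applies to the other variables in Cells 4 to 7; only the column name changes.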

Cell 4: Replacing the missing data “ “ in the ‘urbanrate’ variable with np.nan.

The output shows that the 10 missing-data entries in the first dataframe are replaced by the same number of NaN values in the second dataframe.

Cell 5: Replacing the missing data “ “ in the ‘employrate’ variable with np.nan.

The output shows that the 35 missing-data entries in the first dataframe are replaced by the same number of NaN values in the second dataframe.

Cell 6: Replacing the missing data “ “ in the ‘co2emissions’ variable with np.nan.

The output shows that the 13 missing-data entries in the first dataframe are replaced by the same number of NaN values in the second dataframe.

Cell 7: Replacing the missing data “ “ in the ‘relectricperperson’ variable with np.nan.

The output shows that the 77 missing-data entries in the first dataframe are replaced by 77 NaN values in the second dataframe.

Cell 8: Converting the datasets into numeric values, so we can use the value_counts function on them.
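A minimal sketch of the conversion step, assuming the columns still hold strings after the NaN replacement. The sample values are made up, and applying pd.to_numeric column by column is one possible way to do it, not necessarily what the original cell did.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: numbers stored as strings, missing values already NaN
data_nan = pd.DataFrame({
    "incomeperperson": ["1000.5", np.nan, "8500.2"],
    "urbanrate": ["24.0", "46.7", np.nan],
})

# pd.to_numeric handles one column (Series) at a time, so apply it column-wise.
# errors="coerce" turns anything that cannot be parsed into NaN.
data_num = data_nan.apply(pd.to_numeric, errors="coerce")

print(data_num.dtypes)
```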

Cell 9: Categorizing the ‘incomeperperson’ observations into four classes using the quartiles function pd.qcut.

From the output, we can see that the low_income observations are 48, the low-middle 47, the high-middle 47, and the high income observations 48.
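The quartile split can be sketched with pd.qcut as below. The eight income values and the exact label spellings are assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical income values
income = pd.Series([100, 250, 400, 800, 1500, 3000, 9000, 20000],
                   name="incomeperperson")

# Assumed label names for the four quartile classes
labels = ["low_income", "low_middle", "high_middle", "high_income"]

# q=4 splits the observations at the 25th, 50th and 75th percentiles
income_group = pd.qcut(income, q=4, labels=labels)

# sort=False keeps the classes in quartile order instead of sorting by count
counts = income_group.value_counts(sort=False)
print(counts)
```

Because qcut splits on quantiles rather than on fixed values, each class ends up with roughly the same number of observations, which matches the near-equal counts reported above.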

Cell 10: Categorizing ‘urbanrate’ using bins of width 18, giving the ranges 10-28, 29-46, 47-64, 65-82, and 83-100.

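From the output, the frequency and percentage of observations in each range are shown in the dataframe. The ranges do not overlap because the categories are intervals that are open on the left and closed on the right, such as (10, 28], which is what pd.cut produces by default. That is why each category begins with an open bracket: a value that falls exactly on a shared edge is counted only in the lower bin.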
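A small sketch of the binning, assuming pd.cut is used with edges inferred from the ranges listed for Cell 10. The sample urbanrate values are made up.

```python
import pandas as pd

# Hypothetical urbanrate values, including some that sit exactly on an edge
urbanrate = pd.Series([10.4, 27.9, 28.0, 46.0, 46.1, 64.0, 82.5, 100.0],
                      name="urbanrate")

# Assumed bin edges: steps of 18 from 10 to 100
edges = [10, 28, 46, 64, 82, 100]

# By default pd.cut builds right-closed intervals: (10, 28], (28, 46], ...
urban_group = pd.cut(urbanrate, bins=edges)

# Frequency and percentage per bin, kept in bin order
counts = urban_group.value_counts(sort=False)
percent = urban_group.value_counts(sort=False, normalize=True) * 100

print(pd.DataFrame({"count": counts, "percent": percent}))
```

Note that with these defaults a value of exactly 10 would fall outside the first bin; passing include_lowest=True to pd.cut would include it.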

Adjustments Made

In the Frequency Distribution for the GapMinder Dataset post, I used the groupby function in Python to get the frequency of the observations under each variable. After replacing the missing data with NaN, I need Python not to drop the NaN values, which is done by passing dropna=False. The groupby function did not accept this argument before pandas version 1.1, so I decided to use the value_counts function.
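The difference can be seen on a small made-up column. The sketch below assumes pandas 1.1 or later for the groupby line; on older versions only the value_counts calls would work.

```python
import numpy as np
import pandas as pd

# Hypothetical column with two missing values
s = pd.Series([24.0, np.nan, 46.7, 24.0, np.nan], name="urbanrate")

# Default behaviour: NaN rows are silently left out of the table
without_nan = s.value_counts(sort=False)

# dropna=False keeps NaN as its own category
with_nan = s.value_counts(sort=False, dropna=False)

# groupby equivalent, available from pandas 1.1 onwards
grouped = s.groupby(s, dropna=False).size()

print(without_nan.sum(), with_nan.sum(), grouped.sum())
```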

For easy comparison of the datasets before and after the data management step, I need Python to display the frequency distribution dataframes side by side, so I displayed them using IPython’s display_html function.
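One way to get the side-by-side layout is to wrap each table's HTML in an inline-block container and hand the combined string to display_html. The helper name `side_by_side_html` and the sample tables are hypothetical; the original notebook may have built the HTML differently.

```python
import pandas as pd

def side_by_side_html(*frames: pd.DataFrame) -> str:
    """Return one HTML string that places the given dataframes next to each other."""
    blocks = [
        # inline-block lets the tables sit on the same line
        f'<div style="display:inline-block; vertical-align:top; margin-right:2em">'
        f"{frame.to_html()}</div>"
        for frame in frames
    ]
    return "".join(blocks)

# Hypothetical "before" and "after" frequency tables
before = pd.DataFrame({"count": [2, 3]}, index=["(blank)", "other"])
after = pd.DataFrame({"count": [2, 3]}, index=["NaN", "other"])

html = side_by_side_html(before, after)

# In a Jupyter notebook, render it with:
#   from IPython.display import display_html
#   display_html(html, raw=True)
```

Passing raw=True tells display_html that the argument is already an HTML string rather than a Python object to be formatted.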

pandas’ to_numeric function is used instead of convert_numeric. They do the same job, but convert_numeric does not work on dataframes.

#Data Management #Data Analysis #Cousera

nsanguphiri-blog · 6 years

Data Management & Visualization-week one

Introduction

This is the week one assignment for the Data Management and Visualization course. The objective was to select one of the five codebooks made available and come up with a research topic based on a specific dataset from the selected codebook.

Data set

Selected data set: Gapminder

After reviewing the Gapminder codebook provided, co2emissions (carbon dioxide emissions) particularly caught my interest, so I chose co2emissions as my specific topic of interest.

Specific research topic: While co2emissions is the starting point, the aim is to explore whether economic and social factors affect the rate of carbon dioxide emissions in cities. In particular, is there an association between urbanrate and co2emissions? To investigate this, I decided to include the following variables in my codebook:

urbanrate (urban population)

oilperperson (oil consumption)

incomeperperson (gross domestic product per capita)

Second research topic: While researching whether there is a correlation between urbanrate and co2emissions, I saw a possibility to explore, as a second research topic, the association between incomeperperson and urbanrate.

Literature Review:

Search terms used: “urbanrate co2emissions”, “incomeperperson urbanrate”, “oilperperson co2emmisions”.

[1] The Impact of Urbanization on Carbon Emission: Empirical Evidence in Beijing

In this article, the authors examine historical data for Beijing from 1980 to 2013. Their results show that, first, the urbanization level plays a positive role in promoting carbon emission in both the long and the short term during the sample period. Second, the impact of per capita energy consumption on carbon emission does not seem significant in the long or short term, which is attributed to the energy efficiency improvement and energy consumption structure adjustment of the past decades. Third, the growth of per capita GDP may curb carbon emission growth in the long term, although its impact in the short term does not appear statistically significant, and the authors find a significant inverted U-shaped relationship between per capita GDP and per capita carbon emission in Beijing during the sample period. Finally, there exists a significant negative adjusting mechanism from the short term towards the long term among these variables.

[2] The effect of energy and urbanization on carbon dioxide emission: evidence from Ghana

The authors consider the declining trend of renewable energy consumption, the change in the energy mix for electricity production, the growing urban population and carbon dioxide (CO2) emission in Ghana. They examine the effect of urbanization and energy on carbon dioxide emission in Ghana within the framework of the Environmental Kuznets Curve (EKC) hypothesis over the period 1971–2013. According to their findings, combustible renewables and waste consumption, electricity production from hydro, and trade openness reduce carbon dioxide emission, while fossil fuel consumption, electricity production from fossil fuels, urbanization and industrialization increase it. They also find that the interaction between urbanization and combustible renewables and waste consumption has a positive effect on CO2 emissions, while the interaction between urbanization and fossil fuel consumption has a negative effect.

[3] Relationship between Urbanization and CO2 Emissions Depends on Income Level and Policy

In this publication, the authors investigate empirically how national-level CO2 emissions are affected by urbanization and environmental policy while taking the income variable into consideration. According to their findings, the urbanization−emissions elasticity may depend on the strength of a country’s environmental policy, not just on marginal increases in income. With increasing urbanization and wealth, developing nations have been shown to pollute faster, and at lower income levels, than developed countries did. They add that urbanization’s impact on CO2 emissions is smaller in higher-income countries than in other countries, but with a positive elasticity for all income levels. In contrast, the authors find the urbanization−emissions elasticity to be negative and statistically significant only in specific cases (countries with higher income and strong environmental policy). Finally, their findings indicate that the relationship between urbanization and higher income is not in itself sufficient to foster a negative elasticity with carbon emissions; rather, strong environmental policy and its implementation are essential to reduce the environmental footprint of urbanization.

[4] An Exploration of Relationships between Urbanization and Per Capita Income: United States and Countries of the World

According to the authors’ analysis, data for the metropolitan areas of the United States from 1970 to 1990 indicate that per capita income increases directly with population size. They further state that data for the states of the United States and for 113 countries, for 1960 and 1980, show that a positive relationship exists and holds over time between the level of per capita Gross Domestic Product and the percentage of the population that is urban.

Primary research topic hypothesis:

Based on the articles in the literature review, urban population might be directly associated with the co2emissions of a country, in both higher-income and lower-income countries.

Secondary research topic hypothesis:

Gross Domestic Product per capita has a direct correlation with the level of urban population of a country.

References

[1] Zhang, Yue-Jun & Yi, Wei-Chen & Li, Bo-Wen. (2015). The Impact of Urbanization on Carbon Emission: Empirical Evidence in Beijing. Energy Procedia. 75. 2963-2968. 10.1016/j.egy

[2] Kwakwa, Paul & Alhassan, Hamdiyah. (2018). The effect of energy and urbanisation on carbon dioxide emission: evidence from Ghana. OPEC Energy Review. 10.1111/opec.12133.

[3] Ponce de Leon Barido, Diego & Marshall, Julian D. (2013). Relationship between Urbanization and CO2 Emissions Depends on Income Level and Policy.

[4] Jones, B. G. & Kone, S. (1996). An exploration of relationships between urbanization and per capita income: United States and countries of the world. Papers in Regional Science, 75(2):135-54.