Graphs for Gapminder Dataset
PROGRAM
LIBNAME mydata "/courses/d1406ae5ba27fe300 " access=readonly;
/*Loading Gapminder Dataset*/
DATA new;
SET mydata.gapminder;

/*Labeling Variables*/
LABEL relectricperperson="Electricity Per Person (kWh/person/year)"
      oilperperson="Oil Per Person (tonnes/person/year)"
      urbanrate="Urban Rate (% of total)"
      lifeexpectancy="Life Expectancy (years)"
      co2emissions="CO2 Emissions (metric tonnes)";

/*Declare character lengths first; otherwise the first assignment ("LOW") fixes the length at 3 and "MEDIUM" and "HIGH" are truncated*/
LENGTH HealthStandard $ 6 PollutionLevel $ 6;

/*Data management*/
/*Coding out missing data and coding in valid data*/
IF relectricperperson=0 THEN relectricperperson=.;
IF lifeexpectancy=. THEN lifeexpectancy=50;
IF urbanrate=. THEN urbanrate=50;

/*Grouping variables*/
/*Life expectancy groups*/
IF lifeexpectancy LT 50 THEN HealthStandard="LOW";
ELSE IF lifeexpectancy GE 50 AND lifeexpectancy LT 75 THEN HealthStandard="MEDIUM";
ELSE IF lifeexpectancy GE 75 THEN HealthStandard="HIGH";

/*Urban rate groups, coded in reverse: 1 = most urban (75 and above), 4 = least urban (below 25)*/
IF urbanrate GE 0 AND urbanrate LT 25 THEN UrbanizationLevel=4;
ELSE IF urbanrate GE 25 AND urbanrate LT 50 THEN UrbanizationLevel=3;
ELSE IF urbanrate GE 50 AND urbanrate LT 75 THEN UrbanizationLevel=2;
ELSE IF urbanrate GE 75 THEN UrbanizationLevel=1;

/*CO2 emission groups; a missing co2emissions compares as less than any number in SAS, so it is coded "LOW" here*/
IF co2emissions LT 2500000000 THEN PollutionLevel="LOW";
ELSE IF co2emissions GE 2500000000 AND co2emissions LT 5000000000 THEN PollutionLevel="MEDIUM";
ELSE IF co2emissions GE 5000000000 THEN PollutionLevel="HIGH";
RUN;
/*Frequency table for three grouping variables*/
PROC FREQ;
TABLE HealthStandard UrbanizationLevel PollutionLevel;
RUN;

/*Procedure to obtain brief summary of variables in dataset*/
PROC MEANS;
RUN;

/*Univariate graphs*/
PROC GCHART;
VBAR HealthStandard/DISCRETE TYPE=PCT WIDTH=30;
RUN;

PROC GCHART;
VBAR UrbanizationLevel/DISCRETE TYPE=PCT WIDTH=30;
RUN;

PROC GCHART;
VBAR PollutionLevel/DISCRETE TYPE=PCT WIDTH=30;
RUN;

/*Bivariate graphs*/
PROC GCHART;
VBAR HealthStandard/DISCRETE TYPE=MEAN SUMVAR=UrbanizationLevel;
RUN;

PROC GPLOT;
PLOT relectricperperson*urbanrate;
RUN;

PROC GPLOT;
PLOT co2emissions*oilperperson;
RUN;

PROC GPLOT;
PLOT co2emissions*relectricperperson;
RUN;

PROC GPLOT;
PLOT co2emissions*urbanrate;
RUN;

PROC GPLOT;
PLOT co2emissions*lifeexpectancy;
RUN;
OBSERVATIONS
UNIVARIATE GRAPHS
1. Health standard percentage graph
As observed, the plot is unimodal: more than 60% of countries have a life expectancy of 50 to 75 years. The graph is left-skewed, reflecting that very few countries have a life expectancy below 50 years.
2. Urbanization level percentage graph
The graph shows a unimodal percentage distribution, with level 2 having the highest percentage, and it is skewed to the right. The program codes the levels in reverse, so level 2 corresponds to an urban rate of 50 to 75% and the right-hand tail (level 4) to an urban rate below 25%. The skew therefore implies that only a few countries have an urban rate below 25%.
3. Pollution level percentage graph
The graph shows that countries with low CO2 emission levels make up the largest percentage, and the percentage drops sharply as CO2 levels increase.
BIVARIATE GRAPHS
1. Health standard VS Urbanization Levels
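The graph for the two categorical variables, health standard and urbanization level, shows that as the urbanization level number increases, health standard decreases.
Because the program codes UrbanizationLevel in reverse (1 is the most urban group, 4 the least), this means that countries with lower urban rates tend to have lower life expectancy, and more urbanized countries tend to have higher life expectancy.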
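The program codes UrbanizationLevel so that 1 is the most urban group (75 and above) and 4 the least urban (below 25), which makes a mean-by-group chart easy to misread. Below is a minimal sketch of an ascending coding, where a taller bar means a more urban group. The names urbanasc and UrbanLevelAsc are hypothetical, and the mydata libref is assumed to be assigned as in the program above.

```sas
/* Sketch only: urbanasc and UrbanLevelAsc are hypothetical names */
DATA urbanasc;
  SET mydata.gapminder;
  LENGTH HealthStandard $ 6;

  /* Same imputation as the program above */
  IF lifeexpectancy = . THEN lifeexpectancy = 50;
  IF urbanrate = . THEN urbanrate = 50;

  /* Same life expectancy groups as the program above */
  IF lifeexpectancy LT 50 THEN HealthStandard = "LOW";
  ELSE IF lifeexpectancy LT 75 THEN HealthStandard = "MEDIUM";
  ELSE HealthStandard = "HIGH";

  /* Ascending coding: 1 = least urban, 4 = most urban */
  IF urbanrate LT 25 THEN UrbanLevelAsc = 1;
  ELSE IF urbanrate LT 50 THEN UrbanLevelAsc = 2;
  ELSE IF urbanrate LT 75 THEN UrbanLevelAsc = 3;
  ELSE UrbanLevelAsc = 4;
RUN;

/* Mean urbanization level within each health standard group */
PROC GCHART DATA=urbanasc;
  VBAR HealthStandard / TYPE=MEAN SUMVAR=UrbanLevelAsc;
RUN;
QUIT;
```

With this coding, the direction of the bars matches the direction of urban rate, so no mental reversal is needed when reading the chart.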
2. Urban Rate vs Electricity per capita
The graph clearly shows that as urban rate increases, electricity per person also increases.
3. CO2 Emission vs Oil per person graph
The graph of these two continuous variables shows no apparent relationship between oil consumption per person and CO2 emissions.
4. CO2 Emission vs electricity per capita graph
As electricity per capita increases, CO2 emissions increase, since electricity is mostly produced from non-renewable energy sources such as coal and oil.
5. CO2 Emission vs Urban Rate
As urban rate increases, CO2 emissions increase too, since urbanization and higher living standards come at the cost of energy resources.
6. CO2 Emission vs Life Expectancy
The graph clearly shows that as life expectancy increases, CO2 emissions also increase, since some of the countries with better health standards are developed countries with high energy use.
SUMMARY
One of the graphs clearly shows that electricity usage is greater in more urbanized countries.
Also, the last two graphs show that CO2 emissions increase as both urban rate and life expectancy increase.
So we can conclude that as people's living standards rise, we tend to use more resources, which leads to higher CO2 emissions.
My Data Management Decisions on GAPMINDER Dataset
PROGRAM
LIBNAME mydata "/courses/d1406ae5ba27fe300 " access=readonly;
/*Loading Gapminder Dataset*/
DATA new;
SET mydata.gapminder;

/*Labeling Variables*/
LABEL relectricperperson="Electricity Per Person"
      oilperperson="Oil Per Person"
      urbanrate="Urban Rate"
      lifeexpectancy="Life Expectancy"
      co2emissions="CO2 Emissions";

/*Declare character lengths first; otherwise the first assignment ("LOW") fixes the length at 3 and "MEDIUM" and "HIGH" are truncated*/
LENGTH HealthStandard $ 6 PollutionLevel $ 6;

/*Data management*/
/*Coding out missing data and coding in valid data*/
IF relectricperperson=0 THEN relectricperperson=.;
IF lifeexpectancy=. THEN lifeexpectancy=50;
IF urbanrate=. THEN urbanrate=50;

/*Grouping variables*/
/*Life expectancy groups*/
IF lifeexpectancy LT 50 THEN HealthStandard="LOW";
ELSE IF lifeexpectancy GE 50 AND lifeexpectancy LT 75 THEN HealthStandard="MEDIUM";
ELSE IF lifeexpectancy GE 75 THEN HealthStandard="HIGH";

/*Urban rate groups, coded in reverse: 1 = most urban (75 and above), 4 = least urban (below 25)*/
IF urbanrate GE 0 AND urbanrate LT 25 THEN UrbanizationLevel=4;
ELSE IF urbanrate GE 25 AND urbanrate LT 50 THEN UrbanizationLevel=3;
ELSE IF urbanrate GE 50 AND urbanrate LT 75 THEN UrbanizationLevel=2;
ELSE IF urbanrate GE 75 THEN UrbanizationLevel=1;

/*CO2 emission groups; a missing co2emissions compares as less than any number in SAS, so it is coded "LOW" here*/
IF co2emissions LT 2500000000 THEN PollutionLevel="LOW";
ELSE IF co2emissions GE 2500000000 AND co2emissions LT 5000000000 THEN PollutionLevel="MEDIUM";
ELSE IF co2emissions GE 5000000000 THEN PollutionLevel="HIGH";
RUN;

/*Sorting result by country*/
PROC SORT;
BY country;
RUN;

/*Frequency table for three grouping variables*/
PROC FREQ;
TABLE HealthStandard UrbanizationLevel PollutionLevel;
RUN;

/*Procedure to obtain brief summary of variables in dataset*/
PROC MEANS;
RUN;
RESULTS
Process
To begin with, I have treated electricity per person as missing data if the value is 0.
For life expectancy and urban rate, I have replaced missing values with 50, which I consider logical after looking at all the values.
I have also created 3 grouping variables, for life expectancy (cut points 50 and 75), urban rate (25, 50 and 75) and CO2 emissions (2500000000 and 5000000000).
I could not create a secondary variable for now, since the variables I selected are not interlinked.
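The grouping step described above can also be sketched with a user-defined format instead of IF/ELSE chains. This is one possible alternative, not the method used in the program. The format name lifegrp and the dataset name grouped are hypothetical, and the sketch labels missing life expectancy as "MISSING" rather than recoding it to 50.

```sas
/* Sketch only: lifegrp and grouped are hypothetical names */
PROC FORMAT;
  VALUE lifegrp
    low -< 50  = "LOW"      /* below 50; LOW does not include missing */
    50  -< 75  = "MEDIUM"   /* 50 up to, but not including, 75 */
    75  - high = "HIGH"     /* 75 and above */
    .          = "MISSING"; /* assumption: keep missing values visible */
RUN;

DATA grouped;
  SET mydata.gapminder;
  /* Length 7 so that "MISSING" is not truncated */
  LENGTH HealthStandard $ 7;
  HealthStandard = PUT(lifeexpectancy, lifegrp.);
RUN;

PROC FREQ DATA=grouped;
  TABLES HealthStandard;
RUN;
```

Keeping the cut points in one VALUE statement makes them easier to change later, and the same approach would apply to urban rate and CO2 emissions.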
Observations
Taking life expectancy as a direct measure of health standard, most countries have a medium life expectancy (50-75), with some outliers in the other categories.
Dividing urban rate into 4 levels, more than 65% of countries have an urban rate between 25% and 75%.
Grouping pollution levels by CO2 emissions, we find that more than 80% of countries have low pollution levels, and only 15-20% have higher pollution levels.
When analyzing the countries with higher pollution levels, all of them were developed or developing countries!
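One caveat when reading the pollution percentages: in SAS a numeric missing value compares as smaller than any number, so a test such as co2emissions LT 2500000000 is also true for countries with no CO2 data. The sketch below shows one way to keep those rows separate. The dataset name demo and the four input values are made up purely for illustration and are not taken from the Gapminder data.

```sas
/* Sketch only: demo and its four values are made up for illustration */
DATA demo;
  INPUT co2emissions;
  LENGTH PollutionLevel $ 7;
  /* Test for missing first, so it does not fall into the LT branch */
  IF co2emissions = . THEN PollutionLevel = "MISSING";
  ELSE IF co2emissions LT 2500000000 THEN PollutionLevel = "LOW";
  ELSE IF co2emissions LT 5000000000 THEN PollutionLevel = "MEDIUM";
  ELSE PollutionLevel = "HIGH";
  DATALINES;
.
1000000
3000000000
6000000000
;
RUN;

PROC PRINT DATA=demo;
RUN;
```

Without the first IF, the row with the missing value would be labeled "LOW", which would inflate the share of low-pollution countries.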
My First SAS Program
PROGRAM
LIBNAME mydata "/courses/d1406ae5ba27fe300 " access=readonly;
/*Loading Gapminder Dataset*/
DATA new;
SET mydata.gapminder;
RUN;

/*Sorting result by country*/
PROC SORT;
BY country;
RUN;

/*Frequency table for three variables*/
PROC FREQ;
TABLE hivrate polityscore urbanrate;
RUN;

/*Procedure to obtain brief summary of variables in dataset*/
PROC MEANS;
RUN;
OUTPUT
1. Frequency Distribution of HIVRATE
2. Lower half of frequency Distribution of HIVRATE
3. Frequency Distribution of POLITYSCORE
4. Frequency Distribution of UrbanRate
5. Lower end of frequency Distribution of UrbanRate
6. Summary of Dataset
SUMMARY
1. The first image shows the frequency distribution of HIV rate (HIV-affected people per 100 persons); many countries have an HIV rate below 1. The second image shows the lower half of the distribution, where many countries have unique rates, the highest being 25.9. HIV rate is missing for 66 countries.
2. The third image shows the frequency distribution of polity score (a measure of a country's democratic and free nature, from -10 to +10). The frequencies are fairly spread out, with more countries having a positive polity score. Polity score is missing for 52 countries.
3. The fourth image shows urban rate, which is a continuous variable. The frequency distribution shows that 6 countries have an urban rate of 100. Urban rate is missing for 10 countries.
4. The sixth image shows the PROC MEANS output for the dataset, which gives the minimum and maximum values of all variables; the count of missing values can be verified from the N column.
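Rather than inferring missing counts from the N column, PROC MEANS can print them directly with the NMISS statistic. A minimal sketch, assuming the three variables are stored as numeric in mydata.gapminder and that the mydata libref is assigned as in the program above:

```sas
/* N = non-missing count, NMISS = missing count, for each variable */
PROC MEANS DATA=mydata.gapminder N NMISS MIN MAX;
  VAR hivrate polityscore urbanrate;
RUN;
```

The NMISS column should then agree with the missing-value frequencies reported by PROC FREQ for the same variables.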
A Closer Look at CO2 Emissions
Having looked at the Gapminder dataset, which contains data from various countries, I have chosen it to ask some questions.
I can see different variables which I reckon will be connected.
Climate change and global warming are a greater threat now than ever.
I am interested in determining whether per capita electricity and oil consumption (relectricperperson & oilperperson), along with urbanization rates, have an impact on CO2 emissions (co2emissions), as CO2 is a major greenhouse gas.
The result of the analysis will give some insight into whether an ordinary resident is a major contributor to CO2 emissions.
The second question I have is whether CO2 emissions, which contribute 72% of greenhouse gases, really affect life expectancy (lifeexpectancy), as I feel we will start to take global warming seriously only if it directly affects our lives.
Based on some research into articles published on this topic, I have narrowed the literature down to two sources.
World Energy Consumption and Carbon Dioxide Emissions:1950-2050 (https://dspace.mit.edu/bitstream/handle/1721.1/3642/MITJPSPGC_Rpt5.pdf)
Global climate change and health, published by WHO (http://www.who.int/globalchange/publications/climatechangechap1.pdf)
Based on quick readings, I can see the relationship between energy consumption and CO2 emissions.
Also, CO2 emissions leading to climate change come with some health issues, which I strongly feel will have an effect on life expectancy.