deepanshu5258
Dataisbeautiful
deepanshu5258 · 8 years ago
Data analysis and Interpretation, Assignment 3
I selected gapminder as my data set to work on.
For this week’s assignment I removed missing data, created new variables and grouped the data into categories.
My code is as follows (the comments in it help explain the code better):
[Images: three screenshots of the SAS code]
The above code should be self-explanatory to my fellow SAS learners.
Here are the results to the above code:
[Images: six screenshots of the output tables]
Now for the short explanation:
The first table shows whether GDP per capita data is available for each country. We can see that 23 countries do not have GDP data (shown as 0 in table 1). Consequently, the next table, which shows the different income strata, reports a frequency of 23 as missing. Thus we have successfully removed the missing data from consideration and classified the remaining data according to our need.
The same applies to the internetuserate data: only countries with valid internet use rate data have been classified into strata according to internet penetration rate.
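Since the code above survives only as screenshots, here is a rough sketch of what "remove missing data, then group into strata" can look like in SAS. It is not the original program: the dataset name `imported`, the new variable `incomegroup` and the cut points are all illustrative assumptions, and it assumes incomeperperson is stored as a numeric variable.

```sas
/* Sketch only: dataset name, variable name and cut points are assumptions. */
data strata;
    set imported;   /* assumed to hold one row per gapminder country */

    /* Drop rows with no GDP per capita value before grouping. */
    if missing(incomeperperson) then delete;

    /* New variable: income stratum (hypothetical thresholds). */
    length incomegroup $ 6;
    if incomeperperson < 1000 then incomegroup = "low";
    else if incomeperperson < 10000 then incomegroup = "middle";
    else incomegroup = "high";
run;

/* Frequency table of the strata for the remaining countries. */
proc freq data=strata;
    tables incomegroup;
run;
```

The explicit delete matters because SAS treats a missing numeric value as smaller than any number, so without it the countries with no data would be counted in the lowest stratum.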
FIN
Please leave a comment if you are working on gapminder and have some meaningful suggestion. Thanks for reviewing.
deepanshu5258 · 8 years ago
Data analysis and Interpretation, Assignment 2
I am working with the data from GapMinder. For those of you who aren’t using it, it is not like a survey where each response is one of a few pre-determined options, so the idea of analysing frequencies of responses doesn’t really apply here. To demonstrate the lessons, I added 2 new columns to the data indicating whether a value was available (1) or not available (0) in the codebook for a particular country. The queries I made do not reveal any great information, but they clearly demonstrate what was taught this past week.
Here’s what my personal codebook looks like:
[Image: screenshot of the codebook]
The variables internetYorN and YorN (next to incomeperperson) indicate whether a particular country has a data record for that variable. It only makes sense to work with countries that have all the required data.
Since it was now a custom codebook, I had to upload it to the cloud before using it in my program, and so I did.
At first I ran a simple PROC FREQ program and of course I ran into some errors:
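The flag columns were added to the codebook file directly, but the same indicators could also be computed inside SAS. The following is only a sketch under assumptions: `mydata.gapminder` is an assumed name for the unmodified course dataset, `flagged` is a made-up output name, and the three measured variables are assumed to carry the usual gapminder names.

```sas
/* Sketch only: mydata.gapminder and flagged are assumed names. */
data flagged;
    set mydata.gapminder;

    /* missing() returns 1 for a missing value, so NOT missing() */
    /* gives 1 when data is available and 0 when it is not.      */
    YorN           = not missing(incomeperperson);
    internetYorN   = not missing(internetuserate);
    employrateYorN = not missing(employrate);
run;
```

Doing it this way keeps the original data file untouched, so there is no custom file to upload.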
[Image: screenshot of the error log]
I used the wrong LIBNAME and the wrong file path. I had used the file path on my computer even though everything was running in the cloud, which made no sense. I realised that quickly after the error and corrected it.
And then it ran:
[Image: screenshot of the successful run]
No errors, no warnings :)
The result looked as follows:
[Image: screenshot of the frequency table]
Here it’s visible that only 190 of the 213 countries have recorded income per person data and are therefore relevant for further analysis.
Then I went on to add the IF statements to the code to subset my data because I was feeling super-programmery now you know.
Here’s what I coded in:
LIBNAME mydata "/courses/d1406ae5ba27fe300" access=readonly;

/* Import the custom codebook that was uploaded to the cloud */
proc import datafile="/home/deepanshu52580/my_courses/CodebookDeepanshu.csv"
    out=imported dbms=csv replace;
run;

DATA new;
    set imported;
    Label YorN="Is GDP per capita data available?"
          internetYorN="Is internet user rate data available?"
          employrateYorN="Is employment rate data available?";
    /* Subsetting IFs: keep only countries with income and internet data */
    if YorN=1;
    if internetYorN=1;
run;

proc sort; by country; run;

proc freq; tables YorN internetYorN employrateYorN; run;

I ran into some errors again but I resolved them and obtained results:
[Image: screenshot of the three frequency tables]
Here you can see 3 tables. Basically, my subsetting trimmed the data down to only the entries that had a response in the incomeperperson column, and then gave the frequency table for the countries, among those with income data, that also had internet user rate data. The countries in that subset that also have employment rate data are then filtered further (the third table).
So we have the countries that have valid responses for all three variables, i.e. internetuserate, incomeperperson and employrate.
We can see that 190 (out of a total of 213) entries have a response of 1 for incomeperperson data; 183 of those 190 have a valid response (1) for internetuserate and 7 do not. Further, out of the 183, only 164 have valid data records for employment rate and 19 do not.
Therefore, from the data of 213 countries, only 190 have valid income data, only 183 have both valid income and internet use data, and only 164 countries have all 3, i.e. income, internet use and employment rate data. This way the filtering leaves only the data that has valid responses for the variables of interest to me.
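As a cross-check on this kind of cascade, the three flags can also be tabulated together on the unfiltered data. This is only a sketch: it assumes the dataset `imported` from the program above still holds all 213 rows with the three 0/1 flag columns.

```sas
/* Sketch only: assumes 'imported' is the full, unfiltered codebook. */
proc freq data=imported;
    /* LIST prints one row per combination of the three flags */
    /* instead of a stack of two-way tables.                  */
    tables YorN*internetYorN*employrateYorN / list;
run;
```

The row where all three flags are 1 should then correspond to the countries that have income, internet use and employment rate data.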
FIN 
PS I do not post on GitHub like a hotshot because it seems like overkill, and this way it’s easier for people to review as all the required info is on one screen. Please leave a comment if you prefer the GitHub way.
deepanshu5258 · 8 years ago
Data Analysis and Interpretation, Assignment 1
Area of interest: internetuserate
After going through the data from GapMinder I have selected internet user rate as my area of interest. I believe that is a really good parameter and can provide relevant insights into the standards of living of people in different areas of the world.
[Image]
Additional areas of interest: incomepercapita, employrate, femaleemployrate
As I am considering internet user rate as a parameter of the standard of living of populations, I think it is appropriate to explore the effects of the internet on the income of the populations in different countries of the world. An increase in GDP would in fact be affected by more people working than before. Therefore I would also explore the effects of internet availability on the employment rates for females and for the population in general. The questions I would want my data to answer are:
1. How is internet use rate related to the trends followed by GDP and its growth?
2. How is internet use rate related to the employment rate for females and for the population in general?
I have prepared a code book relevant to my areas, a screen grab of which follows:
[Image: screen grab of the code book]
The following research relevant to my project turned up with a simple Google search using search terms like "GDP vs Internet penetration" and "Employment rate vs Internet users":
1. A chart showing patterns between GDP and internet penetration:
http://visual.ly/internet-users-countries-1990-2011?view=true
2. Current internet users and penetration:
http://www.internetlivestats.com/internet-users-by-country/
3. Pew Research Center report showing ‘correlations’ between income increase and increasing technology use:
http://www.pewglobal.org/2014/02/13/developing-technology-use/
4. World Bank (additional data for contemporary references):
http://data.worldbank.org/indicator/IT.NET.USER.P2?end=2015&start=2015&view=map
5. Paper showing active correlation between internet use and employment rate (Kellogg):
https://www.kellogg.northwestern.edu/faculty/dranove/htm/dranove/coursepages/Mgmt%20469/stevenson041107.pdf
The research already carried out has looked at the effect of increasing GDP on internet penetration, but the converse has not been explored as far as I could find. Therefore I would like to consider that question.
My hypotheses are as follows:
I believe that access to internet should ideally lead to a more educated and aware population and make them more hirable as they would possess a better set of soft skills and a better suited personality. The awareness created should particularly help women be able to get out and work.