roydatamanagement
Roy's Data Management Blog
roydatamanagement · 5 years ago
Making Data Management Decisions - Data Management and Visualization Week 3 - Wesleyan @ Coursera
STEP 1: Make and implement data management decisions for the variables you selected.
Data management includes such things as coding out missing data, coding in valid data, recoding variables, creating secondary variables and binning or grouping variables. Not everyone does all of these, but some is required.
I focused on the three variables from the Gapminder data set that I’ve inferred have the closest relationship to the question I posed at the beginning of the program: “How is access to the broadband network directly related to employment?”
The variables I am taking into consideration are:
Employment rate
Internet use rate
Average income per capita
Many countries in the Gapminder data set have missing values in the variables that I’m focusing on. By creating my subset, I make sure that I select only the countries that give me the necessary information to work with. After this step, I confirm this by converting any missing values into NaN values.
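The step above can be sketched roughly as follows. This is a minimal illustration using pandas, not the program linked in this post: the tiny inline CSV is made up, and the use of `pd.to_numeric` and `dropna` is an assumption about how the conversion and subsetting could be done.

```python
import io

import pandas as pd

# Hypothetical miniature of the Gapminder CSV; blank cells stand for missing data.
raw_csv = io.StringIO(
    "country,incomeperperson,employrate,internetuserate\n"
    "A,1200.5,55.1,12.3\n"
    "B, ,61.0,45.9\n"
    "C,8700.0,,80.2\n"
    "D,300.2,70.4, \n"
)
data = pd.read_csv(raw_csv)

columns = ["incomeperperson", "employrate", "internetuserate"]

# Convert each variable to numeric; anything that cannot be parsed
# (such as a blank cell) becomes NaN instead of raising an error.
for col in columns:
    data[col] = pd.to_numeric(data[col], errors="coerce")

# Keep only the countries that have a value for all three variables.
subset = data.dropna(subset=columns).copy()

print(data[columns].isna().sum())  # missing values per variable
print(subset)
```

Counting the NaN values per column before dropping rows is a quick way to confirm how many countries are lost for each variable.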
STEP 2: Run frequency distributions for your chosen variables and select columns, and possibly rows.
Your output should be interpretable (i.e. organized and labeled).
I take my subset and make a copy of it, this time grouping the selected values into 5 bins. This gives me a clearer view of which countries fall within the same range, and a broader view of the correlation between countries in certain bins.
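A minimal sketch of this kind of binning, assuming pandas and `pd.cut`. The data, the bin edges and the labels below are illustrative choices, not the ones used in the linked program.

```python
import pandas as pd

# Hypothetical internet use rates (users per 100 people) for eight countries.
subset = pd.DataFrame({
    "country": list("ABCDEFGH"),
    "internetuserate": [2.0, 9.5, 18.0, 35.0, 44.0, 61.0, 77.5, 92.0],
})

# Work on a copy so the original subset stays untouched.
binned = subset.copy()

# Five equal-width bins over 0-100; edges and labels are assumptions.
edges = [0, 20, 40, 60, 80, 100]
labels = ["0-20", "20-40", "40-60", "60-80", "80-100"]
binned["internet_bin"] = pd.cut(
    binned["internetuserate"], bins=edges, labels=labels, include_lowest=True
)

# Frequency and percentage of countries per bin, listed in bin order.
counts = binned["internet_bin"].value_counts().sort_index()
percentages = binned["internet_bin"].value_counts(normalize=True).sort_index() * 100

print(counts)
print(percentages.round(1))
```

Giving each bin a readable label keeps the frequency table interpretable, which is what the assignment asks for.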
My program is available at the following link:
https://www.dropbox.com/s/p6snxrassxo8jcd/assignment_week3_roy.py?dl=0
roydatamanagement · 5 years ago
Running your first program - Data Management & Visualization Course - Wesleyan University on Coursera
Research Topic: Is there an association between internet use rate per person and employment rate?
The assignment for this week asked me to run my first program with Python. I ran the week 2 assignment on the data set I had chosen already — GapMinder.csv.
First I converted the variables I’m working with (‘incomeperperson’, ‘employrate’ & ‘internetuserate’) to floats.
I then applied the frequency and percentage count functions to the three variables. Because nearly every value in each variable was unique, the raw counts were not informative, so I grouped the data into 5 bins to present the information more clearly.
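The sketch below shows one way this could look in pandas. The values are made up, and `value_counts(bins=5)` is an assumption about how the binning could be done, not necessarily what the linked program does.

```python
import pandas as pd

# Hypothetical employment rates, read in as text the way a raw CSV column might be.
data = pd.DataFrame({"employrate": ["55.1", "61.0", "47.3", "70.4", "58.9", "64.2"]})

# Convert the column to floats before counting.
data["employrate"] = data["employrate"].astype(float)

# Raw frequency counts: every value is unique, so each count is 1.
counts = data["employrate"].value_counts(sort=False)

# Grouping into 5 equal-width bins gives a more readable distribution.
binned_counts = data["employrate"].value_counts(bins=5, sort=False)
binned_percent = data["employrate"].value_counts(bins=5, sort=False, normalize=True) * 100

print(counts)
print(binned_counts)
print(binned_percent.round(1))
```

With continuous variables such as these, the unbinned table has one row per country, which is why the binned version is easier to interpret.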
Then I created a segment from the list, showing only results where the internet use rate is greater than or equal to 60% and the average income per person is 6,000 USD per month.
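A segment like this can be built with a boolean mask. The sketch below uses made-up rows and assumes the income condition means "at least 6,000"; the comparison actually used in the linked program may differ.

```python
import pandas as pd

# Hypothetical rows standing in for Gapminder countries.
data = pd.DataFrame({
    "country": ["A", "B", "C", "D"],
    "incomeperperson": [1200.0, 9500.0, 6000.0, 30000.0],
    "internetuserate": [12.0, 45.0, 60.0, 88.5],
})

# Both conditions must hold; each comparison needs its own parentheses.
# The ">= 6000" threshold is an assumption made for this illustration.
mask = (data["internetuserate"] >= 60) & (data["incomeperperson"] >= 6000)
segment = data[mask]

print(segment)
```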
The code can be reviewed at the following link:
https://www.dropbox.com/s/1ggfqjtnnyc5b4e/firstexample_week2_backup.py?dl=0
Reviewing the data I gathered and segmented we can draw the following conclusions:
A large share of the countries in the Gapminder data set do not have broad access to the internet (34% of the countries have an active internet user rate of 0% to 19%). Of the countries in the data, 76% have a GDP in the lowest of the bins we created, and a large share of the countries have a corresponding issue with their employment rate (33% of the countries have an employment rate ranging from 52% to 62%).
roydatamanagement · 5 years ago
Getting Your Research Project Started
STEP 1: Choose a data set that you would like to work with.
I chose the GapMinder data set to develop my hypothesis.
First question
How are internet use and access related to the employment rate among the population?
STEP 2. Identify a specific topic of interest
TOPICS OF INTEREST:
Labor
Substance consumption
Employment
Social Security
Internet use rate
Social media use
Mental health
STEP 3. Prepare a codebook of your own (i.e., print individual pages or copy screen and paste into a new document) from the larger codebook that includes the questions/items/variables that measure your selected topics.)
GapMinder Codebook
incomeperperson
2010 Gross Domestic Product per capita in constant 2000 US$. Inflation, but not the differences in the cost of living between countries, has been taken into account. Source: World Bank Work Development Indicators.
alcconsumption
2008 alcohol consumption per adult (age 15+), litres Recorded and estimated average alcohol consumption, adult (15+) per capita consumption in litres pure alcohol
femaleemployrate
2007 female employees age 15+ (% of population) Percentage of female population, age above 15, that has been employed during the given year.
employrate
2007 total employees age 15+ (% of population). Percentage of total population, age above 15, that has been employed during the given year. Source: International Labour Organization.
lifeexpectancy
2011 life expectancy at birth (years)
The average number of years a newborn child would live if current mortality patterns were to stay the same.
suicideper100TH
2005 Suicide, age adjusted, per 100 000. Mortality due to self-inflicted injury, per 100 000 standard population, age adjusted
urbanrate
2008 urban population (% of total). Urban population refers to people living in urban areas as defined by national statistical offices (calculated using World Bank population estimates and urban ratios from the United Nations World Urbanization Prospects). Source: World Bank.
internetuserate
2010 Internet users (per 100 people). Internet users are people with access to the worldwide network. Source: World Bank.
STEP 4. Identify a second topic that you would like to explore in terms of its association with your original topic.
STEP 5. Add questions/items/variables documenting this second topic to your personal codebook.
Second questions/topics/variables
How do internet access and technological knowledge improve an individual’s chance of getting a job?
Social Capital: Broadly defined, social capital is the resources, imagined and physically realized, gained by building and maintaining relationships with others.
Internet Job Search: Nearly 80% of people use the internet to help get a new job, and web search is a primary method of finding new employment online.
Unemployment and Labor Force: Measuring the labor force and unemployment rate are major functions of the BLS. Formally defined, the labor force is the percent of the population working or looking for work; the unemployment rate is the percent of the labor force not currently working.
STEP 6. Perform a literature review to see what research has been previously done on this topic. Use sites such as Google Scholar (http://scholar.google.com) to search for published academic work in the area(s) of interest. Try to find multiple sources, and take note of basic bibliographic information.
STEP 7. Based on your literature review, develop a hypothesis about what you believe the association might be between these topics. Be sure to integrate the specific variables you selected into the hypothesis.
According to several of the articles that I reviewed for this assignment, policies that brought high-speed broadband to communities that did not have access to it have had a positive impact on people’s chances of getting out of unemployment. Internet access has become a priority in many parts of the world, making it a must for many basic needs and protocols regarding education, healthcare, security and employment (especially job applications and job hunting). Taking this into consideration, my hypothesis is that the GapMinder countries with the highest internet use rate will be the countries with the lowest unemployment rate. This association could also be correlated with the urban rate of each country, which would suggest that areas with denser populations and better broadband access have lower per capita unemployment and job search rates than smaller cities and towns.
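As a rough sketch of how this hypothesis could later be checked, the snippet below computes a Pearson correlation between two of the codebook variables. The numbers are invented for illustration only, and using ‘employrate’ as a stand-in for (the inverse of) unemployment is an assumption, since the codebook has no unemployment variable.

```python
import pandas as pd

# Invented values for five hypothetical countries; not real Gapminder data.
sample = pd.DataFrame({
    "internetuserate": [10.0, 20.0, 30.0, 40.0, 50.0],
    "employrate": [50.0, 52.0, 55.0, 60.0, 61.0],
})

# Pearson correlation coefficient between the two variables (range -1 to 1).
r = sample["internetuserate"].corr(sample["employrate"])

print(round(r, 3))
```

A positive coefficient on the real data would be consistent with the hypothesis, although a correlation alone would not show that internet access causes higher employment.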
Bibliography
Smith, A. (2015). Searching for work in the digital era. Pew Research Center: Internet, Science & Tech. Article published on 19 (2015).
Beard, T. R., Ford, G. S., Saba, R. P., & Seals Jr, R. A. (2012). Internet use and job search. Telecommunications Policy, 36(4), 260-273.
Chancellor, S., & Counts, S. (2018). Measuring Employment Demand Using Internet Search Data. Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems - CHI '18. doi:10.1145/3173574.3173696
Hjort, J., & Poulsen, J. (2019). The arrival of fast internet and employment in Africa. American Economic Review, 109(3), 1032-79.
Lapointe, P. (2015). Does speed matter? The employment impact of increasing access to fiber Internet. Journal of the Washington Academy of Sciences, 101(1), 9-28.
Kuhn, P., & Mansour, H. (2014). Is Internet job search still ineffective? The Economic Journal, 124(581), 1213-1233.