
digitalanalyticsandmarketing · 8 years ago


Regression Modeling in Practice Assignment 1 - Writing About Your Data

1) Describe your Sample

A. Describe the study population (who or what was studied).

The sample data comes from the Gapminder data set. Gapminder is a Swedish non-profit organization that compiles a comprehensive set of annual data by country, drawing on multiple sources to obtain the most complete values available for each country. In total it covers 519 different social and economic variables.

The data goes back hundreds of years for some of the variables. Gapminder uses various sources and provides very detailed documentation on the data, how it was sourced and how it accounts for countries being added or removed over time.

Ultimately, Gapminder aims to produce an unbiased, high-quality data set that gives the best available value for each variable.

B. Report the level of analysis studied (individual, group, or aggregate).

All the data is presented as quantitative variables by country and year. For the research question, I use the most current data available while making sure that all variables come from the same collection year. In our case this is 2012.

C. Report the number of observations in the data set.

The number of observations in the data set is equal to the number of countries, approximately n = 260 per variable. In some cases a country will be missing data for a given variable. Any country missing an observation for any of the variables will be removed from our analysis.

D. Describe your data analytic sample (the sample you are using for your analyses).

The variables being looked at are quantitative and represent a total value, or average for the country in question. Because the Gapminder data draws from multiple sources, the data is meant to be the best quantitative value for a country that is known for that year.

The data I am looking at from Gapminder is as follows:

Life expectancy (years)

CO2 emissions (tons per person)

Income per person (GDP/capita, PPP$ inflation-adjusted)

Government health spending per person (US$)

2) Describe the procedures that were used to collect the data

A. Report the study design that generated that data (for example: data reporting, surveys, observation, and experiment).

The Gapminder dataset is an observational dataset. Data for each country was recorded without any hypothesis driving the data collection process.

B. Describe the original purpose of the data collection.

The original purpose of the Gapminder data is to provide a fact-based view of the world. Gapminder collects and presents quantitative data annually for each country in the world on hundreds of variables.

C. Describe how the data were collected.

D. Report when the data were collected.

E. Report where the data were collected.

The above three questions are answered together, because Gapminder uses a different collection method depending on the variable. Each variable that Gapminder consolidates is collected in its own way, often from a variety of sources such as national government agencies, the UN and the World Health Organization. Each variable has its own detailed documentation. Below I summarize the information about each variable, with links to that documentation.

Life expectancy (years)

Various sources - often the reporting country's statistics agency and World Health Data

The average number of years a newborn child would live if current mortality patterns were to stay the same. Data after 1990 comes from the Global Burden of Disease Study 2015, which was itself compiled from hundreds of sources.

Link to details of the data collection: http://ghdx.healthdata.org/gbd-results-tool?params=querytool-permalink/6a531d4f63c3c8cfcf2a157ec702d5f9

CO2 emissions (tons per person)

Collected by: CDIAC (Carbon Dioxide Information Analysis Center)

Collected Annually

Link to details of data collection: http://cdiac.ornl.gov/trends/emis/meth_reg.html

Income per person (GDP/capita, PPP$ inflation-adjusted)

Various Sources

Cross-country data for 2011 is mainly based on the 2011 round of the International Comparison Program. Estimates based on other sources were used for the other countries. Real growth rates were linked to the 2011 levels. Several sources are used for these growth rates, such as the data of Angus Maddison.

Link to details of data collection: http://www.gapminder.org/data/documentation/

Government health spending per person (US$)

World Health Organization Global Health Expenditure Database

Link to details of data collection: http://apps.who.int/nha/database

3) Describe your variables

A. Describe what your explanatory and response variables measured.

The relationship I am examining is whether Life Expectancy depends on the following three variables: CO2 emissions (we expect higher emissions to lead to lower life expectancy), GDP per person (higher GDP or income per person leads to higher life expectancy) and Government Health Spending per capita (higher health spending leads to higher life expectancy).

Life Expectancy (related to):

CO2 Emissions

GDP Per Capita

Health Spending
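The hypothesized relationship above could be written as a multiple linear regression. This is only a sketch: the linear form, the coefficient symbols and the error term are my assumptions for illustration, since the post does not state the model explicitly. The expected signs follow the hypotheses given above.

```latex
\text{LifeExpectancy}_i
  = \beta_0
  + \beta_1\,\text{CO2}_i
  + \beta_2\,\text{GDP}_i
  + \beta_3\,\text{HealthSpending}_i
  + \varepsilon_i,
\qquad
\beta_1 < 0,\quad \beta_2 > 0,\quad \beta_3 > 0
```

Here i indexes countries, and each variable is the 2012 value for that country.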

B. Describe the response scales for your explanatory and response variables.

The following scales are used for each of the data variables:

Life Expectancy - Measured in Years

CO2 Emissions – Measured in tons per capita

GDP Per Capita – Measured in US$ per capita, indexed to 2011 PPP levels

Health Spending - Per capita general government expenditure on health expressed at average exchange rate for that year in US dollars.

C. Describe how you managed your explanatory and response variables.

The explanatory and response variables were managed by removing all countries that are missing any piece of data. I am also using only the data from 2012, which is the most current year containing the most information by country for each of the variables.
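The data management described above (keep one year, drop any country with a missing value) might look roughly like the following. This is a minimal sketch in plain Python: the rows, the column names (`life_exp`, `co2`, `gdp`, `health`) and the helper `analytic_sample` are all hypothetical and are not taken from the Gapminder files.

```python
# Hypothetical rows: one dict per country-year; None marks a missing value.
# Country labels, column names and numbers are made up for illustration only.
rows = [
    {"country": "A", "year": 2012, "life_exp": 70.0, "co2": 1.5, "gdp": 9000.0, "health": 300.0},
    {"country": "B", "year": 2012, "life_exp": 80.0, "co2": None, "gdp": 40000.0, "health": 3000.0},
    {"country": "C", "year": 2011, "life_exp": 65.0, "co2": 0.4, "gdp": 2500.0, "health": 40.0},
    {"country": "D", "year": 2012, "life_exp": 75.0, "co2": 5.0, "gdp": 15000.0, "health": 600.0},
]

# The response variable plus the three explanatory variables.
VARIABLES = ("life_exp", "co2", "gdp", "health")


def analytic_sample(rows, year=2012, variables=VARIABLES):
    """Keep rows from `year` in which every listed variable is present."""
    return [
        row for row in rows
        if row["year"] == year and all(row[v] is not None for v in variables)
    ]


sample = analytic_sample(rows)
kept = [row["country"] for row in sample]
print(kept)
```

In this made-up example, country B is dropped because its CO2 value is missing and country C is dropped because its row is from 2011, so only A and D remain in the analytic sample.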


glennbergthings-blog · 8 years ago


Assignment 1 - Introduction to Regression

1) The Sample

a. Describe the study population (who or what was studied).

The sample data comes from the Gapminder data set. Gapminder is a Swedish non-profit organization that compiles a comprehensive set of annual data by country, drawing on multiple sources to obtain the most complete values available for each country. In total it covers 519 different social and economic variables. The data goes back hundreds of years for some of the variables. Gapminder uses various sources and provides very detailed documentation on the data, how it was sourced and how it accounts for countries being added or removed over time. Ultimately, Gapminder aims to produce an unbiased, high-quality data set that gives the best available value for each variable.

b. Report the level of analysis studied (individual, group, or aggregate).

c. Report the number of observations in the data set.

d. Describe your data analytic sample (the sample you are using for your analyses).

o Life expectancy (years)

o CO2 emissions (tonnes per person)

o Income per person (GDP/capita, PPP$ inflation-adjusted)

o Government health spending per person (US$)

2) Report the study design that generated that data (for example: data reporting, surveys, observation, experiment).

a. Describe the original purpose of the data collection.

b. Describe how the data were collected.

c. Report when the data were collected.

d. Report where the data were collected.

Life expectancy (years)

Various sources - often the reporting country's statistics agency and World Health Data

Link to details of the data collection: http://ghdx.healthdata.org/gbd-results-tool?params=querytool-permalink/6a531d4f63c3c8cfcf2a157ec702d5f9

CO2 emissions (tons per person)

Collected by: CDIAC (Carbon Dioxide Information Analysis Center)

Collected Annually

Link to details of data collection: http://cdiac.ornl.gov/trends/emis/meth_reg.html

Income per person (GDP/capita, PPP$ inflation-adjusted)

Various Sources

Link to details of data collection: http://www.gapminder.org/data/documentation/

Government health spending per person (US$)

World Health Organization Global Health Expenditure Database

Link to details of data collection: http://apps.who.int/nha/database/PreDataExplorer.aspx?d=1

3) A measures section describing your variables and how you managed them to address your own research question.

a. Describe what your explanatory and response variables measured.

The relationship I am examining is whether Life Expectancy depends on the following three variables: CO2 emissions (we expect higher emissions to lead to lower life expectancy), GDP per person (higher GDP or income per person leads to higher life expectancy) and Government Health Spending per capita (higher health spending leads to higher life expectancy).

Life Expectancy (related to): CO2 Emissions, GDP Per Capita, Health Spending

b. Describe the response scales for your explanatory and response variables.

The following scales are used for each of the data variables:

Life Expectancy - Measured in Years

CO2 Emissions – Measured in tons per capita

GDP Per Capita – Measured in US$ per capita, indexed to 2011 PPP levels

Health Spending - Per capita general government expenditure on health expressed at average exchange rate for that year in US dollars.

c. Describe how you managed your explanatory and response variables.

The explanatory and response variables were managed by removing any countries that are missing any piece of data. I am also using only the data from 2012, which is the most current year containing the most information by country for each of the variables.
