Logistic Regression Model
1. Summary of Results
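I conducted a logistic regression analysis to test whether income level (binary: low vs. high) and urbanization rate are associated with the odds of a country having high life expectancy (above the sample median). The overall model was statistically significant (likelihood-ratio test, LLR p = 2.17e-08), with a pseudo R-squared of 0.178. Unlike R-squared in linear regression, this pseudo R-squared compares the model's log-likelihood with that of an intercept-only model, so it should not be read as the percentage of variability explained; it summarizes how much the two predictors improve fit over a model with no predictors.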
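A minimal sketch of how a model like this can be fit in Python with statsmodels, assuming a DataFrame with the hypothetical columns `income_binary`, `urbanrate`, and a 0/1 outcome `high_lifeexp` (1 if life expectancy is above the median). The data are simulated, not the Gapminder values. The "LLR p" reported above presumably corresponds to statsmodels' `llr_pvalue`, and `prsquared` is McFadden's pseudo R-squared, 1 - llf/llnull.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated stand-in data (NOT the Gapminder values) for 150 hypothetical countries
rng = np.random.default_rng(42)
n = 150
df = pd.DataFrame({
    "income_binary": rng.integers(0, 2, n),   # 0 = low, 1 = high income
    "urbanrate": rng.uniform(10, 100, n),     # % of population living in urban areas
})
# Hypothetical data-generating process for the log-odds of high life expectancy
log_odds = -3 + 1.0 * df["income_binary"] + 0.05 * df["urbanrate"]
df["high_lifeexp"] = rng.binomial(1, 1 / (1 + np.exp(-log_odds)))

# Logistic regression of the 0/1 outcome on both predictors
res = smf.logit("high_lifeexp ~ income_binary + urbanrate", data=df).fit(disp=0)

# Exponentiated coefficients are odds ratios; the same applies to the 95% CI bounds
odds_ratios = np.exp(res.params)
or_ci = np.exp(res.conf_int())
or_ci.columns = ["2.5%", "97.5%"]
print(pd.concat([odds_ratios.rename("OR"), or_ci], axis=1))

# McFadden's pseudo R-squared: 1 - loglik(model) / loglik(intercept only)
mcfadden = 1 - res.llf / res.llnull
print(f"pseudo R2 = {res.prsquared:.3f}, LLR p = {res.llr_pvalue:.3g}")
```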
2. Did the results support my hypothesis?
The hypothesis was that higher income would be significantly associated with higher odds of having high life expectancy. While the direction of the association was positive (OR > 1), the result was not statistically significant (p = 0.108). Therefore, the data did not provide sufficient evidence to support the hypothesis for income level. However, urbanization rate was significantly associated with the outcome and may play a stronger role in predicting life expectancy in this model.
3. Evidence of Confounding?
To explore confounding, I first ran a model with only income_binary, which gave an odds ratio of approximately 3.38 (not shown here). After adding urbanrate to the model, the odds ratio for income dropped to 2.61. This suggests that urbanization partially confounds the relationship between income and life expectancy: part of the observed association between income and life expectancy is shared with urbanization, which is independently associated with the outcome.
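The size of this shift can be expressed as the relative change between the crude and the adjusted estimate. A common rule of thumb, used mainly in epidemiology, treats a change of more than about 10% as a sign of confounding; the cutoff is a convention, not a statistical test. For odds ratios the comparison is only a heuristic, because an odds ratio can change when a covariate is added even without confounding (non-collapsibility). A small sketch with a hypothetical helper `percent_change`:

```python
def percent_change(crude: float, adjusted: float) -> float:
    """Relative change (%) in an effect estimate after adding a covariate."""
    return 100 * (crude - adjusted) / crude

# Odds ratios for income reported above: crude model vs. model adjusted for urbanrate
change = percent_change(3.38, 2.61)
print(f"Income OR changed by {change:.1f}%")  # about 22.8%
```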
Multiple regression model
1. Summary of Multiple Regression Results
I conducted a multiple linear regression to test whether income level (binary: 0 = low, 1 = high) and urbanization rate predict life expectancy. The results were statistically significant:
F(2, N) = 56.61, p < 0.001
R-squared = 0.462, meaning that the model explains 46.2% of the variance in life expectancy across countries.
Regression coefficients:
Intercept: B = 66.43, p < 0.001, 95% CI [64.91, 67.94]
Income (binary): B = 9.07, p < 0.001, 95% CI [5.02, 13.13]
Urbanization (centered): B = 0.169, p < 0.001, 95% CI [0.096, 0.242]
Interpretation:
Countries in the high-income group have, on average, 9.07 more years of life expectancy than low-income countries, controlling for urbanization.
For every 1-percentage-point increase in urbanization rate, predicted life expectancy increases by about 0.17 years, holding income constant.
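A minimal sketch of this model with statsmodels' formula API, on simulated data (not the Gapminder values) and with hypothetical column names. Because urbanization is centered at its mean, the intercept (66.43 above) is the predicted life expectancy of a low-income country with average urbanization. The denominator degrees of freedom of the F test are n - 3 for n countries, available as `df_resid`.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated stand-in data (NOT the Gapminder values)
rng = np.random.default_rng(0)
n = 135
df = pd.DataFrame({
    "income_binary": rng.integers(0, 2, n),
    "urbanrate": rng.uniform(10, 100, n),
})
df["lifeexpectancy"] = (
    60 + 9 * df["income_binary"] + 0.17 * df["urbanrate"] + rng.normal(0, 5, n)
)

# Center urbanization so the intercept refers to a country with average urbanization
df["urbanrate_c"] = df["urbanrate"] - df["urbanrate"].mean()

res = smf.ols("lifeexpectancy ~ income_binary + urbanrate_c", data=df).fit()
print(res.params)       # B coefficients
print(res.conf_int())   # 95% confidence intervals
print(f"R2 = {res.rsquared:.3f}, F({res.df_model:.0f}, {res.df_resid:.0f}) = {res.fvalue:.2f}")
```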
2. Did the results support my hypothesis?
Yes. The hypothesis that higher income is associated with greater life expectancy was supported. The relationship remained statistically significant (p < 0.001) even after adjusting for urbanization.
3. Evidence of Confounding?
Yes, there was some evidence of partial confounding. In a previous simple regression model using only income_binary, the coefficient was 15.11. After adding urbanization to the model, the coefficient for income dropped to 9.07.
This indicates that part of the association between income and life expectancy is shared with urbanization: countries with higher income also tend to be more urbanized, and urbanization itself has an independent association with life expectancy.
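For the estimates above, the income coefficient fell from 15.11 to 9.07, a relative change of (15.11 - 9.07) / 15.11 ≈ 40%. The simulation below, with made-up parameters, sketches why this happens: when urbanization depends partly on income, the income-only model absorbs part of urbanization's association with life expectancy.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated data in which high-income countries also tend to be more urbanized
rng = np.random.default_rng(1)
n = 500
income = rng.integers(0, 2, n)
urban = 40 + 30 * income + rng.normal(0, 15, n)        # urbanization depends on income
life = 60 + 9 * income + 0.17 * urban + rng.normal(0, 5, n)
df = pd.DataFrame({"income_binary": income, "urbanrate": urban, "lifeexpectancy": life})

crude = smf.ols("lifeexpectancy ~ income_binary", data=df).fit().params["income_binary"]
adjusted = smf.ols("lifeexpectancy ~ income_binary + urbanrate", data=df).fit().params["income_binary"]

# The crude coefficient also picks up about 30 * 0.17 = 5.1 years via urbanization
print(f"crude = {crude:.2f}, adjusted = {adjusted:.2f}")
print(f"relative change = {100 * (crude - adjusted) / crude:.1f}%")
```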
4. Diagnostic Plots Interpretation
a) Q-Q Plot
The Q-Q plot shows substantial deviation from the reference line, especially in the tails. This suggests the residuals are not perfectly normally distributed, with possible skewness or heavy tails.
b) Standardized Residuals
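The histogram shows a distribution close to normal, but slightly right-skewed. Most standardized residuals fall within ±2, which is roughly what a normal distribution implies (about 95% of values within ±2 standard deviations), suggesting that there are few outlying observations.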
c) Leverage Plot (Cook’s Distance)
The leverage plot shows a few observations that stand out, but none with excessive leverage or with a Cook's distance greater than 1 (a common rule-of-thumb cutoff). Thus, no single country appears to be unduly influencing the model.
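These checks can also be done numerically. A sketch using statsmodels' influence measures on simulated data (not the Gapminder values): under normally distributed errors roughly 5% of standardized residuals are expected to fall outside ±2, and a Cook's distance above 1 is a common rule-of-thumb flag for an influential observation. The hat values (leverages) always sum to the number of estimated parameters, here 3.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated stand-in data (NOT the Gapminder values)
rng = np.random.default_rng(2)
n = 135
df = pd.DataFrame({"income_binary": rng.integers(0, 2, n), "urbanrate": rng.uniform(10, 100, n)})
df["lifeexpectancy"] = 60 + 9 * df["income_binary"] + 0.17 * df["urbanrate"] + rng.normal(0, 5, n)

res = smf.ols("lifeexpectancy ~ income_binary + urbanrate", data=df).fit()
influence = res.get_influence()

std_resid = influence.resid_studentized_internal   # standardized residuals
cooks_d, _ = influence.cooks_distance              # Cook's distance for each observation
leverage = influence.hat_matrix_diag               # leverage (hat values)

share_beyond_2 = np.mean(np.abs(std_resid) > 2)
print(f"share of |standardized residual| > 2: {share_beyond_2:.1%}")
print(f"max Cook's distance: {cooks_d.max():.3f}")
print(f"observations with Cook's D > 1: {np.sum(cooks_d > 1)}")
```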
Final Notes
Despite some deviation from normality in the residuals (mainly in the tails of the Q-Q plot), the model assumptions appear largely satisfied. The results support the idea that both income and urbanization are important predictors of life expectancy at the country level.
linear regression
I ran a simple linear regression to examine whether life expectancy differs between countries with Low income (coded as 0) and High income (coded as 1). The outcome variable was life expectancy, and the explanatory variable was a binary version of income category. The regression model was statistically significant (F(1, 66) = 80.04, p < 0.001), with an R-squared of 0.376, indicating that income category explains 37.6% of the variation in life expectancy among the countries in this sample. The regression coefficient for income_binary was 15.11 (p < 0.001), meaning that countries in the High income group have, on average, a 15.11-year longer life expectancy than those in the Low income group. The intercept was 65.18, representing the average life expectancy for countries in the Low income category.
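With a single 0/1 predictor, ordinary least squares reproduces the two group means: the intercept is the mean of the group coded 0 and the slope is the difference between the group means, which is why the intercept and coefficient above can be read as the Low-income mean and the High-minus-Low difference. With one predictor, the model F statistic also equals the square of the slope's t statistic. A sketch on simulated data (not the Gapminder values):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated stand-in data (NOT the Gapminder values)
rng = np.random.default_rng(3)
n = 135
income = rng.integers(0, 2, n)
df = pd.DataFrame({
    "income_binary": income,
    "lifeexpectancy": 65 + 15 * income + rng.normal(0, 7, n),
})

res = smf.ols("lifeexpectancy ~ income_binary", data=df).fit()
group_means = df.groupby("income_binary")["lifeexpectancy"].mean()

# Intercept = mean of the Low group; slope = High mean minus Low mean
print(res.params["Intercept"], group_means[0])
print(res.params["income_binary"], group_means[1] - group_means[0])
```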
Data Source, Sample, and Measures – Gapminder Analysis
1. Sample Description
The dataset used for this analysis comes from Gapminder, a global non-profit organization that collects and visualizes international statistics on development, health, economics, and population. This dataset includes country-level data from around the world, with each observation representing a country. Variables cover a wide range of social, economic, and health indicators.
For this analysis, I used a cleaned and updated version of the dataset, which includes variables such as:
incomeperperson: Gross National Income per capita (in US dollars)
lifeexpectancy: Average life expectancy at birth
urbanrate: Percentage of the population living in urban areas
The final sample consists of countries with valid data available for all selected variables (after removing rows with missing values).
2. Data Collection Procedure
Gapminder compiles its data from trusted international sources such as:
The World Bank
The World Health Organization (WHO)
The United Nations (UN)
Each variable is sourced from reputable databases and periodically updated to reflect the most recent data available.
3. Measures and Data Management
To address my research questions (such as how income and urbanization relate to life expectancy), I performed the following data management steps:
Converted variables to numeric format using pandas.to_numeric() to handle potential non-numeric entries.
Created new categorical variables:
income_category: Categorized countries into Low, Middle, and High income based on incomeperperson
lifeexp_group: Grouped countries into Low, Medium, High, and Very High life expectancy
urban_group: Grouped urbanization into 4 bins from Very Low to High
Mapped categorical variables to ordinal numbers to allow for correlation analysis:
income_ord: 1 to 3
lifeexp_ord: 1 to 4
Created a derived variable called development_level, combining income and life expectancy to classify countries as Developed, Developing, or Undeveloped.
Filtered the dataset to include only complete cases for the relevant variables (dropna()), ensuring consistent and interpretable analyses.
These steps allowed me to run statistical analyses such as ANOVA, Chi-Square tests, correlation coefficients, and moderation tests to explore relationships between development indicators across countries.
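A minimal pandas sketch of these steps on a tiny hypothetical extract. The cut points for `income_category` and `lifeexp_group` and the rule for `development_level` are placeholders, since the post does not state the thresholds that were actually used.

```python
import numpy as np
import pandas as pd

# Tiny HYPOTHETICAL extract; the real Gapminder file has many more rows and columns
raw = pd.DataFrame({
    "country": ["A", "B", "C", "D", "E"],
    "incomeperperson": ["850", "5000", " ", "30000", "2500"],
    "lifeexpectancy": ["55.2", "76.1", "70.0", "81.3", "64.8"],
    "urbanrate": ["20.5", "75.0", "60.1", "88.2", ""],
})
num_cols = ["incomeperperson", "lifeexpectancy", "urbanrate"]

df = raw.copy()
# Non-numeric entries (blanks, stray text) become NaN ...
df[num_cols] = df[num_cols].apply(pd.to_numeric, errors="coerce")
# ... and only complete cases are kept
df = df.dropna(subset=num_cols).copy()

# HYPOTHETICAL cut points: the thresholds actually used are not stated in the post
df["income_category"] = pd.cut(df["incomeperperson"], bins=[0, 1000, 10000, np.inf],
                               labels=["Low", "Middle", "High"])
df["lifeexp_group"] = pd.cut(df["lifeexpectancy"], bins=[0, 60, 70, 78, np.inf],
                             labels=["Low", "Medium", "High", "Very High"])

# Ordinal codes for correlation analysis (category codes follow the label order)
df["income_ord"] = df["income_category"].cat.codes + 1   # 1 to 3
df["lifeexp_ord"] = df["lifeexp_group"].cat.codes + 1     # 1 to 4

# HYPOTHETICAL rule for development_level combining income and life expectancy
def development_level(row):
    if row["income_ord"] == 3 and row["lifeexp_ord"] >= 3:
        return "Developed"
    if row["income_ord"] == 1 and row["lifeexp_ord"] == 1:
        return "Undeveloped"
    return "Developing"

df["development_level"] = df.apply(development_level, axis=1)
print(df[["country", "income_category", "lifeexp_group", "development_level"]])
```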
Potential moderator
Testing a Moderator: Urbanization Level
To test whether urbanization level moderates the relationship between income and life expectancy, I calculated Pearson correlation coefficients between ordinal versions of income and life expectancy within each level of urbanization (Very Low, Low, Medium, High).
The results showed that:
In highly urbanized countries, the correlation was stronger (e.g., r = 0.65, p < 0.001)
In very low urbanized countries, the correlation was weaker or non-significant
This suggests that urbanization acts as a moderator: the relationship between income and health outcomes (life expectancy) is stronger in more urbanized environments. Factors like access to infrastructure, services, and healthcare may amplify the benefits of higher income in these contexts.
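A sketch of this stratified-correlation approach on simulated data in which, by construction, income matters more in more urbanized groups. The column names follow the post; the data-generating process is hypothetical. A more formal alternative is to fit a regression with an interaction term (for example `lifeexp_ord ~ income_ord * C(urban_group)` in a statsmodels formula) and test the interaction coefficients.

```python
import numpy as np
import pandas as pd
from scipy.stats import pearsonr

# Simulated ordinal data (NOT the Gapminder values)
rng = np.random.default_rng(4)
n = 400
levels = ["Very Low", "Low", "Medium", "High"]
df = pd.DataFrame({
    "urban_group": rng.choice(levels, n),
    "income_ord": rng.integers(1, 4, n),     # 1 = Low, 2 = Middle, 3 = High
})
# Hypothetical process: income matters more for life expectancy as urbanization rises
slope = df["urban_group"].map({"Very Low": 0.0, "Low": 0.3, "Medium": 0.6, "High": 0.9})
latent = slope * df["income_ord"] + rng.normal(0, 0.6, n)
df["lifeexp_ord"] = pd.qcut(latent, 4, labels=False) + 1   # 1 = Low ... 4 = Very High

# Pearson r between the ordinal codes, computed separately within each urbanization level
results = {}
for level in levels:
    sub = df[df["urban_group"] == level]
    r, p = pearsonr(sub["income_ord"], sub["lifeexp_ord"])
    results[level] = r
    print(f"{level:>8}: r = {r:.2f}, p = {p:.3g}, n = {len(sub)}")
```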
Correlation coefficient
Correlation Analysis Between Income Category and Life Expectancy Group
I calculated a Pearson correlation coefficient between two ordered categorical variables: income category (Low = 1, High = 3) and life expectancy group (Low = 1, Very High = 4).
The correlation coefficient was r = 0.61, with a p-value < 0.001, indicating a strong and statistically significant positive association between the two variables.
The R-squared value, obtained by squaring r, is approximately 0.37 (0.61² ≈ 0.372), meaning that about 37% of the variability in life expectancy group can be linearly accounted for by a country's income level.
These findings support the idea that higher income levels are closely associated with better health outcomes, as reflected in life expectancy.
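For a single pair of variables, R-squared is the square of Pearson's r, so r = 0.61 corresponds to R² ≈ 0.37. A short sketch with `scipy.stats.pearsonr` on simulated ordinal codes (not the Gapminder values):

```python
import numpy as np
from scipy.stats import pearsonr

# Simulated ordinal codes (NOT the Gapminder values)
rng = np.random.default_rng(5)
income_ord = rng.integers(1, 4, 200)                                  # 1 = Low ... 3 = High
lifeexp_ord = np.clip(income_ord + rng.integers(-1, 2, 200), 1, 4)    # 1 = Low ... 4 = Very High

r, p = pearsonr(income_ord, lifeexp_ord)
print(f"r = {r:.2f}, p = {p:.3g}, R-squared = {r ** 2:.2f}")

# For the value reported above, R-squared is simply r squared
print(f"{0.61 ** 2:.2f}")   # 0.37
```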
HW2 chi square test
Chi-Square Test of Independence with Bonferroni Adjustment
I conducted a Chi-Square Test of Independence to examine the relationship between income category (Low, Middle, High) and life expectancy group (Low, Medium, High, Very High). The test was statistically significant (p < 0.0001), indicating that income category and life expectancy group are not independent.
To explore which group combinations were driving the significance, I computed standardized residuals and applied a Bonferroni correction to adjust for multiple comparisons across the 12 cells in the contingency table.
The results revealed that:
High income countries were significantly overrepresented in the Very High life expectancy group.
Low income countries were overrepresented in the Low and Medium life expectancy groups.
Middle income countries were more balanced but slightly underrepresented in the extremes.
These findings show that income level is not only related to life expectancy overall, but that specific patterns emerge when individual category combinations are examined. Comparing each cell against a Bonferroni-adjusted significance level (0.05 / 12 ≈ 0.0042) reduces the risk that these cell-level conclusions are false positives arising from multiple comparisons.
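A sketch of one way to run this analysis with SciPy. The post does not say which residuals were used; adjusted standardized residuals (observed minus expected, divided by sqrt(E * (1 - row proportion) * (1 - column proportion))) are one common choice and are an assumption here, as is the table of counts. Each cell's two-sided p-value is compared against 0.05 / 12.

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency, norm

# HYPOTHETICAL 3x4 table of counts (NOT the Gapminder values)
table = pd.DataFrame(
    [[20, 15, 5, 1], [8, 14, 16, 9], [1, 3, 10, 25]],
    index=["Low", "Middle", "High"],
    columns=["Low", "Medium", "High", "Very High"],
)

chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.1f}, dof = {dof}, p = {p:.2g}")

# Adjusted standardized residuals for each cell
n = table.to_numpy().sum()
row_p = table.sum(axis=1).to_numpy()[:, None] / n
col_p = table.sum(axis=0).to_numpy()[None, :] / n
adj_resid = (table.to_numpy() - expected) / np.sqrt(expected * (1 - row_p) * (1 - col_p))

# Two-sided p-values, compared against a Bonferroni-adjusted alpha of 0.05 / 12
cell_p = 2 * norm.sf(np.abs(adj_resid))
alpha_bonf = 0.05 / table.size
significant = pd.DataFrame(cell_p < alpha_bonf, index=table.index, columns=table.columns)
print(significant)
```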
Course Data Analysis Tools HW 1
I conducted a one-way ANOVA to test whether life expectancy differs significantly across income categories (Low, Middle, High). The ANOVA results showed a statistically significant effect of income category on life expectancy (p < 0.05).
A follow-up Tukey HSD test indicated that all pairwise comparisons between income groups were significant. Specifically, countries in the High income group had a significantly higher life expectancy than those in the Middle and Low income groups, and the Middle income group had higher life expectancy than the Low income group.
This suggests a clear and strong relationship between income level and life expectancy, consistent with broader development patterns.
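A minimal sketch of a one-way ANOVA followed by Tukey's HSD, using `scipy.stats.f_oneway` and statsmodels' `pairwise_tukeyhsd` on simulated data (the group means below are made up, not the Gapminder estimates):

```python
import numpy as np
import pandas as pd
from scipy.stats import f_oneway
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Simulated stand-in data (NOT the Gapminder values)
rng = np.random.default_rng(6)
means = {"Low": 60, "Middle": 70, "High": 79}
df = pd.DataFrame([
    {"income_category": g, "lifeexpectancy": rng.normal(m, 5)}
    for g, m in means.items() for _ in range(40)
])

# One-way ANOVA across the three income groups
groups = [df.loc[df["income_category"] == g, "lifeexpectancy"] for g in means]
F, p = f_oneway(*groups)
print(f"F = {F:.1f}, p = {p:.3g}")

# Tukey HSD post hoc test for all pairwise group differences
tukey = pairwise_tukeyhsd(endog=df["lifeexpectancy"], groups=df["income_category"], alpha=0.05)
print(tukey.summary())
```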
HW 4 creating graphics
[Figures: income per person distribution; urban rate distribution; life expectancy distribution; income vs. life expectancy scatter plot]
I created univariate plots to explore the distribution of my three key variables: income per person, urbanization rate, and life expectancy.
The income distribution is right-skewed, with most countries earning less than $5,000 per person and only a few earning over $20,000. The urban rate shows a more uniform distribution, while life expectancy is concentrated between 60 and 80 years.
The bivariate scatter plot between income per person and life expectancy shows a clear positive correlation: countries with higher incomes tend to have longer life expectancy. Furthermore, when we color-code by development level, we see that developed countries cluster in the top-right quadrant, while undeveloped ones remain in the lower-left.
These visualizations are consistent with economic wealth being a strong predictor of health and development outcomes.
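The skewness visible in the histograms can also be checked numerically. A short sketch with pandas `Series.skew()` on simulated data, using a log-normal distribution as a hypothetical stand-in for income: positive skewness together with a mean well above the median indicates a right-skewed distribution.

```python
import numpy as np
import pandas as pd

# Simulated stand-ins (NOT the Gapminder values)
rng = np.random.default_rng(7)
income = pd.Series(rng.lognormal(mean=8, sigma=1.2, size=190))   # right-skewed, like income
lifeexp = pd.Series(rng.normal(70, 8, size=190))                 # roughly symmetric

for name, s in [("incomeperperson", income), ("lifeexpectancy", lifeexp)]:
    print(f"{name}: skew = {s.skew():.2f}, mean = {s.mean():.1f}, median = {s.median():.1f}")
```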
HW3 data management and new secondary variables
After managing the dataset, I created three new variables: income_category, urban_group, and development_level. These allow for easier interpretation and categorization of complex continuous variables.
The Income Category variable shows that the majority of countries fall into the Low Income group, with a smaller proportion classified as High Income.
For Urban Rate, most countries fall between the Medium and High categories, showing relatively strong urbanization trends across the dataset.
The Life Expectancy variable has a more balanced distribution, with many countries falling in the Medium and High categories, and fewer in the Very High range.
I also created a new Development Level variable combining income and life expectancy. Most countries are classified as Developing, with only a few categorized as Developed or Undeveloped.
Missing data was handled by dropping rows with NaN values in the selected variables. This step ensured that all frequency distributions were based on clean, complete records.
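A short sketch of how such frequency tables can be produced with pandas after dropping incomplete rows; the values are hypothetical:

```python
import pandas as pd

# HYPOTHETICAL categorical values for a handful of countries
df = pd.DataFrame({
    "income_category": ["Low", "Low", "Middle", "High", "Low", None],
    "development_level": ["Undeveloped", "Developing", "Developing", "Developed", "Developing", "Developing"],
})

# Drop rows missing any of the selected variables, then tabulate counts and percentages
clean = df.dropna(subset=["income_category", "development_level"])
for col in ["income_category", "development_level"]:
    counts = clean[col].value_counts()
    percents = clean[col].value_counts(normalize=True) * 100
    print(pd.DataFrame({"count": counts, "percent": percents.round(1)}), "\n")
```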
Frequency distributions of incomeperperson, urban rate, and life expectancy
I selected three key variables from the Gapminder dataset: income per person, urbanization rate, and life expectancy. To better understand their distributions, I grouped each variable into five equal-width bins and calculated their frequency distributions.
The income per person variable shows that most countries fall into the lowest income brackets, with a majority of them earning between $295 and $4,520 per person. Only a few countries fall into the highest income group (above $17,200), which reflects global income inequality.
For the urbanization rate, the data is more evenly distributed across the five categories. This suggests a relatively balanced spread between countries with low, medium, and high levels of urbanization.
The life expectancy variable shows a concentration of countries in the mid-to-high range, indicating that many countries now have relatively long life expectancies, although some still fall behind.
Before the analysis, all non-numeric or missing values were converted to NaN and removed using the .dropna() function. This step ensured that the frequency distributions were based on complete, valid observations only. A small number of rows were excluded due to missing values.
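A sketch of the binning step with pandas, on simulated income values with two deliberately non-numeric entries. `pd.cut(..., bins=5)` creates five equal-width intervals over the observed range, and `value_counts(sort=False)` keeps the bins in order rather than sorting by frequency.

```python
import numpy as np
import pandas as pd

# Simulated income values stored as text, plus two invalid entries (NOT the Gapminder values)
rng = np.random.default_rng(8)
raw = pd.Series(np.append(rng.lognormal(8, 1.2, 100).round(0).astype(str), ["", "n/a"]))

income = pd.to_numeric(raw, errors="coerce").dropna()   # non-numeric -> NaN -> removed

# Five equal-width bins spanning the observed range
bins = pd.cut(income, bins=5)
freq = bins.value_counts(sort=False)
print(pd.DataFrame({"count": freq, "percent": (freq / freq.sum() * 100).round(1)}))
```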
the correlation between income per person and the urban rate GAPMINDER
1. I've decided to research the correlation between income per person and urban rate.
Hypothesis: income per person directly affects the urban rate.
Reference: Bishaw, A., & Posey, K. G. (2016, December 8). A comparison of rural and urban America: Household income and poverty. U.S. Census Bureau, Random Samplings Blog. https://www.census.gov/newsroom/blogs/random-samplings/2016/12/a_comparison_of_rura.html
Reference: Zhong, S., Wang, M., Zhu, Y., Chen, Z., & Huang, X. (2022). Urban expansion and the urban–rural income gap: Empirical evidence from China. Cities, 129, Article 103831. https://doi.org/10.1016/j.cities.2022.103831
Summary
Bishaw & Posey (2016) – U.S. Census Bureau
This official blog post from the U.S. Census Bureau provides a comparison between urban and rural areas in the United States, focusing on household income and poverty levels. Using data from the American Community Survey (ACS), the authors show that in 2015, rural households had significantly lower median incomes and higher poverty rates than urban households. The post highlights the persistent economic gap between the two areas despite overall economic growth. It also considers factors such as demographic composition, access to services, and labor market conditions, emphasizing the structural differences that influence economic well-being across rural and urban populations.
Zhong et al. (2022) – Cities (ScienceDirect)
This peer-reviewed study analyzes how urban expansion in China affects the income gap between urban and rural regions. Using econometric methods on provincial-level data from 2000 to 2017, the authors find that urban expansion can both narrow and widen the income gap, depending on the mechanisms involved. When urban growth promotes rural labor integration and enhances physical and economic connectivity, inequality tends to decrease. However, if urban development is driven by real estate speculation or excludes rural populations, it can exacerbate inequality. The article offers a nuanced view of urban development, stressing that the distributive impact largely depends on the policy frameworks guiding expansion.
Overall Comparison
Both sources address urban–rural inequality but from different national contexts (the U.S. and China) and methodological approaches: descriptive statistical analysis versus econometric modeling. They agree that rural areas face structural disadvantages in terms of income and poverty. However, they diverge in their exploration of underlying causes. Zhong et al. (2022) delve into the policy-driven nature of urban development and its effects on inequality, while Bishaw and Posey (2016) focus more on presenting disparities without a deep analysis of causal mechanisms.