#IncomeLevels
usnewsper-politics · 8 months
Text
Trump Dominates Republican Field in Iowa: Poll Reveals Strong Support for President's Promises and Performance #agegroups #approvalrating #BillWeld #campaignpromises #demographics #economy #educationalbackgrounds #financialstatus #frontrunner #healthcare #immigration #incomelevels #Iowacaucuses #jobs #Republicanfield #Republicanpresidentialnomination. #trump
0 notes
rankrancherpro · 1 year
Text
A Quick Look into the Cost of Living in Loudoun County, VA
Loudoun County, located in the picturesque state of Virginia, is a sought-after destination for residents seeking a high quality of life. As with any region, it's crucial to clearly understand the cost of living before deciding to relocate or settle down in a particular area. This comprehensive guide will explore the cost of living in Loudoun County, VA, covering various aspects such as housing, transportation, healthcare, education, and more. By the end of this article, you'll thoroughly understand the expenses associated with residing in this beautiful county.
Cost of Living in Loudoun County, VA
Loudoun County, Virginia, offers a diverse and vibrant community nestled within stunning natural landscapes and conveniently located near the nation's capital. Before delving into the specifics, let's explore the overall cost of living in Loudoun County, VA, to set the stage for a detailed examination of each expense category.
Housing Costs in Loudoun County, VA
The cost of housing is a significant factor when considering the overall cost of living in any area. In Loudoun County, VA, the real estate market is competitive, with various options available to suit different budgets and preferences.
Average Home Prices in Loudoun County, VA
The average home prices in Loudoun County, VA, vary depending on the specific location within the county. According to recent data, the median home price in the county is around $600,000. However, prices can range from the mid-$400,000s for townhomes to multi-million dollar luxury estates.
Renting in Loudoun County, VA
Loudoun County offers a range of options for those who prefer to rent, from apartments to single-family homes. Rental prices typically vary based on location, size, and amenities. On average, you can expect to pay around $2,000 per month for a two-bedroom apartment in Loudoun County.
Cost of Utilities
When budgeting for housing expenses, it's essential to consider utility costs. These include electricity, water, gas, and other necessary services. On average, residents in Loudoun County can expect to pay around $150-$200 per month for utilities, depending on usage and the property size.
Transportation Costs in Loudoun County, VA
Transportation expenses encompass various factors such as commuting, fuel costs, and public transportation options. Understanding these costs is essential for individuals living or planning to move to Loudoun County, VA.
Commuting Costs
Loudoun County benefits from its proximity to major cities like Washington, D.C., which opens up numerous employment opportunities. However, commuting expenses can vary depending on the distance traveled and the mode of transportation used. For example, if you commute to D.C. via car, you may need to budget for tolls, parking fees, and fuel costs.
Public Transportation
Loudoun County offers several options for individuals who prefer public transportation, including buses and commuter trains. The Washington Metropolitan Area Transit Authority (WMATA) operates bus services throughout the county, providing an affordable and convenient alternative to driving. Monthly passes for bus services in Loudoun County generally range from $100 to $200, depending on the specific routes and zones.
Healthcare Costs in Loudoun County, VA
Access to quality healthcare is critical when assessing the cost of living in any area. Loudoun County boasts excellent medical facilities and a range of healthcare services.
Healthcare Facilities in Loudoun County
Loudoun County is home to reputable hospitals, medical centers, and clinics that offer comprehensive healthcare services. These include the Inova Loudoun Hospital, StoneSprings Hospital Center, and various specialized medical practices. The county's healthcare system ensures residents have access to quality care for routine check-ups, emergencies, and specialized treatments.
Health Insurance Costs
To handle healthcare expenses efficiently, evaluating available health insurance alternatives is crucial. The cost of health insurance in Loudoun County, VA, can vary depending on coverage level, individual or family plans, and employer contributions. Exploring different health insurance providers and plans is advisable to find one that suits your specific needs and budget.
Education Costs in Loudoun County, VA
For families with children, education expenses are a crucial factor to consider when evaluating the cost of living in an area. Loudoun County, VA, boasts a reputable education system, offering a range of public and private schooling options.
Public Schools in Loudoun County
Loudoun County Public Schools (LCPS) are known for their excellence in education. The county has a robust network of elementary, middle, and high schools that provide quality education to its students. Public schooling in Loudoun County is funded through property taxes, eliminating the need for tuition fees.
Private Schools in Loudoun County
In addition to public schools, Loudoun County is home to several private educational institutions. Private schools often offer specialized curricula, smaller classes, and unique educational approaches. However, it's important to note that private schools typically involve tuition costs, which can vary significantly depending on the institution and grade level.
Entertainment and Lifestyle Costs in Loudoun County, VA
Living in Loudoun County offers a wide range of recreational and lifestyle opportunities. It's important to consider entertainment costs when assessing the overall cost of living in the area.
Dining and Entertainment
Loudoun County boasts a vibrant culinary scene, with numerous restaurants, cafes, and bars offering diverse cuisines and dining experiences. The cost of dining out can vary depending on the establishment and the type of cuisine. On average, a meal for two at a mid-range restaurant in Loudoun County can cost between $50 and $100, excluding beverages.
Recreation and Outdoor Activities
Loudoun County is renowned for its beautiful parks, nature trails, and outdoor recreational opportunities. Residents can enjoy hiking, biking, fishing, and exploring the county's historical landmarks. Many recreational activities are low-cost or free, providing ample opportunities for affordable entertainment and enjoyment.
FAQs
Q: What is the overall cost of living in Loudoun County, VA?
The overall cost of living in Loudoun County, VA, includes housing, transportation, healthcare, education, and lifestyle expenses. It's critical to assess each category to understand the costs associated with residing in the county.
Q: How much does housing cost in Loudoun County, VA?
Housing costs in Loudoun County, VA, vary depending on location, property type, and size. The median home price in the county is approximately $600,000, while rental fees for a two-bedroom apartment average around $2,000 monthly.
Q: What are the transportation options in Loudoun County, VA?
Loudoun County offers various transportation options, including private vehicles, public buses, and commuter trains. Commuting costs can vary based on the distance traveled and the mode of transportation used.
Q: What healthcare facilities are available in Loudoun County, VA?
Loudoun County is home to reputable hospitals, medical centers, and clinics. The county's healthcare system ensures that residents have access to quality care for routine check-ups, emergencies, and specialized treatments.
For more detailed insight about the cost of living in Loudoun County, please visit this page: https://juanitatool.com/loudon-county-cost-of-living/
0 notes
cjdathw4 · 3 years
Text
Homework, Week 4
This homework tests a potential moderator in the context of the ANOVA, Chi-Square, and Pearson correlation tests, respectively. In each of these tests, the basic question is: is the explanatory variable associated with the response variable for each population subgroup, i.e., for each level of our third variable (the moderator)?
The Moderation on ANOVA Test:
Our initial ANOVA test examines differences in the mean number of cigarettes smoked among young adults across five levels of self-perceived health, without taking the moderator, nicotine dependence, into consideration.
The data for this study come from the U.S. National Epidemiological Survey on Alcohol and Related Conditions (NESARC). The response variable "NUMCIGMO_EST" is the number of cigarettes smoked in a month by young adults age 18 to 25 who have smoked in the past 12 months. The explanatory variable is column "S1Q16", described as "SELF-PERCEIVED CURRENT HEALTH", with values from 1 (excellent) to 5 (poor). We run the ANOVA using an OLS model to calculate the F-statistic and p-value; the output follows.
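For reference, the OLS call that produces this output, reproduced from the full Python script at the end of this post:

model2 = smf.ols(formula='NUMCIGMO_EST ~ C(S1Q16)', data=sub2).fit()
print(model2.summary())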
[Image: OLS model summary output]
With an F-statistic of 5.001 and a p-value of 0.000567, I can safely reject the null hypothesis and conclude that there is an association between the number of cigarettes smoked and health condition.
Now, to see whether the number of cigarettes smoked is associated with health condition separately for those with and for those without nicotine dependence, we run two separate ANOVAs, one for each of the two values of nicotine dependence. The following code snippet separates the original data into the two population subgroups:
[Image: code snippet]
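The subgroup split, reproduced from the full script at the end of this post:

sub2 = sub1[['NUMCIGMO_EST', 'S1Q16', 'TAB12MDX']].dropna()
sub2_wonicdep = sub2[sub2['TAB12MDX']==0]   # subgroup without nicotine dependence
sub2_wnicdep = sub2[sub2['TAB12MDX']==1]    # subgroup with nicotine dependence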
We run the ANOVA test against the two data sets, sub2_wonicdep and sub2_wnicdep, using the OLS model and get the following results:
* sub2_wonicdep (subgroup without nicotine dependence)
[Image: OLS summary for the subgroup without nicotine dependence]
For the sample population without nicotine dependence, the p-value is 0.380, so we cannot reject the null hypothesis, meaning there is no evidence of an association between cigarettes smoked and health condition. This reverses the result of our initial ANOVA test.
* sub2_wnicdep (subgroup with nicotine dependence)
[Image: OLS summary for the subgroup with nicotine dependence]
For the sample population with nicotine dependence, the p-value is 0.0369, so we can reject the null hypothesis, meaning there is an association between cigarettes smoked and health condition. This matches the result of our initial ANOVA test.
Conclusion:
Comparing the ANOVA results of the two population subgroups indicates a significant statistical interaction between the number of cigarettes smoked, health condition, and nicotine dependence status: the third variable moderates the association between the first two.
The Moderation on Chi-Square Test:
Our Chi-Square test examines how much family income level is related to social phobia for working adults age 18 to 66. The data for this study come from the U.S. National Epidemiological Survey on Alcohol and Related Conditions (NESARC) in nesarc.csv. Social phobia (SOCPDX12) is our categorical response variable, distinguishing working adults with and without social phobia, and "S1Q11B" is the total family income coded in 21 bins; we recode the "S1Q11B" values into 3 bins, stored in "INCOMELEVEL", as our categorical explanatory variable.
The crosstab statement below generates the observed contingency table of "SOCPDX12" and "INCOMELEVEL"; the resulting table follows.
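The calls, reproduced from the full script at the end of this post:

ct1 = pandas.crosstab(sub1['SOCPDX12'], sub1['INCOMELEVEL'])                           # observed counts
colpct1 = pandas.crosstab(sub1['SOCPDX12'], sub1['INCOMELEVEL'], normalize='columns')  # column percentages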
INCOMELEVEL      1      2      3
SOCPDX12
0.0           4809   5405   1604
1.0            196    155     56
The column percentages of social phobia for the different income groups are calculated as:
INCOMELEVEL          1           2           3
SOCPDX12
0.0           0.960839    0.972122    0.966265
1.0           0.039161    0.027878    0.033735
To help compare with the Chi-Square tests of the two population subgroups later on, we create a bar chart of the percentage table for this test:
[Image: bar chart of social phobia rate by income level]
From this figure, we see that the lowest social phobia rate occurs in the middle-income group.
We run the Chi-Square test with the following lines:
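Reproduced from the full script at the end of this post:

cs1 = scipy.stats.chi2_contingency(ct1)
print('\nchi-square value, p value, expected counts (without moderator):\n', cs1)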
Here is the result of the Chi-Square test for the population without moderation:
chi-square value, p value, expected counts (without moderator):
(10.430752023058572, 0.005432390450963244, 2,
 array([[4838.37137014, 5374.89406953, 1604.73456033],
        [ 166.62862986,  185.10593047,   55.26543967]]))
The p-value is 0.0054 (less than 0.05), so the null hypothesis is rejected, which suggests there is an association between social phobia rate and income level.
Now we are interested in whether depression status statistically moderates the relationship between social phobia rate and income level, so we run two Chi-Square tests against two population subgroups: one with a depression diagnosis and one without. The following code snippet separates the original data into the two subgroups:
[Image: code snippet]
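The split, reproduced from the full script at the end of this post:

sub1_WithDepression = sub1[sub1['MAJORDEPLIFE']==1]
sub1_WithoutDepression = sub1[sub1['MAJORDEPLIFE']==0]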
* Subgroup sub1_WithoutDepression:
The Chi-Square test for the sample without depression is performed as indicated below:
[Image: code snippet]
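The corresponding calls, reproduced from the full script at the end of this post:

ct_wodep = pandas.crosstab(sub1_WithoutDepression['SOCPDX12'], sub1_WithoutDepression['INCOMELEVEL'])
cs_wodep = scipy.stats.chi2_contingency(ct_wodep)
print('\nchi-square value, p value, expected counts (without depression):\n', cs_wodep)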
The column percentages of the social phobia rate among the different income levels and the line chart for this subgroup are:
[Images: column percentage table and line chart]
And here is the result of the Chi-Square Test:
chi-square value, p value, expected counts (without depression):
(2.175664280282396, 0.33694615456278704, 2,
 array([[3760.80646833, 4389.08468954, 1350.10884213],
        [  58.19353167,   67.91531046,   20.89115787]]))
The p-value is 0.337 (greater than 0.05), so the null hypothesis cannot be rejected, suggesting there is no association between social phobia rate and income level for the subgroup without depression. This result is reversed from what we obtained for the population without considering the moderator, so an interaction appears once the moderator "depression status" is in place.
* Subgroup sub1_WithDepression:
The Chi-Square test for the sample with depression is performed as indicated below:
[Image: code snippet]
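The corresponding calls, reproduced from the full script at the end of this post:

ct_wdep = pandas.crosstab(sub1_WithDepression['SOCPDX12'], sub1_WithDepression['INCOMELEVEL'])
cs_wdep = scipy.stats.chi2_contingency(ct_wdep)
print('\nchi-square value, p value, expected counts (with depression):\n', cs_wdep)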
The column percentages of the social phobia rate among the different income levels and the line chart for this subgroup are:
[Images: column percentage table and line chart]
And here is the result of the Chi-Square Test:
chi-square value, p value, expected counts (with depression):
(9.606839887834782, 0.008201649848468563, 2,
 array([[1066.3878976 ,  991.7587277 ,  259.85337471],
        [ 119.6121024 ,  111.2412723 ,   29.14662529]]))
The p-value is 0.0082 (less than 0.05), so the null hypothesis is rejected; this matches the result for the population without considering the moderator.
Conclusion:
Comparing the line charts of the two subgroups indicates that major depression status does moderate the relationship between social phobia rate and income level: the association holds only for the subgroup with depression.
The Moderation on Pearson Coefficient Test:
For the Pearson correlation tests here, we research the relationship between income and life expectancy among the countries of the world, using data from GapMinder, a non-profit venture promoting sustainable global development. The source data is the GapMinder-published "gapminder.csv". We use the 'incomeperperson' and 'lifeexpectancy' columns as our two quantitative variables.
Before performing the Pearson correlation test, we create a scatter plot of the two quantitative variables 'incomeperperson' and 'lifeexpectancy':
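The plotting call, reproduced from the full script at the end of this post:

plt.figure()
scat1 = seaborn.regplot(x="incomeperperson", y="lifeexpectancy", fit_reg=True, data=data)
plt.xlabel('Income per Person')
plt.ylabel('life expectancy')
plt.show()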
[Image: scatter plot of income per person vs. life expectancy]
Looking at the scatter plot, we can see the association appears positive; that is, higher life expectancy is associated with higher income.
Next, let us find the correlation coefficient. We request a Pearson correlation measuring the association between 'incomeperperson' and 'lifeexpectancy':
[Image: code snippet]
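The call, reproduced from the full script at the end of this post (data_clean is the data frame with missing values dropped):

print(scipy.stats.pearsonr(data_clean['incomeperperson'], data_clean['lifeexpectancy']))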
Here is the result for the two quantitative variables:
[Image: Pearson correlation output]
The result shows a significant association with a correlation of 0.61. But might this correlation between income and life expectancy differ across countries with different polities? To explore this question, we create a third variable, "politygroup", derived from the column "polityscore", which classifies a country into one of three categorical values: Autocracy, Anocracy, and Democracy. The following code performs the classification:
[Image: code snippet]
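Reproduced from the full script at the end of this post (the cut points follow the Polity Project ranges: autocracy <= -6, anocracy -5 to 5, democracy >= 6):

data_clean['politygroup'] = data_clean['polityscore'].apply(
    lambda x: (x <= (-6) and 1) or (x <= 5 and 2) or 3)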
Here are the three subgroups, one per polity group:
[Image: code snippet]
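Reproduced from the full script at the end of this post:

autocracy_data = data_clean[data_clean['politygroup']==1]
anocracy_data = data_clean[data_clean['politygroup']==2]
democracy_data = data_clean[data_clean['politygroup']==3]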
Running the pearsonr() function against the three subgroups with "incomeperperson" and "lifeexpectancy" as the x and y parameters, we obtain the following Pearson coefficient for each subgroup:
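The calls, reproduced from the full script at the end of this post:

print(scipy.stats.pearsonr(autocracy_data['incomeperperson'], autocracy_data['lifeexpectancy']))
print(scipy.stats.pearsonr(anocracy_data['incomeperperson'], anocracy_data['lifeexpectancy']))
print(scipy.stats.pearsonr(democracy_data['incomeperperson'], democracy_data['lifeexpectancy']))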
[Image: Pearson correlation output for each subgroup]
The democracy group has a Pearson coefficient as significant as that of the original population. The other two groups show weaker Pearson coefficients that are not significant (higher p-values).
Python script for the homework 4:
# -*- coding: utf-8 -*-
"""
Created on Sat Mar 27 23:02:32 2021

@author: Chris
"""

import numpy
import pandas
import seaborn
import scipy
import statsmodels.formula.api as smf
import statsmodels.stats.multicomp as multi     # to conduct post hoc paired comparisons in the context of my ANOVA
import matplotlib.pyplot as plt

'''
Synopsis:

The standard way of asking this question in the context of analysis of variance is to move to a two-way or
two-factor analysis of variance, rather than the one-way or one-factor ANOVA that we've been using. Instead,
we're actually asking: is our explanatory variable associated with our response variable for each population
subgroup, or each level of our third variable?

If there is a significant statistical interaction between the variables diet and weight loss and the type of
exercise, our third variable moderates the association between diet and weight loss.

data columns:

   S1Q16:          SELF-PERCEIVED CURRENT HEALTH
   TAB12MDX:       NICOTINE DEPENDENCE IN THE LAST 12 MONTHS
   CHECK321:       CIGARETTE SMOKING STATUS (smoked in the past 12 months)
'''

''' =====    ANOVA    ===== '''

data = pandas.read_csv('../Week 1/nesarc.csv', low_memory=False)

data['S3AQ3B1'] = pandas.to_numeric(data['S3AQ3B1'], errors="coerce")       # how often
data['S3AQ3C1'] = pandas.to_numeric(data['S3AQ3C1'], errors="coerce")       # how many cigarettes
data['CHECK321'] = pandas.to_numeric(data['CHECK321'], errors="coerce")     # within past 12 months?

# subset data to young adults age 18 to 25 who have smoked in the past 12 months
sub1 = data[(data['AGE']>=18) & (data['AGE']<=25) & (data['CHECK321']==1)]

# SETTING MISSING DATA
sub1['S3AQ3B1'] = sub1['S3AQ3B1'].replace(9, numpy.nan)
sub1['S3AQ3C1'] = sub1['S3AQ3C1'].replace(99, numpy.nan)

# recoding number of days smoked in the past month
recode1 = {1: 30, 2: 22, 3: 14, 4: 5, 5: 2.5, 6: 1}
sub1['USFREQMO'] = sub1['S3AQ3B1'].map(recode1)                             # *** converted smoking-days/month

# converting new variable USFREQMO to numeric
sub1['USFREQMO'] = pandas.to_numeric(sub1['USFREQMO'], errors="coerce")

# Creating a secondary variable multiplying the days smoked/month and the number of cig/per day
# This creates a new quantitative variable that measures the number of cigarettes smoked in the past month
sub1['NUMCIGMO_EST'] = sub1['USFREQMO'] * sub1['S3AQ3C1']                   # *** total cig/month

sub1['NUMCIGMO_EST'] = pandas.to_numeric(sub1['NUMCIGMO_EST'], errors="coerce")

sub1['S1Q16'] = sub1['S1Q16'].replace(9, numpy.nan)

# NUMCIGMO_EST - response variable
# S1Q16        - explanatory variable
# TAB12MDX     - moderator
sub2 = sub1[['NUMCIGMO_EST', 'S1Q16', 'TAB12MDX']].dropna()

sub2_wonicdep = sub2[sub2['TAB12MDX']==0]
sub2_wnicdep = sub2[sub2['TAB12MDX']==1]

'''
    ANOVA:  are the mean cigarettes smoked among different health groups
            equal?   (without moderator)
'''
# By running an analysis of variance, we're asking whether the number of cigarettes
# smoked differs for different groups of health condition
model2 = smf.ols(formula='NUMCIGMO_EST ~ C(S1Q16)', data=sub2).fit()
print (model2.summary())

print ('means for numcigmo_est by various health groups (no moderator)')
m2 = sub2.groupby('S1Q16').mean()
print (m2['NUMCIGMO_EST'])

print ('std for numcigmo_est by various health groups (no moderator)')
sd2 = sub2.groupby('S1Q16').std()
print (sd2.NUMCIGMO_EST)

'''
    ANOVA:  are the mean cigarettes smoked among different health groups
            equal?   (with moderator => Nicotine Dependence: yes)
'''
model3 = smf.ols(formula='NUMCIGMO_EST ~ C(S1Q16)', data=sub2_wnicdep).fit()
print (model3.summary())

print ('means for numcigmo_est by various health groups (nicotine dependence)')
m3 = sub2_wnicdep.groupby('S1Q16').mean()
print (m3['NUMCIGMO_EST'])

print ('std for numcigmo_est by various health groups (nicotine dependence)')
sd3 = sub2_wnicdep.groupby('S1Q16').std()
print (sd3['NUMCIGMO_EST'])

'''
    ANOVA:  are the mean cigarettes smoked among different health groups
            equal?   (with moderator => Nicotine Dependence: no)
'''
model4 = smf.ols(formula='NUMCIGMO_EST ~ C(S1Q16)', data=sub2_wonicdep).fit()
print (model4.summary())

print ('means for numcigmo_est by various health groups (no nicotine dependence)')
m4 = sub2_wonicdep.groupby('S1Q16').mean()
print (m4['NUMCIGMO_EST'])

print ('std for numcigmo_est by various health groups (no nicotine dependence)')
sd4 = sub2_wonicdep.groupby('S1Q16').std()
print (sd4['NUMCIGMO_EST'])

'''
   ====================================
      Chi-Square Test with a moderator
   ====================================
'''

data = pandas.read_csv('../Week 1/nesarc.csv', low_memory=False)
data['CHECK321'] = pandas.to_numeric(data['CHECK321'], errors='coerce')

'''
   Income level vs. Social Phobia, without moderator
'''

sub1 = data[(data['AGE'] >= 18) & (data['AGE'] <= 66)]
sub1['INCOMELEVEL'] = sub1['S1Q11B'].apply(lambda x: (x <= 8 and 1) or
                                           (x <= 14 and 2) or 3)

colpct1 = pandas.crosstab(sub1['SOCPDX12'],
                          sub1['INCOMELEVEL'], normalize='columns')
print('\ncolumn percentage (without moderator):\n', colpct1)

plt.figure()
seaborn.factorplot('INCOMELEVEL', 'SOCPDX12',
                   data=sub1, kind='bar', ci=None)
plt.xlabel('Working Adult Income Level')
plt.ylabel('Social Phobia Rate')
plt.show()

ct1 = pandas.crosstab(sub1['SOCPDX12'], sub1['INCOMELEVEL'])
print('\ncontingency table (without moderator):\n', ct1)

cs1 = scipy.stats.chi2_contingency(ct1)   # ct1's index: response, columns: explanatory
print('\nchi-square value, p value, expected counts (without moderator):\n', cs1)

'''
     split data into two groups: with depression and without depression
'''

sub1_WithDepression = sub1[sub1['MAJORDEPLIFE']==1]
sub1_WithoutDepression = sub1[sub1['MAJORDEPLIFE']==0]

'''
   Income level vs. Social Phobia, moderator: with depression
'''

colpct_wdep = pandas.crosstab(sub1_WithDepression['SOCPDX12'],
                              sub1_WithDepression['INCOMELEVEL'],
                              normalize='columns')
print('\ncolumn percentage (with depression):\n', colpct_wdep)

ct_wdep = pandas.crosstab(sub1_WithDepression['SOCPDX12'],
                          sub1_WithDepression['INCOMELEVEL'])
print('\ncontingency table (with depression):\n', ct_wdep)

cs_wdep = scipy.stats.chi2_contingency(ct_wdep)
print('\nchi-square value, p value, expected counts (with depression):\n', cs_wdep)

'''
   Income level vs. Social Phobia, moderator: without depression
'''

colpct_wodep = pandas.crosstab(sub1_WithoutDepression['SOCPDX12'],
                               sub1_WithoutDepression['INCOMELEVEL'],
                               normalize='columns')
print('\ncolumn percentage (without depression):\n', colpct_wodep)

ct_wodep = pandas.crosstab(sub1_WithoutDepression['SOCPDX12'],
                           sub1_WithoutDepression['INCOMELEVEL'])
print('\ncontingency table (without depression):\n', ct_wodep)

cs_wodep = scipy.stats.chi2_contingency(ct_wodep)
print('\nchi-square value, p value, expected counts (without depression):\n', cs_wodep)

plt.figure()
seaborn.factorplot(x="INCOMELEVEL", y="SOCPDX12", data=sub1_WithDepression,
                   kind="point", ci=None)
plt.xlabel('Working Adult Income Level')
plt.ylabel('Social Phobia Rate')
plt.title('association between Income Level and Social Phobia for those WITH depression')
plt.show()

plt.figure()
seaborn.factorplot(x="INCOMELEVEL", y="SOCPDX12", data=sub1_WithoutDepression,
                   kind="point", ci=None)
plt.xlabel('Working Adult Income Level')
plt.ylabel('Social Phobia Rate')
plt.title('association between Income Level and Social Phobia for those WITHOUT depression')
plt.show()

'''
   Pearson Test with a moderator
'''

data = pandas.read_csv(r'../../Resources/gapminder.csv', low_memory=False)
data['incomeperperson'] = pandas.to_numeric(data['incomeperperson'], errors='coerce')
data['lifeexpectancy'] = pandas.to_numeric(data['lifeexpectancy'], errors='coerce')
data['polityscore'] = pandas.to_numeric(data['polityscore'], errors='coerce')

plt.figure()
scat1 = seaborn.regplot(x="incomeperperson", y="lifeexpectancy", fit_reg=True, data=data)
plt.xlabel('Income per Person')
plt.ylabel('life expectancy')
plt.title('Scatterplot for the Association Between Income per Person and life expectancy')
plt.show()

data_clean = data.dropna()   # drop rows with NaN in any column

print ('association between incomeperperson and lifeexpectancy')
print (scipy.stats.pearsonr(data_clean['incomeperperson'], data_clean['lifeexpectancy']))

# classification is based on info on the Polity Project website, https://www.systemicpeace.org/polityproject.html
data_clean['politygroup'] = data_clean['polityscore'].apply(
    lambda x: (x <= (-6) and 1) or (x <= 5 and 2) or 3)

autocracy_data = data_clean[data_clean['politygroup']==1]
anocracy_data = data_clean[data_clean['politygroup']==2]
democracy_data = data_clean[data_clean['politygroup']==3]

print ('association between incomeperperson and lifeexpectancy in Autocracy Polity')
print (scipy.stats.pearsonr(autocracy_data['incomeperperson'], autocracy_data['lifeexpectancy']))

print ('association between incomeperperson and lifeexpectancy in Anocracy Polity')
print (scipy.stats.pearsonr(anocracy_data['incomeperperson'], anocracy_data['lifeexpectancy']))

print ('association between incomeperperson and lifeexpectancy in Democracy Polity')
print (scipy.stats.pearsonr(democracy_data['incomeperperson'], democracy_data['lifeexpectancy']))
0 notes
cjdathw2 · 4 years
Text
Assignment, Week 2
This research studies how much family income level is related to social phobia for working adults age 18 to 66. The data for this study come from the U.S. National Epidemiological Survey on Alcohol and Related Conditions (NESARC) in nesarc.csv. Social phobia (SOCPDX12) is our categorical response variable, distinguishing working adults with and without social phobia, and "S1Q11B" is the total family income coded in 21 bins; we recode the "S1Q11B" values into 3 bins, stored in "INCOMELEVEL", as our categorical explanatory variable.
The family income "S1Q11B" is recoded into 3 income levels "INCOMELEVEL" as follows:
   group 1: family income below $30,000
   group 2: family income $30,000 ~ $79,999
   group 3: family income above $80,000
The crosstab statement below generates the observed contingency table of "SOCPDX12" and "INCOMELEVEL"; the resulting table follows.
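The calls, reproduced from the full script at the end of this post:

ct1 = pandas.crosstab(sub1['SOCPDX12'], sub1['INCOMELEVEL'])                           # observed counts
colpct1 = pandas.crosstab(sub1['SOCPDX12'], sub1['INCOMELEVEL'], normalize='columns')  # column percentages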
INCOMELEVEL      1      2      3
SOCPDX12
0.0           4809   5405   1604
1.0            196    155     56
The column percentages of social phobia for the different income groups are calculated as:
INCOMELEVEL          1           2           3
SOCPDX12
0.0           0.960839    0.972122    0.966265
1.0           0.039161    0.027878    0.033735
With the library function scipy.stats.chi2_contingency, we compute the chi-square statistic and p-value for the hypothesis test of independence of the observed frequencies in the contingency table; the expected frequencies are computed from the marginal sums under the assumption of independence. The call and its result are shown below:
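Reproduced from the full script at the end of this post:

cs1 = scipy.stats.chi2_contingency(ct1)
print('\noverall, chi-square value, p value, expected counts\n', cs1)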
chi-square value, p value, expected counts
(10.430752023058572, 0.005432390450963244, 2,
 array([[4838.37137014, 5374.89406953, 1604.73456033],
        [ 166.62862986,  185.10593047,   55.26543967]]))
The p-value of the chi-square test indicates that there is an association between the categorical response variable "SOCPDX12" and the categorical explanatory variable "INCOMELEVEL". However, it does not tell us in what way the rates of social phobia (SOCPDX12) differ across the income levels.
To determine which income levels differ from the others, we again need to perform post hoc comparisons between pairs of rates in a way that avoids excessive type 1 error.
To appropriately protect against type 1 error in the context of a chi-square test, we use the post hoc approach known as the Bonferroni adjustment, so that we can evaluate which pairs of social phobia rates differ from one another.
We adjust the significance threshold to make it more difficult to reject the null hypothesis. The adjusted threshold is calculated by dividing 0.05 by the number of comparisons we plan to make. Since we make three comparisons, we reject the null hypothesis only if a p-value is 0.017 or less. The goal is to examine the p-value for each paired comparison and use the Bonferroni-adjusted threshold of 0.017 to evaluate significance.
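A minimal arithmetic sketch of the adjustment, assuming the three pairwise comparisons described here:

alpha = 0.05
n_comparisons = 3                        # group 1 vs 2, group 1 vs 3, group 2 vs 3
alpha_adjusted = alpha / n_comparisons   # 0.0167, i.e. roughly 0.017
print(alpha_adjusted)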
Here are the results of the pairwise comparisons among income level groups 1, 2, and 3.
---   group 1 vs group 2 --- 
Column percentage:
COMP1v2            1.0         2.0
SOCPDX12
0.0           0.960839    0.972122
1.0           0.039161    0.027878
contingency table:
COMP1v2        1.0     2.0
SOCPDX12
0.0           4809    5405
1.0            196     155
chi-square value, p value, expected counts:
(10.091784108935123, 0.0014893184228619205, 1,
 array([[4838.71935637, 5375.28064363],
        [ 166.28064363,  184.71935637]]))
---  group 1 vs  group 3 ---
Column percentage:
COMP1v3            1.0         3.0
SOCPDX12
0.0           0.960839    0.966265
1.0           0.039161    0.033735
Contingency table:
COMP1v3        1.0     3.0
SOCPDX12
0.0           4809    1604
1.0            196      56
chi-square value, p value, expected counts:
(0.8651415094814217, 0.3523038651698611, 1,
 array([[4815.76369092, 1597.23630908],
        [ 189.23630908,   62.76369092]]))
---  group 2 vs  group 3  ---
Column percentage:
COMP2v3            2.0         3.0
SOCPDX12
0.0           0.972122    0.966265
1.0           0.027878    0.033735
contingency table:
COMP2v3        2.0     3.0
SOCPDX12
0.0           5405    1604
1.0            155      56
chi-square value, p value, expected counts:
(1.3462885240203943, 0.24592801149432877, 1,
 array([[5397.51246537, 1611.48753463],
        [ 162.48753463,   48.51246537]]))
Conclusion:
Examining the p-value for each paired comparison and using the adjusted Bonferroni threshold of 0.017 to evaluate significance shows that income level group 1 is significantly different from group 2, as indicated by a p-value of 0.00149, well below 0.017; the chi-square test strongly rejects the null hypothesis that the social phobia rates of income levels 1 and 2 are equal. The chi-square tests of the other pairs, group 1 vs. 3 and group 2 vs. 3, do not show significantly different social phobia rates, as their p-values are larger than 0.017.
Python script for the homework:
# -*- coding: utf-8 -*-
"""
Created on Fri Mar 26 11:03:26 2021

@author: Chris
"""

import pandas
import scipy.stats

data = pandas.read_csv('../Week 1/nesarc.csv', low_memory=False)
data['CHECK321'] = pandas.to_numeric(data['CHECK321'], errors='coerce')

'''
==================================================
   cross table:  Income Level vs. Social Phobia
==================================================
'''

sub1 = data[(data['AGE'] >= 18) & (data['AGE'] <= 66)]
sub1['INCOMELEVEL'] = sub1['S1Q11B'].apply(lambda x: (x <= 8 and 1) or
                                           (x <= 14 and 2) or 3)

colpct1 = pandas.crosstab(sub1['SOCPDX12'], sub1['INCOMELEVEL'],
                          normalize='columns')
print('\noverall, column percent\n', colpct1)

ct1 = pandas.crosstab(sub1['SOCPDX12'], sub1['INCOMELEVEL'])
print('\noverall, contingency table\n', ct1)

cs1 = scipy.stats.chi2_contingency(ct1)
print('\noverall, chi-square value, p value, expected counts\n', cs1)

'''
=========================================
   post hoc comparison:  group 1 vs. group 2
=========================================
'''

recode12 = {1:1, 2:2}
sub1['COMP1v2'] = sub1['INCOMELEVEL'].map(recode12)
colpct12 = pandas.crosstab(sub1['SOCPDX12'], sub1['COMP1v2'],
                           normalize='columns')
print("\ngroup 1 vs group 2, column percent:\n", colpct12)

ct12 = pandas.crosstab(sub1['SOCPDX12'], sub1['COMP1v2'])
cs12 = scipy.stats.chi2_contingency(ct12)
print("\ngroup 1 vs group 2, chi-square value, p value, expected counts\n",
      cs12)

'''
=========================================
   post hoc comparison:  group 1 vs. group 3
=========================================
'''

recode13 = {1:1, 3:3}
sub1['COMP1v3'] = sub1['INCOMELEVEL'].map(recode13)
colpct13 = pandas.crosstab(sub1['SOCPDX12'], sub1['COMP1v3'],
                           normalize='columns')
print("\ngroup 1 vs group 3, column percent:\n", colpct13)

ct13 = pandas.crosstab(sub1['SOCPDX12'], sub1['COMP1v3'])
print('\ngroup 1 vs group 3, contingency table\n', ct13)
cs13 = scipy.stats.chi2_contingency(ct13)
print("\ngroup 1 vs group 3, chi-square value, p value, expected counts\n",
      cs13)

'''
=========================================
   post hoc comparison:  group 2 vs. group 3
=========================================
'''

recode23 = {2:2, 3:3}
sub1['COMP2v3'] = sub1['INCOMELEVEL'].map(recode23)
colpct23 = pandas.crosstab(sub1['SOCPDX12'], sub1['COMP2v3'],
                           normalize='columns')
print("\ngroup 2 vs group 3, column percent:\n", colpct23)

ct23 = pandas.crosstab(sub1['SOCPDX12'], sub1['COMP2v3'])
print('\ngroup 2 vs group 3, contingency table\n', ct23)
cs23 = scipy.stats.chi2_contingency(ct23)
print("\ngroup 2 vs group 3, chi-square value, p value, expected counts\n",
      cs23)
0 notes
vj-vzb · 4 years
Text
Data Analysis Tools - Week 3 Assignment (Correlation Coefficient )
Code:
LIBNAME mydata "/courses/d1406ae5ba27fe300" access=readonly;

DATA new2; set mydata.gapminder;
IF urbanrate eq . THEN urbanrate=.;
IF oilperperson eq . THEN oilperperson=.;
/* recode income per person into four levels; the ELSE chain must follow the
   incomeperperson missing-value check, and the top group gets its own code */
IF incomeperperson eq . THEN incomelevel=.;
ELSE IF incomeperperson LE 750 THEN incomelevel=1;
ELSE IF incomeperperson LE 2500 THEN incomelevel=2;
ELSE IF incomeperperson LE 9400 THEN incomelevel=3;
ELSE incomelevel=4;

PROC SORT; by COUNTRY;

PROC CORR; VAR urbanrate incomeperperson oilperperson;

RUN;
Result for Review:
[Image: SAS PROC CORR output]
Observations and Summary :
In the Pearson correlation coefficient table above, the cells where two variables of interest intersect contain the correlation coefficient of interest and the associated p-value.
oilperperson - oil Consumption per capita (tonnes per year and person)
urbanrate- Urban population refers to people living in urban areas as defined by national statistical offices
incomeperperson - 2010 Gross Domestic Product per capita in constant 2000 US$.
For the association between urbanrate and oilperperson, the correlation coefficient is approximately 0.62 with a p-value of 0.0001. This means that the relationship is statistically significant.
For the association between incomeperperson and oilperperson , the correlation coefficient is approximately 0.61 and also has a significant p-value.
The association between oilperperson and income is strong and positive. The association between oilperperson and urbanrate is also positive and slightly stronger, at 0.62. Both are statistically significant; that is, for both associations, it is highly unlikely that a relationship of this magnitude would be due to chance alone.
As we know, post hoc tests are not necessary when conducting a Pearson correlation.
In addition, r squared is the fraction of the variability of one variable that can be predicted by the other. If we square the correlation coefficient of 0.62, we get about 0.38. This means we can predict roughly 38% of the variability we will see in oil consumption per person from the urban rate.
0 notes
Text
Assignment 4: Regression Modelling in Practice
Code:
libname mydata "/courses/d1406ae5ba27fe300 " access=readonly;
data new; set mydata.gapminder;
/* Categorizing Internetuserate & Incomeperperson as binary variables */ If internetuserate < 44 then Internetuse=0; else Internetuse=1; If incomeperperson < 8740 then Incomelevel=0; else Incomelevel=1; If urbanrate<44 then Urbanization = 0; else Urbanization = 1;
/* running Logistic regression */ Proc logistic descending; model Internetuse=Incomelevel; Run;
/* adding Urbanization */ Proc logistic descending; model Internetuse=Incomelevel Urbanization; run;
Output:
[Images: SAS PROC LOGISTIC output]
Interpretation:
My response variable internetuserate and explanatory variable incomeperperson are quantitative variables in the GapMinder data set. I first converted both into binary categorical variables, Internetuse and Incomelevel, on the basis of their mean values: values below the mean were assigned 0 and values above the mean were assigned 1. The mean for internetuserate was 35.6 and the mean for incomeperperson was 8,740.
On running the logistic regression, the p-value for Incomelevel is <0.0001, hence it is significant.
The odds ratio is 43.119, so the model is statistically significant and indicates that as Incomelevel increases, Internetuse is more likely to be 1. Alternatively, Internetuse is 43.119 times as likely to be 1 when Incomelevel is 1.
The 95% confidence interval for the odds ratio runs from 16.58 to 112.135.
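As a small sketch of how the reported odds ratio relates to the underlying logistic coefficient (the coefficient itself is not shown in the output above, so this back-calculation is illustrative only):

import math
odds_ratio = 43.119           # odds ratio for Incomelevel reported by PROC LOGISTIC
beta = math.log(odds_ratio)   # implied logistic regression coefficient, about 3.76
print(beta, math.exp(beta))   # exponentiating the coefficient recovers the odds ratio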
Checking Confounding by adding Urbanization: On adding Urbanization as explanatory variable to test confounding, it is observed that p-value for Incomelevel is still < 0.0001 & p-value for Urbanization is < 0.0007. This means Incomelevel is still significant for the model & Urbanization is also a significant explanatory variable. 
The odds ratio for Incomelevel is 32.998, with a 95% confidence interval of 12.283 to 88.648. So if Incomelevel is 1, then Internetuse is about 33 times as likely to be 1.
The odds ratio for Urbanization is 5.649, with a 95% confidence interval of 2.072 to 15.404. So if Urbanization is 1, then Internetuse is about 5.6 times as likely to be 1.
0 notes
Text
Assignment Week 4 - Course Data Management and Visualization
Following my previous SAS program, I included in my analysis the secondary variables I created by categorizing the three quantitative variables I chose (income per capita, female employment rate and breast cancer rate) into four groups each. When graphing the information, I considered both the new categorical variables and the original quantitative variables.
As a matter of clarity, the secondary variables categorize the information as follows:
·      Income per capita: represented by the secondary variable Incomelevel and divided into four groups (according to World Bank Data ranges for 2010), with group 1 being the poorest countries and group 4 the richest.
·      Female employment rate: represented by FemaleEmploymentRate and divided into four groups, with group 1 being the countries with the lowest rate and 4 the ones with the highest.
·      Breast cancer rate: represented by BreastCancerRate and divided into four groups, with group 1 being the countries with the lowest rate and 4 the ones with the highest.
With the new variables created, I created bar charts for each of them, which are presented below. I included the original quantitative and secondary categorical variables in the graphing process.
As can be seen in Graph 1, the higher percentages on the left correspond to lower-income countries. In the image on the left, this shows up in the two lower income groups (1 and 2). In the figure on the right, the distribution can be observed in more detail, with the highest percentage (almost 80%) corresponding to countries with an average income per capita of 6,000 USD (constant 2000 rate).
Univariate graph 1:Income per capita (left: secondary var.; right: primary var.)
[Image]
As can be seen in Table 1, the average income per person is 8,740 USD, and the standard deviation is 14,262. The number of observations (190) corresponds to the total minus the missing values.
Table 1: Income per capita (quantitative variable) 
[Image]
Univariate graph 2 represents both the primary and secondary variables for female employment rate. The figure on the left shows a higher concentration of observations in quartiles 2 and 3, meaning female employment rates between 25% and 75%. The figure on the right shows a more detailed distribution, in which the highest concentration is found around 52%.
Univariate graph 2:Female employment level (left: secondary var.; right: primary var.)
[Image]
Table 2 provides the detailed information for the primary female employment rate variable, in which the mean is 47.55% and the standard deviation is 14.62%.
Table 2: Female employment rate (quantitative variable)
[Image]
Univariate graph 3 is the last univariate bar chart in this assignment, and it illustrates the distribution of the breast cancer rate in our database. As can be seen, the highest concentration of countries is found in the first two quartiles (under 50%), more specifically around values between 18% and 30%, which represents almost 60% of the total.
Univariate graph 3: Breast cancer rate (left: secondary var.; right: primary var.)
[Image]
Table 3 shows an average breast cancer rate of 37.40% with a standard deviation of 22.69%.
Table 3: Breast Cancer Rate (quantitative variable)
[Image]
Bivariate graphs 1 and 2 depict the relationship between breast cancer rate (as the independent variable) and income per capita (graph 1) and female employment rate (graph 2). From the first graph, we can appreciate a positive slope, meaning a direct relationship between the variables; preliminarily, it could be argued that higher rates of breast cancer are associated with higher levels of income per capita. Bivariate graph 2, on the other hand, does not show a clear relationship: the u-shaped bar chart shows lower female employment rates associated with intermediate breast cancer rates (groups 2 and 3) and higher female employment rates at the extreme values of breast cancer rate (groups 1 and 4).
Bivariate graph 1: Breast rate cancer and income per capita (secondary variables)
[Image]
Bivariate graph 2: Breast cancer rate and female employment rate (secondary variables)
[Image]
Ultimately, two scatter plots were made in order to analyze the interaction between the primary (quantitative) variables of interest. Graph 4 depicts the interaction between breast cancer rate and income per capita, while graph 5 illustrates the interaction between breast cancer rate and female employment rate. In both cases it is hard to see a clear pattern. In fact, while in graph 4 a curved line with a positive slope might be plausible, most observations are concentrated at the lower breast cancer rates and income levels, showing large deviations from the mean in both cases and making it harder to infer information from the graphic illustration of the interaction. With regard to graph 5, no clear pattern around a mean can be found.
Graph 4: Scatter plot Breast cancer rate & Income per capita
[Image]
Graph 5: Scatter plot Breast cancer rate & female employment rate
[Image]
Annex:
A.   SAS Program
LIBNAME mydata "/courses/d1406ae5ba27fe300" access=readonly;

DATA new; set mydata.gapminder;

KEEP country incomeperperson femaleemployrate breastcancerper100TH Incomelevel FemaleEmploymentRate BreastCancerRate;

/* Based on World Bank Data 2010, we will categorize the countries by level of income
   in Low-income, Lower-Middle Income, Upper-Middle Income and High-Income */

IF incomeperperson=. THEN Incomelevel=.;
ELSE IF incomeperperson >= 0 and incomeperperson <= 975 THEN Incomelevel=1;
ELSE IF incomeperperson > 975 and incomeperperson <= 3855 THEN Incomelevel=2;
ELSE IF incomeperperson > 3855 and incomeperperson <= 11905 THEN Incomelevel=3;
ELSE IF incomeperperson > 11905 THEN Incomelevel=4;

/* Female Employment Rate and Breast Cancer Rate created as categorical variables
   representing quintiles for each primary variable */

IF femaleemployrate=. THEN FemaleEmploymentRate=.;
ELSE IF femaleemployrate >= 0 and femaleemployrate <= 25 THEN FemaleEmploymentRate=1;
ELSE IF femaleemployrate > 25 and femaleemployrate <= 50 THEN FemaleEmploymentRate=2;
ELSE IF femaleemployrate > 50 and femaleemployrate <= 75 THEN FemaleEmploymentRate=3;
ELSE IF femaleemployrate > 75 and femaleemployrate <= 100 THEN FemaleEmploymentRate=4;

IF breastcancerper100TH=. THEN BreastCancerRate=.;
ELSE IF breastcancerper100TH >= 0 and breastcancerper100TH <= 25 THEN BreastCancerRate=1;
ELSE IF breastcancerper100TH > 25 and breastcancerper100TH <= 50 THEN BreastCancerRate=2;
ELSE IF breastcancerper100TH > 50 and breastcancerper100TH <= 75 THEN BreastCancerRate=3;
ELSE IF breastcancerper100TH > 75 and breastcancerper100TH <= 100 THEN BreastCancerRate=4;

LABEL Incomelevel="Income per capita"
      BreastCancerRate="Breast cancer rate"
      FemaleEmploymentRate="Female employment rate";

PROC SORT; by Country;

PROC PRINT; VAR Incomelevel FemaleEmploymentRate BreastCancerRate;

/* Frequency tables */
PROC FREQ; TABLES Incomelevel FemaleEmploymentRate BreastCancerRate;

PROC GCHART; VBAR Incomelevel / discrete width=10 TYPE=PCT;
PROC GCHART; VBAR incomeperperson / TYPE=PCT;
PROC GCHART; VBAR FemaleEmploymentRate / discrete width=10 TYPE=PCT;
PROC GCHART; VBAR femaleemployrate / width=5 TYPE=PCT;
PROC GCHART; VBAR BreastCancerRate / discrete width=10 TYPE=PCT;
PROC GCHART; VBAR breastcancerper100TH / width=5 TYPE=PCT;

PROC UNIVARIATE; VAR incomeperperson femaleemployrate breastcancerper100TH;

PROC GCHART; VBAR BreastCancerRate / DISCRETE TYPE=mean width=10 SUMVAR=Incomelevel;
PROC GCHART; VBAR BreastCancerRate / DISCRETE TYPE=mean width=10 SUMVAR=incomeperperson;
PROC GCHART; VBAR BreastCancerRate / DISCRETE TYPE=mean width=10 SUMVAR=FemaleEmploymentRate;
PROC GCHART; VBAR BreastCancerRate / DISCRETE TYPE=mean width=10 SUMVAR=femaleemployrate;

PROC GPLOT; PLOT incomeperperson*breastcancerper100TH;
PROC GPLOT; PLOT femaleemployrate*breastcancerper100TH;

RUN;
0 notes
Photo
[Image]
How reachable is zero-tax income level of Rs 5 lakh using deductions, exemptions? Read more, Visit - bit.ly/2UXY1Ny #zerotax #incomelevel #TheEconomicTimes
0 notes
surveycircle · 5 years
Text
Participants needed for online survey! Topic: "Survey on the Effect of Income Levels on Healthy Eating" https://t.co/j3QrnhwDFz via @SurveyCircle#IncomeLevel #HealthyEating #organic #inorganic #nutrition #food #healthy #survey #surveycircle pic.twitter.com/JjSwyEP88a
— Daily Research (@daily_research) November 20, 2019
0 notes
cnamrata · 5 years
Text
Assignment 2: Data Management and Visualization
Requirement of Assignment 2: Following completion of your first program, create a blog entry where you post 1) your program 2) the output that displays three of your variables as frequency tables and 3) a few sentences describing your frequency distributions in terms of the values the variables take, how often they take them, the presence of missing data, etc.
1. Python program
I ran the program in python and below is the code:
import pandas
import numpy

data = pandas.read_csv('gapminder.csv', low_memory=False)

print("Returns the count of number of countries in each employment level")
c1 = data.groupby("Countrylevelofemployment").size()
print(c1)

print("Returns % distribution of countries in each employment level")
p1 = data.groupby("Countrylevelofemployment").size() * 100/len(data)
print(p1)

print("Returns the count of countries falling under various income levels")
c2 = data.groupby("Incomelevel").size()
print(c2)

print("Returns % of countries falling in each income level")
p2 = data.groupby("Incomelevel").size() * 100/len(data)
print(p2)

print("Returns internet usage level across countries")
c3 = data.groupby("Internetusagelevel").size()
print(c3)

print("Returns % countries in each internet usage bracket")
p3 = data.groupby("Internetusagelevel").size() * 100/len(data)
print(p3)

print("Returns the number of countries at each level of urbanization")
c4 = data.groupby("Urbanizationlevel").size()
print(c4)

print("Returns % of countries at each level of urbanization")
p4 = data.groupby("Urbanizationlevel").size() * 100/len(data)
print(p4)
2. Output of the program
Returns the count of number of countries in each employment level
Countrylevelofemployment
Average Employment Rate (50% to 70%)    114
Data Not Available                       35
High Employment Rate (> 70%)             27
Low Employment Rate (0% to 50%)          37
dtype: int64

Returns % distribution of countries in each employment level
Countrylevelofemployment
Average Employment Rate (50% to 70%)    53.521127
Data Not Available                      16.431925
High Employment Rate (> 70%)            12.676056
Low Employment Rate (0% to 50%)         17.370892
dtype: float64

Returns the count of countries falling under various income levels
Incomelevel
Data Not Available                                  23
High income countries (> $30,000)                   16
Low income countries ($0 to $10000)                143
Mid level income countries ($10,000 to $30,000)     31
dtype: int64

Returns % of countries falling in each income level
Incomelevel
Data Not Available                                 10.798122
High income countries (> $30,000)                   7.511737
Low income countries ($0 to $10000)                67.136150
Mid level income countries ($10,000 to $30,000)    14.553991
dtype: float64

Returns internet usage level across countries
Internetusagelevel
Data Not Available                    21
High internet usage (> 60%)           47
Low internet usage (0% to 30%)        93
Medium internet usage (30% to 60%)    52
dtype: int64

Returns % countries in each internet usage bracket
Internetusagelevel
Data Not Available                     9.859155
High internet usage (> 60%)           22.065728
Low internet usage (0% to 30%)        43.661972
Medium internet usage (30% to 60%)    24.413146
dtype: float64

Returns the number of countries at each level of urbanization
Urbanizationlevel
Data Not Available            10
Developed (> 70%)             64
Developing (40% to 70%)       80
Underdeveloped (0% to 40%)    59
dtype: int64

Returns % of countries at each level of urbanization
Urbanizationlevel
Data Not Available             4.694836
Developed (> 70%)             30.046948
Developing (40% to 70%)       37.558685
Underdeveloped (0% to 40%)    27.699531
dtype: float64
3. Description of the program
I chose the GapMinder dataset, in which the data are not absolute numbers. The data are either percentages, such as the variable 'employrate' (the percentage of a country's entire population that is employed), or per-capita figures, such as 'income per person'. A frequency distribution on such data would not have given relevant results, so I introduced a few grouped (dummy) variables, which I used for the frequency distributions (a sketch of one way such a variable can be derived appears after the list below). They are as follows:
A) Countrylevelofemployment: This variable takes 4 values namely
Low Employment Rate for countries whose employment rate is between 0 to 50%
Average Employment Rate for countries whose employment rate is between 50 to 70% and 
High Employment Rate for countries whose employment rate is greater than 70%
Data Not Available for countries which have no information available
B) Incomelevel: This variable takes 4 values namely
Low income countries for those countries whose per capita income level is less than $10K
Mid level income countries for those countries whose per capita income is between $10K to $30K
High income countries for those countries whose per capita income is greater than $30K
and Data Not Available incase no information was available about that country
C) Internetusagelevel: This variable takes 4 values namely
Low internet usage for countries whose rate is less than 30%
Medium internet usage for countries whose usage rate is between 30 to 60% 
High internet usage for countries whose usage rate is greater than 60% and
Data Not Available for countries for which no information was available
D) Urbanizationlevel: This variable takes 4 values, namely
Under developed for countries whose urbanization levels were less than 40%
Developing countries were the ones with urbanization rate between 40 to 70%
Developed countries were the ones with urbanization rate greater than 70% and
Data Not Available for those countries which had no information available
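The post does not show how these grouped variables were created; a minimal sketch of one way the Incomelevel groups could be derived with pandas (the raw column name incomeperperson and the cut-offs are assumptions based on the descriptions above) is:

import pandas

data = pandas.read_csv('gapminder.csv', low_memory=False)
data['incomeperperson'] = pandas.to_numeric(data['incomeperperson'], errors='coerce')

def income_level(x):
    # label each country by per-capita income bracket; missing values get their own label
    if pandas.isna(x):
        return 'Data Not Available'
    if x < 10000:
        return 'Low income countries ($0 to $10000)'
    if x <= 30000:
        return 'Mid level income countries ($10,000 to $30,000)'
    return 'High income countries (> $30,000)'

data['Incomelevel'] = data['incomeperperson'].apply(income_level)
print(data.groupby('Incomelevel').size())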
Observation from program output:
Based on the output of the program, highest values across each distribution are given below
i) 53.5% of the countries fall under Average Employment Rate (50% to 70%)
ii) 67.1% of countries are low income countries (percapita income of $0 to $10,000)
iii) 43.7% of countries had low internet usage (0% to 30%)
iv) 37.6% of countries were developing (urbanization rate between 40% and 70%). This is closely followed by 30% of countries which were developed (>70% urbanization rate), which is in turn closely followed by 27.7% of countries which were underdeveloped (<40% urbanization rate).
0 notes
[YouTube video]
How to Determine Your Income Level - Becoming a Business Consultant
Click Here For Starting Online Marketing Consulting Company Discovery Webinar https://streetsmartbusinessadvisors.com.au/7-figures
I share a simple but powerful formula that reveals how you can quickly increase your profits or Income.
To find out more about Streetsmart Business School: https://www.streetsmartbusinessschool.com.au
Request your Free Business Acceleration Kit valued at $1,580.00 https://www.amazingfreegift.com.au
To discover how to become a Highly Paid, in Demand Business Consultant go to https://streetsmartbusinessadvisors.com.au/7-figures-v
Becoming a Business Coach is the most personally rewarding, lucrative industry in existence.
Coaching Business Owners and seeing the improved results they achieve is so rewarding.
To witness a Real Life Business Turnaround go to : https://www.streetsmartbusinessschool.com.au/webinar-replay-ssbs
Get your Free Copy of Ian's latest Business Growth Book: https://theinconvenienttruthaboutbusiness.com/get-your-free-book
Follow us on our Social Media:
Facebook https://www.facebook.com/streetsmartmarketing1 Twitter https://twitter.com/StreetsmartM LinkedIn https://au.linkedin.com/in/ianhmarsh
https://www.youtube.com/watch?v=Pz_3jii3p3c
https://www.youtube.com/watch?v=Yyi7v56SUoQ
https://www.youtube.com/channel/UCgstJPoY6eTzD7vJtIc1nHg?view_as=public
#streetsmartbusinessbriefing #IncomeLevel  #streetsmartbusiness #businesssuccess
0 notes
jwstats3993-blog · 7 years
Text
Visualizing Data
Objective
The overall ADDHEALTH dataset has 6504 observations.  Since I am interested in the effect of adolescent employment on pursuit of higher education, I focused only on the adolescents who claim to have worked in the past 4 weeks. This cut my dataset down to 3687 observations representing only the teens that have worked. The variables of interest to me were: H1EE1, H1EE2, H1EE3, H1EE4, H1EE5, H1EE6, H1EE7, H1EE12, H1EE13, H1EE14.  The Frequency Table results have these variables conveniently labeled.
SAS CODE
LIBNAME mydata "/courses/d1406ae5ba27fe300" access=readonly;
data new; set mydata.addhealth_pds;
LABEL H1EE1='Scale of 1-5, Want to go to College?'
      H1EE2='Scale of 1-5, How Likely is College for you?'
      H1EE3='Past 4 weeks, Work for Pay?'
      H1EE4='How many hours of work, non-summer week? (Group Responses 1-140)'
      H1EE5='Total income, non-summer week? (Group Responses 1-900)'
      H1EE6='How many hours of work, summer week? (Group Responses 1-99)'
      H1EE7='Total income, summer week? (Group Responses 1-900)'
      H1EE12='Live to Age 35?'
      H1EE13='Married by Age 25?'
      H1EE14='Killed by Age 21?';
/* My subset of the ADDHEALTH dataset, containing variables of interest.
   The "if" statement selects only the 3687 participants in the study who worked for pay in the last 4 weeks */
if H1EE1 >=6 then H1EE1=.;
if H1EE2 >=6 then H1EE2=.;
if H1EE3 =1; /* Only those who worked past four weeks */
if H1EE4 >=996 then H1EE4=.;
if H1EE5 >=996 then H1EE5=.;
if H1EE6 >=996 then H1EE6=.;
if H1EE7 >=996 then H1EE7=.;
if H1EE12 >=6 then H1EE12=.;
if H1EE13 >=6 then H1EE13=.;
if H1EE14 >=6 then H1EE14=.;
/* The if statements above remove "refused", "legitimate skip", "don't know", and "not applicable" values. */
if H1EE4 <= 20 then WORKHOURS=1;      /* Part-Time */
else if H1EE4 <= 40 then WORKHOURS=2; /* Full-Time */
else WORKHOURS=3;                     /* Overtime */
if H1EE5 <= 20 then INCOMELEVEL=1;       /* $20 or Less */
else if H1EE5 <= 50 then INCOMELEVEL=2;  /* $21-$50 */
else if H1EE5 <= 100 then INCOMELEVEL=3; /* $51-$100 */
else if H1EE5 <= 150 then INCOMELEVEL=4; /* $101-$150 */
else if H1EE5 <= 200 then INCOMELEVEL=5; /* $151-$200 */
else INCOMELEVEL=6;                      /* Above $200 */
if H1EE6 <= 20 then WORKHOURS_SUM=1;
else if H1EE6 <= 40 then WORKHOURS_SUM=2;
else WORKHOURS_SUM=3;
if H1EE7 <= 20 then INCOMELEVEL_SUM=1;
else if H1EE7 <= 50 then INCOMELEVEL_SUM=2;
else if H1EE7 <= 100 then INCOMELEVEL_SUM=3;
else if H1EE7 <= 150 then INCOMELEVEL_SUM=4;
else if H1EE7 <= 200 then INCOMELEVEL_SUM=5;
else INCOMELEVEL_SUM=6;
proc sort; by AID;
proc freq; TABLES H1EE1 H1EE2 H1EE3 H1EE4 WORKHOURS H1EE5 INCOMELEVEL H1EE6 WORKHOURS_SUM H1EE7 INCOMELEVEL_SUM H1EE12 H1EE13 H1EE14;
proc gchart; vbar H1EE2/ type=mean sumvar=H1EE1; Title 'Likelihood of College vs Desire for College'; Title2 'H1EE2 vs H1EE1';
proc gchart; vbar H1EE12/ type=mean sumvar=H1EE1; Title 'Life Expectancy vs Desire for College'; Title2 'H1EE12 vs H1EE1';
proc gchart; vbar H1EE13/ type=mean sumvar=H1EE1; Title 'Outlook on Marriage vs Desire for College'; Title2 'H1EE13 vs H1EE1';
proc gchart; vbar H1EE14/ type=mean sumvar=H1EE1; Title 'Extreme Life Circumstances vs Desire for College'; Title2 'H1EE14 vs H1EE1';
proc univariate; var H1EE4 H1EE5 H1EE6 H1EE7;
proc gchart; vbar WORKHOURS/discrete type=mean sumvar=H1EE1; Title 'Non-Summer Work Hours vs Desire for College'; Title2 'H1EE4 vs H1EE1';
proc gchart; vbar INCOMELEVEL/discrete type=mean sumvar=H1EE1; Title 'Non-Summer Income vs Desire for College'; Title2 'H1EE5 vs H1EE1';
proc gchart; vbar WORKHOURS_SUM/discrete type=mean sumvar=H1EE1; Title 'Summer Work Hours vs Desire for College'; Title2 'H1EE6 vs H1EE1';
proc gchart; vbar INCOMELEVEL_SUM/discrete type=mean sumvar=H1EE1; Title 'Summer Income vs Desire for College'; Title2 'H1EE7 vs H1EE1';
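As an aside, a minimal sketch (not part of the original program): the same groupings could also be produced with user-defined formats instead of the extra WORKHOURS and INCOMELEVEL variables. The format names below are illustrative, and the sketch assumes the data set NEW created above.
proc format;
  value wrkhrsfmt low-20    = 'Part-Time'
                  20<-40    = 'Full-Time'
                  40<-high  = 'Overtime';
  value inclvlfmt low-20    = '$20 or Less'
                  20<-50    = '$21-$50'
                  50<-100   = '$51-$100'
                  100<-150  = '$101-$150'
                  150<-200  = '$151-$200'
                  200<-high = 'Above $200';
run;
/* PROC FREQ groups by formatted values, so this reproduces the WORKHOURS and INCOMELEVEL tables */
proc freq data=new;
  tables H1EE4 H1EE5;
  format H1EE4 wrkhrsfmt. H1EE5 inclvlfmt.;
run;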
Results
All Frequency Tables, Graphs and relevant Summary Statistics are included in this section. Please follow the link below...
https://docs.google.com/document/d/1u7NJBatZ9cL3B8Z318Yc84dyt4LOTNHcQHfgqIPcnIE/edit?usp=sharing
Summary 
Before making judgements it is important to understand the nature of the data we are working with.
The key characteristics of the employed teens in this sample are the following:
1) Employed Teens have a relatively strong desire to go to college (70.32%)
2) Employed Teens have a high expectancy of going to college (56.36%)
3) Outside of the summertime, most employed teens only work part-time (79.98%)
4) Outside of the summertime, most employed teens only earn at most $100 a week (81.80%)
5) In the summertime, most employed teens work either part-time (51.69%) or full-time (39.30%)
6) In the summertime, most employed teens earn at most $100 a week (58.18%), though a considerable share earn between $101 and $200 a week (28.80%)
7) Most employed teens expect to live to age 35 (56.36%)
8) Most employed teens either believe there is a 50-50 chance of getting married by age 25 (35.27%), or it is pretty likely (30.98%)
9) Most employed teens either believe it is unlikely that they will be killed by age 21 (50.49%), or that it is possible but still pretty unlikely (33.82%)
Our interest is in examining potential underlying factors that influence employed teens to want to go to college.  For this reason, all variables, whether quantitative or qualitative, were compared to the variable H1EE1 (Desire for College).  
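A quick way to see these same comparisons numerically (a sketch, not part of the original output) is to tabulate the mean of H1EE1 within each grouping, which is the quantity the mean bar charts display. This assumes the data set NEW created by the program above.
/* mean desire-for-college score (H1EE1) by non-summer work-hours group */
proc means data=new mean n maxdec=2;
  class WORKHOURS;
  var H1EE1;
run;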
The first comparison is “Likelihood of College vs. Desire for College” of employed teens.  We know already that in this sample, 56.36% of employed teens have a high expectancy of going to college (see bullet 2 above).  Our data seem to suggest that although the vast majority want to go to college, it is not a given that these teens will be able to attend a university.  The corresponding graph for this relationship shows a positive association between likelihood of attendance and desire to go to college: those with high likelihood responses usually want to go to college. However, the opposite is also true... those with low likelihood responses often don’t want to go. Why is that?
Perhaps those teens with low likelihood responses feel that there is no hope for them to attend. Maybe their parents cannot afford it, or their dreams of higher learning are limited by the extreme realities of where they live. There can be a multitude of reasons. Since the teens under consideration are employed and do not earn large salaries, it is unlikely that this group is prone to delinquency; they probably just feel hopeless. It should be a normative goal of the government and society as a whole to provide a hopeful outlook to these working children. An investment in scholarships and grants for these young adults may improve their desire to obtain a degree, which nowadays is a standard of achievement for this generation.
The second comparison is “Life Expectancy vs Desire for College” of employed teens. We know that 56.36% of employed teens expect to live beyond the age of 35 (see bullet 7 above).  It turns out that there was no obvious relationship between life expectancy responses and whether or not the youth wanted to go to college. Life expectancy may not be an important factor in the decision to go to college for employed teens.
The third comparison is “Outlook on Marriage vs Desire for College” of employed teens.  Most employed teens see a 50-50 chance of being married by age 25.  This variable was initially of interest because marriage requires planning and money from both parties.  I figured that those who plan to marry at an earlier age might also be more mature individuals, with their minds set on higher learning and achievement in order to obtain financial stability.  These data suggest that the outlook on marriage by age 25 is not a determining factor in the desire to go to college.
The fourth comparison is “Extreme Life Circumstances vs Desire for College” of employed teens. The teens in this sample generally believe it is unlikely that they will be killed by age 21 (50.49%, see bullet 9).  The data suggest that whether or not they believe they will die by age 21 has no bearing on their desire to go to college.
The fifth comparison is “Non-Summer Work Hours vs Desire for College” of employed teens.  The non-summer work hours variable was quantitative and was therefore broken into three categories: Part-Time=1, Full-Time=2, and Overtime=3.  We know that outside of the summertime, most employed teens work only part-time (79.98%, see bullet 3).  The data suggest that as teens work more hours outside of the summer, their desire for college declines slightly; there may be a slightly negative relationship between these two variables.
The sixth comparison is “Non-Summer Income vs Desire for College” of employed teens.  We know that outside of the summertime, most employed teens earn at most $100 a week (81.80%, see bullet 4). The data suggest that the more money employed teens make per week, the lower their desire for college becomes; there may be a slightly negative relationship between these two variables.
The seventh comparison is “Summer Work Hours vs Desire for College” of employed teens.  We know that in the summertime, most employed teens work either part-time (51.69%) or full-time (39.30%, see bullet 5).  The data suggest that as teens work more hours during the summer, their desire for college declines slightly; there may be a slightly negative relationship between these two variables.
The eighth and final comparison is “Summer Income vs Desire for College” of employed teens.  We know that in the summertime, most employed teens earn at most $100 a week (58.18%), though a considerable share earn between $101 and $200 a week (28.80%, see bullet 6).  The data suggest that there is little to no association between adolescents’ summer income and their desire to go to college.
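Because these conclusions are read off mean bar charts, a hedged follow-up (not part of the original assignment) would be to put a number on the “slightly negative relationship” claims, for example with Spearman rank correlations between the hours/income variables and the ordinal H1EE1 scale, again assuming the data set NEW from the program above.
/* Spearman rank correlations of desire for college (H1EE1) with hours and income */
proc corr data=new spearman;
  var H1EE4 H1EE5 H1EE6 H1EE7;
  with H1EE1;
run;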
Text
Data Analysis Tools: Assignment 2
Code:
libname mydata "/courses/d1406ae5ba27fe300 " access=readonly;
data new; set mydata.gapminder;
/* Create a categorical variable Incomelevel from incomeperperson with 4 categories:
   Level 1 = less than $1000, Level 2 = $1000 to $2500, Level 3 = $2500 to $9000, Level 4 = more than $9000 */
if incomeperperson < 1000 then Incomelevel=1;
else if incomeperperson < 2500 then Incomelevel=2;
else if incomeperperson < 9000 then Incomelevel=3;
else Incomelevel=4;
/* Create a categorical variable Netusage from internetuserate with 3 categories:
   Level 1 = less than 12.5, Level 2 = 12.5 to 44, Level 3 = above 44 */
if internetuserate < 12.5 then Netusage = 1;
else if internetuserate < 44 then Netusage = 2;
else Netusage = 3;
proc sort; by country;
proc freq; tables Netusage*Incomelevel/chisq;
run;
/* Post hoc pairwise comparisons of the four income levels */
data Comparison1; set new; if Incomelevel=1 or Incomelevel=2;
proc sort; by country;
proc freq; tables Netusage*Incomelevel/chisq;
run;
data Comparison2; set new; if Incomelevel=1 or Incomelevel=3;
proc sort; by country;
proc freq; tables Netusage*Incomelevel/chisq;
run;
data Comparison3; set new; if Incomelevel=1 or Incomelevel=4;
proc sort; by country;
proc freq; tables Netusage*Incomelevel/chisq;
run;
data Comparison4; set new; if Incomelevel=2 or Incomelevel=3;
proc sort; by country;
proc freq; tables Netusage*Incomelevel/chisq;
run;
data Comparison5; set new; if Incomelevel=2 or Incomelevel=4;
proc sort; by country;
proc freq; tables Netusage*Incomelevel/chisq;
run;
data Comparison6; set new; if Incomelevel=3 or Incomelevel=4;
proc sort; by country;
proc freq; tables Netusage*Incomelevel/chisq;
run;
Model Interpretation for Chi-Square Tests:
I have examined the association between income per person (independent variable) and internet use rate (dependent, or response, variable) in a country. Both are quantitative variables in the Gapminder data set.
Ho: There is no relationship between incomeperperson and internetuserate
H1: There is a relationship between incomeperperson and internetuserate
Income per person was recoded into a categorical variable, Incomelevel, with 4 categories: Level 1 = less than $1,000; Level 2 = $1,000 to $2,500; Level 3 = $2,500 to $9,000; Level 4 = more than $9,000.
Internetuserate was recoded into a categorical variable, Netusage, with 3 categories: Level 1 = less than 12.5; Level 2 = 12.5 to 44; Level 3 = above 44.
A chi-square test of independence was run. With a 3-by-4 table it has (3-1)x(4-1) = 6 degrees of freedom, and the test returned p < .0001. This is strong statistical evidence of an association between the two categorical variables.
 Model Interpretation for post hoc Chi-Square Test results:
The Bonferroni-adjusted significance threshold for the 6 pairwise comparisons is .05/6 = .0083.
Post hoc comparisons of Netusage across pairs of income levels reveal that higher internet usage is associated with higher income levels. Income levels 2 and 3 have statistically similar levels of internet usage.
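For reference, the six pairwise data steps in the program above can be written more compactly as a macro. This is only a sketch with illustrative names; each resulting chi-square p-value is judged against the Bonferroni threshold of .0083.
%macro pairchisq(a, b);
  data pair_&a._&b.; set new;
    if Incomelevel = &a or Incomelevel = &b;
  run;
  proc freq data=pair_&a._&b.;
    tables Netusage*Incomelevel / chisq;   /* compare p-value to .05/6 = .0083 */
  run;
%mend pairchisq;
%pairchisq(1, 2);  %pairchisq(1, 3);  %pairchisq(1, 4);
%pairchisq(2, 3);  %pairchisq(2, 4);  %pairchisq(3, 4);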
Output:
[Chi-square and post hoc comparison output tables were posted as images]
Text
Data Analysis Tools: Assignment 1
Code:
libname mydata "/courses/d1406ae5ba27fe300 " access=readonly; data new; set mydata.gapminder;
/* Create a categorical variable Incomelevel from incomeperperson with 4 categories:
   Level 1 = less than $1000, Level 2 = $1000 to $2500, Level 3 = $2500 to $9000, Level 4 = more than $9000 */
if incomeperperson < 1000 then Incomelevel=1;
else if incomeperperson < 2500 then Incomelevel=2;
else if incomeperperson < 9000 then Incomelevel=3;
else Incomelevel=4;
proc sort; by country;
proc anova;
  class Incomelevel;
  model internetuserate = Incomelevel;
  means Incomelevel / duncan;
run;
Interpretation of the ANOVA Model:
I have examined the association between income per person (independent variable) and internet use rate (dependent, or response, variable) in a country. Both are quantitative variables in the Gapminder data set. Income per person was recoded into a categorical variable, Incomelevel, with 4 categories: Level 1 = less than $1,000; Level 2 = $1,000 to $2,500; Level 3 = $2,500 to $9,000; Level 4 = more than $9,000.
F = 107.89 and p < .0001, which indicates that income level is associated with internet use rate.
Model interpretation for the post hoc ANOVA results:
By the Duncan multiple range test, the mean internet use rates of all 4 income levels are significantly different from one another.
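As a side note (not part of the assignment): Duncan's test is fairly liberal about the familywise error rate, so the post hoc conclusion could be cross-checked with Tukey-adjusted pairwise comparisons, for example in PROC GLM. This is a sketch assuming the data set NEW created above.
proc glm data=new;
  class Incomelevel;
  model internetuserate = Incomelevel;
  lsmeans Incomelevel / pdiff adjust=tukey;  /* Tukey-adjusted pairwise p-values */
run;
quit;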
Output:
[ANOVA and Duncan test output tables were posted as images]
Text
Assignment 3
Please see the results at the link below:
https://odamid-apse1.oda.sas.com/SASStudio/sasexec/submissions/3a737d26-7a09-4712-bf01-e869d7cf83f9/results
Program Code:
libname mydata "/courses/d1406ae5ba27fe300 " access=readonly;
data new; set mydata.gapminder;
/* Group internetuserate into 4 levels as netusage = internet users per 100 population:
   level 1 = less than 10 users, level 2 = between 10 and 32, level 3 = between 32 and 56.5, level 4 = more than 56.5 */
if internetuserate < 10 then netusage=1;
else if internetuserate < 32 then netusage=2;
else if internetuserate < 56.5 then netusage=3;
else netusage=4;
/* Group incomeperperson into 4 levels as incomelevel = per-capita GDP in $:
   level 1 = less than $744, level 2 = between $744 and $2550, level 3 = between $2550 and $9425, level 4 = more than $9425 */
if incomeperperson < 744 then incomelevel=1;
else if incomeperperson < 2550 then incomelevel=2;
else if incomeperperson < 9425 then incomelevel=3;
else incomelevel=4;
/* Group urbanrate into 4 levels as urbanization = % of urban population:
   level 1 = less than 36.8%, level 2 = between 36.8% and 57.5%, level 3 = between 57.5% and 74.5%, level 4 = more than 74.5% */
if urbanrate < 36.8 then urbanization=1;
else if urbanrate < 57.5 then urbanization=2;
else if urbanrate < 74.5 then urbanization=3;
else urbanization=4;
/* Group femaleemployrate into 4 levels as femaleemployment = % of female population employed:
   level 1 = less than 38.5%, level 2 = between 38.5% and 47.5%, level 3 = between 47.5% and 56%, level 4 = more than 56% */
if femaleemployrate < 38.5 then femaleemployment=1;
else if femaleemployrate < 47.5 then femaleemployment=2;
else if femaleemployrate < 56 then femaleemployment=3;
else femaleemployment=4;
label netusage="internet users per 100 population, grouped in 4 levels with level 1 = less than 10 users, level 2 = between 10 and 32, level 3 = between 32 and 56.5, level 4 = more than 56.5";
label incomelevel="per-capita GDP in $, grouped in 4 levels with level 1 = less than $744, level 2 = between $744 and $2550, level 3 = between $2550 and $9425, level 4 = more than $9425";
label urbanization="% of urban population, grouped in 4 levels with level 1 = less than 36.8%, level 2 = between 36.8% and 57.5%, level 3 = between 57.5% and 74.5%, level 4 = more than 74.5%";
label femaleemployment="% of female population employed, grouped in 4 levels with level 1 = less than 38.5%, level 2 = between 38.5% and 47.5%, level 3 = between 47.5% and 56%, level 4 = more than 56%";
proc sort; by country;
proc freq; tables netusage incomelevel urbanization femaleemployment;
run;
Summary:
Frequency distributions are shown in the output for the following 4 grouped variables.
1st is "netusage", defined as internet users per 100 population, grouped in 4 levels: level 1 = less than 10 users, level 2 = between 10 and 32, level 3 = between 32 and 56.5, level 4 = more than 56.5.
2nd is "incomelevel", defined as per-capita GDP in $, grouped in 4 levels: level 1 = less than $744, level 2 = between $744 and $2550, level 3 = between $2550 and $9425, level 4 = more than $9425.
3rd is "urbanization", defined as % of urban population, grouped in 4 levels: level 1 = less than 36.8%, level 2 = between 36.8% and 57.5%, level 3 = between 57.5% and 74.5%, level 4 = more than 74.5%.
4th is "femaleemployment", defined as % of female population employed, grouped in 4 levels: level 1 = less than 38.5%, level 2 = between 38.5% and 47.5%, level 3 = between 47.5% and 56%, level 4 = more than 56%.
Output:
[Frequency tables for netusage, incomelevel, urbanization and femaleemployment were posted as images]
Text
Assignment Week 3 - Course Data Management and Visualization
All the variables I’m using for my analysis, namely Income per Capita, Female Employment Rate and Breast Cancer Rate, are numeric (continuous), meaning the value of each observation corresponds not to a category but to the actual rate or level of the variable for each case. In this sense, all the information each variable provides is useful for the analysis, with missing values counted and reported in the frequency tables. For this reason, it would not be helpful for my research to convert existing values in the database into missing values, or the other way around.
For the past assignment, I created categorical variables for all three of my numeric variables (Income per Capita, Female Employment Rate and Breast Cancer Rate). Income per capita was divided into four categories (low, lower-middle, middle and high) based on World Bank standards; female employment rate and breast cancer rate were divided into quartiles, grouping the observations into four groups of 25% each.
For this week’s assignment, I decided to focus on low-income and lower-middle-income countries, and to divide the other two variables into five bands of 20 percentage points each (a rank-based alternative for building true quintiles is sketched after the list below), as follows:
·      Female employment rate (femaleemployrate):
0-20 % = “Female employment 0-20%”
20-40 %= “Female employment 20% - 40%”
40-60 %= “Female employment 40% - 60%”
60-80 %= “Female employment 60% - 80%”
80-100 %= “Female employment 80% - 100%”
·      Breast cancer rate (breastcancerper100TH):
0-20 % = “Breast cancer prevalence 0-20%”
20-40 %= “Breast cancer prevalence 20% - 40%”
40-60 %= “Breast cancer prevalence 40% - 60%”
60-80 %= “Breast cancer prevalence 60% - 80%”
80-100 %= “Breast cancer prevalence 80% - 100%”
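As flagged above, the bands here are fixed 20-point ranges of each variable rather than population quintiles. A hedged alternative sketch (not part of the assignment, with illustrative variable names) would build true quintiles, i.e. five groups containing roughly equal numbers of countries, using PROC RANK on the data set created in the Annex program:
/* groups=5 assigns ranks 0 (lowest fifth) through 4 (highest fifth) */
proc rank data=new out=ranked groups=5;
  var femaleemployrate breastcancerper100TH;
  ranks fememp_q5 breast_q5;
run;
proc freq data=ranked;
  tables fememp_q5 breast_q5;
run;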
After labeling my new categorical variables and sorting my dataset by country, I ran the program to obtain frequency tables and information about each observation individually. To do so I used the FREQ and PRINT procedures; the latter served to confirm that the newly created categories did indeed correspond to the values of the primary variables. The frequency tables are displayed below:
Table 1: Income per capita
As can be seen in Table 1, the number of observations in this case (131) corresponds to the number of countries that fall into the Low and Lower-Middle categories, according to the categorization discussed above. With 54 observations each, the Low and Lower-Middle income categories each represent 50% of the non-missing observations. No information was available for 23 countries of the sub-set, which generated the corresponding missing values (it is worth highlighting that the number of missing values remained unchanged, meaning that the missing information all corresponds to countries in the two poorest income categories). As this is a numeric variable, the value each observation takes represents the level of income per capita of each country.
Table 2: Female Employment Rate
Table 2 summarizes the distribution of the Female Employment Rate variable, divided into five bands of 20 percentage points each. Among the low and lower-middle income countries, three were in the first band, corresponding to a female employment rate between 80% and 100% and representing 2.73% of the total; 23 observations were in the second band, with a female employment rate between 60% and 80%, representing 20.91% of the total; the largest number of countries, 50, fell in the third band, with an employment rate between 40% and 60%, representing almost half of the observations (45.45%); the fourth band included 28 observations and represented 25.45% of the total; finally, the last band, with the lowest rates of female employment, contained 6 countries, representing 5.45% of the total. No data were available for 21 countries of the sub-set. As with income level, the female employment rate is a numeric variable, so the value each observation takes represents the level of female employment in each country.
Table 3: Breast Cancer Rate
Table 3 summarizes the results for the Breast Cancer Rate variable. It is worth mentioning that, even among the countries with lower levels of GDP per capita, such as the ones considered in this sub-sample, the breast cancer rate was never higher than 60%. This follows what some of the corresponding literature has described, namely that this type of cancer is “characteristic of the richer countries”. In fact, all of the 26 countries with rates higher than 60% are middle- or high-income countries. The 40%-60% band includes 14 observations, which correspond to 12.96% of the total. The largest band corresponds to rates between 20% and 40%, with 57 observations (52.78% of the sample). The band corresponding to the lowest rates, from 0 to 20%, consists of 37 observations, corresponding to 34.26%. No information was available for 23 countries, creating a corresponding number of missing values. As with the other two variables described above, the breast cancer rate is a numeric variable, so the value each observation takes represents the incidence of breast cancer in each country.
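As a hedged check of the claim that all countries with rates above 60% are middle- or high-income (not part of the original program): a cross-tabulation run on a data set built with the same recodes as the Annex program but without the final subsetting IF, called FULLSET here purely for illustration, would show this directly.
proc freq data=fullset;
  tables Incomelevel*BreastCancerRate / norow nocol nopercent;  /* counts only */
run;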
Annex:
A.   SAS Program
LIBNAME mydata "/courses/d1406ae5ba27fe300" access=readonly;
DATA new; set mydata.gapminder;
Keep country incomeperperson femaleemployrate breastcancerper100TH Incomelevel FemaleEmploymentRate BreastCancerRate;
/*Based on World Bank Data 2010, we will categorize the countries by level of income in Low-income, Lower-Middle Income, Upper-Middle Income and High-Income*/
IF incomeperperson >=0 and incomeperperson <=975 THEN Incomelevel="Low           ";
ELSE IF incomeperperson >975 and incomeperperson <=3855 THEN Incomelevel="Lower-middle       ";
ELSE IF incomeperperson >3855 and incomeperperson <=11905 THEN Incomelevel="Middle      ";
ELSE IF incomeperperson >11905 THEN Incomelevel="High         ";
/* Female Employment Rate and Breast Cancer Rate created as categorical variables representing quintiles for each primary variable*/
IF femaleemployrate >=0 and femaleemployrate <=20 THEN FemaleEmploymentRate="0-20    ";
ELSE IF femaleemployrate >20 and femaleemployrate <=40 THEN FemaleEmploymentRate="20-40";
ELSE IF femaleemployrate >40 and femaleemployrate <=60 THEN FemaleEmploymentRate="40-60";
ELSE IF femaleemployrate >60 and femaleemployrate <=80 THEN FemaleEmploymentRate="60-80";
ELSE IF femaleemployrate >80 and femaleemployrate <=100 THEN FemaleEmploymentRate="80-100";
IF breastcancerper100TH >=0 and breastcancerper100TH <=20 THEN BreastCancerRate="0-20     ";
ELSE IF breastcancerper100TH >20 and breastcancerper100TH <=40 THEN BreastCancerRate="20-40";
ELSE IF breastcancerper100TH >40 and breastcancerper100TH <=60 THEN BreastCancerRate="40-60";  
ELSE IF breastcancerper100TH >60 and breastcancerper100TH <=80 THEN BreastCancerRate="60-80";
ELSE IF breastcancerper100TH >80 and breastcancerper100TH <=100 THEN BreastCancerRate="80-100";  
LABEL incomelevel="Income per capita"
BreastCancerRate="Breast cancer rate (%)"
FemaleEmploymentRate="Female employment rate (%)";
/* Limit the analysis to Low and Lower-Middle income countries*/
IF incomeperperson <= 3855;
PROC SORT; by Country;
PROC PRINT; VAR Country incomeperperson Incomelevel femaleemployrate FemaleEmploymentRate breastcancerper100TH BreastCancerRate;
/* Frequency tables*/  
PROC FREQ; tables Incomelevel FemaleEmploymentRate BreastCancerRate;
RUN;
B.   Tables[1]
[Frequency tables for Incomelevel, FemaleEmploymentRate and BreastCancerRate were posted as images]
[1] For space reasons, the descriptive table obtained by the PRINT command is not included in the present document.