originallydonut-blog - Tumblr blog

originallydonut-blog · 5 years ago

Testing a Potential Moderator

Code

Output

Conclusion

To test whether income group moderates the association between female employment rate and life expectancy, I calculated the Pearson correlation coefficient and its associated p-value separately for each of the new data frames (one per income group). I used the pearsonr function from the scipy.stats library, passing in the female employment rate and life expectancy variables. The results for each income group are as follows. For the low income group, the correlation between female employment rate and life expectancy is -0.1957 and the p-value is 0.917. For the middle income countries, the correlation is -0.219 and the p-value is 0.051. For the high income countries, the correlation is 0.332 and the p-value is 0.0359. Only the high income group has a p-value below 0.05, which suggests that the association between female employment rate and life expectancy is statistically significant for high income countries but not for low or middle income countries.
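
The idea of testing a moderator is to split the data by the third variable and compute the correlation inside each subgroup. The sketch below illustrates that with plain Python instead of scipy.stats, so it returns only the coefficient and no p-value. The rows are made-up numbers, not the Gapminder data, and the names pearson_r and correlation_by_group are hypothetical helpers.

```python
import math

def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def correlation_by_group(rows):
    """Group (group, x, y) rows by the moderator and correlate x with y per group."""
    groups = {}
    for group, x, y in rows:
        xs, ys = groups.setdefault(group, ([], []))
        xs.append(x)
        ys.append(y)
    return {group: pearson_r(xs, ys) for group, (xs, ys) in groups.items()}

# Made-up rows: (income group, female employment rate, life expectancy).
rows = [
    ("low", 70.0, 55.0), ("low", 60.0, 58.0), ("low", 50.0, 57.0), ("low", 40.0, 61.0),
    ("high", 40.0, 76.0), ("high", 45.0, 78.0), ("high", 50.0, 79.0), ("high", 55.0, 82.0),
]

results = correlation_by_group(rows)
for group, r in results.items():
    print(f"{group}: r = {r:.3f}")
```

If the sign or size of r differs clearly between the groups, as it does in this toy data, the grouping variable is a candidate moderator.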

originallydonut-blog · 5 years ago

Generating a Correlation Coefficient

Code

Output

Conclusion

For the association between incomeperperson and femaleemployrate, the correlation coefficient is approximately -0.2876 and the p-value is 0.0002. Because the coefficient is below 0, the direction of the relationship is negative: an increase in incomeperperson is associated with a decrease in femaleemployrate. The p-value is below 0.05, so the association is statistically significant, but the coefficient is closer to 0 than to -1, which means the relationship is weak.

For the association between lifeexpectancy and femaleemployrate, the correlation coefficient is approximately 0.028 and the p-value is 0.716. The coefficient is above 0, so the direction is positive, but it is so close to 0 that there is essentially no linear relationship. The p-value is well above 0.05, so the association is not statistically significant.
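
Reading a correlation result comes down to three checks: the sign gives the direction, the squared coefficient gives the share of variability explained, and the p-value is compared with 0.05. A minimal sketch of those checks, applied to the two coefficients and p-values reported in this post; describe_correlation is a hypothetical helper name.

```python
def describe_correlation(r, p_value, alpha=0.05):
    """Summarize direction, explained variability, and significance of a Pearson r."""
    if r > 0:
        direction = "positive"
    elif r < 0:
        direction = "negative"
    else:
        direction = "none"
    return {
        "direction": direction,
        "r_squared": r ** 2,           # fraction of variability explained
        "significant": p_value < alpha,
    }

# Values reported in this post.
income = describe_correlation(-0.2876, 0.0002)
life = describe_correlation(0.028, 0.716)

print(income)
print(life)
```

For incomeperperson the squared coefficient is about 0.083, so knowing one variable accounts for roughly 8% of the variability in the other.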

#correlation correlationcoefficient dataanalysis

originallydonut-blog · 5 years ago

Running a Chi-Square Test of Independence

Code

Output

Conclusion

When examining the association between being taught in school about smoking (categorical response) and having tried cigarette smoking (categorical explanatory), a chi-square test of independence gave a p-value of 0.006. Because this is below 0.05, the two variables are significantly associated.
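
To show what the chi-square test of independence computes, here is a minimal sketch for a 2x2 table in plain Python. The counts are made up and are not the data behind the result above, and chi_square_2x2 is a hypothetical helper. It applies no continuity correction, whereas scipy.stats.chi2_contingency applies Yates' correction to 2x2 tables by default, so the two can give different numbers.

```python
import math

def chi_square_2x2(table):
    """Chi-square test of independence for a 2x2 table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed - expected) ** 2 / expected
    # With 1 degree of freedom the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Made-up counts. Rows: tried smoking (yes, no). Columns: taught in school (yes, no).
observed = [[30, 20],
            [20, 30]]

chi2, p_value = chi_square_2x2(observed)
print(f"chi-square = {chi2:.3f}, p = {p_value:.4f}")
```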

originallydonut-blog · 5 years ago

Running an analysis of variance

Code

Output

Conclusion

When examining the association between age (quantitative response) and relationship with parents (categorical explanatory), an Analysis of Variance (ANOVA) showed that respondents aged 13-20 who have a good relationship with their parents have a mean age of around 16 years. The p-value is 2.77e-37, which is far below 0.05, so mean age differs significantly across the relationship categories. In other words, age and relationship with parents are associated.
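
The F statistic behind an ANOVA compares the variability between group means with the variability inside the groups. A minimal sketch in plain Python, using made-up ages rather than the real data; one_way_anova_f is a hypothetical helper, and it returns only F because the p-value needs the F distribution, which the standard library does not provide.

```python
def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of groups of numbers."""
    all_values = [value for group in groups for value in group]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(group) / len(group) for group in groups]

    # Variability of the group means around the grand mean.
    ss_between = sum(
        len(group) * (mean - grand_mean) ** 2
        for group, mean in zip(groups, group_means)
    )
    # Variability of the observations around their own group mean.
    ss_within = sum(
        (value - mean) ** 2
        for group, mean in zip(groups, group_means)
        for value in group
    )
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# Made-up ages for three relationship categories.
ages_by_relationship = [
    [15, 16, 17],
    [16, 17, 18],
    [18, 19, 20],
]

f_stat = one_way_anova_f(ages_by_relationship)
print(f"F = {f_stat:.3f}")
```

A large F means the group means are far apart relative to the spread inside each group, which is what produces a small p-value.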

originallydonut-blog · 5 years ago

Creating graphs for your data

Code

Output

1. Spread and Center

2. Output for Univariate

3. Output for Bivariate

Conclusion

In my opinion, there may be no relationship between female employment rate and breast cancer cases per 100,000 (breastcancerper100th). The scatterplot shows that the countries with the highest female employment rates do not necessarily have high breast cancer rates. It may depend on the kind of job the women do.
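
The spread and center summary listed in the output above can be computed along these lines. This is only a sketch using Python's statistics module on made-up employment rates, not the Gapminder values, and center_and_spread is a hypothetical helper name.

```python
import statistics

def center_and_spread(values):
    """Return basic measures of center and spread for a list of numbers."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }

# Made-up female employment rates (percent).
rates = [30.0, 40.0, 50.0, 60.0, 70.0]

summary = center_and_spread(rates)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```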

originallydonut-blog · 5 years ago

Making Data Management Decisions

I used the add_health data set.

Output:

Using the add_health data set, I looked at how close respondents are to their friends, focusing on female respondents.

The resulting proportions for each variable are:

variable 1: 0.814828

variable 2: 0.173457

variable 3: 0.002798

variable 4: 0.000699

variable 5: 0.001923

variable 6: 0.006295
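
Proportions like the ones above come from dividing each count by the total number of observations, so they add up to 1. A minimal sketch with made-up coded responses, not the add_health data; proportions is a hypothetical helper name.

```python
from collections import Counter

def proportions(values):
    """Return the share of observations falling in each distinct value."""
    counts = Counter(values)
    total = sum(counts.values())
    return {value: count / total for value, count in sorted(counts.items())}

# Made-up coded responses (1, 2, 3 stand for answer categories).
responses = [1, 1, 1, 1, 1, 1, 2, 2, 3, 1]

shares = proportions(responses)
for value, share in shares.items():
    print(f"{value}: {share:.6f}")
```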

#datadecisions

originallydonut-blog · 5 years ago

Running My First Program

Code

import pandas
import numpy
# any additional libraries would be imported here

data = pandas.read_csv('gapminder.csv', low_memory=False)

# bug fix for display formats to avoid run time errors (placed after loading the data)
pandas.set_option('display.float_format', lambda x: '%f' % x)

# upper-case all DataFrame column names (disabled)
# data.columns = map(str.upper, data.columns)

print(len(data))          # number of observations (rows)
print(len(data.columns))  # number of variables (columns)

# setting the variables you will be working with to numeric (disabled)
# pandas.to_numeric replaces convert_objects, which pandas 1.0 and later no longer provide
# data['breastcancerper100th'] = pandas.to_numeric(data['breastcancerper100th'], errors='coerce')
# data['femaleemployrate'] = pandas.to_numeric(data['femaleemployrate'], errors='coerce')
# data['incomeperperson'] = pandas.to_numeric(data['incomeperperson'], errors='coerce')

# counts and proportions for each variable
print('Count for Breastcancerper100th')
c1 = data['breastcancerper100th'].value_counts(sort=False)
print(c1)

print('Precentages for Breastcancerper100th')
p1 = data['breastcancerper100th'].value_counts(sort=False, normalize=True)
print(p1)

print('Count for femaleemployrate')
c2 = data['femaleemployrate'].value_counts(sort=False)
print(c2)

print('Precentages for femaleemployrate')
p2 = data['femaleemployrate'].value_counts(sort=False, normalize=True)
print(p2)

print('Count for incomeperperson')
c3 = data['incomeperperson'].value_counts(sort=False)
print(c3)

print('Precentages for incomeperperson')
p3 = data['incomeperperson'].value_counts(sort=False, normalize=True)
print(p3)

# frequency distributions using the groupby function
ct1 = data.groupby('breastcancerper100th').size()
print(ct1)

pt1 = data.groupby('breastcancerper100th').size() * 100 / len(data)
print(pt1)

ct2 = data.groupby('femaleemployrate').size()
print(ct2)

pt2 = data.groupby('femaleemployrate').size() * 100 / len(data)
print(pt2)

ct3 = data.groupby('incomeperperson').size()
print(ct3)

pt3 = data.groupby('incomeperperson').size() * 100 / len(data)
print(pt3)

Output :

Number of rows: 213

Number of columns: 16

Count for Breastcancerper100th
51.1    1
30      1
58.4    2
19.1    1
16.6    2
..
29.8    2
51.8    1
13.6    1
6.6     1
29.5    1
Name: breastcancerper100th, Length: 137, dtype: int64

Precentages for Breastcancerper100th
51.1    0.004695
30      0.004695
58.4    0.009390
19.1    0.004695
16.6    0.009390
...
29.8    0.009390
51.8    0.004695
13.6    0.004695
6.6     0.004695
29.5    0.004695
Name: breastcancerper100th, Length: 137, dtype: float64

Count for femaleemployrate
39.4000015258789    1
65.6999969482422    1
36.5                1
49.4000015258789    2
82.1999969482422    1
..
58.2000007629394    2
51.5999984741211    1
68.9000015258789    2
66.5                1
42.0999984741211    4
Name: femaleemployrate, Length: 154, dtype: int64

Precentages for femaleemployrate
39.4000015258789    0.004695
65.6999969482422    0.004695
36.5                0.004695
49.4000015258789    0.009390
82.1999969482422    0.004695
...
58.2000007629394    0.009390
51.5999984741211    0.004695
68.9000015258789    0.009390
66.5                0.004695
42.0999984741211    0.018779
Name: femaleemployrate, Length: 154, dtype: float64

Count for incomeperperson
7381.31275080681    1
239.518749365157    1
6334.10519399913    1
3233.42378012982    1
4885.04670142104    1
..
2025.28266492268    1
11191.8110074347    1
1914.99655094922    1
5248.58232147098    1
772.933344800758    1
Name: incomeperperson, Length: 191, dtype: int64

Precentages for incomeperperson
7381.31275080681    0.004695
239.518749365157    0.004695
6334.10519399913    0.004695
3233.42378012982    0.004695
4885.04670142104    0.004695
...
2025.28266492268    0.004695
11191.8110074347    0.004695
1914.99655094922    0.004695
5248.58232147098    0.004695
772.933344800758    0.004695
Name: incomeperperson, Length: 191, dtype: float64

Description

For this assignment, Running Your First Program, I used Python. For the data set, I used gapminder and took 3 variables: breastcancerper100th, femaleemployrate, and incomeperperson. I want to know whether breast cancer risk is associated with the female employment rate, and whether that association depends on incomeperperson. No value appears more than 7 times, and most values appear only once or twice.

Thank you.

#python coursera

originallydonut-blog · 5 years ago

Relation between female employment and breast cancer

I use the Gapminder dataset.

I want to learn about the association between female employment rate and breast cancer: are career women more susceptible to breast cancer?

I chose this topic because there are many claims that professional women are more susceptible to breast cancer, as in this article (https://www.independent.co.uk/life-style/health-and-families/health-news/professional-women-more-susceptible-to-breast-cancer-8651359.html).

According to the literature I have read (https://pubmed.ncbi.nlm.nih.gov/21472744/), breast cancer risk among female workers apparently depends on the kind of occupation they have. Having many female workers in a country does not mean that country has a higher risk of breast cancer.

#coursera breastcancer femaleworker
