originallydonut-blog
Inside Me
originallydonut-blog · 5 years ago
Testing a Potential Moderator
Code
Tumblr media Tumblr media
Output
Tumblr media Tumblr media Tumblr media Tumblr media
Conclusion 
To test income group as a potential moderator, I calculated the Pearson correlation between female employment rate and life expectancy, along with its associated p-value, for each of the new data frames. I used the pearsonr function from the scipy.stats library with the variables female employment rate and life expectancy. Examining the correlation coefficients for each of the income groups, I found the following. For the low-income group, the correlation between female employment rate and life expectancy is -0.1957 and the p-value is 0.917. For the middle-income countries, the correlation is -0.219 and the p-value is 0.051. For the high-income countries, the correlation is 0.332 and the p-value is 0.0359. This suggests that the association between female employment rate and life expectancy is statistically significant (p < 0.05) only for the high-income countries.
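The per-group correlation described above could be sketched roughly as follows. This is only an illustration: the data are synthetic stand-ins for the Gapminder columns, and the income cutoffs (1000 and 10000) and the column name incomegroup are assumptions, not the values used in the screenshots.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Synthetic stand-in for the Gapminder columns (illustrative values only).
rng = np.random.default_rng(0)
n = 150
df = pd.DataFrame({
    "incomeperperson": 10 ** rng.uniform(2, 4.6, n),  # roughly 100 to 40000
    "femaleemployrate": rng.uniform(10, 85, n),
    "lifeexpectancy": rng.uniform(45, 85, n),
})

# Hypothetical cutoffs that define the three levels of the moderator.
bins = [0, 1000, 10000, np.inf]
labels = ["low", "middle", "high"]
df["incomegroup"] = pd.cut(df["incomeperperson"], bins=bins, labels=labels)

# Run the same Pearson correlation separately inside each income group.
results = {}
for group, sub in df.groupby("incomegroup", observed=True):
    r, p = stats.pearsonr(sub["femaleemployrate"], sub["lifeexpectancy"])
    results[group] = (r, p)
    print(f"{group}: r = {r:.3f}, p = {p:.4f}, n = {len(sub)}")
```

If the correlation differs clearly in size or sign from one group to the next, that is the pattern one would expect when the third variable moderates the relationship.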
originallydonut-blog · 5 years ago
Generating a Correlation Coefficient
Code
Tumblr media
Output
Tumblr media Tumblr media
Conclusion 
For the association between incomeperperson and femaleemployrate, the correlation coefficient is approximately -0.2876, with a p-value of 0.0002. The coefficient is negative, so the direction of the relationship is negative: an increase in incomeperperson is associated with a decrease in femaleemployrate. The p-value is below 0.05, so the association is statistically significant, but the coefficient is small in magnitude (r² ≈ 0.08), which means the relationship is weak.
For the association between lifeexpectancy and femaleemployrate, the correlation coefficient is approximately 0.028, with a p-value of 0.716. The coefficient is positive but very close to 0, and the p-value is well above 0.05, so there is no statistically significant linear relationship between the two variables.
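To make the reading of a correlation result explicit, here is a small sketch. The helper describe_correlation is hypothetical (it is not part of scipy), and the demo data are synthetic. The sign of r gives the direction, the p-value gives significance, and r squared gives the share of variability explained.

```python
import numpy as np
from scipy import stats


def describe_correlation(r, p, alpha=0.05):
    """Hypothetical helper: direction, significance, and r squared."""
    direction = "negative" if r < 0 else "positive"
    significant = p < alpha
    return direction, significant, r ** 2


# The two results reported in this post.
income_result = describe_correlation(-0.2876, 0.0002)
life_result = describe_correlation(0.028, 0.716)
print(income_result)
print(life_result)

# The same reading applied to a freshly computed correlation (synthetic data).
rng = np.random.default_rng(1)
x = rng.normal(size=200)
y = -0.3 * x + rng.normal(size=200)
r, p = stats.pearsonr(x, y)
print(describe_correlation(r, p))
```

Strength is judged from the magnitude of r (or from r squared), not from whether r is above or below 0.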
originallydonut-blog · 5 years ago
Running a Chi-Square Test of Independence
Code
Tumblr media Tumblr media Tumblr media
Output
Tumblr media Tumblr media Tumblr media
Conclusion 
When examining the association between being taught in school about smoking (categorical response) and having tried cigarette smoking (categorical explanatory), a chi-square test of independence revealed a p-value of 0.006. Because this is below 0.05, the two variables are significantly associated.
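A chi-square test of independence like this one can be sketched with scipy.stats.chi2_contingency. The counts and the column names tried_smoking and taught_in_school below are made up for illustration and are not the add_health values from the screenshots.

```python
import pandas as pd
from scipy import stats

# Made-up responses: 100 who tried smoking and 100 who did not.
df = pd.DataFrame({
    "tried_smoking": ["yes"] * 100 + ["no"] * 100,
    "taught_in_school": ["yes"] * 40 + ["no"] * 60 + ["yes"] * 60 + ["no"] * 40,
})

# Contingency table: response in rows, explanatory variable in columns.
ct = pd.crosstab(df["taught_in_school"], df["tried_smoking"])
print(ct)

# Column percentages make the direction of the difference easier to read.
print(ct / ct.sum(axis=0))

# For a 2x2 table, chi2_contingency applies Yates' continuity correction by default.
chi2, p, dof, expected = stats.chi2_contingency(ct)
print(f"chi-square = {chi2:.2f}, p = {p:.4f}, dof = {dof}")
```

When the explanatory variable has more than two levels, a significant result only says that some groups differ, so pairwise post hoc comparisons would still be needed.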
originallydonut-blog · 5 years ago
Running an analysis of variance
Code
Tumblr media
Output 
Tumblr media Tumblr media
Conclusion
When examining the association between age (quantitative response) and relationship with parents (categorical explanatory), an Analysis of Variance (ANOVA) showed that, among respondents aged 13-20, the mean age of those who have a good relationship with their parents is around 16 years. The p-value is 2.77e-37, which is well below 0.05, so age and relationship with parents are significantly associated.
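As a rough sketch of this kind of analysis, the block below runs a one-way ANOVA with scipy.stats.f_oneway. This is a swapped-in function chosen to keep the example short and is not necessarily what the screenshots use. The data and the column name parent_relation are synthetic and hypothetical.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Synthetic ages for three hypothetical relationship categories.
rng = np.random.default_rng(2)
df = pd.DataFrame({
    "parent_relation": np.repeat(["good", "neutral", "poor"], 50),
    "age": np.concatenate([
        rng.normal(16.0, 1.5, 50),
        rng.normal(16.5, 1.5, 50),
        rng.normal(17.0, 1.5, 50),
    ]),
})

# Group means and standard deviations, the numbers the conclusion reads from.
print(df.groupby("parent_relation")["age"].agg(["mean", "std", "count"]))

# One-way ANOVA: is mean age equal across the categories?
groups = [g["age"].to_numpy() for _, g in df.groupby("parent_relation")]
f_stat, p_value = stats.f_oneway(*groups)
print(f"F = {f_stat:.3f}, p = {p_value:.4g}")
```

A small p-value says that at least one group mean differs. It does not say which one, so with more than two categories a post hoc test is the usual next step.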
originallydonut-blog · 5 years ago
Creating graphs for your data
Code
Tumblr media
Output 
1. Spread and Center
Tumblr media
2. Output for Univariate 
Tumblr media
3. Output for Bivariate
Tumblr media
Conclusion:
In my opinion, there may be no relationship between the female employment rate and breast cancer cases per 100,000 women (breastcancerper100th). The scatterplot shows that countries with the highest female employment rates do not necessarily have high numbers of breast cancer cases. It may depend on the kind of work the women do.
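The center and spread summaries and the bivariate check could be computed numerically along these lines. This is a minimal sketch on synthetic values that only borrows the Gapminder column names. It prints summary statistics instead of drawing the graphs.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the two Gapminder columns (illustrative values only).
rng = np.random.default_rng(3)
n = 120
df = pd.DataFrame({
    "femaleemployrate": rng.uniform(10, 85, n),
    "breastcancerper100th": rng.uniform(5, 100, n),
})

# Center (mean, median) and spread (std, min, max) for each variable.
summary = df.describe()
print(summary.loc[["mean", "50%", "std", "min", "max"]])

# A numeric companion to the scatterplot: the Pearson correlation.
r = df["femaleemployrate"].corr(df["breastcancerper100th"])
print(f"Pearson r = {r:.3f}")
```

A correlation near 0 would be consistent with the impression from the scatterplot that the two variables are not linearly related.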
originallydonut-blog · 5 years ago
Making Data Management Decisions
I used add_health for the data set.
Tumblr media Tumblr media Tumblr media
Output:
Tumblr media Tumblr media Tumblr media
I used the add_health data set to look at how close respondents are to their friends, focusing on female respondents.
The resulting proportions (which sum to 1) are:
variable 1 : 0.814828
variable 2 : 0.173457
variable 3 : 0.002798
variable 4 : 0.000699
variable 5 : 0.001923
variable 6 : 0.006295
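The data management steps behind a table like this (subsetting to female respondents, setting a missing-data code to NaN, then computing proportions) might look like the sketch below. The column names sex and friend_closeness, the code 2 for female, and the missing-data code 96 are all assumptions made for illustration. They are not taken from the add_health codebook or from the screenshots.

```python
import numpy as np
import pandas as pd

# Tiny made-up sample; sex == 2 is assumed to mean female.
df = pd.DataFrame({
    "sex": [2, 2, 2, 2, 2, 1, 1, 2, 2, 2],
    "friend_closeness": [1, 1, 1, 2, 1, 1, 2, 96, 1, 2],
})

# Subset to female respondents; copy so the recode does not touch df.
females = df[df["sex"] == 2].copy()

# Treat the assumed "refused" code 96 as missing data.
females["friend_closeness"] = females["friend_closeness"].replace(96, np.nan)

# Counts and proportions, keeping missing values visible.
counts = females["friend_closeness"].value_counts(sort=False, dropna=False)
props = females["friend_closeness"].value_counts(
    sort=False, dropna=False, normalize=True
)
print(counts)
print(props)
```

Because normalize=True returns proportions, the values add up to 1. Multiply by 100 to report them as percentages.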
originallydonut-blog · 5 years ago
Running My First Program
Code 
import pandas
import numpy
# any additional libraries would be imported here

data = pandas.read_csv('gapminder.csv', low_memory=False)

print(len(data))  # number of observations (rows)
print(len(data.columns))  # number of variables (columns)

# setting variables you will be working with to numeric
# (convert_objects has been removed from pandas; pandas.to_numeric replaces it)
#data['breastcancerper100th'] = pandas.to_numeric(data['breastcancerper100th'], errors='coerce')
#data['femaleemployrate'] = pandas.to_numeric(data['femaleemployrate'], errors='coerce')
#data['incomeperperson'] = pandas.to_numeric(data['incomeperperson'], errors='coerce')

# counts and percentages (as proportions) for each variable
print('Count for Breastcancerper100th')
c1 = data['breastcancerper100th'].value_counts(sort=False)
print(c1)

print('Precentages for Breastcancerper100th')
p1 = data['breastcancerper100th'].value_counts(sort=False, normalize=True)
print(p1)

print('Count for femaleemployrate')
c2 = data['femaleemployrate'].value_counts(sort=False)
print(c2)

print('Precentages for femaleemployrate')
p2 = data['femaleemployrate'].value_counts(sort=False, normalize=True)
print(p2)

print('Count for incomeperperson')
c3 = data['incomeperperson'].value_counts(sort=False)
print(c3)

print('Precentages for incomeperperson')
p3 = data['incomeperperson'].value_counts(sort=False, normalize=True)
print(p3)

# frequency distributions using the 'groupby' function
ct1 = data.groupby('breastcancerper100th').size()
print(ct1)

pt1 = data.groupby('breastcancerper100th').size() * 100 / len(data)
print(pt1)

ct2 = data.groupby('femaleemployrate').size()
print(ct2)

pt2 = data.groupby('femaleemployrate').size() * 100 / len(data)
print(pt2)

ct3 = data.groupby('incomeperperson').size()
print(ct3)

pt3 = data.groupby('incomeperperson').size() * 100 / len(data)
print(pt3)

# upper-case all DataFrame column names - place after code for loading data above
#data.columns = map(str.upper, data.columns)

# bug fix for display formats to avoid run time errors - put after code for loading data above
pandas.set_option('display.float_format', lambda x: '%f' % x)
Output :
number of rows
213
number of columns
16
Count for Breastcancerper100th
51.1    1
30      1
58.4    2
19.1    1
16.6    2
       ..
29.8    2
51.8    1
13.6    1
6.6     1
29.5    1
Name: breastcancerper100th, Length: 137, dtype: int64
Precentages for Breastcancerper100th
51.1   0.004695
30     0.004695
58.4   0.009390
19.1   0.004695
16.6   0.009390
         ...
29.8   0.009390
51.8   0.004695
13.6   0.004695
6.6    0.004695
29.5   0.004695
Name: breastcancerper100th, Length: 137, dtype: float64
Count for femaleemployrate
39.4000015258789    1
65.6999969482422    1
36.5                1
49.4000015258789    2
82.1999969482422    1
                   ..
58.2000007629394    2
51.5999984741211    1
68.9000015258789    2
66.5                1
42.0999984741211    4
Name: femaleemployrate, Length: 154, dtype: int64
Precentages for femaleemployrate
39.4000015258789   0.004695
65.6999969482422   0.004695
36.5               0.004695
49.4000015258789   0.009390
82.1999969482422   0.004695
                     ...
58.2000007629394   0.009390
51.5999984741211   0.004695
68.9000015258789   0.009390
66.5               0.004695
42.0999984741211   0.018779
Name: femaleemployrate, Length: 154, dtype: float64
Count for incomeperperson
7381.31275080681    1
239.518749365157    1
6334.10519399913    1
3233.42378012982    1
4885.04670142104    1
                   ..
2025.28266492268    1
11191.8110074347    1
1914.99655094922    1
5248.58232147098    1
772.933344800758    1
Name: incomeperperson, Length: 191, dtype: int64
Precentages for incomeperperson
7381.31275080681   0.004695
239.518749365157   0.004695
6334.10519399913   0.004695
3233.42378012982   0.004695
4885.04670142104   0.004695
                     ...
2025.28266492268   0.004695
11191.8110074347   0.004695
1914.99655094922   0.004695
5248.58232147098   0.004695
772.933344800758   0.004695
Name: incomeperperson, Length: 191, dtype: float64
Description 
For this assignment, Running Your First Program, I used Python, following the course's software setup and supporting materials. For the data set, I used Gapminder and took three variables: breastcancerper100th, femaleemployrate, and incomeperperson. I want to know whether breast cancer risk is associated with the female employment rate, and whether that depends on incomeperperson. In the frequency tables, the most frequent value appears only 7 times, and the rest appear only once or twice.
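Because almost every raw value occurs only once or twice, a frequency table becomes easier to read after converting the column to numeric and grouping it into ranges. The sketch below shows one way to do that with pandas.to_numeric and pandas.cut. The sample values, the blank-string placeholder for missing data, and the cut points 40 and 60 are assumptions for illustration.

```python
import pandas as pd

# Made-up sample; a blank string stands in for a missing value (assumption).
raw = pd.DataFrame({
    "femaleemployrate": ["39.4", "65.7", " ", "36.5", "49.4", "82.2", "58.2", "51.6"],
})

# Convert text to numbers; anything unparseable becomes NaN.
raw["femaleemployrate"] = pd.to_numeric(raw["femaleemployrate"], errors="coerce")

# Group the continuous values into three hypothetical ranges.
bins = [0, 40, 60, 100]
labels = ["0-40", "40-60", "60-100"]
raw["employgroup"] = pd.cut(raw["femaleemployrate"], bins=bins, labels=labels)

# Frequency table on the grouped variable instead of the raw values.
counts = raw["employgroup"].value_counts(sort=False)
percentages = raw["employgroup"].value_counts(sort=False, normalize=True)
print(counts)
print(percentages)
```

The same pattern applies to breastcancerper100th and incomeperperson, with cut points chosen to suit each variable.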
Thank you.
originallydonut-blog · 5 years ago
Relation between female employee and breast cancer
I used Gapminder for the data set.
I want to learn about the association between the female employment rate and breast cancer: is being a career woman associated with greater susceptibility to breast cancer?
I chose this topic because there are many claims that professional women are more susceptible to breast cancer, as in this article (https://www.independent.co.uk/life-style/health-and-families/health-news/professional-women-more-susceptible-to-breast-cancer-8651359.html).
According to the literature I have read (https://pubmed.ncbi.nlm.nih.gov/21472744/), breast cancer risk among female workers apparently depends on the kind of occupation they have. Just because a country has many female workers does not mean that country has a higher risk of breast cancer.