codingwiz-blog1
codingwiz-blog1 · 5 years ago
Assignment - 4
codingwiz-blog1 · 5 years ago
Assignment-3
import pandas as pd
import numpy as np
In [2]:
data = pd.read_csv("nesarc_pds.csv", low_memory=False)
In [3]:
data.columns=map(str.upper,data.columns)
In [4]:
data.head()
Out[4]:
   UNNAMED: 0  ETHRACE2A  ETOTLCA2  IDNUM    PSU  STRATUM       WEIGHT  CDAY  CMON  CYEAR  …  SOLP12ABDEP  HAL12ABDEP  HALP12ABDEP  MAR12ABDEP  MARP12ABDEP  HER12ABDEP  HERP12ABDEP  OTHB12ABDEP  OTHBP12ABDEP  NDSYMPTOMS
0           0          5                1   4007      403  3928.613505    14     8   2001  …            0           0            0           0            0           0            0            0             0         NaN
1           1          5    0.0014      2   6045      604  3638.691845    12     1   2002  …            0           0            0           0            0           0            0            0             0         NaN
2           2          5                3  12042     1218  5779.032025    23    11   2001  …            0           0            0           0            0           0            0            0             0         NaN
3           3          5                4  17099     1704  1071.754303     9     9   2001  …            0           0            0           0            0           0            0            0             0         NaN
4           4          2                5  17099     1704  4986.952377    18    10   2001  …            0           0            0           0            0           0            0            0             0         NaN
5 rows × 3010 columns
Variables I'll be taking from this codebook:
CONSUMER = Drinking status.
S1Q2C2 = Raised by relatives before age 18.
SMOKER = Tobacco use status.
S3AQ52 = Age started smoking cigars every day.
S2AQ19 = Age at start of period of heaviest drinking.
NOTE :
Since I did not use the Spyder IDE, the code syntax differs slightly from the video lectures. I hope every piece of code is understandable; I've added comments for your reference wherever needed.
In [5]:
sub= data[['CONSUMER','S1Q2C2', 'SMOKER', 'S3AQ52', 'S2AQ19']]
In [6]:
sub1 = sub.copy()
In [7]:
sub1.head()
Out[7]:
   CONSUMER  S1Q2C2  SMOKER  S3AQ52  S2AQ19
0         3               3
1         1               3              21
2         3               3
3         2               3              16
4         2               3              18
CONSUMER :
            1. Current drinker
            2. Ex-drinker
            3. Lifetime abstainer
Since this ordering is counterintuitive (a higher code means less drinking), we can change it to :
            0) Lifetime abstainer (one who never drank)
            1) Ex-drinker
            2) Current drinker
In [8]:
print("Before labels (CONSUMER) : ")
print(sorted(sub1['CONSUMER'].unique()))
Before labels (CONSUMER) : 
[1, 2, 3]
In [9]:
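def recode1(val):
    # Reverse the 1-3 coding so that a higher value means more use.
    # Any other value falls through and returns None.
    if val == 1:
        return 2
    if val == 2:
        return 1
    if val == 3:
        return 0
In [10]:
sub1['CONSUMER_NEWL'] = sub1['CONSUMER'].apply(lambda x : recode1(x))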
In [11]:
print("After labels (CONSUMER_NEWL) : ")
print(sorted(sub1['CONSUMER_NEWL'].unique()))
After labels (CONSUMER_NEWL) : 
[0, 1, 2]
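As an aside, the same recode can be sketched with a dictionary and Series.map instead of a helper function. This is only an illustration on a small made-up frame (the name toy and its values are assumptions, not the NESARC data); note that map turns any code missing from the dictionary into NaN.

```python
import pandas as pd

# Toy stand-in for the CONSUMER column (1 = current, 2 = ex, 3 = abstainer).
toy = pd.DataFrame({"CONSUMER": [3, 1, 3, 2, 2]})

# Old code -> new code; a code missing from this dict would become NaN.
recode_map = {1: 2, 2: 1, 3: 0}
toy["CONSUMER_NEWL"] = toy["CONSUMER"].map(recode_map)

print(toy["CONSUMER_NEWL"].tolist())  # [0, 2, 0, 1, 1]
```

The dictionary keeps the whole mapping visible in one place, which makes it easy to reuse for SMOKER as well.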
In [12]:
sub1.head()
Out[12]:
   CONSUMER  S1Q2C2  SMOKER  S3AQ52  S2AQ19  CONSUMER_NEWL
0         3               3                              0
1         1               3              21              2
2         3               3                              0
3         2               3              16              1
4         2               3              18              1
SMOKER :
            1. Current user
            2. Ex-user
            3. Lifetime nonsmoker
Since this ordering is also counterintuitive, we can change it to :
            0) Lifetime nonsmoker
            1) Ex-user
            2) Current user
In [13]:
print("Before labels (SMOKER) : ")
print(sorted(sub1['SMOKER'].unique()))
Before labels (SMOKER) : 
[1, 2, 3]
In [14]:
# Reuse the 'recode1' function defined above.
sub1['SMOKER_NEWL'] = sub1['SMOKER'].apply(lambda x : recode1(x))
In [15]:
sub1.head()
Out[15]:
   CONSUMER  S1Q2C2  SMOKER  S3AQ52  S2AQ19  CONSUMER_NEWL  SMOKER_NEWL
0         3               3                              0            0
1         1               3              21              2            0
2         3               3                              0            0
3         2               3              16              1            0
4         2               3              18              1            0
In [16]:
print("After labels (SMOKER_NEWL) : ")
print(sorted(sub1['SMOKER_NEWL'].unique()))
After labels (SMOKER_NEWL) : 
[0, 1, 2]
In [17]:
columnsTitles = ['CONSUMER', 'SMOKER', 'S1Q2C2', 'S3AQ52', 'S2AQ19', 'CONSUMER_NEWL', 'SMOKER_NEWL']
sub1 = sub1.reindex(columns=columnsTitles)
In [18]:
sub1.head()
Out[18]:
   CONSUMER  SMOKER  S1Q2C2  S3AQ52  S2AQ19  CONSUMER_NEWL  SMOKER_NEWL
0         3       3                                      0            0
1         1       3                      21              2            0
2         3       3                                      0            0
3         2       3                      16              1            0
4         2       3                      18              1            0
In [19]:
sub1['CONSUMER_NEWL'].value_counts(sort=False)
Out[19]:
0     8266
1     7881
2    26946
Name: CONSUMER_NEWL, dtype: int64
In [20]:
sub1['SMOKER_NEWL'].value_counts(sort=False)
Out[20]:
0    23901
1     8074
2    11118
Name: SMOKER_NEWL, dtype: int64
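The counts above can also be expressed as proportions. The following is a minimal sketch on a small made-up series (toy and its values are assumptions for illustration only), using the normalize option of value_counts:

```python
import pandas as pd

# Toy stand-in for a recoded column such as CONSUMER_NEWL.
toy = pd.Series([0, 2, 0, 1, 2, 2, 2, 1], name="CONSUMER_NEWL")

# Frequency of each category, ordered by category code.
counts = toy.value_counts().sort_index()

# Share of each category (values sum to 1).
shares = toy.value_counts(normalize=True).sort_index()

print(counts.to_dict())  # {0: 2, 1: 2, 2: 4}
print(shares.to_dict())  # {0: 0.25, 1: 0.25, 2: 0.5}
```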
Managing variable - S3AQ52 (AGE STARTED SMOKING CIGARS EVERY DAY)
In [21]:
sub1['S3AQ52'].unique()
Out[21]:
array([' ', '21', '16', '20', '30', '40', '17', '25', '15', '35', '38',
       '37', '26', '53', '24', '54', '18', '28', '55', '45', '32', '22',
       '48', '39', '50', '34', '99', '36', '12', '60', '42', '51', '23',
       '64', '47', '29', '19', '9', '70', '41', '52', '33', '46', '31',
       '59', '8', '10', '44', '43', '65', '57', '69', '58', '27', '66',
       '14', '84', '5', '11', '13', '49', '62', '63', '80', '56'],
      dtype=object)
In [22]:
sub1[sub1['S3AQ52']==" "]
Out[22]:
       CONSUMER  SMOKER  S1Q2C2  S3AQ52  S2AQ19  CONSUMER_NEWL  SMOKER_NEWL
0             3       3                                      0            0
1             1       3                      21              2            0
2             3       3                                      0            0
3             2       3                      16              1            0
4             2       3                      18              1            0
...         ...     ...     ...     ...     ...            ...          ...
43088         3       3                                      0            0
43089         1       3                      18              2            0
43090         1       1                      17              2            2
43091         1       1       2              24              2            2
43092         2       3                      17              1            0
42374 rows × 7 columns
In [23]:
# Convert blank values (people who never smoked) to 0
sub1.loc[sub1['S3AQ52']==" ", 'S3AQ52'] = 0
In [24]:
# Convert the string values of the column to numeric
sub1['S3AQ52'] = pd.to_numeric(sub1['S3AQ52'])
In [25]:
sub1['S3AQ52'].unique()
Out[25]:
array([ 0, 21, 16, 20, 30, 40, 17, 25, 15, 35, 38, 37, 26, 53, 24, 54, 18,
       28, 55, 45, 32, 22, 48, 39, 50, 34, 99, 36, 12, 60, 42, 51, 23, 64,
       47, 29, 19,  9, 70, 41, 52, 33, 46, 31, 59,  8, 10, 44, 43, 65, 57,
       69, 58, 27, 66, 14, 84,  5, 11, 13, 49, 62, 63, 80, 56],
      dtype=int64)
In [26]:
# Convert 99 (people who did not answer this question in the survey) to NaN.
sub1.loc[sub1['S3AQ52']==99, 'S3AQ52'] = np.nan
In [27]:
sub1['S3AQ52'].unique()
Out[27]:
array([ 0., 21., 16., 20., 30., 40., 17., 25., 15., 35., 38., 37., 26.,
       53., 24., 54., 18., 28., 55., 45., 32., 22., 48., 39., 50., 34.,
       nan, 36., 12., 60., 42., 51., 23., 64., 47., 29., 19.,  9., 70.,
       41., 52., 33., 46., 31., 59.,  8., 10., 44., 43., 65., 57., 69.,
       58., 27., 66., 14., 84.,  5., 11., 13., 49., 62., 63., 80., 56.])
In [28]:
sub1.head()
Out[28]:CONSUMERSMOKERS1Q2C2S3AQ52S2AQ19CONSUMER_NEWLSMOKER_NEWL
   CONSUMER  SMOKER  S1Q2C2  S3AQ52  S2AQ19  CONSUMER_NEWL  SMOKER_NEWL
0         3       3             0.0                      0            0
1         1       3             0.0      21              2            0
2         3       3             0.0                      0            0
3         2       3             0.0      16              1            0
4         2       3             0.0      18              1            0
Now, Column S3AQ52 is managed and prepared.
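The three steps used for S3AQ52 (blank to 0, string to number, 99 to NaN) can be sketched in compact form. This runs on a small made-up column, so toy and ages are illustrative names rather than part of the notebook, and Series.mask is swapped in for the .loc assignment used above:

```python
import pandas as pd

# Toy stand-in for S3AQ52 as read from the CSV: strings, with " " for blanks.
toy = pd.DataFrame({"S3AQ52": [" ", "21", "99", "16", " "]})

# Step 1 and 2: blank -> "0" (never smoked, as in the post), then parse to numbers.
ages = pd.to_numeric(toy["S3AQ52"].replace(" ", "0"))

# Step 3: 99 is the "unknown" code, so mask it out (masked cells become NaN).
toy["S3AQ52"] = ages.mask(ages == 99)

print(toy["S3AQ52"].tolist())  # [0.0, 21.0, nan, 16.0, 0.0]
```

Because NaN is a floating-point value, the column ends up as float64, which matches the 0., 21., ... values seen in Out[27].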
Managing variable - S2AQ19 (AGE AT START OF PERIOD OF HEAVIEST DRINKING)
In [29]:
sub1['S2AQ19'].unique()
Out[29]:
array([' ', '21', '16', '18', '30', '17', '28', '43', '26', '23', '20',
       '51', '19', '40', '35', '27', '42', '22', '15', '36', '25', '24',
       '68', '99', '29', '52', '31', '33', '57', '38', '39', '32', '90',
       '49', '50', '37', '34', '59', '63', '58', '55', '53', '79', '56',
       '77', '41', '64', '8', '73', '6', '70', '13', '72', '44', '47',
       '54', '14', '46', '48', '61', '65', '10', '76', '69', '5', '45',
       '71', '60', '67', '12', '62', '74', '86', '66', '81', '82', '9',
       '75', '83', '80', '78', '7', '87', '11', '85', '84', '91', '88'],
      dtype=object)
In [30]:
# People who are lifetime abstainers
sub1[sub1['S2AQ19']==" "]
Out[30]:
       CONSUMER  SMOKER  S1Q2C2  S3AQ52  S2AQ19  CONSUMER_NEWL  SMOKER_NEWL
0             3       3             0.0                      0            0
2             3       3             0.0                      0            0
22            3       1             0.0                      0            2
23            3       3             0.0                      0            0
26            3       3             0.0                      0            0
...         ...     ...     ...     ...     ...            ...          ...
43070         3       1             0.0                      0            2
43071         3       3             0.0                      0            0
43072         3       3             0.0                      0            0
43082         3       3             0.0                      0            0
43088         3       3             0.0                      0            0
8266 rows × 7 columns
In [31]:
# Convert blank values (lifetime abstainers) to 0
sub1.loc[sub1['S2AQ19']==" ", 'S2AQ19'] = 0
In [32]:
# Convert the string values of the column to numeric
sub1['S2AQ19'] = pd.to_numeric(sub1['S2AQ19'])
In [33]:
sub1['S3AQ52'].unique()
Out[33]:
array([ 0., 21., 16., 20., 30., 40., 17., 25., 15., 35., 38., 37., 26.,
       53., 24., 54., 18., 28., 55., 45., 32., 22., 48., 39., 50., 34.,
       nan, 36., 12., 60., 42., 51., 23., 64., 47., 29., 19.,  9., 70.,
       41., 52., 33., 46., 31., 59.,  8., 10., 44., 43., 65., 57., 69.,
       58., 27., 66., 14., 84.,  5., 11., 13., 49., 62., 63., 80., 56.])
In [34]:
sub1.head()
Out[34]:CONSUMERSMOKERS1Q2C2S3AQ52S2AQ19CONSUMER_NEWLSMOKER_NEWL
   CONSUMER  SMOKER  S1Q2C2  S3AQ52  S2AQ19  CONSUMER_NEWL  SMOKER_NEWL
0         3       3             0.0       0              0            0
1         1       3             0.0      21              2            0
2         3       3             0.0       0              0            0
3         2       3             0.0      16              1            0
4         2       3             0.0      18              1            0
Now, Column S2AQ19 is also managed and prepared.
Managing variable - S1Q2C2 (RAISED BY RELATIVES BEFORE AGE 18)
In [35]:
sub1['S1Q2C2'].unique()
Out[35]:
array([' ', '1', '2', '9'], dtype=object)
RAISED BY RELATIVES BEFORE AGE 18
            1. Yes
            2. No
            9. Unknown
            BL. NA, lived with biological parent(s) before age 18
We can reduce this to only 3 categories, since we only care about people who were raised by relatives:
            1 -> 1. Yes
            BL and 2 -> 0. No, or NA (lived with biological parent(s) before age 18)
            9 -> NaN. Those who did not answer this question in the survey.
In [36]:
sub1['S1Q2C2'].value_counts(dropna=False)
Out[36]:
     41679
1      649
2      553
9      212
Name: S1Q2C2, dtype: int64
In [47]:
# Convert blank values (people raised by their biological parent(s)) to 0.
sub1.loc[sub1['S1Q2C2']==" ", 'S1Q2C2'] = 0
In [38]:
sub1['S1Q2C2'].value_counts(dropna=False)
Out[38]:
0    41679
1      649
2      553
9      212
Name: S1Q2C2, dtype: int64
In [39]:
# Convert value 2 (No) to 0.
sub1.loc[sub1['S1Q2C2']=="2", 'S1Q2C2'] = 0
In [40]:
# Convert value 9 (Unknown) to NaN.
sub1.loc[sub1['S1Q2C2']=="9", 'S1Q2C2'] = np.nan
In [43]:
sub1['S1Q2C2'].unique()
Out[43]:
array([0, 1, nan], dtype=object)
In [45]:
sub1['S1Q2C2'].value_counts(dropna=False)
Out[45]:
0.0    42232
1.0      649
NaN      212
Name: S1Q2C2, dtype: int64
In [46]:
sub1.head()
Out[46]:CONSUMERSMOKERS1Q2C2S3AQ52S2AQ19CONSUMER_NEWLSMOKER_NEWL
   CONSUMER  SMOKER  S1Q2C2  S3AQ52  S2AQ19  CONSUMER_NEWL  SMOKER_NEWL
0         3       3       0     0.0       0              0            0
1         1       3       0     0.0      21              2            0
2         3       3       0     0.0       0              0            0
3         2       3       0     0.0      16              1            0
4         2       3       0     0.0      18              1            0
Now, Column S1Q2C2 is also managed and prepared.
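The S1Q2C2 recode above (blank and 2 to 0, 1 stays 1, 9 to NaN) could also be written as a single dictionary lookup. The sketch below uses a small made-up column, so toy and recode_map are illustrative names only; one side effect of this route is that the result is a numeric float64 column rather than the object dtype shown in Out[43].

```python
import numpy as np
import pandas as pd

# Toy stand-in for S1Q2C2 as read from the CSV (strings, " " for blank).
toy = pd.DataFrame({"S1Q2C2": [" ", "1", "2", "9", " "]})

# Blank (lived with biological parents) and "2" (No) -> 0, "1" (Yes) -> 1,
# "9" (Unknown) -> NaN.
recode_map = {" ": 0, "1": 1, "2": 0, "9": np.nan}
toy["S1Q2C2"] = toy["S1Q2C2"].map(recode_map)

print(toy["S1Q2C2"].value_counts(dropna=False))
```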
We can now remove the original CONSUMER and SMOKER columns before further analysis.
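Dropping the two original columns could look like the sketch below. It runs on a tiny made-up frame, and the names toy and sub2 are assumptions for illustration, not variables from the notebook:

```python
import pandas as pd

# Toy frame with the original and the recoded columns side by side.
toy = pd.DataFrame({
    "CONSUMER": [3, 1],
    "SMOKER": [3, 3],
    "CONSUMER_NEWL": [0, 2],
    "SMOKER_NEWL": [0, 0],
})

# drop returns a new frame; the original stays unchanged.
sub2 = toy.drop(columns=["CONSUMER", "SMOKER"])

print(list(sub2.columns))  # ['CONSUMER_NEWL', 'SMOKER_NEWL']
```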
codingwiz-blog1 · 5 years ago
Assignment-2
codingwiz-blog1 · 5 years ago
Gap minder dataset
Data set: GapMinder Data.
Research question: Is fertility rate associated with the number of breast cancer cases?
Items included in the CodeBook:
for fertility rate:
            Children per woman (total fertility)
            Children per woman (total fertility), with projections
for breast cancer:
            Breast cancer, deaths per 100,000 women
            Breast cancer, new cases per 100,000 women
            Breast cancer, number of female deaths
            Breast cancer, number of new female cases
Literature Review:
From original source: http://ww5.komen.org/KomenPerspectives/Does-pregnancy-affect-breast-cancer-risk-and-survival-.html
The more children a woman has given birth to, the lower her risk of breast cancer tends to be. Women who have never given birth have a slightly higher risk of breast cancer compared to women who have had more than one child.
The hypothesis to explore using the GapMinder data set: the higher the fertility rate, the lower the risk of breast cancer.