lbesquire
Data Analysis and Visualization Course
lbesquire · 8 years ago
Visualizing My Data
I am working with the AddHealth dataset to determine whether students with a parent at home after school achieve greater academic success than peers who do not have a parent at home after school. My explanatory variable is the presence of a parent at home after school (variables H1RM12 and H1RF12, coded 1 when the parent is always at home and 5 when the parent is never at home). My response variable is the grade each student received in Math, English, History (Social Studies), or Science (A, B, C, or D/F).
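As a minimal sketch of how the explanatory variable could be coded once and reused, assuming the same definition the subsets in this post use (both H1RM12 and H1RF12 equal 1 for "parent at home", both equal 5 for "no parent at home"), the function below adds a grouping column. The function name add_parent_group, the column name parent_group, and the +P/-P labels are my own hypothetical choices, and the rows are made-up illustration values, not AddHealth data.

```python
import pandas as pd


def add_parent_group(df):
    """Return a copy of df with a hypothetical 'parent_group' column:
    '+P' when both H1RM12 and H1RF12 equal 1 (always home after school),
    '-P' when both equal 5 (never home), and None for everyone else."""
    out = df.copy()  # leave the caller's DataFrame untouched
    both_home = (out["H1RM12"] == 1) & (out["H1RF12"] == 1)
    both_away = (out["H1RM12"] == 5) & (out["H1RF12"] == 5)
    out["parent_group"] = None
    out.loc[both_home, "parent_group"] = "+P"
    out.loc[both_away, "parent_group"] = "-P"
    return out


# Made-up rows for illustration, not AddHealth data
demo = pd.DataFrame({"H1RM12": [1, 5, 1, 3], "H1RF12": [1, 5, 5, 1]})
grouped = add_parent_group(demo)
print(grouped["parent_group"].tolist())  # ['+P', '-P', None, None]
```

Students with one parent always home and the other never home fall into neither group, which matches the "both parents" condition in the programs below.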
To generate univariate graphs, I wrote the program below, which displays letter grades only for students with a parent at home after school (here, both the resident mother and the resident father are always at home).
Program to Generate Univariate Data Graphs:
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Created on Sat Feb 18 11:18:16 2017

@author: lizmearls
"""
import pandas
import numpy
import seaborn as sns
import matplotlib.pyplot as plt

data = pandas.read_csv("/users/lizmearls/Desktop/AddHealth Wave 1 Codebooks/addhealth_pds.csv", low_memory=False)

# Set PANDAS to show all columns in DataFrame
pandas.set_option('display.max_columns', None)
# Set PANDAS to show all rows in DataFrame
pandas.set_option('display.max_rows', None)

# bug fix for display formats to avoid run time errors
pandas.set_option('display.float_format', lambda x: '%f' % x)

# Set the variables you will be working with to numeric
# (pandas.to_numeric replaces the removed convert_objects method)
for col in ['H1RM12', 'H1RF12', 'H1ED11', 'H1ED12', 'H1ED13', 'H1ED14']:
    data[col] = pandas.to_numeric(data[col])

# Grades in Math (codes 1-4 = A, B, C, D/F) for students whose mother AND father are at home
sub1 = data[(data['H1RM12'] == 1) & (data['H1RF12'] == 1) & data['H1ED11'].isin([1, 2, 3, 4])]

# Make a copy of the new subsetted data
sub2 = sub1.copy()

# Count each grade code once; .get returns 0 if nobody received that grade
counts = sub2['H1ED11'].value_counts()
iStudentsReceivingA = counts.get(1, 0)
iStudentsReceivingB = counts.get(2, 0)
iStudentsReceivingC = counts.get(3, 0)
iStudentsReceivingDF = counts.get(4, 0)

paPercentages = pandas.DataFrame(index=range(0, 1), columns=['A', 'B', 'C', 'DF'], dtype='float')

paPercentages['A'] = iStudentsReceivingA / len(sub2) * 100
paPercentages['B'] = iStudentsReceivingB / len(sub2) * 100
paPercentages['C'] = iStudentsReceivingC / len(sub2) * 100
paPercentages['DF'] = iStudentsReceivingDF / len(sub2) * 100

print(paPercentages)

sns.barplot(data=paPercentages)
plt.xlabel('Letter Grades')
plt.title('Math Grades for Students with a Parent at Home')
plt.ylabel('Percent')
plt.show()
Univariate Output:
[Figure: bar chart "Math Grades for Students with a Parent at Home"; x-axis Letter Grades, y-axis Percent]
To generate graphs for the other subjects (English, Social Studies, and Science), I replaced H1ED11 (the variable for Math) with H1ED12 (English), H1ED13 (Social Studies), or H1ED14 (Science) in the code and re-ran the program. The outputs are below:
[Figures: the same bar chart for English, Social Studies, and Science grades]
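Instead of editing H1ED11 by hand and re-running the program for each subject, the four subjects could be handled in one loop. This is a sketch, not the program I ran: it assumes the subject labels used in this post (H1ED11 = Math through H1ED14 = Science), keeps only grade codes 1 to 4 as the program above does, and runs on a few made-up rows; the helper names grade_percentages and subject_table are hypothetical.

```python
import pandas as pd

# Column-to-subject labels as used in this post (carried over from the text, an assumption)
SUBJECTS = {"H1ED11": "Math", "H1ED12": "English", "H1ED13": "Social Studies", "H1ED14": "Science"}
# Grade codes 1-4; every other code is left out of the percentages
GRADE_LABELS = {1: "A", 2: "B", 3: "C", 4: "DF"}


def grade_percentages(df, column):
    """Percent of students in each letter-grade category (codes 1-4 only)."""
    grades = df.loc[df[column].isin(list(GRADE_LABELS)), column]
    pct = grades.value_counts(normalize=True) * 100
    # Put the grades in A, B, C, DF order and show 0 for grades nobody received
    return pct.reindex(list(GRADE_LABELS), fill_value=0).rename(index=GRADE_LABELS)


def subject_table(df):
    """One row per subject, one column per letter grade, values in percent."""
    return pd.DataFrame(
        {name: grade_percentages(df, col) for col, name in SUBJECTS.items()}
    ).T


# Made-up example rows, not AddHealth data (5 and 98 are non-grade codes)
demo = pd.DataFrame({
    "H1ED11": [1, 1, 2, 3, 4, 5],
    "H1ED12": [2, 2, 2, 2, 1, 98],
    "H1ED13": [1, 2, 3, 4, 1, 2],
    "H1ED14": [3, 3, 3, 3, 3, 3],
})
table = subject_table(demo)
print(table)
```

Each row holds the same four percentages as paPercentages, so, for example, table.loc[["Math"]] is a one-row DataFrame that could be passed to sns.barplot(data=...) in the same way.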
I was also able to generate univariate graphs of letter grades for students who did not have a parent at home by subsetting the data so that ['H1RM12']==5 and ['H1RF12']==5, rather than ==1. I am not displaying these graphs, to save space (and because all of this data appears in the bivariate graphs below).
Program for Bivariate Graphs:
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Created on Sat Feb 18 23:49:13 2017

@author: lizmearls
"""

import pandas
import numpy
import seaborn as sns
import matplotlib.pyplot as plt

data = pandas.read_csv("/users/lizmearls/Desktop/AddHealth Wave 1 Codebooks/addhealth_pds.csv", low_memory=False)

# Set PANDAS to show all columns in DataFrame
pandas.set_option('display.max_columns', None)
# Set PANDAS to show all rows in DataFrame
pandas.set_option('display.max_rows', None)

# bug fix for display formats to avoid run time errors
pandas.set_option('display.float_format', lambda x: '%f' % x)

# Math grades (codes 1-4) for students with both parents at home (sub1)
# and with neither parent at home (sub3)
valid_grade = data['H1ED11'].isin([1, 2, 3, 4])
sub1 = data[(data['H1RM12'] == 1) & (data['H1RF12'] == 1) & valid_grade]
sub3 = data[(data['H1RM12'] == 5) & (data['H1RF12'] == 5) & valid_grade]
# Make copies of the new subsetted data
sub2 = sub1.copy()
sub4 = sub3.copy()

# Grade counts with a parent at home (p...) and without (i...)
pCounts = sub2['H1ED11'].value_counts()
iCounts = sub4['H1ED11'].value_counts()
pStudentsReceivingA = pCounts.get(1, 0)
pStudentsReceivingB = pCounts.get(2, 0)
pStudentsReceivingC = pCounts.get(3, 0)
pStudentsReceivingDF = pCounts.get(4, 0)
iStudentsReceivingA = iCounts.get(1, 0)
iStudentsReceivingB = iCounts.get(2, 0)
iStudentsReceivingC = iCounts.get(3, 0)
iStudentsReceivingDF = iCounts.get(4, 0)

paPercentages = pandas.DataFrame(index=range(0, 1), columns=['(A)+P', '(A)-P', '(B)+P', '(B)-P', '(C)+P', '(C)-P', '(DF)+P', '(DF)-P'], dtype='float')

paPercentages['(A)+P'] = pStudentsReceivingA / len(sub2) * 100
paPercentages['(B)+P'] = pStudentsReceivingB / len(sub2) * 100
paPercentages['(C)+P'] = pStudentsReceivingC / len(sub2) * 100
paPercentages['(DF)+P'] = pStudentsReceivingDF / len(sub2) * 100
paPercentages['(A)-P'] = iStudentsReceivingA / len(sub4) * 100
paPercentages['(B)-P'] = iStudentsReceivingB / len(sub4) * 100
paPercentages['(C)-P'] = iStudentsReceivingC / len(sub4) * 100
paPercentages['(DF)-P'] = iStudentsReceivingDF / len(sub4) * 100

print(paPercentages)

sns.barplot(data=paPercentages)
plt.xlabel('Letter Grades (+P = with parent, -P = without parent)')
plt.ylabel('Percent')
plt.title('Comparison of Math Letter Grades for Students with a Parent at Home vs. No Parent at Home After School')
plt.show()
Bivariate Output:
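[Figure: bar chart "Comparison of Math Letter Grades for Students with a Parent at Home vs. No Parent at Home After School"; x-axis Letter Grades (+P/-P), y-axis Percent]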
In the graph above, I can see the relationship between the explanatory variable (parent at home after school or not) and the response variable (letter grade); +P = parent at home and -P = no parent at home. For Math, the graph skews left and has a unimodal distribution.
I ran the same comparison for English, Social Studies, and Science by changing the variable from H1ED11 to H1ED12, H1ED13, or H1ED14 and re-running the program. Below are the respective outputs:
[Figures: the same +P/-P comparison for English, Social Studies, and Science grades]
My graphs all show a similar result: students who do not have a parent at home after school appear to earn higher grades than peers who do. The pattern holds for English, Social Studies, and Science; Math deviates slightly from it.
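To check this kind of pattern numerically rather than by eye, a crosstab with normalize="index" gives each group's grade percentages side by side, with each row summing to 100. A minimal sketch, assuming a hypothetical parent_group column holding "+P" or "-P" (missing for students in neither group); the function name grades_by_group is my own and the rows are made up, not AddHealth data.

```python
import pandas as pd

GRADE_LABELS = {1: "A", 2: "B", 3: "C", 4: "DF"}


def grades_by_group(df, column, group_col="parent_group"):
    """Percent of each parent group earning each letter grade (each row sums to 100)."""
    # Keep only real letter grades and students who belong to a group
    valid = df[df[column].isin(list(GRADE_LABELS)) & df[group_col].notna()]
    table = pd.crosstab(valid[group_col], valid[column].map(GRADE_LABELS), normalize="index") * 100
    # Keep a fixed A, B, C, DF column order even if a grade is absent
    return table.reindex(columns=list(GRADE_LABELS.values()), fill_value=0)


# Made-up rows, not AddHealth data
demo = pd.DataFrame({
    "parent_group": ["+P", "+P", "+P", "+P", "-P", "-P", None],
    "H1ED11": [1, 2, 2, 3, 1, 4, 1],
})
table = grades_by_group(demo, "H1ED11")
print(table.round(1))
```

Comparing rows directly avoids the problem of two groups with different sizes, since every row is already a percentage of its own group.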
These results are the opposite of what I expected to find, but I think this is really interesting. Maybe parents at home just distract their children instead of helping them with schoolwork, although these graphs only show an association, not a cause!
I enjoyed this project, though I found it very hard to finalize, and I wish I knew how to adjust graph colors. I also did not have much luck with the tutorial/example code and had to research other functions, such as sns.barplot, to get this code to work (seaborn would not work otherwise).
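On adjusting colors: matplotlib's bar() takes a color argument that can be a single color or one color per bar, and seaborn's sns.barplot also accepts color= and palette= arguments. A minimal matplotlib-only sketch, with made-up percentages and an arbitrary choice of named colors; the function name plot_grade_bars is hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # draw off-screen; leave this line out in Spyder or Jupyter
import matplotlib.pyplot as plt


def plot_grade_bars(percentages, colors, title):
    """Bar chart of grade percentages with one explicit color per bar."""
    fig, ax = plt.subplots()
    bars = ax.bar(list(percentages.keys()), list(percentages.values()), color=colors)
    ax.set_xlabel("Letter Grades")
    ax.set_ylabel("Percent")
    ax.set_title(title)
    return fig, bars


# Made-up percentages, only to show the coloring
example = {"A": 25.0, "B": 40.0, "C": 25.0, "DF": 10.0}
fig, bars = plot_grade_bars(example, ["tab:green", "tab:blue", "tab:orange", "tab:red"], "Example colors")
```

In an interactive session, calling plt.show() afterwards displays the chart; a single color string (for example color="tab:blue") colors every bar the same.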
Thanks for reading!
lbesquire · 8 years ago
Missing Data
I am working with the AddHealth dataset, exploring whether students who have a parent at home after school achieve greater academic success than peers who do not have a parent at home after school. Several responses could be characterized as missing data (grade codes 5, 6, 96, 97, and 98). First, I recoded these responses to numpy.nan. Then I removed them from the dataset with the dropna() function and recalculated the percentages based on the total number of non-NaN responses for each subject area.
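The recode-then-drop steps can also be written with a single replace() call that takes a list of codes. A minimal sketch, assuming the same five missing-data codes the program below uses; the function name clean_grades and the sample codes are hypothetical.

```python
import numpy as np
import pandas as pd

# Codes treated as missing for H1ED11-H1ED14 in this post
MISSING_CODES = [5, 6, 96, 97, 98]


def clean_grades(series):
    """Recode missing-data codes to NaN in one replace call, then drop them."""
    return pd.to_numeric(series).replace(MISSING_CODES, np.nan).dropna()


demo = pd.Series([1, 2, 2, 5, 97, 3, 4, 98])  # made-up codes
cleaned = clean_grades(demo)
print(len(cleaned))                                   # 5
print(cleaned.value_counts(normalize=True).loc[2.0])  # 0.4
```

Percentages computed from the cleaned series use only the non-missing responses as the denominator, which is the point of dropping the NaN values first. The values become floats after the NaN recode, hence the 2.0 label.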
Here is my program (sorry it’s long):
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Created on Sun Feb 12 13:56:21 2017

@author: lizmearls
"""

import pandas
import numpy

data = pandas.read_csv("/users/lizmearls/Desktop/AddHealth Wave 1 Codebooks/addhealth_pds.csv", low_memory=False)

# bug fix for display formats to avoid run time errors
pandas.set_option('display.float_format', lambda x: '%f' % x)

# Specify how many students have either Mom or Dad at home always.
# .copy() makes an independent DataFrame so the recoding below does not modify
# a slice of data (this replaces the older is_copy = False workaround)
parentathome = data[(data["H1RM12"] == 1) | (data["H1RF12"] == 1)].copy()
print("total" + "=" + str(len(parentathome)))

# Recode missing values from Math Grades to python missing (NaN),
# then create a new dataset without NaN for Math
parentathome['H1ED11'] = parentathome['H1ED11'].replace([5, 6, 96, 97, 98], numpy.nan)
clean11 = parentathome['H1ED11'].dropna()

# Same for English Grades
parentathome['H1ED12'] = parentathome['H1ED12'].replace([5, 6, 96, 97, 98], numpy.nan)
clean12 = parentathome['H1ED12'].dropna()

# Same for Social Studies Grades
parentathome['H1ED13'] = parentathome['H1ED13'].replace([5, 6, 96, 97, 98], numpy.nan)
clean13 = parentathome['H1ED13'].dropna()

# Same for Science Grades
parentathome['H1ED14'] = parentathome['H1ED14'].replace([5, 6, 96, 97, 98], numpy.nan)
clean14 = parentathome['H1ED14'].dropna()
# For each subject: count A+B (codes 1, 2) and C+D/F (codes 3, 4) among non-missing grades
print('Combined Math Grades for students with parent at home')
counts = clean11.value_counts()
iStudentsReceivingAB = counts.get(1, 0) + counts.get(2, 0)
iPercentABs = iStudentsReceivingAB / len(clean11)
iStudentsReceivingCDF = counts.get(3, 0) + counts.get(4, 0)
iPercentCDF = iStudentsReceivingCDF / len(clean11)
print("[Grade  AB]" + " " + str(iStudentsReceivingAB) + " " + str(iPercentABs * 100))
print("[Grade  CDF]" + " " + str(iStudentsReceivingCDF) + " " + str(iPercentCDF * 100))

print('Combined English Grades for students with parent at home')
counts = clean12.value_counts()
iStudentsReceivingAB = counts.get(1, 0) + counts.get(2, 0)
iPercentABs = iStudentsReceivingAB / len(clean12)
iStudentsReceivingCDF = counts.get(3, 0) + counts.get(4, 0)
iPercentCDF = iStudentsReceivingCDF / len(clean12)
print("[Grade  AB]" + " " + str(iStudentsReceivingAB) + " " + str(iPercentABs * 100))
print("[Grade  CDF]" + " " + str(iStudentsReceivingCDF) + " " + str(iPercentCDF * 100))

print('Combined Social Studies Grades for students with parent at home')
counts = clean13.value_counts()
iStudentsReceivingAB = counts.get(1, 0) + counts.get(2, 0)
iPercentABs = iStudentsReceivingAB / len(clean13)
iStudentsReceivingCDF = counts.get(3, 0) + counts.get(4, 0)
iPercentCDF = iStudentsReceivingCDF / len(clean13)
print("[Grade  AB]" + " " + str(iStudentsReceivingAB) + " " + str(iPercentABs * 100))
print("[Grade  CDF]" + " " + str(iStudentsReceivingCDF) + " " + str(iPercentCDF * 100))

print('Combined Science Grades for students with parent at home')
counts = clean14.value_counts()
iStudentsReceivingAB = counts.get(1, 0) + counts.get(2, 0)
iPercentABs = iStudentsReceivingAB / len(clean14)
iStudentsReceivingCDF = counts.get(3, 0) + counts.get(4, 0)
iPercentCDF = iStudentsReceivingCDF / len(clean14)
print("[Grade  AB]" + " " + str(iStudentsReceivingAB) + " " + str(iPercentABs * 100))
print("[Grade  CDF]" + " " + str(iStudentsReceivingCDF) + " " + str(iPercentCDF * 100))
# specifying number of students with absent parents
# (.copy() makes an independent DataFrame for the recoding below)
parentnothome = data[(data["H1RM12"] == 5) | (data["H1RF12"] == 5)].copy()

# Recode missing values from Math Grades to python missing (NaN),
# then create a new dataset without NaN for Math
parentnothome['H1ED11'] = parentnothome['H1ED11'].replace([5, 6, 96, 97, 98], numpy.nan)
clean11NP = parentnothome['H1ED11'].dropna()

# Same for English Grades
parentnothome['H1ED12'] = parentnothome['H1ED12'].replace([5, 6, 96, 97, 98], numpy.nan)
clean12NP = parentnothome['H1ED12'].dropna()

# Same for Social Studies Grades
parentnothome['H1ED13'] = parentnothome['H1ED13'].replace([5, 6, 96, 97, 98], numpy.nan)
clean13NP = parentnothome['H1ED13'].dropna()

# Same for Science Grades
parentnothome['H1ED14'] = parentnothome['H1ED14'].replace([5, 6, 96, 97, 98], numpy.nan)
clean14NP = parentnothome['H1ED14'].dropna()

print('Combined Math Grades - No Parent')
counts = clean11NP.value_counts()
iStudentsReceivingAB = counts.get(1, 0) + counts.get(2, 0)
iPercentABs = iStudentsReceivingAB / len(clean11NP)
iStudentsReceivingCDF = counts.get(3, 0) + counts.get(4, 0)
iPercentCDF = iStudentsReceivingCDF / len(clean11NP)
print("[Grade  AB]" + " " + str(iStudentsReceivingAB) + " " + str(iPercentABs * 100))
print("[Grade  CDF]" + " " + str(iStudentsReceivingCDF) + " " + str(iPercentCDF * 100))

print('Combined English Grades - No Parent')
counts = clean12NP.value_counts()
iStudentsReceivingAB = counts.get(1, 0) + counts.get(2, 0)
iPercentABs = iStudentsReceivingAB / len(clean12NP)
iStudentsReceivingCDF = counts.get(3, 0) + counts.get(4, 0)
iPercentCDF = iStudentsReceivingCDF / len(clean12NP)
print("[Grade  AB]" + " " + str(iStudentsReceivingAB) + " " + str(iPercentABs * 100))
print("[Grade  CDF]" + " " + str(iStudentsReceivingCDF) + " " + str(iPercentCDF * 100))

print('Combined Social Studies Grades - No Parent')
counts = clean13NP.value_counts()
iStudentsReceivingAB = counts.get(1, 0) + counts.get(2, 0)
iPercentABs = iStudentsReceivingAB / len(clean13NP)
iStudentsReceivingCDF = counts.get(3, 0) + counts.get(4, 0)
iPercentCDF = iStudentsReceivingCDF / len(clean13NP)
print("[Grade  AB]" + " " + str(iStudentsReceivingAB) + " " + str(iPercentABs * 100))
print("[Grade  CDF]" + " " + str(iStudentsReceivingCDF) + " " + str(iPercentCDF * 100))

print('Combined Science Grades - No Parent')
counts = clean14NP.value_counts()
iStudentsReceivingAB = counts.get(1, 0) + counts.get(2, 0)
iPercentABs = iStudentsReceivingAB / len(clean14NP)
iStudentsReceivingCDF = counts.get(3, 0) + counts.get(4, 0)
# Denominator is the No Parent science total (clean14NP), not clean14
iPercentCDF = iStudentsReceivingCDF / len(clean14NP)
print("[Grade  AB]" + " " + str(iStudentsReceivingAB) + " " + str(iPercentABs * 100))
print("[Grade  CDF]" + " " + str(iStudentsReceivingCDF) + " " + str(iPercentCDF * 100))

Results/output:
runfile('/Users/lizmearls/Desktop/AddHealth Wave 1 Codebooks/Missing Test', wdir='/Users/lizmearls/Desktop/AddHealth Wave 1 Codebooks')
total=1956
Combined Math Grades for students with parent at home
[Grade  AB] 1189 64.7603485839
[Grade  CDF] 647 35.2396514161
Combined English Grades for students with parent at home
[Grade  AB] 1006 56.6122678672
[Grade  CDF] 771 43.3877321328
Combined Social Studies Grades for students with parent at home
[Grade  AB] 1093 64.6363098758
[Grade  CDF] 598 35.3636901242
Combined Science Grades for students with parent at home
[Grade  AB] 1057 62.3598820059
[Grade  CDF] 638 37.6401179941
Combined Math Grades - No Parent
[Grade  AB] 1440 68.1818181818
[Grade  CDF] 672 31.8181818182
Combined English Grades - No Parent
[Grade  AB] 1243 61.6567460317
[Grade  CDF] 773 38.3432539683
Combined Social Studies Grades - No Parent
[Grade  AB] 1322 69.4327731092
[Grade  CDF] 582 30.5672268908
Combined Science Grades - No Parent
[Grade  AB] 1276 67.12256707
[Grade  CDF] 625 32.87743293
Summary:
In my program, I compared student grades. My variables were whether or not the student had a parent at home after school and the grades they received in Math, English, Social Studies, and Science. Several codes indicated missing responses, so I removed them from the dataset first and calculated grade percentages on the remaining data only. For simplicity, I grouped grades A and B together, and grades C, D, and F together. The output lists the category first (e.g., Combined Math Grades for students with parent at home), followed by the grade grouping (e.g., Grade AB), the number of students who received a grade in that grouping (e.g., 1189), and the percentage of students who received grades in that grouping (e.g., 64.76).
It appears that a larger share of students without a parent at home after school earn A's and B's in each of the four subject areas!
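The A/B versus C/D/F grouping described in the summary could also be done with map() and value_counts(). A sketch under the assumption that the series has already had its missing codes dropped (as clean11 and the other clean series do); the helper name grouped_percentages and the sample values are made up.

```python
import pandas as pd

# Grade codes 1-2 (A, B) vs 3-4 (C, D or lower), the grouping used in the summary
GROUPS = {1: "AB", 2: "AB", 3: "CDF", 4: "CDF"}


def grouped_percentages(clean_series):
    """Count and percent of AB vs CDF in a series whose missing codes were already dropped."""
    # Codes arrive as floats after the NaN recode, so cast back to int before mapping
    grouped = clean_series.astype(int).map(GROUPS)
    counts = grouped.value_counts().reindex(["AB", "CDF"], fill_value=0)
    return pd.DataFrame({"count": counts, "percent": counts / counts.sum() * 100})


demo = pd.Series([1.0, 2.0, 2.0, 3.0, 4.0])  # made-up, already-cleaned codes
summary = grouped_percentages(demo)
print(summary)
```

Because both percentages share one denominator, each AB/CDF pair should add up to 100; a pair that does not points to a mismatched denominator.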
lbesquire · 8 years ago
My First Program
Summary:
My research question was to determine grade distributions for students who always had a parent at home after school vs. students who never had a parent at home. My literature search suggested that students with a parent at home were likely to have greater academic success, but most of that research focused on younger students than those in the AddHealth data, so I wasn't completely certain what I would find.
The following are the steps I followed in this process:
1) First, I defined a set of students who always had a parent at home (the parentathome subset in the code below).
2) Next, I determined how many of these students got an A, B, C, or D or lower in Math, English, Social Studies, or Science. I used the label “Grade M” to represent missing data in each set. I also calculated the percentage of students who got each grade, based on the total number of students in the category (for example, iPercentAs = iStudentsReceivingA / len(parentathome)).
3) I repeated this process for an alternative set of students, those who never have a parent at home after school.
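The counting in steps 1 and 2 can be sketched with a single mapping from codes to labels, keeping every student in the denominator as the program below does. This is an illustration, not the program I ran: the LABELS dictionary and the function name grade_distribution are hypothetical, the sample codes are made up, and I also map 96 to "M", which the program below does not count.

```python
import pandas as pd

# Codes 1-4 are letter grades; 5, 6, 96, 97 and 98 are all counted as missing ("M")
LABELS = {1: "A", 2: "B", 3: "C", 4: "DF", 5: "M", 6: "M", 96: "M", 97: "M", 98: "M"}


def grade_distribution(df, column):
    """Count and percent of every student in df, so the denominator includes missing answers."""
    labeled = df[column].map(LABELS)
    counts = labeled.value_counts().reindex(["A", "B", "C", "DF", "M"], fill_value=0)
    return pd.DataFrame({"count": counts, "percent": counts / len(df) * 100})


demo = pd.DataFrame({"H1ED12": [1, 1, 2, 3, 4, 5, 98, 96]})  # made-up codes
dist = grade_distribution(demo, "H1ED12")
print(dist)
```

Passing a subset such as parentathome instead of demo would give that group's distribution, and the five percentages add up to 100 only when every code in the column appears in LABELS.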
Below is my code:
import pandas
import numpy

data = pandas.read_csv("/users/lizmearls/Desktop/AddHealth Wave 1 Codebooks/addhealth_pds.csv", low_memory=False)

#print(len(data)) # number of observations (rows)
#print(len(data.columns)) # number of variables (columns)
# another option for displaying observations or rows in the data frame is

print("Letter Grades in Math - Number of Students with a Parent at Home")
# Specify how many students have either Mom or Dad at home always
parentathome = data[(data["H1RM12"] == 1) | (data["H1RF12"] == 1)]
print("total" + "=" + str(len(parentathome)))

# Determine number of students with parent at home who get an A, B, C, or D or lower in math
iStudentsReceivingA = parentathome["H1ED12"].value_counts(sort=True)[1]
iPercentAs = iStudentsReceivingA / len(parentathome)
iStudentsReceivingB = parentathome["H1ED12"].value_counts(sort=True)[2]
iPercentBs = iStudentsReceivingB / len(parentathome)
iStudentsReceivingC = parentathome["H1ED12"].value_counts(sort=True)[3]
iPercentCs = iStudentsReceivingC / len(parentathome)
iStudentsReceivingDorLess = parentathome["H1ED12"].value_counts(sort=True)[4]
iPercentDorLess = iStudentsReceivingDorLess / len(parentathome)

# Determine number of students with parent at home missing data (codes 5, 6, 97, 98)
iStudentsMissing = parentathome["H1ED12"].value_counts(sort=True)[5] \
    + parentathome["H1ED12"].value_counts(sort=True)[6] \
    + parentathome["H1ED12"].value_counts(sort=True)[97] \
    + parentathome["H1ED12"].value_counts(sort=True)[98]
iPercentMissing = iStudentsMissing / len(parentathome)

print("[Grade  A]" + " " + str(iStudentsReceivingA) + " " + str(iPercentAs * 100))
print("[Grade  B]" + " " + str(iStudentsReceivingB) + " " + str(iPercentBs * 100))
print("[Grade  C]" + " " + str(iStudentsReceivingC) + " " + str(iPercentCs * 100))
print("[Grade DF]" + " " + str(iStudentsReceivingDorLess) + " " + str(iPercentDorLess * 100))
print("[Grade  M]" + " " + str(iStudentsMissing) + " " + str(iPercentMissing * 100))
print("Letter Grades in English - Number of Students with a Parent at Home")
# Specify how many students have either Mom or Dad at home always
parentathome = data[(data["H1RM12"] == 1) | (data["H1RF12"] == 1)]
print("total" + "=" + str(len(parentathome)))

# Determine number of students with parent at home who get an A, B, C, or D or lower in english
iStudentsReceivingA = parentathome["H1ED11"].value_counts(sort=True)[1]
iPercentAs = iStudentsReceivingA / len(parentathome)
iStudentsReceivingB = parentathome["H1ED11"].value_counts(sort=True)[2]
iPercentBs = iStudentsReceivingB / len(parentathome)
iStudentsReceivingC = parentathome["H1ED11"].value_counts(sort=True)[3]
iPercentCs = iStudentsReceivingC / len(parentathome)
iStudentsReceivingDorLess = parentathome["H1ED11"].value_counts(sort=True)[4]
iPercentDorLess = iStudentsReceivingDorLess / len(parentathome)

# Determine number of students with parent at home missing data (codes 5, 6, 97, 98)
iStudentsMissing = parentathome["H1ED11"].value_counts(sort=True)[5] \
    + parentathome["H1ED11"].value_counts(sort=True)[6] \
    + parentathome["H1ED11"].value_counts(sort=True)[97] \
    + parentathome["H1ED11"].value_counts(sort=True)[98]
iPercentMissing = iStudentsMissing / len(parentathome)

print("[Grade  A]" + " " + str(iStudentsReceivingA) + " " + str(iPercentAs * 100))
print("[Grade  B]" + " " + str(iStudentsReceivingB) + " " + str(iPercentBs * 100))
print("[Grade  C]" + " " + str(iStudentsReceivingC) + " " + str(iPercentCs * 100))
print("[Grade DF]" + " " + str(iStudentsReceivingDorLess) + " " + str(iPercentDorLess * 100))
print("[Grade  M]" + " " + str(iStudentsMissing) + " " + str(iPercentMissing * 100))
print("Letter Grades in Social Studies - Number of Students with a Parent at Home")
# Specify how many students have either Mom or Dad at home always
parentathome = data[(data["H1RM12"] == 1) | (data["H1RF12"] == 1)]
print("total" + "=" + str(len(parentathome)))

# Determine number of students with parent at home who get an A, B, C, or D or lower in SS
iStudentsReceivingA = parentathome["H1ED13"].value_counts(sort=True)[1]
iPercentAs = iStudentsReceivingA / len(parentathome)
iStudentsReceivingB = parentathome["H1ED13"].value_counts(sort=True)[2]
iPercentBs = iStudentsReceivingB / len(parentathome)
iStudentsReceivingC = parentathome["H1ED13"].value_counts(sort=True)[3]
iPercentCs = iStudentsReceivingC / len(parentathome)
iStudentsReceivingDorLess = parentathome["H1ED13"].value_counts(sort=True)[4]
iPercentDorLess = iStudentsReceivingDorLess / len(parentathome)

# Determine number of students with parent at home missing data (codes 5, 6, 97, 98)
iStudentsMissing = parentathome["H1ED13"].value_counts(sort=True)[5] \
    + parentathome["H1ED13"].value_counts(sort=True)[6] \
    + parentathome["H1ED13"].value_counts(sort=True)[97] \
    + parentathome["H1ED13"].value_counts(sort=True)[98]
iPercentMissing = iStudentsMissing / len(parentathome)

print("[Grade  A]" + " " + str(iStudentsReceivingA) + " " + str(iPercentAs * 100))
print("[Grade  B]" + " " + str(iStudentsReceivingB) + " " + str(iPercentBs * 100))
print("[Grade  C]" + " " + str(iStudentsReceivingC) + " " + str(iPercentCs * 100))
print("[Grade DF]" + " " + str(iStudentsReceivingDorLess) + " " + str(iPercentDorLess * 100))
print("[Grade  M]" + " " + str(iStudentsMissing) + " " + str(iPercentMissing * 100))
print("Letter Grades in Science - Number of Students with a Parent at Home")
# Specify how many students have either Mom or Dad at home always
parentathome = data[(data["H1RM12"] == 1) | (data["H1RF12"] == 1)]
print("total" + "=" + str(len(parentathome)))

# Determine number of students with parent at home who get an A, B, C, or D or lower in science
iStudentsReceivingA = parentathome["H1ED14"].value_counts(sort=True)[1]
iPercentAs = iStudentsReceivingA / len(parentathome)
iStudentsReceivingB = parentathome["H1ED14"].value_counts(sort=True)[2]
iPercentBs = iStudentsReceivingB / len(parentathome)
iStudentsReceivingC = parentathome["H1ED14"].value_counts(sort=True)[3]
iPercentCs = iStudentsReceivingC / len(parentathome)
iStudentsReceivingDorLess = parentathome["H1ED14"].value_counts(sort=True)[4]
iPercentDorLess = iStudentsReceivingDorLess / len(parentathome)

# Determine number of students with parent at home missing data (codes 5, 6, 97, 98)
iStudentsMissing = parentathome["H1ED14"].value_counts(sort=True)[5] \
    + parentathome["H1ED14"].value_counts(sort=True)[6] \
    + parentathome["H1ED14"].value_counts(sort=True)[97] \
    + parentathome["H1ED14"].value_counts(sort=True)[98]
iPercentMissing = iStudentsMissing / len(parentathome)

print("[Grade  A]" + " " + str(iStudentsReceivingA) + " " + str(iPercentAs * 100))
print("[Grade  B]" + " " + str(iStudentsReceivingB) + " " + str(iPercentBs * 100))
print("[Grade  C]" + " " + str(iStudentsReceivingC) + " " + str(iPercentCs * 100))
print("[Grade DF]" + " " + str(iStudentsReceivingDorLess) + " " + str(iPercentDorLess * 100))
print("[Grade  M]" + " " + str(iStudentsMissing) + " " + str(iPercentMissing * 100))
print("Letter Grades in Math - Number of Students Without a Parent at Home")
# specifying number of students with absent parents
parentnothome = data[(data["H1RM12"] == 5) | (data["H1RF12"] == 5)]
print("total" + "=" + str(len(parentnothome)))

# Determine number of students without a parent at home who get an A, B, C, or D or lower in math
iStudentsReceivingA = parentnothome["H1ED12"].value_counts(sort=True)[1]
iPercentAs = iStudentsReceivingA / len(parentnothome)
iStudentsReceivingB = parentnothome["H1ED12"].value_counts(sort=True)[2]
iPercentBs = iStudentsReceivingB / len(parentnothome)
iStudentsReceivingC = parentnothome["H1ED12"].value_counts(sort=True)[3]
iPercentCs = iStudentsReceivingC / len(parentnothome)
iStudentsReceivingDorLess = parentnothome["H1ED12"].value_counts(sort=True)[4]
iPercentDorLess = iStudentsReceivingDorLess / len(parentnothome)

# Determine number of students without a parent at home missing data (codes 5, 6, 97, 98);
# every term must count parentnothome, not parentathome
iStudentsMissing = parentnothome["H1ED12"].value_counts(sort=True)[5] \
    + parentnothome["H1ED12"].value_counts(sort=True)[6] \
    + parentnothome["H1ED12"].value_counts(sort=True)[97] \
    + parentnothome["H1ED12"].value_counts(sort=True)[98]
iPercentMissing = iStudentsMissing / len(parentnothome)

print("[Grade  A]" + " " + str(iStudentsReceivingA) + " " + str(iPercentAs * 100))
print("[Grade  B]" + " " + str(iStudentsReceivingB) + " " + str(iPercentBs * 100))
print("[Grade  C]" + " " + str(iStudentsReceivingC) + " " + str(iPercentCs * 100))
print("[Grade DF]" + " " + str(iStudentsReceivingDorLess) + " " + str(iPercentDorLess * 100))
print("[Grade  M]" + " " + str(iStudentsMissing) + " " + str(iPercentMissing * 100))
print("Letter Grades in English - Number of Students Without a Parent at Home")
# select students whose mother or father is never at home after school (code 5)
parentnothome = data[(data["H1RM12"] == 5) | (data["H1RF12"] == 5)]
print("total" + "=" + str(len(parentnothome)))
# count each English (H1ED11) response code once for this group;
# .get(code, 0) returns 0 instead of raising a KeyError if a code never occurs
counts = parentnothome["H1ED11"].value_counts()
# Determine number of students without a parent at home who get an A, B, C, or D/F in English
iStudentsReceivingA = counts.get(1, 0)
iStudentsReceivingB = counts.get(2, 0)
iStudentsReceivingC = counts.get(3, 0)
iStudentsReceivingDorLess = counts.get(4, 0)
# Determine number of students without a parent at home with missing data (codes 5, 6, 97, 98);
# every code is counted within parentnothome so the two groups are not mixed
iStudentsMissing = sum(counts.get(code, 0) for code in (5, 6, 97, 98))
# convert each count to a fraction of the group total
iPercentAs = iStudentsReceivingA / len(parentnothome)
iPercentBs = iStudentsReceivingB / len(parentnothome)
iPercentCs = iStudentsReceivingC / len(parentnothome)
iPercentDorLess = iStudentsReceivingDorLess / len(parentnothome)
iPercentMissing = iStudentsMissing / len(parentnothome)
print("[Grade  A]" + " " + str(iStudentsReceivingA) + " " + str(iPercentAs * 100))
print("[Grade  B]" + " " + str(iStudentsReceivingB) + " " + str(iPercentBs * 100))
print("[Grade  C]" + " " + str(iStudentsReceivingC) + " " + str(iPercentCs * 100))
print("[Grade DF]" + " " + str(iStudentsReceivingDorLess) + " " + str(iPercentDorLess * 100))
print("[Grade  M]" + " " + str(iStudentsMissing) + " " + str(iPercentMissing * 100))
print("Letter Grades in Social Studies - Number of Students Without a Parent at Home")
# select students whose mother or father is never at home after school (code 5)
parentnothome = data[(data["H1RM12"] == 5) | (data["H1RF12"] == 5)]
print("total" + "=" + str(len(parentnothome)))
# count each Social Studies (H1ED13) response code once for this group;
# .get(code, 0) returns 0 instead of raising a KeyError if a code never occurs
counts = parentnothome["H1ED13"].value_counts()
# Determine number of students without a parent at home who get an A, B, C, or D/F in SS
iStudentsReceivingA = counts.get(1, 0)
iStudentsReceivingB = counts.get(2, 0)
iStudentsReceivingC = counts.get(3, 0)
iStudentsReceivingDorLess = counts.get(4, 0)
# Determine number of students without a parent at home with missing data (codes 5, 6, 97, 98);
# every code is counted within parentnothome so the two groups are not mixed
iStudentsMissing = sum(counts.get(code, 0) for code in (5, 6, 97, 98))
# convert each count to a fraction of the group total
iPercentAs = iStudentsReceivingA / len(parentnothome)
iPercentBs = iStudentsReceivingB / len(parentnothome)
iPercentCs = iStudentsReceivingC / len(parentnothome)
iPercentDorLess = iStudentsReceivingDorLess / len(parentnothome)
iPercentMissing = iStudentsMissing / len(parentnothome)
print("[Grade  A]" + " " + str(iStudentsReceivingA) + " " + str(iPercentAs * 100))
print("[Grade  B]" + " " + str(iStudentsReceivingB) + " " + str(iPercentBs * 100))
print("[Grade  C]" + " " + str(iStudentsReceivingC) + " " + str(iPercentCs * 100))
print("[Grade DF]" + " " + str(iStudentsReceivingDorLess) + " " + str(iPercentDorLess * 100))
print("[Grade  M]" + " " + str(iStudentsMissing) + " " + str(iPercentMissing * 100))
print("Letter Grades in Science - Number of Students Without a Parent at Home")
# select students whose mother or father is never at home after school (code 5)
parentnothome = data[(data["H1RM12"] == 5) | (data["H1RF12"] == 5)]
print("total" + "=" + str(len(parentnothome)))
# count each Science (H1ED14) response code once for this group;
# .get(code, 0) returns 0 instead of raising a KeyError if a code never occurs
counts = parentnothome["H1ED14"].value_counts()
# Determine number of students without a parent at home who get an A, B, C, or D/F in science
iStudentsReceivingA = counts.get(1, 0)
iStudentsReceivingB = counts.get(2, 0)
iStudentsReceivingC = counts.get(3, 0)
iStudentsReceivingDorLess = counts.get(4, 0)
# Determine number of students without a parent at home with missing data (codes 5, 6, 97, 98);
# every code is counted within parentnothome so the two groups are not mixed
iStudentsMissing = sum(counts.get(code, 0) for code in (5, 6, 97, 98))
# convert each count to a fraction of the group total
iPercentAs = iStudentsReceivingA / len(parentnothome)
iPercentBs = iStudentsReceivingB / len(parentnothome)
iPercentCs = iStudentsReceivingC / len(parentnothome)
iPercentDorLess = iStudentsReceivingDorLess / len(parentnothome)
iPercentMissing = iStudentsMissing / len(parentnothome)
print("[Grade  A]" + " " + str(iStudentsReceivingA) + " " + str(iPercentAs * 100))
print("[Grade  B]" + " " + str(iStudentsReceivingB) + " " + str(iPercentBs * 100))
print("[Grade  C]" + " " + str(iStudentsReceivingC) + " " + str(iPercentCs * 100))
print("[Grade DF]" + " " + str(iStudentsReceivingDorLess) + " " + str(iPercentDorLess * 100))
print("[Grade  M]" + " " + str(iStudentsMissing) + " " + str(iPercentMissing * 100))
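The four per-subject blocks above differ only in which grade column they read, so they could be folded into a single function. This is a minimal sketch, assuming the same grade coding the script uses (1 to 4 for A through D/F; 5, 6, 97 and 98 counted as missing). The names `grade_summary`, `print_report`, `SUBJECTS` and the small `demo` frame are hypothetical and for illustration only; the demo values are made up, not AddHealth data. With the real data you would pass the `data` frame read from the CSV instead of `demo`.

```python
import pandas as pd

# Grade coding used by the script above (assumed to hold for all four subjects):
# 1 = A, 2 = B, 3 = C, 4 = D/F; 5, 6, 97 and 98 are counted as missing (M).
GRADE_CODES = {"A": [1], "B": [2], "C": [3], "DF": [4], "M": [5, 6, 97, 98]}
SUBJECTS = {"Math": "H1ED12", "English": "H1ED11",
            "Social Studies": "H1ED13", "Science": "H1ED14"}


def grade_summary(group, column):
    """Return {label: (count, percent of group)} for one grade column."""
    counts = group[column].value_counts()
    total = len(group)
    summary = {}
    for label, codes in GRADE_CODES.items():
        # .get(code, 0) treats a code that never occurs as a count of zero
        n = int(sum(counts.get(code, 0) for code in codes))
        summary[label] = (n, 100 * n / total if total else 0.0)
    return summary


def print_report(data, mask, description):
    """Print one block per subject, in the script's output format, for data[mask]."""
    group = data[mask]
    for subject, column in SUBJECTS.items():
        print(f"Letter Grades in {subject} - Number of Students {description}")
        print(f"total={len(group)}")
        for label, (n, pct) in grade_summary(group, column).items():
            print(f"[Grade {label:>2}] {n} {pct:.2f}")


# Tiny made-up frame for illustration only (these are NOT AddHealth values)
demo = pd.DataFrame({
    "H1RM12": [1, 5, 5, 1, 5],
    "H1RF12": [5, 5, 1, 1, 5],
    "H1ED11": [1, 2, 3, 4, 98],
    "H1ED12": [2, 2, 1, 6, 4],
    "H1ED13": [3, 97, 1, 2, 2],
    "H1ED14": [1, 1, 2, 3, 5],
})
never_home = (demo["H1RM12"] == 5) | (demo["H1RF12"] == 5)
print_report(demo, never_home, "Without a Parent at Home")
```

Because every category is counted from the same `group`, the five counts for a subject always add up to that group's total, and rounding the percentage to two decimals keeps the printed table readable.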
Here is the Output:
Results:
Letter Grades in Math - Number of Students with a Parent at Home
total=1956
[Grade  A] 457 23.36400818
[Grade  B] 549 28.0674846626
[Grade  C] 463 23.6707566462
[Grade DF] 308 15.7464212679
[Grade  M] 179 9.15132924335
Letter Grades in English - Number of Students with a Parent at Home
total=1956
[Grade  A] 465 23.773006135
[Grade  B] 724 37.0143149284
[Grade  C] 457 23.36400818
[Grade DF] 190 9.71370143149
[Grade  M] 120 6.13496932515
Letter Grades in Social Studies - Number of Students with a Parent at Home
total=1956
[Grade  A] 542 27.7096114519
[Grade  B] 551 28.1697341513
[Grade  C] 378 19.3251533742
[Grade DF] 220 11.2474437628
[Grade  M] 265 13.5480572597
Letter Grades in Science - Number of Students with a Parent at Home
total=1956
[Grade  A] 484 24.7443762781
[Grade  B] 573 29.2944785276
[Grade  C] 413 21.1145194274
[Grade DF] 225 11.5030674847
[Grade  M] 261 13.3435582822
Letter Grades in Math - Number of Students Without a Parent at Home
total=2213
[Grade  A] 561 25.3502033439
[Grade  B] 682 30.8178942612
[Grade  C] 463 20.9218255761
[Grade DF] 310 14.0081337551
[Grade  M] 207 9.35381834614
Letter Grades in English - Number of Students Without a Parent at Home
total=2213
[Grade  A] 632 28.5585178491
[Grade  B] 808 36.5115228197
[Grade  C] 452 20.4247627655
[Grade DF] 220 9.94125621329
[Grade  M] 113 5.10619069137
Letter Grades in Social Studies - Number of Students Without a Parent at Home
total=2213
[Grade  A] 659 29.7785811116
[Grade  B] 663 29.9593312246
[Grade  C] 377 17.0356981473
[Grade DF] 205 9.26344328965
[Grade  M] 324 14.6407591505
Letter Grades in Science - Number of Students Without a Parent at Home
total=2213
[Grade  A] 612 27.6547672842
[Grade  B] 664 30.0045187528
[Grade  C] 396 17.8942611839
[Grade DF] 229 10.3479439675
[Grade  M] 321 14.5051965657
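As a sanity check, the five categories (A, B, C, D/F, M) for each group should add up to that group's total. The sketch below re-adds the counts printed above (copied by hand into `results`, a hypothetical name). For the group with a parent at home they sum to exactly 1956 in every subject; for the group without a parent at home they exceed 2213 in every subject, which is consistent with that group's missing-data (M) counts having been computed partly from the with-parent subset.

```python
# Counts copied from the printed output above: ((A, B, C, DF, M), total)
results = {
    ("Math", "with"): ((457, 549, 463, 308, 179), 1956),
    ("English", "with"): ((465, 724, 457, 190, 120), 1956),
    ("Social Studies", "with"): ((542, 551, 378, 220, 265), 1956),
    ("Science", "with"): ((484, 573, 413, 225, 261), 1956),
    ("Math", "without"): ((561, 682, 463, 310, 207), 2213),
    ("English", "without"): ((632, 808, 452, 220, 113), 2213),
    ("Social Studies", "without"): ((659, 663, 377, 205, 324), 2213),
    ("Science", "without"): ((612, 664, 396, 229, 321), 2213),
}


def check_totals(results):
    """Return {key: sum of categories - total}; 0 means the categories add up."""
    return {key: sum(counts) - total for key, (counts, total) in results.items()}


for (subject, group), diff in check_totals(results).items():
    status = "OK" if diff == 0 else f"off by {diff:+d}"
    print(f"{subject:15} {group:8} {status}")
```

Only the M row is affected: the A through D/F counts for both groups were computed from the correct subset, so comparisons of those grades are not changed by this discrepancy.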
Data Table: [image]
Explanation:
I used the variables H1RM12 and H1RF12 to determine whether a student had a parent at home after school. I then used H1ED11, H1ED12, H1ED13, and H1ED14 to assess students' letter grades in English, Math, Social Studies, and Science, respectively. For each subject, I calculated the number and percentage of students who received an A, B, C, or D/F, plus M for missing data (the student didn't answer, legitimately skipped the question, or was not graded this way). The three columns of the output give the letter grade, the number of students who received that grade, and that number as a percentage of all students in the group.
For example, in Math, 457 of the 1956 students with a parent at home received an A, which is roughly 23.4%.
Surprisingly, students without a parent at home appeared to do slightly better: in all four subjects, a higher percentage of them received an A than students with a parent at home!
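To put a number on "slightly better", one option is the percentage-point gap in the share of A grades between the two groups. This is a minimal sketch using the A counts and group totals copied from the output above (the dictionary and function names are hypothetical); it is purely descriptive and does not test whether the gaps could be due to chance.

```python
# A counts and group totals copied from the printed output above
A_WITH = {"Math": 457, "English": 465, "Social Studies": 542, "Science": 484}
A_WITHOUT = {"Math": 561, "English": 632, "Social Studies": 659, "Science": 612}
TOTAL_WITH, TOTAL_WITHOUT = 1956, 2213


def pct(count, total):
    """Count as a percentage of the group total."""
    return 100 * count / total


def a_share_gap():
    """Percentage-point gap in A share: (without parent) minus (with parent)."""
    return {subject: pct(A_WITHOUT[subject], TOTAL_WITHOUT) - pct(A_WITH[subject], TOTAL_WITH)
            for subject in A_WITH}


for subject, gap in a_share_gap().items():
    print(f"{subject:15} {gap:+.1f} percentage points")
```

Every gap is positive, from about 2 points in Math to about 4.8 points in English.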
My Research Question
For my research topic, I chose to look into the AddHealth data set. I was most interested in how having a parent at home influences children's academic success. My curiosity in this area was motivated by a Pew Research Center study, which found that 60% of Americans believe children are better off with a parent at home (D’Vera Cohn, Gretchen Livingston, and Wendy Wang, 2014). In the US, most children (>60%) have either two working parents or one working parent with no stay-at-home counterpart (Bureau of Labor Statistics, 2015, https://www.bls.gov/news.release/famee.nr0.htm).
To find literature related to this topic, I used the search terms “stay-at-home parents and student success”, “latchkey children and academic success”, and “after-school parental supervision and academic success”. A number of the studies I found showed a positive relationship between the presence of stay-at-home parents and greater academic success for their children (Bettinger 2014, Fan 2001, Fehrmann 1987, Ingram 2007, Rajalakshmi 2015). Some of these studies attributed the students’ greater academic success at least in part to better after-school supervision and help with homework (Fan 2001, Fehrmann 1987, Ingram 2007), while others showed only that students benefited from having a parent at home, without identifying any specific factor as more influential than others (Bettinger 2014). The study by Rajalakshmi and Thanasekaran actually found little relationship between having a parent at home and academic success, except among low-income students. Each of these studies looked at different groups of students, varying in age, in the geographic region where they attend school, and in socio-economic status. For this reason, it is hard to get an overall picture of how after-school parental supervision influences students academically. I therefore think it would be useful and interesting to perform a broader study using the AddHealth data set, which includes data from students in grades 7-12 from around the United States.
I am therefore proposing the following research question: Does having a parent at home, specifically after school, lead to greater academic success? I hypothesize that students who have a parent at home when they get home from school will have more academic success than their unsupervised counterparts, as measured by course grades. My question is not designed to examine what parents are or are not doing with their children while at home; it simply asks whether a parent is there with the child, and whether that alone is enough for greater overall student success.
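Once grade counts are available for both groups, one way to check this hypothesis is a chi-square test of independence on the letter-grade counts. This is a minimal sketch in plain Python, assuming each student belongs to only one group; it uses the graded Math counts (A, B, C, D/F) from the results above and leaves out the M category. The closed-form p-value helper `chi2_sf_df3` (a hypothetical name) is valid only for exactly 3 degrees of freedom, i.e. a 2 x 4 table; for other table shapes a library routine such as `scipy.stats.chi2_contingency` would be the usual choice. The test only asks whether the two grade distributions differ, not which group does better.

```python
import math

# Graded Math counts (A, B, C, D/F) copied from the results above; M is excluded
WITH_PARENT = [457, 549, 463, 308]
WITHOUT_PARENT = [561, 682, 463, 310]


def chi_square_2xk(row1, row2):
    """Pearson chi-square statistic and degrees of freedom for a 2 x k table."""
    col_totals = [a + b for a, b in zip(row1, row2)]
    n1, n2 = sum(row1), sum(row2)
    grand = n1 + n2
    stat = 0.0
    for observed_row, row_total in ((row1, n1), (row2, n2)):
        for observed, col_total in zip(observed_row, col_totals):
            # expected count if grades were independent of parental presence
            expected = row_total * col_total / grand
            stat += (observed - expected) ** 2 / expected
    return stat, len(row1) - 1


def chi2_sf_df3(x):
    """Upper-tail probability of a chi-square distribution with exactly 3 df."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


stat, df = chi_square_2xk(WITH_PARENT, WITHOUT_PARENT)
print(f"chi-square = {stat:.2f}, df = {df}, p = {chi2_sf_df3(stat):.3f}")
```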
My Codebook
The variables I am considering relate to how often a parent is at home when a child returns from school (H1RM12 and H1RF12), and I will compare students who always have a parent at home with those who never have a parent at home. For my second set of variables, I will measure academic success based on letter grades in English, Math, History, and Science (H1ED11-14).
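The codes described here can be written down as a small lookup before any analysis. This is a minimal sketch, assuming only the codes stated in this post and used in the analysis script (1 = always and 5 = never for H1RM12/H1RF12; 1 to 4 for A through D/F and 5, 6, 97, 98 as missing for H1ED11-14); the function names are hypothetical. Any other presence code is labeled "other" because it falls outside the always/never comparison, and an unexpected grade code raises an error so it gets noticed instead of silently dropped.

```python
# Codes as described in this post and in the analysis script:
# H1RM12 / H1RF12: 1 = always home after school, 5 = never home
# H1ED11-H1ED14:   1 = A, 2 = B, 3 = C, 4 = D or F; 5, 6, 97, 98 = missing
PRESENCE_LABELS = {1: "always", 5: "never"}
GRADE_LABELS = {1: "A", 2: "B", 3: "C", 4: "D/F"}
MISSING_GRADE_CODES = {5, 6, 97, 98}


def presence_label(code):
    """Label a parent-presence code; any other code is outside this comparison."""
    return PRESENCE_LABELS.get(code, "other")


def grade_label(code):
    """Map a grade code to a letter, 'M' for missing, or fail loudly."""
    if code in GRADE_LABELS:
        return GRADE_LABELS[code]
    if code in MISSING_GRADE_CODES:
        return "M"
    raise ValueError(f"unexpected grade code {code!r}")


# Example: a hypothetical student whose mother is always home and who earned a B
print(presence_label(1), grade_label(2))
```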
Literature Cited
1) Bettinger E. “Home with Mom: The Effects of Stay-at-Home Parents on Children’s Long-Run Educational Outcomes.” Journal of Labor Economics. 2014.
Summary: Follows students who benefited from a government-paid program under which a parent was paid to stay home with a younger sibling. These students showed marked academic improvement over their peers.
2) Fehrmann P, Keith T, Reimers T. “Home Influence on School Learning: Direct and Indirect Effects of Parental Involvement on High School Grades.” The Journal of Educational Research. 1987.
Summary: Examines the effects of parental involvement through homework help and TV monitoring. Concludes that parental involvement with homework leads to improved grades.
3) Fan X, Chen M. “Parental Involvement and Students’ Academic Achievement: A Meta-Analysis.” Educational Psychology Review. 2001.
Summary: Analyzes the connection between parental involvement in students’ education and academic success (in terms of GPA and grades). Found a link between parental home supervision and improved grades, though the authors acknowledge it was the weakest of the relationships studied.
4) Ingram M. “The Role of Parents in High-Achieving Schools Serving Low-Income, At-Risk Populations.” Education and Urban Society. 2007.
Summary: Examines the specific ways in which parents are involved in their students’ education, including home learning and supervision, and concludes that these features are positively related to student success.
5) Rajalakshmi J, Thanasekaran T. “The Effects and Behaviours of Home Alone Situation by Latchkey Children.” American Journal of Nursing Science. 2015.
Summary: Describes several areas in which unsupervised children experience hardship compared to their supervised peers, but acknowledges that academically there is little difference between latchkey and supervised children from upper- or middle-class families. Lower-income students do suffer academically when unsupervised.