data-management-konny86 - Tumblr blog

data-management-konny86 · 4 years ago

Creating graphs for your data

Here is my solution for the 4th assignment.

The code:

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Created on Wed Jun 9 14:17:38 2021

@author: konny

My first Python program with the selected data set in relation to my research question
"""

import pandas
import numpy
import seaborn
import matplotlib.pyplot as plt

# Set pandas to show all columns in a DataFrame
pandas.set_option('display.max_columns', None)
# Set pandas to show all rows in a DataFrame
pandas.set_option('display.max_rows', None)
# bug fix for display formats to avoid run time errors
pandas.set_option('display.float_format', lambda x: '%f' % x)

data = pandas.read_csv('ool_pds.csv', low_memory=False)

# upper-case all DataFrame column names
data.columns = map(str.upper, data.columns)

print()
print('Content of the ool_pds.csv dataset:')
print('rows:', len(data))             # number of observations (rows)
print('columns:', len(data.columns))  # number of variables (columns)
print('\n=============================================================================')

# setting variables I will be working with to numeric (updated)
data['W1_P17'] = pandas.to_numeric(data['W1_P17'])    # Do you have any biological or adopted children
data['W1_P17A'] = pandas.to_numeric(data['W1_P17A'])  # How many children do you have
data['W1_P19'] = pandas.to_numeric(data['W1_P19'])    # Where do your minor children (under 18) live

# replace missing values (coded as -1) with Python's missing value (NaN)
data['W1_P17'] = data['W1_P17'].replace(-1, numpy.nan)
data['W1_P17A'] = data['W1_P17A'].replace(-1, numpy.nan)
data['W1_P19'] = data['W1_P19'].replace(-1, numpy.nan)

# counts and percentages (i.e. frequency distributions) for each variable
print('\ncounts for W1_P17 - \nDo you have any biological or adopted children, yes=1, no=2')
c1 = data['W1_P17'].value_counts(sort=True, dropna=False)
print(c1)

print('\npercentages for W1_P17 - \nDo you have any biological or adopted children, yes=1, no=2')
p1 = data['W1_P17'].value_counts(sort=True, normalize=True, dropna=False)
print(p1, '\n')

data['HAVECHILDS'] = data['W1_P17']
c1a = data.groupby('HAVECHILDS').size()
print(c1a)

# change the format from numeric to categorical before plotting
data['HAVECHILDS'] = data['HAVECHILDS'].astype('category')

print('\nDescription: People have children')
desc1 = data['HAVECHILDS'].describe()
print(desc1)

print('\nHave childs category')
c2a = data['HAVECHILDS'].value_counts(sort=False)
print(c2a)

plot1 = seaborn.countplot(x='HAVECHILDS', data=data)
plt.xlabel('yes=1, no=2')
plt.title('Do you have children?')
plt.show()  # plt.show() takes no plot argument

print('\n=============================================================================')

print('\ncounts for W1_P17A - Original data\nHow many biological or adopted children do you have?')
c2 = data['W1_P17A'].value_counts(sort=True, dropna=False)
print(c2)

print('\npercentages for W1_P17A - Original data\nHow many biological or adopted children do you have?')
p2 = data['W1_P17A'].value_counts(sort=True, normalize=True, dropna=False)
print(p2)

# recode the childrengroup
recode2 = {1: '1-3', 2: '1-3', 3: '1-3', 4: '4-6', 5: '4-6', 6: '4-6',
           7: '7-10', 8: '7-10', 9: '7-10', 10: '7-10'}
data['CHILDRENGROUP'] = data['W1_P17A'].map(recode2)

print('\nHow many children do you have? - Recoding and grouping the data')
c5 = data['CHILDRENGROUP'].value_counts(sort=True)
print(c5)

print('\nHow many children do you have? - Recoding and grouping the data - percentages')
p5 = data['CHILDRENGROUP'].value_counts(sort=True, normalize=True)
print(p5)

# CHILDRENGROUP holds text labels, so it is plotted as a categorical variable
plot2 = seaborn.countplot(x='CHILDRENGROUP', data=data)
plt.xlabel('Number of children')
plt.title('How many children do you have?')
plt.show()

print('\n=============================================================================')

print('\ncounts of W1_P19 - Original data\nWhere do your minor children live? \nLive with both parents=1\nLive with father or mother (different expressions)=2-4\nLive with no parent=5')
c3 = data['W1_P19'].value_counts(sort=True, dropna=False)
print(c3)

print('\npercentages of W1_P19 - Original data\nWhere do your minor children live? \nLive with both parents=1\nLive with father or mother (different expressions)=2-4\nLive with no parent=5')
p3 = data['W1_P19'].value_counts(sort=True, dropna=False, normalize=True)
print(p3)

# recode the parentgroup:
# live with both parents, with one parent (father or mother), or with relatives
recode1 = {1: 'both parents', 2: 'father or mother', 3: 'father or mother',
           4: 'father or mother', 5: 'relatives'}
data['PARENTGROUP'] = data['W1_P19'].map(recode1)

print('\nWhere do your children live? - Recoding and grouping the data')
c4 = data['PARENTGROUP'].value_counts(sort=True)
print(c4)

print('\nWhere do your children live? - Recoding and grouping the data - percentages')
p4 = data['PARENTGROUP'].value_counts(sort=True, normalize=True)
print(p4)

plot3 = seaborn.countplot(x='PARENTGROUP', data=data)
plt.xlabel('The children live with...')
plt.title('Where do your children live?')
plt.show()

print('\n=============================================================================')

# Looking to see which group of children mostly lives with both parents.
# Only value 1 (living with both parents) is needed, so everything else becomes 0.
recode3 = {1: 1, 2: 0, 3: 0, 4: 0, 5: 0}
data['BOTHPARENTS'] = data['W1_P19'].map(recode3)

# bar height = mean of BOTHPARENTS, i.e. the share of rows coded 1
# (ci=None switches off the error bars; newer seaborn versions prefer errorbar=None)
plot4 = seaborn.catplot(x='W1_P17A', y='BOTHPARENTS', data=data, kind='bar', ci=None)
plt.xlabel('Number of children')
plt.ylabel('Share living with both parents')
plt.title('Share of children living with both parents, by number of children')
plt.show()
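The last plot can be cross-checked numerically. A bar chart of a 0/1 variable shows the mean of that variable for each group, which is simply the share of rows coded 1. The sketch below reproduces that calculation with `groupby` on a small made-up table. The `toy` DataFrame and its values are assumptions for illustration; only the column names and the recode come from the program above.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data that reuses the column names from the program above.
toy = pd.DataFrame({
    "W1_P17A": [1, 1, 2, 2, 2, 3],       # number of children
    "W1_P19": [1, 4, 1, 1, 5, np.nan],   # where the minor children live
})

# Same recode as in the program: 1 = lives with both parents, everything else = 0.
toy["BOTHPARENTS"] = toy["W1_P19"].map({1: 1, 2: 0, 3: 0, 4: 0, 5: 0})

# The mean of a 0/1 variable is the share of 1s; missing values are skipped.
share_both = toy.groupby("W1_P17A")["BOTHPARENTS"].mean()
print(share_both)
```

A group whose answers are all missing gets NaN here, which is why such a group shows no bar in the plot.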

data-management-konny86 · 4 years ago

Summary description of the following solution for my 3rd assignment:

The following frequency distributions first show the original data from the dataset and then the recoded and grouped data. Missing data appears in the original data records as 'NaN'.

The variable W1_P19 distinguishes several situations in which children live with only one parent, for example a mother or father who lives alone with the children or together with a new partner. The important question for me is: do the children live with both parents, with one parent (father or mother), or with neither parent (that is, with relatives)? Recoding and grouping the original data makes it possible to answer this question.
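As a minimal sketch of this recoding step, the example below maps a handful of invented answers onto the three groups and then counts them. The `answers` values are assumptions for illustration; the codes and the group labels follow the ones used in my programs.

```python
import numpy as np
import pandas as pd

# Hypothetical answers: 1 = both parents, 2-4 = one parent, 5 = no parent, -1 = refused.
answers = pd.Series([1, 4, 2, 5, -1, 1, 3, np.nan])

# Treat refused answers (-1) as missing, as in the program.
answers = answers.replace(-1, np.nan)

# Collapse the five original codes into three groups.
recode = {
    1: "both parents",
    2: "father or mother",
    3: "father or mother",
    4: "father or mother",
    5: "relatives",
}
grouped = answers.map(recode)

counts = grouped.value_counts()                # missing values are left out by default
shares = grouped.value_counts(normalize=True)  # shares among the valid answers only
print(counts)
print(shares)
```

Values that have no entry in the mapping, including NaN, stay missing after `map`, so the percentages refer only to the people who gave a valid answer.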

The same applies to the data records of W1_P17A ('How many children do you have'), which are grouped into 1-3, 4-6 and 7-10 children.
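For a numeric variable like the number of children, binning is an alternative to a hand-written mapping. The sketch below uses `pandas.cut`, which my program does not use, so treat it as a different route to the same three groups. The `children` values are invented for illustration.

```python
import pandas as pd

# Hypothetical numbers of children; None stands for a missing answer.
children = pd.Series([1, 2, 3, 4, 6, 7, 10, None])

# Bins are right-inclusive by default: (0, 3], (3, 6], (6, 10].
groups = pd.cut(children, bins=[0, 3, 6, 10], labels=["1-3", "4-6", "7-10"])

# sort_index() keeps the groups in their natural order instead of ordering by count.
group_counts = groups.value_counts().sort_index()
print(group_counts)
```

The result is an ordered categorical, so plots and tables list the groups from smallest to largest without extra sorting.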

data-management-konny86 · 4 years ago

Making Data Management Decisions

Here is my solution for the 3rd assignment:

My program:

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Created on Wed Jun 9 14:17:38 2021

@author: konny

My first Python program with the selected data set in relation to my research question
"""

import pandas
import numpy

data = pandas.read_csv('ool_pds.csv', low_memory=False)

# upper-case all DataFrame column names
data.columns = map(str.upper, data.columns)

# bug fix for display formats to avoid run time errors
pandas.set_option('display.float_format', lambda x: '%f' % x)

print('Content of the ool_pds.csv dataset:')
print('rows:', len(data))             # number of observations (rows)
print('columns:', len(data.columns))  # number of variables (columns)
print('\n=============================================================================')

# setting variables I will be working with to numeric (updated)
data['W1_P17'] = pandas.to_numeric(data['W1_P17'])    # Do you have any biological or adopted children
data['W1_P17A'] = pandas.to_numeric(data['W1_P17A'])  # How many children do you have
data['W1_P19'] = pandas.to_numeric(data['W1_P19'])    # Where do your minor children (under 18) live

# replace missing values (coded as -1) with Python's missing value (NaN)
data['W1_P17'] = data['W1_P17'].replace(-1, numpy.nan)
data['W1_P17A'] = data['W1_P17A'].replace(-1, numpy.nan)
data['W1_P19'] = data['W1_P19'].replace(-1, numpy.nan)

# counts and percentages (i.e. frequency distributions) for each variable
print('\ncounts for W1_P17 - \nDo you have any biological or adopted children, yes=1, no=2')
c1 = data['W1_P17'].value_counts(sort=True, dropna=False)
print(c1)

print('\npercentages for W1_P17 - \nDo you have any biological or adopted children, yes=1, no=2')
p1 = data['W1_P17'].value_counts(sort=True, normalize=True, dropna=False)
print(p1)

print('\n=============================================================================')

print('\ncounts for W1_P17A - Original data\nHow many biological or adopted children do you have?')
c2 = data['W1_P17A'].value_counts(sort=True, dropna=False)
print(c2)

print('\npercentages for W1_P17A - Original data\nHow many biological or adopted children do you have?')
p2 = data['W1_P17A'].value_counts(sort=True, normalize=True, dropna=False)
print(p2)

# recode the childrengroup (building a group)
recode2 = {1: 1, 2: 1, 3: 1, 4: 2, 5: 2, 6: 2, 7: 3, 8: 3, 9: 3, 10: 3}
data['CHILDRENGROUP'] = data['W1_P17A'].map(recode2)

print('\nHow many Children do they have? - Recoding and grouping the data\n1-3 children = 1\n4-6 children = 2\n7-10 children = 3')
c5 = data['CHILDRENGROUP'].value_counts(sort=True)
print(c5)

print('\nHow many Children do they have? - Recoding and grouping the data - percentages \n1-3 children = 1\n4-6 children = 2\n7-10 children = 3')
p5 = data['CHILDRENGROUP'].value_counts(sort=True, normalize=True)
print(p5)

print('\n=============================================================================')

print('\ncounts of W1_P19 - Original data\nWhere do your minor children live? \nLive with both parents=1\nLive with father or mother (different expressions)=2-4\nLive with no parent=5')
c3 = data['W1_P19'].value_counts(sort=True, dropna=False)
print(c3)

print('\npercentages of W1_P19 - Original data\nWhere do your minor children live? \nLive with both parents=1\nLive with father or mother (different expressions)=2-4\nLive with no parent=5')
p3 = data['W1_P19'].value_counts(sort=True, dropna=False, normalize=True)
print(p3)

# recode the parentgroup (building a group)
recode1 = {1: 1, 2: 2, 3: 2, 4: 2, 5: 3}
data['PARENTGROUP'] = data['W1_P19'].map(recode1)

print('\nWhere do the children live? - Recoding and grouping the data\n1=live with both parents\n2=live with father or mother\n3=live with relatives')
c4 = data['PARENTGROUP'].value_counts(sort=True)
print(c4)

print('\nWhere do the children live? - Recoding and grouping the data - percentages\n1=live with both parents\n2=live with father or mother\n3=live with relatives')
p4 = data['PARENTGROUP'].value_counts(sort=True, normalize=True)
print(p4)

========================================================

Results/Output:

========================================================

Content of the ool_pds.csv dataset:
rows: 2294
columns: 436

=============================================================================

counts for W1_P17 -
Do you have any biological or adopted children, yes=1, no=2
1.000000    1304
2.000000     949
NaN           41
Name: W1_P17, dtype: int64

percentages for W1_P17 -
Do you have any biological or adopted children, yes=1, no=2
1.000000   0.568439
2.000000   0.413688
NaN        0.017873
Name: W1_P17, dtype: float64

=============================================================================

counts for W1_P17A - Original data
How many biological or adopted children do you have?
NaN          999
2.000000     481
1.000000     306
3.000000     267
4.000000     138
5.000000      51
6.000000      25
7.000000      11
10.000000     10
8.000000       5
9.000000       1
Name: W1_P17A, dtype: int64

percentages for W1_P17A - Original data
How many biological or adopted children do you have?
NaN         0.435484
2.000000    0.209677
1.000000    0.133391
3.000000    0.116391
4.000000    0.060157
5.000000    0.022232
6.000000    0.010898
7.000000    0.004795
10.000000   0.004359
8.000000    0.002180
9.000000    0.000436
Name: W1_P17A, dtype: float64

How many Children do they have? - Recoding and grouping the data
1-3 children = 1
4-6 children = 2
7-10 children = 3
1.000000    1054
2.000000     214
3.000000      27
Name: CHILDRENGROUP, dtype: int64

How many Children do they have? - Recoding and grouping the data - percentages
1-3 children = 1
4-6 children = 2
7-10 children = 3
1.000000   0.813900
2.000000   0.165251
3.000000   0.020849
Name: CHILDRENGROUP, dtype: float64

=============================================================================

counts of W1_P19 - Original data
Where do your minor children live?
Live with both parents=1
Live with father or mother (different characteristics)=2-4
Live with no parent=5
NaN         1529
1.000000     414
4.000000     206
2.000000      74
5.000000      36
3.000000      35
Name: W1_P19, dtype: int64

percentages of W1_P19 - Original data
Where do your minor children live?
Live with both parents=1
Live with father or mother (different characteristics)=2-4
Live with no parent=5
NaN        0.666521
1.000000   0.180471
4.000000   0.089799
2.000000   0.032258
5.000000   0.015693
3.000000   0.015257
Name: W1_P19, dtype: float64

Where do the children live? - Recoding and grouping the data
1=live with both parents
2=live with father or mother
3=live with relatives
1.000000    414
2.000000    315
3.000000     36
Name: PARENTGROUP, dtype: int64

Where do the children live? - Recoding and grouping the data - percentages
1=live with both parents
2=live with father or mother
3=live with relatives
1.000000   0.541176
2.000000   0.411765
3.000000   0.047059
Name: PARENTGROUP, dtype: float64

data-management-konny86 · 4 years ago

Here is my solution for the 2nd assignment:

1) My Program:

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Created on Wed Jun 9 14:17:38 2021

@author: Konny

My first Python program with the selected data set in relation to my research question
"""

import pandas
import numpy

data = pandas.read_csv('ool_pds.csv', low_memory=False)

# upper-case all DataFrame column names
data.columns = map(str.upper, data.columns)

# bug fix for display formats to avoid run time errors
pandas.set_option('display.float_format', lambda x: '%f' % x)

print('rows:', len(data))             # number of observations (rows)
print('columns:', len(data.columns))  # number of variables (columns)

# alternative way to get the number of observations (rows) in a DataFrame
# print('Index:', len(data.index))

# checking the format of your variables
print(data['W1_P17'].dtype)

# setting variables I will be working with to numeric (updated)
data['W1_P17'] = pandas.to_numeric(data['W1_P17'])    # Do you have any biological or adopted children
data['W1_P17A'] = pandas.to_numeric(data['W1_P17A'])  # How many children do you have
data['W1_P19'] = pandas.to_numeric(data['W1_P19'])    # Where do your minor children (under 18) live

# counts and percentages (i.e. frequency distributions) for each variable
print('counts for W1_P17 - Do you have any biological or adopted children, yes=1')
c1 = data['W1_P17'].value_counts(sort=True)
print(c1)

print('percentages for W1_P17 - Do you have any biological or adopted children, yes=1')
p1 = data['W1_P17'].value_counts(sort=True, normalize=True)
print(p1)

print('counts for W1_P17A - How many biological or adopted children do you have?')
c2 = data['W1_P17A'].value_counts(sort=True)
print(c2)

print('percentages for W1_P17A - How many biological or adopted children do you have?')
p2 = data['W1_P17A'].value_counts(sort=True, normalize=True)
print(p2)

# dropna=False also lists the missing values (NaN) for W1_P19
print('counts for W1_P19 - Where do your minor children live? Live with both parents=1')
c3 = data['W1_P19'].value_counts(sort=True, dropna=False)
print(c3)

print('percentages for W1_P19 - Where do your minor children live? Live with both parents=1')
p3 = data['W1_P19'].value_counts(sort=True, dropna=False, normalize=True)
print(p3)

==========================================================

2) Output:

rows: 2294
columns: 436
float64

counts for W1_P17 - Do you have any biological or adopted children, yes=1
1.000000     1304
2.000000      949
-1.000000      40
Name: W1_P17, dtype: int64

percentages for W1_P17 - Do you have any biological or adopted children, yes=1
1.000000    0.568687
2.000000    0.413868
-1.000000   0.017444
Name: W1_P17, dtype: float64

counts for W1_P17A - How many biological or adopted children do you have?
2.000000     481
1.000000     306
3.000000     267
4.000000     138
5.000000      51
6.000000      25
7.000000      11
10.000000     10
-1.000000      9
8.000000       5
9.000000       1
Name: W1_P17A, dtype: int64

percentages for W1_P17A - How many biological or adopted children do you have?
2.000000    0.368865
1.000000    0.234663
3.000000    0.204755
4.000000    0.105828
5.000000    0.039110
6.000000    0.019172
7.000000    0.008436
10.000000   0.007669
-1.000000   0.006902
8.000000    0.003834
9.000000    0.000767
Name: W1_P17A, dtype: float64

counts for W1_P19 - Where do your minor children live? Live with both parents=1
NaN          990
-1.000000    539
1.000000     414
4.000000     206
2.000000      74
5.000000      36
3.000000      35
Name: W1_P19, dtype: int64

percentages for W1_P19 - Where do your minor children live? Live with both parents=1
NaN         0.431561
-1.000000   0.234961
1.000000    0.180471
4.000000    0.089799
2.000000    0.032258
5.000000    0.015693
3.000000    0.015257
Name: W1_P19, dtype: float64

=========================================================

3) Description:

The most interesting information for me was that only about 57% of the respondents have a biological or adopted child. I expected a higher share.

The next interesting piece of information was that respondents with children most often have 2 children (biological or adopted), and some families have as many as 10 children.
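The most frequent number of children can also be read off programmatically. The sketch below rebuilds the W1_P17A counts from the output above (leaving out the 9 refused answers coded -1) and looks up the most common value. The variable names are my own choice for illustration.

```python
import pandas as pd

# Counts copied from the W1_P17A output above; the -1 (refused) row is left out.
child_counts = pd.Series(
    {1: 306, 2: 481, 3: 267, 4: 138, 5: 51, 6: 25, 7: 11, 8: 5, 9: 1, 10: 10}
)

most_common = child_counts.idxmax()   # number of children reported most often
largest = child_counts.index.max()    # largest number of children reported
respondents = child_counts.sum()      # respondents who gave a valid number

print(most_common, largest, respondents)
```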

The third distribution shows that only 18.0% of all respondents report that their minor children live with both parents. However, there is a large amount of missing data ('NaN': 43.2%), and many people refused to answer ('-1': 23.5%).
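How much the missing and refused answers matter becomes clearer when the share is computed twice: once over all rows and once over valid answers only. The sketch below does this with the W1_P19 counts from the output above. The string labels for the index are an assumption made for readability.

```python
import pandas as pd

# Counts copied from the W1_P19 output above ("NaN" = missing, "-1" = refused).
p19_counts = pd.Series(
    {"NaN": 990, "-1": 539, "1": 414, "4": 206, "2": 74, "5": 36, "3": 35}
)

# Share of each answer among all rows in the dataset.
share_all = p19_counts / p19_counts.sum()

# Share of each answer among respondents who gave a valid answer.
answered = p19_counts.drop(["NaN", "-1"])
share_answered = answered / answered.sum()

print(round(share_all["1"], 6))
print(round(share_answered["1"], 6))
```

Among the respondents who actually answered, the share living with both parents is therefore much higher than 18.0%, which matches the 54.1% in the recoded output of the 3rd assignment.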
