Visualization
Creating Graphs for Data
Input CSV: Gapminder
Variables: Internet use rate, urban rate and employment rate
Python with pandas, numpy and seaborn
Individual graphs for the 3 variables
Scatterplot graphs for the association between:
Urban Rate -> Employment Rate
Internet Use Rate -> Employment Rate
Internet Use Rate -> Urban Rate
Create graphs for each individual variable: count and frequency distributions
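The count and frequency distributions described above could be produced along these lines; this is a minimal sketch using synthetic data (the real script would read the Gapminder CSV), and the column names internetuserate, urbanrate and employrate are the ones the Gapminder codebook typically uses:

```python
import numpy as np
import pandas as pd

# synthetic stand-in for the Gapminder CSV; real code would use
# pd.read_csv('gapminder.csv') with the codebook column names below
rng = np.random.default_rng(0)
data = pd.DataFrame({
    'internetuserate': rng.uniform(0, 100, 50),
    'urbanrate': rng.uniform(10, 100, 50),
    'employrate': rng.uniform(40, 85, 50),
})

for col in ['internetuserate', 'urbanrate', 'employrate']:
    data[col] = pd.to_numeric(data[col], errors='coerce')
    # bin the quantitative variable so counts are meaningful
    binned = pd.cut(data[col], bins=4)
    print('counts for', col)
    print(binned.value_counts(sort=False))
    print('frequency distribution for', col)
    print(binned.value_counts(sort=False, normalize=True))
```

Binning with pd.cut keeps the count plots readable, since raw quantitative values rarely repeat.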
Scatterplot diagrams for association between Quantitative -> Quantitative variables
[Scatterplot: urban rate vs. employment rate]
Conclusion: No association between the urban rate and the employment rate
[Scatterplot: internet use rate vs. employment rate]
Conclusion: No association between the internet use rate and the employment rate
[Scatterplot: internet use rate vs. urban rate]
Conclusion: There is a positive association between the Internet use rate and the urban rate
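The three scatterplots above could be generated with seaborn's regplot; the following is a sketch under stated assumptions, using synthetic data (the real script would read the Gapminder CSV) and the assumed column names urbanrate, internetuserate and employrate:

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use('Agg')  # render off-screen
import matplotlib.pyplot as plt
import seaborn as sns

# synthetic data mimicking the relationships seen above: internet use
# rises with urbanization, while employment is unrelated to both
rng = np.random.default_rng(1)
urban = rng.uniform(10, 100, 60)
data = pd.DataFrame({
    'urbanrate': urban,
    'internetuserate': 0.8 * urban + rng.normal(0, 10, 60),
    'employrate': rng.uniform(40, 85, 60),
})

pairs = [('urbanrate', 'employrate'),
         ('internetuserate', 'employrate'),
         ('internetuserate', 'urbanrate')]
for x, y in pairs:
    # scatterplot with a fitted regression line for each association
    sns.regplot(x=x, y=y, data=data, fit_reg=True)
    plt.xlabel(x)
    plt.ylabel(y)
    plt.title('Scatterplot for the association between %s and %s' % (x, y))
    plt.savefig('%s_vs_%s.png' % (x, y))
    plt.close()
```

fit_reg=True draws the regression line, which makes the positive internet-use/urban-rate association easy to see against the flat employment plots.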
Program

# -*- coding: utf-8 -*-
"""
Created on Sun Aug 30 10:50:43 2015

@author: ldierker
"""
import pandas
import numpy
# any additional libraries would be imported here

data = pandas.read_csv('nesarc_pds.csv', low_memory=False)

print(len(data))          # number of observations (rows)
print(len(data.columns))  # number of variables (columns)

# checking the format of your variables
print(data['ETHRACE2A'].dtype)

# setting variables you will be working with to numeric
data['TAB12MDX'] = pandas.to_numeric(data['TAB12MDX'])
data['CHECK321'] = pandas.to_numeric(data['CHECK321'])
data['S3AQ3B1'] = pandas.to_numeric(data['S3AQ3B1'])
data['S3AQ3C1'] = pandas.to_numeric(data['S3AQ3C1'])
data['AGE'] = pandas.to_numeric(data['AGE'])

# counts and percentages (i.e. frequency distributions) for each variable
c1 = data['TAB12MDX'].value_counts(sort=False)
print(c1)
p1 = data['TAB12MDX'].value_counts(sort=False, normalize=True)
print(p1)

c2 = data['CHECK321'].value_counts(sort=False)
print(c2)
p2 = data['CHECK321'].value_counts(sort=False, normalize=True)
print(p2)

c3 = data['S3AQ3B1'].value_counts(sort=False)
print(c3)
p3 = data['S3AQ3B1'].value_counts(sort=False, normalize=True)
print(p3)

c4 = data['S3AQ3C1'].value_counts(sort=False)
print(c4)
p4 = data['S3AQ3C1'].value_counts(sort=False, normalize=True)
print(p4)

# ADDING TITLES
print('counts for TAB12MDX')
c1 = data['TAB12MDX'].value_counts(sort=False)
print(c1)
#print(len(data['TAB12MDX']))  # number of observations (rows)

print('percentages for TAB12MDX')
p1 = data['TAB12MDX'].value_counts(sort=False, normalize=True)
print(p1)

print('counts for CHECK321')
c2 = data['CHECK321'].value_counts(sort=False)
print(c2)

print('percentages for CHECK321')
p2 = data['CHECK321'].value_counts(sort=False, normalize=True)
print(p2)

print('counts for S3AQ3B1')
c3 = data['S3AQ3B1'].value_counts(sort=False, dropna=False)
print(c3)

print('percentages for S3AQ3B1')
p3 = data['S3AQ3B1'].value_counts(sort=False, normalize=True)
print(p3)

print('counts for S3AQ3C1')
c4 = data['S3AQ3C1'].value_counts(sort=False, dropna=False)
print(c4)

print('percentages for S3AQ3C1')
p4 = data['S3AQ3C1'].value_counts(sort=False, dropna=False, normalize=True)
print(p4)

# ADDING MORE DESCRIPTIVE TITLES
print('counts for TAB12MDX - nicotine dependence in the past 12 months')
c1 = data['TAB12MDX'].value_counts(sort=False)
print(c1)

print('percentages for TAB12MDX - nicotine dependence in the past 12 months')
p1 = data['TAB12MDX'].value_counts(sort=False, normalize=True)
print(p1)

print('counts for CHECK321 - smoked in the past year')
c2 = data['CHECK321'].value_counts(sort=False)
print(c2)

print('percentages for CHECK321 - smoked in the past year')
p2 = data['CHECK321'].value_counts(sort=False, normalize=True)
print(p2)

print('counts for S3AQ3B1 - usual frequency when smoked cigarettes')
c3 = data['S3AQ3B1'].value_counts(sort=False)
print(c3)

print('percentages for S3AQ3B1 - usual frequency when smoked cigarettes')
p3 = data['S3AQ3B1'].value_counts(sort=False, normalize=True)
print(p3)

print('counts for S3AQ3C1 - usual quantity when smoked cigarettes')
c4 = data['S3AQ3C1'].value_counts(sort=False, dropna=False)
print(c4)

print('percentages for S3AQ3C1 - usual quantity when smoked cigarettes')
p4 = data['S3AQ3C1'].value_counts(sort=False, normalize=True)
print(p4)

# frequency distributions using the 'groupby' function
ct1 = data.groupby('TAB12MDX').size()
print(ct1)
pt1 = data.groupby('TAB12MDX').size() * 100 / len(data)
print(pt1)

# subset data to young adults age 18 to 25 who have smoked in the past 12 months
sub1 = data[(data['AGE']>=18) & (data['AGE']<=25) & (data['CHECK321']==1)]

# make a copy of my new subsetted data
sub2 = sub1.copy()

# frequency distributions on new sub2 data frame
print('counts for AGE')
c5 = sub2['AGE'].value_counts(sort=False)
print(c5)
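The two counting approaches used in the code above, value_counts and groupby, yield the same frequency distribution. A quick sketch on a small synthetic stand-in variable (the NESARC file itself is not available here) shows the equivalence:

```python
import pandas as pd

# synthetic stand-in for a NESARC-style categorical variable
data = pd.DataFrame({'TAB12MDX': [0, 1, 0, 0, 1, 0, 1, 0]})

# percentages via value_counts, scaled to match the groupby version
p = data['TAB12MDX'].value_counts(sort=False, normalize=True) * 100

# percentages via groupby, as in the script above
pt = data.groupby('TAB12MDX').size() * 100 / len(data)

print(p.sort_index())
print(pt.sort_index())
```

Both print 62.5% for 0 and 37.5% for 1; the only practical difference is that value_counts can also report missing values via dropna=False.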
Literature survey
Correlation of addiction and gambling:
I have decided to do an analysis on the dataset under the topic "addiction and gambling". For this, I have made a personal codebook and an Excel sheet containing only the variables required for my analysis. I have copied the columns for the variables id, sex, consumer, smoker and gambling behaviour.
I want to look at the datasets and determine whether a person with an active addiction is inclined towards gambling or not.
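A sketch of how that variable subset and the addiction-vs-gambling question could be set up in pandas; the column names (IDNUM, SEX, CONSUMER, SMOKER, GAMBLING) and their 0/1 coding are placeholders, since the actual codebook names are not given here:

```python
import pandas as pd

# hypothetical stand-in rows; real code would read the survey CSV
data = pd.DataFrame({
    'IDNUM':    [1, 2, 3, 4],
    'SEX':      [1, 2, 1, 2],
    'CONSUMER': [1, 0, 1, 1],   # drinking status (placeholder coding)
    'SMOKER':   [1, 0, 0, 1],   # smoking status (placeholder coding)
    'GAMBLING': [1, 0, 1, 0],   # gambling behaviour (placeholder coding)
})

# keep only the variables needed for the addiction-and-gambling analysis
subset = data[['IDNUM', 'SEX', 'CONSUMER', 'SMOKER', 'GAMBLING']].copy()

# flag respondents with any active addiction, then cross-tabulate
# the flag against gambling behaviour
subset['ADDICTED'] = ((subset['CONSUMER'] == 1) | (subset['SMOKER'] == 1)).astype(int)
print(pd.crosstab(subset['ADDICTED'], subset['GAMBLING']))
```

The crosstab gives the counts needed to judge whether gambling is more common among respondents flagged as addicted.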
I could not go ahead with the codebook as there was no CSV file for the Wave 3 data collection. Also, for Wave 1, I did not get a description of the variable weight.
So I will instead use the dataset provided in the "National Epidemiologic Survey of Drug Use and Health Code Book".
Introduction
I am a fellow learner from Coursera taking a course on Data Management and Visualization using the ADDHEALTH dataset provided to us. Join me on this beautiful journey where I will learn and explore new things.