Visualization
Creating Graphs for Data
Input CSV: Gapminder
Variables: Internet use rate, urban rate and employment rate
Python with pandas, numpy and seaborn
Individual graphs for the 3 variables
Scatterplot graphs for the association between:
Urban Rate -> Employment Rate
Internet Use Rate -> Employment Rate
Internet Use Rate -> Urban Rate
Create graphs for each individual variable: count and frequency distributions
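The count and frequency distributions described above could be produced along these lines; this is a minimal sketch using synthetic data (the real script would read the Gapminder CSV), and the column names internetuserate, urbanrate and employrate are the ones the Gapminder codebook typically uses:

```python
import numpy as np
import pandas as pd

# synthetic stand-in for the Gapminder CSV; real code would use
# pd.read_csv('gapminder.csv') with the codebook column names below
rng = np.random.default_rng(0)
data = pd.DataFrame({
    'internetuserate': rng.uniform(0, 100, 50),
    'urbanrate': rng.uniform(10, 100, 50),
    'employrate': rng.uniform(40, 85, 50),
})

for col in ['internetuserate', 'urbanrate', 'employrate']:
    data[col] = pd.to_numeric(data[col], errors='coerce')
    # bin the quantitative variable so counts are meaningful
    binned = pd.cut(data[col], bins=4)
    print('counts for', col)
    print(binned.value_counts(sort=False))
    print('frequency distribution for', col)
    print(binned.value_counts(sort=False, normalize=True))
```

Binning with pd.cut keeps the count plots readable, since raw quantitative values rarely repeat.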
Scatterplot diagrams for association between Quantitative -> Quantitative variables
[Scatterplot: urban rate vs. employment rate]
Conclusion: No association between the urban rate and the employment rate
[Scatterplot: internet use rate vs. employment rate]
Conclusion: No association between the internet use rate and the employment rate
[Scatterplot: internet use rate vs. urban rate]
Conclusion: There is a positive association between the Internet use rate and the urban rate
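The three scatterplots above could be generated with seaborn's regplot; the following is a sketch under stated assumptions, using synthetic data (the real script would read the Gapminder CSV) and the assumed column names urbanrate, internetuserate and employrate:

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use('Agg')  # render off-screen
import matplotlib.pyplot as plt
import seaborn as sns

# synthetic data mimicking the relationships seen above: internet use
# rises with urbanization, while employment is unrelated to both
rng = np.random.default_rng(1)
urban = rng.uniform(10, 100, 60)
data = pd.DataFrame({
    'urbanrate': urban,
    'internetuserate': 0.8 * urban + rng.normal(0, 10, 60),
    'employrate': rng.uniform(40, 85, 60),
})

pairs = [('urbanrate', 'employrate'),
         ('internetuserate', 'employrate'),
         ('internetuserate', 'urbanrate')]
for x, y in pairs:
    # scatterplot with a fitted regression line for each association
    sns.regplot(x=x, y=y, data=data, fit_reg=True)
    plt.xlabel(x)
    plt.ylabel(y)
    plt.title('Scatterplot for the association between %s and %s' % (x, y))
    plt.savefig('%s_vs_%s.png' % (x, y))
    plt.close()
```

fit_reg=True draws the regression line, which makes the positive internet-use/urban-rate association easy to see against the flat employment plots.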
Program

# -*- coding: utf-8 -*-
"""
Created on Sun Aug 30 10:50:43 2015

@author: ldierker
"""
import pandas
import numpy
# any additional libraries would be imported here

data = pandas.read_csv('nesarc_pds.csv', low_memory=False)

print(len(data))          # number of observations (rows)
print(len(data.columns))  # number of variables (columns)

# checking the format of your variables
print(data['ETHRACE2A'].dtype)

# setting variables you will be working with to numeric
data['TAB12MDX'] = pandas.to_numeric(data['TAB12MDX'])
data['CHECK321'] = pandas.to_numeric(data['CHECK321'])
data['S3AQ3B1'] = pandas.to_numeric(data['S3AQ3B1'])
data['S3AQ3C1'] = pandas.to_numeric(data['S3AQ3C1'])
data['AGE'] = pandas.to_numeric(data['AGE'])

# counts and percentages (i.e. frequency distributions) for each variable
c1 = data['TAB12MDX'].value_counts(sort=False)
print(c1)
p1 = data['TAB12MDX'].value_counts(sort=False, normalize=True)
print(p1)

c2 = data['CHECK321'].value_counts(sort=False)
print(c2)
p2 = data['CHECK321'].value_counts(sort=False, normalize=True)
print(p2)

c3 = data['S3AQ3B1'].value_counts(sort=False)
print(c3)
p3 = data['S3AQ3B1'].value_counts(sort=False, normalize=True)
print(p3)

c4 = data['S3AQ3C1'].value_counts(sort=False)
print(c4)
p4 = data['S3AQ3C1'].value_counts(sort=False, normalize=True)
print(p4)

# ADDING TITLES
print('counts for TAB12MDX')
c1 = data['TAB12MDX'].value_counts(sort=False)
print(c1)
#print(len(data['TAB12MDX']))  # number of observations (rows)

print('percentages for TAB12MDX')
p1 = data['TAB12MDX'].value_counts(sort=False, normalize=True)
print(p1)

print('counts for CHECK321')
c2 = data['CHECK321'].value_counts(sort=False)
print(c2)

print('percentages for CHECK321')
p2 = data['CHECK321'].value_counts(sort=False, normalize=True)
print(p2)

print('counts for S3AQ3B1')
c3 = data['S3AQ3B1'].value_counts(sort=False, dropna=False)
print(c3)

print('percentages for S3AQ3B1')
p3 = data['S3AQ3B1'].value_counts(sort=False, normalize=True)
print(p3)

print('counts for S3AQ3C1')
c4 = data['S3AQ3C1'].value_counts(sort=False, dropna=False)
print(c4)

print('percentages for S3AQ3C1')
p4 = data['S3AQ3C1'].value_counts(sort=False, dropna=False, normalize=True)
print(p4)

# ADDING MORE DESCRIPTIVE TITLES
print('counts for TAB12MDX - nicotine dependence in the past 12 months')
c1 = data['TAB12MDX'].value_counts(sort=False)
print(c1)

print('percentages for TAB12MDX - nicotine dependence in the past 12 months')
p1 = data['TAB12MDX'].value_counts(sort=False, normalize=True)
print(p1)

print('counts for CHECK321 - smoked in the past year')
c2 = data['CHECK321'].value_counts(sort=False)
print(c2)

print('percentages for CHECK321 - smoked in the past year')
p2 = data['CHECK321'].value_counts(sort=False, normalize=True)
print(p2)

print('counts for S3AQ3B1 - usual frequency when smoked cigarettes')
c3 = data['S3AQ3B1'].value_counts(sort=False)
print(c3)

print('percentages for S3AQ3B1 - usual frequency when smoked cigarettes')
p3 = data['S3AQ3B1'].value_counts(sort=False, normalize=True)
print(p3)

print('counts for S3AQ3C1 - usual quantity when smoked cigarettes')
c4 = data['S3AQ3C1'].value_counts(sort=False, dropna=False)
print(c4)

print('percentages for S3AQ3C1 - usual quantity when smoked cigarettes')
p4 = data['S3AQ3C1'].value_counts(sort=False, normalize=True)
print(p4)

# frequency distributions using the 'groupby' function
ct1 = data.groupby('TAB12MDX').size()
print(ct1)
pt1 = data.groupby('TAB12MDX').size() * 100 / len(data)
print(pt1)

# subset data to young adults age 18 to 25 who have smoked in the past 12 months
sub1 = data[(data['AGE']>=18) & (data['AGE']<=25) & (data['CHECK321']==1)]

# make a copy of my new subsetted data
sub2 = sub1.copy()

# frequency distributions on new sub2 data frame
print('counts for AGE')
c5 = sub2['AGE'].value_counts(sort=False)
print(c5)
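The two counting approaches used in the code above, value_counts and groupby, yield the same frequency distribution. A quick sketch on a small synthetic stand-in variable (the NESARC file itself is not available here) shows the equivalence:

```python
import pandas as pd

# synthetic stand-in for a NESARC-style categorical variable
data = pd.DataFrame({'TAB12MDX': [0, 1, 0, 0, 1, 0, 1, 0]})

# percentages via value_counts, scaled to match the groupby version
p = data['TAB12MDX'].value_counts(sort=False, normalize=True) * 100

# percentages via groupby, as in the script above
pt = data.groupby('TAB12MDX').size() * 100 / len(data)

print(p.sort_index())
print(pt.sort_index())
```

Both print 62.5% for 0 and 37.5% for 1; the only practical difference is that value_counts can also report missing values via dropna=False.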
Literature survey
Correlation of addiction and gambling:
I have decided to do an analysis on the dataset under the topic "addiction and gambling". For this, I have made a personal codebook and an Excel sheet containing only the variables required for my analysis. I have copied the columns for the variables id, sex, consumer, smoker and gambling behaviour.
I want to look at the datasets and determine whether a person with an active addiction is inclined towards gambling or not.
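A sketch of how that variable subset and the addiction-vs-gambling question could be set up in pandas; the column names (IDNUM, SEX, CONSUMER, SMOKER, GAMBLING) and their 0/1 coding are placeholders, since the actual codebook names are not given here:

```python
import pandas as pd

# hypothetical stand-in rows; real code would read the survey CSV
data = pd.DataFrame({
    'IDNUM':    [1, 2, 3, 4],
    'SEX':      [1, 2, 1, 2],
    'CONSUMER': [1, 0, 1, 1],   # drinking status (placeholder coding)
    'SMOKER':   [1, 0, 0, 1],   # smoking status (placeholder coding)
    'GAMBLING': [1, 0, 1, 0],   # gambling behaviour (placeholder coding)
})

# keep only the variables needed for the addiction-and-gambling analysis
subset = data[['IDNUM', 'SEX', 'CONSUMER', 'SMOKER', 'GAMBLING']].copy()

# flag respondents with any active addiction, then cross-tabulate
# the flag against gambling behaviour
subset['ADDICTED'] = ((subset['CONSUMER'] == 1) | (subset['SMOKER'] == 1)).astype(int)
print(pd.crosstab(subset['ADDICTED'], subset['GAMBLING']))
```

The crosstab gives the counts needed to judge whether gambling is more common among respondents flagged as addicted.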
I could not go ahead with the codebook as there was no CSV file for the Wave 3 data collection. Also, for Wave 1, I did not get a description of the variable weight.
So I will instead use the dataset provided in the "National Epidemiologic Survey of Drug Use and Health Code Book".
Introduction
I am a fellow learner from Coursera taking a course on Data Management and Visualization using the ADDHEALTH dataset provided to us. Join me on this beautiful journey where I will learn and explore new things.