apoorvaml-week1 - Tumblr blog

apoorvaml-week1 · 2 years ago


Week 1: Peer-graded Assignment: Running a Classification Tree

This assignment is for the Coursera course "Machine Learning for Data Analysis" by Wesleyan University.

It is for "Week 1: Peer-graded Assignment: Running a Classification Tree".

I am working on decision trees in Python.

1) Syntax used to run Classification Tree

Installation on Ubuntu Linux.

sudo chmod +x Anaconda3-2022.10-Linux-x86_64.sh

./Anaconda3-2022.10-Linux-x86_64.sh

conda install scikit-learn

conda install -n my_environment scikit-learn

pip install scikit-learn

pip install -U scikit-learn scipy matplotlib

sudo apt-get install graphviz

sudo apt-get install python3-pydotplus

conda create -c conda-forge -n spyder-env spyder numpy scipy pandas matplotlib sympy cython

conda create -c conda-forge -n spyder-env spyder

conda activate spyder-env

conda config --env --add channels conda-forge

conda config --env --set channel_priority strict

python -m pip install pydotplus

dot -Tpng tree.dot -o tree5.png
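After running the install commands above, a quick check from Python can confirm that the packages are importable. This is only a minimal sketch using the standard library; the function name check_packages is my own, and the list of module names is an assumption based on the imports used in this post. Note that scikit-learn is imported under the name sklearn.

```python
import importlib.util


def check_packages(modules=("numpy", "pandas", "matplotlib", "sklearn", "pydotplus")):
    """Return a dict mapping each top-level module name to True if it is installed."""
    status = {}
    for name in modules:
        # find_spec returns None when a top-level module cannot be found
        status[name] = importlib.util.find_spec(name) is not None
    return status


for name, found in check_packages().items():
    print(f"{name}: {'found' if found else 'MISSING'}")
```

Any line that prints MISSING points at a package that still has to be installed in the active environment.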

2) Code used to run Classification Tree

import numpy as np

import pandas as pd

import matplotlib.pyplot as plt

from sklearn.model_selection import train_test_split

from sklearn.tree import DecisionTreeClassifier

from sklearn.metrics import classification_report

from sklearn import tree

import pydotplus

import sklearn.metrics

#Load the dataset

data = pd.read_csv("tree_addhealth.csv")

dc = data.dropna()

dc.dtypes

dc.describe()

""" Modeling and Prediction """
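To show what dropna() does before modeling, here is a small sketch on a made-up table. The rows are invented for illustration and only borrow column names from this post; they are not taken from tree_addhealth.csv.

```python
import numpy as np
import pandas as pd

# Made-up toy rows; only the column names mimic the ones used in this post.
toy = pd.DataFrame({
    "WHITE": [1, 0, 1, 0],
    "BLACK": [0, 1, np.nan, 0],
    "TREG1": [0, 1, 0, np.nan],
})

# Keep only the rows that have no missing value in any column
clean = toy.dropna()

print(toy.shape, "->", clean.shape)   # (4, 3) -> (2, 3)
print(clean.dtypes)
```

Rows with even one missing value are dropped entirely, so it is worth comparing the shapes before and after to see how much data is lost.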

#Split into training and testing sets

predictor = dc[['HISPANIC','WHITE','BLACK','NAMERICAN','ASIAN']]

target = dc.TREG1

pr_train, pr_test, t_train, t_test = train_test_split(predictor, target, test_size=0.4)

pr_train.shape

pr_test.shape

t_train.shape

t_test.shape
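The split above is random, so the rows that land in each set change on every run. A minimal sketch of a reproducible 60/40 split follows, using made-up arrays instead of the AddHealth data; the random_state value 123 is an arbitrary choice of mine and not part of the original code.

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(20).reshape(10, 2)   # 10 made-up rows, 2 predictors
y = np.array([0, 1] * 5)           # made-up binary target

# test_size=0.4 holds out 40% of the rows; random_state fixes the shuffle
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=123
)

print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)
# (6, 2) (4, 2) (6,) (4,)
```

Fixing random_state makes the confusion matrix and accuracy repeatable between runs, which helps when comparing different sets of predictors.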

#Build model on training data

classif = DecisionTreeClassifier()

classif = classif.fit(pr_train, t_train)

pred = classif.predict(pr_test)

sklearn.metrics.confusion_matrix(t_test, pred)

sklearn.metrics.accuracy_score(t_test, pred)
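To make the two metrics less of a black box, here is a plain-Python sketch that counts a 2x2 confusion table by hand and derives accuracy from it. The labels are made up for illustration, and the helper names confusion_counts and accuracy are my own. The layout follows the same convention as sklearn.metrics.confusion_matrix: rows are actual classes, columns are predicted classes.

```python
def confusion_counts(actual, predicted):
    """2x2 table for 0/1 labels: rows are actual classes, columns are predicted."""
    table = [[0, 0], [0, 0]]
    for a, p in zip(actual, predicted):
        table[a][p] += 1
    return table


def accuracy(table):
    """Share of cases on the diagonal, where predicted equals actual."""
    correct = table[0][0] + table[1][1]
    total = sum(sum(row) for row in table)
    return correct / total


# Made-up labels: 1 = regular smoker, 0 = not
actual    = [0, 0, 0, 0, 1, 1, 0, 1, 0, 0]
predicted = [0, 0, 1, 0, 1, 0, 0, 0, 0, 0]

table = confusion_counts(actual, predicted)
print(table)             # [[6, 1], [2, 1]]
print(accuracy(table))   # 0.7
```

In this toy case the model looks 70% accurate but finds only one of the three smokers, which is why the confusion matrix is worth reading alongside the accuracy score.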

#Export the fitted tree to a Graphviz .dot file

tree.export_graphviz(classif, out_file='tree_race.dot')
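By default the exported tree labels its nodes with generic feature indices, which makes the picture hard to read. The sketch below passes feature_names and class_names to export_graphviz so the nodes carry readable labels. The four training rows are made up, and the names MALE, AGE, non-smoker and smoker are hypothetical labels chosen for illustration, not the column names in the AddHealth file.

```python
from sklearn.tree import DecisionTreeClassifier, export_graphviz

# Made-up rows: [MALE, AGE]; the target separates cleanly on age
X = [[0, 12], [1, 13], [0, 17], [1, 18]]
y = [0, 0, 1, 1]

clf = DecisionTreeClassifier(random_state=0).fit(X, y)

# out_file=None returns the DOT source as a string instead of writing a file
dot_text = export_graphviz(
    clf,
    out_file=None,
    feature_names=["MALE", "AGE"],          # hypothetical column names
    class_names=["non-smoker", "smoker"],   # hypothetical class labels
    filled=True,
)
print(dot_text)
```

The returned string can be saved to a .dot file and rendered with the same dot command used elsewhere in this post.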

3) Corresponding Output

I chose a few variables to determine whether they can predict regular smoking.

predictor = dc[['HISPANIC','WHITE','BLACK','NAMERICAN','ASIAN']]

I then changed the predictor variables to just two: gender and age. I got this tree.

4) Interpretation

The task is to perform a decision tree analysis to test nonlinear relationships among a series of explanatory variables and a binary, categorical response variable. The data set comes from the National Longitudinal Study of Adolescent Health (AddHealth).
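As background for reading the tree: DecisionTreeClassifier uses Gini impurity as its default split criterion, preferring splits that leave the child nodes as pure as possible. Here is a small worked sketch in plain Python with made-up labels; the helper names gini and weighted_gini are my own, and this is an illustration of the measure, not scikit-learn's actual implementation.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels: 1 - p0**2 - p1**2."""
    if not labels:
        return 0.0
    p1 = sum(labels) / len(labels)
    return 1.0 - p1 ** 2 - (1.0 - p1) ** 2


def weighted_gini(left, right):
    """Impurity after a split: child impurities weighted by child size."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


parent = [0, 0, 0, 1, 1, 1, 1, 0]          # made-up node: half smokers
print(gini(parent))                         # 0.5

# A split that separates the classes perfectly
print(weighted_gini([0, 0, 0, 0], [1, 1, 1, 1]))   # 0.0

# A split that leaves both children as mixed as the parent
print(weighted_gini([0, 1, 0, 1], [0, 1, 1, 0]))   # 0.5
```

The first split removes all impurity, while the second removes none, so a tree would pick the first. The gini value printed in each node of the exported tree is this same quantity.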

To keep things simple, I focus on regular smoking (the TREG1 variable).

I translated the .dot file to .png using the "dot -Tpng name.dot -o name.png" command.
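The same conversion can be run from Python instead of the terminal. This is a minimal sketch that assumes Graphviz is installed and its dot executable is on the PATH; the helper names dot_to_png_command and render_png, and the output name tree_race.png, are my own choices for illustration.

```python
import shutil
import subprocess


def dot_to_png_command(dot_path, png_path):
    """Build the argument list for: dot -Tpng <dot_path> -o <png_path>."""
    return ["dot", "-Tpng", dot_path, "-o", png_path]


def render_png(dot_path, png_path):
    """Run Graphviz if it is available; return True when the command was run."""
    if shutil.which("dot") is None:
        # Graphviz is not installed or not on PATH
        return False
    # check=True raises CalledProcessError if dot exits with an error
    subprocess.run(dot_to_png_command(dot_path, png_path), check=True)
    return True


print(dot_to_png_command("tree_race.dot", "tree_race.png"))
# Example call, once tree_race.dot exists:
# render_png("tree_race.dot", "tree_race.png")
```

Passing the command as a list avoids shell quoting problems when file names contain spaces.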


#Running a Classification Tree #Machine Learning for Data Analysis #Wesleyan University #Coursera #Python #Week1
