apoorvaml-week1
Week 1 : Running a Classification Tree
apoorvaml-week1 · 2 years ago
Week 1: Peer-graded Assignment: Running a Classification Tree
This post is for the Coursera course "Machine Learning for Data Analysis" by Wesleyan University.
It covers "Week 1: Peer-graded Assignment: Running a Classification Tree".
I am working with decision trees in Python.
1) Syntax used to run Classification Tree
Installation on Ubuntu Linux:
sudo chmod +x Anaconda3-2022.10-Linux-x86_64.sh
./Anaconda3-2022.10-Linux-x86_64.sh
conda install scikit-learn
conda install -n my_environment scikit-learn
pip install -U scikit-learn scipy matplotlib
sudo apt-get install graphviz
sudo apt-get install python3-pydotplus
conda create -c conda-forge -n spyder-env spyder numpy scipy pandas matplotlib sympy cython
conda activate spyder-env
conda config --env --add channels conda-forge
conda config --env --set channel_priority strict
python -m pip install pydotplus
dot -Tpng tree.dot -o tree5.png
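After running the install commands above, it can help to confirm that everything is actually visible from Python. The following is only a minimal sketch: the helper name `check_environment` is made up for illustration, and it simply tries to import each module and looks for the Graphviz `dot` executable on the PATH.

```python
import importlib
import shutil


def check_environment(modules=("numpy", "pandas", "sklearn", "pydotplus")):
    """Return a dict mapping each module name to its version, or None if missing.

    `check_environment` is a hypothetical helper, not part of any library.
    """
    status = {}
    for name in modules:
        try:
            mod = importlib.import_module(name)
            # Some modules do not expose __version__; report "unknown" then
            status[name] = getattr(mod, "__version__", "unknown")
        except ImportError:
            status[name] = None
    # Graphviz's `dot` is a system program, not a Python module
    status["dot"] = shutil.which("dot")
    return status


if __name__ == "__main__":
    for name, info in check_environment().items():
        print(f"{name}: {info if info else 'NOT FOUND'}")
```

Any entry reported as NOT FOUND points to the install step that still needs attention.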
2)  Code used to run Classification Tree
import numpy as np
import pandas as pd
import matplotlib.pylab as plt
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import classification_report
from sklearn import tree
import pydotplus
import sklearn.metrics
#Load the dataset
data = pd.read_csv("tree_addhealth.csv")
dc = data.dropna()
print(dc.dtypes)
print(dc.describe())
# Modeling and Prediction
#Split into training and testing sets
predictor = dc[['HISPANIC','WHITE','BLACK','NAMERICAN','ASIAN']]
target = dc.TREG1
pr_train, pr_test, t_train, t_test = train_test_split(predictor, target, test_size=0.4)
print(pr_train.shape, pr_test.shape, t_train.shape, t_test.shape)
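To see what the 60/40 split does, here is a small sketch on synthetic data (not the AddHealth file). The column names `x1`, `x2`, `y` and the `random_state=123` argument are my own assumptions for illustration; the original call does not fix a seed, so its split changes on every run.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real data: 100 rows of 0/1 values
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "x1": rng.integers(0, 2, size=100),
    "x2": rng.integers(0, 2, size=100),
    "y": rng.integers(0, 2, size=100),
})

# test_size=0.4 holds out 40% of the rows; random_state makes the split repeatable
X_train, X_test, y_train, y_test = train_test_split(
    demo[["x1", "x2"]], demo["y"], test_size=0.4, random_state=123
)

print(X_train.shape, X_test.shape)  # (60, 2) (40, 2)
```

Fixing the seed is optional, but it makes the confusion matrix and accuracy comparable between runs.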
#Build model on training data
classif = DecisionTreeClassifier()
classif = classif.fit(pr_train, t_train)
pred = classif.predict(pr_test)
print(sklearn.metrics.confusion_matrix(t_test, pred))
print(sklearn.metrics.accuracy_score(t_test, pred))
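The confusion matrix and accuracy score are easier to read on a tiny hand-made example. The labels below are invented purely for illustration and are not results from the AddHealth data.

```python
from sklearn.metrics import confusion_matrix, accuracy_score

# Invented labels: 1 = regular smoker, 0 = not
y_true = [0, 0, 0, 1, 1, 0]
y_pred = [0, 0, 1, 1, 0, 0]

cm = confusion_matrix(y_true, y_pred)
# Rows are true classes, columns are predicted classes:
# [[TN, FP],
#  [FN, TP]]
tn, fp, fn, tp = cm.ravel()
acc = accuracy_score(y_true, y_pred)

print(cm)
print(tn, fp, fn, tp)  # 3 1 1 1
# Accuracy is the share of correct predictions: (TN + TP) / total
print(acc == (tn + tp) / cm.sum())  # True
```

Here 4 of the 6 predictions sit on the diagonal of the matrix, so the accuracy is 4/6.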
#Export the fitted tree to a Graphviz .dot file
tree.export_graphviz(classif, out_file='tree_race.dot')
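If Graphviz is not available, scikit-learn (version 0.21 or later) can also print a fitted tree as plain text with `export_text`. This is an alternative to the `export_graphviz` call above, not what I used for the images; the toy data and the feature names `a` and `b` are assumptions for illustration.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Toy data: the class depends only on the first feature
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 0, 1, 1]

clf = DecisionTreeClassifier(random_state=0).fit(X, y)

# export_text returns the split rules as an indented string
rules = export_text(clf, feature_names=["a", "b"])
print(rules)
```

The printed rules show a single split on `a <= 0.50`, which is the same information the .dot file encodes graphically.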
3) Corresponding Output
[Image: classification tree for the race/ethnicity predictors]
I first chose a few variables to see whether they can predict regular smoking:
predictor = dc[['HISPANIC','WHITE','BLACK','NAMERICAN','ASIAN']]
I then changed the predictor variables to just two, gender and age, and got this tree.
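Switching to two predictors only changes the column selection. The sketch below uses synthetic data, and the column names `BIO_SEX` and `age` are assumptions (the post does not show the exact names). The `max_depth=3` argument is also my addition: with a continuous predictor such as age, an unrestricted tree can grow very large and become hard to read.

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the AddHealth data (values are random, not real)
rng = np.random.default_rng(1)
demo = pd.DataFrame({
    "BIO_SEX": rng.integers(1, 3, size=200),  # assumed coding: 1 or 2
    "age": rng.uniform(12, 21, size=200),     # assumed continuous age in years
    "TREG1": rng.integers(0, 2, size=200),    # 1 = regular smoker
})

predictor = demo[["BIO_SEX", "age"]]
target = demo["TREG1"]

# Limit the depth so the exported tree stays small enough to inspect
small_tree = DecisionTreeClassifier(max_depth=3, random_state=0)
small_tree = small_tree.fit(predictor, target)

print(small_tree.get_depth(), small_tree.get_n_leaves())
```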
[Image: classification tree with gender and age as predictors]
4) Interpretation
The assignment is to perform a decision tree analysis to test nonlinear relationships among a series of explanatory variables and a binary, categorical response variable. The data set comes from the National Longitudinal Study of Adolescent Health (AddHealth).
To keep things simple, I am focusing on regular smoking (the TREG1 variable) as the response.
I converted the .dot file to .png with the command "dot -Tpng name.dot -o name.png".
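The same conversion can be driven from Python instead of the shell. This is a minimal sketch, assuming Graphviz is installed; the helper names `build_dot_command` and `dot_to_png` are hypothetical and only wrap the command quoted above.

```python
import shutil
import subprocess


def build_dot_command(dot_path, png_path):
    """Build the argument list for: dot -Tpng <dot_path> -o <png_path>."""
    return ["dot", "-Tpng", dot_path, "-o", png_path]


def dot_to_png(dot_path, png_path):
    """Run Graphviz `dot`; return False if the executable is not on PATH."""
    if shutil.which("dot") is None:
        return False
    # check=True raises CalledProcessError if the conversion fails
    subprocess.run(build_dot_command(dot_path, png_path), check=True)
    return True


print(build_dot_command("name.dot", "name.png"))
```

Passing the arguments as a list avoids shell quoting problems when file names contain spaces.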