Assessment Week 4
The code for each data visualization diagram is as follows:
1. For the first diagram.
# importing dependencies for visualization
import seaborn as sns
import matplotlib.pyplot as plt

# univariate analysis for the count of people suffering with illness
sns.countplot(x='Illness', data=data)
plt.xlabel('ILLNESS')
plt.title('Univariate analysis for the count of people suffering with illness')
plt.show()
2. For the second diagram.
# Now checking the people suffering with illness in the major cities of the USA
c1 = data[data['Illness'] == 'Yes']

plt.figure(figsize=(16, 6))
sns.countplot(x='City', data=c1)
plt.xlabel('Cities of the US suffering with illness')
plt.title('People suffering with illness in the major cities of the USA')
plt.show()
3. For the third diagram.
# Now checking the people not suffering with illness in the major cities of the USA
c2 = data[data['Illness'] == 'No']

plt.figure(figsize=(16, 6))
sns.countplot(x='City', data=c2)
plt.xlabel('Cities of the US not suffering with illness')
plt.title('People not suffering with illness in the major cities of the USA')
plt.show()
4. For the fourth diagram.
# bivariate graph representation
plt.figure(figsize=(16, 6))
sns.regplot(x='Age', y='Income', data=data.head(200), fit_reg=False)
plt.xlabel('Ages of the people')
plt.ylabel('Income of the people')
plt.title('Scatterplot of Age vs Income')
plt.show()
Assessment Week 3
Continuing with my toy dataset, I am moving on to data management and analysis. Here I check how many people have the illness in the top 3 cities of the dataset, for the Adulthood and Old Age groups.
________________________________________________________________
# checking how many adults in Dallas have the illness
c1 = data[(data['City'] == 'Dallas') & (data['Age'] == 'Adulthood')]
c1['Illness'].value_counts(sort=True)
output (recorded from data['Illness'].value_counts(), i.e. the whole dataset rather than the c1 subset) -
No 137861
Yes 12139
_______________________________________________________________
# checking how many adults in New York City have the illness
c2 = data[(data['City'] == 'New York City') & (data['Age'] == 'Adulthood')]
c2['Illness'].value_counts(sort=True)
output -
No 41106
Yes 3554
________________________________________________________________
# checking how many adults in Los Angeles have the illness
c3 = data[(data['City'] == 'Los Angeles') & (data['Age'] == 'Adulthood')]
c3['Illness'].value_counts(sort=True)
output -
No 26242
Yes 2286
________________________________________________________________
# Now checking for Old Age people in Dallas
c4 = data[(data['City'] == 'Dallas') & (data['Age'] == 'Old Age')]
c4['Illness'].value_counts(sort=True)
output -
No 2033
Yes 201
________________________________________________________________
# Now checking for Old Age people in Los Angeles
c5 = data[(data['City'] == 'Los Angeles') & (data['Age'] == 'Old Age')]
c5['Illness'].value_counts(sort=True)
output -
No 3363
Yes 282
________________________________________________________________
# Now checking for Old Age people in New York City
c6 = data[(data['City'] == 'New York City') & (data['Age'] == 'Old Age')]
c6['Illness'].value_counts(sort=True)
output -
No 5180
Yes 467
________________________________________________________________
In each output, 'Yes' is the number of people with the illness and 'No' is the number of people without it, for the given city and age group in the USA.
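The raw counts are easier to compare across cities once they are turned into rates. The following is a small sketch that does this with the numbers recorded above. The function name illness_rate is hypothetical. The first output is left out because its counts (137861 and 12139) equal the Illness totals for the whole dataset rather than a Dallas subset.

```python
# (No, Yes) counts copied from the outputs above.
counts = {
    ("New York City", "Adulthood"): (41106, 3554),
    ("Los Angeles", "Adulthood"): (26242, 2286),
    ("Dallas", "Old Age"): (2033, 201),
    ("Los Angeles", "Old Age"): (3363, 282),
    ("New York City", "Old Age"): (5180, 467),
}


def illness_rate(no_count, yes_count):
    """Share of the subset that has the illness."""
    return yes_count / (no_count + yes_count)


rates = {key: illness_rate(no, yes) for key, (no, yes) in counts.items()}

# Print one line per subset, with the rate shown as a percentage.
for (city, age_group), rate in sorted(rates.items()):
    print(f"{city:<14} {age_group:<10} {rate:.2%}")
```

On these counts every subset falls between about 7.7% and 9.0%, so the differences between cities and age groups are small.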
Assessment Week 2
I have done a univariate analysis of my dataset using Python.
Note: the code, output, and comments for the analysis are as follows:
________________________________________________________________
# importing dependencies
import pandas as pd
import numpy as np

# creating dataframe
data = pd.read_csv('toy_dataset.csv', low_memory=False, index_col=0)
data.head(5)
output -
Number  City    Gender  Age  Income   Illness
1       Dallas  Male    41   40367.0  No
2       Dallas  Male    54   45084.0  No
3       Dallas  Male    42   52483.0  No
4       Dallas  Male    40   40941.0  No
5       Dallas  Male    46   50289.0  No
________________________________________________________________
# checking the number of rows and columns
data.shape
output -
(150000, 5) # 150000 rows and 5 columns
________________________________________________________________
# running the frequency distribution
# univariate analysis
data['City'].value_counts(sort=True)
output -
New York City 50307
Los Angeles 32173
Dallas 19707
Mountain View 14219
Austin 12292
Boston 8301
Washington D.C. 8120
San Diego 4881
# calculating percentages
data['City'].value_counts(sort=True, normalize=True)
output -
New York City 0.335380
Los Angeles 0.214487
Dallas 0.131380
Mountain View 0.094793
Austin 0.081947
Boston 0.055340
Washington D.C. 0.054133
San Diego 0.032540
________________________________________________________________
# Gender column univariate analysis
data['Gender'].value_counts(sort=True)
output -
Male 83800
Female 66200
data['Gender'].value_counts(sort=True, normalize=True)
output -
Male 0.558667
Female 0.441333
________________________________________________________________
# Illness column univariate analysis
data['Illness'].value_counts(sort=True)
output -
No 137861
Yes 12139
data['Illness'].value_counts(sort=True, normalize=True)
output -
No 0.919073
Yes 0.080927
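Each column above is summarized with two separate calls, one for counts and one for proportions. As a sketch, the two results could be placed side by side in one table. The helper name frequency_table and the four-row sample are hypothetical and only there to keep the example runnable without the CSV file.

```python
import pandas as pd


def frequency_table(series):
    """Counts and percentages of each category, most frequent first."""
    counts = series.value_counts(sort=True)
    shares = series.value_counts(sort=True, normalize=True)
    return pd.DataFrame({"count": counts, "percent": (shares * 100).round(2)})


# Hypothetical sample standing in for data['Illness'].
sample = pd.Series(["No", "No", "No", "Yes"], name="Illness")
table = frequency_table(sample)
print(table)
```

With the real dataset, frequency_table(data['City']) would give the city counts and percentages in a single output.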
________________________________________________________________
# dividing the ages of the people into different groups
def ageGroups(X):
    if 0 < X < 12:
        return 'Childhood'
    elif 12 <= X < 21:
        return 'Adolescence'
    elif 21 <= X < 61:
        return 'Adulthood'
    else:
        return 'Old Age'

data['Age'] = data['Age'].apply(ageGroups)

# Age column univariate analysis
data['Age'].value_counts(sort=True)
output -
Adulthood 133052
Old Age 16948
data['Age'].value_counts(sort=True, normalize=True)
output -
Adulthood 0.887013
Old Age 0.112987
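The age grouping above is done with a hand-written function. An alternative, shown here only as a sketch, is pandas.cut, which assigns each value to a bin from a list of edges. The bin edges are chosen to give the same groups for whole-number ages above zero. One difference is that ages of zero or below come out as missing values here instead of 'Old Age'. The sample ages are hypothetical.

```python
import numpy as np
import pandas as pd

# Bins are closed on the right:
# (0, 11] Childhood, (11, 20] Adolescence, (20, 60] Adulthood, above 60 Old Age
bins = [0, 11, 20, 60, np.inf]
labels = ["Childhood", "Adolescence", "Adulthood", "Old Age"]

ages = pd.Series([5, 15, 20, 21, 41, 60, 61, 65])  # hypothetical sample ages
groups = pd.cut(ages, bins=bins, labels=labels)
print(groups.value_counts(sort=True))
```

On the real dataset this would be applied to the numeric Age column before it is overwritten with the group names.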
Assignment Week 1
I am going to work with a dataset named Toy Data, which records whether people in the big cities of the US have an illness, along with several other parameters.
This is the link to download the dataset: https://drive.google.com/file/d/1-in10C0BGvkKpWRrsg0zr_g96VCB_Ag7/view?usp=drivesdk
Variables:
The columns of the dataset are:
Number: A simple index number for each row
City: The location of a person (Dallas, New York City, Los Angeles, Mountain View, Boston, Washington D.C., San Diego and Austin) [categorical data]
Gender: Gender of a person (Male or Female) [categorical data]
Age: The age of a person (Ranging from 25 to 65 years) [quantitative data]
Income: Annual income of a person (Ranging from -674 to 177175) [quantitative data]
Illness: Is the person Ill? (Yes or No) [categorical data]
Research Question:
The research questions I would like to evaluate with this dataset are:
1. In the big cities of the USA, which gender suffers more from illness?
2. Which age group suffers more from illness?
3. What is the range of income for people who are suffering from illness?
4. Which area of the USA is most affected by illness?
5. What is the percentage of illness among the genders in each of the listed cities in the USA?
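Question 5 can be approached with a cross-tabulation. The following is a sketch of one possible way to do it, not the method used in the later assessments. It runs on a few hypothetical rows in the same shape as the dataset, and the variable names share and percent_ill are made up for the example.

```python
import pandas as pd

# Hypothetical rows in the same shape as toy_dataset.csv.
sample = pd.DataFrame({
    "City": ["Dallas", "Dallas", "Dallas", "Dallas", "Boston", "Boston"],
    "Gender": ["Male", "Male", "Female", "Female", "Male", "Female"],
    "Illness": ["Yes", "No", "No", "No", "No", "Yes"],
})

# Row-normalized cross-tabulation: each (City, Gender) row sums to 1.
share = pd.crosstab(
    index=[sample["City"], sample["Gender"]],
    columns=sample["Illness"],
    normalize="index",
)

# Percentage of people with the illness for each city and gender.
percent_ill = (share["Yes"] * 100).round(1)
print(percent_ill)
```

Dropping City from the index would give the overall percentage per gender, which relates to question 1.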
Summary:
According to the dataset, most illness occurred among people with lower incomes. Manipulating the data gave the same result. Over time, the number of illnesses all over the world has increased due to poorer infrastructure and slow development in the non-major cities of the USA.
This picture is taken from Wikipedia, which proves the above-mentioned hypothesis.
The additional topic I will explore is the influence of proper diets on the illness rate.
Literature:
1) How Does Income affect life expectancy? (Blog)
2) Age group more prone to illness in US? (Blog)
3) Sex and gender differences in health - NCBI? (Blog)