solving-the-assignment
solving-the-assignment · 4 years ago
Assessment week 4
[Images: the four diagrams produced by the code below]
The code for the data visualization behind each diagram is as follows:
1. For the first diagram.
# importing dependencies for visualization
import seaborn as sns
import matplotlib.pyplot as plt

# univariate analysis for the count of people suffering from illness
sns.countplot(x = 'Illness', data = data)
plt.xlabel('ILLNESS')
plt.title('Univariate analysis for the count of people suffering from illness')
plt.show()
2. For the second diagram.
# Now checking the people suffering from illness in the major cities of the USA
c1 = data[(data['Illness'] == 'Yes')]

plt.figure(figsize=(16,6))
sns.countplot(x = 'City', data = c1)
plt.xlabel('Cities of the US with people suffering from illness')
plt.title('People suffering from illness in the major cities of the USA')
plt.show()
3. For the third diagram.
# Now checking the people not suffering from illness in the major cities of the USA
c2 = data[(data['Illness'] == 'No')]

plt.figure(figsize=(16,6))
sns.countplot(x = 'City', data = c2)
plt.xlabel('Cities of the US with people not suffering from illness')
plt.title('People not suffering from illness in the major cities of the USA')
plt.show()
4. For the fourth diagram.
# bivariate graph representation (first 200 rows only, no regression line)
plt.figure(figsize=(16,6))
sns.regplot(x = 'Age', y = 'Income', data = data.head(200), fit_reg = False)
plt.xlabel('Ages of the people')
plt.ylabel('Income of the people')
plt.title('Scatterplot of Age Vs Income')
plt.show()
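The Week 2 counts show that the cities differ a lot in size (50307 rows for New York City against 4881 for San Diego), so the raw counts in the second and third diagrams mostly follow city size. A minimal sketch of one way to compare cities on equal footing is to compute the share of people with the illness per city. The rows below are made up for illustration and the names `sample` and `illness_share` are not from the original post.

```python
import pandas as pd

# Made-up rows for illustration only; the real data comes from toy_dataset.csv.
sample = pd.DataFrame({
    "City": ["Dallas", "Dallas", "Dallas", "Dallas", "Boston", "Boston"],
    "Illness": ["No", "Yes", "No", "No", "Yes", "No"],
})

# Turn Illness into True/False, then average within each city:
# the mean of a boolean column is the share of True values.
illness_share = (sample["Illness"] == "Yes").groupby(sample["City"]).mean()
print(illness_share)
```

On the real data, the same line with `data` in place of `sample` would give one share per city, which could then be drawn as a bar chart.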
solving-the-assignment · 4 years ago
Assessment Week 3
I am continuing data management and analysis with my toy dataset. Here, I check how many people have the illness in the 3 cities with the most residents, looking separately at the Adulthood and Old Age groups.
________________________________________________________________
# checking how many adults in Dallas have the illness
c1 = data[(data['City'] == 'Dallas') & (data['Age'] == 'Adulthood')]
c1['Illness'].value_counts(sort = True)
output (these counts equal the whole-dataset totals from Week 2, because the original call ran on data rather than c1) -
No     137861
Yes     12139
_______________________________________________________________
# checking how many adults in New York City have the illness
c2 = data[(data['City'] == 'New York City') & (data['Age'] == 'Adulthood')]
c2['Illness'].value_counts(sort = True)
output -
No     41106
Yes     3554
________________________________________________________________
# checking how many adults in Los Angeles have the illness
c3 = data[(data['City'] == 'Los Angeles') & (data['Age'] == 'Adulthood')]
c3['Illness'].value_counts(sort = True)
output -
No     26242
Yes     2286
________________________________________________________________
# Now checking for Old Age people in Dallas
c4 = data[(data['City'] == 'Dallas') & (data['Age'] == 'Old Age')]
c4['Illness'].value_counts(sort = True)
output -
No     2033
Yes     201
________________________________________________________________
# Now checking for Old Age people in Los Angeles
c5 = data[(data['City'] == 'Los Angeles') & (data['Age'] == 'Old Age')]
c5['Illness'].value_counts(sort = True)
output -
No     3363
Yes     282
________________________________________________________________
# Now checking for Old Age people in New York City
c6 = data[(data['City'] == 'New York City') & (data['Age'] == 'Old Age')]
c6['Illness'].value_counts(sort = True)
output -
No     5180
Yes     467
________________________________________________________________
In each output, Yes is the number of people with the illness and No is the number of people without it, within the city and age group selected by the filter.
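The six separate filters above can also be collapsed into a single table. The following is a minimal sketch using `pd.crosstab`, assuming the same `City`, `Age` and `Illness` columns. The rows in `sample` are made up for illustration and are not taken from the dataset.

```python
import pandas as pd

# Made-up rows for illustration only; not taken from toy_dataset.csv.
sample = pd.DataFrame({
    "City": ["Dallas", "Dallas", "Dallas", "Los Angeles", "Los Angeles"],
    "Age": ["Adulthood", "Adulthood", "Old Age", "Adulthood", "Old Age"],
    "Illness": ["No", "Yes", "No", "No", "Yes"],
})

# Rows: every (City, Age group) pair; columns: counts of "No" and "Yes".
table = pd.crosstab([sample["City"], sample["Age"]], sample["Illness"])
print(table)
```

Each row of `table` corresponds to one of the filtered frames (c1 to c6), so every city and age group can be read side by side.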
solving-the-assignment · 4 years ago
Assessment Week 2
I have done univariate analysis on my dataset using the Python language.
Note - the code, output and comments for the analysis of the dataset are as follows:
________________________________________________________________
# importing dependencies
import pandas as pd
import numpy as np

# creating dataframe (the first CSV column, Number, becomes the index)
data = pd.read_csv('toy_dataset.csv', low_memory=False, index_col=0)
data.head(5)
output -
Number    City      Gender    Age    Income     Illness
1         Dallas    Male      41     40367.0    No
2         Dallas    Male      54     45084.0    No
3         Dallas    Male      42     52483.0    No
4         Dallas    Male      40     40941.0    No
5         Dallas    Male      46     50289.0    No
________________________________________________________________
# checking the number of rows and columns
data.shape
output -
(150000, 5)  # 150000 rows and 5 columns
________________________________________________________________
# running the frequency distribution
# univariate analysis
data['City'].value_counts(sort = True)
output -
New York City            50307
Los Angeles               32173
Dallas                         19707
Mountain View            14219
Austin                          12292
Boston                         8301
Washington D.C.         8120
San Diego                   4881
# calculating proportions (fractions of the total, not values out of 100)
data['City'].value_counts(sort = True, normalize = True)
output -
New York City     0.335380
Los Angeles        0.214487
Dallas                  0.131380
Mountain View     0.094793
Austin                   0.081947
Boston                  0.055340
Washington D.C.   0.054133
San Diego             0.032540
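As a quick check on what `normalize = True` does, the proportions can be recomputed by hand from the counts. This is a plain-Python sketch using the city counts printed above; the names `city_counts`, `total` and `city_shares` are only for illustration.

```python
# City counts copied from the value_counts output above.
city_counts = {
    "New York City": 50307,
    "Los Angeles": 32173,
    "Dallas": 19707,
    "Mountain View": 14219,
    "Austin": 12292,
    "Boston": 8301,
    "Washington D.C.": 8120,
    "San Diego": 4881,
}

total = sum(city_counts.values())  # number of rows in the dataset

# Each proportion is count / total, rounded to 6 places like the pandas output.
city_shares = {city: round(n / total, 6) for city, n in city_counts.items()}

for city, share in city_shares.items():
    print(f"{city:<16} {share:.6f}")
```

The counts add up to the 150000 rows reported by `data.shape`, and dividing each count by that total reproduces the normalized values.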
________________________________________________________________
# Gender column Univariate Analysis
data['Gender'].value_counts(sort = True)
output -
Male        83800
Female    66200
data['Gender'].value_counts(sort = True, normalize = True)
output -
Male        0.558667
Female    0.441333
________________________________________________________________
# Illness column Univariate Analysis
data['Illness'].value_counts(sort = True)
output -
No     137861
Yes     12139
data['Illness'].value_counts(sort = True, normalize = True)
output -
No     0.919073
Yes    0.080927
________________________________________________________________
# dividing the ages of the people into groups
def ageGroups(X):
    if 0 < X < 12:
        return 'Childhood'
    elif 11 < X < 21:
        return 'Adolescence'
    elif 19 < X < 61:
        return 'Adulthood'
    else:
        # ages 61 and above (and any value of 0 or below) land here
        return 'Old Age'

# replace the numeric Age column with the group labels
data['Age'] = data['Age'].apply(ageGroups)

# Age column Univariate Analysis
data['Age'].value_counts(sort = True)
output -
Adulthood    133052
Old Age       16948
data['Age'].value_counts(sort = True, normalize = True)
output -
Adulthood    0.887013
Old Age      0.112987
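The same grouping can be sketched with `pd.cut`, which assigns each age to a bin in one call. This is an alternative to the ageGroups function, not the approach used in the post, and the ages below are hypothetical values picked to sit on the bin edges. One difference to keep in mind: values of 0 or below become missing (NaN) here, whereas ageGroups labels them 'Old Age'.

```python
import pandas as pd

# A few hypothetical ages chosen to sit on the bin edges.
ages = pd.Series([5, 11, 12, 20, 21, 60, 61, 65])

# Right-closed bins: (0, 11], (11, 20], (20, 60], (60, inf]
age_group = pd.cut(
    ages,
    bins=[0, 11, 20, 60, float("inf")],
    labels=["Childhood", "Adolescence", "Adulthood", "Old Age"],
)
print(age_group.tolist())
```

For whole-number ages these bins give the same labels as ageGroups: 1 to 11 is Childhood, 12 to 20 is Adolescence, 21 to 60 is Adulthood, and 61 and above is Old Age.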
solving-the-assignment · 4 years ago
Assignment week 1
I am going to work with a dataset named Toy Data, which records whether people in the big cities of the US have an illness, along with several other parameters for each person.
This is the link to download the dataset : https://drive.google.com/file/d/1-in10C0BGvkKpWRrsg0zr_g96VCB_Ag7/view?usp=drivesdk
Variables:
The different columns of the dataset are:-
Number: A simple index number for each row
City: The location of a person (Dallas, New York City, Los Angeles, Mountain View, Boston, Washington D.C., San Diego and Austin)  [categorical data]
Gender: Gender of a person (Male or Female)  [categorical data]
Age: The age of a person (Ranging from 25 to 65 years) [quantitative data]
Income: Annual income of a person (Ranging from -674 to 177175)  [quantitative data]
Illness: Is the person Ill? (Yes or No)  [categorical data]
Research Question:
The research questions that I would like to evaluate through this dataset are:
1. In the big cities of the USA, which gender suffers more from illness?
2. Which age group suffers more from illness?
3. What is the range of income for people who are suffering from illness?
4. Which area of the USA is most affected by illness?
5. What is the percentage of illness among the genders in each mentioned city in the USA?
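As one example of how these questions map onto code, question 3 could be approached with a grouped summary of the Income column. This is only a sketch under the assumption that the `Illness` and `Income` columns are used as described above. The rows in `sample` are made up, so the numbers say nothing about the real dataset.

```python
import pandas as pd

# Made-up rows for illustration only; the real values live in toy_dataset.csv.
sample = pd.DataFrame({
    "Illness": ["No", "No", "Yes", "Yes", "No"],
    "Income": [40000.0, 90000.0, 35000.0, 60000.0, 55000.0],
})

# One row per Illness value, with the lowest, median and highest income.
income_range = sample.groupby("Illness")["Income"].agg(["min", "median", "max"])
print(income_range)
```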
Summary:
According to the dataset, most of the illness occurred among people with lower incomes, and manipulating the data led to the same interpretation. Over time, the number of illnesses all over the world has increased, which I attribute to weaker infrastructure and slow development in the non-major cities of the USA.
[Image]
This picture is taken from Wikipedia and supports the above-mentioned hypothesis.
The additional topic I will explore is the influence of proper diets on the illness rate.
Literature:
1) How Does Income affect life expectancy? (Blog)
2) Age group more prone to illness in US? (Blog)
3) Sex and gender differences in health - NCBI? (Blog)