solving-the-assignment - Tumblr blog

solving-the-assignment · 4 years ago


Assessment week 4

The code for each data visualization is given below:

1. For the first diagram.

# importing dependencies for visualization
import seaborn as sns
import matplotlib.pyplot as plt

# univariate analysis for the count of people suffering with illness
sns.countplot(x='Illness', data=data)
plt.xlabel('ILLNESS')
plt.title('Univariate analysis for the count of people suffering with illness')
plt.show()

2. For the second diagram.

# Now checking the people suffering with illness in the major cities of USA
c1 = data[data['Illness'] == 'Yes']

plt.figure(figsize=(16, 6))
sns.countplot(x='City', data=c1)
plt.xlabel('Cities of US suffering the illness')
plt.title('The people suffering with illness in the major cities of USA')
plt.show()

3. For the third diagram.

# Now checking the people not suffering with illness in the major cities of USA
c2 = data[data['Illness'] == 'No']

plt.figure(figsize=(16, 6))
sns.countplot(x='City', data=c2)
plt.xlabel('Cities of US not suffering with illness')
plt.title('Graph to show people not suffering with illness in the major cities of USA')
plt.show()

4. For the fourth diagram.

# bivariate graph representation
# note: 'Age' must still be numeric here, i.e. not yet recoded into the age groups from Week 2
plt.figure(figsize=(16, 6))
sns.regplot(x='Age', y='Income', data=data.head(200), fit_reg=False)
plt.xlabel('Ages of the people')
plt.ylabel('Income of the people')
plt.title('Scatterplot of Age Vs Income')
plt.show()
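Diagrams 2 and 3 above plot the ill and not-ill counts per city separately. As a minimal sketch, the same numbers can be read from one cross-tabulation. The tiny `sample` frame below is a hypothetical stand-in for `toy_dataset.csv` (its values are invented for illustration), and `city_by_illness` is just a name chosen here.

```python
import pandas as pd

# Hypothetical miniature stand-in for toy_dataset.csv (values invented for illustration).
sample = pd.DataFrame({
    "City": ["Dallas", "Dallas", "Boston", "Boston", "Boston", "Austin"],
    "Illness": ["No", "Yes", "No", "No", "Yes", "No"],
})

# One row per city, one column per Illness value:
# the "Yes" column is what diagram 2 plots, the "No" column is what diagram 3 plots.
city_by_illness = pd.crosstab(sample["City"], sample["Illness"])
print(city_by_illness)
```

With the real dataframe, `pd.crosstab(data['City'], data['Illness'])` should give the bar heights of both diagrams in one table.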


solving-the-assignment · 4 years ago


Assessment Week 3

I am continuing data management and analysis with my toy dataset. Here I check how many people have the illness in the top 3 cities by number of people (New York City, Los Angeles and Dallas), separately for the Adulthood and Old Age groups.

________________________________________________________________

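# checking how many adults in Dallas have the illness
c1 = data[(data['City'] == 'Dallas') & (data['Age'] == 'Adulthood')]
c1['Illness'].value_counts(sort=True)

output -

(The figures recorded for this step, No 137861 and Yes 12139, are the whole-dataset totals from Week 2, because the original call used data['Illness'] instead of c1['Illness']. From the other outputs, Dallas has 19707 - 2234 = 17473 adults in total; the Yes/No split for them was not recorded.)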

_______________________________________________________________

# checking how many adults in New York City have the illness
c2 = data[(data['City'] == 'New York City') & (data['Age'] == 'Adulthood')]
c2['Illness'].value_counts(sort=True)

output -

No 41106

Yes 3554

________________________________________________________________

# checking how many adults in Los Angeles have the illness
c3 = data[(data['City'] == 'Los Angeles') & (data['Age'] == 'Adulthood')]
c3['Illness'].value_counts(sort=True)

output -

No 26242

Yes 2286

________________________________________________________________

# Now checking for Old Age people in Dallas
c4 = data[(data['City'] == 'Dallas') & (data['Age'] == 'Old Age')]
c4['Illness'].value_counts(sort=True)

output -

No 2033

Yes 201

________________________________________________________________

# Now checking for Old Age people in Los Angeles
c5 = data[(data['City'] == 'Los Angeles') & (data['Age'] == 'Old Age')]
c5['Illness'].value_counts(sort=True)

output -

No 3363

Yes 282

________________________________________________________________

# Now checking for Old Age people in New York City
c6 = data[(data['City'] == 'New York City') & (data['Age'] == 'Old Age')]
c6['Illness'].value_counts(sort=True)

output -

No 5180

Yes 467

________________________________________________________________

In each output above, Yes is the number of people with the illness and No is the number of people without it, for the given city and age group. In every group, people without the illness far outnumber those with it.
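The six filters above can also be collapsed into one grouped count. This is only a sketch under stated assumptions: `sample` is a hypothetical miniature frame with invented rows (its `Age` column already holds the Week 2 age-group labels), and `top3`, `counts` and `ill_share` are names introduced here for illustration.

```python
import pandas as pd

# Hypothetical miniature stand-in for the dataset after the Week 2 age recoding.
sample = pd.DataFrame({
    "City": ["Dallas", "Dallas", "Dallas", "Los Angeles", "Los Angeles", "New York City"],
    "Age": ["Adulthood", "Adulthood", "Old Age", "Adulthood", "Old Age", "Old Age"],
    "Illness": ["No", "Yes", "No", "No", "Yes", "No"],
})

# Keep only the three cities examined in this post.
top3 = ["Dallas", "Los Angeles", "New York City"]
subset = sample[sample["City"].isin(top3)]

# Count rows per (City, Age, Illness), then spread Illness into No/Yes columns.
counts = subset.groupby(["City", "Age", "Illness"]).size().unstack(fill_value=0)

# Share of people with the illness in each city and age group.
counts["ill_share"] = counts["Yes"] / (counts["Yes"] + counts["No"])
print(counts)
```

Replacing `sample` with the real `data` should reproduce the six Yes/No pairs in a single table, with the share column making the groups easier to compare than raw counts.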


solving-the-assignment · 4 years ago


Assessment Week 2

I have done univariate analysis on my dataset using Python.

Note: the code, output and comments for the analysis of the dataset are given below:

________________________________________________________________

# importing dependencies
import pandas as pd
import numpy as np

# creating dataframe
data = pd.read_csv('toy_dataset.csv', low_memory=False, index_col=0)
data.head(5)

output -

Number  City    Gender  Age  Income   Illness
1       Dallas  Male    41   40367.0  No
2       Dallas  Male    54   45084.0  No
3       Dallas  Male    42   52483.0  No
4       Dallas  Male    40   40941.0  No
5       Dallas  Male    46   50289.0  No

________________________________________________________________

# checking the number of rows and columns
data.shape

output -

(150000, 5) # 150000 rows and 5 columns

________________________________________________________________

# running the frequency distribution
# univariate analysis
data['City'].value_counts(sort=True)

output -

New York City 50307

Los Angeles 32173

Dallas 19707

Mountain View 14219

Austin 12292

Boston 8301

Washington D.C. 8120

San Diego 4881

# calculating proportions (multiply by 100 for percentages)
data['City'].value_counts(sort=True, normalize=True)

output -

New York City 0.335380

Los Angeles 0.214487

Dallas 0.131380

Mountain View 0.094793

Austin 0.081947

Boston 0.055340

Washington D.C. 0.054133

San Diego 0.032540
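As a quick check of how `normalize=True` relates to the raw counts, each proportion is simply the count divided by the total number of rows. The sketch below rebuilds the City counts printed above as a Series (`city_counts`, `total` and `summary` are names introduced here) and puts counts and shares side by side.

```python
import pandas as pd

# City counts exactly as printed in the output above.
city_counts = pd.Series({
    "New York City": 50307,
    "Los Angeles": 32173,
    "Dallas": 19707,
    "Mountain View": 14219,
    "Austin": 12292,
    "Boston": 8301,
    "Washington D.C.": 8120,
    "San Diego": 4881,
})

# The counts add up to the number of rows reported by data.shape.
total = city_counts.sum()

# Proportion = count / total, rounded to six decimals like the printed output.
summary = pd.DataFrame({
    "count": city_counts,
    "share": (city_counts / total).round(6),
})
print(summary)
```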

________________________________________________________________

# Gender column univariate analysis
data['Gender'].value_counts(sort=True)

output -

Male 83800

Female 66200

data['Gender'].value_counts(sort=True, normalize=True)

output -

Male 0.558667

Female 0.441333

________________________________________________________________

# Illness column univariate analysis
data['Illness'].value_counts(sort=True)

output -

No 137861

Yes 12139

data['Illness'].value_counts(sort=True, normalize=True)

output -

No 0.919073

Yes 0.080927

________________________________________________________________

# dividing the ages into groups
def ageGroups(X):
    if 0 < X < 12:
        return 'Childhood'
    elif 11 < X < 21:
        return 'Adolescence'
    elif 21 <= X < 61:
        return 'Adulthood'
    else:
        return 'Old Age'

# note: this overwrites the numeric Age column with the group labels
data['Age'] = data['Age'].apply(ageGroups)

# Age column univariate analysis
data['Age'].value_counts(sort=True)

output -

Adulthood 133052

Old Age 16948

data['Age'].value_counts(sort=True, normalize=True)

output -

Adulthood 0.887013

Old Age 0.112987
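An alternative to the hand-written if/elif function is `pd.cut`, which bins a numeric column in one call. This is a sketch, not the method used above: the bin edges are my reading of the age boundaries for whole-number ages, `ages` is a hypothetical list of test values, and unlike the function above, an age of 0 or less would come out as missing rather than 'Old Age'. Writing the result to a separate variable (or a new column) also keeps the numeric ages available for later plots.

```python
import numpy as np
import pandas as pd

# Hypothetical whole-number ages chosen to sit on the group boundaries.
ages = pd.Series([5, 12, 20, 21, 60, 61])

# Right-closed bins: (0, 11], (11, 20], (20, 60], (60, inf)
bins = [0, 11, 20, 60, np.inf]
labels = ["Childhood", "Adolescence", "Adulthood", "Old Age"]

age_group = pd.cut(ages, bins=bins, labels=labels)
print(age_group.tolist())
```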


solving-the-assignment · 4 years ago


Assignment week 1

I am going to work with a dataset named Toy Data, which records whether people in the big cities of the US have an illness, along with several other parameters.

This is the link to download the dataset : https://drive.google.com/file/d/1-in10C0BGvkKpWRrsg0zr_g96VCB_Ag7/view?usp=drivesdk

Variables:

The columns of the dataset are:

Number: A simple index number for each row

City: The location of a person (Dallas, New York City, Los Angeles, Mountain View, Boston, Washington D.C., San Diego and Austin) [categorical data]

Gender: Gender of a person (Male or Female) [categorical data]

Age: The age of a person (Ranging from 25 to 65 years) [quantitative data]

Income: Annual income of a person (Ranging from -674 to 177175) [quantitative data]

Illness: Is the person Ill? (Yes or No) [categorical data]

Research Question:

The research questions that I would like to evaluate with this dataset are:

1. In the big cities of the USA, which gender suffers more from illness?

2. Which age group suffers more from illness?

3. What is the income range of people who are suffering from illness?

4. Which area of the USA is most affected by illness?

5. What is the percentage of illness among the genders in each listed city in the USA?
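Question 5 can be approached with a grouped mean of a Yes/No flag. The sketch below is only an illustration: `sample` is a hypothetical frame with invented rows that uses the same City, Gender and Illness column names as the dataset, and `ill_pct` is a name introduced here.

```python
import pandas as pd

# Hypothetical miniature stand-in for the dataset (values invented for illustration).
sample = pd.DataFrame({
    "City": ["Dallas", "Dallas", "Dallas", "Dallas", "Boston", "Boston"],
    "Gender": ["Male", "Male", "Female", "Female", "Male", "Female"],
    "Illness": ["Yes", "No", "No", "No", "No", "Yes"],
})

# True where the person is ill; the mean of a boolean flag is the share of True values.
ill_pct = (
    sample["Illness"].eq("Yes")
    .groupby([sample["City"], sample["Gender"]])
    .mean()
    .mul(100)      # convert the share to a percentage
    .unstack()     # cities as rows, genders as columns
)
print(ill_pct)
```

The same chain applied to the full dataframe should give one percentage per city and gender, which also helps with question 1.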

Summary:

According to the dataset, most illness occurred among people with lower incomes, and manipulating the data led to the same interpretation. Over time, the number of illnesses all over the world has increased because of poorer infrastructure and slow development in the non-major cities of the USA.

This picture, taken from Wikipedia, supports the above-mentioned hypothesis.

The additional topic I will explore is the influence of proper diets on the illness rate.

Literature:

1) How Does Income affect life expectancy? (Blog)

2) Age group more prone to illness in US? (Blog)

3) Sex and gender differences in health - NCBI? (Blog)
