houdabssr - Tumblr blog

houdabssr · 5 years ago

Fourth assignment

My hypothesis was: the more we consume alcohol, the more the number of divorce cases increases.

I used two variables: marital status (response variable) and Consumer (explanatory variable).

I began by examining the description of my data, then visualized it. I used this code:
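The original code is not reproduced here. A minimal sketch of what this step might look like, assuming the variable is stored in a pandas column named MARITAL with numeric codes 1 to 6 (both the column name and the tiny made-up dataset are assumptions for illustration):

```python
import pandas as pd

# Made-up rows for illustration only; the real dataset has 43093 rows.
# The column name MARITAL and the codes 1..6 are assumptions.
data = pd.DataFrame({"MARITAL": [1, 1, 1, 4, 5, 6, 2, 3]})

# Treat the numeric codes as categories so that describe() reports
# count, unique, top and freq instead of a mean and quartiles.
data["MARITAL"] = data["MARITAL"].astype("category")

desc = data["MARITAL"].describe()
print(desc)
```

Here `top` is the most frequent code and `freq` is how many times it occurs, which is how the summary below is read.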

This is the result:

Summary: The description shows 6 different values (6 unique). The top value is the 1st one, which means the 1st value (married) is the most common, with a frequency of 20769. So out of 43093 people, 20769 are married.

Both of my variables are categorical, so each can be visualized on its own with a univariate graph, that is, a single-variable bar chart.

Univariate graph for the 1st variable (marital status)

So I used this code to rename the values and draw the graph:
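The original code is not reproduced here. One way such a rename-and-plot step could be sketched, assuming a column named MARITAL coded 1 to 6 in the order listed in the second assignment (the column name, the label wording, and the sample rows are assumptions):

```python
import matplotlib
matplotlib.use("Agg")  # draw off-screen so the sketch runs without a display
import pandas as pd

# Made-up rows for illustration only.
data = pd.DataFrame({"MARITAL": [1, 1, 1, 4, 5, 6, 2, 3]})

# Hypothetical mapping from numeric codes to readable labels.
marital_labels = {
    1: "Married",
    2: "Living with someone",
    3: "Widowed",
    4: "Divorced",
    5: "Separated",
    6: "Never married",
}
data["MARITAL"] = data["MARITAL"].map(marital_labels).astype("category")

# Count each category and draw one bar per category.
counts = data["MARITAL"].value_counts()
ax = counts.plot(kind="bar")
ax.set_xlabel("Marital status")
ax.set_ylabel("Number of respondents")
ax.set_title("Distribution of marital status")
fig = ax.get_figure()  # call fig.savefig("marital.png") to keep the chart
```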

The result:

The result matches what I expected from the description above.

Now we’ll do the same thing for our 2nd variable (Consumer)

The code to describe the variable Consumer:

The result:

Summary: We have 43093 people in 3 categories, and the most common is the 1st one (current drinker), with a frequency of 26946.

Univariate graph for the 2nd variable (Consumer)

Let’s move on to the graph.

Summary: The most common category is current drinker, as we saw in the description. Ex-drinkers and abstainers are fewer than current drinkers and approximately equal to each other (about 8000 people each).

Bivariate graph for both variables

After that I tried to find a relationship between the two variables. I tried to collapse my response variable with these lines:

Unfortunately it didn't work, so I didn't collapse this variable. Instead I opted for a bar chart with 6 categories for marital status and 3 for Consumer, using this code:
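Neither piece of code is reproduced here. As a hedged sketch only, this is one way the collapse could be written and how a six-by-three bar chart might be drawn, assuming columns named MARITAL (codes 1 to 6, with 4 = divorced and 5 = separated) and CONSUMER (codes 1 to 3). The column names, the codes, the helper column DIVSEP, and the sample rows are all assumptions:

```python
import matplotlib
matplotlib.use("Agg")  # draw off-screen so the sketch runs without a display
import pandas as pd

# Made-up rows for illustration only.
data = pd.DataFrame({
    "MARITAL":  [1, 1, 4, 4, 5, 6, 2, 3, 4, 1],
    "CONSUMER": [1, 3, 1, 2, 1, 1, 2, 3, 1, 1],
})

# Collapse the six marital categories into a two-level response:
# True if divorced (4) or separated (5), False otherwise.
data["DIVSEP"] = data["MARITAL"].isin([4, 5])

# Proportion of divorced or separated people within each drinking status.
share = data.groupby("CONSUMER")["DIVSEP"].mean()
print(share)

# Uncollapsed alternative: counts for every marital x drinking combination,
# drawn as grouped bars (6 groups of 3 bars).
table = pd.crosstab(data["MARITAL"], data["CONSUMER"])
ax = table.plot(kind="bar")
ax.set_xlabel("Marital status code")
ax.set_ylabel("Number of respondents")
```

With a two-level response, the mean of the True/False column per group is directly the proportion of divorced or separated people, which is often easier to compare across drinking statuses than raw counts.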

#data visualization #data management #coursera

houdabssr · 5 years ago

Third assignment

Now it’s time for some data management. I chose not to add a secondary variable. However, I found missing data in one of my variables, S2AQ2 (DRANK AT LEAST 12 ALCOHOLIC DRINKS IN LAST 12 MONTHS): some people responded « unknown », so I replaced that value with no. As we can see, there are 32 such people.
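A minimal sketch of this replacement, assuming S2AQ2 is coded 1 for yes, 2 for no and 9 for unknown (the numeric codes and the sample rows are assumptions, not taken from the post):

```python
import pandas as pd

# Made-up rows for illustration only; codes 1 = yes, 2 = no, 9 = unknown
# are assumed.
data = pd.DataFrame({"S2AQ2": [1, 2, 9, 1, 9, 2]})

# Count the unknown answers before touching them.
n_unknown = int((data["S2AQ2"] == 9).sum())
print(n_unknown)

# Recode unknown (9) as no (2), leaving the other answers unchanged.
data["S2AQ2"] = data["S2AQ2"].replace(9, 2)
print(data["S2AQ2"].value_counts().sort_index())
```

Counting the unknowns first makes it easy to report how many answers were changed.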

Then I recoded all my variables so that my code is more readable, with these lines of code:

This is what the results look like:

Finally, I grouped my variables by cross-tabulating them in two tables (again using the recoding).

Table one: marital status (divorced or separated people) crossed with drinking status (Consumer)

We notice that the majority of people who are divorced or separated are current drinkers or ex-drinkers. On the other hand, a minority of people who have been divorced are abstainers.

Table two: marital status (divorced or separated people) crossed with whether they drank at least 12 alcoholic drinks in the last 12 months

The number of divorced people who drank at least 12 alcoholic drinks in the past 12 months and the number who did not are approximately equal.

The same holds for separated people.
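The two tables above could be produced along these lines. This is only a sketch: the column names MARITAL, CONSUMER and S2AQ2, the numeric codes, the labels, and the sample rows are assumptions for illustration.

```python
import pandas as pd

# Made-up rows for illustration only.
data = pd.DataFrame({
    "MARITAL":  [4, 4, 4, 5, 5, 1, 6, 4],
    "CONSUMER": [1, 2, 3, 1, 2, 1, 3, 1],
    "S2AQ2":    [1, 2, 2, 1, 2, 1, 2, 1],
})

# Keep only divorced (4) and separated (5) people, then recode to labels.
sub = data[data["MARITAL"].isin([4, 5])].copy()
sub["MARITAL"] = sub["MARITAL"].map({4: "Divorced", 5: "Separated"})
sub["CONSUMER"] = sub["CONSUMER"].map(
    {1: "Current drinker", 2: "Ex-drinker", 3: "Lifetime abstainer"}
)
sub["S2AQ2"] = sub["S2AQ2"].map({1: "Yes", 2: "No"})

# Table one: marital status by drinking status.
table_one = pd.crosstab(sub["MARITAL"], sub["CONSUMER"])
# Table two: marital status by "drank at least 12 drinks in last 12 months".
table_two = pd.crosstab(sub["MARITAL"], sub["S2AQ2"])
print(table_one)
print(table_two)
```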

#datavisualization #data management #coursera

houdabssr · 5 years ago

Second assignment

Introduction :

After choosing NESARC and stating my hypothesis in the first week, it's time to take action and start coding. I chose Python because it is a general-purpose language.

Main body:

To start, I imported my dataset and checked the number of individuals (rows) and variables (columns) in it.
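A minimal sketch of this step. The real data would be read from the NESARC CSV file; here a tiny in-memory CSV stands in for it so the example runs on its own (the column names and rows are made up):

```python
import io
import pandas as pd

# Stand-in for the real file: with the actual dataset one would pass the
# path of the CSV to pd.read_csv instead of this in-memory text.
csv_text = "ID,MARITAL,CONSUMER,S2AQ2\n1,1,1,1\n2,4,2,2\n3,6,3,9\n"
data = pd.read_csv(io.StringIO(csv_text), low_memory=False)

# shape is (number of rows, number of columns).
n_rows, n_cols = data.shape
print(n_rows)  # individuals
print(n_cols)  # variables
```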

Then I displayed the 3 variables that interest me: current marital status, drinking status, and whether the person drank at least 12 alcoholic drinks in the last 12 months.

For marital status, the person could be: married, living with someone, widowed, divorced, separated, or never married.

For drinking status, the consumer could be: current drinker, ex-drinker, or lifetime abstainer.

For the question on frequency of drinking (DRANK AT LEAST 12 ALCOHOLIC DRINKS IN LAST 12 MONTHS), the answers are: yes, no, or unknown.

The distributions of the variables and my code are in the image below:
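The image is not reproduced here. A frequency distribution of this kind might be computed as in the following sketch, which assumes a column named CONSUMER coded 1 to 3 (the column name and the sample rows are assumptions):

```python
import pandas as pd

# Made-up rows for illustration only.
data = pd.DataFrame({"CONSUMER": [1, 1, 1, 2, 3, 1, 2, 3]})

# Counts per category, ordered by code rather than by frequency.
counts = data["CONSUMER"].value_counts(sort=False).sort_index()
# Same distribution expressed as proportions of all rows.
percents = data["CONSUMER"].value_counts(sort=False, normalize=True).sort_index()

print(counts)
print(percents)
```

The same two calls can be repeated for the marital status and S2AQ2 columns.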

Then I asked myself the following question: “why not put the data in a new table (subNew in the code) that contains only the divorced or separated people who have had at least 12 alcoholic drinks in the last 12 months?” I think this question speaks more directly to the hypothesis I stated in the 1st week (the more alcohol we consume, the more the number of divorce cases increases). For this I used the following condition in my code:
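The condition itself is not reproduced here. A sketch of how such a subset might be built, keeping the name subNew from the post and assuming columns MARITAL (4 = divorced, 5 = separated) and S2AQ2 (1 = yes); the column names, codes and sample rows are assumptions:

```python
import pandas as pd

# Made-up rows for illustration only.
data = pd.DataFrame({
    "MARITAL":  [1, 4, 4, 5, 5, 6],
    "CONSUMER": [1, 1, 2, 1, 3, 1],
    "S2AQ2":    [1, 1, 2, 1, 2, 1],
})

# Boolean masks: one per part of the condition.
divorced_or_separated = (data["MARITAL"] == 4) | (data["MARITAL"] == 5)
drank_12_plus = data["S2AQ2"] == 1

# Keep rows that satisfy both; copy() avoids modifying a view of data.
subNew = data[divorced_or_separated & drank_12_plus].copy()
print(len(subNew))
print(subNew["CONSUMER"].value_counts())
```

Each comparison needs its own parentheses because `|` and `&` bind more tightly than `==` in Python.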

Finally, I displayed the results for this category of people with the same three variables defined above.

This is the code I wrote and the results:

Conclusion and remarks :

I notice that the number of separated or divorced people drops by about half (from 5401 to 2789 for divorced, and from 1445 to 712 for separated), but it remains considerable. More importantly, the only drinking status left in this subset is current drinker: everyone who is separated or divorced and has had at least 12 drinks in the last 12 months is a current drinker.

#assignment #datavisualization #coursera

houdabssr · 5 years ago

First assignment

The project: This project is my first work in the data management and visualization course. First, in the choices section I hesitated between NESARC and Add Health. Both codebooks interested me, and I ended up choosing NESARC.

After that, I went carefully through the codebook to see the different information it contains. I settled on my first topic, the consumption of alcohol. According to the codebook, people consume different types of alcohol (beer, wine, coolers, liquor ...). I extracted some variables from the codebook, such as the frequency of drinking.

Then I became interested in the relationship between alcohol consumption and divorce, so divorce became my second topic. I looked into the number of divorces in recent years and their causes.

Hypothesis: I suppose that the more alcohol we consume, the more the number of divorce cases increases.

I did some research and found that divorce leads to alcohol consumption (article 1 below), but the opposite is not always true. On the other hand, another study done on inhabitants affirms that alcohol consumption and divorce are correlated, each one increasing the other (article 2).

In my opinion, further and more detailed research is needed, along with models, to get clearer answers. That is what I intend to do in the rest of the course.

Article 1 : https://www.tandfonline.com/doi/abs/10.1300/J279v12n01_08

Article 2 : https://www.jsad.com/doi/abs/10.15288/jsa.1999.60.647

#datavisualization #datascience #firstproject
