pustolovka-blog · 4 years
K-means clustering using add_health dataset
Following the last week of the Machine Learning for Data Analysis course, I conducted a k-means cluster analysis using the add_health dataset. I focused on the examples from the course, looking at different variables that could have an impact on students’ GPA. Clustering variables included violent behavior, alcohol consumption, marijuana consumption, school connectedness, family connectedness, parental presence, depression, self-esteem, parental activity and alcohol problems. All variables were standardized to have a mean of 0 and a standard deviation of 1. The code is presented below.
Tumblr media Tumblr media
The data was then split into a training and a test dataset, using 70% for training and 30% for testing. From there, using Euclidean distance, I conducted a series of k-means cluster analyses specifying k=1 to 9 clusters. The variance in the clustering variables that was accounted for by the clusters (r-square) was plotted for each of the nine cluster solutions in an elbow curve to provide guidance for choosing the number of clusters to interpret. The code is below:
Tumblr media
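Since the original code survives only as an image, here is a minimal sketch of the steps described above: standardizing, splitting 70/30, and computing r-square for k=1 to 9. It runs on synthetic data, not on add_health, and the variable names and random seeds are my own assumptions. The r-square is computed as one minus the within-cluster sum of squares divided by the total sum of squares.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import scale

# Synthetic stand-in for the clustering variables: three groups in four dimensions.
rng = np.random.default_rng(0)
centers = np.array([[0, 0, 0, 0], [6, 6, 0, 0], [0, 6, 6, 6]], dtype=float)
X = np.vstack([c + rng.normal(size=(200, 4)) for c in centers])

# Standardize every variable to mean 0 and standard deviation 1.
X_std = scale(X)

# 70% of the rows for training, 30% for testing.
X_train, X_test = train_test_split(X_std, test_size=0.3, random_state=123)

# Fit k-means (Euclidean distance) for k = 1..9 and record the variance explained.
total_ss = ((X_train - X_train.mean(axis=0)) ** 2).sum()
r_square = []
for k in range(1, 10):
    model = KMeans(n_clusters=k, n_init=10, random_state=123).fit(X_train)
    r_square.append(1 - model.inertia_ / total_ss)

for k, r2 in zip(range(1, 10), r_square):
    print(k, round(r2, 3))
```

Plotting `r_square` against k gives the elbow curve; the bend is where adding another cluster stops improving the fit by much.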
As a result, I got a plot visualizing the potential number of clusters. The output is below:
Tumblr media
The elbow curve was inconclusive, suggesting that the 2, 3 or 9 cluster solutions could be interpreted. I opted for interpreting the 3 cluster solution using the following code:
Tumblr media
As a result I got a scatterplot of the first two canonical variables, shown below:
Tumblr media
Cluster three (turquoise) was quite distinct, having little overlap with the other two. However, there was significant variance within the cluster. Clusters one (yellow) and two (purple) showed little variance within the clusters, but there was some overlap between them. Based on the results, it is possible that two clusters would suffice for interpreting this data.
I finished my analysis by merging variables and interpreting the output using the following code: 
Tumblr media Tumblr media
As a conclusion, I could see that clusters 0 and 1 (in the table below) showed the greatest difference. In cluster 1 were adolescents who scored high on alcohol and marijuana consumption, violence, depression and deviant behaviour and had low self-esteem, low parental activity and low family connectedness. Cluster 1 represents adolescents encountering trouble and difficulties in their lives. On the other hand, in cluster 0, there were adolescents who scored low on alcohol and marijuana consumption, violence, deviant behavior and similar, while scoring high on self-esteem, parental activity, family connectedness etc. These were adolescents who were less troubled or not troubled in their lives. Cluster 2 included those who had average results on most of the variables.
Tumblr media
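A table of cluster means like the one above can be produced by attaching the cluster labels to the data and grouping by them. The sketch below uses two made-up standardized variables (the column names are borrowed from add_health, the values are synthetic) and two clusters, so it illustrates the idea and does not reproduce my analysis.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)

# Hypothetical standardized scores: one group with high alcohol use and low
# self-esteem, one group with the opposite pattern.
troubled_group = rng.normal(size=(100, 2)) + np.array([2.0, -2.0])
untroubled_group = rng.normal(size=(100, 2)) + np.array([-2.0, 2.0])
data = pd.DataFrame(
    np.vstack([troubled_group, untroubled_group]),
    columns=["ALCEVR1", "ESTEEM1"],
)

# Cluster, attach the labels, and average every variable within each cluster.
model = KMeans(n_clusters=2, n_init=10, random_state=123).fit(data)
data["cluster"] = model.labels_
cluster_means = data.groupby("cluster").mean()
print(cluster_means)
```

Reading across a row of `cluster_means` gives the profile of that cluster: positive values are above the sample average, negative values below it.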
In order to validate the results, I conducted an Analysis of Variance (ANOVA) to test for significant differences between the clusters on grade point average (GPA). A Tukey test was used for post hoc comparisons between the clusters. The code is below:
Tumblr media
As a result, I could see that there were significant differences between the clusters on GPA (picture below). The Tukey post hoc comparisons showed significant differences between clusters on GPA, with the exception that clusters 0 and 2 were not significantly different from each other. Adolescents in cluster 2 had the highest GPA (mean=2.99, sd=0.73), and cluster 1 had the lowest GPA (mean=2.43, sd=0.79).
Tumblr media Tumblr media
pustolovka-blog · 4 years
Week 3: Lasso Regression
Following week 3 of the course, on lasso regression, I completed the assignment using the add_health dataset. Unlike the analysis conducted in the lectures, I chose depression (DEP1), not school connectedness (SCHCONN1), as my target/response variable. The analysis conducted in this assignment shows which other variables are most strongly connected to depression.
I wrote the following code:
Tumblr media Tumblr media Tumblr media
As a result, I first got the regression coefficients for the predictors of my target variable DEP1. Three predictors were strongly connected to depression, several kept only small coefficients, and four were shrunk to exactly zero and thus removed from the model by lasso regression.
Strongly connected:
'ESTEEM1': -1.7652286145736722,
'SCHCONN1': -1.085865883493359,
'FAMCONCT': -0.76871421576888954
Weakly connected (small coefficients):
'ALCEVR1': 0.1536519510237751,
'COCEVER1': 0.135117961131035,
'NAMERICAN': 0.095304431061106198,
'GPA1': -0.090492604677195485,
'CIGAVAIL': 0.064324369645845383,
'PARPRES': -0.060793321061830156,
'ASIAN': 0.047412176135257091,
'MAREVER1': 0.0050799568064496927,
Removed (coefficient of zero):
'HISPANIC': 0.0,
'BLACK': 0.0,
'EXPEL1': 0.0,
'INHEVER1': 0.0,
My code for visualizing this shows the mentioned factors in the plot below. Self-esteem, school connectedness and family connectedness are all negatively associated with depression.
Tumblr media
Mean square error on each fold shows that the MSE levels off and becomes flat (stable) at 3, indicating that only 3 folds are required.
Tumblr media
Also looking at MSE, my analysis showed similar results for both training (29.56) and test datasets (30.94), as well as r-squared values (training 0.296, and test 0.324). The R-squares of .3 indicate moderate model fit for this LASSO regression. 
Tumblr media
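For readers who cannot see the code images, this is a rough sketch of the workflow: standardized predictors, a 70/30 split, a cross-validated lasso, and then the coefficients, MSE and R-squared. I use scikit-learn's `LassoLarsCV` with 10 folds here. The data is synthetic, with three real predictors and three pure-noise columns, so the names and numbers are placeholders and not my add_health results.

```python
import numpy as np
from sklearn.linear_model import LassoLarsCV
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import scale

rng = np.random.default_rng(2)
names = ["ESTEEM1", "SCHCONN1", "FAMCONCT", "NOISE1", "NOISE2", "NOISE3"]

# Standardized predictors; only the first three influence the made-up response.
X = scale(rng.normal(size=(600, len(names))))
y = -2.0 * X[:, 0] - 1.0 * X[:, 1] - 0.8 * X[:, 2] + rng.normal(size=600)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=123
)

# Lasso with the penalty strength chosen by 10-fold cross-validation.
model = LassoLarsCV(cv=10, precompute=False).fit(X_train, y_train)
coefficients = dict(zip(names, model.coef_))
for name, value in sorted(coefficients.items(), key=lambda item: abs(item[1]), reverse=True):
    print(name, round(value, 3))

# Compare fit on the training and test sets.
train_mse = mean_squared_error(y_train, model.predict(X_train))
test_mse = mean_squared_error(y_test, model.predict(X_test))
train_r2 = model.score(X_train, y_train)
test_r2 = model.score(X_test, y_test)
print(train_mse, test_mse, train_r2, test_r2)
```

Predictors whose coefficient is exactly 0 have been dropped by the lasso; similar training and test MSE values suggest the model is not overfitting.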
pustolovka-blog · 4 years
Machine Learning for Data Analysis - Random Forests
Progressing with the Machine Learning for Data Analysis course, this week I conducted a random forest analysis on the add_health dataset. Following the course instructions, after loading the dataset, I selected all the variables such as sex, race, ethnicity, alcohol consumption, marijuana consumption, GPA, relationship with parents etc. (see the full list in the code attached).
The code I wrote was the following:
Tumblr media Tumblr media
Similar to the results shown in the lectures, my output showed that the variable most important for predicting regular smoking was previous consumption of marijuana (0.132), followed by deviant behaviour (0.0772) and GPA (0.0720). On the other hand, the least important variables were the ones for Native American and Asian ethnicity.
Tumblr media
When checking accuracy, my analysis showed that the initial tree had an accuracy slightly higher than 82%, while after running random forests, the accuracy increased only slightly, to somewhat below 84%. This suggests that running one tree would have been sufficiently accurate.
Tumblr media
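The comparison above can be sketched as follows: fit one decision tree and one random forest on the same training data, compare test accuracy, and read the variable importances from the forest. The data is synthetic and the column names are only borrowed from add_health. I take the importances from `RandomForestClassifier` itself, which is an assumption about the approach and not necessarily what the course code did.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(3)
names = ["marever1", "DEVIANT1", "GPA1", "ASIAN"]

# Made-up predictors; the last column has no influence on the target.
X = rng.normal(size=(1500, 4))
signal = 1.5 * X[:, 0] + 0.7 * X[:, 1] - 0.7 * X[:, 2]
y = (signal + rng.normal(size=1500) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=123
)

# One tree versus a forest of 25 trees.
tree = DecisionTreeClassifier(random_state=123).fit(X_train, y_train)
forest = RandomForestClassifier(n_estimators=25, random_state=123).fit(X_train, y_train)

tree_accuracy = accuracy_score(y_test, tree.predict(X_test))
forest_accuracy = accuracy_score(y_test, forest.predict(X_test))
importances = dict(zip(names, forest.feature_importances_))

print(tree_accuracy, forest_accuracy)
print(importances)
```

The importances sum to 1, so each value can be read as that variable's share of the forest's total predictive contribution.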
pustolovka-blog · 5 years
Machine Learning for Data Analysis - Decision Trees
Following the Data Analysis and Interpretation Specialization, I started the course focusing on machine learning. While I had been working with the Gapminder dataset in the previous courses of this specialization, for this assignment I followed the suggested add health data set. Given that I was not as acquainted with the data, I primarily focused on the examples given in class for delivering this assignment.
The code I used was the following:
from pandas import Series, DataFrame
import pandas as pd
import numpy as np
import os
import matplotlib.pylab as plt
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import classification_report
import sklearn.metrics
data = pd.read_csv("/Users/noraborealis/anaconda3/tree_addhealth.csv")
""" Data Engineering and Analysis """
#Load the dataset
AH_data = pd.read_csv("tree_addhealth.csv")
data_clean = AH_data.dropna()
data_clean.dtypes
data_clean.describe()
""" Modeling and Prediction """
#Split into training and testing sets
predictors = data_clean[['BIO_SEX','HISPANIC','WHITE','BLACK','NAMERICAN','ASIAN', 'age','ALCEVR1','ALCPROBS1','marever1','cocever1','inhever1','cigavail','DEP1', 'ESTEEM1','VIOL1','PASSIST','DEVIANT1','SCHCONN1','GPA1','EXPEL1','FAMCONCT','PARACTV', 'PARPRES']]
targets = data_clean.TREG1
pred_train, pred_test, tar_train, tar_test = train_test_split(predictors, targets, test_size=.4)
pred_train.shape
pred_test.shape
tar_train.shape
tar_test.shape
#Build model on training data
classifier=DecisionTreeClassifier()
classifier=classifier.fit(pred_train,tar_train)
predictions=classifier.predict(pred_test)
sklearn.metrics.confusion_matrix(tar_test,predictions)
sklearn.metrics.accuracy_score(tar_test, predictions)
#Displaying the decision tree
from sklearn import tree
#from StringIO import StringIO
from io import StringIO
from IPython.display import Image
out = StringIO()
tree.export_graphviz(classifier, out_file=out)
import pydotplus
graph=pydotplus.graph_from_dot_data(out.getvalue())
Image(graph.create_png())
After many technical challenges with running Graphviz both on Mac and on Windows, I finally got the desired result - graphic visualization of my decision tree. The goal of the decision tree was to test nonlinear relationships between a binary categorical variable and many explanatory variables. The code written tested all possible separations and cut points, and the result was the following: 
Tumblr media
In order to account for a diverse array of factors contributing to smoking experimentation, variables such as age, race, gender, use of marijuana, alcohol, inhalants, violence, self-esteem, parental socio-economic status (receiving social aid), and several others (see the code) were used. Since the tree was not pruned, it is very complex, containing many leaves, and difficult to understand, and thus not very useful for further understanding the data.
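One way to get a more readable tree is to limit how deep it can grow. The sketch below is not part of my assignment: it uses synthetic data and the `max_depth` parameter of scikit-learn's `DecisionTreeClassifier`, and simply compares the number of leaves with and without the limit.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(4)

# Made-up data: five predictors, a noisy binary target driven by the first one.
X = rng.normal(size=(800, 5))
y = (X[:, 0] + rng.normal(size=800) > 0).astype(int)

# An unrestricted tree keeps splitting until every leaf is pure.
full_tree = DecisionTreeClassifier(random_state=123).fit(X, y)

# Capping the depth at 3 allows at most 2**3 = 8 leaves.
small_tree = DecisionTreeClassifier(max_depth=3, random_state=123).fit(X, y)

print("leaves without limit:", full_tree.get_n_leaves())
print("leaves with max_depth=3:", small_tree.get_n_leaves())
```

Newer scikit-learn releases also offer cost-complexity pruning through the `ccp_alpha` parameter, which removes branches after the tree has been grown.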
pustolovka-blog · 5 years
Gapminder Dataset - Testing Logistic Regression Model
When I first started using the GapMinder dataset, I was curious to understand whether countries with greater income per person also consume more electricity. After doing multiple regression analysis, I understood that there were potential confounding factors such as urban rate and polity score. I decided to test these using a logistic regression model.
Since my response variable was quantitative, I first had to change it into a categorical one, where 0 indicated low electricity consumption and 1 indicated high electricity consumption. The variable was binned into two categories at the median value in order to fulfill the assignment. However, I do not feel that these categories show the complexity of the energy consumption distribution.
Tumblr media
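The median split can be sketched in a few lines of pandas. The six values below are invented for illustration; only the column name `relectricperperson` comes from the Gapminder data.

```python
import pandas as pd

# Invented consumption values (kWh per person), not actual Gapminder rows.
electricity = pd.Series(
    [120.0, 450.0, 800.0, 1500.0, 3200.0, 9000.0], name="relectricperperson"
)

# 1 = above the median (high consumption), 0 = at or below it (low consumption).
median = electricity.median()
high_consumption = (electricity > median).astype(int)

print(median)
print(high_consumption.tolist())
```

A split like this throws away how far above or below the median a country is, which is the loss of detail mentioned above.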
Then I moved on to conducting logistic regression analysis. Here is my code:
Tumblr media
My output
Tumblr media Tumblr media
Two out of three of my explanatory variables were found to have a significant positive influence on electric consumption:
urban rate (Beta=0.049, P=0.022)
income per person (Beta=0.0006, P<0.001)
Polity score did not show a significant influence on electric consumption (Beta=-0.039, P=0.418).
When it comes to odds ratios, my output is consistent with the previous results. The odds ratio for polity score is close to 1 (0.96), in line with polity score not having a significant influence on electric consumption. The odds ratio for income per person is also close to 1 (1.0006), but this reflects the small unit of measurement (one unit of income per person) rather than a lack of significance, since its p-value is very low. The odds ratio for urban rate is slightly above 1 (1.051), indicating that countries with a greater urbanization rate have greater odds of high electric consumption.
Given the low scores it is difficult to draw any specific conclusions from this type of model. As indicated above, given the type of quantitative data used, I do not find the logistic regression model the most fitting for drawing conclusions.
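An odds ratio is the exponential of the logistic regression coefficient, so the values can be checked by hand. The coefficients below are the rounded Betas reported in this post (which is why urban rate comes out as 1.050, not the 1.051 in my output); the dictionary keys are just labels, and the "per 1000" figure is an extra illustration I added to show how the unit of measurement changes the picture.

```python
import math

# Rounded coefficients (Betas) as reported in the post.
coefficients = {
    "urbanrate": 0.049,
    "incomeperperson": 0.0006,
    "polityscore": -0.039,
}

# Odds ratio for a one-unit increase in each explanatory variable.
odds_ratios = {name: math.exp(beta) for name, beta in coefficients.items()}
for name, ratio in odds_ratios.items():
    print(name, round(ratio, 4))

# The same income coefficient expressed per 1000 units of income.
odds_ratio_per_1000 = math.exp(1000 * coefficients["incomeperperson"])
print(round(odds_ratio_per_1000, 2))
```

An odds ratio of 1.0006 per single unit of income looks negligible, but over a difference of 1000 units it multiplies the odds of high consumption by about 1.8.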
pustolovka-blog · 5 years
Working on the Gapminder dataset, for the third week of the course I conducted polynomial regression analysis using the rate of electricity consumption and income.
My conclusion after having conducted multiple regression models is that the hypothesis that income and energy consumption are correlated is correct. In addition, I have learned that the correlation is not linear, and that while low income countries have low electricity consumption, energy consumption grows with income, with the exception of some low income countries that have very high energy consumption. This can be attributed to emerging economies which still have low income per capita, but an increase in population and demand for electricity.
In the beginning, I created both first and second order polynomials and displayed them on a scatterplot in order to see whether a linear or a curved model better fits the data.
Tumblr media Tumblr media
The scatterplot showed me that a curved line catches the nonlinear nature of the association better.
Tumblr media
In order to adapt my model to the results, I first centered the variables I was testing and then ran simple, quadratic and cubic regression analysis to see how the model changes.
Tumblr media
My output was the following:
Simple regression analysis
Tumblr media
Looking at the results, I could see that the p value was less than 0.05 and that the parameter estimate of 4.7 was showing a positive correlation between income and electric consumption. Furthermore, R-squared indicated that the model was capturing 42% of the variability.
Quadratic regression analysis
Tumblr media
After introducing the quadratic term of the electricity consumption variable, the model improved. Both p values were lower than 0.05 and the parameter estimate showed that the curve began at a lower point, went up and then went down again, just as the scatterplot showed. The R-squared value also increased, indicating that the model was capturing 63% of the variability.
The warning displayed in the model is expected given that variable electric consumption squared is of course correlated with the variable electric consumption. Both variables are kept in the model in order to account for the curved line. 
Cubic regression analysis
Tumblr media
In order to check for further nonlinear aspects of the model, I also ran a cubic regression analysis. While the p value and the parameter estimate showed significant correlation, the R-squared value decreased compared to the model with quadratic regression. 
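To make the comparison of the three models concrete, here is a small sketch that centers the explanatory variable and fits first, second and third order polynomials with NumPy. The data is synthetic with a built-in curve, and `x` and `y` are generic names. One thing worth knowing when reading such output: plain R-squared can never go down when a term is added to the same model, so a decrease can only show up in the adjusted R-squared, which penalizes extra terms.

```python
import numpy as np

rng = np.random.default_rng(5)

# Synthetic curved relationship between an explanatory and a response variable.
x = rng.uniform(0, 10, size=150)
y = 2.0 + 3.0 * x - 0.25 * x**2 + rng.normal(size=150)

# Center the explanatory variable so that its mean is 0.
x_centered = x - x.mean()


def r_squared(degree):
    """Return (R-squared, adjusted R-squared) for a polynomial fit."""
    coefs = np.polyfit(x_centered, y, degree)
    residuals = y - np.polyval(coefs, x_centered)
    r2 = 1 - residuals.var() / y.var()
    n = len(y)
    adjusted = 1 - (1 - r2) * (n - 1) / (n - degree - 1)
    return r2, adjusted


results = {degree: r_squared(degree) for degree in (1, 2, 3)}
for degree, (r2, adjusted) in results.items():
    print(degree, round(r2, 3), round(adjusted, 3))
```

If the cubic term adds almost nothing to R-squared, the quadratic model is the simpler choice.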
In order to test the multiple regression model, I first added another variable - urban rate and then conducted a qqplot test.
Tumblr media
After inserting a new variable, the regression analysis output I received was the following:
Tumblr media
I could see that the p values of electric consumption and electric consumption squared stayed significant even after adding a new variable. Furthermore, I could see that the intercept value for income variable was significant, and that the R-squared value was high.
The results of the q-q plot test were the following:
Tumblr media
The q-q test showed that most residuals followed a straight line, with the exception of some at the very top and bottom of the line. This shows that other factors could be attributed to the variability, not just income and urban rate.
After the q-q plot test, I tested standardized residuals:
Tumblr media
The output I got was the following:
Tumblr media
Based on the results, I could see that 95% of the countries fall within two standard deviations of the mean. However, one country appears to be an extreme outlier, falling beyond 3 standard deviations.
Finally, I conducted the leverage plot test:
Tumblr media
The output I got was the following:
Tumblr media
The leverage plot shows that there are several outliers that fall outside 2 standard deviations. However, the plot also shows that their leverage is very low. The one observation that does have high leverage (201) is not an outlier, which makes the model quite sound.
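The two diagnostics used above, standardized residuals and leverage, can also be computed by hand, which helps in understanding what the plots show. This is a sketch on synthetic data using only NumPy; the variable names are placeholders, and it is not the statsmodels code I ran for the assignment.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 200

# Synthetic response with two explanatory variables plus an intercept.
income = rng.normal(size=n)
urban = rng.normal(size=n)
electricity = 1.0 + 2.0 * income + 0.5 * urban + rng.normal(size=n)

# Ordinary least squares fit.
X = np.column_stack([np.ones(n), income, urban])
beta, *_ = np.linalg.lstsq(X, electricity, rcond=None)
residuals = electricity - X @ beta

# Leverage is the diagonal of the hat matrix X (X'X)^-1 X'.
hat = X @ np.linalg.inv(X.T @ X) @ X.T
leverage = np.diag(hat)

# Standardized residuals: residual divided by its estimated standard deviation.
p = X.shape[1]
sigma = np.sqrt(residuals @ residuals / (n - p))
standardized = residuals / (sigma * np.sqrt(1 - leverage))

share_within_2 = np.mean(np.abs(standardized) < 2)
extreme = np.where(np.abs(standardized) > 3)[0]
print(share_within_2, extreme, leverage.max())
```

An observation is a concern mainly when it combines a large standardized residual with high leverage; either one alone has less influence on the fitted line.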
pustolovka-blog · 5 years
Linear Regression Analysis using GapMinder dataset
Since the first course, my interest was in understanding the association between income level (explanatory variable) and energy consumption (response variable) in different countries. GapMinder dataset uses solely quantitative data which means that in this assignment I did the following:
1) ran code to find the mean for my explanatory variable “incomeperperson”
Tumblr media
The mean for the explanatory variable before centering was 8740.97:
Tumblr media
2) centred the mean to 0 (or close to 0)
Tumblr media
3) ran code to test that the mean was centered to 0 and got the following output:
Tumblr media
4) ran linear regression analysis for the variables of income and electricity consumption:
Tumblr media
5) my output was the following:
Tumblr media
The conclusion of this linear regression analysis is that there is a significant association, given that the F-statistic is 94.47 and the associated p-value is very low (4.63e-17). The intercept is 3391.2, and the slope coefficient is 4.7. The analysis was done on 130 observations and the dependent variable is income level. This confirms that there is a positive association between income level and electric consumption.
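The steps listed above can be sketched with SciPy's `linregress` on synthetic data. The numbers are invented and only the procedure matches: compute the mean, subtract it, check that the new mean is 0, and run the regression. A useful side effect of centering is that the intercept becomes the mean of the response variable.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

# Invented data for 130 countries: income as explanatory, electricity as response.
income = rng.uniform(500, 40000, size=130)
electricity = 100 + 0.1 * income + rng.normal(scale=300, size=130)

# Center the explanatory variable and confirm that its mean is (close to) 0.
income_centered = income - income.mean()
print(income.mean(), income_centered.mean())

# Simple linear regression of the response on the centered explanatory variable.
fit = stats.linregress(income_centered, electricity)
print(fit.slope, fit.intercept, fit.rvalue**2, fit.pvalue)
```

Centering changes only the intercept; the slope, the p-value and R-squared stay the same as with the uncentered variable.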
pustolovka-blog · 5 years
Understanding the association between income level and electricity consumption
Sample
The sample used for understanding the association between income level and electricity consumption comes from the GapMinder dataset. Promoting sustainable global development, GapMinder collects information on various factors related to the living conditions of a society in all 192 UN member states and an additional 24 areas. The data relates to factors such as HIV rate, gross domestic product, unemployment rates etc.
The population studied are individual countries (192) plus the additional 24 areas (such as West Bank and Gaza). Given the complexity of the variables (15 in total), and the diverse sources of information, the dataset does not contain information on each of the factors for each of the countries studied.
Procedure
The original purpose of data collection was the promotion of sustainable development. Data was gathered from the reporting of different reliable agencies and sources: United Nations Statistics Division, World Bank, Institute for Health Metrics and Evaluation and US Census Bureau’s International Database. Data was collected between 2002 and 2011. The collection of data was done by the specific agencies - alcohol consumption: WHO, female employment rate: ILO, GDP per capita: World Bank, etc.
Measures/Variables
In order to understand whether there was an association between income level and electricity consumption, I looked at the GDP per capita factor in constant 2000US$ and residential electricity consumption per person in kWh. In order to better understand the data, I excluded those countries which did not have information on one or the other factor, and grouped the income variable into: poverty, low, middle and high income countries. I also managed the electricity consumption variable by creating categories of very low, low, middle and high electricity consumption. I tested the variables both as categorical and numerical using different analysis tools, tested for confounding variables, and always got a positive association between income and electricity consumption - higher income countries have greater electricity consumption.
pustolovka-blog · 5 years
Testing a moderator using the correlation coefficient in the Gapminder dataset
In order to understand whether the correlation between income level and electric consumption in different countries is related to an external third factor, I used the Pearson correlation coefficient and included the variable showing the level of democracy (polityscore). This variable is expressed on a scale from -10 to 10, so I divided countries into two categories: low democracy and high democracy. From there I conducted my analysis.
#testing for moderation
def polity(row):
    if row["polityscore"] <= 0:
        return 1
    elif row["polityscore"] <= 10:
        return 2
mydata_clean["polity"] = mydata_clean.apply(lambda row: polity(row), axis=1)
chk1 = mydata_clean["polity"].value_counts(sort=False, dropna=False)
low = mydata_clean[(mydata_clean["polity"]==1)]
high = mydata_clean[(mydata_clean["polity"]==2)]
print("association between income level and electric consumption for low democracy countries")
print(scipy.stats.pearsonr(low["incomeperperson"], low["relectricperperson"]))
print("association between income level and electric consumption for high democracy countries")
print(scipy.stats.pearsonr(high["incomeperperson"], high["relectricperperson"]))
My results show a positive correlation in both categories, with the one in high democracy countries being stronger. The p-values for both categories are very low, so both correlations are statistically significant.
association between income level and electric consumption for low democracy countries
(0.6448264326053613, 5.105940761759414e-05)
association between income level and electric consumption for high democracy countries
(0.8466751098198433, 7.709054663738328e-26)
This is also visible on the scatterplot:
Tumblr media Tumblr media
pustolovka-blog · 5 years
Pearson correlation for the Gapminder dataset
Continuing the work on the Gapminder dataset, I was interested in understanding whether there was a correlation between electric consumption and income per person, as well as electric consumption and employment rate. For this purpose I conducted the Pearson correlation test:
mydata_clean = mydata.dropna()
print("association between electric consumption and income")
print(scipy.stats.pearsonr(mydata_clean["incomeperperson"], mydata_clean["relectricperperson"]))
print("association between electric consumption and employment")
print(scipy.stats.pearsonr(mydata_clean["employrate"], mydata_clean["relectricperperson"]))
In the output I could see a positive and strong correlation between income levels and electric consumption, and a weak correlation between employment rate and electric consumption. 
association between electric consumption and income
(0.6536076842568537, 4.5973586072831085e-17)
association between electric consumption and employment
(0.1437964201546844, 0.10399697594348797)
In conclusion, higher income is associated with greater electricity consumption. The r coefficient is 0.65, and the p-value is extremely low (4.5973586072831085e-17). If we square the r coefficient (0.65² ≈ 0.42), we find that income accounts for about 42% of the variability in electric consumption.
pustolovka-blog · 5 years
Chi-Square test of independence
Working with the Gapminder dataset, for the purpose of this assignment I worked with previously created categories of income (poverty, low, medium, high), and electric consumption (very low, low, medium, high).
As the course instructions imply, running a 4x4 chi-square test is usually not done in practice. I did it here for the purpose of the assignment.
Tumblr media
When doing the initial chi-square test, the very low p value of 0.0015 indicated that there was reason to reject the null hypothesis (H0) that there was no connection between income and electric consumption. In other words, it indicated that there was a connection between the two variables.
Tumblr media Tumblr media
In order to understand which of the categories of electric consumption and income were connected, I conducted the post hoc Bonferroni test on each of the categories of the explanatory variable (electric consumption).
Tumblr media Tumblr media Tumblr media
As a result, I could confirm that there was a significant connection between the category of poverty and very low electric consumption, as well as between middle and high income and high electric consumption.
Tumblr media Tumblr media Tumblr media Tumblr media Tumblr media Tumblr media
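Because the code is only visible as images, here is an outline of the procedure on synthetic data: one overall chi-square test on the 4x4 table, then pairwise tests with a Bonferroni-adjusted significance level. In this sketch income is the variable whose categories are compared pairwise, which gives 6 comparisons and an adjusted level of 0.05/6. The category codes 0 to 3 and the column names are my own placeholders.

```python
from itertools import combinations

import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency

rng = np.random.default_rng(8)

# Synthetic categories 0..3; electricity category tends to follow income category.
income_cat = rng.integers(0, 4, size=400)
elec_cat = np.clip(income_cat + rng.integers(-1, 2, size=400), 0, 3)
data = pd.DataFrame({"income": income_cat, "electricity": elec_cat})

# Overall 4x4 test of independence.
table = pd.crosstab(data["electricity"], data["income"])
chi2, p_value, dof, expected = chi2_contingency(table)
print(chi2, p_value, dof)

# Post hoc: compare every pair of income categories at a Bonferroni-adjusted level.
pairs = list(combinations(range(4), 2))
adjusted_alpha = 0.05 / len(pairs)
post_hoc = {}
for a, b in pairs:
    subset = data[data["income"].isin([a, b])]
    sub_table = pd.crosstab(subset["electricity"], subset["income"])
    _, pair_p, _, _ = chi2_contingency(sub_table)
    post_hoc[(a, b)] = pair_p

for pair, pair_p in post_hoc.items():
    print(pair, pair_p, pair_p < adjusted_alpha)
```

Only the pairs whose p-value falls below the adjusted level count as significantly different.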
pustolovka-blog · 5 years
Analysis of variance on the Gapminder dataset
Continuing the work on the Gapminder dataset, I was interested in understanding whether electric consumption was higher in countries with higher income. During the previous course, I had grouped countries by income in 4 categories (poverty, low income, middle income and high income countries) which I used as the categorical variable. 
My null hypothesis was therefore that there is no difference between the mean values of electric consumption in the groups. The alternate hypothesis was that the means differ.
Tumblr media Tumblr media
Using the OLS method, I got a clear result: the p value was 1.46e-14 (0.0000000000000146), far below 0.05. I could therefore reject the null hypothesis and accept the alternate hypothesis. This meant that there was an association between income and electric consumption.
As I had more than 2 categories, I also did a post hoc test to determine in which of the categories there was statistical significance.
Tumblr media
As the results show, the null hypothesis can be rejected for the comparisons of poverty with low, middle and high income, as well as for the comparison of low and middle income. It cannot be rejected for the comparison of low income and high income, or for middle income and high income.
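For illustration, a one-way ANOVA across four income groups can be run with SciPy's `f_oneway`. I used the statsmodels OLS output in my assignment, so this is a swapped-in, shorter route to the same F test, and the group values are synthetic.

```python
import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(9)

# Synthetic electricity consumption for four income groups of 40 countries each.
groups = {
    "poverty": rng.normal(loc=200, scale=150, size=40),
    "low": rng.normal(loc=900, scale=400, size=40),
    "middle": rng.normal(loc=2500, scale=800, size=40),
    "high": rng.normal(loc=3000, scale=900, size=40),
}

# One-way ANOVA: are the group means equal?
f_statistic, p_value = f_oneway(*groups.values())
reject_null = p_value < 0.05
print(f_statistic, p_value, reject_null)
```

A p-value below 0.05 means the null hypothesis of equal means is rejected; it does not say which groups differ, which is why a post hoc test is needed.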
pustolovka-blog · 5 years
Visualizing data
For assignment 4, I have created univariate graphs, and bivariate graphs for the Gapminder variables of income per person, electric  consumption rate, and employment rate that I selected in the beginning.
Tumblr media
The univariate graph for income per person shows a right-skewed distribution, indicating that most countries have very low income, and that few have medium, high or very high income per person. Furthermore, we see a unimodal distribution where the majority of countries are in the category with the lowest income.
Tumblr media
For the variable of residential electric consumption per person we see a similar tendency as with income. Most countries have low electric consumption and the graph is skewed right, with the number of countries decreasing as electric consumption rises.
Tumblr media
With employment rate variable, the distribution is somewhat different. We see a bimodal distribution, with the majority of values centered around the lowest and somewhat high employment rate.  
When it comes to my research question, I was interested to know whether countries with greater income and greater employment rate, also have greater electrical consumption habits. Changing my variables into quantitative, I did a scatterplot for my bivariate graphs.
Tumblr media
Using the variables of income and electric consumption, I can see that there is a positive correlation indicating that countries with lower income use less electricity, whereas countries with higher income use more electricity. This supports my hypothesis.
Tumblr media
For the correlation of employment rate and electric consumption I cannot see any tendencies. Employment rate is distributed somewhat equally and does not correlate with electric consumption habits. 
pustolovka-blog · 6 years
Managing data in the Gapminder dataset
As it turns out, struggling to finish the assignment in week 2 resulted in me doing the work for week 3 in advance. Anyhow, given that I had already applied some of the data management options, I used this week to refine the details.
My primary decision regarding data management was grouping or binning, as it best suited the dataset. Firstly, I selected my own range for binning each of the variables - incomeperperson, electricalconsumption and employment rate. Doing this gave me more credible data when using the value_counts function.
income binning:
0-5000
5001-10000
10001-15000
15001-20000
Tumblr media Tumblr media
electric consumption binning
0-50
51-100
101-500
501-1000
Tumblr media Tumblr media
employment binning
20-40
40-60
60-80
80-100
Tumblr media Tumblr media
Based on these categories/bins I managed to do frequency distribution and extrapolate count and percentage of these categories.
Tumblr media
In conclusion, my data shows that income inequality globally is rather high. More than half of the world’s countries (115, meaning 61%) fall in the lowest income category, while only 7 (about 3%) fall in the highest income category. When it comes to electric consumption, there are more countries in the highest consumption categories than in the lower ones. In total 33% of the countries are in the upper electricity consumption groups. Here, it is important to note that more than half of that data is missing (55% are nan). Finally, regarding employment, there are no countries in the lowest employment category, and 52% of the countries are in the middle category regarding employment.
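The binning with my own ranges can be sketched with `pandas.cut` and explicit bin edges. The income values here are invented, including one missing value and one value above the last edge, to show how both end up uncounted; the edges and labels are the income bins listed above.

```python
import numpy as np
import pandas as pd

# Invented incomes, including a missing value and one above the top bin edge.
income = pd.Series(
    [500, 1200, 4800, 7000, 12000, 18000, np.nan, 30000], name="incomeperperson"
)

# Explicit bin edges and matching labels for the four income ranges.
edges = [0, 5000, 10000, 15000, 20000]
labels = ["0-5000", "5001-10000", "10001-15000", "15001-20000"]
binned = pd.cut(income, bins=edges, labels=labels)

# Frequency distribution as counts and as proportions of the non-missing values.
counts = binned.value_counts(sort=False)
percentages = binned.value_counts(sort=False, normalize=True)
n_missing = int(binned.isna().sum())

print(counts)
print(percentages)
print("not assigned to any bin:", n_missing)
```

Values outside the chosen edges are silently treated as missing, so it is worth checking `n_missing` before interpreting the percentages.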
pustolovka-blog · 6 years
Analyzing the Gapminder dataset in Python
Working with the Gapminder dataset imposed several challenges that the course did not provide solutions to. Given that this dataset does not give specific values that can be counted, I had to dig deeper and find a suitable solution. 
After struggling a lot with identifying quantifiable variables in this dataset, I found the solution in grouping certain values (income, electric consumption and employment rate) into 4 categories. I used the binning option or, as it is defined in the program, the pandas.cut function to create meaningful categories I could work with. Below, I explain the way I ran my program, and the results I got.
Tumblr media
I started with the regular functions as explained in the course.
Tumblr media
Then I moved on to creating categories, or labels as written in the program. Using the coerce option when converting the variables to numeric, I turned unanswered fields into missing values, and with the pandas.cut function I created 4 labels for each of the variables I looked into.
Tumblr media
Only after doing this could I use the value_counts function for doing frequency distribution with my data. Following the course instructions, and adding the bins=4 option, I successfully got both the count and percentage for each of my variables.
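A minimal sketch of these two steps, assuming blank fields are stored as a single space: `pandas.to_numeric` with `errors="coerce"` turns them into missing values, and `pandas.cut` with `bins=4` splits the observed range into four equally wide categories. The five raw values and the label names are invented for illustration.

```python
import pandas as pd

# Invented raw column as read from a CSV: numbers stored as text, blanks as " ".
raw = pd.Series(["100", " ", "2500", "40000", " "], name="incomeperperson")

# Convert to numbers; anything that cannot be parsed becomes NaN.
numeric = pd.to_numeric(raw, errors="coerce")

# Four equal-width bins over the observed range, with one label per bin.
labels = ["poverty", "low", "middle", "high"]
binned = pd.cut(numeric, bins=4, labels=labels)

counts = binned.value_counts(sort=False, dropna=False)
print(numeric)
print(counts)
```

Equal-width bins on a strongly skewed variable put most observations into the lowest category, because a few very high values stretch the range.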
RESULTS:
Tumblr media Tumblr media
1. Results for the variable incomeperperson when using pandas.cut function.
Tumblr media Tumblr media
2.  Results for the variable electric consumption per person also using pandas.cut function
Tumblr media Tumblr media
3. Results for the employment rate variable using pandas.cut function.
Finally, once I managed to separate these variables and categories, I did the frequency distribution and got following results.
Tumblr media
What my results currently show is that the majority of the countries in the dataset fall below the median: 169+18, meaning 79% of countries in the category poverty and 8% in the category low income. The same applies to electric consumption and employment. As this does not reflect reality, I will adapt the parameters as the course advances.
As I would still like to keep my research question open, I did not select rows, as that would eliminate certain countries. I will definitely look into removing certain countries from the analyses at a later point.
pustolovka-blog · 6 years
Does greater income result in higher residential electricity consumption?
Both my educational background and my current job are intrinsically linked to sustainable development, the challenges we face and possibilities we have to improve the societies we live in. For these reasons, I have chosen to look at the Gapminder dataset.
I am interested in understanding people’s consumption habits, especially their energy consumption. Furthermore, I am interested in knowing whether higher income per person leads to greater energy consumption. It is my hypothesis that people in developed countries consume more energy which leads to greater extraction of resources and environmental damage. Therefore, based on the available data in the data set, I will look at the variables of income per person and residential electricity consumption and see whether there is a correlation. It is my hypothesis that countries with higher income per person ratio will have greater residential electricity consumption per person.
Research shows that development and growth are directly linked to greater energy consumption (Brown et al. 2011, Yalcintas and Kaya 2017). Furthermore, there are critical questions of whether current trends of population and economic growth can be sustained by extracting the resources necessary to provide energy to modern-day societies (Brown et al. 2011). And while energy and energy consumption can relate to many different areas, most energy, whether deriving from traditional or renewable sources, is transformed into electricity (Liu et al. 2016). Providing the necessary energy for the use of different household appliances, electricity is directly linked to the use of refrigerators, air conditioning devices, stoves, ovens and many other devices in countries around the world (Bouznit et al. 2018, Yalcintas and Kaya 2017, Liu et al. 2016). Based on this literature, there is a clear correlation between countries’ development, represented in the income per person variable, and residential energy consumption.
In this light, using the two variables (income per person and residential electricity consumption), I wish to understand the electricity consumption practices of Serbian citizens in relation to other, more and less developed countries in the data set.