swaraj101-blog
Regression Modeling in Practice
swaraj101-blog · 6 years ago
Writing About Your Data
Sample:
The sample contains 9,358 instances of hourly averaged responses from an array of five metal oxide chemical sensors embedded in an Air Quality Chemical Multisensor Device. The device was deployed in the field, at road level, in a significantly polluted area of an Italian city. Data were recorded from March 2004 to February 2005 (one year), making this the longest freely available record of responses from air quality chemical sensor devices deployed in the field. Missing values are tagged with the value -200. The sample contains a total of 15 attributes.
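Because missing values are coded as -200 rather than left blank, they need to be converted before computing any statistics; otherwise they drag means and regression fits toward -200. A minimal pandas sketch, using a few invented rows and hypothetical column names (`CO_true`, `NOx_true`, `T`) rather than the real file:

```python
import numpy as np
import pandas as pd

# Invented rows for illustration; column names are hypothetical, not the dataset's.
df = pd.DataFrame({
    "CO_true": [2.6, -200.0, 2.2],
    "NOx_true": [166.0, 103.0, -200.0],
    "T": [13.6, 13.3, 11.9],
})

# Replace the -200 sentinel with NaN so pandas/statsmodels treat it as missing
clean = df.replace(-200, np.nan)

# Count missing values per column after the conversion
missing_per_column = clean.isna().sum()
print(missing_per_column)
```

When reading the raw file, passing `na_values=[-200]` to `pd.read_csv` can achieve a similar effect at load time, depending on how the values are formatted in the file.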
Procedure:
This sample contains the responses of a gas multisensor device deployed in the field in an Italian city. Hourly averages of the sensor responses are recorded along with reference gas concentrations from a certified analyzer.
Ground-truth hourly averaged concentrations for CO, Non-Methanic Hydrocarbons (NMHC), Benzene, Total Nitrogen Oxides (NOx) and Nitrogen Dioxide (NO2) were provided by a co-located certified reference analyzer. Evidence of cross-sensitivities, as well as both concept drift and sensor drift, is present, as described in De Vito et al., Sensors and Actuators B, Vol. 129, No. 2, 2008; these effects may ultimately degrade the sensors' ability to estimate concentrations.
Variables:
Of the 15 attributes, five are the true hourly averaged concentrations measured by the certified reference analyzer:
CO in mg/m^3
Overall Non-Methanic Hydrocarbons (NMHC) concentration in microg/m^3
Benzene concentration in microg/m^3 
NOx concentration in ppb (parts per billion)
NO2 concentration in microg/m^3 
Hourly averaged responses from an array of 5 metal oxide chemical sensors embedded in an Air Quality Chemical Multisensor Device:
Hourly averaged sensor response (nominally CO targeted)
Hourly averaged sensor response (nominally NMHC targeted)
Hourly averaged sensor response (nominally NOx targeted)
Hourly averaged sensor response (nominally NO2 targeted)
Hourly averaged sensor response (nominally O3 targeted)
Measures associated with the climate of the city:
Temperature in °C
Relative Humidity (%)
Absolute Humidity (AH)
Two more attributes identify each record:
Date
Time
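Since the Date and Time attributes together identify each hourly record, it can be convenient to merge them into a single timestamp, for example to check that consecutive records are one hour apart. A small standard-library sketch; the day/month/year and hour:minute:second string formats are assumptions, not taken from the dataset description, and `to_timestamp` is a hypothetical helper:

```python
from datetime import datetime, timedelta

# Assumed string formats; check the actual file before relying on them.
DATE_FMT = "%d/%m/%Y"
TIME_FMT = "%H:%M:%S"


def to_timestamp(date_str: str, time_str: str) -> datetime:
    """Combine separate Date and Time fields into one datetime."""
    return datetime.strptime(f"{date_str} {time_str}", f"{DATE_FMT} {TIME_FMT}")


# Two hypothetical consecutive records
rows = [("10/03/2004", "18:00:00"), ("10/03/2004", "19:00:00")]
stamps = [to_timestamp(d, t) for d, t in rows]

# Hourly averaged data should be spaced exactly one hour apart
gap = stamps[1] - stamps[0]
is_hourly = gap == timedelta(hours=1)
print(stamps, is_hourly)
```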
Basic Linear Regression Analysis
Centering:
I have a quantitative explanatory variable, horsepower (hp), and I centered it by subtracting its mean, so that the centered variable has a mean of 0 (or very close to 0).
Mean before centering = 104.469
Mean after centering = 1.4E-13 ≈ 0 (the tiny non-zero value is floating-point rounding error)
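Centering only shifts the predictor, so it leaves the slope unchanged and moves the intercept to the predicted response at the mean of x, which in simple linear regression equals the mean of y. A pure-Python sketch of the closed-form least-squares fit illustrates this; the horsepower and mpg values are made up (not the Auto-MPG file), and `ols_simple` is a hypothetical helper:

```python
from statistics import mean


def ols_simple(x, y):
    """Closed-form least-squares fit of y = b0 + b1 * x; returns (b0, b1)."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Toy data, invented for illustration
hp = [130.0, 165.0, 150.0, 95.0, 88.0, 46.0]
mpg = [18.0, 15.0, 16.0, 24.0, 27.0, 26.0]

# Center the predictor by subtracting its mean
hp_c = [h - mean(hp) for h in hp]

b0_raw, b1_raw = ols_simple(hp, mpg)
b0_c, b1_c = ols_simple(hp_c, mpg)

print("mean of centered hp:", mean(hp_c))          # essentially 0
print("slopes (raw, centered):", b1_raw, b1_c)      # identical up to rounding
print("centered intercept vs mean mpg:", b0_c, mean(mpg))
```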
Regression Model:
The model tests the relationship between horsepower (hp) and miles per gallon (mpg).
The F-statistic is 599.7 and its p-value is 7.03e-81, far below our alpha level of 0.05. We can therefore reject the null hypothesis and conclude that horsepower is significantly associated with an automobile's miles per gallon.
The R-squared value, the proportion of the variance in the response variable explained by the explanatory variable, is 0.606. This model therefore accounts for about 61% of the variability in the response variable, mpg.
The coefficient for hp is -0.1578 and the intercept is 23.4459, which gives the regression equation mpg = 23.4459 - 0.1578 * hp, where hp is the centered horsepower (raw horsepower minus 104.469). Because hp is centered, the intercept is the predicted mpg for a car of average horsepower, and each additional unit of horsepower is associated with a decrease of about 0.158 mpg.
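With only one explanatory variable, the overall F-test and the t-test on the slope test the same null hypothesis, and F is simply t squared. A quick check using the numbers from the regression output in this post:

```python
# Values copied from the statsmodels output for this model
t_hp = -24.489        # t statistic for the hp coefficient
f_reported = 599.7    # reported F-statistic
p_value = 7.03e-81    # reported Prob (F-statistic)
alpha = 0.05

# For a single predictor, F equals the square of the slope's t statistic
f_from_t = t_hp ** 2

# Decision rule: reject H0 (slope = 0) when p < alpha
reject_null = p_value < alpha
print(round(f_from_t, 1), reject_null)
```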
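For a model with one predictor, R-squared can also be recovered from the F-statistic and the residual degrees of freedom: F = (R^2 / 1) / ((1 - R^2) / df_resid) rearranges to R^2 = F / (F + df_resid). The sketch below applies this, and the usual adjusted R-squared formula, to the reported F = 599.7, n = 392 and df_resid = 390:

```python
# Reported values from the regression output
f_stat = 599.7
n_obs = 392
n_predictors = 1
df_resid = n_obs - n_predictors - 1   # 390

# R^2 from F for a single-predictor model
r_squared = f_stat / (f_stat + df_resid)

# Adjusted R^2 penalizes for the number of predictors
adj_r_squared = 1 - (1 - r_squared) * (n_obs - 1) / df_resid

print(round(r_squared, 3), round(adj_r_squared, 3))
print(f"variance explained: {r_squared:.1%}")
```

Both values match the 0.606 and 0.605 printed by statsmodels.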
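Because the model uses centered horsepower, a raw horsepower value has to be centered with the same mean (104.469) before it goes into the equation. The sketch below does this and also converts the fit back to an uncentered intercept; `predict_mpg` is a hypothetical helper and the 150 hp input is an arbitrary example:

```python
HP_MEAN = 104.469     # mean horsepower before centering (from this post)
INTERCEPT = 23.4459   # fitted intercept with centered hp
SLOPE = -0.1578       # fitted coefficient for hp


def predict_mpg(hp: float) -> float:
    """Predicted mpg for a raw horsepower value, using the centered model."""
    return INTERCEPT + SLOPE * (hp - HP_MEAN)


# Equivalent uncentered equation: mpg = raw_intercept + SLOPE * hp
raw_intercept = INTERCEPT - SLOPE * HP_MEAN

print(round(predict_mpg(150), 2))   # 16.26
print(round(raw_intercept, 2))      # 39.93
```

The uncentered intercept of about 39.93 is approximately what a fit on raw hp would report; small differences come from the slope being rounded to four decimals.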
Python Program:
# Created on Tue, Jan 29 2019
# author: Swaraj Mohapatra
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# bug fix for display formats to avoid run time errors
pd.set_option('display.float_format', lambda x: '%.2f' % x)

# call in dataset
data = pd.read_excel("Auto-MPG.xlsx")

# BASIC LINEAR REGRESSION
# convert hp to numeric; any non-numeric entries become NaN
data['hp'] = pd.to_numeric(data['hp'], errors='coerce')

# center hp by subtracting its mean, as described above
data['hp'] = data['hp'] - data['hp'].mean()

# fit mpg on (centered) hp; rows with missing hp are dropped
reg1 = smf.ols(formula='mpg ~ hp', data=data).fit()
print(reg1.summary())
Output:
                            OLS Regression Results
==============================================================================
Dep. Variable:                    mpg   R-squared:                       0.606
Model:                            OLS   Adj. R-squared:                  0.605
Method:                 Least Squares   F-statistic:                     599.7
Date:                Wed, 30 Jan 2019   Prob (F-statistic):           7.03e-81
Time:                        02:06:24   Log-Likelihood:                -1178.7
No. Observations:                 392   AIC:                             2361.
Df Residuals:                     390   BIC:                             2369.
Df Model:                           1
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
Intercept     23.4459      0.248     94.625      0.000        22.959    23.933
hp            -0.1578      0.006    -24.489      0.000        -0.171    -0.145
==============================================================================
Omnibus:                       16.432   Durbin-Watson:                   1.071
Prob(Omnibus):                  0.000   Jarque-Bera (JB):               17.305
Skew:                           0.492   Prob(JB):                     0.000175
Kurtosis:                       3.299   Cond. No.                         38.4
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
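The 95% confidence interval in the output is the coefficient plus or minus a t critical value times its standard error. A sketch for the Intercept row, assuming t ≈ 1.966 for 390 residual degrees of freedom (hard-coded here rather than looked up from a t table):

```python
# Reported values for the Intercept row
coef = 23.4459
std_err = 0.248

# Assumed two-sided 95% critical value of t with 390 df (approximately 1.966)
t_crit = 1.966

margin = t_crit * std_err
lower = coef - margin
upper = coef + margin
print(round(lower, 3), round(upper, 3))
```

This gives roughly [22.958, 23.933]; the small gap from the reported 22.959 comes from the standard error being printed to only three decimals.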