zaworm
Data Management and Visualization
Course Blog
zaworm · 5 years ago
Practice Peer-graded Assignment: Milestone Assignment 1: Title and Introduction to the Research Question
1) Title:  Net national income per capita and its association with key predictors
2) I have chosen the World Bank capstone data set. The question I would like to answer is: are there key predictors that influence net national income per capita, and can other countries use these indicators to make adjustments that increase their income?
3) The motivation for answering this question is to see if there are achievable goals that poorer countries can set to increase their income and ultimately help their citizens and reduce poverty.
4) The potential implication of answering this question is a benefit to poorer citizens around the world. Poorer countries may be able to identify what increases their nation's income and which activities are wasteful and do not contribute to income per capita. The goal would be to reduce poverty in capable countries, or to help countries increase their income.
zaworm · 5 years ago
Machine Learning for Data Analysis Week 4 Running a k-means Cluster Analysis
1.Background
I have decided to look at Mars craters and ask the following question:
A. Is the depth dependent on the diameter of the crater?
B. Is the depth dependent on the latitude? Are craters closer to the poles shallower?
2.Notes about the Results
* see section 3 for full code and section 4 for code output
The dataset was limited to craters that had a diameter of 100 km or less and a crater depth that was greater than 0 km.  
Shallow craters were defined as craters with a depth of 180 m (0.18 km) or less. This was the binary response variable.
Craters close to the poles were defined as those ±40 poleward (i.e., craters that were not close to the poles had latitudes of -60 to +60 degrees). The latitude variable was coded 0 if a crater was close to a pole and 1 if it was not. This was a binary explanatory variable (IS_NOT_NEAR_POLES = 1).
Wide-diameter craters were defined as craters with a diameter larger than 5 km (IS_WIDE_CRATER = 1).
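The three binary flags defined above can be derived from the raw columns roughly as follows. This is a minimal sketch and not the code from the screenshots: the function name `classify_crater` and the key `IS_SHALLOW` are hypothetical, the boundaries are assumed to be inclusive, and the latitude cut-off uses the -60 to +60 degree range given in the parenthesis.

```python
def classify_crater(depth_km, diameter_km, latitude_deg):
    """Return the binary flags described in the text for a single crater."""
    return {
        # shallow: depth of 0.18 km (180 m) or less
        "IS_SHALLOW": int(depth_km <= 0.18),
        # wide: diameter larger than 5 km
        "IS_WIDE_CRATER": int(diameter_km > 5),
        # not near the poles: latitude between -60 and +60 degrees (assumed inclusive)
        "IS_NOT_NEAR_POLES": int(-60 <= latitude_deg <= 60),
    }


# Two hypothetical craters (illustrative values, not rows from the dataset)
polar_crater = classify_crater(depth_km=0.10, diameter_km=82.1, latitude_deg=84.4)
equatorial_crater = classify_crater(depth_km=0.50, diameter_km=3.2, latitude_deg=-12.2)
print(polar_crater)
print(equatorial_crater)
```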
A k-means cluster analysis was conducted to identify underlying subgroups of craters based on their similarity on 5 variables that represent crater characteristics: IS_WIDE_CRATER, IS_NOT_NEAR_POLES, LONGITUDE_CIRCLE_IMAGE, DIAM_CIRCLE_IMAGE and LATITUDE_CIRCLE_IMAGE. All clustering variables were standardized to have a mean of 0 and a standard deviation of 1.
Data were randomly split into a training set that included 70% of the observations and a test set that included 30% of the observations. A series of k-means cluster analyses were conducted on the training data specifying k=1-9 clusters, using Euclidean distance. The variance in the clustering variables that was accounted for by the clusters (r-square) was plotted for each of the nine cluster solutions in an elbow curve to provide guidance for choosing the number of clusters to interpret.
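The procedure described above (standardize, run k-means for several values of k, record the r-square for the elbow curve) can be sketched in plain Python as below. This is an illustration under stated assumptions, not the code from the screenshots: the helper names `standardize`, `kmeans` and `r_square` are hypothetical, the data are six made-up points, and r-square is taken to mean one minus the ratio of within-cluster to total sum of squares.

```python
import math
import random


def standardize(column):
    """Rescale a list of numbers to mean 0 and standard deviation 1."""
    mean = sum(column) / len(column)
    sd = math.sqrt(sum((x - mean) ** 2 for x in column) / len(column))
    return [(x - mean) / sd for x in column]


def kmeans(points, k, n_iter=25, seed=0):
    """Plain Lloyd's algorithm with Euclidean distance; returns (centers, labels)."""
    centers = random.Random(seed).sample(points, k)

    def nearest(p):
        return min(range(k), key=lambda j: math.dist(p, centers[j]))

    for _ in range(n_iter):
        labels = [nearest(p) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster ends up empty
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centers, [nearest(p) for p in points]


def r_square(points, centers, labels):
    """Share of the total sum of squares accounted for by the clusters."""
    grand_mean = tuple(sum(c) / len(points) for c in zip(*points))
    total = sum(math.dist(p, grand_mean) ** 2 for p in points)
    within = sum(math.dist(p, centers[lab]) ** 2 for p, lab in zip(points, labels))
    return 1 - within / total


# Made-up diameters (km) and latitudes (degrees), purely for illustration
diameters = [1.0, 1.2, 0.9, 8.0, 8.3, 7.9]
latitudes = [10.0, 12.0, 11.0, 70.0, 72.0, 69.0]
points = list(zip(standardize(diameters), standardize(latitudes)))

r2_by_k = {}
for k in range(1, 4):
    centers, labels = kmeans(points, k)
    r2_by_k[k] = r_square(points, centers, labels)
    print(k, round(r2_by_k[k], 3))
```

Plotting `r2_by_k` against k gives the elbow curve; with a single cluster the r-square is zero by construction, since the only center is the overall mean.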
Tumblr media
Figure 1. Elbow curve of r-square values for the nine cluster solutions
The elbow curve was inconclusive, suggesting that the 2-, 3- and 5-cluster solutions might be interpreted. The results below are for an interpretation of the 3-cluster solution.
Canonical discriminant analysis was used to reduce the 5 clustering variables down to a few variables that accounted for most of the variance in the clustering variables. A scatterplot of the first two canonical variables by cluster (Figure 2, shown below) shows the observations in each cluster.
Tumblr media
Figure 2. Plot of the first two canonical variables for the clustering variables by cluster.
In order to externally validate the clusters, an Analysis of Variance (ANOVA) was conducted to test for significant differences between the clusters on crater depth. A Tukey test was used for post hoc comparisons between the clusters. Results indicated significant differences between the clusters on crater depth:
means for DEPTH_RIMFLOOR_TOPOG by cluster
         DEPTH_RIMFLOOR_TOPOG
cluster                      
0                    0.233786
1                    0.247493
2                    0.508011
standard deviations for DEPTH_RIMFLOOR_TOPOG by cluster
         DEPTH_RIMFLOOR_TOPOG
cluster                      
0                    0.164277
1                    0.335329
2                    0.411583
Multiple Comparison of Means - Tukey HSD, FWER=0.05
==================================================
group1 group2 meandiff p-adj  lower  upper  reject
--------------------------------------------------
     0      1   0.0137 0.0239 0.0014  0.026   True
     0      2   0.2742  0.001 0.2672 0.2813   True
     1      2   0.2605  0.001 0.2485 0.2725   True
--------------------------------------------------
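As a quick arithmetic check, the `meandiff` column of the Tukey table is simply the difference between the cluster means of crater depth reported above. The sketch below recomputes it from those means; the variable names are hypothetical and the numbers are copied from the output.

```python
from itertools import combinations

# Mean crater depth (km) per cluster, copied from the output above
cluster_means = {0: 0.233786, 1: 0.247493, 2: 0.508011}

# Difference group2 - group1 for every pair of clusters, rounded like the table
mean_diffs = {
    (a, b): round(cluster_means[b] - cluster_means[a], 4)
    for a, b in combinations(sorted(cluster_means), 2)
}
for pair, diff in mean_diffs.items():
    print(pair, diff)
```

Cluster 2 is the deep-crater cluster: its mean depth is roughly twice that of clusters 0 and 1, while clusters 0 and 1 differ by only about 0.014 km.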
3.Raw Python Code:
The raw Python code is shown in the photos below.  I decided to use screenshots for easier readability as it includes syntax highlighting.
Tumblr media Tumblr media Tumblr media Tumblr media Tumblr media
4.Code Output:
Below is the raw output from the code.  The interpretation and comments on the results are shown above in Section 2.
Mars Crater Study
Week 4 - Running a k-Means Cluster Analysis
-------------------------------------------
IS_WIDE_CRATER              int64
IS_NOT_NEAR_POLES           int64
LONGITUDE_CIRCLE_IMAGE    float64
DIAM_CIRCLE_IMAGE         float64
LATITUDE_CIRCLE_IMAGE     float64
dtype: object
-------------------------------
       IS_WIDE_CRATER  ...  LATITUDE_CIRCLE_IMAGE
count    76520.000000  ...           76520.000000
mean         0.569603  ...              -9.997340
std          0.495135  ...              33.599395
min          0.000000  ...             -86.700000
25%          0.000000  ...             -34.660500
50%          1.000000  ...             -12.159000
75%          1.000000  ...              14.828000
max          1.000000  ...              85.702000
[8 rows x 5 columns]
-------------------------------
   IS_WIDE_CRATER  IS_NOT_NEAR_POLES  ...  DIAM_CIRCLE_IMAGE  LATITUDE_CIRCLE_IMAGE
0               1                  0  ...              82.10                 84.367
1               1                  0  ...              82.02                 72.760
2               1                  0  ...              79.63                 69.244
3               1                  0  ...              74.81                 70.107
4               1                  0  ...              73.53                 77.996
[5 rows x 5 columns]
-------------------------------
train test split
Clustering variable means by cluster
                 index  ...  LATITUDE_CIRCLE_IMAGE
cluster                 ...                       
0        180746.841806  ...               0.152403
1        258779.073548  ...              -0.608429
2        197361.717082  ...              -0.009083
[3 rows x 6 columns]
                             OLS Regression Results                             
================================================================================
Dep. Variable:     DEPTH_RIMFLOOR_TOPOG   R-squared:                       0.146
Model:                              OLS   Adj. R-squared:                  0.146
Method:                   Least Squares   F-statistic:                     4572.
Date:                  Wed, 26 Aug 2020   Prob (F-statistic):               0.00
Time:                          21:45:54   Log-Likelihood:                -16413.
No. Observations:                 53564   AIC:                         3.283e+04
Df Residuals:                     53561   BIC:                         3.286e+04
Df Model:                             2                                         
Covariance Type:              nonrobust                                         
===================================================================================
                      coef    std err          t      P>|t|      [0.025      0.975]
-----------------------------------------------------------------------------------
Intercept           0.2338      0.002    103.365      0.000       0.229       0.238
C(cluster)[T.1]     0.0137      0.005      2.620      0.009       0.003       0.024
C(cluster)[T.2]     0.2742      0.003     91.239      0.000       0.268       0.280
==============================================================================
Omnibus:                    20663.101   Durbin-Watson:                   2.008
Prob(Omnibus):                  0.000   Jarque-Bera (JB):           113044.488
Skew:                           1.784   Prob(JB):                         0.00
Kurtosis:                       9.158   Cond. No.                         4.47
==============================================================================
Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
means for DEPTH_RIMFLOOR_TOPOG by cluster
         DEPTH_RIMFLOOR_TOPOG
cluster                      
0                    0.233786
1                    0.247493
2                    0.508011
standard deviations for DEPTH_RIMFLOOR_TOPOG by cluster
         DEPTH_RIMFLOOR_TOPOG
cluster                      
0                    0.164277
1                    0.335329
2                    0.411583
Multiple Comparison of Means - Tukey HSD, FWER=0.05
==================================================
group1 group2 meandiff p-adj  lower  upper  reject
--------------------------------------------------
     0      1   0.0137 0.0239 0.0014  0.026   True
     0      2   0.2742  0.001 0.2672 0.2813   True
     1      2   0.2605  0.001 0.2485 0.2725   True
--------------------------------------------------
End
<Figure size 432x288 with 0 Axes>
zaworm · 5 years ago
Machine Learning for Data Analysis Week 3 Running a Lasso Regression Analysis
1.Background
I have decided to look at Mars craters and ask the following question:
A. Is the depth dependent on the diameter of the crater?
B. Is the depth dependent on the latitude? Are craters closer to the poles shallower?
2.Notes about the Results
* see section 3 for full code and section 4 for code output
The dataset was limited to craters that had a diameter of 100 km or less and a crater depth that was greater than 0 km.  
Shallow craters were defined as craters with a depth of 180 m (0.18 km) or less. This was the binary response variable.
Craters close to the poles were defined as those ±40 poleward (i.e., craters that were not close to the poles had latitudes of -60 to +60 degrees). The latitude variable was coded 0 if a crater was close to a pole and 1 if it was not. This was a binary explanatory variable (IS_NOT_NEAR_POLES = 1).
Wide-diameter craters were defined as craters with a diameter larger than 5 km (IS_WIDE_CRATER = 1).
A lasso regression analysis was run to identify which of a series of explanatory variables best predict a quantitative response variable, which was the crater depth.
Data were randomly split into a training set that included 70% of the observations and a test set that included 30% of the observations.
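A 70/30 random split like the one described above can be sketched with the standard library alone. The function name `train_test_split_indices` and the seed are hypothetical; the actual code in the screenshots may well have used a library helper instead.

```python
import random


def train_test_split_indices(n_rows, train_fraction=0.7, seed=123):
    """Shuffle row indices and cut them into a training and a test part."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # seeded so the split is reproducible
    n_train = round(n_rows * train_fraction)
    return indices[:n_train], indices[n_train:]


# 76520 is the number of craters in the subset used in this post
train_idx, test_idx = train_test_split_indices(76520)
print(len(train_idx), len(test_idx))
```

Keeping the indices rather than the rows makes it easy to apply the same split to the predictors and to the response.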
The predictor variables were:
'IS_WIDE_CRATER','IS_NOT_NEAR_POLES', 'LONGITUDE_CIRCLE_IMAGE', 'DIAM_CIRCLE_IMAGE', 'LATITUDE_CIRCLE_IMAGE'
The coefficients in the dictionary show that all of the variables were retained by the model, as none of them is zero:
{'IS_WIDE_CRATER': 0.05552232138486251, 'IS_NOT_NEAR_POLES': 0.04471569339534235, 'LONGITUDE_CIRCLE_IMAGE': 0.00309458641422248, 'DIAM_CIRCLE_IMAGE': 0.1601521016998867, 'LATITUDE_CIRCLE_IMAGE': -0.00022362030166929382}
The largest coefficient belongs to the diameter of the crater, indicating that it is the strongest predictor of the response variable.
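To make the comparison explicit, the coefficients can be ranked by absolute value. The sketch below uses the dictionary exactly as printed in the output; the variable names `lasso_coefficients` and `ranked` are hypothetical.

```python
# Coefficients copied from the model output
lasso_coefficients = {
    "IS_WIDE_CRATER": 0.05552232138486251,
    "IS_NOT_NEAR_POLES": 0.04471569339534235,
    "LONGITUDE_CIRCLE_IMAGE": 0.00309458641422248,
    "DIAM_CIRCLE_IMAGE": 0.1601521016998867,
    "LATITUDE_CIRCLE_IMAGE": -0.00022362030166929382,
}

# Predictors the lasso shrank all the way to zero would show up here
dropped = [name for name, coef in lasso_coefficients.items() if coef == 0]

# Rank the remaining predictors by the size of their coefficient
ranked = sorted(lasso_coefficients.items(), key=lambda item: abs(item[1]), reverse=True)
for name, coef in ranked:
    print(f"{name:25s} {coef: .5f}")
print("dropped:", dropped)
```

Comparing coefficient sizes this way is only meaningful if the predictors were put on the same scale before fitting, which is assumed here.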
If we plot the values we see the following
Tumblr media
The training data mean squared error was very similar to the test data mean squared error. Their values were:
training data MSE 0.08704722988847985
test data MSE 0.08545239592311873
The R-square values for the training and test data were also essentially the same:
training data R-square 0.3119452335930013
test data R-square 0.31373693600486263
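For reference, the two metrics reported above can be computed as in the sketch below. The function names and the toy numbers are hypothetical; the definitions are the usual ones (average squared residual, and one minus the ratio of residual to total sum of squares).

```python
def mean_squared_error(observed, predicted):
    """Average of the squared differences between observed and predicted values."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)


def r_square(observed, predicted):
    """Share of the variance in the observed values explained by the predictions."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot


# Toy crater depths in km (illustrative values, not from the dataset)
observed = [0.10, 0.25, 0.40, 0.80]
predicted = [0.15, 0.20, 0.45, 0.70]
print(mean_squared_error(observed, predicted))
print(r_square(observed, predicted))
```

Similar values on the training and test sets, as reported above, suggest the model is not overfitting the training data.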
The least angle regression algorithm with k=10 fold cross validation was used to estimate the lasso regression model in the training set, and the model was validated using the test set. The change in the cross validation average (mean) squared error at each step was used to identify the best subset of predictor variables.
Figure 1. Change in the validation mean square error at each step
The mean squared error for each of the folds is shown below:
Tumblr media
3.Raw Python Code:
The raw Python code is shown in the photos below.  I decided to use screenshots for easier readability as it includes syntax highlighting.
Tumblr media Tumblr media Tumblr media Tumblr media
4.Code Output:
Below is the raw output from the code.  The interpretation and comments on the results are shown above in Section 2.
Mars Crater Study
Week 3 - Running a Lasso Regression Analysis
-------------------------------------------
CRATER_ID                  object
CRATER_NAME                object
LATITUDE_CIRCLE_IMAGE     float64
LONGITUDE_CIRCLE_IMAGE    float64
DIAM_CIRCLE_IMAGE         float64
DEPTH_RIMFLOOR_TOPOG      float64
MORPHOLOGY_EJECTA_1        object
MORPHOLOGY_EJECTA_2        object
MORPHOLOGY_EJECTA_3        object
NUMBER_LAYERS               int64
IS_NOT_NEAR_POLES           int64
IS_WIDE_CRATER              int64
dtype: object
-------------------------------
       LATITUDE_CIRCLE_IMAGE  ...  IS_WIDE_CRATER
count           76520.000000  ...    76520.000000
mean               -9.997340  ...        0.569603
std                33.599395  ...        0.495135
min               -86.700000  ...        0.000000
25%               -34.660500  ...        0.000000
50%               -12.159000  ...        1.000000
75%                14.828000  ...        1.000000
max                85.702000  ...        1.000000
[8 rows x 7 columns]
-------------------------------
   CRATER_ID CRATER_NAME  ...  IS_NOT_NEAR_POLES  IS_WIDE_CRATER
0  01-000000              ...                  0               1
1  01-000001     Korolev  ...                  0               1
2  01-000002              ...                  0               1
3  01-000003              ...                  0               1
4  01-000004              ...                  0               1
[5 rows x 12 columns]
-------------------------------
train test split
Lasso Regression Model
dict
{'IS_WIDE_CRATER': 0.05552232138486251, 'IS_NOT_NEAR_POLES': 0.04471569339534235, 'LONGITUDE_CIRCLE_IMAGE': 0.00309458641422248, 'DIAM_CIRCLE_IMAGE': 0.1601521016998867, 'LATITUDE_CIRCLE_IMAGE': -0.00022362030166929382}
training data MSE
0.08704722988847985
test data MSE
0.08545239592311873
training data R-square
0.3119452335930013
test data R-square
0.31373693600486263
End
zaworm · 5 years ago
Machine Learning for Data Analysis Week 2 Running a Random Forest
1.Background
I have decided to look at Mars craters and ask the following question:
A. Is the depth dependent on the diameter of the crater?
B. Is the depth dependent on the latitude? Are craters closer to the poles shallower?
2.Notes about the Results
* see section 3 for full code and section 4 for code output
The dataset was limited to craters that had a diameter of 100 km or less and a crater depth that was greater than 0 km.  
Shallow craters were defined as craters with a depth of 180 m (0.18 km) or less. This was the binary response variable.
Craters close to the poles were defined as those ±40 poleward (i.e., craters that were not close to the poles had latitudes of -60 to +60 degrees). The latitude variable was coded 0 if a crater was close to a pole and 1 if it was not. This was a binary explanatory variable (IS_NOT_NEAR_POLES = 1).
Wide-diameter craters were defined as craters with a diameter larger than 5 km (IS_WIDE_CRATER = 1).
A random forest analysis was run to test nonlinear relationships among a series of explanatory variables and a binary, categorical response variable.
The explanatory variables were: 
'IS_WIDE_CRATER' (yes or no)
'IS_NOT_NEAR_POLES' (yes or no)
'LONGITUDE_CIRCLE_IMAGE' (longitude value)
'DIAM_CIRCLE_IMAGE' (crater diameter value)
'LATITUDE_CIRCLE_IMAGE' (latitude value)
The categorical response variable was the crater depth ( deep or shallow crater)
The confusion matrix shows:
[[ 7230  4391]
 [ 2288 16699]]
This shows 7230 true positives and 16699 true negatives for crater depth. There are also 4391 false positives and 2288 false negatives.
The accuracy score is 0.7817890747516989, which means that about 78% of the craters were correctly classified.
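The accuracy score can be reproduced from the confusion matrix: it is the sum of the diagonal (correct classifications) divided by the total number of craters in the matrix. A minimal sketch, with hypothetical variable names:

```python
# Confusion matrix copied from the output above
confusion = [
    [7230, 4391],
    [2288, 16699],
]

correct = confusion[0][0] + confusion[1][1]   # diagonal entries
total = sum(sum(row) for row in confusion)    # every classified crater
accuracy = correct / total
print(correct, total, accuracy)
```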
The feature importance shows the following. [0.060825   0.03014673 0.24935164 0.30946463 0.35021201]
The first 2 variables are binary values that indicate whether the crater is wide and whether it is not near the poles. Their importances are 6% and 3%, which shows they are of little importance. The most important variable is the latitude with an importance of 35%, followed by the diameter at 31% and the longitude at 25%.
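Pairing each importance score with its variable name makes the ranking easier to read. The sketch below assumes the importances are listed in the same order as the explanatory variables above, which is consistent with the text describing the first two values as the binary variables; the variable names in the code are hypothetical.

```python
# Explanatory variables in the order listed in the post
feature_names = [
    "IS_WIDE_CRATER",
    "IS_NOT_NEAR_POLES",
    "LONGITUDE_CIRCLE_IMAGE",
    "DIAM_CIRCLE_IMAGE",
    "LATITUDE_CIRCLE_IMAGE",
]
# Importance scores copied from the output (assumed to follow the same order)
importances = [0.060825, 0.03014673, 0.24935164, 0.30946463, 0.35021201]

ranked = sorted(zip(feature_names, importances), key=lambda pair: pair[1], reverse=True)
for name, score in ranked:
    print(f"{name:25s} {score:.1%}")
print("sum of importances:", round(sum(importances), 6))
```

The scores sum to 1, so each one can be read as a share of the total importance.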
Performing a series of random forest classifications with an increasing number of trees shows that the accuracy converges after approximately 20 trees. This indicates that a single decision tree would not have been appropriate: using too few trees would have resulted in an accuracy of about 68%, versus 78% when using more.
Tumblr media
3.Raw Python Code:
The raw Python code is shown in the photos below.  I decided to use screenshots for easier readability as it includes syntax highlighting.
Tumblr media Tumblr media Tumblr media
4.Code Output:
Below is the raw output from the code.  The interpretation and comments on the results are shown above in Section 2.
Tumblr media
zaworm · 5 years ago
Machine Learning for Data Analysis Week 1 Running a Classification Tree
1.Background
I have decided to look at Mars craters and ask the following question:
A. Is the depth dependent on the diameter of the crater?
B. Is the depth dependent on the latitude? Are craters closer to the poles shallower?
2.Notes about the Results
* see section 3 for full code and section 4 for code output
The dataset was limited to craters that had a diameter of 100 km or less and a crater depth that was greater than 0 km.  
Shallow craters were defined as craters with a depth of 180 m (0.18 km) or less. This was the binary response variable.
Craters close to the poles were defined as those ±40 poleward (i.e., craters that were not close to the poles had latitudes of -60 to +60 degrees). The latitude variable was coded 0 if a crater was close to a pole and 1 if it was not. This was a binary explanatory variable (IS_NOT_NEAR_POLES = 1).
Wide-diameter craters were defined as craters with a diameter larger than 5 km (IS_WIDE_CRATER = 1).
A decision tree analysis was run to test nonlinear relationships among a series of explanatory variables and a binary, categorical response variable. 
The 2 explanatory variables were: diameter (wide crater) and latitude (craters not near the poles)
The categorical response variable was the crater depth ( deep or shallow crater)
The confusion matrix shows:
[[ 1107 10638]
 [   31 18832]]
This shows 1107 true positives and 18832 true negatives for crater depth. There are also 31 and 10638 misclassified craters.
The score is 0.6514309984317825, which shows that about 65% of the craters were correctly classified.
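Beyond the overall score, the share of correct classifications within each row of the matrix shows where the errors come from. The sketch below assumes the rows are the true classes and the columns the predicted classes, which is the layout used by scikit-learn's `confusion_matrix`; whether the screenshot code follows that layout cannot be confirmed from the text, and the variable names are hypothetical.

```python
# Confusion matrix copied from the output above
confusion = [
    [1107, 10638],
    [31, 18832],
]

total = sum(sum(row) for row in confusion)
accuracy = (confusion[0][0] + confusion[1][1]) / total

# Share of each row that lies on the diagonal (correct within that true class,
# under the assumed rows-are-true-classes layout)
row_rates = [row[i] / sum(row) for i, row in enumerate(confusion)]
print(accuracy)
print(row_rates)
```

Under that assumption almost all of the errors fall in the first row: the tree puts most craters of the first class into the second class, which is why the overall score is only 65% even though the second class is classified almost perfectly.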
The decision tree is as follows:
The first split is on the diameter of the crater and the second is on the crater latitude.
Tumblr media
If we look at the lower left, we see that among craters with a small diameter and near the poles, 1,555 are not deep craters whereas 69 are deep craters.
If we look at the far right at the bottom, we see that 5885 are not deep craters and 17729 are deep craters, indicating that craters classified as wide and with latitudes closer to the equator tend to be deeper.
3.Raw Python Code:
The raw Python code is shown in the photos below.  I decided to use screenshots for easier readability as it includes syntax highlighting.
Tumblr media Tumblr media Tumblr media
4.Code Output:
Below is the raw output from the code.  The interpretation and comments on the results are shown above in Section 2.
Tumblr media Tumblr media
zaworm · 5 years ago
Regression Modeling in Practice Week 4 Test a Logistic Regression Model
1.Background
I have decided to look at Mars craters and ask the following question:
A. Is the depth dependent on the diameter of the crater?
B. Is the depth dependent on the latitude? Are craters closer to the poles shallower?
2.Notes about the Results
* see section 3 for full code and section 4 for code output
The dataset was limited to craters that had a diameter of 100 km or less and a crater depth that was greater than 0 km.  
Shallow craters were defined as craters with a depth of 180 m (0.18 km) or less. This was the binary response variable.
Craters close to the poles were defined as those ±40 poleward (i.e., craters that were not close to the poles had latitudes of -60 to +60 degrees). The latitude variable was coded 0 if a crater was close to a pole and 1 if it was not. This was a binary explanatory variable (IS_NOT_NEAR_POLES = 1).
Wide-diameter craters were defined as craters with a diameter larger than 5 km (IS_WIDE_CRATER = 1).
The visualization of the data can be shown by the binary explanatory and response variables below:
Tumblr media Tumblr media Tumblr media
The first Logit Regression involved classifying the craters as either deep or shallow (a binary variable). The regression of depth on wide craters had a p value of 0.000, showing statistical significance, and a positive coefficient. I then regressed depth on crater latitude, which was also statistically significant with a p value of 0.000 and a positive coefficient.
I then included both variables in a single Logit Regression analysis; the results are shown below. The results show no evidence of confounding, and both variables are independently significant. The regression coefficient for wide craters is 1.18 and for craters closer to the equator it is 1.29. This shows that wider craters and craters closer to the equator are generally deeper than narrower craters and craters close to the north and south poles.
The odds ratio for wide craters is 3.26, with a 95% confidence interval of 3.16 to 3.36.
The odds of being deep are 3.26 times higher for wide craters than for small-diameter craters, after controlling for crater latitude.
The odds ratio for craters not near the poles is 3.65, with a 95% confidence interval of 3.46 to 3.85.
Likewise, the odds of being deep are 3.65 times higher for craters not near the poles than for craters closer to the poles, after controlling for crater diameter.
Because the confidence intervals on our odds ratios do not overlap, we can say that crater latitude is more strongly associated with crater depth than diameter is.
The results show that the hypothesis was correct and that both latitude and diameter play a role in crater depth.
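The odds ratios and their confidence intervals follow directly from the logit coefficients: each is the exponential of the coefficient or of its interval bound. The sketch below redoes that arithmetic with the rounded values printed in the regression table, so the results match the reported odds ratios only up to rounding; the variable names are hypothetical.

```python
import math

# (coefficient, lower 2.5% bound, upper 97.5% bound) as printed in the
# combined Logit regression table
logit_estimates = {
    "IS_WIDE_CRATER": (1.1819, 1.151, 1.213),
    "IS_NOT_NEAR_POLES": (1.2944, 1.241, 1.348),
}

# exp() turns log-odds into odds ratios
odds_ratios = {
    name: tuple(math.exp(value) for value in estimates)
    for name, estimates in logit_estimates.items()
}
for name, (odds, lower, upper) in odds_ratios.items():
    print(f"{name:18s} OR={odds:.2f}  95% CI=({lower:.2f}, {upper:.2f})")
```

Because the upper bound for IS_WIDE_CRATER is below the lower bound for IS_NOT_NEAR_POLES, the two intervals do not overlap.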
Tumblr media
3.Raw Python Code:
The raw Python code is shown in the photos below.  I decided to use screenshots for easier readability as it includes syntax highlighting.
Tumblr media Tumblr media Tumblr media
4.Code Output:
Below is the raw output from the code.  The interpretation and comments on the results are shown above in Section 2.
Tumblr media Tumblr media
Mars Crater Study
Regression Modeling in Practice - Week 4
Test a Logistic Regression Model
-------------------------------------------
Here are the counts for deep and shallow craters craters:
DEEP       47395
SHALLOW    29125
Name: IS_CRATER_DEEP, dtype: int64
Response Variable Categorical 2 Levels: Crater Depth (shallow or deep)
Info about this subset and narrowed down  dataset
Total number of rows in the dataset: 76520
Total number of columns in the dataset: 8
Optimization terminated successfully.
         Current function value: 0.628466
         Iterations 5
                           Logit Regression Results                           
==============================================================================
Dep. Variable:     IS_CRATER_DEEP_NUM   No. Observations:                76520
Model:                          Logit   Df Residuals:                    76518
Method:                           MLE   Df Model:                            1
Date:                Tue, 28 Jul 2020   Pseudo R-squ.:                 0.05404
Time:                        15:48:34   Log-Likelihood:                -48090.
converged:                       True   LL-Null:                       -50837.
Covariance Type:            nonrobust   LLR p-value:                     0.000
==================================================================================
                     coef    std err          z      P>|z|      [0.025      0.975]
----------------------------------------------------------------------------------
Intercept         -0.1204      0.011    -10.904      0.000      -0.142      -0.099
IS_WIDE_CRATER     1.1267      0.015     72.891      0.000       1.096       1.157
==================================================================================
Odds Ratios
Intercept         0.886578
IS_WIDE_CRATER    3.085475
dtype: float64
Optimization terminated successfully.
         Current function value: 0.651050
         Iterations 4
                           Logit Regression Results                           
==============================================================================
Dep. Variable:     IS_CRATER_DEEP_NUM   No. Observations:                76520
Model:                          Logit   Df Residuals:                    76518
Method:                           MLE   Df Model:                            1
Date:                Tue, 28 Jul 2020   Pseudo R-squ.:                 0.02004
Time:                        15:48:35   Log-Likelihood:                -49818.
converged:                       True   LL-Null:                       -50837.
Covariance Type:            nonrobust   LLR p-value:                     0.000
=====================================================================================
                        coef    std err          z      P>|z|      [0.025      0.975]
-------------------------------------------------------------------------------------
Intercept            -0.5560      0.025    -22.321      0.000      -0.605      -0.507
IS_NOT_NEAR_POLES     1.1526      0.026     44.096      0.000       1.101       1.204
=====================================================================================
Odds Ratios
Intercept            0.573496
IS_NOT_NEAR_POLES    3.166426
dtype: float64
Optimization terminated successfully.
         Current function value: 0.613005
         Iterations 5
                           Logit Regression Results                           
==============================================================================
Dep. Variable:     IS_CRATER_DEEP_NUM   No. Observations:                76520
Model:                          Logit   Df Residuals:                    76517
Method:                           MLE   Df Model:                            2
Date:                Tue, 28 Jul 2020   Pseudo R-squ.:                 0.07731
Time:                        15:48:35   Log-Likelihood:                -46907.
converged:                       True   LL-Null:                       -50837.
Covariance Type:            nonrobust   LLR p-value:                     0.000
=====================================================================================
                        coef    std err          z      P>|z|      [0.025      0.975]
-------------------------------------------------------------------------------------
Intercept            -1.3198      0.028    -47.083      0.000      -1.375      -1.265
IS_WIDE_CRATER        1.1819      0.016     74.704      0.000       1.151       1.213
IS_NOT_NEAR_POLES     1.2944      0.027     47.464      0.000       1.241       1.348
=====================================================================================
Odds Ratios
Intercept            0.267190
IS_WIDE_CRATER       3.260724
IS_NOT_NEAR_POLES    3.648895
dtype: float64
Odds Ratios
                   Lower CI  Upper CI        OR
Intercept          0.252906  0.282280  0.267190
IS_WIDE_CRATER     3.161160  3.363424  3.260724
IS_NOT_NEAR_POLES  3.458975  3.849243  3.648895
zaworm · 5 years ago
Regression Modeling in Practice Week 3 Test a Multiple Regression Model
1.Background
I have decided to look at Mars craters and ask the following question:
A. Is the depth dependent on the diameter of the crater?
The dataset was limited to craters that had a diameter of 100 km or less and a crater depth that was greater than 0 km.  
2.Notes about the Results
* see section 3 for full code and section 4 for code output
I ran the Regression Model to look at the relationship between crater depth and crater diameter and added the latitude variable to see how this affected the results.
The explanatory variables that I used are: Diameter, Diameter Squared and Latitude.
First I centered all of the explanatory variables by subtracting their means. This can be seen below, where I checked that the mean of each variable was zero after centering.
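Centering amounts to subtracting the mean from every value and then confirming that the new mean is zero up to floating-point error. A minimal sketch with a hypothetical helper `center` and made-up diameters:

```python
def center(values):
    """Subtract the mean so that the centered variable has mean zero."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


# Made-up crater diameters in km (illustrative, not from the dataset)
diameters = [1.06, 3.58, 5.85, 12.02, 99.97]
diameters_c = center(diameters)

# The mean of the centered values should be zero apart from rounding error
mean_after = sum(diameters_c) / len(diameters_c)
print(diameters_c)
print(mean_after)
```

A tiny non-zero mean after centering, such as the values of order 1e-13 in the output of section 4, is normal floating-point rounding rather than a mistake.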
Tumblr media Tumblr media
I then ran a regression analysis with crater diameter as the explanatory variable and crater depth as the response variable. The R squared value was 0.278 and the p value of 0.0 indicated there was a significant relationship between these two. The intercept was 0.3764 and the slope was positive, with a value of 0.0151, indicating a positive relationship.
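A simple linear regression of this kind can be sketched without any library. The example below uses made-up data and a hypothetical helper `fit_line`; it also illustrates a property that matters when reading the output: when the explanatory variable is centered, the fitted intercept equals the mean of the response.

```python
def fit_line(x, y):
    """Least-squares fit of y = intercept + slope * x; returns (slope, intercept, r_squared)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return slope, intercept, 1 - ss_res / ss_tot


# Made-up diameters and depths in km (illustrative, not from the dataset)
diameters = [2.0, 4.0, 6.0, 10.0, 18.0]
depths = [0.10, 0.20, 0.25, 0.40, 0.60]

# Center the explanatory variable before fitting
mean_diameter = sum(diameters) / len(diameters)
diameters_c = [d - mean_diameter for d in diameters]

slope, intercept, r_squared = fit_line(diameters_c, depths)
mean_depth = sum(depths) / len(depths)
print(slope, intercept, r_squared)
print(mean_depth)
```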
Tumblr media
I then ran another regression analysis with crater diameter and crater diameter squared as explanatory variables, to obtain a polynomial fit, and crater depth as the response variable. The R squared value was only slightly better with the polynomial, at 0.298, and the p value of 0.0 indicated there was a significant relationship. The intercept was 0.4. The coefficient of the linear term is positive and the coefficient of the squared term is negative, showing the concave shape of the curve.
Tumblr media
Plots of the fit line for the linear and polynomial fits are shown below. The polynomial fit does a slightly better job of predicting depth for the smallest and the largest diameter craters.
Tumblr media Tumblr media
The latitude explanatory variable was added to see whether latitude had a significant influence on the depth of the craters. The regression analysis showed that latitude had a p value of 0.0, indicating a significant relationship. The R squared value also improved slightly, showing a better fit when this variable is included. All of the variables are statistically significant: with all of them in the model, each p value is around 0.0, which is less than 0.05.
The intercept shows a depth of 0.4015 when the other variables are at their means, since they were centred. Only about 29.9% of the variability is explained by these variables.
Tumblr media Tumblr media Tumblr media
The Q-Q plot shows that the residuals deviate quite a bit from the fit line, indicating that the explanatory variables may not be capturing the relationship properly. The agreement is especially poor at the far right.
Tumblr media
The residual plot shows that most residuals fall within +/-2.5 standard deviations. We also see some extreme outliers at greater than 5 standard deviations. The fit of the model is relatively poor and could be improved.
Tumblr media
Next, the regression plots were included to examine the residuals. These show that the residuals are larger away from the south pole and that they are randomly spread around the best fit line. This shows that even though there is a statistically significant relationship, the correlation is fairly weak.
Tumblr media
Finally, the influence plot was created.  It shows that some of the outliers visible on the residual plot also have a large influence on the fitted model.
Tumblr media
Overall the hypothesis was supported; however, the relationship is weak, so the dataset should perhaps be reduced to look at only a subsection of the craters.  There were no confounding variables observed.
3.Raw Python Code:
The raw Python code is shown in the photos below.  I decided to use screenshots for easier readability as it includes syntax highlighting.
Tumblr media Tumblr media Tumblr media
4.Code Output:
Below is the raw output from the code.  The interpretation and comments on the results are shown above in Section 2.
Mars Crater Study
Regression Modeling in Practice - Week 3
Test a Multiple Regression Model
-------------------------------------------
Info about this subset and narrowed down  dataset
Total number of rows in the dataset: 76520
Total number of columns in the dataset: 4
Eplanatory Variable: Diameter
Eplanatory Variable: Latitude
Response Variable: Crater Depth
----Explanatory before centred described:
count    76520.000000
mean        -9.997340
std         33.599395
min        -86.700000
25%        -34.660500
50%        -12.159000
75%         14.828000
max         85.702000
Name: LATITUDE_CIRCLE_IMAGE, dtype: float64
count    76520.000000
mean        10.537621
std         12.353504
min          1.060000
25%          3.580000
50%          5.850000
75%         12.022500
max         99.970000
Name: DIAM_CIRCLE_IMAGE, dtype: float64
----Explanatory after centred described:
count    7.652000e+04
mean    -2.610102e-13
std      3.359940e+01
min     -7.670266e+01
25%     -2.466316e+01
50%     -2.161660e+00
75%      2.482534e+01
max      9.569934e+01
Name: LATITUDE_CIRCLE_IMAGE_c, dtype: float64
count    7.652000e+04
mean    -1.520567e-13
std      1.235350e+01
min     -9.477621e+00
25%     -6.957621e+00
50%     -4.687621e+00
75%      1.484879e+00
max      8.943238e+01
Name: DIAM_CIRCLE_IMAGE_c, dtype: float64
AxesSubplot(0.125,0.125;0.775x0.755)
AxesSubplot(0.125,0.125;0.775x0.755)
AxesSubplot(0.125,0.125;0.775x0.755)
AxesSubplot(0.125,0.125;0.775x0.755)
                            OLS Regression Results
================================================================================
Dep. Variable:     DEPTH_RIMFLOOR_TOPOG   R-squared:                       0.278
Model:                              OLS   Adj. R-squared:                  0.278
Method:                   Least Squares   F-statistic:                 2.940e+04
Date:                  Thu, 16 Jul 2020   Prob (F-statistic):               0.00
Time:                          13:51:16   Log-Likelihood:                -16855.
No. Observations:                 76520   AIC:                         3.371e+04
Df Residuals:                     76518   BIC:                         3.373e+04
Df Model:                             1
Covariance Type:              nonrobust
=======================================================================================
                          coef    std err          t      P>|t|      [0.025      0.975]
---------------------------------------------------------------------------------------
Intercept               0.3764      0.001    345.213      0.000       0.374       0.379
DIAM_CIRCLE_IMAGE_c     0.0151   8.83e-05    171.476      0.000       0.015       0.015
==============================================================================
Omnibus:                    13409.393   Durbin-Watson:                   1.667
Prob(Omnibus):                  0.000   Jarque-Bera (JB):            42677.836
Skew:                           0.900   Prob(JB):                         0.00
Kurtosis:                       6.185   Cond. No.                         12.4
==============================================================================
Warnings: [1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
                            OLS Regression Results
================================================================================
Dep. Variable:     DEPTH_RIMFLOOR_TOPOG   R-squared:                       0.298
Model:                              OLS   Adj. R-squared:                  0.298
Method:                   Least Squares   F-statistic:                 1.626e+04
Date:                  Thu, 16 Jul 2020   Prob (F-statistic):               0.00
Time:                          13:51:16   Log-Likelihood:                -15745.
No. Observations:                 76520   AIC:                         3.150e+04
Df Residuals:                     76517   BIC:                         3.152e+04
Df Model:                             2
Covariance Type:              nonrobust
===============================================================================================
                                  coef    std err          t      P>|t|      [0.025      0.975]
-----------------------------------------------------------------------------------------------
Intercept                       0.4012      0.001    335.779      0.000       0.399       0.404
DIAM_CIRCLE_IMAGE_c             0.0211      0.000    138.111      0.000       0.021       0.021
I(DIAM_CIRCLE_IMAGE_c ** 2)    -0.0002   3.42e-06    -47.464      0.000      -0.000      -0.000
==============================================================================
Omnibus:                    13409.564   Durbin-Watson:                   1.716
Prob(Omnibus):                  0.000   Jarque-Bera (JB):            42023.720
Skew:                           0.905   Prob(JB):                         0.00
Kurtosis:                       6.147   Cond. No.                         637.
==============================================================================
Warnings: [1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
                            OLS Regression Results
================================================================================
Dep. Variable:     DEPTH_RIMFLOOR_TOPOG   R-squared:                       0.299
Model:                              OLS   Adj. R-squared:                  0.299
Method:                   Least Squares   F-statistic:                 1.088e+04
Date:                  Thu, 16 Jul 2020   Prob (F-statistic):               0.00
Time:                          13:51:16   Log-Likelihood:                -15704.
No. Observations:                 76520   AIC:                         3.142e+04
Df Residuals:                     76516   BIC:                         3.145e+04
Df Model:                             3
Covariance Type:              nonrobust
===============================================================================================
                                  coef    std err          t      P>|t|      [0.025      0.975]
-----------------------------------------------------------------------------------------------
Intercept                       0.4015      0.001    336.079      0.000       0.399       0.404
DIAM_CIRCLE_IMAGE_c             0.0213      0.000    138.264      0.000       0.021       0.022
I(DIAM_CIRCLE_IMAGE_c ** 2)    -0.0002   3.43e-06    -48.013      0.000      -0.000      -0.000
LATITUDE_CIRCLE_IMAGE_c         0.0003   3.22e-05      9.093      0.000       0.000       0.000
==============================================================================
Omnibus:                    13248.022   Durbin-Watson:                   1.718
Prob(Omnibus):                  0.000   Jarque-Bera (JB):            41812.705
Skew:                           0.892   Prob(JB):                         0.00
Kurtosis:                       6.151   Cond. No.                         637.
==============================================================================
Warnings: [1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
Figure(432x288)
zaworm · 5 years ago
Regression Modeling in Practice Week 2 Testing a Basic Linear Regression Model
1.Background
I have decided to look at Mars craters and ask the following question:
A. Is the depth dependent on the diameter of the crater in the context of the moderator located near the poles or the equator? The dataset was limited to craters that had a diameter of 100 km or less and a crater depth that was greater than 0 km.  
2.Notes about the Results
* see section 3 for full code and section 4 for code output
I ran the Regression Model to look at the relationship between crater depth and crater diameter.
Explanatory Variable: Diameter
Response Variable: Crater Depth
I first centred the explanatory variable data.  This was done by calculating the mean and then subtracting the mean from each value.  The mean was calculated to be 10.537621 km.  Prior to centring the data the distribution was as follows:
Tumblr media
After I centred the data, the distribution shifted to the following.  I recalculated the mean and its value was 0.0 (to within floating-point rounding), indicating that the centring was successful.
Tumblr media
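The centring step described above can be sketched in a few lines of plain Python. The diameters here are a handful of illustrative values rather than the full crater data, and the helper name `centre` is made up for this example.

```python
def centre(values):
    """Subtract the mean from every value; return the centred list and the mean."""
    mean = sum(values) / len(values)
    return [v - mean for v in values], mean


# Illustrative crater diameters in km (not the real dataset).
diameters_km = [1.06, 3.58, 5.85, 12.0225, 99.97]

centred, original_mean = centre(diameters_km)
centred_mean = sum(centred) / len(centred)

print("mean before centring:", original_mean)
print("mean after centring: ", centred_mean)  # zero up to floating-point error
```

The centred mean will usually print as a very small number rather than exactly 0, which is just floating-point rounding.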
The results from the linear regression were:
The p value is 0.0 indicating that there is a significant relationship between the depth and diameter of the crater.
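The regression revealed that the estimated linear fit line can be calculated as follows, where CenteredDiameter is the crater diameter minus the mean diameter (10.537621 km), since the explanatory variable was centred:
CraterDepth = Intercept_Coef + Diameter_Coef * CenteredDiameter
CraterDepth = 0.3764 + (0.0151 * CenteredDiameter)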
This is a positive relationship and the depth increases with crater diameter.
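A small sketch of why centring matters for reading the fit line: a least-squares line fitted by hand on made-up numbers, once with the raw diameter and once with the centred diameter. The data and the helper `fit_line` are hypothetical; the point is only that centring leaves the slope unchanged and turns the intercept into the mean depth.

```python
def fit_line(x, y):
    """Ordinary least squares for one predictor; returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Made-up diameters (km) and depths (km), for illustration only.
diam = [2.0, 5.0, 8.0, 20.0, 60.0]
depth = [0.2, 0.3, 0.35, 0.6, 1.1]

raw_intercept, raw_slope = fit_line(diam, depth)

mean_diam = sum(diam) / len(diam)
diam_c = [d - mean_diam for d in diam]
c_intercept, c_slope = fit_line(diam_c, depth)

print("raw:     intercept=%.4f slope=%.4f" % (raw_intercept, raw_slope))
print("centred: intercept=%.4f slope=%.4f" % (c_intercept, c_slope))
print("mean depth: %.4f" % (sum(depth) / len(depth)))
```

With a centred predictor the intercept is the predicted depth of a crater of average diameter, which is easier to interpret than the depth of a crater with a diameter of 0 km.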
The R-squared value was 0.278.  It is the proportion of the variance in the response variable (depth) that can be explained by the explanatory variable (diameter).  This model therefore accounts for about 28% of the variability we see in our response variable, crater depth.
Tumblr media
I plotted the data and the regression line is visible in red.  This line follows the above equation.
Tumblr media
3.Raw Python Code:
The raw Python code is shown in the photos below.  I decided to use screenshots for easier readability as it includes syntax highlighting.
Tumblr media Tumblr media
4.Code Output:
Below is the raw output from the code.  The interpretation and comments on the results are shown above in Section 2.
Tumblr media Tumblr media
zaworm · 5 years ago
Regression Modeling in Practice Week 1 Writing About Your Data
1. Background
I have chosen the Mars Craters Study data set as it most accurately represents my field in the Physical Science and Engineering area of study.  I am interested in examining the craters on Mars and their relationship to further the science in this area.
2. Sample 
The Mars Craters Study presents a global database that includes over 300,000 Mars craters 1 km or larger that were created between 4.2 and 3.8 billion years ago during a period of heavy bombardment (i.e. impacts of asteroids, proto-planets, and comets).  This study, created by Stuart Robbins, presents a new global database for Mars that contains 378,540 craters statistically complete for diameters D ≥ 1 km. Each crater was measured and reported individually.  
For my analysis I am looking at craters that are less than or equal to 100 km in diameter and have a crater depth greater than 0 km.  This selection resulted in 76520 individual records.
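The selection rule can be written as a simple filter. This is a minimal sketch on four made-up records, using the column names that appear in the code output (DIAM_CIRCLE_IMAGE and DEPTH_RIMFLOOR_TOPOG); the helper `keep` is hypothetical, and the real analysis may well have used a pandas DataFrame instead of a list of dictionaries.

```python
# Made-up records for illustration; values are in km.
craters = [
    {"DIAM_CIRCLE_IMAGE": 3.2, "DEPTH_RIMFLOOR_TOPOG": 0.25},
    {"DIAM_CIRCLE_IMAGE": 150.0, "DEPTH_RIMFLOOR_TOPOG": 1.9},  # too large
    {"DIAM_CIRCLE_IMAGE": 1.4, "DEPTH_RIMFLOOR_TOPOG": 0.0},    # no depth
    {"DIAM_CIRCLE_IMAGE": 100.0, "DEPTH_RIMFLOOR_TOPOG": 1.2},  # boundary, kept
]


def keep(crater, max_diameter_km=100.0):
    """True for craters with diameter <= 100 km and depth > 0 km."""
    return (crater["DIAM_CIRCLE_IMAGE"] <= max_diameter_km
            and crater["DEPTH_RIMFLOOR_TOPOG"] > 0)


subset = [c for c in craters if keep(c)]
print("kept", len(subset), "of", len(craters), "records")
```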
3. Procedure 
The bulk of crater identification and classification was done using observational studies.  The craters were identified using the THermal EMission Imaging System (THEMIS) aboard the 2001 Mars Odyssey NASA spacecraft.  This space instrument is a multi-spectral thermal-infrared imager.  The average local time for daytime observations is 4:30 P.M. to yield a high phase angle with shadows and heating effects sufficient for geomorphologic feature identification.
Data was collected between 2001 and 2011.  One of the initial goals for the instrument was to create a global mosaic of the planet in both day and night from which thermal inertia maps could be calculated.
ArcGIS Software was used to analyze the images, and the location of each crater was determined in terms of latitude and longitude.
Diameters of the craters were fit using a circle routine approach with the Igor Pro software.
The topography of the crater was recorded by the Mars Orbiter Laser Altimeter (MOLA) attached to the Mars Global Surveyor (MGS) spacecraft. The MOLA instrument transmitted infrared laser pulses towards Mars to determine the distance of the MGS spacecraft to the Martian surface.
4. Measures 
More recently the ability to measure craters has been enhanced based on the instruments of the Mars Orbiter.  This detailed database includes location in longitude (-180 to 180)  and latitude (-90 to 90), ejecta morphology, crater depth (in km), diameter (in km) and the name of the crater if available.
The use of new datasets has allowed for the reexamination of past large craters and also allows for smaller craters (down to 1 km) to be analyzed.  
I would like to study the relationship between the crater diameter and the depth of the craters on Mars.  The primary question is: 1.  “Does the crater diameter have a relationship with the depth of the crater?  Is the depth dependent on the diameter of the crater?”
For this primary question it will be important to note and compare the diameter of the crater and the crater depth.
In addition to the primary questions I will also be looking at the location of the craters and would like to investigate and research the following questions: 2.  “Are shallower depth craters associated with locations near the poles of Mars?”  For this secondary question it will be important to look at the location of the craters and in particular the latitude.  
Craters appear throughout the terrain of Mars and are the result of a period of heavy bombardment from asteroids, protoplanets and comets.  The craters that appear on Mars are vital in understanding its surface material properties and provide insight into its climate and history and impact physics.
zaworm · 5 years ago
Data Analysis Tools Week 4 Testing a Potential Moderator
1.Background
I have decided to look at Mars craters and ask the following question:
A. Is the depth dependent on the diameter of the crater in the context of the moderator located near the poles or the equator? The dataset was limited to craters that had a diameter of 100 km or less and a crater depth that was greater than 0 km.  The moderator was split into 3 categories: near the South Pole, near the Equator and near the North Pole.
2.Notes about the Results
* see section 3 for full code and section 4 for code output
I calculated the correlation coefficient between diameter and depth separately for each level of the location moderator (based on latitude).
The syntax of the code used to run the correlation and plot the results is:
Tumblr media
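For readers who cannot see the screenshot, the idea of testing moderation with a correlation is to split the data by the moderator and compute Pearson's r within each group. The sketch below does this in plain Python on made-up rows. The group boundaries at -50 and +50 degrees are an assumption inferred from the bin counts in the output, the exact handling of the boundary values is a guess, and `pearson_r` and `latitude_group` are names invented for this example.

```python
import math
from collections import defaultdict


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def latitude_group(lat):
    """Assumed moderator levels: poleward of 50 degrees vs the rest."""
    if lat < -50:
        return "SouthPole"
    if lat > 50:
        return "NorthPole"
    return "Equator"


# (latitude, diameter km, depth km): made-up rows for illustration.
rows = [
    (-70, 5, 0.2), (-60, 20, 0.6), (-55, 50, 1.0),
    (0, 5, 0.4), (10, 20, 0.5), (-30, 50, 0.9), (30, 10, 0.6),
    (60, 5, 0.1), (70, 20, 0.15), (80, 50, 0.12),
]

groups = defaultdict(lambda: ([], []))
for lat, diam, depth in rows:
    diams, depths = groups[latitude_group(lat)]
    diams.append(diam)
    depths.append(depth)

results = {name: pearson_r(d, z) for name, (d, z) in groups.items()}
for name, r in results.items():
    print(name, round(r, 3))
```

If r differs clearly between the groups, as it does in the real output below, the grouping variable is acting as a moderator of the diameter and depth relationship.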
2.1 South Pole
The results for the South Pole show a positive correlation coefficient of 0.69.  This shows an increasing trend: larger diameter craters are often deeper.  The coefficient is fairly close to 1, which shows a strong relationship.
The p value is significant with a value of 0 indicating that there is a relationship between diameter and depth.
The results from the code are:
******************** Association between crater diameter and crater depth for SOUTHPOLE (0.6909785490096341, 0.0)
2.2 Equator
The results for the Equator show a positive correlation coefficient of 0.51.  This shows an increasing trend: larger diameter craters are often deeper.  The coefficient is about halfway between 0 and 1, which shows a moderate relationship.
The p value is significant with a value of 0 indicating that there is a relationship between diameter and depth.
The results from the code are:
******************** Association between crater diameter and crater depth for EQUATOR (0.5141921407957738, 0.0)
2.3 North Pole
The results for the North Pole show a positive correlation coefficient of 0.47.  This shows an increasing trend: larger diameter craters are often deeper.  The coefficient is just under halfway between 0 and 1, which shows a weaker relationship than at the South Pole or the Equator.
The p value is significant with a value of approximately 0 indicating that there is a relationship between diameter and depth.
The results from the code are:
******************** Association between crater diameter and crater depth for NORTHPOLE (0.4687330023740152, 8.270884773408035e-184)
2.4 Scatter plots
Scatter plots for each of the latitude locations were plotted.  They show that the relationship is strongest at the south pole and that the correlation weakens moving north.  Near the south pole there is a stronger relationship between diameter and depth than near the equator or the north pole.
Tumblr media Tumblr media Tumblr media
3.Raw Python Code:
The raw Python code is shown in the photos below.  I decided to use screenshots for easier readability as it includes syntax highlighting.
Tumblr media Tumblr media
4.Code Output:
Below is the raw output from the code.  The interpretation and comments on the results are shown above in Section 2.
Mars Crater Study
Data Analysis Tools
Testing moderation in the context of correlation
-------------------------------------------
Info about this subset and narrowed down  dataset
Total number of rows in the dataset: 76520
Total number of columns in the dataset: 5
-----------------------   LATTITUDE BIN SECTION   -----------------------------------------
Latitude bin counts:
SouthPole     9909
Equator      63239
NorthPole     3372
Name: LAT_LOCATION_GROUP, dtype: int64
******************** Association between crater diameter and crater depth for SOUTHPOLE (0.6909785490096341, 0.0)
******************** Association between crater diameter and crater depth for EQUATOR (0.5141921407957738, 0.0)
******************** Association between crater diameter and crater depth for NORTHPOLE (0.4687330023740152, 8.270884773408035e-184)
zaworm · 5 years ago
Data Analysis Tools Week 3 Generating a Correlation Coefficient
1.Background
I have decided to look at Mars craters and ask the following question:
A. Is the depth dependent on the diameter of the crater?  The dataset was limited to craters that had a diameter of 100 km or less and a crater depth that was greater than 0 km.
2.Notes about the Results
2.1 The code generated a correlation coefficient between two quantitative variables, the diameter of the crater and the depth of the crater.
The Python syntax that was used is:
import scipy.stats
print('association between crater diameter and crater depth')
print(scipy.stats.pearsonr(data_clean['DIAM_CIRCLE_IMAGE'], data_clean['DEPTH_RIMFLOOR_TOPOG']))
* see section 3 for full code and section 4 for code output
The results from the code show the following:
association between crater diameter and crater depth (0.5268779814292375, 0.0)
The correlation coefficient is 0.527 and positive, indicating a positive relationship between the diameter and the depth of the crater: depth increases with crater diameter.  The p value is reported as 0, which is less than 0.05, so we reject the null hypothesis.  The relationship is statistically significant.  The linear relationship is only moderate, as the coefficient lies close to halfway between 0 and 1. 
R squared, or the Coefficient of Determination, is the square of the correlation coefficient and was found to be 0.278.  This indicates we can predict approximately 28% of the variability in depth if we know the diameter.
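As a quick check of the arithmetic, the Coefficient of Determination for a simple linear relationship is just the Pearson coefficient squared. The value of r below is copied from the code output; the variable names are only for this sketch.

```python
# Pearson r as printed by scipy.stats.pearsonr in the code output.
r = 0.5268779814292375

r_squared = r ** 2
print(round(r_squared, 3))                     # 0.278
print(f"{r_squared:.0%} of the variability")   # about 28%
```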
A plotted curve fit of the correlation is shown in the figure below.
Tumblr media
3.Raw Python Code:
The raw Python code is shown in the photos below.  I decided to use screenshots for easier readability as it includes syntax highlighting.
Tumblr media
4.Code Output:
Below is the raw output from the code.  The interpretation and comments on the results are shown above in Section 2.
Mars Crater Study
Data Analysis Tools
Generating a Correlation Coefficient
-------------------------------------------
Info about this subset and narrowed down  dataset
Total number of rows in the dataset: 76520
Total number of columns in the dataset: 5
association between crater diameter and crater depth
(0.5268779814292375, 0.0)
zaworm · 5 years ago
Data Analysis Tools Week 2 Running a Chi-Square Test of Independence
1.Background
I have decided to look at Mars craters and ask the following question:
A. Are shallower depth craters associated with locations near the North and South poles of Mars? The dataset was limited to craters that had a diameter of 100 km or less and a crater depth that was greater than 0 km.
I divided the latitude of the planet into 5 bins to see if there was a significant association between the latitude group and the depth of the crater.
I classified each crater as either shallow or deep based on its depth.  The cutoff was chosen to be 180 m (0.18 km).
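The two categorical variables can be sketched as follows. The bin edges come from the group labels in the output (-90_-50, -50_-20, -20_20, 20_50, 50_90); treating each bin as closed on its upper edge is an assumption, and the function names are invented for this example.

```python
from bisect import bisect_left

BIN_EDGES = [-90, -50, -20, 20, 50, 90]
BIN_LABELS = ["-90_-50", "-50_-20", "-20_20", "20_50", "50_90"]
SHALLOW_CUTOFF_KM = 0.18  # 180 m


def latitude_bin(lat):
    """Label for the latitude bin; bins are assumed closed on the upper edge."""
    idx = bisect_left(BIN_EDGES, lat) - 1
    idx = max(idx, 0)  # a latitude of exactly -90 falls in the first bin
    return BIN_LABELS[idx]


def depth_class(depth_km):
    """SHALLOW for depths of 0.18 km or less, otherwise DEEP."""
    return "SHALLOW" if depth_km <= SHALLOW_CUTOFF_KM else "DEEP"


for lat, depth in [(-75.0, 0.10), (-50.0, 0.18), (0.0, 0.45), (62.5, 0.05)]:
    print(lat, depth, latitude_bin(lat), depth_class(depth))
```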
2.Notes about the Results
2.1 The crater count based on the latitude is shown below:
Tumblr media
2.2 The crater classification is based on the plot below, where the majority of the craters are classified as deep.
Tumblr media
2.3 The proportion of deep craters in each latitude bin is shown in the plot below.  There seems to be an association between latitude and the depth of the crater, with a higher proportion of deep craters near the equator of the planet.  Let's perform a Chi-squared analysis to investigate further.
Tumblr media
The Chi Squared test was run in Python and the full code is below in section 3.  The full results are shown in section 4.
The Chi squared test was coded as follows:
Tumblr media
The output showed a p value of 0.0, which is under 0.05, so the null hypothesis is rejected and the alternate hypothesis is accepted: there is an association between crater depth class and latitude.  Because there are five latitude groups, a Post Hoc test was necessary to find which groups differ.
Response Variable Categorical 2 Levels: Crater Depth (shallow or deep)
************************************************
LAT_LOCATION_GROUP  -90_-50  -50_-20  -20_20  20_50  50_90
IS_CRATER_DEEP
SHALLOW                5507     6847    8449   5501   2821
DEEP                   4402    14528   20948   6966    551
LAT_LOCATION_GROUP   -90_-50   -50_-20   -20_20     20_50     50_90
IS_CRATER_DEEP
SHALLOW             0.555757  0.320327  0.28741  0.441245  0.836595
DEEP                0.444243  0.679673  0.71259  0.558755  0.163405
chi-square value, p value, expected counts
(5870.45780474349, 0.0, 4, array([[ 3771.55808939,  8135.74065604, 11189.06985102,  4745.18263199,  1283.44877156],
       [ 6137.44191061, 13239.25934396, 18207.93014898,  7721.81736801,  2088.55122844]]))
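To show where the chi-square value and the expected counts come from, the statistic can be recomputed by hand from the observed counts in the table. This is a plain Python sketch rather than the scipy call used in the actual analysis; the counts are copied from the output.

```python
# Observed counts copied from the output.
# Columns: -90_-50, -50_-20, -20_20, 20_50, 50_90
observed = [
    [5507, 6847, 8449, 5501, 2821],    # SHALLOW
    [4402, 14528, 20948, 6966, 551],   # DEEP
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count under independence: row total * column total / grand total.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
dof = (len(observed) - 1) * (len(observed[0]) - 1)

print("chi-square:", round(chi2, 2))
print("degrees of freedom:", dof)
print("expected SHALLOW count in -90_-50:", round(expected[0][0], 2))
```

The hand calculation reproduces the chi-square value of about 5870.46 with 4 degrees of freedom. For the 2 x 2 post hoc tables, scipy applies a continuity correction by default, so a hand calculation like this one would give slightly different values there.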
An example Post Hoc test was coded as follows:
Tumblr media
The output showed a p value of 0.0, which is under the Bonferroni-adjusted threshold (0.05 divided by the 10 comparisons, or 0.005), so the null hypothesis was rejected: the proportion of deep craters differs between these two latitude groups (-90 to -50 and -50 to -20):
************************************************
COMP1-10        -50_-20  -90_-50
IS_CRATER_DEEP
SHALLOW            6847     5507
DEEP              14528     4402
COMP1-10         -50_-20   -90_-50
IS_CRATER_DEEP
SHALLOW         0.320327  0.555757
DEEP            0.679673  0.444243
chi-square value, p value, expected counts
(1569.4618848788969, 0.0, 1, array([[ 8440.95224396,  3913.04775604],
       [12934.04775604,  5995.95224396]]))
A table summarizing the 10 Chi Square comparisons is shown below.  In every comparison the null hypothesis is rejected, so there is an association between latitude and crater depth.  The association is stronger when comparing the areas close to the poles with the equator.
Tumblr media
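The Bonferroni adjustment behind the ten comparisons can be sketched as below. The group labels are taken from the output; the largest p value is the one printed for COMP5-10 in section 4.

```python
from itertools import combinations

groups = ["-90_-50", "-50_-20", "-20_20", "20_50", "50_90"]

# Every pair of latitude groups gets its own 2 x 2 chi-square test.
pairs = list(combinations(groups, 2))

alpha = 0.05
adjusted_alpha = alpha / len(pairs)  # Bonferroni: divide by number of tests

# Largest p value among the ten comparisons in section 4 (COMP5-10).
largest_p = 1.5652720877138506e-15

print("number of comparisons:", len(pairs))
print("adjusted threshold:", adjusted_alpha)
print("all comparisons significant:", largest_p < adjusted_alpha)
```

Since even the largest of the ten p values is far below the adjusted threshold, every pair of latitude groups differs significantly in its proportion of deep craters.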
3.Raw Python Code:
The raw Python code is shown in the photos below.  I decided to use screenshots for easier readability as it includes syntax highlighting.
Tumblr media Tumblr media Tumblr media Tumblr media Tumblr media Tumblr media
4.Code Output:
Below is the raw output from the code.  The interpretation and comments on the results are shown above in Section 2.
Mars Crater Study
Data Analysis Tools
Running a Chi-Square Test of Independence
-------------------------------------------
Info about this dataset
Total number of rows in the dataset: 384343
Total number of columns in the dataset: 10
Info about this subset and narrowed down  dataset
Total number of rows in the dataset: 76520
Total number of columns in the dataset: 5
-----------------------   LATTITUDE BIN SECTION   -----------------------------------------
Latitude bin counts:
-90_-50     9909
-50_-20    21375
-20_20     29397
20_50      12467
50_90       3372
Name: LAT_LOCATION_GROUP, dtype: int64
-----------------------   DEPTH MANAGEMENT SECTION  -----------------------------------------
Here are the counts for deep and shallow craters craters:
DEEP       47395
SHALLOW    29125
Name: IS_CRATER_DEEP, dtype: int64
---------- Categorical Eplanatory Variable: Latitude group
Response Variable Categorical 2 Levels: Crater Depth (shallow or deep)
************************************************
LAT_LOCATION_GROUP  -90_-50  -50_-20  -20_20  20_50  50_90
IS_CRATER_DEEP
SHALLOW                5507     6847    8449   5501   2821
DEEP                   4402    14528   20948   6966    551
LAT_LOCATION_GROUP   -90_-50   -50_-20   -20_20     20_50     50_90
IS_CRATER_DEEP
SHALLOW             0.555757  0.320327  0.28741  0.441245  0.836595
DEEP                0.444243  0.679673  0.71259  0.558755  0.163405
chi-square value, p value, expected counts
(5870.45780474349, 0.0, 4, array([[ 3771.55808939,  8135.74065604, 11189.06985102,  4745.18263199,  1283.44877156],
       [ 6137.44191061, 13239.25934396, 18207.93014898,  7721.81736801,  2088.55122844]]))
************************************************
COMP1-10        -50_-20  -90_-50
IS_CRATER_DEEP
SHALLOW            6847     5507
DEEP              14528     4402
COMP1-10         -50_-20   -90_-50
IS_CRATER_DEEP
SHALLOW         0.320327  0.555757
DEEP            0.679673  0.444243
chi-square value, p value, expected counts
(1569.4618848788969, 0.0, 1, array([[ 8440.95224396,  3913.04775604],
       [12934.04775604,  5995.95224396]]))
************************************************
COMP2-10        -20_-20  -90_-50
IS_CRATER_DEEP
SHALLOW            8449     5507
DEEP              20948     4402
COMP2-10        -20_-20   -90_-50
IS_CRATER_DEEP
SHALLOW         0.28741  0.555757
DEEP            0.71259  0.444243
chi-square value, p value, expected counts
(2329.314932557565, 0.0, 1, array([[10437.70752557,  3518.29247443],
       [18959.29247443,  6390.70752557]]))
************************************************
COMP3-10        -90_-50  20_50
IS_CRATER_DEEP
SHALLOW            5507   5501
DEEP               4402   6966
COMP3-10         -90_-50     20_50
IS_CRATER_DEEP
SHALLOW         0.555757  0.441245
DEEP            0.444243  0.558755
chi-square value, p value, expected counts
(289.20138905047946, 7.422790875630638e-65, 1, array([[4874.78870218, 6133.21129782],
       [5034.21129782, 6333.78870218]]))
************************************************
COMP4-10        -90_-50  50_90
IS_CRATER_DEEP
SHALLOW            5507   2821
DEEP               4402    551
COMP4-10         -90_-50     50_90
IS_CRATER_DEEP
SHALLOW         0.555757  0.836595
DEEP            0.444243  0.163405
chi-square value, p value, expected counts
(847.2982157758986, 2.811519087431208e-186, 1, array([[6213.54958211, 2114.45041789],
       [3695.45041789, 1257.54958211]]))
************************************************
COMP5-10        -20_20  -50_-20
IS_CRATER_DEEP
SHALLOW           8449     6847
DEEP             20948    14528
COMP5-10         -20_20   -50_-20
IS_CRATER_DEEP
SHALLOW         0.28741  0.320327
DEEP            0.71259  0.679673
chi-square value, p value, expected counts
(63.54773843933685, 1.5652720877138506e-15, 1, array([[ 8856.38761522,  6439.61238478],
       [20540.61238478, 14935.38761522]]))
************************************************
COMP6-10        -50_-20  20_50
IS_CRATER_DEEP
SHALLOW            6847   5501
DEEP              14528   6966
COMP6-10         -50_-20     20_50
IS_CRATER_DEEP
SHALLOW         0.320327  0.441245
DEEP            0.679673  0.558755
chi-square value, p value, expected counts
(496.2855186756665, 6.111846388145869e-110, 1, array([[ 7799.14012174,  4548.85987826],
       [13575.85987826,  7918.14012174]]))
************************************************
COMP7-10        -50_-20  50_90
IS_CRATER_DEEP
SHALLOW            6847   2821
DEEP              14528    551
COMP7-10         -50_-20     50_90
IS_CRATER_DEEP
SHALLOW         0.320327  0.836595
DEEP            0.679673  0.163405
chi-square value, p value, expected counts
(3258.881845816943, 0.0, 1, array([[ 8350.64856346,  1317.35143654],
       [13024.35143654,  2054.64856346]]))
************************************************
COMP8-10        -20_20  20_50
IS_CRATER_DEEP
SHALLOW           8449   5501
DEEP             20948   6966
COMP8-10         -20_20     20_50
IS_CRATER_DEEP
SHALLOW         0.28741  0.441245
DEEP            0.71259  0.558755
chi-square value, p value, expected counts
(931.7404706337541, 1.2358200405213006e-204, 1, array([[ 9795.72305561,  4154.27694439],
       [19601.27694439,  8312.72305561]]))
************************************************
COMP9-10        -20_20  50_90
IS_CRATER_DEEP
SHALLOW           8449   2821
DEEP             20948    551
COMP9-10         -20_20     50_90
IS_CRATER_DEEP
SHALLOW         0.28741  0.836595
DEEP            0.71259  0.163405
chi-square value, p value, expected counts
(4040.9902387527836, 0.0, 1, array([[10110.29295981,  1159.70704019],
       [19286.70704019,  2212.29295981]]))
************************************************
COMP10-10       20_50  50_90
IS_CRATER_DEEP
SHALLOW          5501   2821
DEEP             6966    551
COMP10-10          20_50     50_90
IS_CRATER_DEEP
SHALLOW         0.441245  0.836595
DEEP            0.558755  0.163405
chi-square value, p value, expected counts
(1662.095048322109, 0.0, 1, array([[6550.31087821, 1771.68912179],
       [5916.68912179, 1600.31087821]]))
zaworm · 5 years ago
Data Analysis Tools - Running an analysis of variance Week 1 Assignment
1.Background
I have decided to look at Mars craters and ask the following question:
A. Are shallower depth craters associated with locations near the North and South poles of Mars? The dataset was limited to craters that had a diameter of 100 km or less and a crater depth that was greater than 0 km. 
I divided up the latitude of the planet in 20 degree increments/bins to see if the mean crater depth differed significantly between these latitude groups.
2.Notes about the Results
2.1 The plot below shows the craters divided into latitude bins of 20 degree increments.  The majority of the craters are around the equator areas.
Tumblr media
2.2 ANOVA
I initially ran the ANOVA with the latitude bin category as the explanatory variable and crater depth as the response variable.
Null Hypothesis: There is no relationship between the latitude and the depth of the craters (All means are equal)
Alternate Hypothesis: There is a relationship between the latitude and the depth of the craters (not all means are equal)
The full output from the code is in section 4 below. The ANOVA gave an F-statistic of 490.9 with a p value reported as 0.00 (too small to show at that precision). This is below 0.05, so we reject the Null hypothesis: the mean crater depth is not the same in every latitude bin.
# using ols function for calculating the F-statistic and associated p value
model1 = smf.ols(formula='DEPTH_RIMFLOOR_TOPOG ~ C(LAT_LOCATION_GROUP)', data=sub2).fit()
F-statistic:                     490.9
Prob (F-statistic):               0.00
The means were not equal:
                    DEPTH_RIMFLOOR_TOPOG
LAT_LOCATION_GROUP
(-90, -70]                      0.286278
(-70, -50]                      0.295007
(-50, -30]                      0.358990
(-30, -10]                      0.460296
(-10, 10]                       0.425274
(10, 30]                        0.395404
(30, 50]                        0.294638
(50, 70]                        0.136315
(70, 90]                        0.135931
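The F-statistic is the ratio of the variance between the group means to the variance within the groups. The post computes it with statsmodels. The sketch below does the same arithmetic in plain Python, with a hypothetical function name `one_way_anova_f` and made-up depth values that are not taken from the data set.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of samples."""
    values = [x for group in groups for x in group]
    grand_mean = sum(values) / len(values)
    means = [sum(group) / len(group) for group in groups]
    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of each observation around its own group mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical crater depths in km for three latitude groups
polar = [0.10, 0.15, 0.12, 0.20]
mid = [0.30, 0.28, 0.35, 0.33]
equatorial = [0.45, 0.50, 0.40, 0.48]
f_stat, df_between, df_within = one_way_anova_f([polar, mid, equatorial])
print(f_stat, df_between, df_within)
```

With the 9 latitude bins and 76520 craters used in this post, the same formulas give 8 and 76511 degrees of freedom, which matches the Df Model and Df Residuals lines in the section 4 output.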
2.3 POST HOC TEST
The Tukey HSD post hoc test shows which pairs of latitude bins differ in mean crater depth.  Most pairs differ significantly, but for a few pairs the test does not reject equal means.  For example, the two bins covering the 40 degrees of latitude nearest the south pole, (-90, -70] and (-70, -50], have similar means, whereas comparing either of them with the bins around the equator shows clearly different means.  The code was run with the following syntax and the output is below.
mc1 = multi.MultiComparison(sub2['DEPTH_RIMFLOOR_TOPOG'], sub2['LAT_LOCATION_GROUP'])
res1 = mc1.tukeyhsd()
   Multiple Comparison of Means - Tukey HSD, FWER=0.05
===========================================================
  group1     group2   meandiff p-adj  lower   upper  reject
-----------------------------------------------------------
(-90, -70] (-70, -50]   0.0087   0.9 -0.0207  0.0381  False
(-90, -70] (-50, -30]   0.0727 0.001  0.0441  0.1013   True
(-90, -70] (-30, -10]    0.174 0.001  0.1459  0.2022   True
(-90, -70]  (-10, 10]    0.139 0.001  0.1106  0.1674   True
(-90, -70]   (10, 30]   0.1091 0.001  0.0805  0.1378   True
(-90, -70]   (30, 50]   0.0084   0.9 -0.0217  0.0384  False
(-90, -70]   (50, 70]    -0.15 0.001 -0.1838 -0.1161   True
(-90, -70]   (70, 90]  -0.1503 0.001 -0.2008 -0.0999   True
(-70, -50] (-50, -30]    0.064 0.001  0.0489  0.0791   True
(-70, -50] (-30, -10]   0.1653 0.001   0.151  0.1796   True
(-70, -50]  (-10, 10]   0.1303 0.001  0.1154  0.1451   True
(-70, -50]   (10, 30]   0.1004 0.001  0.0852  0.1156   True
(-70, -50]   (30, 50]  -0.0004   0.9 -0.0181  0.0174  False
(-70, -50]   (50, 70]  -0.1587 0.001 -0.1823  -0.135   True
(-70, -50]   (70, 90]  -0.1591 0.001 -0.2033 -0.1148   True
(-50, -30] (-30, -10]   0.1013 0.001  0.0888  0.1138   True
(-50, -30]  (-10, 10]   0.0663 0.001  0.0532  0.0793   True
(-50, -30]   (10, 30]   0.0364 0.001  0.0229  0.0499   True
(-50, -30]   (30, 50]  -0.0644 0.001 -0.0807  -0.048   True
(-50, -30]   (50, 70]  -0.2227 0.001 -0.2453 -0.2001   True
(-50, -30]   (70, 90]  -0.2231 0.001 -0.2667 -0.1794   True
(-30, -10]  (-10, 10]   -0.035 0.001 -0.0472 -0.0229   True
(-30, -10]   (10, 30]  -0.0649 0.001 -0.0775 -0.0522   True
(-30, -10]   (30, 50]  -0.1657 0.001 -0.1813   -0.15   True
(-30, -10]   (50, 70]   -0.324 0.001 -0.3461 -0.3019   True
(-30, -10]   (70, 90]  -0.3244 0.001 -0.3678  -0.281   True
 (-10, 10]   (10, 30]  -0.0299 0.001 -0.0431 -0.0166   True
 (-10, 10]   (30, 50]  -0.1306 0.001 -0.1467 -0.1145   True
 (-10, 10]   (50, 70]   -0.289 0.001 -0.3114 -0.2665   True
 (-10, 10]   (70, 90]  -0.2893 0.001 -0.3329 -0.2458   True
  (10, 30]   (30, 50]  -0.1008 0.001 -0.1172 -0.0843   True
  (10, 30]   (50, 70]  -0.2591 0.001 -0.2818 -0.2364   True
  (10, 30]   (70, 90]  -0.2595 0.001 -0.3032 -0.2158   True
  (30, 50]   (50, 70]  -0.1583 0.001 -0.1828 -0.1339   True
  (30, 50]   (70, 90]  -0.1587 0.001 -0.2034  -0.114   True
  (50, 70]   (70, 90]  -0.0004   0.9 -0.0477  0.0469  False
-----------------------------------------------------------
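One way to read the Tukey table: a pair of bins is flagged as different (reject = True) when the confidence interval for the difference in means, from lower to upper, does not contain zero. The sketch below applies that rule to a handful of rows copied from the table. The helper name `rejects_equal_means` is hypothetical and is not part of statsmodels.

```python
# (group1, group2, meandiff, lower, upper) copied from rows of the Tukey HSD table
rows = [
    ("(-90, -70]", "(-70, -50]", 0.0087, -0.0207, 0.0381),
    ("(-90, -70]", "(30, 50]", 0.0084, -0.0217, 0.0384),
    ("(-70, -50]", "(30, 50]", -0.0004, -0.0181, 0.0174),
    ("(-30, -10]", "(-10, 10]", -0.035, -0.0472, -0.0229),
    ("(50, 70]", "(70, 90]", -0.0004, -0.0477, 0.0469),
]


def rejects_equal_means(lower, upper):
    """True when the interval for the mean difference excludes zero."""
    return lower > 0 or upper < 0


not_different = [
    (g1, g2) for g1, g2, _, lo, hi in rows if not rejects_equal_means(lo, hi)
]
for pair in not_different:
    print(pair)
```

Of these five rows, only the (-30, -10] versus (-10, 10] pair has an interval that excludes zero, which agrees with the reject column.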
3.Raw Python Code:
The raw Python code is shown in the photos below.  I decided to use screenshots for easier readability as it includes syntax highlighting.
Tumblr media Tumblr media Tumblr media
4.Code Output:
Below is the raw output from the code.  The interpretation and comments on the results are shown above in Section 2.
Mars Crater Study
Data Analysis Tools
Running an analysis of variance
-------------------------------------------
Info about this dataset
Total number of rows in the dataset: 384343
Total number of columns in the dataset: 10
Info about this subset and narrowed down  dataset
Total number of rows in the dataset: 76520
Total number of columns in the dataset: 5
-----------------------   LATTITUDE BIN SECTION   -----------------------------------------
Latitude bin counts:
(-90, -70]     1585
(-70, -50]     8324
(-50, -30]    12919
(-30, -10]    17375
(-10, 10]     14180
(10, 30]      12280
(30, 50]       6485
(50, 70]       2738
(70, 90]        634
Name: LAT_LOCATION_GROUP, dtype: int64
Here are the counts for craters in 20 degree bins based on latitude
(-30, -10]    17375
(-10, 10]     14180
(-50, -30]    12919
(10, 30]      12280
(-70, -50]     8324
(30, 50]       6485
(50, 70]       2738
(-90, -70]     1585
(70, 90]        634
Name: LAT_LOCATION_GROUP, dtype: int64
Here are the counts for craters in 20 degree bins based on latitude
(-30, -10]    0.227065
(-10, 10]     0.185311
(-50, -30]    0.168832
(10, 30]      0.160481
(-70, -50]    0.108782
(30, 50]      0.084749
(50, 70]      0.035781
(-90, -70]    0.020714
(70, 90]      0.008285
Name: LAT_LOCATION_GROUP, dtype: float64
OLS Regression
Eplanatory Variable: Latitude group
Response Variable: Crater Depth
                            OLS Regression Results
================================================================================
Dep. Variable:     DEPTH_RIMFLOOR_TOPOG   R-squared:                       0.049
Model:                              OLS   Adj. R-squared:                  0.049
Method:                   Least Squares   F-statistic:                     490.9
Date:                  Thu, 25 Jun 2020   Prob (F-statistic):               0.00
Time:                          11:21:22   Log-Likelihood:                -27382.
No. Observations:                 76520   AIC:                         5.478e+04
Df Residuals:                     76511   BIC:                         5.486e+04
Df Model:                             8
Covariance Type:              nonrobust
===============================================================================================================================
                                                                  coef    std err          t      P>|t|      [0.025      0.975]
-------------------------------------------------------------------------------------------------------------------------------
Intercept                                                       0.2863      0.009     32.931      0.000       0.269       0.303
C(LAT_LOCATION_GROUP)[T.Interval(-70, -50, closed='right')]     0.0087      0.009      0.920      0.357      -0.010       0.027
C(LAT_LOCATION_GROUP)[T.Interval(-50, -30, closed='right')]     0.0727      0.009      7.894      0.000       0.055       0.091
C(LAT_LOCATION_GROUP)[T.Interval(-30, -10, closed='right')]     0.1740      0.009     19.163      0.000       0.156       0.192
C(LAT_LOCATION_GROUP)[T.Interval(-10, 10, closed='right')]      0.1390      0.009     15.164      0.000       0.121       0.157
C(LAT_LOCATION_GROUP)[T.Interval(10, 30, closed='right')]       0.1091      0.009     11.814      0.000       0.091       0.127
C(LAT_LOCATION_GROUP)[T.Interval(30, 50, closed='right')]       0.0084      0.010      0.862      0.389      -0.011       0.027
C(LAT_LOCATION_GROUP)[T.Interval(50, 70, closed='right')]      -0.1500      0.011    -13.729      0.000      -0.171      -0.129
C(LAT_LOCATION_GROUP)[T.Interval(70, 90, closed='right')]      -0.1503      0.016     -9.244      0.000      -0.182      -0.118
==============================================================================
Omnibus:                    33558.123   Durbin-Watson:                   1.257
Prob(Omnibus):                  0.000   Jarque-Bera (JB):           189495.964
Skew:                           2.067   Prob(JB):                         0.00
Kurtosis:                       9.508   Cond. No.                         23.1
==============================================================================
Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
means for crater depth by latitude group
                    DEPTH_RIMFLOOR_TOPOG
LAT_LOCATION_GROUP
(-90, -70]                      0.286278
(-70, -50]                      0.295007
(-50, -30]                      0.358990
(-30, -10]                      0.460296
(-10, 10]                       0.425274
(10, 30]                        0.395404
(30, 50]                        0.294638
(50, 70]                        0.136315
(70, 90]                        0.135931
standard deviations for crater depth by latitude group
                    DEPTH_RIMFLOOR_TOPOG
LAT_LOCATION_GROUP
(-90, -70]                      0.331280
(-70, -50]                      0.353152
(-50, -30]                      0.352421
(-30, -10]                      0.363237
(-10, 10]                       0.353117
(10, 30]                        0.345958
(30, 50]                        0.311536
(50, 70]                        0.232491
(70, 90]                        0.254736
Post Hoc Analysis
   Multiple Comparison of Means - Tukey HSD, FWER=0.05
===========================================================
  group1     group2   meandiff p-adj  lower   upper  reject
-----------------------------------------------------------
(-90, -70] (-70, -50]   0.0087   0.9 -0.0207  0.0381  False
(-90, -70] (-50, -30]   0.0727 0.001  0.0441  0.1013   True
(-90, -70] (-30, -10]    0.174 0.001  0.1459  0.2022   True
(-90, -70]  (-10, 10]    0.139 0.001  0.1106  0.1674   True
(-90, -70]   (10, 30]   0.1091 0.001  0.0805  0.1378   True
(-90, -70]   (30, 50]   0.0084   0.9 -0.0217  0.0384  False
(-90, -70]   (50, 70]    -0.15 0.001 -0.1838 -0.1161   True
(-90, -70]   (70, 90]  -0.1503 0.001 -0.2008 -0.0999   True
(-70, -50] (-50, -30]    0.064 0.001  0.0489  0.0791   True
(-70, -50] (-30, -10]   0.1653 0.001   0.151  0.1796   True
(-70, -50]  (-10, 10]   0.1303 0.001  0.1154  0.1451   True
(-70, -50]   (10, 30]   0.1004 0.001  0.0852  0.1156   True
(-70, -50]   (30, 50]  -0.0004   0.9 -0.0181  0.0174  False
(-70, -50]   (50, 70]  -0.1587 0.001 -0.1823  -0.135   True
(-70, -50]   (70, 90]  -0.1591 0.001 -0.2033 -0.1148   True
(-50, -30] (-30, -10]   0.1013 0.001  0.0888  0.1138   True
(-50, -30]  (-10, 10]   0.0663 0.001  0.0532  0.0793   True
(-50, -30]   (10, 30]   0.0364 0.001  0.0229  0.0499   True
(-50, -30]   (30, 50]  -0.0644 0.001 -0.0807  -0.048   True
(-50, -30]   (50, 70]  -0.2227 0.001 -0.2453 -0.2001   True
(-50, -30]   (70, 90]  -0.2231 0.001 -0.2667 -0.1794   True
(-30, -10]  (-10, 10]   -0.035 0.001 -0.0472 -0.0229   True
(-30, -10]   (10, 30]  -0.0649 0.001 -0.0775 -0.0522   True
(-30, -10]   (30, 50]  -0.1657 0.001 -0.1813   -0.15   True
(-30, -10]   (50, 70]   -0.324 0.001 -0.3461 -0.3019   True
(-30, -10]   (70, 90]  -0.3244 0.001 -0.3678  -0.281   True
 (-10, 10]   (10, 30]  -0.0299 0.001 -0.0431 -0.0166   True
 (-10, 10]   (30, 50]  -0.1306 0.001 -0.1467 -0.1145   True
 (-10, 10]   (50, 70]   -0.289 0.001 -0.3114 -0.2665   True
 (-10, 10]   (70, 90]  -0.2893 0.001 -0.3329 -0.2458   True
  (10, 30]   (30, 50]  -0.1008 0.001 -0.1172 -0.0843   True
  (10, 30]   (50, 70]  -0.2591 0.001 -0.2818 -0.2364   True
  (10, 30]   (70, 90]  -0.2595 0.001 -0.3032 -0.2158   True
  (30, 50]   (50, 70]  -0.1583 0.001 -0.1828 -0.1339   True
  (30, 50]   (70, 90]  -0.1587 0.001 -0.2034  -0.114   True
  (50, 70]   (70, 90]  -0.0004   0.9 -0.0477  0.0469  False
-----------------------------------------------------------
zaworm · 5 years ago
Data Management and Visualization - Week 4 - Creating Graphs Week 4 Assignment
1.Background
I have decided to look at Mars craters and ask the following questions:
A. Is the depth dependent on the diameter of the crater?
B. Are shallower craters associated with locations near the North and South poles of Mars? (close to the poles is defined as within 40 degrees of latitude of a pole, i.e. beyond 50 degrees north or south)
The dataset was limited to craters that had a diameter of 100 km or less and a crater depth that was greater than 0 km.
My code creates frequency distributions for the variables that will be used to answer these questions and computes correlations between them.  I am particularly interested in the crater latitude, the diameter and the depth of the crater.  The plots are shown in Section 2.
2.Notes about the Results
2.1 The plot below shows the craters divided into those located close to the poles and those located closer to the equator.  The majority of craters are not located close to the poles.
Tumblr media
2.2 The plot below shows the count of craters by latitude; most of the craters are located just south of the equator.  The distribution is unimodal.  The center is just south of the equator and the spread covers the entire latitude range, showing that there are craters all over the planet.
Tumblr media
2.3 The plot below shows the count of craters binned into groups of 10 degrees of latitude; again most of the craters are located just south of the equator.  The distribution is unimodal and this is a binned version of the plot above.  The spread covers the entire latitude range of the planet.
Tumblr media
2.4 The plot below shows the distribution of the crater diameters.  The plot is skewed right.  The standard deviation is about 12 km and the maximum diameter is just under 100 km.
Tumblr media
2.5 The plot below shows the distribution of the crater depths.  The plot is skewed right with the center located just above 0.  The majority of the craters are under 1 km in depth, with the mode only slightly above 0 km.
Tumblr media
2.6 The plot below is a scatter plot of the association between the diameter and the depth of the crater.  The explanatory variable is the crater diameter and the response variable is the crater depth.  The plot shows a general trend towards deeper craters as the crater diameter increases, i.e. a positive relationship.
Tumblr media
2.7 The plot below is the same scatter plot of diameter against depth with a best fit line drawn through the data.  The line shows that as the diameter of the crater increases, the depth of the crater increases.
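A best fit line of this kind is usually an ordinary least-squares fit. The author's plotting code is only shown as screenshots, so the sketch below is an assumption about what such a line computes, not the code that drew the plot. The function name `fit_line` and the sample values are hypothetical.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical (diameter km, depth km) pairs, for illustration only
diameters = [2.0, 5.0, 10.0, 20.0, 40.0]
depths = [0.10, 0.25, 0.40, 0.80, 1.50]
slope, intercept = fit_line(diameters, depths)
print(f"depth = {slope:.4f} * diameter + {intercept:.4f}")
```

A positive slope corresponds to the positive relationship described above: larger craters tend to be deeper.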
Tumblr media
2.8 The plot below shows a scatter plot comparing the latitude of the crater with the depth of the crater.  The plot shows that craters near the poles are shallower, as shown by the low y values at the extremes of the latitude axis.
Tumblr media
More output is shown in section 4, where the distributions are described in the output from the Python code.
3.Raw Python Code:
The raw Python code is shown in the photos below.  I decided to use screenshots for easier readability as it includes syntax highlighting.
Tumblr media Tumblr media Tumblr media Tumblr media
4.Code Output:
Below is the raw output from the code.  The interpretation and comments on the results are shown above in Section 2.
Mars Crater Study
Info about this dataset
Total number of rows in the dataset: 384343
Total number of columns in the dataset: 10
-----------------------   NEAR POLE OF PLANET SECTION   -----------------------------------------
Here are the counts for craters close to the pole =1 for close to poles
0    63239
1    13281
Name: IS_NEAR_POLE, dtype: int64
Here are the percentages for craters close to the pole =1 for close to poles
0    0.826438
1    0.173562
Name: IS_NEAR_POLE, dtype: float64
-----------------------   LATTITUDE BIN SECTION   -----------------------------------------
Latitude bin counts:
(-90, -80]     100
(-80, -70]    1485
(-70, -60]    3452
(-60, -50]    4872
(-50, -40]    5353
(-40, -30]    7566
(-30, -20]    8456
(-20, -10]    8919
(-10, 0]      7780
(0, 10]       6400
(10, 20]      6298
(20, 30]      5982
(30, 40]      4259
(40, 50]      2226
(50, 60]      1451
(60, 70]      1287
(70, 80]       604
(80, 90]        30
Name: LATITUDE_CIRCLE_IMAGE_10, dtype: int64
Latitude Described:
count    76520.000000
mean        -9.997340
std         33.599395
min        -86.700000
25%        -34.660500
50%        -12.159000
75%         14.828000
max         85.702000
Name: LATITUDE_CIRCLE_IMAGE, dtype: float64
Latitude Binned Described:
count          76520
unique            18
top       (-20, -10]
freq            8919
Name: LATITUDE_CIRCLE_IMAGE_10, dtype: object
-----------------------   CRATER DIAMETER SECTION  -----------------------------------------
------- Diameter Bins -----------
[ 1.06    3.58    5.85   12.0225 99.97  ]
Crater Diameter - 4 categories - quartiles
1=0%tile     19225
2=25%tile    19054
3=50%tile    19111
4=75%tile    19130
Name: DIAM_CIRCLE_4, dtype: int64
--------------------
DIAM_CIRCLE_IMAGE  1.06   1.08   1.10   1.11   ...  99.33  99.35  99.92  99.97
DIAM_CIRCLE_4                                  ...
1=0%tile               1      1      1      1  ...      0      0      0      0
2=25%tile              0      0      0      0  ...      0      0      0      0
3=50%tile              0      0      0      0  ...      0      0      0      0
4=75%tile              0      0      0      0  ...      2      1      1      1
[4 rows x 5889 columns]
-----------------------   CRATER DEPTH SECTION  -----------------------------------------
------- Depth Bins -----------
[0.01 0.26 4.95]
Crater Depth - 2 halves - lower and upper bins
1=Lower    38378
2=Upper    38142
Name: DEPTH_RIMFLOOR_TOPOG_4, dtype: int64
--------------------
DEPTH_RIMFLOOR_TOPOG    0.01  0.02  0.03  0.04  ...  3.80  4.01  4.75  4.95
DEPTH_RIMFLOOR_TOPOG_4                          ...
1=Lower                  404   862  1301  1644  ...     0     0     0     0
2=Upper                    0     0     0     0  ...     1     1     1     1
[2 rows x 280 columns]
All Depths Described:
count    76520.000000
mean         0.376384
std          0.354846
min          0.010000
25%          0.120000
50%          0.260000
75%          0.520000
max          4.950000
Name: DEPTH_RIMFLOOR_TOPOG, dtype: float64
-----------------------   HYPOTHESIS SECTION  -----------------------------------------
----Diameter described:
count    76520.000000
mean        10.537621
std         12.353504
min          1.060000
25%          3.580000
50%          5.850000
75%         12.022500
max         99.970000
Name: DIAM_CIRCLE_IMAGE, dtype: float64
----Depth described:
count    76520.000000
mean         0.376384
std          0.354846
min          0.010000
25%          0.120000
50%          0.260000
75%          0.520000
max          4.950000
Name: DEPTH_RIMFLOOR_TOPOG, dtype: float64
----Depth described:
count    76520.000000
mean        -9.997340
std         33.599395
min        -86.700000
25%        -34.660500
50%        -12.159000
75%         14.828000
max         85.702000
Name: LATITUDE_CIRCLE_IMAGE, dtype: float64
zaworm · 5 years ago
Data Management and Visualization - Week 3 - Making Data Management Decisions Mars Crater Study
Week 3 Assignment
1.Background
I have decided to look at Mars craters and ask the following questions:
A. Is the depth dependent on the diameter of the crater?
B. Are shallower craters associated with locations near the North and South poles of Mars? (close to the poles is defined as within 40 degrees of latitude of a pole, i.e. beyond 50 degrees north or south)
My code creates frequency distributions for the variables that will be used to answer these questions.  I am particularly interested in the crater latitude, the diameter and the depth of the crater.
For this reason the data management portion focuses on making decisions for these variables.
2.Notes about the Results
2.1 Locations
As can be imagined, the distribution of locations varies widely: the craters are spread across the planet and it is unlikely that multiple craters hit exactly the same location.  For this reason, I will focus on 2 particular regions, those close to the planet's poles and those near the equator.  I have created a new variable that identifies whether the crater is:
a. Near the North Pole
b. Near the South Pole
c. Near the Equator
The new values for these are 1, -1 and 0 respectively. The frequency counts show that the majority of the craters are not near the poles.  The distribution of values is:
 0    322844
-1     39900
 1     21599
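The three-way coding can be sketched as a small function. The name `near_pole_code` is hypothetical, and the 50-degree cutoff is an assumption carried over from the Week 2 code further down this page, which selects polar craters with latitudes below -50 or above 50. The bin counts used in the check are copied from the latitude bin output in section 4 of this post.

```python
POLE_LIMIT = 50.0  # degrees; within 40 degrees of a pole means |latitude| > 50


def near_pole_code(lat):
    """Return 1 near the north pole, -1 near the south pole, 0 otherwise."""
    if lat > POLE_LIMIT:
        return 1
    if lat < -POLE_LIMIT:
        return -1
    return 0


# Consistency check against the 10-degree bin counts in the code output:
# the four bins beyond 50 degrees on each side should add up to the
# -1 and 1 frequencies reported above.
south_bins = [631, 6984, 13527, 18758]   # (-90, -80] up to (-60, -50]
north_bins = [10801, 7974, 2780, 44]     # (50, 60] up to (80, 90]
print(sum(south_bins), sum(north_bins))
print(near_pole_code(84.367), near_pole_code(-60.0), near_pole_code(0.0))
```

The bin sums agree with the 39900 and 21599 counts, which supports the 50-degree reading of the definition.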
I have also chosen to divide the planet up into 10 degree latitude bins. This resulted in 18 bins of 10 degree increments ranging from -90 to +90 latitude.  The results showed that the bins with the most craters lie between -30 and 0 degrees latitude:
(-30, -20]    46504
(-20, -10]    46158
(-10, 0]      40921
Near the poles the counts were much lower, with the bins at the extreme poles showing only a small count and percentage:
(-90, -80]      631
(80, 90]         44
2.2 Diameters
The diameters of the craters were investigated and put into 4 quartile bins.  The bin edges are shown in the following output, which is repeated in section 4.  It shows that the first three quartiles (percentiles 0-75) only span diameters from 1 km to 2.55 km, while the top quartile contains craters as large as 1164 km.  Three quarters of the craters are therefore 2.55 km or less in diameter.
------- Diameter Bins -----------
[1.00000e+00 1.18000e+00 1.53000e+00 2.55000e+00 1.16422e+03]
Crater Diameter - 4 categories - quartiles
1=0%tile     96308
2=25%tile    97609
3=50%tile    94359
4=75%tile    96067
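Given the bin edges printed above, assigning a crater to a quartile is a lookup against those edges. The sketch below uses the standard library instead of pandas; the function name `diameter_quartile` is hypothetical, and the right-closed bins with the lowest edge included are an assumption about how the original binning was done.

```python
from bisect import bisect_left

EDGES = [1.0, 1.18, 1.53, 2.55, 1164.22]  # bin edges printed in the output
LABELS = ["1=0%tile", "2=25%tile", "3=50%tile", "4=75%tile"]


def diameter_quartile(diameter_km):
    """Return the quartile label for a crater diameter using right-closed bins."""
    if not EDGES[0] <= diameter_km <= EDGES[-1]:
        raise ValueError("diameter outside the range of the bin edges")
    # bisect_left finds the first edge >= diameter; the lowest edge itself
    # is placed in the first bin
    index = max(bisect_left(EDGES, diameter_km), 1)
    return LABELS[index - 1]


for d in (1.0, 1.18, 2.0, 624.5):
    print(d, diameter_quartile(d))
```

A 624.5 km crater lands in the top quartile, as it does in the crosstab in section 4.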
2.3 Depths
Most of the craters had a recorded depth of 0 km.  The depths of the craters were put into 2 bins that ranged from -0.42 km to 0 km and from 0 km to 4.95 km.  The frequency count is shown below and shows that the vast majority fall in the lower bin, i.e. are recorded as having a depth of 0 km or less.
Crater Depth - 2 halves - lower and upper bins
1=Lower    307539
2=Upper     76804
2.4 Depth Management
To further classify the depths of the craters a new variable was created for each row, classifying the crater as -1, 0 or 1.  This classification meant:
-1 crater depth is a negative value, i.e. a peak in the crater
0 crater depth noted as 0 km
+1 crater is a regular crater depression
The results of the frequency distribution show the following:
 0    307529
 1     76804
-1        10
This shows that the majority of the craters are marked with a depth of 0 km and only 10 of 384343 craters have peaks, making this occurrence rare. There is no sign that the 0s represent missing data; they appear to be just shallow craters.
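The depth coding is a sign test on the depth value. The sketch below uses a hypothetical function name `depth_class`; the sample depths are the ten DEPTH_RIMFLOOR_TOPOG values shown in the first-10-rows sample in section 4.

```python
from collections import Counter


def depth_class(depth_km):
    """Return -1 for a negative depth (peak), 0 for zero depth, 1 for a depression."""
    if depth_km < 0:
        return -1
    if depth_km > 0:
        return 1
    return 0


# Depths of the first 10 craters listed in the code output
sample_depths = [0.22, 1.97, 0.09, 0.13, 0.11, 0.19, 0.10, 0.05, 0.11, 0.00]
codes = [depth_class(d) for d in sample_depths]
print(codes)
print(Counter(codes))
```

Nine of the ten sample craters are coded 1 and the last is coded 0, matching the HAS_CRATER_DEPTH column in the section 4 sample.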
3.Raw Python Code:
The raw Python code is shown in the photos below.  I decided to use screenshots for easier readability as it includes syntax highlighting.
Tumblr media Tumblr media Tumblr media
4.Code Output:
Mars Crater Study
Info about this dataset
Total number of rows in the dataset: 384343
Total number of columns in the dataset: 10
-----------------------   NEAR POLE OF PLANET SECTION   -----------------------------------------
Here is a sample of the first 10 rows
   CRATER_ID  LATITUDE_CIRCLE_IMAGE  ...  DEPTH_RIMFLOOR_TOPOG  IS_NEAR_POLE
0  01-000000                 84.367  ...                  0.22             1
1  01-000001                 72.760  ...                  1.97             1
2  01-000002                 69.244  ...                  0.09             1
3  01-000003                 70.107  ...                  0.13             1
4  01-000004                 77.996  ...                  0.11             1
5  01-000005                 68.547  ...                  0.19             1
6  01-000006                 69.492  ...                  0.10             1
7  01-000007                 78.716  ...                  0.05             1
8  01-000008                 75.539  ...                  0.11             1
9  01-000009                 69.371  ...                  0.00             1
[10 rows x 6 columns]
Here are the counts for craters close to the pole =1 for close to poles
 0    322844
-1     39900
 1     21599
Name: IS_NEAR_POLE, dtype: int64
-----------------------   LATTITUDE BIN SECTION   -----------------------------------------
Latitude bin counts:
(-90, -80]      631
(-80, -70]     6984
(-70, -60]    13527
(-60, -50]    18758
(-50, -40]    25396
(-40, -30]    34577
(-30, -20]    46504
(-20, -10]    46158
(-10, 0]      40921
(0, 10]       32362
(10, 20]      30411
(20, 30]      28990
(30, 40]      23365
(40, 50]      14160
(50, 60]      10801
(60, 70]       7974
(70, 80]       2780
(80, 90]         44
Name: LATITUDE_CIRCLE_IMAGE_10, dtype: int64
-----------------------   CRATER DIAMETER SECTION  -----------------------------------------
------- Diameter Bins -----------
[1.00000e+00 1.18000e+00 1.53000e+00 2.55000e+00 1.16422e+03]
Crater Diameter - 4 categories - quartiles
1=0%tile     96308
2=25%tile    97609
3=50%tile    94359
4=75%tile    96067
Name: DIAM_CIRCLE_4, dtype: int64
--------------------
DIAM_CIRCLE_IMAGE  1.00     1.01     1.02     ...  624.50   1096.65  1164.22
DIAM_CIRCLE_4                                 ...
1=0%tile              3129     6298     6077  ...        0        0        0
2=25%tile                0        0        0  ...        0        0        0
3=50%tile                0        0        0  ...        0        0        0
4=75%tile                0        0        0  ...        1        1        1
[4 rows x 6240 columns]
-----------------------   CRATER DEPTH SECTION  -----------------------------------------
------- Depth Bins -----------
[-0.42  0.    4.95]
Crater Depth - 2 halves - lower and upper bins
1=Lower    307539
2=Upper     76804
Name: DEPTH_RIMFLOOR_TOPOG_4, dtype: int64
--------------------
DEPTH_RIMFLOOR_TOPOG    -0.42  -0.03  -0.02  -0.01  ...   4.01   4.72   4.75   4.95
DEPTH_RIMFLOOR_TOPOG_4                              ...
1=Lower                     1      2      4      3  ...      0      0      0      0
2=Upper                     0      0      0      0  ...      1      1      1      1
[2 rows x 296 columns]
-----------------------   DEPTH MANAGEMENT SECTION  -----------------------------------------
Here is a sample of the first 10 rows
   CRATER_ID  LATITUDE_CIRCLE_IMAGE  ...  DEPTH_RIMFLOOR_TOPOG_4  HAS_CRATER_DEPTH
0  01-000000                 84.367  ...                 2=Upper                 1
1  01-000001                 72.760  ...                 2=Upper                 1
2  01-000002                 69.244  ...                 2=Upper                 1
3  01-000003                 70.107  ...                 2=Upper                 1
4  01-000004                 77.996  ...                 2=Upper                 1
5  01-000005                 68.547  ...                 2=Upper                 1
6  01-000006                 69.492  ...                 2=Upper                 1
7  01-000007                 78.716  ...                 2=Upper                 1
8  01-000008                 75.539  ...                 2=Upper                 1
9  01-000009                 69.371  ...                 1=Lower                 0
[10 rows x 10 columns]
Here are the counts for craters with a depression (+ or - depth)
 0    307529
 1     76804
-1        10
Name: HAS_CRATER_DEPTH, dtype: int64
zaworm · 5 years ago
Week 2 Assignment
Background
I have decided to look at Mars craters and ask the following questions:
1. Is the depth dependent on the diameter of the crater?
2. Are shallower craters associated with locations near the North and South poles of Mars? (close to the poles is defined as within 40 degrees of latitude of a pole, i.e. beyond 50 degrees north or south)
My code creates frequency distributions (in count and percentage) for the entire data set first.  I output the location of the crater in latitude, the diameter of the crater and the depth of the crater.
I have also created a subset of data that divides the data into datasets close to the equator and those that are close to the poles.  This will allow me to compare the two datasets.
Notes about the Results
As can be imagined, the distribution varies widely, because the craters are spread across the planet and it is unlikely that multiple craters hit exactly the same location.  For this reason, the frequency of craters at any single location is low, under 1 percent.  I have also chosen to sort the distribution so that I can see the most frequent locations, diameters and depths.
Craters close to the poles appear shallow: 78% of these craters have a depth of 0 km.
The most frequent diameter in the dataset was 1.01 km, which occurred 1.6% of the time.  The most frequently impacted latitude on the planet was -23.6 degrees.
There were no gaps in the data used.
Data is organized in the output as follows:
-All Raw Data
-Subset for craters close to the equator
-Subset for craters close to the poles
For each of these I output the count and percentage for the latitude, diameter and depth of the crater.
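The counts and percentages described above are sorted frequency tables, which the code below produces with pandas `value_counts(sort=True)` and `normalize=True`. As a rough illustration of what that returns, here is a plain-Python sketch. The function name `frequency_table` and the sample values are hypothetical.

```python
from collections import Counter


def frequency_table(values):
    """Return (value, count, fraction) tuples from most to least frequent."""
    counts = Counter(values)
    total = len(values)
    return [(value, count, count / total) for value, count in counts.most_common()]


# Hypothetical crater diameters in km, for illustration only
sample = [1.01, 1.02, 1.01, 3.5, 1.01, 1.02]
for value, count, fraction in frequency_table(sample):
    print(f"{value:>6}  {count:>3}  {fraction:.3f}")
```

Sorting by frequency puts the most common value first, which is how the most frequent diameter and latitude quoted above were read off.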
Raw Python Code:
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
************  MARS CRATER STUDY  ************
Created on Wed Jun 17 15:47:15 2020
@author: tj
"""
#libraries used in this code
import pandas
import numpy
print("\nMars Crater Study\n")
#load in the mars crater dataset
dataMarsCraterRaw = pandas.read_csv('marscrater_pds.csv', low_memory=False)
print("Total number of rows in the dataset:")
print(len(dataMarsCraterRaw))
print("\nTotal number of columns in the dataset:")
print(len(dataMarsCraterRaw.columns))
"""
The data set contains the following columns, For my study I require only the 5 columns that have an *
*CRATER_ID
CRATER_NAME
*LATITUDE_CIRCLE_IMAGE
*LONGITUDE_CIRCLE_IMAGE
*DIAM_CIRCLE_IMAGE
*DEPTH_RIMFLOOR_TOPOG
MORPHOLOGY_EJECTA_1
MORPHOLOGY_EJECTA_2
MORPHOLOGY_EJECTA_3
NUMBER_LAYERS
"""
#ensure data is read is as numeric values and not text
dataMarsCraterRaw['LATITUDE_CIRCLE_IMAGE'] = pandas.to_numeric(dataMarsCraterRaw['LATITUDE_CIRCLE_IMAGE'])
dataMarsCraterRaw['LONGITUDE_CIRCLE_IMAGE'] = pandas.to_numeric(dataMarsCraterRaw['LONGITUDE_CIRCLE_IMAGE'])
dataMarsCraterRaw['DIAM_CIRCLE_IMAGE'] = pandas.to_numeric(dataMarsCraterRaw['DIAM_CIRCLE_IMAGE'])
dataMarsCraterRaw['DEPTH_RIMFLOOR_TOPOG'] = pandas.to_numeric(dataMarsCraterRaw['DEPTH_RIMFLOOR_TOPOG'])
print("\n***************  ALL RAW DATA  ***************")
# As the data is very unordered and craters are less likely to be in the same spot I have decided to sort the distribution
# so that we can identify the most populous data
print("\n***************  LATITUDE  ***************")
print("Counts for the Latitude of the craters:")
distLatCnt = dataMarsCraterRaw["LATITUDE_CIRCLE_IMAGE"].value_counts(dropna=False,sort=True)
print(distLatCnt)
print("\nPercentages for the Latitude of the craters:")
distLatPrc = dataMarsCraterRaw["LATITUDE_CIRCLE_IMAGE"].value_counts(dropna=False,sort=True, normalize=True)
print(distLatPrc)
"""
May want to look at longitude later in the study so this is a placeholder for now.
print("\n***************  LONGITUDE  ***************")
print("Counts for the Longitude of the craters:")
distLongCnt = dataMarsCraterRaw["LONGITUDE_CIRCLE_IMAGE"].value_counts(dropna=False,sort=True)
print(distLongCnt)
print("Percentages for the Longitude of the craters:")
distLongPrc = dataMarsCraterRaw["LONGITUDE_CIRCLE_IMAGE"].value_counts(dropna=False,sort=True, normalize=True)
print(distLongPrc)
"""
print("\n***************  DIAMETER  ***************")
print("Counts for the Diameter of the craters:")
distDiamCnt = dataMarsCraterRaw["DIAM_CIRCLE_IMAGE"].value_counts(dropna=False,sort=True)
print(distDiamCnt)
print("\nPercentages for the Diameter of the craters:")
distDiamPrc = dataMarsCraterRaw["DIAM_CIRCLE_IMAGE"].value_counts(dropna=False,sort=True, normalize=True)
print(distDiamPrc)
print("\n***************  DEPTH  ***************")
print("Counts for the Depth of the craters:")
distDepthCnt = dataMarsCraterRaw["DEPTH_RIMFLOOR_TOPOG"].value_counts(dropna=False,sort=True)
print(distDepthCnt)
print("\nPercentages for the Depth of the craters:")
distDepthPrc = dataMarsCraterRaw["DEPTH_RIMFLOOR_TOPOG"].value_counts(dropna=False,sort=True, normalize=True)
print(distDepthPrc)
# The study will compare craters close to the equator with those close to the poles.
# "Close to the poles" is defined as within 40 degrees of a pole, i.e. |latitude| > 50 degrees,
# so let's create subsets of data for the comparison
print("\n***************  SUBSET OF DATA  ***************")
dataMarsCraterEquator = dataMarsCraterRaw[
    (dataMarsCraterRaw['LATITUDE_CIRCLE_IMAGE'] >= -50)
    & (dataMarsCraterRaw['LATITUDE_CIRCLE_IMAGE'] <= 50)
]
print("Total number of rows in the close-to-equator dataset:")
print(len(dataMarsCraterEquator))
# create a data subset for craters close to the north and south poles (within 40 degrees of a pole)
dataMarsCraterPoles = dataMarsCraterRaw[
    (dataMarsCraterRaw['LATITUDE_CIRCLE_IMAGE'] < -50)
    | (dataMarsCraterRaw['LATITUDE_CIRCLE_IMAGE'] > 50)
]
print("Total number of rows in the close-to-poles dataset:")
print(len(dataMarsCraterPoles))
# The row counts of these two subsets should sum to the total number of rows in the original raw data table
print("\n***************  CRATERS CLOSE TO THE EQUATOR  ***************")
print("\n***************  LATITUDE for EQUATOR SUBSET ***************")
print("Counts for the Latitude of the craters near the equator:")
distELatCnt = dataMarsCraterEquator["LATITUDE_CIRCLE_IMAGE"].value_counts(dropna=False, sort=True)
print(distELatCnt)
print("\nPercentages for the Latitude of the craters near the equator:")
distELatPrc = dataMarsCraterEquator["LATITUDE_CIRCLE_IMAGE"].value_counts(dropna=False, sort=True, normalize=True)
print(distELatPrc)
print("\n***************  DIAMETER for EQUATOR SUBSET ***************")
print("Counts for the Diameter of the craters near the equator:")
distEDiamCnt = dataMarsCraterEquator["DIAM_CIRCLE_IMAGE"].value_counts(dropna=False, sort=True)
print(distEDiamCnt)
print("\nPercentages for the Diameter of the craters near the equator:")
distEDiamPrc = dataMarsCraterEquator["DIAM_CIRCLE_IMAGE"].value_counts(dropna=False, sort=True, normalize=True)
print(distEDiamPrc)
print("\n***************  DEPTH for EQUATOR SUBSET ***************")
print("Counts for the Depth of the craters near the equator:")
distEDepthCnt = dataMarsCraterEquator["DEPTH_RIMFLOOR_TOPOG"].value_counts(dropna=False, sort=True)
print(distEDepthCnt)
print("\nPercentages for the Depth of the craters near the equator:")
distEDepthPrc = dataMarsCraterEquator["DEPTH_RIMFLOOR_TOPOG"].value_counts(dropna=False, sort=True, normalize=True)
print(distEDepthPrc)
print("\n***************  CRATERS CLOSE TO THE POLES  ***************")
print("\n***************  LATITUDE for CLOSE-TO-POLES SUBSET ***************")
print("Counts for the Latitude of the craters close to the poles:")
distPLatCnt = dataMarsCraterPoles["LATITUDE_CIRCLE_IMAGE"].value_counts(dropna=False, sort=True)
print(distPLatCnt)
print("\nPercentages for the Latitude of the close to the poles:")
distPLatPrc = dataMarsCraterPoles["LATITUDE_CIRCLE_IMAGE"].value_counts(dropna=False, sort=True, normalize=True)
print(distPLatPrc)
print("\n***************  DIAMETER for CLOSE-TO-POLES SUBSET ***************")
print("Counts for the Diameter of the craters close to the poles:")
distPDiamCnt = dataMarsCraterPoles["DIAM_CIRCLE_IMAGE"].value_counts(dropna=False, sort=True)
print(distPDiamCnt)
print("\nPercentages for the Diameter of the craters close to the poles:")
distPDiamPrc = dataMarsCraterPoles["DIAM_CIRCLE_IMAGE"].value_counts(dropna=False, sort=True, normalize=True)
print(distPDiamPrc)
print("\n***************  DEPTH for CLOSE-TO-POLES SUBSET ***************")
print("Counts for the Depth of the craters close to the poles:")
distPDepthCnt = dataMarsCraterPoles["DEPTH_RIMFLOOR_TOPOG"].value_counts(dropna=False, sort=True)
print(distPDepthCnt)
print("\nPercentages for the Depth of the craters close to the poles:")
distPDepthPrc = dataMarsCraterPoles["DEPTH_RIMFLOOR_TOPOG"].value_counts(dropna=False, sort=True, normalize=True)
print(distPDepthPrc)
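The count and percentage blocks in the script repeat the same two calls for every column and subset. As a minimal sketch, the pattern could be wrapped in one function. The helper name `print_distribution` and the tiny demo frame are hypothetical and are not part of the original script.

```python
import pandas


def print_distribution(frame, column, label):
    """Print and return the value counts and relative frequencies of one column."""
    counts = frame[column].value_counts(dropna=False, sort=True)
    shares = frame[column].value_counts(dropna=False, sort=True, normalize=True)
    print("Counts for the " + label + ":")
    print(counts)
    print("\nPercentages for the " + label + ":")
    print(shares)
    return counts, shares


# Hypothetical demo data, only to show the call; not taken from the crater database
demo = pandas.DataFrame({"DEPTH_RIMFLOOR_TOPOG": [0.0, 0.0, 0.0, 0.1]})
demoCounts, demoShares = print_distribution(
    demo, "DEPTH_RIMFLOOR_TOPOG", "Depth of the craters"
)
```

With a helper like this, each subset would need one call per column instead of six near-identical lines.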
Code Output
Mars Crater Study
Total number of rows in the dataset: 384343
Total number of columns in the dataset: 10
***************  ALL RAW DATA  ***************
***************  LATITUDE  ***************
Counts for the Latitude of the craters:
-23.634    17
-2.572     16
-12.970    15
-17.317    15
-3.150     15
          ..
-47.176     1
38.302     1
-49.988     1
56.487     1
-63.516     1
Name: LATITUDE_CIRCLE_IMAGE, Length: 129197, dtype: int64

Percentages for the Latitude of the craters:
-23.634    0.000044
-2.572     0.000042
-12.970    0.000039
-17.317    0.000039
-3.150     0.000039
          ..
-47.176    0.000003
38.302    0.000003
-49.988    0.000003
56.487    0.000003
-63.516    0.000003
Name: LATITUDE_CIRCLE_IMAGE, Length: 129197, dtype: float64
***************  DIAMETER  ***************
Counts for the Diameter of the craters:
1.01      6298
1.02      6077
1.03      6035
1.04      5941
1.05      5771
          ..
115.47       1
52.90        1
65.18        1
64.82        1
65.79        1
Name: DIAM_CIRCLE_IMAGE, Length: 6240, dtype: int64

Percentages for the Diameter of the craters:
1.01      0.016386
1.02      0.015811
1.03      0.015702
1.04      0.015458
1.05      0.015015
          ..
115.47    0.000003
52.90     0.000003
65.18     0.000003
64.82     0.000003
65.79     0.000003
Name: DIAM_CIRCLE_IMAGE, Length: 6240, dtype: float64
***************  DEPTH  ***************
Counts for the Depth of the craters:
0.00    307529
0.07      2059
0.08      2047
0.09      2008
0.10      1999
          ..
4.75         1
2.84         1
4.95         1
2.97         1
3.08         1
Name: DEPTH_RIMFLOOR_TOPOG, Length: 296, dtype: int64

Percentages for the Depth of the craters:
0.00    0.800142
0.07    0.005357
0.08    0.005326
0.09    0.005224
0.10    0.005201
          ..
4.75    0.000003
2.84    0.000003
4.95    0.000003
2.97    0.000003
3.08    0.000003
Name: DEPTH_RIMFLOOR_TOPOG, Length: 296, dtype: float64
***************  SUBSET OF DATA  ***************
Total number of rows in the close-to-equator dataset:
322844
Total number of rows in the close-to-poles dataset:
61499
***************  CRATERS CLOSE TO THE EQUATOR  ***************
***************  LATITUDE for EQUATOR SUBSET ***************
Counts for the Latitude of the craters near the equator:
-23.634    17
-2.572     16
-22.340    15
-12.406    15
-17.317    15
          ..
42.240     1
45.063     1
46.044     1
16.290     1
-39.182     1
Name: LATITUDE_CIRCLE_IMAGE, Length: 93466, dtype: int64

Percentages for the Latitude of the craters near the equator:
-23.634    0.000053
-2.572     0.000050
-22.340    0.000046
-12.406    0.000046
-17.317    0.000046
          ..
42.240    0.000003
45.063    0.000003
46.044    0.000003
16.290    0.000003
-39.182    0.000003
Name: LATITUDE_CIRCLE_IMAGE, Length: 93466, dtype: float64
***************  DIAMETER for EQUATOR SUBSET ***************
Counts for the Diameter of the craters near the equator:
1.01      5557
1.02      5384
1.03      5310
1.04      5207
1.05      5107
          ..
128.10       1
78.73        1
78.77        1
87.81        1
165.98       1
Name: DIAM_CIRCLE_IMAGE, Length: 5773, dtype: int64

Percentages for the Diameter of the craters near the equator:
1.01      0.017213
1.02      0.016677
1.03      0.016448
1.04      0.016129
1.05      0.015819
          ..
128.10    0.000003
78.73     0.000003
78.77     0.000003
87.81     0.000003
165.98    0.000003
Name: DIAM_CIRCLE_IMAGE, Length: 5773, dtype: float64
***************  DEPTH for EQUATOR SUBSET ***************
Counts for the Depth of the craters near the equator:
0.00    259380
0.10      1533
0.11      1495
0.09      1492
0.12      1445
          ..
4.01         1
2.89         1
4.72         1
2.50         1
3.31         1
Name: DEPTH_RIMFLOOR_TOPOG, Length: 293, dtype: int64

Percentages for the Depth of the craters near the equator:
0.00    0.803422
0.10    0.004748
0.11    0.004631
0.09    0.004621
0.12    0.004476
          ..
4.01    0.000003
2.89    0.000003
4.72    0.000003
2.50    0.000003
3.31    0.000003
Name: DEPTH_RIMFLOOR_TOPOG, Length: 293, dtype: float64
***************  CRATERS CLOSE TO THE POLES  ***************
***************  LATITUDE for CLOSE-TO-POLES SUBSET ***************
Counts for the Latitude of the craters close to the poles:
-58.016    8
-52.885    8
-54.023    8
-57.865    8
-58.413    8
         ..
73.158    1
-63.830    1
-52.418    1
-73.587    1
72.000    1
Name: LATITUDE_CIRCLE_IMAGE, Length: 35731, dtype: int64

Percentages for the Latitude of the close to the poles:
-58.016    0.000130
-52.885    0.000130
-54.023    0.000130
-57.865    0.000130
-58.413    0.000130
         ..
73.158    0.000016
-63.830    0.000016
-52.418    0.000016
-73.587    0.000016
72.000    0.000016
Name: LATITUDE_CIRCLE_IMAGE, Length: 35731, dtype: float64
***************  DIAMETER for CLOSE-TO-POLES SUBSET ***************
Counts for the Diameter of the craters close to the poles:
1.01      741
1.04      734
1.03      725
1.02      693
1.06      680
         ..
63.02       1
100.24      1
28.56       1
36.48       1
23.81       1
Name: DIAM_CIRCLE_IMAGE, Length: 3467, dtype: int64

Percentages for the Diameter of the craters close to the poles:
1.01      0.012049
1.04      0.011935
1.03      0.011789
1.02      0.011268
1.06      0.011057
         ..
63.02     0.000016
100.24    0.000016
28.56     0.000016
36.48     0.000016
23.81     0.000016
Name: DIAM_CIRCLE_IMAGE, Length: 3467, dtype: float64
***************  DEPTH for CLOSE-TO-POLES SUBSET ***************
Counts for the Depth of the craters close to the poles:
0.00    48149
0.04      706
0.05      696
0.03      666
0.07      658
         ..
2.32        1
2.50        1
2.20        1
2.25        1
2.40        1
Name: DEPTH_RIMFLOOR_TOPOG, Length: 231, dtype: int64

Percentages for the Depth of the craters close to the poles:
0.00    0.782923
0.04    0.011480
0.05    0.011317
0.03    0.010829
0.07    0.010699
         ..
2.32    0.000016
2.50    0.000016
2.20    0.000016
2.25    0.000016
2.40    0.000016
Name: DEPTH_RIMFLOOR_TOPOG, Length: 231, dtype: float64
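The figures in this output can be cross-checked with a little arithmetic. The sketch below uses only the row counts printed above (the variable names are hypothetical) to confirm that the two latitude subsets add up to the full table, and to recompute the share of craters whose recorded depth is 0.00 km.

```python
# Row counts copied from the code output
totalRows = 384343
equatorRows = 322844
poleRows = 61499

# The equator and pole subsets should partition the raw data
subsetsAddUp = (equatorRows + poleRows == totalRows)
print("Subsets add up to the total:", subsetsAddUp)

# Share of craters with a recorded depth of 0.00 km, recomputed from the counts
zeroDepthAll = 307529 / totalRows
zeroDepthEquator = 259380 / equatorRows
zeroDepthPoles = 48149 / poleRows
print(round(zeroDepthAll, 6), round(zeroDepthEquator, 6), round(zeroDepthPoles, 6))
```

Roughly 80% of the rows have a depth of 0.00 km in every grouping, which is worth keeping in mind before comparing depths between the equator and the poles.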
zaworm · 5 years ago
Assignment 1: Develop Research Question and Personal Codebook
DATASET:
I have chosen the Mars Craters Study data set as it most closely matches my field, the Physical Science and Engineering area of study.  I am interested in examining the craters on Mars and the relationships among their properties to further the science in this area.
The Mars Craters Study presents a global database that includes over 300,000 Mars craters 1 km or larger that were created between 4.2 and 3.8 billion years ago during a period of heavy bombardment.
RESEARCH QUESTIONS:
I would like to study the relationship between the diameter and the depth of the craters on Mars.  The primary question is:
1.  “Does the crater diameter have a relationship with the depth of the crater?  Is the depth dependent on the diameter of the crater?”
For this primary question it will be important to note and compare the diameter of the crater and the crater depth.
In addition to the primary question, I will also look at the location of the craters and would like to investigate the following question:
2.  “Are shallower craters associated with locations near the poles of Mars?”
For this secondary question it will be important to look at the location of the craters and in particular the latitude.  I will consider a crater close to the poles if it lies within 40 degrees of latitude of either pole.
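As a minimal sketch of how this definition could be applied, the snippet below flags latitudes as near a pole. It assumes that "within 40 degrees of a pole" translates to an absolute latitude above 50 degrees; the sample latitudes and the names `POLE_CUTOFF_DEG` and `nearPole` are hypothetical.

```python
import pandas

# Hypothetical latitudes in decimal degrees North, for illustration only
latitudes = pandas.Series([-72.0, -50.0, 0.0, 38.3, 50.0, 56.5])

# Assumption: within 40 degrees of a pole means an absolute latitude above 90 - 40 = 50
POLE_CUTOFF_DEG = 90 - 40
nearPole = latitudes.abs() > POLE_CUTOFF_DEG
print(nearPole.tolist())
```

Under this assumption a crater at exactly 50 degrees is still counted as close to the equator.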
PERSONALIZED CODEBOOK:
A subset of the variables in the provided dataset will be used for the study.  Not all variables are required: a unique identifier is used along with the diameter, depth and location.  The applicable Mars Crater variable names used in my study are:
CRATER_ID
LATITUDE_CIRCLE_IMAGE
LONGITUDE_CIRCLE_IMAGE
DIAM_CIRCLE_IMAGE
DEPTH_RIMFLOOR_TOPOG
The descriptions of the applicable variables are:
• CRATER_ID – crater ID for internal use, based upon the region of the planet (1/16ths), the “pass” under which the crater was identified, and the order in which it was identified
 • LATITUDE_CIRCLE_IMAGE – latitude from the derived center of a non-linear least-squares circle fit to the vertices selected to manually identify the crater rim (units are decimal degrees North) 
• LONGITUDE_CIRCLE_IMAGE – longitude from the derived center of a non-linear least-squares circle fit to the vertices selected to manually identify the crater rim (units are decimal degrees East) 
• DIAM_CIRCLE_IMAGE – diameter from a non-linear least squares circle fit to the vertices selected to manually identify the crater rim (units are km) 
• DEPTH_RIMFLOOR_TOPOG – average elevation of each of the manually determined N points along (or inside) the crater rim (units are km) where:
      Depth Rim – Points are selected as relative topographic highs under the assumption they are the least eroded so most original points along the rim
      Depth Floor – Points were chosen as the lowest elevation that did not include visible embedded craters
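One way to work with only the codebook variables is to read just those columns when loading the file. The sketch below is illustrative only: it parses a small in-memory CSV with made-up rows and hypothetical crater IDs rather than the real database file, and uses the `usecols` parameter of `pandas.read_csv`.

```python
import io
import pandas

# Hypothetical two-row extract with made-up values, standing in for the real CSV file
sampleCsv = io.StringIO(
    "CRATER_ID,CRATER_NAME,LATITUDE_CIRCLE_IMAGE,LONGITUDE_CIRCLE_IMAGE,"
    "DIAM_CIRCLE_IMAGE,DEPTH_RIMFLOOR_TOPOG,NUMBER_LAYERS\n"
    "X-000001,,10.5,20.25,5.1,0.2,0\n"
    "X-000002,,-60.0,120.0,1.5,0.0,0\n"
)

# The five variables from the personalized codebook
studyColumns = [
    "CRATER_ID",
    "LATITUDE_CIRCLE_IMAGE",
    "LONGITUDE_CIRCLE_IMAGE",
    "DIAM_CIRCLE_IMAGE",
    "DEPTH_RIMFLOOR_TOPOG",
]

# Read only the study columns; the remaining columns are skipped at load time
craters = pandas.read_csv(sampleCsv, usecols=studyColumns)
print(craters.dtypes)
```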
HYPOTHESIS:
It is hypothesized that a direct correlation exists between crater depth and crater diameter for craters caused by asteroid bombardment.
It is also hypothesized that the depths of the craters near the poles of the planet are much shallower than those found near the equator.  
The hypotheses are based on a literature review which is discussed below.
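As a rough sketch of how the two hypotheses could be examined, the snippet below computes a Pearson correlation between diameter and depth and compares the mean depth near the poles with the mean depth elsewhere. The measurements are made up for illustration and are not values from the database, and the 50 degree cutoff for "near the poles" is an assumption.

```python
import pandas

# Hypothetical measurements, made up for illustration; not values from the database
sample = pandas.DataFrame({
    "LATITUDE_CIRCLE_IMAGE": [-70.0, -10.0, 5.0, 30.0, 62.0, 80.0],
    "DIAM_CIRCLE_IMAGE": [4.0, 6.0, 10.0, 20.0, 8.0, 12.0],
    "DEPTH_RIMFLOOR_TOPOG": [0.2, 0.6, 0.9, 1.5, 0.3, 0.4],
})

# Hypothesis 1: linear association between diameter and depth (Pearson r)
pearsonR = sample["DIAM_CIRCLE_IMAGE"].corr(sample["DEPTH_RIMFLOOR_TOPOG"])

# Hypothesis 2: mean depth near the poles versus elsewhere (|latitude| > 50 assumed)
nearPole = sample["LATITUDE_CIRCLE_IMAGE"].abs() > 50
meanDepthPoles = sample.loc[nearPole, "DEPTH_RIMFLOOR_TOPOG"].mean()
meanDepthElsewhere = sample.loc[~nearPole, "DEPTH_RIMFLOOR_TOPOG"].mean()

print("Pearson r:", pearsonR)
print("Mean depth near poles (km):", meanDepthPoles)
print("Mean depth elsewhere (km):", meanDepthElsewhere)
```

A correlation coefficient and a difference in means only describe the sample; a formal test would still be needed before drawing conclusions.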
LITERATURE REVIEW:
A literature review was conducted on Mars cratering, focusing on crater depth, diameter and location.  The following keywords were used in a literature search on Google Scholar: Mars, Mars craters, Mars crater depth, Mars crater depth diameter relationship, Mars crater depth poles, craters poles vs equator
Craters appear throughout the terrain of Mars and are the result of a period of heavy bombardment from asteroids, protoplanets and comets.  The craters that appear on Mars are vital in understanding its surface material properties and provide insight into its climate and history and impact physics.
In the simplest form a crater can be assumed to be circular and can be characterized by its depth and diameter.  A crater's depth-to-diameter ratio is a fundamental property but was not directly measurable until the last two decades.  More recently the ability to measure craters has been enhanced by the instruments of the Mars Orbiter.  Several relationships and correlations have been proposed between crater depth and diameter; however, several of these are based on older data, and the most recent dataset provides the opportunity to compare relationships based on new data with the historical ones.  Depending on the complexity of the crater, different relationships between depth and diameter can be observed.
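The depth-to-diameter ratio mentioned above is simply depth divided by diameter, with both in km. A minimal sketch follows; the crater values are made up for illustration, the column name `DEPTH_DIAM_RATIO` is hypothetical, and rows with a depth of 0 are dropped first on the assumption that they carry no depth measurement.

```python
import pandas

# Hypothetical craters (km); values are made up for illustration
sample = pandas.DataFrame({
    "DIAM_CIRCLE_IMAGE": [2.0, 10.0, 40.0, 5.0],
    "DEPTH_RIMFLOOR_TOPOG": [0.4, 1.0, 2.0, 0.0],
})

# Keep only craters with a non-zero depth before forming the ratio
measured = sample[sample["DEPTH_RIMFLOOR_TOPOG"] > 0].copy()
measured["DEPTH_DIAM_RATIO"] = (
    measured["DEPTH_RIMFLOOR_TOPOG"] / measured["DIAM_CIRCLE_IMAGE"]
)
print(measured)
```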
Recent studies have shown that the craters formed near the poles of the planet can be up to 3 times shallower than their counterparts closer to the equator.
The use of new datasets has allowed for the reexamination of past large craters and also allows for smaller (>1km) craters to be analyzed.  
REFERENCES:
Barlow, N.G., 1993. Depth-diameter ratios for Martian impact craters: Implications for target properties and episodes of degradation. In its Mars: Past, Present, and Future. Results from the MSATT Program, Part 1 p 1 (SEE N94-33190 09-91)
Barlow, N. G. (1988). Crater size-frequency distributions and a revised Martian relative chronology. Icarus, 75(2), 285-305.
Barlow, N.G. and Bradley, T.L., 1990. Martian impact craters: Correlations of ejecta and interior morphologies with diameter, latitude, and terrain. Icarus, 87(1), pp.156-179.
Boyce, J.M., Mouginis‐Mark, P. and Garbeil, H., 2005. Ancient oceans in the northern lowlands of Mars: Evidence from impact crater depth/diameter relationships. Journal of Geophysical Research: Planets, 110(E3).
Cintala, M.J., Head, J.W. and Mutch, T.A., 1976, April. Martian crater depth/diameter relationships-Comparison with the moon and Mercury. In Lunar and Planetary Science Conference Proceedings (Vol. 7, pp. 3575-3587).
Hartmann, W.K., 1966. Martian cratering. Icarus, 5(1-6), pp.565-576.
Malin, M.C. and Dzurisin, D., 1977. Landform degradation on Mercury, the Moon, and Mars: Evidence from crater depth/diameter relationships. Journal of Geophysical Research, 82(2), pp.376-388.
Robbins, Stuart. 2011. Planetary Surface Properties, Cratering Physics, and the Volcanic History of Mars from a New Global Martian Crater Database.
Robbins, S. J., and Hynek, B. M. (2012), A new global database of Mars impact craters ≥1 km: 2. Global crater properties and regional variations of the simple‐to‐complex transition diameter, J. Geophys. Res., 117, E06001, doi:10.1029/2011JE003967.
Stewart, S.T., Valiant, G.J., 2006. Martian subsurface properties and crater formation processes inferred from fresh impact crater geometries. Meteor. & Planet. Sci. 41, 10, pp. 1509-1537.
Stepinski, T.F., Mendenhall, M.P. and Bue, B.D., 2009. Machine cataloging of impact craters on Mars. Icarus, 203(1), pp.77-87.