#Crosstabulation
Explore tagged Tumblr posts
sanjeev216-blog · 5 years ago
Link
0 notes
ijtsrd · 6 years ago
Photo
Tumblr media
Teaching Data Analysis using SPSS
by San San Nwe | Myint Myint Yee | Aung Cho "Teaching Data Analysis using SPSS"
Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-3 | Issue-5 , August 2019,
URL: https://www.ijtsrd.com/papers/ijtsrd26739.pdf  Paper URL: https://www.ijtsrd.com/computer-science/data-miining/26739/teaching-data-analysis-using-spss/san-san-nwe
SPSS, which stands for Statistical Package for the Social Sciences, is a powerful, user-friendly software package for manipulating and statistically analyzing data. The package is particularly useful for students and researchers in psychology, sociology, psychiatry, and other behavioral sciences, since it contains an extensive range of the univariate and multivariate procedures widely used in these disciplines. This paper is intended to support teachers in teaching forecasting based on the sample dataset teach.sav. Behind the scenes, SPSS used crosstabulation with the Pearson chi-square test to assess the significance of the data. Tech.sav was downloaded from Google, then analyzed and viewed. The work used IBM SPSS Statistics version 23 and Python version 3.7.
0 notes
ceescedasticity · 2 years ago
Text
Tulkas is behind only Manwë, Varda, and Námo in "did more harm than good". Evidently I need to brush up on what Tulkas actually did.
47 responses to the survey and unlike all the others in the 'definitely bad guy' question — including Ungoliant — Saruman has been claimed as a blorbo exactly 0 times.
37 notes · View notes
unfunkyufo · 8 years ago
Text
hey so im doing a project with a survey where i ask people if they were spanked as kids, if they support corporal punishment, and if they support 1) three strikes laws 2) solitary confinement and 3) capital punishment, and im wondering if theres some kind of programming where i could look at the results but in a way that connects all those questions together??? like __% of people who said they were spanked as kids would spank their own kids, and then __% of those people support capital punishment
i have 72 responses and like... all i can think of right now is like writing out shit in a google doc and color code it in order to find out what i want to find out, but im wondering if theres some kind of website or software i could use that doesnt require a masters in statistics to understand it that would help me understand these results. thanks for the help
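One way to get those chained percentages without specialist statistics software is a crosstabulation in Python's pandas library. The following is only a minimal sketch, assuming the responses are exported as a table with one row per respondent; the column names (`spanked`, `would_spank`, `capital_punishment`) and the six answers are made up for illustration and are not from the actual survey.

```python
import pandas as pd

# Hypothetical survey export: one row per respondent, one column per question.
responses = pd.DataFrame({
    "spanked":            ["yes", "yes", "yes", "no", "no", "yes"],
    "would_spank":        ["yes", "yes", "no",  "no", "no", "yes"],
    "capital_punishment": ["yes", "no",  "no",  "no", "yes", "yes"],
})

# Row percentages: within each "spanked" group, what share would spank their own kids?
spank_table = pd.crosstab(
    responses["spanked"], responses["would_spank"], normalize="index"
) * 100
print(spank_table.round(1))

# Chain a third question by first filtering to the group of interest.
both = responses[(responses["spanked"] == "yes") & (responses["would_spank"] == "yes")]
support = (both["capital_punishment"] == "yes").mean() * 100
print(f"{support:.1f}% of those respondents support capital punishment")
```

With 72 responses the same two steps apply unchanged; only the table loaded into `responses` differs (for example via `pd.read_csv` on a spreadsheet export).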
4 notes · View notes
worklabournewsresearch · 6 years ago
Text
The Pay Gap in Sports
Tumblr media
“As the U.S. women’s national team started to celebrate their 2-0 World Cup win over the Netherlands [in July], the energetic crowd shifted from cheering on the group’s record fourth title to chanting a mantra of a slightly different tone when FIFA President Gianni Infantino took to the field: ‘equal pay.’ The U.S. women’s team has been outspoken about their fight for equal pay in the world of professional soccer: In March, 28 members of the team filed a gender discrimination lawsuit against the U.S. Soccer Federation. ... In the lead-up to their World Cup victory, several players on the team — most notably co-captain Megan Rapinoe — have talked openly about what they called an unfair pay structure at U.S. Soccer.”
“And new polling from Morning Consult, in partnership with women’s empowerment organization ASCEND, shows just how powerful and persuasive the U.S. women’s national team has been on this issue in just the past few weeks, and illustrates the political divisiveness that ensues when brands take a stand on gender and pay. While most respondents said their minds weren’t changed (40 percent) or offered no opinion (21 percent), more than one-third of the U.S. public said they now consider the gender pay gap in professional sports to be more of an issue specifically because of the U.S. women’s team’s victory in the World Cup.”
“Although the pay structures for men’s and women’s teams are very different ... — the women’s team argues in its lawsuit against U.S. Soccer that female athletes are considerably underpaid compared to their male counterparts. ... The filing ... says players on the women’s team each earned a $15,000 bonus for making the 2015 World Cup’s roster, compared to the $55,000 given to each male athlete for making the roster for the 2014 World Cup. ... [T]he union representing the women’s team negotiated a base pay of $100,000, plus another $72,500 for playing in the National Women’s Soccer League. ... FIFA’s prize money for the men’s 2018 World Cup, distributed among participating teams, was $400 million; the organization allotted just $30 million in prize money for teams in this year’s Women’s World Cup.”
Morning Consult, July 22, 2019: “After Women’s World Cup, Over a Third Say Sports’ Gender Pay Gap Is Bigger Concern,” by Joanna Piacenza
Morning Consult, July 16-18, 2019: “National Tracking Poll #190724: Crosstabulation Results” (76 pages, PDF) (the poll results)
The Conversation, August 6, 2019: “How big brands could solve the gender pay gap in sport,” by Katie Lebel
U.S. Women’s Soccer Team vs. U.S. Soccer Federation, March 8, 2019: “Case No. 2:19-CV-01717” (25 pages, PDF) (the lawsuit filed against U.S. Soccer Federation)
1 note · View note
timothy-mokoka · 2 years ago
Text
Assignment 2: Hypothesis Testing with Chi-Square Test of Independence
Introduction:
This assignment examines a sample of 2,412 marijuana / cannabis users between the ages of 18 and 30 from the NESARC dataset. My research question is as follows:
Is the number of Cannabis joints smoked per day amongst young adults in USA between the Ages of 18 and 30 the leading cause of mental health disorders such as depression and anxiety?
My Hypothesis Test statements are as follows:
H0: The number of Cannabis joints smoked per day amongst young adults in USA between the Ages of 18 and 30 is not the leading cause of mental health disorders such as depression and anxiety.
Ha: The number of Cannabis joints smoked per day amongst young adults in USA between the Ages of 18 and 30 is the leading cause of mental health disorders such as depression and anxiety.
Explanation of the Code:
I used the crosstabulation function to produce a contingency table of observed counts and percentages for each mental health disorder, i.e. depression and anxiety. I did this to examine whether cannabis use status (1 = Yes, 2 = No), the categorical explanatory variable ‘S3BQ1A5’, is associated with the categorical response variables depression (‘MAJORDEP12’) and anxiety (‘GENAXDX12’). I therefore ran a Chi-Square Test of Independence for each of these two response variables, calculating the chi-square values and corresponding p-values so that the null hypothesis can be rejected or retained on the basis of the findings.
To visualize the association between the frequency of cannabis use and the depression diagnosis, I used the factorplot function to produce the bivariate graph. I also used the crosstabulation function to test the association between cannabis use (‘S3BQ1A5’) and general anxiety (‘GENAXDX12’). After the third Chi-Square Test of Independence, I performed a post hoc test using the Bonferroni adjustment, since the explanatory variable has more than two levels. This makes it possible to identify the pairs for which the null hypothesis can be rejected without inflating the Type I error rate.
Code / Syntax:
# -*- coding: utf-8 -*-
"""
Created on Fri Mar 31 12:20:15 2023
@author: Oteng
"""
import pandas
import numpy
import scipy.stats
import seaborn
import matplotlib.pyplot as plt
nesarc = pandas.read_csv('nesarc_pds.csv', low_memory=False)
# Sets pandas to show all columns in a dataframe
pandas.set_option('display.max_columns', None)
# Sets pandas to show all rows in a dataframe
pandas.set_option('display.max_rows', None)
nesarc.columns = map(str.upper, nesarc.columns)
pandas.set_option('display.float_format', lambda x: '%f' % x)
# Changes the variables of interest to numeric
nesarc['AGE'] = pandas.to_numeric(nesarc['AGE'], errors='coerce')
nesarc['S3BQ4'] = pandas.to_numeric(nesarc['S3BQ4'], errors='coerce')
nesarc['S3BQ1A5'] = pandas.to_numeric(nesarc['S3BQ1A5'], errors='coerce')
nesarc['S3BD5Q2B'] = pandas.to_numeric(nesarc['S3BD5Q2B'], errors='coerce')
nesarc['S3BD5Q2E'] = pandas.to_numeric(nesarc['S3BD5Q2E'], errors='coerce')
nesarc['MAJORDEP12'] = pandas.to_numeric(nesarc['MAJORDEP12'], errors='coerce')
nesarc['GENAXDX12'] = pandas.to_numeric(nesarc['GENAXDX12'], errors='coerce')
# Subset of my sample of interest
subset1 = nesarc[(nesarc['AGE']>=18) & (nesarc['AGE']<=30)]  # Ages between 18-30
subsetc1 = subset1.copy()
subset2 = nesarc[(nesarc['AGE']>=18) & (nesarc['AGE']<=30) & (nesarc['S3BQ1A5']==1)]  # Cannabis users, between age 18-30
subsetc2 = subset2.copy()
# Setting missing data for frequency and cannabis use, variables S3BD5Q2E, S3BQ1A5
subsetc1['S3BQ1A5'] = subsetc1['S3BQ1A5'].replace(9, numpy.nan)
subsetc2['S3BD5Q2E'] = subsetc2['S3BD5Q2E'].replace('BL', numpy.nan)
subsetc2['S3BD5Q2E'] = subsetc2['S3BD5Q2E'].replace(99, numpy.nan)
# Contingency table of observed counts of major depression diagnosis (response variable) within cannabis use (explanatory variable), in ages 18-30
contab1 = pandas.crosstab(subsetc1['MAJORDEP12'], subsetc1['S3BQ1A5'])
print(contab1)
# Column percentages
colsum = contab1.sum(axis=0)
colpcontab = contab1/colsum
print(colpcontab)
# Chi-square calculations for major depression within cannabis use status
print('Chi-square value, p value, expected counts, for major depression within cannabis use status')
chsq1 = scipy.stats.chi2_contingency(contab1)
print(chsq1)
# Contingency table of observed counts of general anxiety diagnosis (response variable) within cannabis use (explanatory variable), in ages 18-30
contab2 = pandas.crosstab(subsetc1['GENAXDX12'], subsetc1['S3BQ1A5'])
print(contab2)
# Column percentages
colsum2 = contab2.sum(axis=0)
colpcontab2 = contab2/colsum2
print(colpcontab2)
# Chi-square calculations for general anxiety within cannabis use status
print('Chi-square value, p value, expected counts, for general anxiety within cannabis use status')
chsq2 = scipy.stats.chi2_contingency(contab2)
print(chsq2)
#
# Contingency table of observed counts of major depression diagnosis (response variable) within frequency of cannabis use (10 level explanatory variable), in ages 18-30
# Note: this table is built from subset2, not subsetc2, so the code 99 set to missing above is still counted as a level here
contab3 = pandas.crosstab(subset2['MAJORDEP12'], subset2['S3BD5Q2E'])
print(contab3)
# Column percentages
colsum3 = contab3.sum(axis=0)
colpcontab3 = contab3/colsum3
print(colpcontab3)
# Chi-square calculations for major depression within frequency of cannabis use groups
print('Chi-square value, p value, expected counts for major depression associated frequency of cannabis use')
chsq3 = scipy.stats.chi2_contingency(contab3)
print(chsq3)
# Dictionary with details of frequency variable reverse-recode
recode1 = {1: 9, 2: 8, 3: 7, 4: 6, 5: 5, 6: 4, 7: 3, 8: 2, 9: 1}
subsetc2['CUFREQ'] = subsetc2['S3BD5Q2E'].map(recode1)  # Change variable name from S3BD5Q2E to CUFREQ
subsetc2["CUFREQ"] = subsetc2["CUFREQ"].astype('category')
# Rename graph labels for better interpretation
subsetc2['CUFREQ'] = subsetc2['CUFREQ'].cat.rename_categories(["2 times/year","3-6 times/year","7-11 times/year","Once a month","2-3 times/month","1-2 times/week","3-4 times/week","Nearly every day","Every day"])
# Graph percentages of major depression within each cannabis smoking frequency group
# Note: seaborn 0.9 renamed factorplot to catplot; use seaborn.catplot on newer versions
plt.figure(figsize=(12,4))  # Change plot size
ax1 = seaborn.factorplot(x="CUFREQ", y="MAJORDEP12", data=subsetc2, kind="bar", ci=None)
ax1.set_xticklabels(rotation=40, ha="right")  # X-axis labels rotation
plt.xlabel('Frequency of cannabis use')
plt.ylabel('Proportion of Major Depression')
plt.show()
# Post hoc test, pair comparison of frequency groups 1 and 9, 'Every day' and '2 times a year'
recode2 = {1: 1, 9: 9}
subsetc2['COMP1v9'] = subsetc2['S3BD5Q2E'].map(recode2)
# Contingency table of observed counts
ct4 = pandas.crosstab(subsetc2['MAJORDEP12'], subsetc2['COMP1v9'])
print(ct4)
# Column percentages
colsum4 = ct4.sum(axis=0)
colpcontab4 = ct4/colsum4
print(colpcontab4)
# Chi-square calculations for pair comparison of frequency groups 1 and 9, 'Every day' and '2 times a year'
print('Chi-square value, p value, expected counts, for pair comparison of frequency groups -Every day- and -2 times a year-')
cs4 = scipy.stats.chi2_contingency(ct4)
print(cs4)
# Post hoc test, pair comparison of frequency groups 2 and 6, 'Nearly every day' and 'Once a month'
recode3 = {2: 2, 6: 6}
subsetc2['COMP2v6'] = subsetc2['S3BD5Q2E'].map(recode3)
# Contingency table of observed counts
ct5 = pandas.crosstab(subsetc2['MAJORDEP12'], subsetc2['COMP2v6'])
print(ct5)
# Column percentages
colsum5 = ct5.sum(axis=0)
colpcontab5 = ct5/colsum5
print(colpcontab5)
# Chi-square calculations for pair comparison of frequency groups 2 and 6, 'Nearly every day' and 'Once a month'
print('Chi-square value, p value, expected counts for pair comparison of frequency groups -Nearly every day- and -Once a month-')
cs5 = scipy.stats.chi2_contingency(ct5)
print(cs5)
Output:
Tumblr media
Explanation: When examining the association between cannabis use and major depression, the Chi-Square Test of Independence shows that, amongst young adults aged between 18 and 30 years, those who were cannabis users were more likely to have been diagnosed with major depression in the last 12 months (about 18%) than non-users of cannabis (8.4%), X2 = 171.6, 1 df, p-value = 3.16e-39. Since the p-value is extremely small, the results provide strong evidence against the null hypothesis. Thus, we reject the null hypothesis in favour of the alternative hypothesis, since there is a positive association between cannabis use and major depression.
Tumblr media
Explanation: When testing the association between cannabis use and general anxiety, the Chi-Square Test of Independence reveals that, amongst young adults aged between 18 and 30 years, those who were cannabis users were more likely to have been diagnosed with general anxiety in the last 12 months (3.8%) than non-users of cannabis (1.6%), X2 = 40.22, 1 df, p-value = 2.26e-10. These results provide enough evidence against the null hypothesis to reject it in favour of the alternative hypothesis, which indicates a positive association between cannabis use and general anxiety.
Tumblr media
Explanation: This third Chi-Square Test of Independence shows that, for cannabis users aged between 18 and 30 years, the frequency of cannabis usage and major depression for the past 12 months were significantly associated, X2 = 35.18, 10 df, p-value = 0.00011.
Tumblr media
Explanation: The bivariate graph above, presenting my sample of interest, shows a positive correlation between the frequency of cannabis use and major depression in the past 12 months. The distribution is skewed to the left, which indicates that the more frequently individuals aged 18 – 30 smoked cannabis, the more likely they were to have experienced major depression in the past 12 months.
Tumblr media
Explanation: The post hoc comparison, using the Bonferroni adjustment, of the rate of major depression between the “Every Day” and “2 times a year” frequency categories gives a p-value of 0.00019; the percentages of major depression diagnosis in these two groups are 23.7% and 11.6% respectively. Since the p-value is smaller than the Bonferroni-adjusted threshold (0.00019 < 0.0011), we can conclude that these rates differ from one another. Therefore, we reject the null hypothesis for this pair in favour of the alternative hypothesis.
Tumblr media
Explanation: For the post hoc comparison, using the Bonferroni adjustment, of major depression between the “Nearly every day” and “Once a month” frequency categories, the p-value is 0.046 and the percentages of major depression in these two groups are 23.3% and 13.7% respectively. Since the p-value is larger than the Bonferroni-adjusted threshold (0.046 > 0.0011), we cannot conclude that these rates are significantly different from one another. Thus, in this instance, we fail to reject the null hypothesis.
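The adjusted threshold of 0.0011 used in the two post hoc comparisons can be reproduced with a few lines of arithmetic. This is a sketch under one assumption that the write-up does not state: that the threshold comes from dividing an alpha of 0.05 by the number of possible pairs among 10 frequency categories. The p-values are the ones reported above.

```python
from math import comb

ALPHA = 0.05
N_LEVELS = 10  # assumed: the "10 level explanatory variable" mentioned in the code comments

# Number of distinct pairs among the categories, and the per-comparison threshold.
n_comparisons = comb(N_LEVELS, 2)
adjusted_alpha = ALPHA / n_comparisons

# p-values reported above for the two pairwise comparisons.
reported = {
    "Every day vs 2 times a year": 0.00019,
    "Nearly every day vs Once a month": 0.046,
}
decisions = {pair: p < adjusted_alpha for pair, p in reported.items()}

print(n_comparisons, round(adjusted_alpha, 4))
for pair, significant in decisions.items():
    print(pair, "-> reject H0" if significant else "-> fail to reject H0")
```

Under this assumption only the first pair clears the adjusted threshold, which matches the conclusions drawn in the two explanations.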
0 notes
educadacademy · 2 years ago
Text
Oracle Database SQL Training
Oracle Database SQL is an online course that helps you prepare for the OCP exam. We offer a diverse Oracle Database SQL exam. This course covers all the features of SQL, such as editing and running reports, writing transactions, writing short programs, and more. We have a team of certified Oracle trainers to assist you. It is a practice-based online SQL course that helps you get a full grip on Oracle Database SQL.
Restricting and Sorting Data
Limit the rows that are retrieved by a query
Sort the rows that are retrieved by a query
Use substitution variables
Use the SQL row limiting clause
Create queries using the PIVOT and UNPIVOT clause
Use pattern matching to recognize patterns across multiple rows in a table
Using the Set Operators
Explain set operators
Use a set operator to combine multiple queries into a single query
Control the order of rows returned
Using Single-Row Functions to Customize Output
Describe various types of functions that are available in SQL
Use character, number, and date and analytical (PERCENTILE_CONT, STDDEV, LAG, LEAD) functions in SELECT statements
Use conversion functions
Manipulating Data
Describe the DML statements
Insert rows into a table
Update rows in a table
Delete rows from a table
Control transactions
Reporting Aggregated Data Using the Group Functions
Identify the available group functions
Use group functions
Group data by using the GROUP BY clause
Include or exclude grouped rows by using the HAVING clause
Using DDL Statements to Create and Manage Tables
Categorize the main database objects
Review the table structure
Describe the data types that are available for columns
Create tables
Create constraints for tables
Describe how schema objects work
Truncate tables, and recursively truncate child tables
Use 12c enhancements to the DEFAULT clause, invisible columns, virtual columns and identity columns in table creation/alteration
Displaying Data from Multiple Tables
Use equijoins and nonequijoins
Use a self-join
Use outer joins
Generate a Cartesian product of all rows from two or more tables
Use the cross outer apply clause
Creating Other Schema Objects
Create simple and complex views with visible/invisible columns
Retrieve data from views
Create, maintain and use sequences
Create private and public synonyms
Using Subqueries to Solve Queries
Use subqueries
List the types of subqueries
Use single-row and multiple-row subqueries
Create a lateral inline view in a query
Managing Objects with Data Dictionary Views
Query various data dictionary views
EXTRACT Managing Schema Objects
Manage constraints
Create and maintain indexes including invisible indexes and multiple indexes on the same columns
Create indexes using the CREATE TABLE statement
Create function-based indexes
Drop columns and set column UNUSED
Perform flashback operations
Create and use external tables
Controlling User Access
Differentiate system privileges from object privileges
Grant privileges on tables and on a user
View privileges in the data dictionary
Grant roles
Distinguish between privileges and roles
Manipulating Large Data Sets
Manipulate data using subqueries
Describe the features of multitable INSERTs
Use multitable inserts
Unconditional INSERT
Pivoting INSERT
Conditional ALL INSERT
Conditional FIRST INSERT
Merge rows in a table
Track the changes to data over a period of time
Use explicit default values in INSERT and UPDATE statements
Managing Data in Different Time Zones
Use various date time functions
Tz_offset
from_tz
to_timestamp
to_timestamp_tz
to_yminterval
to_dsinterval
current_date
current_timestamp
localtimestamp
dbtimezone
sessiontimezone
Generating Reports by Grouping Related Data
Use the ROLLUP operation to produce subtotal values
Use the CUBE operation to produce crosstabulation values
Use the GROUPING function to identify the row values created by ROLLUP or CUBE
Use GROUPING SETS to produce a single result set
Retrieving Data Using Subqueries
Use multiple-column subqueries
Use scalar subqueries
Use correlated subqueries
Update and delete rows using correlated subqueries
Use the EXISTS and NOT EXISTS operators
Use the WITH clause
Hierarchical Retrieval
Interpret the concept of a hierarchical query
Create a tree-structured report
Format hierarchical data
Exclude branches from the tree structure
Regular Expression Support
Use meta Characters
Use regular expression functions to search, match and replace
Use replacing patterns
Use regular expressions and check constraints
International Student Fee : 300 USD | 395 CAD | 1,125 AED | 1,125 SAR
Flexible Class Options
Corporate Group Training | Fast-Track
Week End Classes For Professionals SAT | SUN
Online Classes – Live Virtual Class (L.V.C), Online Training
0 notes
winportables · 3 years ago
Text
StatPlus Pro Portable
You can work with various statistical tools and graphical analysis methods, such as Analysis of Variance (ANOVA), Design of Experiments (DOE), as well as regression, time series, and survival analysis. StatPlus Pro Portable is an advanced statistical analysis program intended to help you perform everything from data transformation and sampling to complex regression, non-parametric analysis, survival analysis, and other functions. The application comes with a multitude of charts (histograms, bars, areas, dot charts, pie charts, statistics, control charts) and spreadsheets of mathematical, statistical and financial functions. It also supports an Excel add-in that lets you perform statistical tasks directly from the Excel interface.
Clean feature line
StatPlus Pro Portable has a well-structured GUI where you can enter data directly into a spreadsheet or import it from HTML, XLS, CSV, SAV, ODS, or other file formats. Thanks to the multi-tab design, you can work with different tabs at the same time and quickly switch between them.
Editing functions
Editing functions help you perform clipboard-related tasks (cut, copy, paste), delete entries, search for items, and undo or redo your actions. A spell checker is included in the package. Additionally, you can insert cells, charts, symbols, functions, comments, images, and hyperlinks. Each cell can be customized in terms of layout (such as horizontal or vertical alignment), color, font, and border. You can print the information, email it, or export it to the same file formats as the input.
Analysis tools
StatPlus Pro Portable supports a wide range of statistical utilities, so be prepared to spend some time discovering them. These include mean comparison t-tests, the Pagurova criterion and the G criterion, the F test, one- and two-sample z-tests, correlation coefficients (Pearson, Fechner) and covariation, normality tests, and crosstabulation and frequency table analysis (discrete / continuous). Additionally, you can perform analysis of variance (ANOVA) with one-, two-, or three-way designs, data classification, design of experiments (DOE), as well as non-parametric statistics, such as 2 × 2 table analysis (e.g., chi-square, Yates chi-square, Fisher's exact test), rank correlations, and Cochran's Q test. You can perform regression analysis (for example, logistic regression, polynomial regression), time series analysis (for example, moving average, Fourier analysis, data processing), survival analysis (for example, Cox proportional hazards regression), power and sample size analysis (PASS), and data processing (e.g., random number generation, matrix operations, sampling). The tool allows you to generate charts such as Gantt, arrow, bubble, error, and pie charts, as well as control charts such as X-bar, R-chart, S-chart, P-chart, C-chart, U-chart and CUSUM chart. Graphics can be printed or exported to BMP, GIF, JPEG, PDF, SVG, or other file formats.
Release year: 2021
Version: 6.2.5.0
System: Windows® 2000 / XP / Vista / 7 / 8 / 8.1. On Windows 10 it is possible, but not guaranteed.
Interface language: Multilanguage (English included)
File size: 75.88 MB
Format: Rar
Run as administrator: not required
0 notes
acemywriter · 3 years ago
Text
Quantitative Analysis Report: Crosstabulation
*INSTRUCTIONS ALSO UPLOADED IN FILES SECTION. QUANTITATIVE ANALYSIS REPORT: CROSSTABULATION AND CORRELATION ANALYSIS ASSIGNMENT INSTRUCTIONS OVERVIEW You will take part in several data analysis assignments in which you will develop a report using tables and figures from the IBM SPSS® output file of your results. Using the resources and readings provided, you will interpret these results and test…
0 notes
essaynook · 4 years ago
Text
Provide a table showing the average sales revenue, variable cost and contribution margin per region per brand.
Doing some basic Google Colab steps with the provided data, such as making rows and tables and using mathematical functions. The output file has to be JSON and Colab (Python). Needed = JSON / Google Colab file with: •    A crosstabulation table showing the number of transactions and test whether the relationship between these two variables is significant using the chi-square test of independence. •    Calculate the…
View On WordPress
0 notes
fufupaw · 4 years ago
Text
This week's main Discussion requires you to answer the question completely and c
This week’s main Discussion requires you to answer the question completely and correctly to receive full credit. This week we talk about the uses of a crosstabulation (crosstabs) and the benefits of creating this “snapshot” of your data. For this forum, provide a brief introduction to your study to remind your classmates what we are reading about here. Include: 1. Your overall research…
View On WordPress
0 notes
the-social-networks · 4 years ago
Text
Digital Community and Fandom: Reality TV
WEEK 4
Reality television is an easy ratings grab yet an often criticised genre, popular amongst audiences but still strongly associated with “over the top emotions” (Kavka, 2019) or self-absorbed, wannabe celebrities. Though seemingly the most hated television genre (Morning Consult et al. 2018), it is a guilty pleasure for viewers that garners strong fanbases, with dedicated forums and social media pages made by fans (or fascinated haters) of the Kardashians, The Real Housewives or MAFS. So what makes reality tv so fascinating for audiences, and what role does social media have in its success, and vice versa?
Tumblr media
I used the above gif as an illustration of the major influence reality tv has on creating social publics. Kim Kardashian, famous for... being famous, has created an empire along with her family from their reality tv show. However, their success came during the advent of social media, with their success furthered by fandoms online - as well as critics - constantly discussing, mocking, or enjoying the show via the sharing of memes and iconic moments from the program (such as the gif above). Keeping Up With the Kardashians has covered a plethora of issues, with the world watching and debating these topics online. From minor family drama to Caitlyn Jenner’s transition, the Kardashians showcase the privilege and naivety we expect from Beverly Hills rich kids. However, the universal themes and social issues raised throughout its 10-year run acted as a catalyst for important discourse across social media.
In week 4, the lecture addressed that reality tv is less reliant on television as a medium than it is on social media, with platforms giving reality stars the opportunity to present an even deeper look into their ‘personal’ lives and to “perform amplified versions” (Arcy, 2018) of themselves for audiences online. This idea of monetising and incentivising every aspect of the star’s life not only makes audiences feel as though they are engaging with the content on a deeper level (e.g. buying the same lipstick worn and promoted by their favourite Kardashian), it also creates a marketing tool for the show itself as well as for brands that wish to be associated with it. The active participation of viewers and the two-way communication that social media provides give reality television an aspect of realism that other programming may lack; however, as these shows become more and more intertwined with their fanbases online, the authenticity of these stars and these shows starts to fade. An example of this is Love Island, a show that relies on viewers being active audiences of both television and social media.
This symbiotic relationship isn’t always positive or beneficial for producers, as Xavier L’Hoiry notes. Discussing the relationships that individual viewers have with each other online, L’Hoiry argues that though Love Island’s social media marketing strategy worked in engaging fans, it also caused issues for the show itself. Fans had the ability to access, share and discuss footage that proved the tv show’s editing was misleading, creating an air of doubt about the realism of the show. Despite this, it seems the fans were “not seeking to counter organizational surveillance in order to destroy these systems” (L’Hoiry, 2019), but rather were so invested in the content that they wanted to know more, investigate more and uncover every detail surrounding the show, without these issues affecting ratings.
In my opinion, reality television is becoming less authentic in order to remain entertaining; however, this strong focus on editing and manipulating social media to adhere to a particular narrative can also create digital publics such as hashtags, or prompt important social issues to be discussed online surrounding controversies or conversations that appear on these shows.
References
Arcy, J. (2018) The digital money shot: Twitter wars, The Real Housewives, and transmedia storytelling, Celebrity Studies, 9:4, 487-502. 
Hajru, A., Graham, T. (2011) Reality TV as a trigger of everyday political talk in the net-based public sphere, European Journal of Communication, 26:1, 18-32.
L’Hoiry, X. (2019) Love Island, Social Media, and Sousveillance: New Pathways of Challenging Realism in Reality TV, Frontiers in Sociology, 4:59.
Morning Consult, The Hollywood Reporter. (2018) National Tracking Poll #181129 Crosstabulation Results, viewed 15 April 2021 <https://morningconsult.com/wp-content/uploads/2018/11/181129_crosstabs_HOLLYWOOD_REPORTER_Reality-TV.pdf>.
0 notes
ansprasad · 4 years ago
Text
Data Analysis Tools - Assignment 4
used Python to check depression as a moderating variable in cigarettes smoked vs nicotine dependence
loaded data using pandas
import pandas as pd
import scipy.stats
data = pd.read_csv('nesarc.csv', low_memory=False)
converted to numeric using the following code
data['TAB12MDX'] = data['TAB12MDX'].apply(pd.to_numeric, errors='coerce')
data['CHECK321'] = data['CHECK321'].apply(pd.to_numeric, errors='coerce')
data['S3AQ3B1'] = data['S3AQ3B1'].apply(pd.to_numeric, errors='coerce')
data['S3AQ3C1'] = data['S3AQ3C1'].apply(pd.to_numeric, errors='coerce')
data['AGE'] = data['AGE'].apply(pd.to_numeric, errors='coerce')
subsetted the target population
# .copy() so that the recodes below are written to an independent dataframe
sub1 = data[(data['AGE']>=18) & (data['AGE']<=25) & (data['CHECK321']==1)].copy()
recoded cigarettes similar to what was done in class
recode1 = {1: 30, 2: 22, 3: 14, 4: 6, 5: 2.5, 6: 1}
sub1['USFREQMO'] = sub1['S3AQ3B1'].map(recode1)
# Bin the number of cigarettes smoked per day into quantity groups
def USQUAN(row):
    if row['S3AQ3B1'] != 1:
        return 0
    elif row['S3AQ3C1'] <= 5:
        return 3
    elif row['S3AQ3C1'] <= 10:
        return 8
    elif row['S3AQ3C1'] <= 15:
        return 13
    elif row['S3AQ3C1'] <= 20:
        return 18
    elif row['S3AQ3C1'] > 20:
        return 37
sub1['USQUAN'] = sub1.apply(lambda row: USQUAN(row), axis=1)
after dropping NA values, the data is crosstabulated using pd.crosstab
USQUAN    0.0   3.0   8.0   13.0  18.0  37.0
TAB12MDX
0         289   130   210    43   114    20
1          97   119   267    91   254    67
and the column percentages are found
USQUAN        0.0       3.0       8.0       13.0      18.0      37.0
TAB12MDX
0         0.748705  0.522088  0.440252  0.320896  0.309783  0.229885
1         0.251295  0.477912  0.559748  0.679104  0.690217  0.770115
the chi-square value is computed using
cs1 = scipy.stats.chi2_contingency(ct1)
the result is as below
chi-square value, p value, degrees of freedom, expected counts
(194.52141019317162, 4.218547040348835e-40, 5,
 array([[182.90182246, 117.98589065, 226.02116402,  63.49441505, 174.37272193,  41.22398589],
        [203.09817754, 131.01410935, 250.97883598,  70.50558495, 193.62727807,  45.77601411]]))
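The four values returned by `scipy.stats.chi2_contingency` are easier to read when unpacked by name. A minimal sketch, assuming the counts printed in the crosstab above are typed in directly as an array (the variable names here are illustrative and not from the original script):

```python
import numpy as np
from scipy.stats import chi2_contingency

# Counts copied from the crosstab above: rows are TAB12MDX = 0, 1;
# columns are USQUAN = 0, 3, 8, 13, 18, 37.
observed = np.array([
    [289, 130, 210, 43, 114, 20],
    [97, 119, 267, 91, 254, 67],
])

# The result unpacks as: statistic, p-value, degrees of freedom, expected counts.
chi2_stat, p_value, dof, expected = chi2_contingency(observed)

print(f"chi-square = {chi2_stat:.2f}, df = {dof}, p = {p_value:.3g}")
```

The third element is the degrees of freedom, (2 - 1) * (6 - 1) = 5 for this 2x6 table, and the last element is the array of expected counts.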
Now the dataset is subsetted using depression
sub3 = sub1[sub1['MAJORDEPLIFE'] == 0]
sub4 = sub1[sub1['MAJORDEPLIFE'] == 1]
a seaborn factorplot was produced at this point (image not preserved)
cross tabulation is done again for the two subsets using pd.crosstab, and the chi-square test is run using
ct2=pd.crosstab(sub3['TAB12MDX'], sub3['USQUAN'])
cs2= scipy.stats.chi2_contingency(ct2)
ct3=pd.crosstab(sub4['TAB12MDX'], sub4['USQUAN'])
cs3= scipy.stats.chi2_contingency(ct3)
the outputs of the print statements are as under
output for the subset without depression (sub3)
USQUAN    0.0  3.0  8.0  13.0  18.0  37.0
TAB12MDX
0         231  110  183    41    98    20
1          64   75  171    60   164    40
USQUAN        0.0       3.0       8.0       13.0      18.0      37.0
TAB12MDX
0         0.748705  0.522088  0.440252  0.320896  0.309783  0.229885
1         0.251295  0.477912  0.559748  0.679104  0.690217  0.770115
chi-square value, p value, degrees of freedom, expected counts
(119.8838461347068, 3.321507405356043e-24, 5,
 array([[160.29037391, 100.52108194, 192.34844869,  54.87907717, 142.35958632,  32.60143198],
        [134.70962609,  84.47891806, 161.65155131,  46.12092283, 119.64041368,  27.39856802]]))
association between smoking quantity and nicotine dependence for those WITH depression
USQUAN    0.0  3.0  8.0  13.0  18.0  37.0
TAB12MDX
0          58   20   27     2    16     0
1          33   44   96    31    90    27
USQUAN        0.0       3.0       8.0       13.0      18.0      37.0
TAB12MDX
0         0.748705  0.522088  0.440252  0.320896  0.309783  0.229885
1         0.251295  0.477912  0.559748  0.679104  0.690217  0.770115
chi-square value, p value, degrees of freedom, expected counts
(87.90481311162473, 1.8504851262047968e-17, 5,
 array([[25.20945946, 17.72972973, 34.07432432,  9.14189189, 29.36486486,  7.47972973],
        [65.79054054, 46.27027027, 88.92567568, 23.85810811, 76.63513514, 19.52027027]]))
(the column percentages printed for both subsets are identical to the full-sample values and do not match the subset counts, e.g. 231/295 ≈ 0.783 without depression and 58/91 ≈ 0.637 with depression for the first cell, so they appear to have been printed from the full-sample crosstab)
the same has also been plotted using seaborn.factorplot
it can be seen that the association between cigarettes smoked and nicotine dependence is significant in both subsets (p < 0.001 with and without depression), so depression does not appear to moderate the relationship
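The stratified comparison can be rerun compactly by looping over the two subsets. This is only a sketch: it assumes the subset counts printed above are entered by hand, and the dictionary keys are labels chosen here for illustration.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Counts copied from the two subset crosstabs above
# (rows: TAB12MDX = 0, 1; columns: USQUAN = 0, 3, 8, 13, 18, 37).
strata = {
    "no depression": np.array([[231, 110, 183, 41, 98, 20],
                               [64, 75, 171, 60, 164, 40]]),
    "depression": np.array([[58, 20, 27, 2, 16, 0],
                            [33, 44, 96, 31, 90, 27]]),
}

results = {}
for label, table in strata.items():
    # Run the same chi-square test of independence within each stratum.
    stat, p, dof, _ = chi2_contingency(table)
    results[label] = (stat, p, dof)
    print(f"{label}: chi-square = {stat:.2f}, df = {dof}, p = {p:.3g}")
```

A very small p-value in both strata is what the conclusion above rests on: the association holds at each level of the candidate moderator.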
0 notes
acedemicsblog · 4 years ago
Text
Criminal homework help
Quantitative Analysis Report: Crosstabulation & Correlation Assignment Instructions   Overview   You will take part in several data analysis assignments in which you will develop a report using tables and figures from the IBM SPSS® output file of your results. Using the resources and readings provided, you will interpret these results and test the hypotheses and writeup these…
View On WordPress
0 notes
lnct-mca · 5 years ago
Text
Chi-Square Test of Independence
The Chi-Square Test of Independence determines whether there is an association between categorical variables (i.e., whether the variables are independent or related). It is a nonparametric test.
This test is also known as:
Chi-Square Test of Association.
This test utilizes a contingency table to analyze the data. A contingency table (also known as a cross-tabulation, crosstab, or two-way table) is an arrangement in which data is classified according to two categorical variables. The categories for one variable appear in the rows, and the categories for the other variable appear in columns. Each variable must have two or more categories. Each cell reflects the total count of cases for a specific pair of categories.
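For readers working in Python rather than SPSS, a contingency table of this kind can be built with `pandas.crosstab`. The sketch below uses a tiny made-up dataset; the column names and values are hypothetical and only illustrate the layout.

```python
import pandas as pd

# Hypothetical raw data: one row per subject, two categorical variables.
df = pd.DataFrame({
    "smoking": ["non", "past", "current", "non", "non", "current"],
    "gender":  ["F",   "M",    "F",       "M",   "F",   "M"],
})

# Rows hold the categories of one variable, columns the categories of the
# other; each cell is the count of subjects with that pair of categories.
table = pd.crosstab(df["smoking"], df["gender"])
print(table)
```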
There are several tests that go by the name "chi-square test" in addition to the Chi-Square Test of Independence. Look for context clues in the data and research question to make sure what form of the chi-square test is being used.
Common Uses
The Chi-Square Test of Independence is commonly used to test the following:
Statistical independence or association between two or more categorical variables.
The Chi-Square Test of Independence can only compare categorical variables. It cannot make comparisons between continuous variables or between categorical and continuous variables. Additionally, the Chi-Square Test of Independence only assesses associations between categorical variables, and cannot provide any inferences about causation.
If your categorical variables represent "pre-test" and "post-test" observations, then the chi-square test of independence is not appropriate. This is because the assumption of the independence of observations is violated. In this situation, McNemar's Test is appropriate.
Data Requirements
Your data must meet the following requirements:
Two categorical variables.
Two or more categories (groups) for each variable.
Independence of observations:
  There is no relationship between the subjects in each group.
  The categorical variables are not "paired" in any way (e.g. pre-test/post-test observations).
Relatively large sample size:
  Expected frequencies for each cell are at least 1.
  Expected frequencies should be at least 5 for the majority (80%) of the cells.
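The two expected-frequency rules can be checked programmatically before running the test. The following is a rough sketch using NumPy; the function names are invented for this example and the thresholds simply restate the requirements listed above.

```python
import numpy as np

def expected_counts(observed):
    """Expected cell counts under independence: row total * column total / grand total."""
    observed = np.asarray(observed, dtype=float)
    row_totals = observed.sum(axis=1, keepdims=True)   # shape (R, 1)
    col_totals = observed.sum(axis=0, keepdims=True)   # shape (1, C)
    return row_totals @ col_totals / observed.sum()

def meets_size_rule(observed):
    """True if every expected count is >= 1 and at least 80% of them are >= 5."""
    e = expected_counts(observed)
    return bool(e.min() >= 1 and (e >= 5).mean() >= 0.8)

print(meets_size_rule([[79, 148], [152, 9]]))  # large table
print(meets_size_rule([[1, 2], [2, 30]]))      # sparse table
```

When the check fails, the usual options are to collapse sparse categories or to switch to an exact test.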
Hypotheses
The null hypothesis (H0) and alternative hypothesis (H1) of the Chi-Square Test of Independence can be expressed in two different but equivalent ways:
H0: "[Variable 1] is independent of [Variable 2]" H1: "[Variable 1] is not independent of [Variable 2]"
OR
H0: "[Variable 1] is not associated with [Variable 2]" H1: "[Variable 1] is associated with [Variable 2]"
Test Statistic
The test statistic for the Chi-Square Test of Independence is denoted $\chi^2$, and is computed as:
$$\chi^2 = \sum_{i=1}^{R} \sum_{j=1}^{C} \frac{(o_{ij} - e_{ij})^2}{e_{ij}}$$
where
$o_{ij}$ is the observed cell count in the ith row and jth column of the table
$e_{ij}$ is the expected cell count in the ith row and jth column of the table, computed as
$$e_{ij} = \frac{\text{row } i \text{ total} \times \text{col } j \text{ total}}{\text{grand total}}$$
The quantity $(o_{ij} - e_{ij})$ is sometimes referred to as the residual of cell (i, j), denoted $r_{ij}$.
The calculated $\chi^2$ value is then compared to the critical value from the $\chi^2$ distribution table with degrees of freedom df = (R - 1)(C - 1) and chosen confidence level. If the calculated $\chi^2$ value > critical $\chi^2$ value, then we reject the null hypothesis.
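The statistic and the critical-value comparison can be reproduced in a few lines of Python. This is a sketch under stated assumptions: the 2x2 counts are made up for illustration, the function name is hypothetical, and the 95% confidence level is just an example choice.

```python
import numpy as np
from scipy.stats import chi2

def pearson_chi2(observed):
    """Return (chi-square statistic, degrees of freedom) for an R x C table."""
    o = np.asarray(observed, dtype=float)
    # e_ij = row i total * col j total / grand total
    e = o.sum(axis=1, keepdims=True) @ o.sum(axis=0, keepdims=True) / o.sum()
    stat = ((o - e) ** 2 / e).sum()
    df = (o.shape[0] - 1) * (o.shape[1] - 1)
    return stat, df

observed = [[20, 30], [30, 20]]      # made-up counts
stat, df = pearson_chi2(observed)
critical = chi2.ppf(0.95, df)        # critical value at the 95% confidence level
reject = stat > critical

print(f"chi-square = {stat:.3f}, df = {df}, critical = {critical:.3f}, reject H0: {reject}")
```

Here every expected count is 25, so each cell contributes 25/25 = 1 to the sum and the statistic is 4, which exceeds the df = 1 critical value of about 3.84.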
Data Set-Up
There are two different ways in which your data may be set up initially. The format of the data will determine how to proceed with running the Chi-Square Test of Independence. At minimum, your data should include two categorical variables (represented in columns) that will be used in the analysis. The categorical variables must include at least two groups. Your data may be formatted in either of the following ways:
IF YOU HAVE THE RAW DATA (EACH ROW IS A SUBJECT):
Cases represent subjects, and each subject appears once in the dataset. That is, each row represents an observation from a unique subject.
The dataset contains at least two nominal categorical variables (string or numeric). The categorical variables used in the test must have two or more categories.
IF YOU HAVE FREQUENCIES (EACH ROW IS A COMBINATION OF FACTORS):
Cases represent the combinations of categories for the variables.
  Each row in the dataset represents a distinct combination of the categories.
  The value in the "frequency" column for a given row is the number of unique subjects with that combination of categories.
You should have three variables: one representing each category, and a third representing the number of occurrences of that particular combination of factors.
Before running the test, you must activate Weight Cases, and set the frequency variable as the weight.
An example of using the chi-square test for this type of data can be found in the Weighting Cases tutorial.
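Outside SPSS, frequency-format data can be turned into a contingency table by pivoting on the count column, which plays the role of the weight. A minimal pandas sketch, using the class rank by housing counts from the 2x2 example in this tutorial; the column names `class_rank`, `housing`, and `n` are invented for illustration.

```python
import pandas as pd

# One row per combination of categories, plus a frequency column.
freq = pd.DataFrame({
    "class_rank": ["Underclassman", "Underclassman", "Upperclassman", "Upperclassman"],
    "housing":    ["Off-campus",    "On-campus",     "Off-campus",    "On-campus"],
    "n":          [79,              148,             152,             9],
})

# Summing the frequency column per cell is the analogue of weighting cases.
table = freq.pivot_table(index="class_rank", columns="housing",
                         values="n", aggfunc="sum", fill_value=0)
print(table)
```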
Run a Chi-Square Test of Independence
In SPSS, the Chi-Square Test of Independence is an option within the Crosstabs procedure. Recall that the Crosstabs procedure creates a contingency table or two-way table, which summarizes the distribution of two categorical variables.
To create a crosstab and perform a chi-square test of independence, click Analyze > Descriptive Statistics > Crosstabs.
A Row(s): One or more variables to use in the rows of the crosstab(s). You must enter at least one Row variable.
B Column(s): One or more variables to use in the columns of the crosstab(s). You must enter at least one Column variable.
Also note that if you specify one row variable and two or more column variables, SPSS will print crosstabs for each pairing of the row variable with the column variables. The same is true if you have one column variable and two or more row variables, or if you have multiple row and column variables. A chi-square test will be produced for each table. Additionally, if you include a layer variable, chi-square tests will be run for each pair of row and column variables within each level of the layer variable.
C Layer: An optional "stratification" variable. If you have turned on the chi-square test results and have specified a layer variable, SPSS will subset the data with respect to the categories of the layer variable, then run chi-square tests between the row and column variables. (This is not equivalent to testing for a three-way association, or testing for an association between the row and column variable after controlling for the layer variable.)
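What a layer variable does can be mimicked in Python by splitting the data on the layer and running one test per level. The sketch below uses made-up frequency data; all column names and counts are hypothetical. As noted above, this is a set of separate two-way tests, not a three-way analysis.

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Hypothetical frequency data with a layer (stratification) variable.
freq = pd.DataFrame({
    "layer": ["A"] * 4 + ["B"] * 4,
    "row":   ["r1", "r1", "r2", "r2"] * 2,
    "col":   ["c1", "c2", "c1", "c2"] * 2,
    "n":     [30, 10, 10, 30, 20, 20, 20, 20],
})

results = {}
for level, part in freq.groupby("layer"):
    # Build the row-by-column table within this level of the layer variable.
    table = part.pivot_table(index="row", columns="col", values="n", aggfunc="sum")
    # correction=False gives the plain Pearson statistic for 2x2 tables.
    stat, p, dof, _ = chi2_contingency(table.to_numpy(), correction=False)
    results[level] = (stat, p, dof)
    print(f"layer {level}: chi-square = {stat:.2f}, df = {dof}, p = {p:.3g}")
```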
D Statistics: Opens the Crosstabs: Statistics window, which contains fifteen different inferential statistics for comparing categorical variables. To run the Chi-Square Test of Independence, make sure that the Chi-square box is checked off.
E Cells: Opens the Crosstabs: Cell Display window, which controls which output is displayed in each cell of the crosstab. (Note: in a crosstab, the cells are the inner sections of the table. They show the number of observations for a given combination of the row and column categories.) There are three options in this window that are useful (but optional) when performing a Chi-Square Test of Independence:
1. Observed: The actual number of observations for a given cell. This option is enabled by default.
2. Expected: The expected number of observations for that cell (see the test statistic formula).
3. Unstandardized Residuals: The "residual" value, computed as observed minus expected.
F Format: Opens the Crosstabs: Table Format window, which specifies how the rows of the table are sorted.
Example: Chi-square Test for 3x2 Table
PROBLEM STATEMENT
In the sample dataset, respondents were asked their gender and whether or not they were a cigarette smoker. There were three answer choices: Nonsmoker, Past smoker, and Current smoker. Suppose we want to test for an association between smoking behavior (nonsmoker, current smoker, or past smoker) and gender (male or female) using a Chi-Square Test of Independence (we'll use α = 0.05).
BEFORE THE TEST
Before we test for "association", it is helpful to understand what an "association" and a "lack of association" between two categorical variables looks like. One way to visualize this is using clustered bar charts. Let's look at the clustered bar chart produced by the Crosstabs procedure.
This is the chart that is produced if you use Smoking as the row variable and Gender as the column variable (running the syntax later in this example):
The "clusters" in a clustered bar chart are determined by the row variable (in this case, the smoking categories). The color of the bars is determined by the column variable (in this case, gender). The height of each bar represents the total number of observations in that particular combination of categories.
This type of chart emphasizes the differences within the categories of the row variable. Notice how within each smoking category, the heights of the bars (i.e., the number of males and females) are very similar. That is, there are an approximately equal number of male and female nonsmokers; approximately equal number of male and female past smokers; approximately equal number of male and female current smokers. If there were an association between gender and smoking, we would expect these counts to differ between groups in some way.
RUNNING THE TEST
Open the Crosstabs dialog (Analyze > Descriptive Statistics > Crosstabs).
Select Smoking as the row variable, and Gender as the column variable.
Click Statistics. Check Chi-square, then click Continue.
(Optional) Check the box for Display clustered bar charts.
Click OK.
SYNTAX
CROSSTABS
  /TABLES=Smoking BY Gender
  /FORMAT=AVALUE TABLES
  /STATISTICS=CHISQ
  /CELLS=COUNT
  /COUNT ROUND CELL
  /BARCHART.
OUTPUT
TABLES
The first table is the Case Processing summary, which tells us the number of valid cases used for analysis. Only cases with nonmissing values for both smoking behavior and gender can be used in the test.
The next tables are the crosstabulation and chi-square test results.
The key result in the Chi-Square Tests table is the Pearson Chi-Square.
The value of the test statistic is 3.171.
The footnote for this statistic pertains to the expected cell count assumption (i.e., expected cell counts are all greater than 5): no cells had an expected count less than 5, so this assumption was met.
Because the test statistic is based on a 3x2 crosstabulation table, the degrees of freedom (df) for the test statistic is $df = (R-1)(C-1) = (3-1)(2-1) = 2 \times 1 = 2$.
The corresponding p-value of the test statistic is p = 0.205.
DECISION AND CONCLUSIONS
Since the p-value is greater than our chosen significance level (α = 0.05), we do not reject the null hypothesis. Rather, we conclude that there is not enough evidence to suggest an association between gender and smoking.
Based on the results, we can state the following:
No association was found between gender and smoking behavior ($\chi^2(2)$ = 3.171, p = 0.205).
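The reported p-value can be cross-checked from the test statistic and degrees of freedom alone. A short sketch using SciPy's chi-square survival function (the variable names are illustrative):

```python
from scipy.stats import chi2

stat, df, alpha = 3.171, 2, 0.05

# Upper-tail probability of the chi-square distribution with df degrees of freedom.
p_value = chi2.sf(stat, df)
reject_null = p_value < alpha

print(f"p = {p_value:.3f}, reject H0: {reject_null}")
```

The p-value rounds to 0.205, matching the SPSS output, so the null hypothesis is not rejected at α = 0.05.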
Example: Chi-square Test for 2x2 Table
PROBLEM STATEMENT
Let's continue the row and column percentage example from the Crosstabs tutorial, which described the relationship between the variables RankUpperUnder (upperclassman/underclassman) and LiveOnCampus (lives on campus/lives off-campus). Recall that the column percentages of the crosstab appeared to indicate that upperclassmen were less likely than underclassmen to live on campus:
The proportion of underclassmen who live off campus is 34.8%, or 79/227.
The proportion of underclassmen who live on campus is 65.2%, or 148/227.
The proportion of upperclassmen who live off campus is 94.4%, or 152/161.
The proportion of upperclassmen who live on campus is 5.6%, or 9/161.
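These proportions can be recomputed from the four cell counts by dividing each cell by its class-rank total. A small pandas sketch; the row and column labels are chosen here for readability and are not the SPSS value labels.

```python
import pandas as pd

# Cell counts taken from the proportions listed above.
table = pd.DataFrame(
    [[79, 148], [152, 9]],
    index=["Underclassman", "Upperclassman"],
    columns=["Off-campus", "On-campus"],
)

# Divide each row by its total to get proportions within each class rank.
within_rank = table.div(table.sum(axis=1), axis=0)
print(within_rank.round(3))
```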
Suppose that we want to test the association between class rank and living on campus using a Chi-Square Test of Independence (using α = 0.05).
BEFORE THE TEST
The clustered bar chart from the Crosstabs procedure can act as a complement to the column percentages above. Let's look at the chart produced by the Crosstabs procedure for this example:
The height of each bar represents the total number of observations in that particular combination of categories. The "clusters" are formed by the row variable (in this case, class rank). This type of chart emphasizes the differences within the underclassmen and upperclassmen groups. Here, the differences in the number of students living on campus versus living off-campus are much starker within the class rank groups.
RUNNING THE TEST
Open the Crosstabs dialog (Analyze > Descriptive Statistics > Crosstabs).
Select RankUpperUnder as the row variable, and LiveOnCampus as the column variable.
Click Statistics. Check Chi-square, then click Continue.
(Optional) Click Cells. Under Counts, check the boxes for Observed and Expected, and under Residuals, click Unstandardized. Then click Continue.
(Optional) Check the box for Display clustered bar charts.
Click OK.
SYNTAX
CROSSTABS
  /TABLES=RankUpperUnder BY LiveOnCampus
  /FORMAT=AVALUE TABLES
  /STATISTICS=CHISQ
  /CELLS=COUNT EXPECTED RESID
  /COUNT ROUND CELL
  /BARCHART.
OUTPUT
TABLES
The first table is the Case Processing summary, which tells us the number of valid cases used for analysis. Only cases with nonmissing values for both class rank and living on campus can be used in the test.
The next table is the crosstabulation. If you elected to check off the boxes for Observed Count, Expected Count, and Unstandardized Residuals, you should see the following table:
With the Expected Count values shown, we can confirm that all cells have an expected value greater than 5.
Computation of the expected cell counts and residuals (observed minus expected) for the crosstabulation of class rank by living on campus:
Underclassman, Off-Campus (row 1, column 1): $o_{11} = 79$; $e_{11} = \frac{227 \times 231}{388} = 135.147$; $r_{11} = 79 - 135.147 = -56.147$
Underclassman, On-Campus (row 1, column 2): $o_{12} = 148$; $e_{12} = \frac{227 \times 157}{388} = 91.853$; $r_{12} = 148 - 91.853 = 56.147$; row 1 total = 227
Upperclassmen, Off-Campus (row 2, column 1): $o_{21} = 152$; $e_{21} = \frac{161 \times 231}{388} = 95.853$; $r_{21} = 152 - 95.853 = 56.147$
Upperclassmen, On-Campus (row 2, column 2): $o_{22} = 9$; $e_{22} = \frac{161 \times 157}{388} = 65.147$; $r_{22} = 9 - 65.147 = -56.147$; row 2 total = 161
Totals: col 1 total = 231; col 2 total = 157; grand total = 388
These numbers can be plugged into the chi-square test statistic formula:
$$\chi^2 = \sum_{i=1}^{R} \sum_{j=1}^{C} \frac{(o_{ij} - e_{ij})^2}{e_{ij}} = \frac{(-56.147)^2}{135.147} + \frac{(56.147)^2}{91.853} + \frac{(56.147)^2}{95.853} + \frac{(-56.147)^2}{65.147} = 138.926$$
We can confirm this computation with the results in the Chi-Square Tests table:
The row of interest here is Pearson Chi-Square and its footnote.
The value of the test statistic is 138.926.
The footnote for this statistic pertains to the expected cell count assumption (i.e., expected cell counts are all greater than 5): no cells had an expected count less than 5, so this assumption was met.
Because the crosstabulation is a 2x2 table, the degrees of freedom (df) for the test statistic is $df = (R-1)(C-1) = (2-1)(2-1) = 1$.
The corresponding p-value of the test statistic is so small that it is cut off from display. Instead of writing "p = 0.000", we instead write the mathematically correct statement p < 0.001.
DECISION AND CONCLUSIONS
Since the p-value is less than our chosen significance level α = 0.05, we can reject the null hypothesis, and conclude that there is an association between class rank and whether or not students live on-campus.
Based on the results, we can state the following:
There was a significant association between class rank and living on campus ($\chi^2(1)$ = 138.9, p < .001).
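The same result can be reproduced outside SPSS. The sketch below uses `scipy.stats.chi2_contingency` on the four observed counts; note that SciPy applies Yates' continuity correction to 2x2 tables by default, so `correction=False` is passed here to obtain the uncorrected Pearson statistic reported above.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Observed counts: rows are underclassman / upperclassman,
# columns are off-campus / on-campus.
observed = np.array([[79, 148],
                     [152, 9]])

# correction=False turns off Yates' continuity correction for the 2x2 case.
stat, p, dof, expected = chi2_contingency(observed, correction=False)

print(f"chi-square = {stat:.3f}, df = {dof}, p < 0.001: {p < 0.001}")
print(expected.round(3))
```

The expected counts returned by SciPy should agree with the Expected Count values in the crosstabulation table.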
0 notes