Sometimes I see intro stats questions on the statistics tag, and I'm usually like 'drat, it's been days since they asked and they probably don't need an answer anymore!' So here's a blog, and hopefully people ask about stuff. About me: I'm a stats grad, mostly using R but expanding my horizons. I loved TAing; now I'm taking a break from school but would like to stay sharp. I track the tag whatisstats as well.
Hello there! Unfortunately this is too vague to answer. If you want to know whether two variables are correlated, plot them on an X-Y graph, fit a linear regression, and look at R^2 (or the correlation coefficient r). If you want to know whether they come from the same population, use the appropriate hypothesis test. Generally you want to know whether their centers are the same: compare means with z-tests/t-tests (are they Normal? Continuous?), use the Mann-Whitney test (not Normal, but continuous?), or compare proportions with a binomial or two-proportion test (binary data?), etc.
It is always helpful to pinpoint exactly what the question you want to answer is, the more specific the better, and go from there. Hope that helps! (or if it was just a call of general frustration, I'm sorry and hang in there!)
What’s the best method for testing the relationship between two variables?
My first question is, can we assume the exam's scores are Normally distributed? Otherwise we'll need a bit more information first before solving.
Background:
If it is Normally distributed, imagine a bell curve. The center is at 1000. It is pretty fat too, because the standard deviation is high. The x-axis is your score. The proportion of students who score between, say, 1200 and 1300 is the area under the curve between 1200 and 1300 on the x-axis. You need the score at which 5% of the total area under the curve is to the right of the score.
(Aside: SAT scores can't really be Normally distributed, as the exam has a max and a min score... but for the purposes of the problem we can assume this.)
You could technically calculate this with the equation for Normal distributions but we won't. Instead, we shift and shrink this distribution into the standard normal, Z, and back again. This equation is
mytransformedscore = (myscore - meanscore)/SD
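Why do we use the Z distribution? Because any Normal distribution can be shifted and rescaled into Z, so one Z table does the job of a table for every possible Normal distribution. (And since the total area under any density curve is 1, "5% of the area" simply means an area of .05.)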
Solution:
1. Go to your Z table, find the Z score that leaves .05 area to the right of it (area of right tail). This is your 'mytransformedscore'
2. Fill in meanscore and SD of previous equation. Now use some algebra and solve for myscore
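The two steps above can be sketched in Python with the standard library's NormalDist. The SD isn't given in what's shown here, so the 200 below is a made-up placeholder; plug in the value from your problem.

```python
from statistics import NormalDist

def top_cutoff(mean, sd, top_fraction=0.05):
    """Score that leaves `top_fraction` of a Normal(mean, sd) curve to its right."""
    # Step 1: z such that P(Z > z) = top_fraction, i.e. P(Z < z) = 1 - top_fraction.
    z = NormalDist().inv_cdf(1 - top_fraction)
    # Step 2: undo the standardization: myscore = meanscore + z * SD.
    return mean + z * sd

# Hypothetical SD of 200 (placeholder, not from the original question).
print(round(top_cutoff(1000, 200), 1))
```

NormalDist(1000, 200).inv_cdf(0.95) would give the same number in one step; the version above just mirrors the Z-table route.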
Hope that helps!
Is anyone good at statistics that could help me, please!
hello! I'll try to help, if you don't mind. What's your question?
Likely the data you're working with is a sample. Numbers that describe a sample (called statistics) change from sample to sample, while numbers that describe the whole population (called parameters) do not. Hope that helps!
heLP me with statistics pls
i calculated the mean of a data to be 23.90. the data was the percentage of those able to sign up for a specific medical plan; an article claimed that nationwide 24% of those eligible signed up. why is the mean different from the percentage?
Hello there! I try not to give answers to specific homework problems, but I hope I can lead you in the right direction.
For the percentage: I assume you were given data whose mean is 23.90%. The nationwide figure is likely a population average: the average over every eligible person in the nation. Your data, most likely, is a sample: the average of only some (hopefully randomly selected) people in the nation. Sample means vary from sample to sample, but the population mean does not.
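To see this in action, here's a small simulation sketch. It assumes a population sign-up proportion of 0.24 and samples of 100 people, both hypothetical: each sample gives a slightly different proportion, but they scatter around the fixed population value.

```python
import random

random.seed(1)
p = 0.24    # hypothetical population proportion (a fixed parameter)
n = 100     # hypothetical sample size

def sample_proportion(p, n):
    """Proportion of successes in one simulated random sample of size n."""
    return sum(random.random() < p for _ in range(n)) / n

# Each sample gives a different statistic, but they center on the fixed p.
props = [sample_proportion(p, n) for _ in range(1000)]
print(props[:5])
print(sum(props) / len(props))
```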
As for your second question, consider the formula
(Average y Per x) = (Total y)/(Total x).
Therefore,
(Average Points Per Paper) = (Total Points)/(Total Papers).
Right now, if we substitute the numbers you have into the above equation,
70 = (Total Points)/19
You can solve what your total points are. What you want, however, is
71 = (New Total Points)/(Possibly New Total Papers)
I say possibly new total papers because I don't know if you're supposed to raise the average by adding more papers or just increasing the score of one paper. In any case, know that
New Total Points = Total Points + (Number of Added Points)
New Total Papers = Total Papers (which was 19) + (Number of Added Papers)
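Since I'd rather not hand over the answer, here's a generic sketch of the formula above in Python, with made-up numbers (average 80 over 10 papers, target 81); swap in your own.

```python
def points_to_add(current_avg, n_papers, target_avg, added_papers=0):
    """Extra points needed to move an average from current_avg to target_avg.

    Uses (Average Points Per Paper) = (Total Points) / (Total Papers).
    """
    total_points = current_avg * n_papers               # Total Points
    new_total_papers = n_papers + added_papers          # New Total Papers
    new_total_points = target_avg * new_total_papers    # New Total Points needed
    return new_total_points - total_points              # Number of Added Points

# Hypothetical numbers, not the ones from the question.
print(points_to_add(80, 10, 81))                   # 10: spread over existing papers
print(points_to_add(80, 10, 81, added_papers=1))   # 91: all on one new paper
```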
Hope that helped!
Hi! Can you help me solve this? "A random survey of 59 households found that 19 households turned out their lights and pretended not to be at home on Halloween. Compute an 85% confidence interval (using the "Plus 4"technique) for p, the proportion of all households that pretend to not be at home on Halloween."
No problem!
Background:
Confidence Intervals:
Probability and statistics are often two sides of a coin. If we know the distribution of something, we can calculate the probability of getting a value between numbers a and b when we sample from that distribution. So, for example, if we knew the true proportion p of households that pretend not to be at home on Halloween, we could calculate the probability that the sample proportion p_hat for a sample of size, say, 59, lands between p-ME and p+ME, where ME is the margin of error.
Note here that if p_hat is between p-ME and p+ME, then it must be true that p is between p_hat-ME and p_hat+ME. Very important! The reverse is also true; this is an "if and only if", so if one isn’t in its respective range, neither is the other.
In probability, we say that p_hat has a 95% probability of landing between p-ME and p+ME (for the right choice of ME). The flip side in statistics is that we are 95% confident that p is between p_hat-ME and p_hat+ME.
Caveat: This way of describing CIs is popular, but it’s less misleading to say that 95% of the time we draw p_hat (so in 95% of samples), the interval from p_hat-ME to p_hat+ME will capture p. Unless you’re studying Bayesian statistics (and you’ll know if you are), you cannot say that p has some probability of being in a particular CI. p is a parameter; it does not move!
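The "95% of samples" reading can be checked by simulation. This sketch assumes a hypothetical true p = 0.3 and n = 100, builds the usual p_hat +/- z*SE interval from each simulated sample, and counts how often the interval captures the fixed p. (The count lands near 95%, often a bit under, which is part of why fixes like plus four exist.)

```python
import math
import random
from statistics import NormalDist

random.seed(2)
p, n, level = 0.3, 100, 0.95        # hypothetical true p, sample size, confidence level
z = NormalDist().inv_cdf(1 - (1 - level) / 2)

def covers(p, n):
    """Draw one sample, build p_hat +/- z*SE, and check whether it captures p."""
    x = sum(random.random() < p for _ in range(n))
    p_hat = x / n
    me = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - me <= p <= p_hat + me

reps = 2000
coverage = sum(covers(p, n) for _ in range(reps)) / reps
print(coverage)   # fraction of intervals that captured the fixed p
```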
Proportions:
The number of successes in a sample is actually Binomially distributed (so p_hat is a Binomial count divided by n), but luckily, the Normal distribution is a good approximation with large enough sample sizes. Thus, for sample proportions (p_hat), ME for a 95% CI = z_(.975, or .025 depending on your book)*(SD of p_hat, or SE), where SE = sqrt(p*(1-p)/n).
However! Note that SE requires you to know p, but you don’t know p! The tricky part about proportions is that the ME is not fixed. This is unlike drawing from, say, a Normal distribution, where if you know the SD, your ME stays the same and you get accurate CIs no matter the true mean. Not so with proportions, which makes small sample sizes and skewed populations (p near 0 or 1) difficult to handle. Sure, you can just plug in p_hat for p, which is what you do, but the estimated ME will then change a lot based on whatever luck you had with your sample proportion, and it may be very wrong.
The plus four method stabilizes the sample proportion you get by pulling your results towards 0.5, rather than 0 or 1. It does this by pretending like your true n is n+4, and your number of successes is x+2, and your failures (n-x)+2. Technically, it is a simplification of Wilson’s interval, but I think this is the same intuition.
Solution:
Rather than 59 and 19, pretend you have 59+4 = 63 and 19+2 = 21. Then compute as you would with the regular method for p_hat CIs: p_hat +/- z*SE, where z leaves (1-.85)/2 = .075 of the area in the upper tail (if your table gives the lower-tail value, use its positive version) and SE = sqrt(p_hat*(1-p_hat)/n). And remember, the p_hat here is not 19/59, and the n is not 59.
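Here's the plus-four recipe as a short Python sketch (standard library only). To keep the homework yours, the example uses hypothetical numbers, 30 successes out of 100 at 90% confidence; for the Halloween question you'd pass in its own counts and 0.85.

```python
import math
from statistics import NormalDist

def plus_four_ci(successes, n, level):
    """Plus-four interval: add 2 successes and 2 failures, then use the usual formula."""
    x_tilde = successes + 2
    n_tilde = n + 4
    p_tilde = x_tilde / n_tilde
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)   # upper-tail z for this level
    se = math.sqrt(p_tilde * (1 - p_tilde) / n_tilde)
    return p_tilde - z * se, p_tilde + z * se

# Hypothetical example: 30 successes out of 100, 90% confidence.
low, high = plus_four_ci(30, 100, 0.90)
print(round(low, 4), round(high, 4))
```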
I hope that helped, and don’t hesitate to ask for clarification!
Hello! This may be too late to help you, but here you go:
Background:
Knowing what a distribution looks like allows you to figure out things like what percentage of the population is bigger than some number or lies between values a and b. The Normal distribution is super amazing because, among other things, if you are looking at the distribution of an average (say, the sample mean), then by the Central Limit Theorem that distribution gets closer and closer to Normal as n increases, regardless of the population distribution (as long as the population has a finite variance). This is great because it doesn't matter if you have some crazy population distribution or don't even know what it is; you can still make inferences about the mean with a sufficient sample size.
Now, "sufficient sample size" is very important: the sample size needed is bigger when your population distribution is more skewed, and smaller if it's fairly symmetric/Normal-shaped already. If you know the population is already Normal (as here), then the sample mean is exactly Normally distributed for any n (you don't even need the CLT for that!).
Standard error of the mean is the same thing as the standard deviation of the sample mean, but it's nice to distinguish it from the parameter 'standard deviation' of the population (there are other reasons, but I chose this one).
SE = (standard deviation of population)/(square root of sample size). This holds for any iid sample, and can be derived from variance rules and the fact that sample mean = (sum of iid variables)/n.
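Spelled out, the "variance rules" step goes like this. It's a sketch assuming X_1, ..., X_n are iid with variance sigma^2; the two facts used are Var(aY) = a^2 Var(Y) and that variances of independent variables add.

```latex
\bar X = \frac{1}{n}\sum_{i=1}^{n} X_i, \qquad
\operatorname{Var}(\bar X)
  = \frac{1}{n^2}\operatorname{Var}\Big(\sum_{i=1}^{n} X_i\Big)
  = \frac{1}{n^2}\sum_{i=1}^{n}\operatorname{Var}(X_i)
  = \frac{n\sigma^2}{n^2}
  = \frac{\sigma^2}{n},
\qquad
\mathrm{SE} = \operatorname{SD}(\bar X) = \frac{\sigma}{\sqrt{n}}.
```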
Solution:
The underlying population is Normal, thus small sample size (20) is not an issue.
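The 7.5 ounces is the sample standard deviation s, not the true population SD, so we estimate the SE with s in place of sigma. (If you went on to build a CI or run a test, you'd use the t distribution with 20 - 1 = 19 degrees of freedom, but the question only asks for the SE.)
Our sample mean Xbar is Normally distributed with the same mean as the original population, and its standard deviation (ie, the SE) is estimated by (sample SD)/(sqrt(sample size)).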
Sorry for the rambling explanation, but hope that helped!
hey followers! anyone who’s good at statistics, can you help me out with this problem?
An SRS of 20 recent birth records at the local hospital were selected. In the sample, the average birth weight was 121.4 ounces and the standard deviation was 7.5 ounces. Assume that in the population of all babies born in this hospital, the birth weights follow a Normal distribution, with some mean μ.
The standard error of the mean is A. 27.1. B. 6.1. C. 1.7. D.0.4.
thanks in advance!
The Wikipedia article actually does a great job with examples. Most of them come down to losing the magnitude of the numbers (how many papers were involved) once they are converted into proportions:
Take, for example, the Lisa and Bart example on the Wikipedia page, where Lisa improves 0/3 and then 5/7 papers, and Bart improves 1/7 and then 3/3 papers. If we look only at the proportion of papers improved per day, Bart wins both times, but combining the numbers (5/10 and 4/10, respectively), we see that Lisa is better overall.
The problem here is that Lisa's failure the first day (0/3) is given the same weight as Bart's number (1/7) that day, even though Bart failed to improve a lot more papers. With more extreme numbers, it'd be like seeing that Lisa failed to improve the single paper she was given, but Bart managed to improve 1 out of the 100 papers he was given, so Bart is better. I think the great confusion is that people tend to treat proportions as equally important, and if the sample sizes in all the groups were the same (say, they each graded 5 papers on both days), such a paradox couldn't happen, because each person's combined proportion would just be the average of their daily proportions (of course, you should still consider possible confounders).
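Here's the Lisa and Bart arithmetic as a small Python sketch, using the standard library's fractions so the rates are exact. Each daily comparison favors Bart, while the pooled one favors Lisa.

```python
from fractions import Fraction

# (papers improved, papers handled) per day, from the Lisa and Bart example above.
lisa = [(0, 3), (5, 7)]
bart = [(1, 7), (3, 3)]

def daily_rates(days):
    """Proportion improved on each day separately."""
    return [Fraction(improved, total) for improved, total in days]

def overall_rate(days):
    # Pool the counts first, then divide: each day is weighted by its number of papers.
    return Fraction(sum(i for i, _ in days), sum(t for _, t in days))

print([float(r) for r in daily_rates(lisa)], float(overall_rate(lisa)))
print([float(r) for r in daily_rates(bart)], float(overall_rate(bart)))
```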
Hope that helped!
I will pay 100 sex to the first person who can make me understand how a Simpson’s paradox works
The improbable thrills of probability theory.
Whenever I find myself trying to explain Bayes' Theorem or the logical fallacies of the Sally Clark case, I either quote this fascinating article or belatedly realize that I didn't explain things as amazingly as this article did.
#chances are#bayes theorem#statistics#intro stats#stats help#things everyone should read#medicine#math
ahh ok.
cdf is the cumulative distribution function. It is the integral of the density (ie, the probability density function, or pdf) from -infinity up to the point of interest. If X has cdf F, then F(k) = P(X ≤ k); for a continuous X this is the same as P(X < k). The cdf is also called the distribution function (it's pretty confusing since some people will misspeak and refer to the pdf as the probability distribution function, so watch out!)
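To make "the cdf is the integral of the pdf" concrete, here's a sketch that numerically integrates the standard Normal density with the trapezoid rule and compares the result with the standard library's NormalDist().cdf. The lower limit of -10 stands in for -infinity, an assumption that's harmless here because almost no area lies below -10.

```python
import math
from statistics import NormalDist

def pdf(x):
    """Standard Normal density."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def cdf_by_integration(k, lower=-10.0, steps=20000):
    """Approximate F(k) = P(X <= k) by integrating the pdf (trapezoid rule)."""
    h = (k - lower) / steps
    total = (pdf(lower) + pdf(k)) / 2
    total += sum(pdf(lower + i * h) for i in range(1, steps))
    return total * h

for k in (-1.0, 0.0, 1.96):
    print(k, round(cdf_by_integration(k), 6), round(NormalDist().cdf(k), 6))
```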
IMO was 'in my opinion' :)
Oh my…. *cries*
People, do anyone know/study about statistics?
I have note idea what means my homework… :
” X~U(-PI,PI) and Y = C tg(x), determine f_y(y) “
[ _ : sub ]
You're welcome, and thanks! (though I'm a little worried about your book now)
whatisstats:
thestripedshirtgirl:
Seriously does anyone know anything about hypothesis testing I REALLY need help D:
If you’d still like help, what are you confused on?
Oh my god, I can’t believe a blog exists for helping with stat stuff, I wish I knew about you…
Not at all! I'm new at this, so let me see how well I can do this in just text:
Background:
Presumably, you have some sample, and you are interested in its sample mean. Assuming you've shown that you can use a t-test here (or the book tells you to do it), you now want to test the hypothesis that the true mu = 5, ie, H_0: mu = 5. I'm also going to assume your significance level is .05.
If your alternative is right tailed (H_1: mu > 5), you want to find sample mean M* such that P(M > M*) = .05 if the true mu = 5. In this case, you transform both sides of that inequality into the relevant t-distribution:
P( (M - 5)/SE > (M* - 5)/SE ) = .05
Because (M - 5)/SE is t-distributed with n-1 degrees of freedom (usually; I'll talk more on this later), writing tstat = (M* - 5)/SE gives
P( (M - 5)/SE > tstat ) = 1 - pt(tstat)
where pt(tstat) = P (T < tstat). Here, you look up tstat in a t-table with the relevant degrees of freedom, and then find the associated probability. Note that some books look at the right tail and some the left, so be careful! Anyway, this is what you want to do:
1. Find a tstat such that P(T > tstat) = .05. (do this using a computer or a t-table)
2. Solve for M* in (M* - 5)/SE = tstat
Now, for a two-tailed alternative (H_1: mu ≠ 5), it's very similar except you split that .05 region between the two tails, so on the right side P(T > tstat) = .025 now:
1. Find a tstat such that P(T > tstat) = .025. (do this using a computer or a t-table)
2. Solve for M* in (M* - 5)/SE = tstat
3. Your critical values are M* and 5 - (M* - 5), ie, 5 ± tstat*SE, and you reject if |M - 5| > |M* - 5|
Left-tail is similar to right-tail, except switch the inequality.
And of course, if your null hypothesis wasn't 5 but some other number, replace the 5 with that number.
A final note: When you talk about critical values, you are usually talking about the critical sample mean M*, so if you get a sample mean M that's beyond that critical M*, you reject, otherwise you fail to reject. However, if you've already got your t statistic, ie, you've done (M - mu_0)/SE, then you only have to compare to that 'critical' tstat, which can be found by looking it up in a t-table/on a computer.
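In R, qt(0.95, df) gives the right-tail critical tstat directly (and in Python, scipy.stats.t.ppf does the same). To keep things dependency-free, here's a sketch using only Python's standard library: it integrates the t density with Simpson's rule and inverts the cdf by bisection, which is fine for illustration but slower and less robust than a library routine. The numbers (mu_0 = 5, n = 10, s = 2) are hypothetical.

```python
import math

def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T < x), using symmetry and Simpson's rule on [0, |x|] (steps must be even)."""
    b = abs(x)
    h = b / steps
    s = t_pdf(0, df) + t_pdf(b, df)
    s += sum((4 if i % 2 else 2) * t_pdf(i * h, df) for i in range(1, steps))
    area = s * h / 3                      # area between 0 and |x|
    return 0.5 + area if x >= 0 else 0.5 - area

def t_quantile(prob, df):
    """tstat with P(T < tstat) = prob, for 0.5 < prob < 1, found by bisection."""
    lo, hi = 0.0, 1000.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < prob:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Hypothetical example: H_0: mu = 5, n = 10, sample SD s = 2, alpha = .05.
mu0, n, s = 5, 10, 2
se = s / math.sqrt(n)
df = n - 1

t_right = t_quantile(0.95, df)         # right tail: P(T > tstat) = .05
m_star = mu0 + t_right * se            # solve (M* - 5)/SE = tstat for M*
t_two = t_quantile(0.975, df)          # two tails: .025 in each
two_sided = (mu0 - t_two * se, mu0 + t_two * se)

print(round(t_right, 3), round(m_star, 3))
print(round(t_two, 3), [round(v, 3) for v in two_sided])
```

If you already have your t statistic, you only need t_right or t_two; M* is for when you want the cutoff on the sample-mean scale.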
So, degrees of freedom:
Recall that the t-distribution is used in cases where you know what you're estimating has a Normal distribution, but you don't know the true variance sigma^2. Instead of sigma^2, you plug in the sample variance, but this adds extra uncertainty (the sample variance itself varies from sample to sample, so it can land well above or below the true variance), which the t-distribution's heavier tails account for. You'd also expect that, with increasing sample size, this extra uncertainty shrinks. This is why, as n increases, the t-distribution converges to the Z-distribution. Therefore, unlike the unique Z distribution, you actually get a separate t-distribution for every number of degrees of freedom!
Degrees of freedom for a single i.i.d. sample of size n is n-1. The term is used in a lot of other contexts, but I'm guessing you want to know whether you need it to calculate a critical value, and the answer is yes, because it determines which t-distribution you're using.
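A quick simulation sketch of why this matters, assuming Normal data with a hypothetical mu = 0 and sigma = 1: compute (M - mu)/SE with SE built from the sample SD, and count how often it lands beyond the Z cutoff of 1.96. With a tiny sample the count comes out well above 5%, because those extra-wide tails are real; with a larger sample it settles near 5%.

```python
import math
import random

random.seed(3)

def t_statistic(n, mu=0.0, sigma=1.0):
    """(M - mu)/SE for one Normal sample, with SE from the sample SD (n - 1 divisor)."""
    xs = [random.gauss(mu, sigma) for _ in range(n)]
    m = sum(xs) / n
    s = math.sqrt(sum((x - m) ** 2 for x in xs) / (n - 1))
    return (m - mu) / (s / math.sqrt(n))

def beyond_z_cutoff(n, reps=10000):
    """Fraction of simulated t statistics with |t| > 1.96, the two-sided 5% Z cutoff."""
    return sum(abs(t_statistic(n)) > 1.96 for _ in range(reps)) / reps

small, large = beyond_z_cutoff(5), beyond_z_cutoff(50)
print(small, large)
```

For n = 5 (4 degrees of freedom) the theoretical fraction is about 12%; for n = 50 it is a little over 5%.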
Hope that helped, and don't hesitate to ask for clarification!
whatisstats:
thestripedshirtgirl:
Seriously does anyone know anything about hypothesis testing I REALLY need help D:
If you’d still like help, what are you confused on?
Oh my god, I can’t believe a blog exists for helping with stat stuff, I wish I knew about you earlier!
Anyway, yeah I still need help, specifically I’m having trouble undrsranding how to establish where the critical values are on a distribution so I can compare them to my t obt., and I have no understanding if what degrees of freedom are whatsoever or if theyre related to the other issue or what.
If you can help me out with that, you will truly be my hero. My textbook is absolutely horrible. D:
hello there! If it's not too late, can you throw this into a more equation-like format? I'm not exactly sure what the C tg(x) means here.
Nevertheless, I'm guessing that your X is uniformly distributed from -pi to pi, so first write down its density f(x). Since Y = h(X) for a known function h, you can then use the transformation-of-variables formula in your book, though I personally like to start with the cdf, P(Y < y) = P(h(X) < y) = P(X < h^(-1)(y)) = F(h^(-1)(y)), and then differentiate using the chain rule (really it's the same thing, just more intuitive IMO). Careful: the middle step assumes h is increasing; if h isn't one-to-one over the range of X, split that range into pieces where it is and add up the probabilities.
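Here's the cdf method on a different, hypothetical example (so it's not your homework): X ~ Uniform(0, 1) and Y = X^2, where h(x) = x^2 is increasing on (0, 1). The sketch checks the derived F_Y against a simulation.

```python
import random

# Hypothetical example: X ~ Uniform(0, 1), Y = h(X) = X^2, increasing on (0, 1).
# cdf method:
#   F_Y(y) = P(X^2 < y) = P(X < sqrt(y)) = sqrt(y)       for 0 < y < 1
#   f_Y(y) = d/dy sqrt(y) = 1 / (2 * sqrt(y))            (chain rule)
def F_Y(y):
    return y ** 0.5

random.seed(4)
ys = [random.random() ** 2 for _ in range(100000)]

# Compare the simulated P(Y < y) with the formula at a few points.
for y in (0.04, 0.25, 0.64):
    empirical = sum(v < y for v in ys) / len(ys)
    print(y, round(empirical, 3), F_Y(y))
```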
Hope that helped somewhat!
Ask away!
Hello, all! I'm a stats major who hates seeing people give up on, and end up hating, such a rich topic. Statistics is an extremely difficult thing to get into (I know, because I basically failed my first stats class :( ), but once you get it, it opens a whole new world in the way you perceive and understand the world. One of my dreams is for everyone to have taken stats: not to be a stats major, of course, but at least to come out of a single course.
So this is basically a blog where you can shoot me any sort of introductory stats question, and I may or may not answer (ie, I will want to answer but sometimes will not have the time), and I hope that in time, other people will also come and answer questions for others as well. Then again, it's probably best to dream small and hope that I will help at least one person with this.
#statistics#college#stats help#homework help#but obviously I'm not going to do it for you#high school