going to send this to my advisor and tell her its a biblically accurate angel
#phd adventures #(i will not) #i did pca analysis only to remember that i can. only interpret them based on vibes #there really isnt much there #i can say like yeah bmi and fat free mass and intraabdominal fat are all basically the same in grouping people #close to them as expected is glucose #interestingly enough cortisol #opposite of that is fat index which makes sense bc its values are smaller the more fat you have #but also interestingly enough carbohydrate and lipid oxidation are like the opposites and with pretty big impact #how did i tell that? by looking at my tiny little arrows on my pca plot hahahahahahhahaaaaaaaaaaaaaaaaaaaaaaaaaaaa #maybe i should just. go back to. multiple variables regression or something #ive phded too close to the sun
alright I took the ap stats test this year, and a few hours after completing the exam I wrote up a list of important things to study before taking it yourself. I was going to wait until next year to post it, but I realized some people are still going to take it maybe?
note that I won't get my score for another month, so I can't exactly back this up BUT here ya go
know the inference procedures. you'll be given the long part of the formula, but you need to know to take p-hat plus or minus that long part (for example, with a one-sample z-interval)
"long parts" that are given on the formula sheet:

(I'm not going to add alt text, just because stats formulas are so confusing to describe without symbols, but please let me know in the notes or dm me if you need me to--I'll send links to pages with the formulas or something. we'll figure it out 👍)
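since the image isn't reproduced here, here's roughly what those "long parts" (the standard errors) look like. I'm reconstructing these from the standard formula sheet, so double-check them against the real sheet:

```latex
% standard error of a sample proportion (one-sample z procedures)
\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}

% standard error for two proportions (two-sample z interval;
% the two-proportion *test* uses a pooled \hat{p} instead)
\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}

% standard error of a sample mean (one-sample t procedures)
\frac{s}{\sqrt{n}}

% standard error for two means (two-sample t procedures)
\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}
```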
you have to know what to do with these to get each formula. the first thing I did when I opened each booklet was write out the full formulas for each inference procedure: one- and two-sample z- and t-intervals and tests (eight formulas when I was done)
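for example, two of those eight written out in the usual notation (again from memory, not copied off the official sheet):

```latex
% one-sample z interval for a proportion
\hat{p} \pm z^{*}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}

% one-sample t test statistic for a mean, with df = n - 1
t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}
```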
remember which table does what: the t-table is the one you use for confidence intervals, because it gives critical values (t*, and its bottom infinite-df row doubles as z*, so you'll reach for it even on z-intervals), while the z-table gives normal probabilities, which is what you need for p-values in z-tests (t-test p-values come from the t-table or your calculator, not the z-table). The real difference between z and t procedures is that z assumes you know the population standard deviation (or you're working with proportions), while t plugs in the sample's SD instead. The tables are in the back of the booklet for the multiple choice, and it does say "stop" at the bottom of the last question. you are allowed to flip past that to the tables.
hypotheses for proportions (z inferences) use p = __ ; mean hypotheses (t inferences) use mu (the weird "u" looking thing; mean of the whole population). do not put x-bar; that applies to the mean of a sample and not population.
also know what your p-value means: it's the probability of getting a result as extreme as the one you observed, or more extreme (don't forget "or more extreme"), assuming the null hypothesis is true
write out your conditions and show them being met. even write that 60 is greater than or equal to 30 if n=60 and you need to show that the sampling distribution is approximately normal (note approximately and not exactly)
put your conclusion in context. mention what's being tested, not just numbers
type I and type II errors are something you'll want to put some time into--they were on both my 2021 exam and the 2012 one ap released as a practice. you'll also want to know the chances of each one happening (the chance of a type I error is just the significance level you're using; the chance of a type II error is called beta, and that's the one with the weird formula--power is 1 minus beta)
type I error: reject the null hypothesis when you shouldn't have
type II error: fail to reject the null hypothesis when you should have ("fail II reject")
- on that note, never accept the null hypothesis when writing conclusions, just fail to reject it
regression lines and "model error" came up and I had no clue about either. I still have no clue whether "model error" is even a statistics term. neither was in my free response section, thank fuck, but regression lines and that regression-output summary your calculator gives you were in the free response for 2012, unless I'm mistaken
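in case regression lines show up for you too, these are the least-squares line pieces worth having memorized (the slope and intercept usually come straight off that calculator output, but the formulas get asked about):

```latex
% least-squares regression line
\hat{y} = a + bx

% slope and intercept from summary statistics
b = r\,\frac{s_y}{s_x}, \qquad a = \bar{y} - b\,\bar{x}
```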
chi-square tests came up twice on my exam: one multiple choice question asked what test was appropriate for the given circumstances, and a chi-square goodness-of-fit test was an option. the second time was in the final free response question, and it was the chi-square test for independence (association between two categorical variables). I got lucky, because it didn't actually want me to run one and I could bullshit my way through the question, but make sure you know them. put z- and t- inference tests at a higher priority, though
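for reference, the chi-square statistic itself is the same for goodness-of-fit and independence; only the expected counts and degrees of freedom change:

```latex
% chi-square statistic
\chi^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}}

% goodness of fit: df = number of categories - 1
% independence:    df = (rows - 1)(columns - 1),
%                  expected count = (row total * column total) / table total
```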
one of my free response questions was entirely free of calculations; it wanted me to explain the purpose of placebos, the importance of random sampling and assignment, etc. I think I got lucky again there, but make sure you have a firm understanding of experimentation procedure
you'll probably be fine without worrying too much about probabilities of events, complements, independent vs. dependent events, and all that (P(A), P(B), P(A ∪ B), those things). they were a free-response question on mine and I drew a flow chart and had no issue whatsoever, it was really simple. this one probably shouldn't be a priority unless it's something you personally struggle with
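if you do want a quick refresher, these are the rules that question type leans on:

```latex
% complement rule
P(A^{c}) = 1 - P(A)

% general addition rule
P(A \cup B) = P(A) + P(B) - P(A \cap B)

% conditional probability
P(A \mid B) = \frac{P(A \cap B)}{P(B)}

% A and B are independent exactly when
P(A \cap B) = P(A)\,P(B)
```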
spend a bit of time with transforming data. ex: mean 1 foot, sd .5 foot --> mean 12 in, sd 6 in
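the general rule behind that example: adding a constant shifts the mean but not the spread, while multiplying by a constant scales both (feet to inches is multiplying by 12, hence mean 12 in and sd 6 in):

```latex
\text{mean}(aX + b) = a\,\text{mean}(X) + b
\qquad
\text{SD}(aX + b) = |a|\,\text{SD}(X)
```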
last but far from least:
it's an ap test. for the free response, it's always better to over-explain than under-explain. it doesn't have to read nicely; don't worry about repeated words or anything. just put as much on there as you can, and something should stick.
#ap tests #ap stats #math!! #uhhh how to tag this bitch #education #?? I guess?? #how to bullshit an exam for a subject you took entirely online from september to january #when your entire amount of studying is three hours the night before the test #idk bro shit is wild #long post #but it's worth it because college board is a fucker
How Do I Start My Data Science Career?
I would say that my data science learning path was fairly traditional. I did my undergrad in economics and have master’s degrees in global commerce and computer science (concentration in machine learning and artificial intelligence). I learned my business acumen from my coursework during my commerce degree and picked up most of the technical elements from my master’s in CS. I had a data science internship, and I was on my way. Looking back, there was nothing wrong with my path, but knowing what I do now, what would I change about my learning journey? This question is particularly relevant for people who are new to the field. Many things have changed since I started: positions are more competitive, and there are far more learning options. I hope that my experience can help others learn data science faster and more completely, and lead them to better job opportunities.
I will caveat this article by saying that learning is a little bit different for everyone. My word is not gospel, and there is a good chance that you will find something that works a little better for you. Still, I hope this is a good foundation to build from, and that it instills the big-picture priorities that matter when learning this field. This article focuses more on how to learn than on where to learn (courses, boot camps, degrees, etc.). For specific courses and online resources, I recommend these two articles:
If I had to start learning Data Science again, how would I do it? (Santiago Viquez Segura)
My Self-Created Artificial Intelligence Masters Degree (Daniel Bourke)
Lesson 1: Break it down
When I first started learning data science, I was overwhelmed by the size of the field. I had to learn programming languages plus concepts from statistics, linear algebra, calculus, etc. Confronted with that many options, I didn’t know where to start. Fortunately for me, I had coursework to guide my studies. The degrees I did broke many of the concepts into smaller chunks (classes) so they were digestible. While this worked for me, I find that schools take a one-size-fits-all approach, and they include many extraneous classes that you don’t actually need. If I could go back, I could definitely break my data science learning journey into chunks better suited to me. Before diving into data science, it makes sense to understand the components that make up the field. Rather than breaking things into “courses”, you can break data science into even smaller and more digestible chunks. I generally break it into programming and math.
Programming — familiarity with Python and/or R
Variables
Loops
Functions
Objects
Packages (pandas, NumPy, matplotlib, sklearn, TensorFlow, PyTorch, etc.)
Math
Statistics
Probability theory
Regression (linear, multiple linear, ridge, lasso, random forest, SVM, etc.)
Classification (naive Bayes, kNN, decision tree, random forest, SVM, etc.)
Clustering (k-means, hierarchical)
Linear algebra
Calculus
By breaking data science down into its components, you transform it from an abstract concept into concrete steps.
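If it helps to see the programming basics in one place, here is a minimal Python sketch touching each of them; the measurements are made up purely for illustration:

```python
import pandas as pd

# A variable holding a small, made-up list of measurements
heights_cm = [172, 165, 180, 158]

def cm_to_inches(cm):
    # A function: convert centimeters to inches
    return cm / 2.54

# A loop that applies the function to every value
heights_in = []
for h in heights_cm:
    heights_in.append(cm_to_inches(h))

# An object from a package: a pandas DataFrame and its summary statistics
df = pd.DataFrame({"cm": heights_cm, "inch": heights_in})
print(df.describe())
```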
Lesson 2: Start somewhere
When I was starting out, I was obsessed with learning things in the “correct” sequence. After entering the field, I found that many data scientists learned their skills in drastically different orders. I met Ph.D.s who had studied math first and only learned the programming concepts after taking a bootcamp. I also met software engineers who were incredible programmers and learned the math later through self-study and application. I now realize that it is most important to start somewhere, preferably with a topic you are interested in. I found that learning is additive: if you learn one thing, you are not forgoing learning another concept. If I had to go back, I would start with the concepts that were most interesting to me at the time. Once you learn a single concept, you can build on that knowledge to understand others. For example, if you learn simple linear regression, multiple linear regression is a fairly easy step. Still, I probably wouldn’t jump right in and start with deep learning. It helps to start small and simple and build on that foundation.
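To make that “easy step” concrete, here is a rough sketch, on synthetic data, of how little changes between a simple and a multiple linear regression in scikit-learn:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data: y depends on two predictors plus noise
rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
x2 = rng.normal(size=100)
y = 3 * x1 + 2 * x2 + rng.normal(size=100)

# Simple linear regression: one predictor column
simple = LinearRegression().fit(x1.reshape(-1, 1), y)

# Multiple linear regression: same API, just more columns
multiple = LinearRegression().fit(np.column_stack([x1, x2]), y)

print("simple coefficients:", simple.coef_)
print("multiple coefficients:", multiple.coef_)
```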
Lesson 3: Build Minimum Viable Knowledge (MVK)
Over time, I’ve changed my opinion about how much foundational knowledge you need. After experiencing many different types of learning myself, I believe that learning by doing real-world projects is the most effective way to grasp a field. You should understand just enough of the underlying concepts to be able to start exploring your own projects. This is where minimum viable knowledge comes into play: you start by learning just enough to be able to learn through doing. This stage is fairly hard to identify. Generally, you will feel like you aren’t ready when you first get here. That is a good thing, though; it means you are pushing yourself out of your comfort zone. You can reach this stage fairly easily. Almost any introductory online course will get you to this level of knowledge, and I generally recommend the micro-courses at kaggle.com. To get to this step, all you really need is the basics of Python or R and a familiarity with the packages used. You can start learning the math later by applying some of the algorithms to real-world data.
Lesson 4: Get your hands dirty
With your basic knowledge in place, I recommend getting into projects as quickly as possible. Again, this sounds scary, but a project is only as big as you define it. At the early stages, a project could be something as simple as experimenting with a for loop. As you progress, you can graduate to projects using data from Kaggle, and eventually data that you have collected yourself. I am a HUGE believer that the best way to learn data science is to do data science. The theory is VERY important, but no one says you have to understand it all before you start applying it. The theory is something you can go back to once you have a functional understanding of the algorithms. For me, real-world examples were always what made things click. If you start with real-world examples through projects, the theory has a far higher chance of “clicking” when you get to it. Projects also have the power to make data science smaller. One of the biggest challenges I see for new learners is that the field can be overwhelming. Confining what you are learning to the size of a small project lets you break things down even further than you did in Lesson 1. Projects offer one additional benefit: they give you immediate feedback on where you need to improve. If you are working on a project and hit a roadblock about which package, algorithm, or visual to use, you now know which area of the field to study further.
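As an example of how small a first project can be, here is a rough sketch that uses one of scikit-learn’s built-in datasets, so there is nothing to download; treat it as a starting point rather than a recipe:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Load a small, well-known dataset that ships with scikit-learn
X, y = load_iris(return_X_y=True)

# Hold out a test set so the evaluation is honest
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# Fit a simple model and see how it does on unseen data
model = DecisionTreeClassifier(random_state=42)
model.fit(X_train, y_train)
print("test accuracy:", accuracy_score(y_test, model.predict(X_test)))
```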
Lesson 5: Learn from other people’s code
While doing your own projects is great, sometimes you don’t know what you don’t know. I highly recommend going through the code of more experienced data scientists to get ideas about what to learn next and to better understand logic or syntax. On Kaggle and GitHub there are thousands (maybe millions) of notebooks and repositories where people have shared the code they used to analyze datasets. Going through these is a great way to complement your projects. I recommend making a list of the packages, algorithms, and visuals that you see being used. Then go to the documentation for those packages and expand your knowledge there; the docs almost always have examples of how they should be used. Again, this list can also help you think of new project ideas and experiments.
Lesson 6: Build algorithms from scratch
This is a rite of passage for most data scientists. After you have applied an algorithm and understand how it works in practice, I recommend trying to code it from scratch. This helps you to better understand the underlying math and other mechanisms that make it work. When doing this, you will undoubtedly have to learn the theory behind it as well. I personally think that learning in this direction is far more intuitive than trying to master the theory and then apply it. This is the approach that fastai has taken with their free MOOC. I highly recommend it if you are interested in deep learning. For this, I generally recommend starting with linear regression. This will help you to better understand gradient descent, which is an extremely important concept to build on. As you advance your data science career further, I think theory becomes increasingly important. You bring value by matching the correct algorithm to the problem. The theory associated with the algorithm greatly facilitates this process.
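As a concrete example of the kind of from-scratch exercise this lesson has in mind, here is a rough NumPy sketch of simple linear regression fit by gradient descent; the data, learning rate, and iteration count are all made up for illustration:

```python
import numpy as np

# Synthetic data: y is roughly 4x + 1 plus noise
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=200)
y = 4 * x + 1 + rng.normal(scale=2, size=200)

# Fit y = w*x + b by gradient descent on mean squared error
w, b = 0.0, 0.0
learning_rate = 0.01
for _ in range(2000):
    y_pred = w * x + b
    error = y_pred - y
    # Gradients of the MSE with respect to w and b
    grad_w = 2 * np.mean(error * x)
    grad_b = 2 * np.mean(error)
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print(f"learned w={w:.2f}, b={b:.2f}")  # should land near 4 and 1
```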
Lesson 7: Never stop learning
The beauty of the data science journey is that it never ends. You will need to keep learning to stay on top of new packages and advancements in the field. I recommend doing this through (you guessed it) more projects. I also recommend continuing with the code review and reading new research that is published. This is more of a mindset recommendation than anything practical. If you think that there is a pinnacle, you are in for a surprise!
Insideaiml is one of the best platforms where you can learn Python, Data Science, Machine Learning, Artificial Intelligence & showcase your knowledge to the outside world.