seattledataguy
Data Science Consulting and Machine Learning
115 posts
I will be posting about data science and machine learning to help other consultants. Need a great blog post? I will have it. Have questions? Reach out to me! Good luck with your machine learning and data science projects.
seattledataguy · 8 years ago
Link
9 notes
seattledataguy · 8 years ago
Link
Web scraping typically requires a solid understanding of HTTP requests, faking headers, complex regex statements, HTML parsers, and database management skills.
Some programming languages make this much easier, Python among them, because Python offers libraries like Scrapy and BeautifulSoup that make scraping and parsing HTML easier than old-school web scrapers.
However, it still requires proper design and a decent understanding of programming and website architecture.
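To make the "HTML parsers" part concrete, here is a minimal sketch using only Python's standard library; this is the kind of low-level callback parsing that libraries like BeautifulSoup wrap in a much friendlier API (the HTML snippet and class name are illustrative):

```python
from html.parser import HTMLParser

# Minimal link extractor: collect the href of every <a> tag.
# BeautifulSoup does this in one line; this shows what it abstracts away.
class LinkExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)

parser = LinkExtractor()
parser.feed('<html><body><a href="/page1">One</a><a href="/page2">Two</a></body></html>')
print(parser.links)  # ['/page1', '/page2']
```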
13 notes
seattledataguy · 8 years ago
Link
Unburdening the IT department
One of the issues analysts run into is getting access to data. Typically this requires putting in a ticket to the IT department…waiting four weeks…and then maybe getting an Excel data pull that does not match the analyst's request at all!
Sometimes a simple data request can take months to get right. A data warehouse gives analysts the ability to pull data for themselves. It still needs to be managed, and it must not be the production database. But if implemented properly, analysts will be able to get the data they need on their own.
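A hypothetical sketch of what that self-serve pull looks like, using an in-memory SQLite database as a stand-in for a warehouse connection (the table and column names are made up for illustration):

```python
import sqlite3

# Stand-in for a read-only warehouse copy, kept separate from production.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [("west", 120.0), ("east", 80.0), ("west", 40.0)])

# The analyst runs the aggregate themselves -- no IT ticket required.
rows = conn.execute(
    "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
).fetchall()
print(rows)  # [('east', 80.0), ('west', 160.0)]
```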
2 notes
seattledataguy · 8 years ago
Link
Join us!
0 notes
seattledataguy · 8 years ago
Photo
Twitter / BigDataBorat
26 notes
seattledataguy · 8 years ago
Photo
42 notes
seattledataguy · 8 years ago
Photo
An infographic look at Big Data.
69 notes
seattledataguy · 8 years ago
Photo
The Blood Stream by the Numbers!
384 notes
seattledataguy · 8 years ago
Photo
Take a look at the latest obesity data from the Centers for Disease Control and Prevention and you can see that the country’s obesity epidemic is far from over.
Even in Colorado, the state with the lowest rate, 21.3 percent of its population is obese. Arkansas tops the list with 35.9 percent.
“It is the largest epidemic of a chronic disease that we’ve ever seen in human history,” says Dr. Donald Lloyd-Jones, chair of the department of preventive medicine at the Northwestern University Feinberg School of Medicine.
Click on the CDC’s obesity prevalence maps and you’ll see something even more startling — the disparity among different ethnic groups. It’s not new that the obesity epidemic is hitting African-Americans the hardest, followed by Hispanics, but the maps highlight this worrying trend.
For African-Americans, for example, there are 33 states with an obesity rate of at least 35 percent, whereas for white Americans only one state reports that rate. Nine states estimate the Hispanic obesity rate at 35 percent or higher.
“It is not about one group doing something wrong,” says Lloyd-Jones, who was not involved in creating the CDC maps. “It is about the environment that we have built that sets people up to fail.”
Obesity Maps Put Racial Differences On Stark Display
Map source: Centers for Disease Control and Prevention Behavioral Risk Factor Surveillance System. Credit: Alyson Hurt/NPR
562 notes
seattledataguy · 8 years ago
Photo
How DNA could store all the world’s data
It was Wednesday 16 February 2011, and Goldman was at a hotel in Hamburg, Germany, talking with some of his fellow bioinformaticists about how they could afford to store the reams of genome sequences and other data the world was throwing at them. He remembers the scientists getting so frustrated by the expense and limitations of conventional computing technology that they started kidding about sci-fi alternatives. “We thought, ‘What’s to stop us using DNA to store information?'”
Then the laughter stopped. “It was a lightbulb moment,” says Goldman, a group leader at the European Bioinformatics Institute (EBI) in Hinxton, UK. True, DNA storage would be pathetically slow compared with the microsecond timescales for reading or writing bits in a silicon memory chip. It would take hours to encode data by synthesizing DNA strings with a specific pattern of bases, and still more hours to recover that information using a sequencing machine. But with DNA, a whole human genome fits into a cell that is invisible to the naked eye. For sheer density of information storage, DNA could be orders of magnitude beyond silicon — perfect for long-term archiving.
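The density claim comes from packing two bits into each of the four bases. A toy sketch of that mapping (A=00, C=01, G=10, T=11) is below; real schemes such as Goldman's add error correction and avoid long base repeats, so this encoding is illustrative only:

```python
# Toy DNA storage codec: 2 bits per base, 4 bases per byte.
BASES = "ACGT"

def encode(data: bytes) -> str:
    out = []
    for byte in data:
        # Emit the byte's four 2-bit chunks, most significant first.
        for shift in (6, 4, 2, 0):
            out.append(BASES[(byte >> shift) & 0b11])
    return "".join(out)

def decode(strand: str) -> bytes:
    data = bytearray()
    for i in range(0, len(strand), 4):
        byte = 0
        for base in strand[i:i + 4]:
            byte = (byte << 2) | BASES.index(base)
        data.append(byte)
    return bytes(data)

strand = encode(b"hi")
print(strand)          # 'CGGACGGC' -- 8 bases for 2 bytes
print(decode(strand))  # b'hi'
```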
667 notes
seattledataguy · 8 years ago
Photo
778 notes
seattledataguy · 8 years ago
Photo
Every damn day
405 notes
seattledataguy · 8 years ago
Link
1 note
seattledataguy · 8 years ago
Link
8 notes
seattledataguy · 8 years ago
Link
2 notes
seattledataguy · 8 years ago
Video
(via How To Apply Data Science To Real Business Problems - Seattle Data Guy)
2 notes
seattledataguy · 8 years ago
Link
The idea of a decision tree is to divide the data set into smaller data sets based on the descriptive features until you reach a set small enough that its data points all fall under one label.
Each internal (parent) node splits on a feature of the data set, and the leaf (child) nodes represent the outcomes. The decision of which feature to split on is based on the entropy reduction, or information gain, that results from the split.
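A minimal sketch of that split criterion: Shannon entropy of a label list, and the information gain from splitting on a single feature. The `(feature_value, label)` data layout and the weather example are made up for illustration:

```python
from collections import Counter
from math import log2

def entropy(labels):
    # Shannon entropy of a list of class labels, in bits.
    total = len(labels)
    return -sum((n / total) * log2(n / total)
                for n in Counter(labels).values())

def information_gain(pairs):
    # pairs: list of (feature_value, label). Gain = parent entropy
    # minus the size-weighted entropy of each child split.
    labels = [label for _, label in pairs]
    groups = {}
    for value, label in pairs:
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / len(pairs) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

# A feature that separates the labels perfectly gains a full bit.
data = [("sunny", "yes"), ("sunny", "yes"), ("rain", "no"), ("rain", "no")]
print(information_gain(data))  # 1.0
```

Building a tree then amounts to greedily choosing the highest-gain feature at each node and recursing on the resulting subsets.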
2 notes