seattledataguy
Data Science Consulting and Machine Learning
115 posts
I will be posting about data science and machine learning to help other consultants. Need a great blog post? I will have it. Have questions? Reach out to me! Good luck with your machine learning and data science projects.
seattledataguy · 8 years ago
Link
9 notes
seattledataguy · 8 years ago
Link
Web scraping typically requires a solid understanding of HTTP requests, faking headers, complex regex statements, HTML parsers, and database management skills.
Some programming languages make this much easier, Python among them, because Python offers libraries like Scrapy and BeautifulSoup that make scraping and parsing HTML easier than old-school web scrapers.
However, it still requires proper design and a decent understanding of programming and website architecture.
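To make the "HTML parsers" part concrete, here is a minimal sketch using only Python's standard library; this is the kind of low-level callback parsing that libraries like BeautifulSoup wrap in a much friendlier API (the HTML snippet and class name are illustrative):

```python
from html.parser import HTMLParser

# Minimal link extractor: collect the href of every <a> tag.
# BeautifulSoup does this in one line; this shows what it abstracts away.
class LinkExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)

parser = LinkExtractor()
parser.feed('<html><body><a href="/page1">One</a><a href="/page2">Two</a></body></html>')
print(parser.links)  # ['/page1', '/page2']
```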
13 notes
seattledataguy · 8 years ago
Link
Unburdening the IT department
One of the issues analysts run into is getting access to data. Typically this requires putting in a ticket to the IT department…waiting four weeks…and then maybe getting an Excel data pull that does not match the analyst's request at all!
Sometimes a simple data request can take months to get right. A data warehouse gives analysts the ability to pull data for themselves. It still needs to be managed, and it must not be the production database. But if implemented properly, analysts will be able to get the data they need on their own.
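A hypothetical sketch of what that self-serve pull looks like, using an in-memory SQLite database as a stand-in for a warehouse connection (the table and column names are made up for illustration):

```python
import sqlite3

# Stand-in for a read-only warehouse copy, kept separate from production.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [("west", 120.0), ("east", 80.0), ("west", 40.0)])

# The analyst runs the aggregate themselves -- no IT ticket required.
rows = conn.execute(
    "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
).fetchall()
print(rows)  # [('east', 80.0), ('west', 160.0)]
```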
2 notes
seattledataguy · 8 years ago
Link
Join us!
0 notes
seattledataguy · 8 years ago
Photo
Twitter / BigDataBorat
26 notes
seattledataguy · 8 years ago
Photo
42 notes
seattledataguy · 8 years ago
Photo
An infographic look at Big Data.
69 notes
seattledataguy · 8 years ago
Photo
The Blood Stream by the Numbers!
384 notes
seattledataguy · 8 years ago
Photo
Take a look at the latest obesity data from the Centers for Disease Control and Prevention and you can see that the country’s obesity epidemic is far from over.
Even in Colorado, the state with the lowest rate, 21.3 percent of its population is obese. Arkansas tops the list with 35.9 percent.
“It is the largest epidemic of a chronic disease that we’ve ever seen in human history,” says Dr. Donald Lloyd-Jones, chair of the department of preventive medicine at the Northwestern University Feinberg School of Medicine.
Click on the CDC’s obesity prevalence maps and you’ll see something even more startling — the disparity among different ethnic groups. It’s not new that the obesity epidemic is hitting African-Americans the hardest, followed by Hispanics, but the maps highlight this worrying trend.
For African-Americans, for example, there are 33 states with an obesity rate of at least 35 percent, whereas for white Americans only one state reports that rate. Nine states estimate the Hispanic obesity rate at 35 percent or higher.
“It is not about one group doing something wrong,” says Lloyd-Jones, who was not involved in creating the CDC maps. “It is about the environment that we have built that sets people up to fail.”
Obesity Maps Put Racial Differences On Stark Display
Map source: Centers for Disease Control and Prevention Behavioral Risk Factor Surveillance System. Credit: Alyson Hurt/NPR
562 notes
seattledataguy · 8 years ago
Photo
How DNA could store all the world’s data
It was Wednesday 16 February 2011, and Goldman was at a hotel in Hamburg, Germany, talking with some of his fellow bioinformaticists about how they could afford to store the reams of genome sequences and other data the world was throwing at them. He remembers the scientists getting so frustrated by the expense and limitations of conventional computing technology that they started kidding about sci-fi alternatives. “We thought, ‘What’s to stop us using DNA to store information?'”
Then the laughter stopped. “It was a lightbulb moment,” says Goldman, a group leader at the European Bioinformatics Institute (EBI) in Hinxton, UK. True, DNA storage would be pathetically slow compared with the microsecond timescales for reading or writing bits in a silicon memory chip. It would take hours to encode data by synthesizing DNA strings with a specific pattern of bases, and still more hours to recover that information using a sequencing machine. But with DNA, a whole human genome fits into a cell that is invisible to the naked eye. For sheer density of information storage, DNA could be orders of magnitude beyond silicon — perfect for long-term archiving.
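The density claim comes from packing two bits into each of the four bases. A toy sketch of that mapping (A=00, C=01, G=10, T=11) is below; real schemes such as Goldman's add error correction and avoid long base repeats, so this encoding is illustrative only:

```python
# Toy DNA storage codec: 2 bits per base, 4 bases per byte.
BASES = "ACGT"

def encode(data: bytes) -> str:
    out = []
    for byte in data:
        # Emit the byte's four 2-bit chunks, most significant first.
        for shift in (6, 4, 2, 0):
            out.append(BASES[(byte >> shift) & 0b11])
    return "".join(out)

def decode(strand: str) -> bytes:
    data = bytearray()
    for i in range(0, len(strand), 4):
        byte = 0
        for base in strand[i:i + 4]:
            byte = (byte << 2) | BASES.index(base)
        data.append(byte)
    return bytes(data)

strand = encode(b"hi")
print(strand)          # 'CGGACGGC' -- 8 bases for 2 bytes
print(decode(strand))  # b'hi'
```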
667 notes
seattledataguy · 8 years ago
Photo
778 notes
seattledataguy · 8 years ago
Photo
Every damn day
405 notes
seattledataguy · 8 years ago
Link
1 note
seattledataguy · 8 years ago
Link
8 notes
seattledataguy · 8 years ago
Link
2 notes
seattledataguy · 8 years ago
Video
(via How To Apply Data Science To Real Business Problems - Seattle Data Guy)
2 notes
seattledataguy · 8 years ago
Link
The idea of a decision tree is to divide the data set into smaller data sets based on the descriptive features until you reach a set small enough that its data points all fall under one label.
Each internal (parent) node splits on a feature of the data set, and the leaf (child) nodes represent the outcomes. The decision of which feature to split on is based on the entropy reduction, or information gain, that results from the split.
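A minimal sketch of that split criterion: Shannon entropy of a label list, and the information gain from splitting on a single feature. The `(feature_value, label)` data layout and the weather example are made up for illustration:

```python
from collections import Counter
from math import log2

def entropy(labels):
    # Shannon entropy of a list of class labels, in bits.
    total = len(labels)
    return -sum((n / total) * log2(n / total)
                for n in Counter(labels).values())

def information_gain(pairs):
    # pairs: list of (feature_value, label). Gain = parent entropy
    # minus the size-weighted entropy of each child split.
    labels = [label for _, label in pairs]
    groups = {}
    for value, label in pairs:
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / len(pairs) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

# A feature that separates the labels perfectly gains a full bit.
data = [("sunny", "yes"), ("sunny", "yes"), ("rain", "no"), ("rain", "no")]
print(information_gain(data))  # 1.0
```

Building a tree then amounts to greedily choosing the highest-gain feature at each node and recursing on the resulting subsets.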
2 notes