datasciensediary
Data Science Diary
45 posts
Learn Humbly | Think Critically | Improve Continuously
datasciensediary · 8 years ago
Text
Looking into 2017
I have been delaying this post for a while now, and it’s already halfway through the first month of 2017. But hey, there are still 11.5 months left in the year! Sometimes I feel a rush to get things planned and done. This feeling probably comes from the super-efficient Consultant inside me, who lives in a hyper-fast world of constant change and new “new” things.
I noticed a slight change in my mentality recently. “Why am I rushing? There are many, too many, things to enjoy in life besides checking the next box.” I think this mental shift was triggered by the trip to Kyoto, Japan after my wedding in Nov 2016. Words simply cannot capture the experience of wandering through old streets, temples, and suburbs. Time is precious, but quality moments seem to make time flow slower.
(pictures)
What quality moments am I looking for in 2017, aside from having fun with my wife?
I came down the data science path and enjoyed the scenery all the way. I am comfortable saying that I am quite proficient in data science, given my learning and work opportunities throughout 2015-2016. There is no doubt that I will keep learning, but I will start to focus on certain applications.
I started to wonder: 
What do people really need in a world overdosed on apps and information? What will new data products look like?
Is Blockchain really the next Internet? What Blockchain features can we implement in current tech for faster incremental improvement?
How can I write better? I enjoy writing because it forces me to think and articulate. More importantly, it allows me to have a conversation with myself (yup, deep sh*t right?). I am not aiming to become a professional writer, not yet at least. But I want to be able to write more concisely, provocatively, and thoughtfully. This is for my own entertainment at least, but I will be very happy if some people find it inspiring.
What is my action plan for 2017?
Complete the Insight to Innovation course with IDEO
Do at least 3 side projects on Deep Learning and Reinforcement Learning in Python
Read at least 3 books in each of the following topics:
Blockchain
Future of AI or Data
Biography of leaders in tech, politics, science, or sport
Economy, social phenomena, history
Write at least once every two weeks; there are plenty of topics I can write about given all the things I am doing in 2017
This seems pretty ambitious. What happened to not rushing? Oh well, I think I will enjoy doing all of these. So let’s see where this takes me :)
datasciensediary · 8 years ago
Link
Good article, but by the way, the most interesting thing is how Robert Herbold told the story.
datasciensediary · 9 years ago
Text
Question of the Year
I have been asking this question over and over for the past couple of months: how can we combine design thinking and Big Data analytics to discover deeper human needs?
Anthropology-style behavioural study is intimate and thoughtful, but it is not scalable and can be skewed by human emotion.
Big Data analytics is scalable and efficient, but it is not always reliable or comprehensive because of data constraints.
It seems like a marriage of the two methods is a good solution. Should we try that, and how? Looks like 2017 will be a thought-provoking year :)
datasciensediary · 9 years ago
Text
Project Spectrum - Exploring Career Path based on Education
“What can I do with my degree?” - this is one of the most commonly asked questions by university students given the amount of opportunity and functional diversity in each industry.
I did a project in 2013 to help students explore career paths based on their education and co-op experience. We included some of the most common programs, such as Commerce, Engineering, Science, and Humanities.
By no means am I trying to take credit for all the work; I had a team of 3 students working with me on this piece. It was wonderful to see all of them trying to push their boundaries. One of them didn’t know how to code, now he does! Another one wasn’t comfortable crunching numbers, now she is a genius at it.
The following graph is a snapshot of an interactive visualization. The actual solution is not maintained at the moment.
Tumblr media
More than 500 data points were collected from LinkedIn to generate this visual.
datasciensediary · 9 years ago
Link
My side project using co-shopping network analysis and geographic analysis with open government data
datasciensediary · 9 years ago
Link
One of the best introductory readings on Network Science
datasciensediary · 9 years ago
Text
In Search of my New Favourite NBA Team :)
I didn’t pay much attention to the NBA after the LA Lakers won their championship in 2010. It’s probably because I got really busy with work and don’t play much basketball any more (can’t run as fast, can’t jump as high, can’t find people to play with, etc.)
All of a sudden, I realized the NBA landscape has changed so much! Kobe retired (sad face), the Golden State Warriors had a record-breaking 73-9 W/L (well, too bad they lost the most important games to the Cavs), the Toronto Raptors had the best run in franchise history (proud of the home team), everyone is talking about players that I am not so familiar with, and so on ...
Driven by the bitterness of being out of date, I decided to catch up and develop a more holistic view of NBA teams, using the data science and visualization skills I acquired over the past year. In particular, I want to find out which teams are the most effective at winning games.
To measure effectiveness in winning games, I looked at the season-standardized Field Goal Percentage and Win/Loss Ratio for each team over the past 30 years (1986 - 2016).
Tumblr media
The visual and methodology were inspired by a post from FiveThirtyEight - my favourite data journal. See the full size image here.
I am using Field Goal Percentage (FG%) because it is a good indicator of play execution and team dynamics. Using the season average helps normalize, for all teams, the effect of playing against stronger or weaker opponents.
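A minimal Python sketch of the season-standardization idea, for illustration only. The original analysis was done in R; the rows below are made-up placeholders rather than real NBA statistics, and the function names, the z-score form of "standardized", and the use of win share (wins divided by games played) as the Win/Loss measure are all assumptions.

```python
from statistics import mean, pstdev

# Hypothetical (season, team, FG%, wins, losses) rows; NOT real NBA data.
rows = [
    (2015, "Team A", 0.478, 60, 22),
    (2015, "Team B", 0.452, 41, 41),
    (2015, "Team C", 0.431, 20, 62),
    (2016, "Team A", 0.465, 50, 32),
    (2016, "Team B", 0.470, 55, 27),
    (2016, "Team C", 0.440, 25, 57),
]

def standardize_by_season(rows):
    """Return (season, team, fg_z, win_share) with FG% z-scored within each season."""
    fg_by_season = {}
    for season, _team, fg, _wins, _losses in rows:
        fg_by_season.setdefault(season, []).append(fg)
    # Mean and population standard deviation of FG% for every season.
    stats = {s: (mean(v), pstdev(v)) for s, v in fg_by_season.items()}

    out = []
    for season, team, fg, wins, losses in rows:
        mu, sigma = stats[season]
        fg_z = (fg - mu) / sigma if sigma else 0.0
        out.append((season, team, fg_z, wins / (wins + losses)))
    return out

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

result = standardize_by_season(rows)
r = pearson([row[2] for row in result], [row[3] for row in result])
print(f"correlation between standardized FG% and win share: {r:.2f}")
```

Standardizing within each season puts every team on the same scale for that year, so a team from 1990 and a team from 2015 can share one chart even if league-wide shooting changed in between.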
Observations: 
1) A team with a higher upward arch can better capitalize on points scored. Many legendary teams have the upward bowl shape. The San Antonio Spurs are one of the most stable and effective teams because most of their points sit towards the upper right.
Tumblr media
2) The Celtics have an interesting characteristic. The team was very resilient in some seasons when it had scoring issues. The Celtics won ~60% of their games while being one of the worst-scoring teams in the season.
3) Although FG% is only one aspect of the game, it correlates very highly with winning games.
Techniques
1) Web Scraping using XML library in R - all data obtained from http://www.basketball-reference.com/
2) Text Processing using Regex in R - Team names, season standing, etc. needed to be extracted from column names
3) Visualization using ggplot in R - team color from http://teamcolors.arc90.com/
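The text-processing step can be sketched roughly as follows. This is a Python illustration rather than the R code actually used, and the label format ("Team Name* (standing)") is an assumption about what the scraped names looked like; the pattern and the `parse_label` helper are hypothetical.

```python
import re

# ASSUMED label format: "<team name>[*] (<standing>)", e.g. "Boston Celtics* (1)".
# The optional asterisk is treated here as a playoff marker (also an assumption).
LABEL = re.compile(r"^(?P<team>.+?)(?P<playoffs>\*)?\s*\((?P<standing>\d+)\)$")

def parse_label(label):
    """Split a scraped label into team name, playoff flag and standing.

    Returns None when the label does not follow the assumed format.
    """
    match = LABEL.match(label.strip())
    if match is None:
        return None
    return {
        "team": match.group("team").strip(),
        "playoffs": match.group("playoffs") is not None,
        "standing": int(match.group("standing")),
    }

for raw in ["Boston Celtics* (1)", "Toronto Raptors (2)", "Atlantic Division"]:
    print(raw, "->", parse_label(raw))
```

Returning None for labels that do not match makes it easy to spot header rows or unexpected formats instead of silently producing bad team names.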
datasciensediary · 9 years ago
Link
Recently, someone asked me how to build a propensity model. My answer was just “... $@$##%... Make sense?”
I have been looking for a good answer to satisfy my own curiosity. Here is one of the best posts explaining how a propensity model works and the process of building one. Shout out to Edwin Chen :)
datasciensediary · 9 years ago
Video
youtube
KNIME + R = AWESOMENESS :) 
datasciensediary · 9 years ago
Video
hulu
Inspiring interview with Sebastian Thrun on education and machine learning by Charlie Rose.
datasciensediary · 9 years ago
Text
2015 Review
With a few hours left in 2015, I feel compelled to write a review and check myself against this year’s plan :) In general, I think I’ve executed my plan pretty well along the way, with a few great surprises.
My original goal was to learn as much about machine learning as possible (see the very first post of this blog), but a few weeks later I realised this may be a bit too ambiguous. At the end of the day, I am trained as a Management Consultant to come up with measurable plans :)
So my 2015 goal in one sentence became “being able to perform basic data science tasks, including data ingestion, data exploration, machine learning, and visualisation using R and Python”.  
I am proud to give myself a stamp of approval thanks to Udacity’s data analyst courses and many great tutorials, such as Data School by Kevin Markham and many others that I included in the appendix below.
As a surprise, I got a chance to use and hone these data science skills to solve real client problems at work using a real big data stack (Hadoop + Revolution R).
We were trying to develop novel analysis and solutions to predict and profile mortgage buyers with customer, product, and multi-channel interaction data. Also, we used the same platform to develop solutions to do micro-segmentation and predict churn. 
I was glad that we pulled it off, because it was as if I had trained in a propeller plane but was handed an F-35 fighter jet in battle. So I am going to give myself an A+ on “Applying data science in real life”, which was a bonus :)
Now looking forward to 2016, what should I do? To be updated. Need to spend a few days to do some self-assessment and trend analysis :) 
My favourite resources for learning:
Udacity Data Analyst Courses
Coursera Machine Learning Case Study
Introduction to Statistical Learning using R
Data School
Yhat Blog
Kaggle Blog
FlowingData
My favourite references for coding:
R Data Wrangling Cheatsheet
R cookbook
scikit-learn site
Google :) 
datasciensediary · 9 years ago
Photo
Tumblr media
The most comprehensive and easy to understand process map of Data Science :) 
Credit: Udacity
datasciensediary · 10 years ago
Link
Shout-out to Randy Carnevale for creating a great in-depth investigation of Gradient Boosting. I thought it was a great drill-down companion to my own post giving an overview of classification models.