7 habits of highly effective data analysis

Highly effective data analysis isn't learned overnight, but it can be learned faster. Here are 7 habits of data analysis I wish someone had told me about for effectively incorporating, communicating and investing in data analysis geared towards an engineering team.

1. Value simplicity of analysis over fancy algorithms

If you can't explain your analysis to a 5-year-old, then you'll have a tough time selling it to others. The focus for product data analysis is not the analysis itself. Don't get me wrong, you need the analysis, but it's the story you tell and your recommendations based on that data that really matter.

Using complex analysis that confuses will result in the exact opposite of what you want. You want to be able to drive engineering behaviors and investments with your analysis. If your analysis is too opaque and engineers aren't quickly wrapping their heads around the story you're telling, your analysis has lost its value.

The ultimate measure of the impact of your data analysis is how engineering behaviors and investments change. Make it easy for others to change.

2. Value more data sources over more data

Looking at more data across a broader time frame can give you more confidence in your analysis. However, a single pipeline of telemetry or logs is limited by the features being captured. Generally, a single pipeline only tells a part of the product story.

Same Analysis + Same Pipeline = Same Story

What you need is another source of data. Maybe all SQL operations are logged somewhere, or maybe you have the facility to pull a sample of logs from your users. More data sources also allow you to confirm whether your story is consistent. More data will not give you more insight. More data sources will.
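
Here's a minimal sketch in Python of what checking a story against a second source might look like. Everything in it is an assumption for illustration: the feature names, the counts, the 10% tolerance and the `compare_sources` helper are all made up, and your own sources will need their own way of lining up.

```python
def compare_sources(source_a, source_b, tolerance=0.10):
    """Return features whose counts differ by more than `tolerance` (relative gap)."""
    disagreements = {}
    for feature in sorted(set(source_a) | set(source_b)):
        a = source_a.get(feature, 0)
        b = source_b.get(feature, 0)
        larger = max(a, b)
        if larger == 0:
            continue  # neither source saw this feature
        relative_gap = abs(a - b) / larger
        if relative_gap > tolerance:
            disagreements[feature] = (a, b, round(relative_gap, 2))
    return disagreements


# Hypothetical per-feature counts from two independent pipelines.
telemetry = {"export": 120, "import": 45, "search": 900}
sql_log = {"export": 118, "import": 90, "search": 905, "bulk_delete": 12}

disagreements = compare_sources(telemetry, sql_log)
for feature, (a, b, gap) in disagreements.items():
    print(f"{feature}: telemetry={a} sql_log={b} relative gap={gap}")
```

Features that show up in only one source, or with very different counts, are the places where a single pipeline was only telling part of the story.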

3. Value familiar tools over the latest shiny tools

Shiny, new tools are fun to play with and sometimes useful, but remember the ultimate measure of the impact of your data analysis?

You want to make it easy for others to change, and change is not easy. Here are 3 things from Your Brain at Work to keep in mind to give yourself the best shot at facilitating change:

Make it safe for your fellow engineers to change. Tell a story that everyone can quickly wrap their heads around, and tell it with familiar tools. Stay away from the latest, coolest technologies for visualization unless they are really necessary for your story.
Drill into the core message of your analysis.
Repeat the core message, then repeat it again. 🙂

Unless you're recommending the adoption of a new tool, the focus should be on the core message of your story, not on the tools.

4. Value insights and investments over indicators

Indicators are your key performance indicators (KPIs). They'll likely come in the form of graphs, plots or tables. Your analysis doesn't stop there. Indicators are only the first "I" of the 3 I's of data-driven engineering. Tell an insightful story around your data, and then recommend investments. You're the agent of change, and your analysis must be infused with your insights and your recommendations for investments.

5. Value CUSS over trust

Data never comes clean. That's why I frequently feel like a janitor. As a data janitor, I rarely trust that all the data is there and in the right format. I always apply Kerns' CUSS acronym from Introduction to Probability and Statistics Using R to understand the data's Center, Unusual features, Spread and Shape.

Center - Where is the general tendency of the data?
Unusual features - Are there missing data points? Outliers? Clustering?
Spread - What is the variability of the data?
Shape - If you plot the data, what is the shape of the data?

Knowing how the data is generated and the CUSS of the data allows you to draw better reasoned insights and investments.
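
As a rough illustration, a first pass at CUSS can be sketched with Python's standard `statistics` module. The `cuss_summary` helper, the sample latencies, the 1.5 x IQR outlier fence and the mean-versus-median shape check are my assumptions here, not something prescribed by the book, and a plot will tell you more about shape than any single number.

```python
import statistics


def cuss_summary(values):
    """Summarize Center, Unusual features, Spread and Shape of numeric data.

    `None` marks a missing data point. Needs at least two present values.
    """
    present = [v for v in values if v is not None]

    # Center: where is the general tendency?
    mean = statistics.mean(present)
    median = statistics.median(present)

    # Spread: how variable is the data?
    stdev = statistics.stdev(present)
    q1, _, q3 = statistics.quantiles(present, n=4)
    iqr = q3 - q1

    # Unusual features: missing points and outliers beyond a 1.5 * IQR fence.
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [v for v in present if v < low or v > high]

    # Shape: a crude hint only; comparing mean and median suggests a longer tail.
    if mean > median:
        shape = "longer right tail"
    elif mean < median:
        shape = "longer left tail"
    else:
        shape = "roughly symmetric"

    return {
        "mean": mean,
        "median": median,
        "missing": len(values) - len(present),
        "outliers": outliers,
        "stdev": stdev,
        "iqr": iqr,
        "shape": shape,
    }


# Hypothetical request latencies in milliseconds, with one gap and one spike.
latencies = [12, 15, 14, 13, None, 16, 15, 14, 120]
summary = cuss_summary(latencies)
for name, value in summary.items():
    print(f"{name}: {value}")
```

In this made-up sample the single spike drags the mean far above the median, which is exactly the kind of thing you want to notice before drawing insights from the average.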

6. Value direction over definitive

The cost of data collection is often the major hurdle standing in the way of a definitive answer to a business or engineering question. You can almost always get a partial answer that's better than what you have now.

The author of How To Measure Anything recommends asking this question:

"Is there any measurement method at all that can reduce uncertainty enough to justify the cost of the measurement?"

Even if you don't have the instrumentation in place to definitively answer whether a specific component is the problem, you can find a cheap way to reduce the uncertainty by eliminating a few components. Maybe you can stitch together a few different sources of data and give some very rough tallies to get things going in the right direction.
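
To make the rough tally idea concrete, here's a small Python sketch that stitches two sources together and lists the components that barely show up. The component names, the two sources, the 10% cutoff and the helper functions are all hypothetical, and a low share only reduces uncertainty about a component; it doesn't prove the component is innocent.

```python
from collections import Counter


def rough_tally(*sources):
    """Count how often each component is mentioned across all sources."""
    tally = Counter()
    for records in sources:
        tally.update(records)
    return tally


def ruled_out(tally, all_components, max_share=0.10):
    """Components whose share of all mentions is at or below `max_share`."""
    total = sum(tally.values())
    if total == 0:
        return []
    return sorted(c for c in all_components if tally[c] / total <= max_share)


# Hypothetical component names pulled from two unrelated sources.
crash_reports = ["cache"] * 9 + ["parser"] * 1
support_tickets = ["cache"] * 6 + ["network"] * 4

tally = rough_tally(crash_reports, support_tickets)
unlikely = ruled_out(tally, ["auth", "cache", "network", "parser"])

print(tally.most_common())
print("Probably not the problem:", unlikely)
```

The numbers are rough, but they're enough to point the team at one component first instead of four.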

Getting yourself or your team moving in the right direction is more important than getting that super accurate, definitive answer.

7. Value how software works over how you think software works

The beauty of product data analysis is seeing the footprints of actual users using your software product. Sometimes you'll get a really nice set of footprints. More likely than not, you'll get partial impressions, making your investigation all the more difficult. Regardless, telemetry and log footprints are a reflection of reality.

Architectural knowledge is a great asset. However, the telemetry and logs represent hard evidence of what's actually going on rather than what we believe is going on. As a product data scientist, you have a unique view of the software. You see the software as it actually is.

This is powerful, because not only do you have evidence of how the software actually works, you can also scale that insight to a broad set of users. You can make claims like "77% of our users go down this code path, which contradicts the design." Believe in the footprints left behind by your users, but always double-check. One of my favorite quotes from The Elements of Statistical Learning is "In God we trust, all others bring data."
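
A claim like that usually boils down to counting distinct users. Here's a minimal Python sketch, assuming your logs can be reduced to (user, code path) pairs; the event format, the path names and the `share_of_users` helper are made up for illustration.

```python
def share_of_users(events, path):
    """Fraction of distinct users with at least one event on `path`."""
    all_users = {user for user, _ in events}
    users_on_path = {user for user, event_path in events if event_path == path}
    if not all_users:
        return 0.0
    return len(users_on_path) / len(all_users)


# Hypothetical log events: (user id, code path taken).
events = [
    ("u1", "legacy_sync"),
    ("u1", "legacy_sync"),  # repeat visits count once per user
    ("u2", "fast_sync"),
    ("u3", "legacy_sync"),
    ("u4", "fast_sync"),
    ("u4", "legacy_sync"),
]

share = share_of_users(events, "legacy_sync")
print(f"{share:.0%} of users hit legacy_sync")
```

Counting users rather than events keeps one chatty user from skewing the story.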

Bonus: Architectural knowledge is your smartcut

I'm contradicting myself here, but knowing how the different components in your product work together is invaluable for product data analysis.

Completely relying on your telemetry and logs to tell you about how the software works is possible, but it's the reliable yet slow way. Take the smartcut and learn about how the code executes. Step through it with a debugger, and form a model in your mind about how the components flow and fit together.

Keep SFDIPOT in mind: structure, function, data, interfaces, platform, operations and timing.

In Smartcuts, the author makes the claim that you can learn and train yourself faster by building on platforms. Platforms are things like tools and frameworks built by others. Use a debugger tool or an architecture document to quickly get your platform in place. Then your analysis of telemetry and logs takes on completely new meaning, since you deliberately trained yourself to spot patterns of code execution.

Your Habits?

There you have it! 8 habits of data analysis that have helped me, and my hope is they help you. Do you have habits that got you to be more effective? Let me know in the comments or by email.

Disclaimer: Please note that some of the links in this post are affiliate links (e.g. Amazon), and at no additional cost to you, I will earn a small commission if you decide to make a purchase. Please do not spend any money on these products unless you feel you need them or that they will help you achieve your goals.

About the Author

Ray Li

Ray is a software engineer and data enthusiast who has been blogging for over a decade. He loves to learn, teach and grow. You’ll usually find him wrangling data, programming and lifehacking.

Comments 8

Paras Doshi
May 15, 2015 at 4:32 pm

This is amazing list, Raymond! Something that every data analyst should digest…Thanks for putting this together.

1. Raymond Li
  May 15, 2015 at 4:47 pm
  
  Thanks for the kind words. Really appreciate it!
  
Vicky
May 24, 2015 at 6:18 pm

Thanks for sharing

1. Ray Li
  May 24, 2015 at 7:56 pm
  
  My pleasure, Vicky!
  
RaySF
December 22, 2015 at 9:17 am

Great post, Ray!
(well, just excellent like your other post on Machine Learning).

Still owing you a coffee
when going through Seattle!

best,
RaySF

1. Ray Li
  December 26, 2015 at 5:06 pm
  
  Thanks, RaySF! 🙂
  