Kurvv Blog

Precision and Recall and Autumn leaves

Oct 27, 2021 10:46:06 AM / by Ryan Lee

TL;DR - Precision measures how many of your positive predictions are actually correct. Recall measures how many of all the actual positives you correctly predicted. High Precision and high Recall together are obviously great, but you can seldom have both. The optimal balance between the two depends entirely on the situation.

 

 


 

Fall is now in full force here in the Seattle area, and with it came the blanket of leaves covering my yard. Last weekend, I was shoveling leaves into the organic waste bin from the nice pile my 7-year-old daughter had proudly created, when I noticed a good number of pebbles mixed into the pile. It occurred to me that this was a great analogy for explaining two concepts that are commonly used to measure the performance of ML models (but are easily confused) - Precision & Recall.

 

Say there are 100 leaves and 100 pebbles in the pile. I take a calculated scoop and end up with 40 leaves and 10 pebbles on my shovel.

 

The performance of my scoop (the ML model or solution) can be measured as follows. Out of the 100 leaves (positives) in the pile, I was able to pick out (predict) 40. My scoop has a Recall of 40% (40/100).

 

Of the 50 items I scooped (40 leaves + 10 pebbles), 40 were leaves. My scoop has a Precision of 80% (40/50).

 

An easy way to remember is to mentally note 'precision = shovel' - since everything you need to calculate precision is in the shovel.
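The two calculations above can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the function and variable names are my own, and the mapping to true positives, false positives and false negatives follows from treating leaves as the positives.

```python
def precision(true_positives: int, false_positives: int) -> float:
    """Share of the items on the shovel that are actually leaves."""
    scooped = true_positives + false_positives
    return true_positives / scooped if scooped else 0.0


def recall(true_positives: int, false_negatives: int) -> float:
    """Share of all the leaves in the pile that made it onto the shovel."""
    total_positives = true_positives + false_negatives
    return true_positives / total_positives if total_positives else 0.0


leaves_in_pile = 100
leaves_on_shovel = 40    # true positives: leaves I scooped
pebbles_on_shovel = 10   # false positives: pebbles I scooped by mistake
leaves_left_behind = leaves_in_pile - leaves_on_shovel  # false negatives

scoop_precision = precision(leaves_on_shovel, pebbles_on_shovel)
scoop_recall = recall(leaves_on_shovel, leaves_left_behind)

print(f"Precision: {scoop_precision:.0%}")  # 40/50
print(f"Recall:    {scoop_recall:.0%}")     # 40/100
```

Notice that precision only needs what is on the shovel, while recall also needs to know how many leaves were left in the pile.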

 

We can also use this analogy to explain the trade-off between Precision and Recall. Obviously, the higher both are, the better, but increasing one usually decreases the other.

 

Say I want to maximize my Recall. I can get a giant shovel and just scoop up the entire pile. I would have scooped up all 100 of the 100 leaves in the pile, resulting in perfect Recall (100% - 100/100). But now I have 50% (100/200) Precision instead of 80%.

 

Say I now want to maximize Precision. I can use a much smaller shovel to avoid accidentally picking up pebbles, but due to its small size I can only pick up 10 leaves. Now a scoop gives me 10 leaves and no pebbles, and thus has perfect Precision (100% - 10/10). But now I have 10% (10/100) Recall instead of 40%.
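To see the trade-off side by side, here is a small sketch that compares the three scoops from this post. The Scoop class and the shovel names are hypothetical helpers made up for this illustration; the leaf and pebble counts are the ones used above.

```python
from dataclasses import dataclass

LEAVES_IN_PILE = 100  # total positives in the pile


@dataclass
class Scoop:
    name: str
    leaves: int   # true positives
    pebbles: int  # false positives

    @property
    def precision(self) -> float:
        # Everything needed here is on the shovel.
        return self.leaves / (self.leaves + self.pebbles)

    @property
    def recall(self) -> float:
        # Recall also depends on how many leaves the pile held.
        return self.leaves / LEAVES_IN_PILE


scoops = [
    Scoop("small shovel", leaves=10, pebbles=0),
    Scoop("regular shovel", leaves=40, pebbles=10),
    Scoop("giant shovel", leaves=100, pebbles=100),
]

for scoop in scoops:
    print(f"{scoop.name:<15} precision={scoop.precision:.0%}  recall={scoop.recall:.0%}")
```

Going down the list, Recall climbs from 10% to 100% while Precision falls from 100% to 50%.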

 

So which combination of Precision and Recall is best?

This totally depends on the situation in which the solution (shovel scoop) will be deployed. If I were pressed for time and needed to complete the chore with the minimum number of scoops, I would value Recall over Precision and just scoop up the whole pile (along with the 100 pebbles) with a single giant shovel. But since in this case the whole point was spending time with my 7-year-old, we hand-picked leaves from the pile - aka, ran the solution with perfect Precision 10 times.
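The hand-picking approach can be sketched the same way. This assumes, purely for illustration, that every small scoop picks up 10 new leaves and no pebbles; the function name and constants are my own.

```python
LEAVES_IN_PILE = 100
LEAVES_PER_SCOOP = 10   # assumption: each small scoop finds 10 new leaves
PEBBLES_PER_SCOOP = 0   # assumption: the small scoop never grabs a pebble


def cumulative_metrics(num_scoops: int) -> tuple[float, float]:
    """Precision and recall over everything collected after num_scoops scoops."""
    if num_scoops < 1:
        raise ValueError("need at least one scoop")
    leaves = min(LEAVES_PER_SCOOP * num_scoops, LEAVES_IN_PILE)
    pebbles = PEBBLES_PER_SCOOP * num_scoops
    precision = leaves / (leaves + pebbles)
    recall = leaves / LEAVES_IN_PILE
    return precision, recall


for n in (1, 5, 10):
    p, r = cumulative_metrics(n)
    print(f"after {n:>2} scoops: precision={p:.0%}  recall={r:.0%}")
```

Under these assumptions, ten careful scoops reach both perfect Precision and perfect Recall. The cost is time, which is exactly the kind of constraint that depends on the situation.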

 

This is also why data scientists cannot build a great solution in isolation. They need to understand the environment and the nuances of the application in order to make the right decisions during development and training.

 

 

 

I found the following two well-written blog posts that describe precision and recall. Coincidentally, both use fishing analogies, and both are very intuitive.

Precision and Recall: Understanding the Trade-Off

Recall, precision, specificity, and sensitivity

 

Photo by Clay Banks on Unsplash 

Tags: Machine Learning, data science, data scientist


Written by Ryan Lee