ROC Curve

The ROC (receiver operating characteristic) curve plots the true positive rate against the false positive rate as the classification threshold varies. It is most often used in binary classification.

Business Decision : Evaluate Classification Threshold
The ROC curve is often used in business scenarios for cost-benefit analysis. For example, consider a model that predicts which households will respond to a delivered ad paper. Each ad paper costs money, so we want to send it only to the households that are likely to increase sales. In such a case, we can use the ROC curve to choose a threshold that selects those households.
In Figure ROC curve 1, we might not want to use a threshold of 0.5, because at that threshold many of the selected households would not respond to the ad. Based on the ROC curve, a threshold of 0.6 or above is a better choice.

Figure: Cost-benefit analysis with ROC curve 1
Figure: Cost-benefit analysis with ROC curve

Generally, a threshold of 0.5 is used to assign an example to class 1 or class 0. However, we can shift the threshold towards 0 or 1 based on the cost-benefit analysis on the ROC curve.
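The threshold choice described above can be sketched as a small profit calculation. This is only an illustration: the scores, labels, benefit per response, and cost per mailing below are made-up values, and `expected_profit` and `best_threshold` are hypothetical helper names, not part of any library.

```python
def expected_profit(scores, labels, threshold, benefit_per_response, cost_per_mailing):
    """Profit from mailing every household whose score is >= threshold."""
    mailed = [y for s, y in zip(scores, labels) if s >= threshold]
    responders = sum(mailed)  # labels are 1 for a response, 0 otherwise
    return responders * benefit_per_response - len(mailed) * cost_per_mailing


def best_threshold(scores, labels, thresholds, benefit_per_response, cost_per_mailing):
    """Return the candidate threshold with the highest profit."""
    return max(
        thresholds,
        key=lambda t: expected_profit(
            scores, labels, t, benefit_per_response, cost_per_mailing
        ),
    )


# Hypothetical model scores and observed responses (1 = household responded).
scores = [0.95, 0.85, 0.75, 0.65, 0.62, 0.58, 0.55, 0.52, 0.45, 0.30]
labels = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
thresholds = [0.3, 0.4, 0.5, 0.6, 0.7, 0.8]

# Assumed economics: each response is worth 5.0, each mailing costs 2.0.
for t in thresholds:
    print(t, expected_profit(scores, labels, t, 5.0, 2.0))

print("best threshold:", best_threshold(scores, labels, thresholds, 5.0, 2.0))
```

With these assumed numbers, the default threshold of 0.5 mails several non-responding households, and the profit peaks at 0.6. Different costs and benefits would move the optimum.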


AUC (area under the ROC curve) is used to evaluate a classifier. A classifier with AUC = 0.5 is equivalent to a random classifier, while a classifier with AUC = 1 is a perfect classifier.
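One way to see these two reference values is through the rank interpretation of AUC: it equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as half. The sketch below uses that interpretation on small made-up inputs; `auc` is a hypothetical helper, and a constant scorer stands in for an uninformative classifier.

```python
def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


labels = [1, 1, 0, 0]

# Every positive outranks every negative: perfect classifier.
print(auc([0.9, 0.8, 0.2, 0.1], labels))

# Same score for everything: no ranking information at all.
print(auc([0.5, 0.5, 0.5, 0.5], labels))
```

The pairwise loop is quadratic in the number of examples, so it is meant for illustration rather than for large datasets.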

Figure: AUC of a random classifier


Problem : Can the ROC curve be used for all types of data, including data with imbalanced classes?
Answer : No. ROC curves are not very informative when the classes are heavily imbalanced. In such cases the precision-recall (PR) curve is more useful.

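Case study : ROC vs PR Example
Let’s see it with an example of two algorithms, evaluated on a dataset with 100 actual positives and 1,999,900 actual negatives.
Algo 1 : 100 examples predicted positive, 90 of them true positives.
Algo 2 : 1,000 examples predicted positive, 90 of them true positives.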

ROC computation :
ROC for Algo 1 : TPR = 90/100 = 0.9, FPR = 10/1,999,900 ≈ 0.00000500025
ROC for Algo 2 : TPR = 90/100 = 0.9, FPR = 910/1,999,900 ≈ 0.00045502275

PR Computation :
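PR for Algo 1 : Precision = 90/100 = 0.9, Recall = 90/100 = 0.9
PR for Algo 2 : Precision = 90/1,000 = 0.09, Recall = 90/100 = 0.9
The precision of Algo 2 is lower than that of Algo 1 by 0.9 - 0.09 = 0.81.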

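Both algorithms have the same TPR, and their FPRs differ by only about 0.00045, so their ROC points are almost indistinguishable. Their precisions, however, differ by 0.81. The difference between the two algorithms is therefore far more pronounced in the PR curve than in the ROC curve when the classes are skewed. Hence the PR curve is preferred when the classes are heavily imbalanced.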
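The case-study arithmetic can be reproduced from the raw counts. The sketch below assumes the dataset implied by the denominators in the example (100 actual positives and 1,999,900 actual negatives); `roc_and_pr_point` is a hypothetical helper name.

```python
def roc_and_pr_point(true_pos, predicted_pos, actual_pos, actual_neg):
    """Return the ROC point (FPR, TPR) and PR point (recall, precision)."""
    false_pos = predicted_pos - true_pos
    return {
        "tpr": true_pos / actual_pos,
        "fpr": false_pos / actual_neg,
        "precision": true_pos / predicted_pos,
        "recall": true_pos / actual_pos,  # recall is the same quantity as TPR
    }


# Assumed class sizes, taken from the denominators in the case study.
ACTUAL_POS = 100
ACTUAL_NEG = 1_999_900

algo1 = roc_and_pr_point(90, 100, ACTUAL_POS, ACTUAL_NEG)
algo2 = roc_and_pr_point(90, 1000, ACTUAL_POS, ACTUAL_NEG)

fpr_gap = algo2["fpr"] - algo1["fpr"]
precision_gap = algo1["precision"] - algo2["precision"]

print("Algo 1:", algo1)
print("Algo 2:", algo2)
print("FPR gap:", fpr_gap)
print("Precision gap:", precision_gap)
```

The FPR gap is tiny because the 900 extra false positives are divided by almost two million negatives, while precision divides by the number of predicted positives and so exposes the gap directly.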