Question: I know a lot of ML algorithms, but how do I determine whether one machine learning (ML) algorithm is better than another?
Answer: A very simple metric we use to compare two machine learning models is “Accuracy”, the fraction of predictions that a model gets right. For example, let’s say that for a given classification problem there are two models, each with its own accuracy:
Logistic Regression ML model – Accuracy = 80%
Support Vector Machine (SVM) – Accuracy = 85%

Then, by simple comparison, we can say that the SVM is better than Logistic Regression, because 85% accuracy is better than 80% accuracy.
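The comparison above can be sketched in a few lines of Python. This is only a minimal illustration: the labels are made up, and the `flip_first` helper is a hypothetical way to produce predictions with a known number of mistakes, not the output of real models.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def flip_first(labels, n):
    """Hypothetical helper: copy the labels but get the first n wrong."""
    return [1 - y if i < n else y for i, y in enumerate(labels)]


# Made-up true labels for 20 test examples (1 = positive, 0 = negative).
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0]

model_a_pred = flip_first(y_true, 4)  # 16 of 20 correct
model_b_pred = flip_first(y_true, 3)  # 17 of 20 correct

acc_a = accuracy(y_true, model_a_pred)
acc_b = accuracy(y_true, model_b_pred)
print(f"Model A accuracy: {acc_a:.0%}")
print(f"Model B accuracy: {acc_b:.0%}")
print("Better by accuracy:", "Model B" if acc_b > acc_a else "Model A")
```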
To learn more, see the pages on Accuracy and the Accuracy Paradox.

Question: Awesome, now I know about accuracy. But I have a problem: I created a model that has 99.99% accuracy, but my ML expert supervisor says it’s the worst model. I don’t understand why.
Answer: It’s possible that you have fallen into the Accuracy Paradox. Accuracy is not always enough, and in some cases it is outright misleading. Let’s look at an example of how you can have a very bad model with an accuracy above 99%.
Let’s say we are trying to create a model that detects credit card fraud. In such cases, we have a large number of non-fraud transactions and very few fraud transactions. To illustrate, let’s say:

No. of Good Transactions = 1,000,000
No. of Fraud Transactions = 1,000
Our ML model predicts all 1,001,000 transactions as Good. Then:
Accuracy of our model = 1,000,000 / 1,001,000 ≈ 99.9%

Although the accuracy of our model is 99.9%, it detects 0 of the 1,000 fraudulent transactions. Now you can see why your model can be bad despite its 99.99% accuracy.
NOTE: This happens in particular when the data is not balanced, i.e. one class has far more examples than the other. So you must be careful with unbalanced data.
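As a rough sketch, the fraud example can be reproduced from the class counts alone. The variable names are illustrative, and the "model" here is simply the rule "predict Good for everything".

```python
# Class counts from the credit card fraud example.
n_good = 1_000_000
n_fraud = 1_000
total = n_good + n_fraud

# A model that labels every transaction as Good is right on every good
# transaction and wrong on every fraudulent one.
correct_predictions = n_good
frauds_detected = 0

baseline_accuracy = correct_predictions / total
print(f"Accuracy: {baseline_accuracy:.4%}")
print(f"Frauds detected: {frauds_detected} of {n_fraud}")
```

Any real model on this data should be judged against that do-nothing baseline: an accuracy of about 99.9% is what you get for free.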

Question: What does “careful” mean? How do I evaluate the performance of my model in such unbalanced-data cases?
Answer: In such cases, you should look at the F1 Score rather than the accuracy.
Let’s examine the above example with the F1 Score.

No. of Good Transactions = 1,000,000
No. of Fraud Transactions = 1,000
The ML model predicts all data as Non-Fraud

                                     Predicted Positive     Predicted Negative
                                     (Predicted Fraud)      (Predicted Not Fraud)
 Actual Positive (Actual Fraud)      TP = 0                 FN = 1,000
 Actual Negative (Actual Not Fraud)  FP = 0                 TN = 1,000,000
 Precision = TP / (TP + FP) = 0 / (0 + 0), which is undefined and is taken as 0 by convention
 Recall = TP / (TP + FN) = 0 / (0 + 1,000) = 0
 F1 Score = 2PR / (P + R) = 0 (again taken as 0, since P + R = 0)

Now you can see that for the fraud case,
Accuracy ≈ 99.9% but F1 Score = 0
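The same calculation can be written as a small function. This is a minimal sketch: `precision_recall_f1` is a name chosen for illustration, and reporting 0 when a denominator is zero is a common convention that is assumed here, not the only possible choice.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, f1) for the positive (fraud) class.

    Any ratio whose denominator is zero is reported as 0.0. This is an
    assumed convention; some tools raise a warning or return NaN instead.
    """
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    if precision + recall > 0:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Confusion-matrix counts for the model that predicts everything as Non-Fraud.
precision, recall, f1 = precision_recall_f1(tp=0, fp=0, fn=1_000)
print(f"Precision = {precision}, Recall = {recall}, F1 = {f1}")
```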

Fig. Ref: Precision, Recall and F1-Score evaluation

Question: Nice, now I know the F1 Score, and I will always look at it for unbalanced datasets. However, I have recently come across two machine learning models that have the same F1 Score. Does that mean I could use either one of them?
Answer: You need to do a more detailed examination in such cases. Let me show you with an example.
Let’s say I have two ML models:

               F1 Score (2PR / (P + R))   Precision (P)   Recall (R)
 ML Model A    0.18                       0.9             0.1
 ML Model B    0.18                       0.1             0.9

In both cases we have the same F1 Score, but the two models are completely different. Model A has high precision and low recall. In terms of the fraud example above, Model A predicts fraudulent cases with high precision: if the model says a transaction is fraudulent, it is right 90% of the time. However, its recall is very low: it is able to identify only 100 of the 1,000 fraudulent transactions.

Model B does the exact opposite: it can identify 900 of the 1,000 fraud cases. However, when it says that a transaction is fraud, there is only a 10% chance that it actually is fraud, i.e. it mistakenly classifies a lot of good transactions as fraud as well.

Hence it is very important to also look at the Precision and Recall of the classifiers in addition to the F1 Score, and to trade them off against each other according to the requirements of the problem.
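To make the trade-off concrete, the sketch below computes the F1 Score of both models from their precision and recall, and then works out what those numbers would mean on the 1,000 fraud cases from the earlier example. The helper names are hypothetical, and the counts are rounded to whole transactions.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def implied_counts(precision, recall, n_fraud):
    """Hypothetical helper: counts implied by P and R on n_fraud frauds.

    Returns (frauds caught, good transactions wrongly flagged),
    rounded to whole transactions.
    """
    true_positives = round(recall * n_fraud)
    flagged = round(true_positives / precision)
    false_positives = flagged - true_positives
    return true_positives, false_positives


N_FRAUD = 1_000  # fraud cases, as in the earlier example
models = {"A": (0.9, 0.1), "B": (0.1, 0.9)}  # name -> (precision, recall)

for name, (p, r) in models.items():
    caught, false_alarms = implied_counts(p, r, N_FRAUD)
    print(
        f"Model {name}: F1 = {f1_score(p, r):.2f}, "
        f"frauds caught = {caught}, false alarms = {false_alarms}"
    )
```

Under these assumptions, Model A raises only about 11 false alarms but misses 900 frauds, while Model B catches 900 frauds at the cost of 8,100 false alarms. Which of the two is acceptable depends on whether a missed fraud or a wrongly blocked transaction is more costly.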