
Day 16: Data Odyssey – How Do We Evaluate ML Models?

Welcome to Day 16: Data Odyssey, our 365-day journey to master data science and artificial intelligence (AI), launched on Shivaratri, February 26, 2025! Yesterday, in Day 15: Data Odyssey – How Do We Build a Simple ML Model?, we built Priya’s first machine learning model—a Linear Regression with Scikit-Learn. Using her preprocessed POS data (6 rows), it predicted ₹630 for Wednesday’s 9 AM Samosa sales, close to patterns like Tuesday’s ₹650. We split data, trained, and tested, with a mean error of ₹10. Today, we dig deeper: How do we evaluate ML models, and is Priya’s ₹630 guess actually good?

Why Evaluation Matters

Building a model (Day 15) is step one—knowing it works is step two. Priya’s ₹630 prediction sounds nice, but is it luck? Overfit to her tiny data? Off by ₹100 in reality? Evaluation measures:

  • Accuracy – How close are predictions to truth?
  • Reliability – Will it hold for new days?
  • Usefulness – Does ₹630 help stock decisions?

Without evaluation, Priya trusts blindly—stocking 40 samosas might waste or short her. Day 16: Data Odyssey tests her model’s worth.

Priya’s Model Recap

Her data (Day 15):

   Hour_Num  Item_Code  Day_Monday  Day_Tuesday  Sales
0         7          0           1            0    200
1         8          0           1            0    500
2         9          1           1            0    600
3         7          0           0            1    150
4         8          0           0            1    550
5         9          1           0            1    650
  • Features: Hour_Num, Item_Code, Day_Monday, Day_Tuesday.
  • Target: Sales.
  • Model: Linear Regression predicted ₹630 for Wednesday, 9 AM, Samosa.
  • Test: Predicted ₹510, ₹610 vs. real ₹500, ₹600—₹10 error.

Evaluation digs into that ₹10—and beyond. Day 16: Data Odyssey starts here.

Evaluation Metrics

For regression (predicting numbers like sales):

  1. Mean Absolute Error (MAE):
    • Average difference between predictions and real values.
    • Day 15: MAE = ₹10—good start.
  2. Mean Squared Error (MSE):
    • Squares errors, punishes big misses more.
    • Smaller = better.
  3. R² Score:
    • How well the features explain sales (1 = perfect; usually between 0 and 1).
    • Negative = worse than always guessing the mean.

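The three metrics above can be checked by hand on Day 15's two test predictions (₹510 and ₹610 against real ₹500 and ₹600); a minimal sketch in plain Python:

```python
# Hand-computing MAE, MSE, and R² on Day 15's test results (in ₹).
actual = [500, 600]
predicted = [510, 610]

n = len(actual)
errors = [p - a for p, a in zip(predicted, actual)]  # [10, 10]

mae = sum(abs(e) for e in errors) / n        # Mean Absolute Error
mse = sum(e ** 2 for e in errors) / n        # Mean Squared Error

mean_actual = sum(actual) / n                # 550.0
ss_res = sum(e ** 2 for e in errors)         # residual sum of squares: 200
ss_tot = sum((a - mean_actual) ** 2 for a in actual)  # total sum of squares: 5000
r2 = 1 - ss_res / ss_tot                     # R² score

print(mae, mse, r2)  # 10.0 100.0 0.96
```

Scikit-Learn's mean_absolute_error, mean_squared_error, and r2_score compute exactly these quantities.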
Priya’s MAE of ₹10 means she’s off by ₹10 on average—stock impact? Day 16: Data Odyssey measures this.

Re-Running the Model

Her Day 15 script, with metrics:

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Data
data = pd.DataFrame({
    "Hour_Num": [7, 8, 9, 7, 8, 9],
    "Item_Code": [0, 0, 1, 0, 0, 1],
    "Day_Monday": [1, 1, 1, 0, 0, 0],
    "Day_Tuesday": [0, 0, 0, 1, 1, 1],
    "Sales": [200, 500, 600, 150, 550, 650]
})

# Split
X = data[["Hour_Num", "Item_Code", "Day_Monday", "Day_Tuesday"]]
y = data["Sales"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

# Train
model = LinearRegression()
model.fit(X_train, y_train)

# Predict
y_pred = model.predict(X_test)
print("Predictions:", y_pred)
print("Actual:", y_test.values)

Output (same split):

Predictions: [510, 610]
Actual: [500, 600]

Calculating Metrics

Add evaluation:

# Metrics
mae = mean_absolute_error(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print("MAE:", mae)
print("MSE:", mse)
print("R²:", r2)

Output (hypothetical):

MAE: 10.0
MSE: 100.0
R²: 0.96
  • MAE ₹10: off by ₹10 per prediction, small for the ₹500-600 range.
  • MSE 100: squared errors (10² = 100), so no big misses.
  • R² 0.96: the features explain 96% of the test-sales variation (1 − 200/5000 = 0.96), strong for 6 rows!

Priya’s model fits tightly; the ₹630 prediction looks solid. Day 16: Data Odyssey scores this.

Visual Check

Plot predictions vs. actual:

import matplotlib.pyplot as plt
plt.scatter(y_test, y_pred, color="teal")
plt.plot([150, 650], [150, 650], color="red", linestyle="--")  # Perfect line
plt.xlabel("Actual Sales (₹)")
plt.ylabel("Predicted Sales (₹)")
plt.title("Predictions vs. Actual")
plt.show()

Two points (500, 510) and (600, 610) hug the red “perfect” line—close fit! Day 16: Data Odyssey sees this.

Train vs. Test

The MAE comes from just 2 test rows; check against all the data:

y_all_pred = model.predict(X)
mae_all = mean_absolute_error(y, y_all_pred)
print("Full data MAE:", mae_all)

Output (hypothetical): Full data MAE: 8.5, even tighter! But that figure includes the 4 rows the model trained on, so a lower value is expected; the test score is what matters, since new days test generalization. Day 16: Data Odyssey splits this.

Why Small Data Limits

6 rows, 4 trained, 2 tested—tiny! Issues:

  • Overfit: Memorizes 9 AM = ₹600-ish, flops on new patterns.
  • Variance: Random split shifts MAE (₹10 vs. ₹15).
  • Noise: One odd sale (₹2000?) skews it.

Day 12’s 35 rows or a month’s 150 sharpen it—Priya needs more. Day 16: Data Odyssey flags this.
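To see the variance point in action, a small sketch re-fits the same model under different random splits (the seed values here are arbitrary):

```python
# Sketch: how the random train/test split shifts MAE on tiny data.
# Re-fits Priya's model for several random_state values.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error

data = pd.DataFrame({
    "Hour_Num": [7, 8, 9, 7, 8, 9],
    "Item_Code": [0, 0, 1, 0, 0, 1],
    "Day_Monday": [1, 1, 1, 0, 0, 0],
    "Day_Tuesday": [0, 0, 0, 1, 1, 1],
    "Sales": [200, 500, 600, 150, 550, 650],
})
X = data.drop(columns="Sales")
y = data["Sales"]

maes = []
for seed in range(5):
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.33, random_state=seed)
    model = LinearRegression().fit(X_tr, y_tr)
    maes.append(mean_absolute_error(y_te, model.predict(X_te)))
    print(f"random_state={seed}: MAE = ₹{maes[-1]:.1f}")
```

With only 2 test rows per split, which rows land in the test set swings the score, which is exactly why a single MAE from 6 rows deserves caution.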

Cross-Validation

Test split varies—use cross-validation:

from sklearn.model_selection import cross_val_score
scores = cross_val_score(model, X, y, cv=3, scoring="neg_mean_absolute_error")
print("Cross-val MAE:", -scores.mean())
  • cv=3: splits the 6 rows into 3 folds; each fold serves once as the 2-row test set while the other 4 rows train.
  • Output (hypothetical): Cross-val MAE: 12.0, the average over the three folds, a more stable ₹12 error estimate than any single split.

Priya’s ₹630 holds—₹12 off isn’t ₹100. Day 16: Data Odyssey validates this.
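What cv=3 does under the hood can be made explicit with KFold (a sketch; Scikit-Learn's default leaves the rows unshuffled, so the folds come out in order):

```python
# Sketch: the 3 folds cross_val_score builds from Priya's 6 rows.
import numpy as np
from sklearn.model_selection import KFold

rows = np.arange(6)       # indices of the 6 POS rows
kf = KFold(n_splits=3)    # default: no shuffling
for fold, (train_idx, test_idx) in enumerate(kf.split(rows), start=1):
    print(f"Fold {fold}: train rows {train_idx.tolist()}, test rows {test_idx.tolist()}")
# Fold 1: train rows [2, 3, 4, 5], test rows [0, 1]
# Fold 2: train rows [0, 1, 4, 5], test rows [2, 3]
# Fold 3: train rows [0, 1, 2, 3], test rows [4, 5]
```

cross_val_score fits a fresh copy of the model on each fold's 4 training rows and scores it on the 2 held-out rows, then the three scores are averaged.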

Real-World Evaluation

India’s flood models aim for low MSE—big errors flood towns. Amazon’s sales R² nears 1—profit hinges on precision. Priya’s ₹10-12 MAE is small-scale gold—stock tweaks, not disasters. Day 16: Data Odyssey benchmarks her.

Improving It

Better model?

  • More Data: Day 12’s 35 rows—MAE drops?
  • Features: Add weather (Day 11)—rain shifts sales.
  • Model: Try Decision Tree—catches non-linear jumps.

Priya’s Linear Regression is a start—growth awaits. Day 16: Data Odyssey hints this.
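As a taste of the Decision Tree idea above, a minimal sketch swaps DecisionTreeRegressor in for LinearRegression on the same 6 rows (the max_depth value here is an arbitrary choice to keep the tree shallow):

```python
# Sketch: a Decision Tree regressor on Priya's 6 rows (Day 17 preview).
import pandas as pd
from sklearn.tree import DecisionTreeRegressor

data = pd.DataFrame({
    "Hour_Num": [7, 8, 9, 7, 8, 9],
    "Item_Code": [0, 0, 1, 0, 0, 1],
    "Day_Monday": [1, 1, 1, 0, 0, 0],
    "Day_Tuesday": [0, 0, 0, 1, 1, 1],
    "Sales": [200, 500, 600, 150, 550, 650],
})
X = data.drop(columns="Sales")
y = data["Sales"]

tree = DecisionTreeRegressor(max_depth=2, random_state=42)  # shallow to curb overfitting
tree.fit(X, y)
preds = tree.predict(X)
print(preds)  # each prediction is the mean of a leaf's training sales
```

Unlike a straight line, a tree splits on thresholds (for example, Hour_Num ≤ 8), which can capture the jump from ₹150-200 at 7 AM to ₹500-650 later, a non-linear pattern Linear Regression smooths over.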

Why This Matters

Evaluation tells Priya her ₹630 prediction’s off by ₹10-12—stock 40 samosas, expect 38-42 sold, not 50 wasted. Without it, she’s blind; with it, she trusts—profit rises. Scale it: ML evaluates India’s traffic—roads optimize. Day 16: Data Odyssey proves her model.

Recap Summary

Yesterday, Day 15: Data Odyssey built Priya’s first ML model: a Linear Regression that predicted ₹630 for 9 AM Samosa sales, with MAE ₹10. Today, Day 16: Data Odyssey evaluated it (MAE ₹10-12, R² 0.96, cross-val ₹12), showing it’s solid for her tiny data. It’s her trust step.

What’s Next

Tomorrow, in Day 17: Data Odyssey – How Do We Improve ML Models?, we’ll refine Priya’s model: How do we cut that ₹12 error? Add features? We’ll tweak her Linear Regression and try a new model, boosting her predictions. Bring your curiosity, and I’ll see you there!
