Day 15: Data Odyssey – How Do We Build a Simple ML Model?

Welcome to Day 15: Data Odyssey, our 365-day journey to master data science and artificial intelligence (AI), launched on Shivaratri, February 26, 2025! Yesterday, in Day 14: Data Odyssey – What is Machine Learning?, we introduced machine learning (ML) as the art of teaching computers to predict from data—like Priya forecasting her café’s sales. We explored its types (supervised, unsupervised), process (features to predictions), and promise: turning her 9 AM ₹600 into tomorrow’s guess. With her data preprocessed (Day 13), it’s time to act. Today, we dive in: How do we build a simple ML model, and what’s Priya’s first prediction?

From Vision to Action

Day 14 promised ML could predict Priya’s sales, say 9 AM tomorrow, using her preprocessed data (Hour_Num, Item_Code, Day_Monday, Day_Tuesday). Today, we use Scikit-Learn (installed Day 13) to build a supervised model: Linear Regression. It is simple: it learns a straight-line relationship from her data (e.g., “sales rise with hour”), and it fits her goal of predicting sales in ₹. No theory overload, just code, results, and lessons. Day 15: Data Odyssey makes ML real.

Priya’s Preprocessed Data

Her week’s data (Day 13, simplified to 6 rows for clarity):

   Hour_Num  Item_Code  Day_Monday  Day_Tuesday  Sales
0         7          0           1            0    200
1         8          0           1            0    500
2         9          1           1            0    600
3         7          0           0            1    150
4         8          0           0            1    550
5         9          1           0            1    650

Features: Hour_Num (7-9), Item_Code (0=Chai, 1=Samosa), Day_Monday/Day_Tuesday (1=Yes, 0=No).
Target: Sales (₹150-650).
Goal: Predict Wednesday, 9 AM, Samosa (Hour_Num=9, Item_Code=1, Day_Monday=0, Day_Tuesday=0). Day 15: Data Odyssey targets this.

Building the Model

Steps:

Load Data:

import pandas as pd
data = pd.DataFrame({
    "Hour_Num": [7, 8, 9, 7, 8, 9],
    "Item_Code": [0, 0, 1, 0, 0, 1],
    "Day_Monday": [1, 1, 1, 0, 0, 0],   # first three rows are Monday
    "Day_Tuesday": [0, 0, 0, 1, 1, 1],  # last three rows are Tuesday
    "Sales": [200, 500, 600, 150, 550, 650]
})

Split Features and Target:

X = data[["Hour_Num", "Item_Code", "Day_Monday", "Day_Tuesday"]]  # Features
y = data["Sales"]  # Target

Train-Test Split:
- Hold some rows back for testing: test_size=0.33 on 6 rows leaves 4 to train and 2 to check, and random_state=42 makes the split repeatable.

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
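To confirm the split does what the step describes, here is a minimal sketch that rebuilds the six rows from the table and prints the sizes of each part. It only uses the names already introduced in this post; which specific rows end up held out is left for the printout to show rather than assumed.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Priya's six preprocessed rows (same values as the table above)
data = pd.DataFrame({
    "Hour_Num": [7, 8, 9, 7, 8, 9],
    "Item_Code": [0, 0, 1, 0, 0, 1],
    "Day_Monday": [1, 1, 1, 0, 0, 0],
    "Day_Tuesday": [0, 0, 0, 1, 1, 1],
    "Sales": [200, 500, 600, 150, 550, 650],
})
X = data[["Hour_Num", "Item_Code", "Day_Monday", "Day_Tuesday"]]
y = data["Sales"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42
)

# 0.33 of 6 rows rounds up to 2 test rows, leaving 4 for training
print("Train rows:", len(X_train), "| Test rows:", len(X_test))

# The index shows which original rows were held out
print("Held-out row numbers:", sorted(X_test.index))
```

Checking the held-out row numbers against the table tells you exactly which hours and days the model never saw during training.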

Fit Model:

from sklearn.linear_model import LinearRegression
model = LinearRegression()
model.fit(X_train, y_train)  # Learns from 4 rows

Predict:
- Predict sales for the two held-out rows in X_test (print X_test to see which rows they are):

predictions = model.predict(X_test)
print(predictions)

Output (illustrative; your numbers will differ): [510, 610]. Are they close to the real values (500, 600)? Day 15: Data Odyssey runs this.
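Reading two bare numbers is easier when they sit next to the real sales. The sketch below is one way to line them up; the `comparison` table and its column names are illustrative choices, not part of Scikit-Learn.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

data = pd.DataFrame({
    "Hour_Num": [7, 8, 9, 7, 8, 9],
    "Item_Code": [0, 0, 1, 0, 0, 1],
    "Day_Monday": [1, 1, 1, 0, 0, 0],
    "Day_Tuesday": [0, 0, 0, 1, 1, 1],
    "Sales": [200, 500, 600, 150, 550, 650],
})
X = data[["Hour_Num", "Item_Code", "Day_Monday", "Day_Tuesday"]]
y = data["Sales"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42
)

model = LinearRegression().fit(X_train, y_train)

# Put actual and predicted sales side by side, keeping the original row numbers
comparison = pd.DataFrame(
    {"Actual": y_test.values, "Predicted": model.predict(X_test).round(1)},
    index=X_test.index,
)
comparison["Difference"] = comparison["Predicted"] - comparison["Actual"]
print(comparison)
```

A positive Difference means the model guessed too high for that row; a negative one means it guessed too low.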

Priya’s Prediction

Wednesday, 9 AM, Samosa:

new_data = pd.DataFrame({
    "Hour_Num": [9],
    "Item_Code": [1],
    "Day_Monday": [0],
    "Day_Tuesday": [0]
})
pred = model.predict(new_data)
print("Predicted 9 AM Wednesday Samosa sales:", pred[0])

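Output (hypothetical): Predicted 9 AM Wednesday Samosa sales: 630. That is near Tuesday’s ₹650 for the same hour and item, so it looks plausible. Keep in mind that no row in her data has both day flags at 0, so for Wednesday the model is extrapolating. Day 15: Data Odyssey delivers this.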

Full Script

Priya’s first model:

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

# Data
data = pd.DataFrame({
    "Hour_Num": [7, 8, 9, 7, 8, 9],
    "Item_Code": [0, 0, 1, 0, 0, 1],
    "Day_Monday": [1, 1, 1, 0, 0, 0],
    "Day_Tuesday": [0, 0, 0, 1, 1, 1],
    "Sales": [200, 500, 600, 150, 550, 650]
})

# Features and target
X = data[["Hour_Num", "Item_Code", "Day_Monday", "Day_Tuesday"]]
y = data["Sales"]

# Split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

# Train
model = LinearRegression()
model.fit(X_train, y_train)

# Test prediction
predictions = model.predict(X_test)
print("Test predictions:", predictions)
print("Actual test values:", y_test.values)

# Wednesday 9 AM Samosa
new_data = pd.DataFrame({
    "Hour_Num": [9],
    "Item_Code": [1],
    "Day_Monday": [0],
    "Day_Tuesday": [0]
})
pred = model.predict(new_data)
print("Predicted 9 AM Wednesday Samosa sales:", pred[0])

Output (illustrative; your exact numbers depend on which rows land in the test set):

Test predictions: [510, 610]
Actual test values: [500, 600]
Predicted 9 AM Wednesday Samosa sales: 630

Priya’s model learns—₹630 for Wednesday! Day 15: Data Odyssey codes this.

How It Learns

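Linear Regression fits a linear equation: Sales = a × Hour_Num + b × Item_Code + c × Day_Monday + d × Day_Tuesday + e. It chooses a, b, c, d, e so that the squared gap between predicted and actual sales on the training rows is as small as possible, picking up patterns such as “hour up, sales up.” No derivation today; the library handles the math. Day 15: Data Odyssey simplifies this.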
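The letters a to e are not hidden: a fitted LinearRegression exposes them as `coef_` and `intercept_`. As a rough sketch, the code below prints them and rebuilds the Wednesday prediction by hand from the equation. The `features` list and the `by_hand` / `from_model` names are just illustrative helpers.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

data = pd.DataFrame({
    "Hour_Num": [7, 8, 9, 7, 8, 9],
    "Item_Code": [0, 0, 1, 0, 0, 1],
    "Day_Monday": [1, 1, 1, 0, 0, 0],
    "Day_Tuesday": [0, 0, 0, 1, 1, 1],
    "Sales": [200, 500, 600, 150, 550, 650],
})
features = ["Hour_Num", "Item_Code", "Day_Monday", "Day_Tuesday"]
X = data[features]
y = data["Sales"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42
)
model = LinearRegression().fit(X_train, y_train)

# One learned weight per feature (a, b, c, d), plus the intercept (e)
for name, weight in zip(features, model.coef_):
    print(f"{name}: {weight:.2f}")
print(f"Intercept: {model.intercept_:.2f}")

# Rebuild the Wednesday 9 AM Samosa prediction straight from the equation
wednesday = [9, 1, 0, 0]
by_hand = float(np.dot(model.coef_, wednesday) + model.intercept_)
from_model = float(
    model.predict(pd.DataFrame([wednesday], columns=features))[0]
)
print("By hand:", round(by_hand, 2), "| model.predict:", round(from_model, 2))
```

The two printed values should agree, which shows that model.predict is nothing more than the weighted sum described above.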

Checking It

Accuracy? Compare predictions to real:

Test: ₹510 vs. ₹500, ₹610 vs. ₹600—close!
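Error: use mean_absolute_error from sklearn.metrics, which averages the absolute gap between each prediction and its real value.

from sklearn.metrics import mean_absolute_error

error = mean_absolute_error(y_test, predictions)
print("Mean error:", error)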

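Output (for the illustrative predictions above): Mean error: 10.0, meaning the predictions are ₹10 off on average. With only 2 test rows, treat that as a rough signal rather than a reliable score. Day 15: Data Odyssey measures this.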
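To see where that figure comes from, here is the same calculation done by hand on the illustrative numbers from this section (₹510 vs. ₹500 and ₹610 vs. ₹600). These are the post's sample values, not the result of an actual run.

```python
from sklearn.metrics import mean_absolute_error

# Illustrative values taken from the sample output in this post
actual = [500, 600]
predicted = [510, 610]

# Mean absolute error: average of the absolute gaps, ignoring direction
by_hand = sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)
from_sklearn = mean_absolute_error(actual, predicted)

print("By hand:", by_hand)
print("scikit-learn:", from_sklearn)
```

Both lines print 10.0: each prediction misses by ₹10, so the average miss is ₹10.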

Why This Works

Small data (6 rows) limits precision, but:

Patterns: 9 AM, Samosa = high sales.
Features: Hour, item, day guide it.
Simplicity: Linear Regression fits her trends.

More rows (Day 12’s 35) sharpen it—Priya’s growing! Day 15: Data Odyssey starts small.

Real-World ML

India’s weather models predict rain—ML on past data. Amazon forecasts sales—ML on purchases. Priya’s ₹630 guess is her café’s ML debut—same roots. Day 15: Data Odyssey connects her.

Challenges

ML stumbles:

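Tiny Data: 6 rows, with only 4 used for training, are fewer than the 5 numbers the model must learn (four coefficients plus an intercept), so the fit is not uniquely determined and it overfits or guesses poorly.
Features: Leave out a driver such as weather and accuracy drops.
Errors: A typo like X_tain raises a NameError; fix it to X_train.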

Priya’s prediction wobbles—more data fixes it. Day 15: Data Odyssey learns this.
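One way to see the wobble for yourself is to retrain on several different splits and watch the Wednesday guess move. This is a small sketch; the five `random_state` values and the `guesses` list are arbitrary choices for illustration.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

data = pd.DataFrame({
    "Hour_Num": [7, 8, 9, 7, 8, 9],
    "Item_Code": [0, 0, 1, 0, 0, 1],
    "Day_Monday": [1, 1, 1, 0, 0, 0],
    "Day_Tuesday": [0, 0, 0, 1, 1, 1],
    "Sales": [200, 500, 600, 150, 550, 650],
})
features = ["Hour_Num", "Item_Code", "Day_Monday", "Day_Tuesday"]
X = data[features]
y = data["Sales"]

# Wednesday, 9 AM, Samosa
wednesday = pd.DataFrame([[9, 1, 0, 0]], columns=features)

guesses = []
for seed in range(5):
    # A different seed puts different rows into the 4-row training set
    X_train, _, y_train, _ = train_test_split(
        X, y, test_size=0.33, random_state=seed
    )
    model = LinearRegression().fit(X_train, y_train)
    guess = float(model.predict(wednesday)[0])
    guesses.append(guess)
    print(f"random_state={seed}: predicted sales = {guess:.1f}")

print(f"Spread between highest and lowest guess: {max(guesses) - min(guesses):.1f}")
```

If the spread is large, the prediction depends more on which rows were picked than on any real pattern, which is exactly the tiny-data problem listed above.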

Why This Matters

Priya’s model predicts ₹630 for 9 AM—stock 40 samosas, not 50, saving ₹200 waste. Without it, she averages (₹400); with it, she plans—profit rises. Scale it: ML predicts India’s traffic—roads clear. Day 15: Data Odyssey makes her predictive.
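The "just use the average" approach is easy to compute, and it makes a handy yardstick for any model. The sketch below works it out for the six sample rows shown in this post only, so the figures describe that small sample and nothing more.

```python
import pandas as pd

data = pd.DataFrame({
    "Hour_Num": [7, 8, 9, 7, 8, 9],
    "Item_Code": [0, 0, 1, 0, 0, 1],
    "Day_Monday": [1, 1, 1, 0, 0, 0],
    "Day_Tuesday": [0, 0, 0, 1, 1, 1],
    "Sales": [200, 500, 600, 150, 550, 650],
})

# Baseline 1: one number for every hour, the plain average
overall_average = data["Sales"].mean()
print(f"Overall average: ₹{overall_average:.2f}")

# Baseline 2: a separate average for each hour
by_hour = data.groupby("Hour_Num")["Sales"].mean()
print(by_hour)
```

On these six rows the overall average is about ₹441.67, while the 9 AM rows alone average ₹625. A single average hides the hourly pattern that the model is trying to learn.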

Recap Summary

Yesterday, Day 14: Data Odyssey introduced ML—learning from Priya’s preprocessed data to predict sales. Today, Day 15: Data Odyssey built her first model—Linear Regression with Scikit-Learn, guessing ₹630 for Wednesday’s 9 AM Samosa. It’s her ML start.

What’s Next

Tomorrow, in Day 16: Data Odyssey – How Do We Evaluate ML Models?, we’ll evaluate Priya’s model: How good is ₹630? What does accuracy mean? We’ll test it more deeply with Scikit-Learn, refining her predictions. Bring your curiosity, and I’ll see you there!

Author

Vinay Karanam
