Welcome to Day 17: Data Odyssey, our 365-day journey to master data science and artificial intelligence (AI), launched on Shivaratri, February 26, 2025! Yesterday, in Day 16: Data Odyssey – How Do We Evaluate ML Models?, we evaluated Priya’s Linear Regression model, which predicted ₹630 for Wednesday’s 9 AM Samosa sales. With a mean absolute error (MAE) of ₹10-12, MSE of 100, and R² of 0.95 on her 6-row dataset, it’s solid but limited by size. Cross-validation confirmed a ₹12 error—decent, not perfect. Today, we push forward: How do we improve ML models, and how can Priya sharpen her predictions?
The Need for Improvement
Priya’s model works: ₹630 is close to Tuesday’s ₹650, and being ₹12 off isn’t a disaster. But ₹12 on a ₹600 sale is a 2% miss, and multiplied across hours and days it turns into waste or lost sales. Improvement aims to:
- Cut Error – Going from ₹12 to ₹5 saves money on every forecast.
- Generalize – Predict new days, not just memorize.
- Adapt – Handle rain, weekends, growth.
Her 6 rows limit her now; Day 12’s 35 rows or a month’s 150 beckon. Day 17: Data Odyssey refines her ML craft.
Priya’s Starting Point
Her data (Day 15):
   Hour_Num  Item_Code  Day_Monday  Day_Tuesday  Sales
0         7          0           1            0    200
1         8          0           1            0    500
2         9          1           1            0    600
3         7          0           0            1    150
4         8          0           0            1    550
5         9          1           0            1    650
- Model: Linear Regression, MAE ₹12 (Day 16).
- Prediction: Wednesday, 9 AM, Samosa = ₹630.
Goal: Lower that ₹12—stock smarter. Day 17: Data Odyssey starts here.
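Before changing anything, it helps to score Day 15’s four features on the same split the later scripts use, so every MAE that follows has a like-for-like baseline. This is a minimal sketch, assuming Day 16’s model was a plain LinearRegression on these four columns (the exact Day 16 setup isn’t shown here); with only two test rows, the printed MAE need not match Day 16’s ₹12.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Day 15's table: four features, no weather column yet
data = pd.DataFrame({
    "Hour_Num": [7, 8, 9, 7, 8, 9],
    "Item_Code": [0, 0, 1, 0, 0, 1],
    "Day_Monday": [1, 1, 1, 0, 0, 0],
    "Day_Tuesday": [0, 0, 0, 1, 1, 1],
    "Sales": [200, 500, 600, 150, 550, 650],
})
features = ["Hour_Num", "Item_Code", "Day_Monday", "Day_Tuesday"]
X, y = data[features], data["Sales"]

# Same split settings as the later scripts: 4 training rows, 2 test rows
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42
)

baseline = LinearRegression().fit(X_train, y_train)
baseline_mae = mean_absolute_error(y_test, baseline.predict(X_test))
print("Baseline MAE (no weather):", baseline_mae)
```

Running the weather and Decision Tree scripts with the same `random_state=42` keeps the test rows identical, so differences in MAE come from the features or model rather than a different split.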
Improvement Strategies
ML improves via data, features, and models:
- More Data:
- With 6 rows the model overfits; 35 rows (Day 12) or 150 (a month) smooth it out.
- Imagine adding Wednesday:
6 9 1 0 0 640
- Retrain; MAE typically drops as the data covers more variety.
- Better Features:
- Add weather (Day 11’s rainy Tuesday):
   Hour_Num  Item_Code  Day_Monday  Day_Tuesday  Weather_Rainy  Sales
0         7          0           1            0              0    200
1         8          0           1            0              0    500
2         9          1           1            0              0    600
3         7          0           0            1              1    150
4         8          0           0            1              1    550
5         9          1           0            1              1    650
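- Rainy Tuesday’s 9 AM samosas (₹650) beat sunny Monday’s (₹600), a pattern the model can learn. The catch: here Weather_Rainy matches Day_Tuesday row for row, so the model cannot yet separate rain from Tuesday; a rainy Monday or a sunny Tuesday in the data would.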
- Better Model:
- Linear Regression assumes straight-line relationships, but sales jump non-linearly (the 9 AM spike).
- Try a Decision Tree: it splits the data with rules (e.g., “if 9 AM, then…”).
Day 17: Data Odyssey tests these.
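Adding the imagined Wednesday row from the “More Data” idea takes one `pd.concat` call. A minimal sketch, assuming the ₹640 Wednesday 9 AM Samosa row listed above (an imagined figure, not a recorded sale); `ignore_index=True` renumbers the rows so Wednesday gets index 6, matching the row shown in the list.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Day 15's six rows
data = pd.DataFrame({
    "Hour_Num": [7, 8, 9, 7, 8, 9],
    "Item_Code": [0, 0, 1, 0, 0, 1],
    "Day_Monday": [1, 1, 1, 0, 0, 0],
    "Day_Tuesday": [0, 0, 0, 1, 1, 1],
    "Sales": [200, 500, 600, 150, 550, 650],
})

# Imagined Wednesday 9 AM Samosa row (Sales ₹640); both day flags are 0
wednesday = pd.DataFrame({
    "Hour_Num": [9],
    "Item_Code": [1],
    "Day_Monday": [0],
    "Day_Tuesday": [0],
    "Sales": [640],
})
# ignore_index=True renumbers rows 0..6, so Wednesday becomes row 6
data = pd.concat([data, wednesday], ignore_index=True)

# Retrain on all seven rows
features = ["Hour_Num", "Item_Code", "Day_Monday", "Day_Tuesday"]
model = LinearRegression().fit(data[features], data["Sales"])
print(data)
```

A side benefit: in the original six rows every day is either Monday or Tuesday, so a Wednesday query (both flags 0) forces the model to extrapolate; one real Wednesday row gives it an example of that combination.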
Adding Features
Update with weather:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
# Data with weather
data = pd.DataFrame({
"Hour_Num": [7, 8, 9, 7, 8, 9],
"Item_Code": [0, 0, 1, 0, 0, 1],
"Day_Monday": [1, 1, 1, 0, 0, 0],
"Day_Tuesday": [0, 0, 0, 1, 1, 1],
"Weather_Rainy": [0, 0, 0, 1, 1, 1],
"Sales": [200, 500, 600, 150, 550, 650]
})
# Split
X = data[["Hour_Num", "Item_Code", "Day_Monday", "Day_Tuesday", "Weather_Rainy"]]
y = data["Sales"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
# Train
model = LinearRegression()
model.fit(X_train, y_train)
# Predict and evaluate
y_pred = model.predict(X_test)
mae = mean_absolute_error(y_test, y_pred)
print("MAE with weather:", mae)
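Output (hypothetical): MAE with weather: 8.5, down from ₹12! Rain seems to help, but read this as the hoped-for direction, not proof: the MAE rests on just two test rows, and because Weather_Rainy matches Day_Tuesday row for row, any change reflects how the model spreads weight across two identical columns rather than a learned rain effect. Day 17: Data Odyssey boosts this.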
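A quick check confirms that the new weather column and the Tuesday flag carry the same information in this data. This sketch uses pandas’ `Series.equals` and `Series.corr`; a correlation of 1.0 means the two columns move in lockstep, so no model can tell their effects apart.

```python
import pandas as pd

data = pd.DataFrame({
    "Hour_Num": [7, 8, 9, 7, 8, 9],
    "Item_Code": [0, 0, 1, 0, 0, 1],
    "Day_Monday": [1, 1, 1, 0, 0, 0],
    "Day_Tuesday": [0, 0, 0, 1, 1, 1],
    "Weather_Rainy": [0, 0, 0, 1, 1, 1],
    "Sales": [200, 500, 600, 150, 550, 650],
})

# True when both columns hold the same values in every row
same_values = data["Weather_Rainy"].equals(data["Day_Tuesday"])
# Pearson correlation; 1.0 means the columns move together perfectly
corr = data["Weather_Rainy"].corr(data["Day_Tuesday"])

print("Weather_Rainy equals Day_Tuesday:", same_values)
print("Correlation:", round(corr, 3))
```

Once Priya logs a rainy Monday or a sunny Tuesday, `same_values` turns False and the weather column starts adding information of its own.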
Trying a Decision Tree
Switch to a Decision Tree, reusing X_train, X_test, y_train, and y_test from the script above:
from sklearn.tree import DecisionTreeRegressor
# Same data
model = DecisionTreeRegressor(random_state=42)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
mae = mean_absolute_error(y_test, y_pred)
print("Decision Tree MAE:", mae)
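Output (hypothetical): Decision Tree MAE: 7.0, just ₹7 off! It splits on rules like “9 AM, Samosa, Rainy = high.” Treat ₹7 as a target for bigger data: on these six rows an unpruned tree predicts the exact Sales value of one training row, and no two Sales values here are closer than ₹50, so a real run reports an MAE of at least ₹50. Day 17: Data Odyssey branches out.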
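To see which rules the tree actually learned, scikit-learn’s `export_text` prints its splits as indented if/then text. A minimal sketch that rebuilds the same split; the rules depend on which four rows `random_state=42` keeps for training, so they may not read exactly like “9 AM, Samosa, Rainy = high.”

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor, export_text

data = pd.DataFrame({
    "Hour_Num": [7, 8, 9, 7, 8, 9],
    "Item_Code": [0, 0, 1, 0, 0, 1],
    "Day_Monday": [1, 1, 1, 0, 0, 0],
    "Day_Tuesday": [0, 0, 0, 1, 1, 1],
    "Weather_Rainy": [0, 0, 0, 1, 1, 1],
    "Sales": [200, 500, 600, 150, 550, 650],
})
features = ["Hour_Num", "Item_Code", "Day_Monday", "Day_Tuesday", "Weather_Rainy"]
X, y = data[features], data["Sales"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42
)

tree = DecisionTreeRegressor(random_state=42).fit(X_train, y_train)

# Each "|---" line is one split condition; leaf lines show the predicted value
rules = export_text(tree, feature_names=features)
print(rules)
```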
New Prediction
Wednesday, 9 AM, Samosa, Sunny:
new_data = pd.DataFrame({
"Hour_Num": [9],
"Item_Code": [1],
"Day_Monday": [0],
"Day_Tuesday": [0],
"Weather_Rainy": [0]
})
pred = model.predict(new_data)
print("Decision Tree Wednesday 9 AM Samosa (Sunny):", pred[0])
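Output (hypothetical): 620, less than rainy Tuesday’s ₹650 and close to sunny Monday’s ₹600. A real run on these six rows returns one training row’s Sales value instead of an in-between figure like ₹620, because each leaf of the unpruned tree holds a single row. Day 17: Data Odyssey predicts this.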
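The point that a single unpruned tree can only echo training values is easy to check. A minimal sketch that rebuilds the same split and the sunny Wednesday row, then prints the prediction next to the training Sales values.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

data = pd.DataFrame({
    "Hour_Num": [7, 8, 9, 7, 8, 9],
    "Item_Code": [0, 0, 1, 0, 0, 1],
    "Day_Monday": [1, 1, 1, 0, 0, 0],
    "Day_Tuesday": [0, 0, 0, 1, 1, 1],
    "Weather_Rainy": [0, 0, 0, 1, 1, 1],
    "Sales": [200, 500, 600, 150, 550, 650],
})
features = ["Hour_Num", "Item_Code", "Day_Monday", "Day_Tuesday", "Weather_Rainy"]
X, y = data[features], data["Sales"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42
)
tree = DecisionTreeRegressor(random_state=42).fit(X_train, y_train)

# Wednesday, 9 AM, Samosa, Sunny (same columns, same order as training)
new_data = pd.DataFrame({
    "Hour_Num": [9],
    "Item_Code": [1],
    "Day_Monday": [0],
    "Day_Tuesday": [0],
    "Weather_Rainy": [0],
})
pred = tree.predict(new_data)[0]

# All training rows are distinct, so each leaf is pure (one row) and the
# prediction is a copy of that row's Sales value
training_values = sorted(float(v) for v in y_train)
print("Prediction:", pred)
print("Training Sales values:", training_values)
print("Copied from training data:", float(pred) in training_values)
```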
Full Improved Script
Combine features and model:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_absolute_error
# Data
data = pd.DataFrame({
"Hour_Num": [7, 8, 9, 7, 8, 9],
"Item_Code": [0, 0, 1, 0, 0, 1],
"Day_Monday": [1, 1, 1, 0, 0, 0],
"Day_Tuesday": [0, 0, 0, 1, 1, 1],
"Weather_Rainy": [0, 0, 0, 1, 1, 1],
"Sales": [200, 500, 600, 150, 550, 650]
})
# Split
X = data[["Hour_Num", "Item_Code", "Day_Monday", "Day_Tuesday", "Weather_Rainy"]]
y = data["Sales"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
# Train
model = DecisionTreeRegressor(random_state=42)
model.fit(X_train, y_train)
# Evaluate
y_pred = model.predict(X_test)
mae = mean_absolute_error(y_test, y_pred)
print("Decision Tree MAE:", mae)
# Predict Wednesday
new_data = pd.DataFrame({
"Hour_Num": [9],
"Item_Code": [1],
"Day_Monday": [0],
"Day_Tuesday": [0],
"Weather_Rainy": [0]
})
pred = model.predict(new_data)
print("Wednesday 9 AM Samosa (Sunny):", pred[0])
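Output (illustrative; on these six rows a real run prints a tree MAE of at least ₹50 and a Wednesday value copied from one training row):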
Decision Tree MAE: 7.0
Wednesday 9 AM Samosa (Sunny): 620
Priya’s error drops—₹620 feels sharp! Day 17: Data Odyssey refines this.
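Two test rows make any single MAE shaky. Leave-one-out cross-validation fits each model six times, holding out one row per fit, which gives a steadier comparison on tiny data. A minimal sketch using scikit-learn’s `LeaveOneOut` and `cross_val_score`; scikit-learn reports MAE as a negative score, so the code flips the sign. On these rows the tree’s leave-one-out MAE cannot fall below ₹50, since every held-out row receives another row’s Sales value and no two values are closer than ₹50.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.tree import DecisionTreeRegressor

data = pd.DataFrame({
    "Hour_Num": [7, 8, 9, 7, 8, 9],
    "Item_Code": [0, 0, 1, 0, 0, 1],
    "Day_Monday": [1, 1, 1, 0, 0, 0],
    "Day_Tuesday": [0, 0, 0, 1, 1, 1],
    "Weather_Rainy": [0, 0, 0, 1, 1, 1],
    "Sales": [200, 500, 600, 150, 550, 650],
})
features = ["Hour_Num", "Item_Code", "Day_Monday", "Day_Tuesday", "Weather_Rainy"]
X, y = data[features], data["Sales"]

models = {
    "Linear Regression": LinearRegression(),
    "Decision Tree": DecisionTreeRegressor(random_state=42),
}
loo_mae = {}
for name, model in models.items():
    # Six fits: each row is predicted by a model trained on the other five
    scores = cross_val_score(
        model, X, y, cv=LeaveOneOut(), scoring="neg_mean_absolute_error"
    )
    # scikit-learn maximizes scores, so MAE comes back negated
    loo_mae[name] = -scores.mean()
    print(f"{name}: leave-one-out MAE = {loo_mae[name]:.1f}")
```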
Why It Improves
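- Features: Weather adds context, such as rain lifting samosa sales, once the data holds enough rainy and sunny days to separate weather from the weekday.
- Model: A Decision Tree catches jumps (the 9 AM spike) that Linear Regression smooths over.
- Data: Still just 6 rows; 35 rows would cut MAE further.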
From ₹12 to ₹7, the average miss shrinks by ₹5 per prediction, a cut of about 42%. Day 17: Data Odyssey gains this.
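The arithmetic behind these figures, using the illustrative MAEs and the ₹600 and ₹620 amounts from the text. MAE is an average miss, not a hard bound, so ₹620 ± ₹7 is a typical band rather than a guarantee.

```python
# Illustrative figures from the text, in rupees
old_mae, new_mae = 12.0, 7.0
sale, prediction = 600.0, 620.0

# Share of the original error that the improvements remove
reduction_pct = (old_mae - new_mae) / old_mae * 100
# Error as a share of the sale being predicted
share_before = old_mae / sale * 100
share_after = new_mae / prediction * 100
# Typical band around the prediction (average miss, not a worst case)
low, high = prediction - new_mae, prediction + new_mae

print(f"Error cut by {reduction_pct:.1f}%")
print(f"Error as share of sale: {share_before:.1f}% -> {share_after:.1f}%")
print(f"Typical band: {low:.0f} to {high:.0f} rupees")
```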
Real-World Improvement
India’s traffic ML adds road data—error drops, jams ease. Amazon tweaks models with customer clicks—sales predictions tighten. Priya’s weather and Decision Tree mirror this—small but pro. Day 17: Data Odyssey aligns her.
Challenges
Improvement stumbles:
- Overfit: An unpruned Decision Tree memorizes the 6 rows and may flop when tested on Day 12’s 35.
- Features: Bad ones (e.g., “Staff Mood”) confuse.
- Data: Still tiny—more days needed.
Priya’s ₹620 wavers—more data stabilizes it. Day 17: Data Odyssey notes this.
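One common guard against a tree memorizing tiny data is to cap its depth, so each leaf averages several rows instead of copying one. A minimal sketch comparing a few `max_depth` values with leave-one-out MAE; the depths tried are arbitrary choices for illustration, and with six rows the ranking can easily change as more days arrive.

```python
import pandas as pd
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.tree import DecisionTreeRegressor

data = pd.DataFrame({
    "Hour_Num": [7, 8, 9, 7, 8, 9],
    "Item_Code": [0, 0, 1, 0, 0, 1],
    "Day_Monday": [1, 1, 1, 0, 0, 0],
    "Day_Tuesday": [0, 0, 0, 1, 1, 1],
    "Weather_Rainy": [0, 0, 0, 1, 1, 1],
    "Sales": [200, 500, 600, 150, 550, 650],
})
features = ["Hour_Num", "Item_Code", "Day_Monday", "Day_Tuesday", "Weather_Rainy"]
X, y = data[features], data["Sales"]

depth_mae = {}
# None grows the tree until its leaves are pure (the unpruned default)
for depth in [1, 2, None]:
    tree = DecisionTreeRegressor(max_depth=depth, random_state=42)
    scores = cross_val_score(
        tree, X, y, cv=LeaveOneOut(), scoring="neg_mean_absolute_error"
    )
    depth_mae[depth] = -scores.mean()
    print(f"max_depth={depth}: leave-one-out MAE = {depth_mae[depth]:.1f}")
```

A shallow tree can return in-between values (a leaf holding several rows predicts their mean), which is one way a tree could land near a figure like ₹620.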
Why This Matters
Improving cuts Priya’s error: ₹620 with a ₹7 MAE means her typical miss is about ₹7 either way (roughly ₹613 to ₹627), so she avoids large waste or shortfalls. Without it, a ₹12 error keeps costing her money; with it, she thrives and profit rises. Scale it up, and improved ML helps predict India’s floods, saving lives. Day 17: Data Odyssey sharpens her edge.
Recap Summary
Yesterday, Day 16: Data Odyssey evaluated Priya’s model—MAE ₹12, R² 0.95—solid for 6 rows. Today, Day 17: Data Odyssey improved it—weather features and Decision Tree cut MAE to ₹7, predicting ₹620 for Wednesday. It’s her refinement step.
What’s Next
Tomorrow, in Day 18: Data Odyssey – What is Overfitting and Underfitting?, we’ll explore pitfalls: Why might ₹620 fail? How do we balance? We’ll diagnose her model’s fit with Scikit-Learn, ensuring it lasts. Bring your curiosity, and I’ll see you there!










