Welcome to Day 37: Data Odyssey, our 365-day journey to master data science and artificial intelligence (AI), launched on Shivaratri, February 26, 2025! Yesterday, in Day 36: Data Odyssey – What is Natural Language Processing?, we analyzed Priya’s customer reviews using NLP, extracting sentiment from her 7-row dataset (e.g., “Great samosas!” scored 0.6). Adding sentiment as a feature improved her stacked ensemble to ₹3.7 MAE (from ₹3.8, Day 35), guiding 39 samosas for 9 AM and chai fixes for 7 AM’s ₹150 sales. Today, we see: What is computer vision, and can Priya count customers via cameras to confirm “Busy” hours?
Seeing the World
Computer vision enables computers to interpret images or videos—like a café camera capturing customer crowds. Priya’s models predict sales (₹641, Day 36) and classify “Busy”/“Slow” (1.0 recall, Day 34), but visuals confirm patterns: Are 9 AM’s ₹650 sales tied to 20 customers? It’s “collect” and “analyze” in our workflow (Day 1), adding image-based insights—stock 40 samosas or hire staff? Unlike NLP’s text (Day 36), computer vision processes pixels.
Think of it as Priya watching her café’s rush. A camera counts 20 people at 9 AM—her “Busy” label holds, 39 samosas fit. Day 37: Data Odyssey sees this.
Why Computer Vision Matters
Priya’s models—regression (MAE ₹3.7), classifier (1.0 “Slow” recall)—optimize stock, but:
- Validation: 9 AM “Busy” (₹650)—how many customers?
- Insights: Low ₹150 at 7 AM—empty café?
- Scale: Day 12’s 35 rows—pair with video data.
Computer vision complements her ₹632.5 forecast (Day 25), clusters (Day 28), and NLP (Day 36), driving precise staffing and stocking. Day 37: Data Odyssey visualizes this.
Priya’s Data Recap
Her data with sentiment (Day 36):
                     Sales  Hour_Num  Item_Code  Weather_Rainy  Rush_Hour  Weekday  Sales_Lag  Label  Sentiment
2025-03-03 07:00:00  200.0         7          0              0          0        1        0.0   Slow    -0.4767
2025-03-03 08:00:00  500.0         8          0              0          1        1      200.0   Busy     0.0000
2025-03-03 09:00:00  600.0         9          1              0          1        1      500.0   Busy     0.6588
2025-03-03 10:00:00  500.0        10          1              0          0        1      600.0   Busy     0.4404
2025-03-03 11:00:00  400.0        11          1              0          0        1      500.0   Slow     0.0000
2025-03-04 07:00:00  150.0         7          0              1          0        1      600.0   Slow     0.2263
2025-03-04 08:00:00  550.0         8          0              1          1        1      150.0   Busy     0.5719
2025-03-04 09:00:00  650.0         9          1              1          1        1      550.0   Busy     0.5859
2025-03-04 10:00:00  550.0        10          1              1          0        1      650.0   Busy     0.0000
2025-03-04 11:00:00  450.0        11          1              1          0        1      550.0   Slow     0.0000
2025-03-05 09:00:00  640.0         9          1              0          1        0      650.0   Busy     0.6369
2025-03-05 10:00:00  540.0        10          1              0          0        0      640.0   Busy     0.0000
2025-03-05 11:00:00  440.0        11          1              0          0        0      540.0   Slow     0.0000
- Issue: No image data—simulate customer counts.
- Models: Stacked ensemble, MAE ₹3.7, 1.0 “Slow” recall.
Goal: Simulate camera counts—validate “Busy” (9 AM, ₹650) vs. “Slow” (7 AM, ₹150). Day 37: Data Odyssey starts here.
Simulating Customer Counts
Add counts tied to hours:
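import pandas as pd

# Simulated hourly head counts (stand-ins for camera output); 11 AM is left out on purpose.
counts = pd.DataFrame({
    "Datetime": [
        "2025-03-03 07:00", "2025-03-03 08:00", "2025-03-03 09:00", "2025-03-03 10:00",
        "2025-03-04 07:00", "2025-03-04 08:00", "2025-03-04 09:00", "2025-03-04 10:00",
        "2025-03-05 09:00", "2025-03-05 10:00",
    ],
    "Customer_Count": [5, 15, 20, 12, 4, 16, 22, 13, 21, 14],
})
counts["Datetime"] = pd.to_datetime(counts["Datetime"])

# Left merge keeps every sales row; hours without a count become NaN.
# Assumes data_full exposes "Datetime" as a column or a named index level.
data_full = data_full.merge(counts, on="Datetime", how="left")
print(data_full[["Sales", "Hour_Num", "Label", "Customer_Count"]])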
Output:
                     Sales  Hour_Num  Label  Customer_Count
2025-03-03 07:00:00  200.0         7   Slow             5.0
2025-03-03 08:00:00  500.0         8   Busy            15.0
2025-03-03 09:00:00  600.0         9   Busy            20.0
2025-03-03 10:00:00  500.0        10   Busy            12.0
2025-03-03 11:00:00  400.0        11   Slow             NaN
2025-03-04 07:00:00  150.0         7   Slow             4.0
2025-03-04 08:00:00  550.0         8   Busy            16.0
2025-03-04 09:00:00  650.0         9   Busy            22.0
2025-03-04 10:00:00  550.0        10   Busy            13.0
2025-03-04 11:00:00  450.0        11   Slow             NaN
2025-03-05 09:00:00  640.0         9   Busy            21.0
2025-03-05 10:00:00  540.0        10   Busy            14.0
2025-03-05 11:00:00  440.0        11   Slow             NaN
- Busy (≥₹500): ~12-22 customers.
- Slow (<₹500): ~4-5 customers.
Day 37: Data Odyssey counts this.
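To check the Busy/Slow ranges rather than eyeball them, a quick summary helps. This is a minimal sketch that re-types the ten simulated rows from the table above into a fresh DataFrame (the name `df` is just for illustration), then looks at the count range per label and how strongly counts track sales.

```python
import pandas as pd

# Values copied from the merged table above (rows without a count omitted).
df = pd.DataFrame({
    "Label": ["Slow", "Busy", "Busy", "Busy", "Slow",
              "Busy", "Busy", "Busy", "Busy", "Busy"],
    "Sales": [200, 500, 600, 500, 150, 550, 650, 550, 640, 540],
    "Customer_Count": [5, 15, 20, 12, 4, 16, 22, 13, 21, 14],
})

# Smallest and largest simulated head count for each label.
summary = df.groupby("Label")["Customer_Count"].agg(["min", "max"])
print(summary)

# Pearson correlation between head count and sales.
corr = df["Customer_Count"].corr(df["Sales"])
print(round(corr, 2))
```

Because the counts are simulated to follow sales, a high correlation here is expected; real camera counts would be the actual test.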
Computer Vision Basics
Steps to count customers (simulated):
- Image Processing:
- Detect people in frames—use pre-trained YOLO or OpenCV.
- Feature Extraction:
- Count objects (customers)—numerical output.
- Integrate:
- Add counts to dataset—new feature.
Here, 10 of the 13 rows use simulated counts; real vision would come in when scaling to Day 12’s 35 rows. Day 37: Data Odyssey visualizes this.
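To make “count objects” concrete without a camera, here is a toy sketch in plain Python. It assumes a detector has already turned a frame into a binary mask (1 = person pixel, 0 = background) and counts connected groups of 1s. The function `count_blobs` and the tiny `frame_mask` are hypothetical illustrations, not YOLO or OpenCV; a real pipeline would use a pre-trained detector for the first step.

```python
from collections import deque


def count_blobs(mask):
    """Count 4-connected groups of 1s in a 2D list of 0/1 values."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    blobs = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] != 1 or seen[r][c]:
                continue
            # New unvisited foreground pixel: flood-fill its whole blob.
            blobs += 1
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
    return blobs


# Hypothetical 5x8 foreground mask for one frame.
frame_mask = [
    [1, 1, 0, 0, 0, 0, 1, 0],
    [1, 1, 0, 0, 0, 0, 1, 0],
    [0, 0, 0, 1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 1],
]
customer_count = count_blobs(frame_mask)
print(customer_count)  # 4
```

The single number that comes out is what would land in the Customer_Count column for that frame’s hour.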
Integrating Counts
Impute missing counts (11 AM, Day 35’s interpolation):
data_full["Customer_Count"] = data_full["Customer_Count"].interpolate(method="linear")
print(data_full[["Sales", "Hour_Num", "Label", "Customer_Count"]])
Output:
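                     Sales  Hour_Num  Label  Customer_Count
2025-03-03 07:00:00  200.0         7   Slow             5.0
2025-03-03 08:00:00  500.0         8   Busy            15.0
2025-03-03 09:00:00  600.0         9   Busy            20.0
2025-03-03 10:00:00  500.0        10   Busy            12.0
2025-03-03 11:00:00  400.0        11   Slow             8.0
2025-03-04 07:00:00  150.0         7   Slow             4.0
2025-03-04 08:00:00  550.0         8   Busy            16.0
2025-03-04 09:00:00  650.0         9   Busy            22.0
2025-03-04 10:00:00  550.0        10   Busy            13.0
2025-03-04 11:00:00  450.0        11   Slow            17.0
2025-03-05 09:00:00  640.0         9   Busy            21.0
2025-03-05 10:00:00  540.0        10   Busy            14.0
2025-03-05 11:00:00  440.0        11   Slow            14.0
Linear interpolation ignores the timestamps and averages the neighbouring rows, so the three 11 AM gaps come out as 8, 17, and 14: the second is pulled up by the next day’s 9 AM rush, and the last is just the 10 AM count carried forward. Only the first sits plausibly in “Slow” territory, so treat these values as rough placeholders. Day 37: Data Odyssey imputes this.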
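One thing worth knowing about `interpolate(method="linear")`: it treats rows as evenly spaced and fills a gap from whatever rows sit on either side, even across an overnight break. The sketch below shows this on four rows taken from the table, next to a per-day variant. Grouping by calendar day is an assumption for illustration, not something earlier days did.

```python
import numpy as np
import pandas as pd

idx = pd.to_datetime([
    "2025-03-03 10:00", "2025-03-03 11:00",
    "2025-03-04 07:00", "2025-03-04 08:00",
])
counts = pd.Series([12.0, np.nan, 4.0, 16.0], index=idx, name="Customer_Count")

# Plain linear interpolation: the 11 AM gap is filled from 10 AM and the
# next morning's 7 AM, giving (12 + 4) / 2.
across_days = counts.interpolate(method="linear")
print(across_days.iloc[1])  # 8.0

# Per-day variant: each calendar day is interpolated on its own, so the
# trailing 11 AM gap simply carries the 10 AM count forward.
within_day = counts.groupby(counts.index.date).transform(
    lambda s: s.interpolate(method="linear")
)
print(within_day.iloc[1])  # 12.0
```

Neither number is a measurement; which one is the better placeholder depends on how quickly the café empties after 10 AM.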
Enhance Regression
Add Customer_Count:
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor, StackingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error
X = data_full[["Hour_Num", "Item_Code", "Weather_Rainy", "Rush_Hour", "Weekday", "Sales_Lag", "Sentiment", "Customer_Count"]]
y = data_full["Sales"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
estimators = [
("rf", RandomForestRegressor(n_estimators=20, max_depth=3, random_state=42)),
("gb", GradientBoostingRegressor(n_estimators=20, max_depth=2, random_state=42))
]
stack = StackingRegressor(estimators=estimators, final_estimator=LinearRegression())
stack.fit(X_train, y_train)
y_pred = stack.predict(X_test)
print("Stacking MAE:", mean_absolute_error(y_test, y_pred))
Output: Stacking MAE: 3.6, slightly better than ₹3.7 (Day 36). Counts seem to help, though a ₹0.1 gain on a 5-row test set is well within noise. Day 37: Data Odyssey predicts this.
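As a reminder of what the MAE figure measures, here it is computed by hand. The three actual/predicted pairs below are made-up numbers for illustration only, not rows from Priya’s test set.

```python
# Hypothetical actual vs. predicted hourly sales (in rupees).
actual = [600.0, 150.0, 540.0]
predicted = [596.0, 153.0, 544.0]

# MAE = average of the absolute differences.
abs_errors = [abs(a - p) for a, p in zip(actual, predicted)]
mae = sum(abs_errors) / len(abs_errors)
print(abs_errors)     # [4.0, 3.0, 4.0]
print(round(mae, 2))  # 3.67
```

An MAE near ₹3.6 therefore means predictions miss by a few rupees per hour on average.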
Classifier
With counts:
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
y = data_full["Label"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
estimators = [
("rf", RandomForestClassifier(n_estimators=10, max_depth=2, class_weight="balanced", random_state=42)),
("gb", GradientBoostingClassifier(n_estimators=10, max_depth=2, random_state=42))
]
stack = StackingClassifier(estimators=estimators, final_estimator=LogisticRegression())
stack.fit(X_train, y_train)
y_pred = stack.predict(X_test)
print(classification_report(y_test, y_pred))
Output:
              precision    recall  f1-score   support
        Busy       1.00      0.75      0.86         4
        Slow       0.50      1.00      0.67         1
    accuracy                           0.80         5
Same as Day 36—counts don’t lift classifier. Day 37: Data Odyssey tests this.
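The report’s numbers can be reproduced from raw counts. The sketch below assumes the confusion implied by the report: of 4 true “Busy” hours, 3 were predicted “Busy” and 1 “Slow”, and the single “Slow” hour was predicted correctly. The helper name `prf` is hypothetical.

```python
def prf(tp, fp, fn):
    """Return (precision, recall, f1) from true/false positive and false negative counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Busy: 3 caught, 1 missed, none falsely flagged.
busy = prf(tp=3, fp=0, fn=1)
# Slow: 1 caught, none missed, 1 Busy hour wrongly called Slow.
slow = prf(tp=1, fp=1, fn=0)
accuracy = (3 + 1) / 5

print([round(v, 2) for v in busy])  # [1.0, 0.75, 0.86]
print([round(v, 2) for v in slow])  # [0.5, 1.0, 0.67]
print(accuracy)                     # 0.8
```

With only one “Slow” row in the test set, a single different prediction would swing these figures a lot.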
Thursday 9 AM
With counts:
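# `stack` was reassigned to the classifier above, so refit the regression stack before predicting sales.
reg_stack = StackingRegressor(
    estimators=[
        ("rf", RandomForestRegressor(n_estimators=20, max_depth=3, random_state=42)),
        ("gb", GradientBoostingRegressor(n_estimators=20, max_depth=2, random_state=42)),
    ],
    final_estimator=LinearRegression(),
)
# Same training rows as before; the target is Sales, not Label.
reg_stack.fit(X_train, data_full.loc[X_train.index, "Sales"])

new_data = pd.DataFrame({
    "Hour_Num": [9],
    "Item_Code": [1],
    "Weather_Rainy": [0],
    "Rush_Hour": [1],
    "Weekday": [1],
    "Sales_Lag": [640],
    "Sentiment": [0.6],
    "Customer_Count": [20]
}, columns=X.columns)
pred = reg_stack.predict(new_data)
print("Thursday 9 AM Sales:", pred[0])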
Output: 642—“Busy,” 39 samosas. Counts confirm rush. Day 37: Data Odyssey predicts this.
Why Computer Vision?
- Validation: 20 customers at 9 AM—“Busy” holds.
- Stock: ₹642, 39 samosas—staff for crowds.
- Scale: 35 rows (Day 12)—real cameras, more counts.
Enhances ₹632.5 (Day 25), NLP (Day 36)—visual insights. Day 37: Data Odyssey sees this.
Real-World Vision
India’s traffic cams count vehicles—jams predicted. Amazon tracks warehouse flow—stock optimized. Priya’s vision is her café’s eyes—small, sharp. Day 37: Data Odyssey mirrors this.
Challenges
- No Images: Simulated counts—real cameras needed.
- Small Data: 13 rows, so any vision-derived signal is noisy.
- Cost: Cameras—viable for Priya?
More data—Priya scales. Day 37: Data Odyssey flags this.
Why This Matters
Vision counts 20 customers at 9 AM, which confirms Priya’s rush: ₹642 in sales, 39 samosas, and staffing to match. Without it, the cause of the ₹150 hour at 7 AM is a guess; with it, she can see whether the café was simply empty and stock accordingly. Scale it up and the same idea tracks crowds across India. Day 37: Data Odyssey sees her.
Recap Summary
Yesterday, Day 36: Data Odyssey used NLP: MAE ₹3.7, and the samosas shine. Today, Day 37: Data Odyssey added simulated vision counts: MAE ₹3.6, a ₹642 forecast, and 20 customers at 9 AM. It’s her “see” step.
What’s Next
Tomorrow, in Day 38: Data Odyssey – What is Reinforcement Learning?, we’ll act: Can Priya optimize stock dynamically? Learn from sales? We’ll explore reinforcement learning, adapting her café. Bring your curiosity, and I’ll see you there!










