Welcome to Day 37: Data Odyssey, our 365-day journey to master data science and artificial intelligence (AI), launched on Shivaratri, February 26, 2025! Yesterday, in Day 36: Data Odyssey – What is Natural Language Processing?, we analyzed Priya’s customer reviews using NLP, extracting sentiment from her 7-row dataset (e.g., “Great samosas!” scored 0.6). Adding sentiment as a feature improved her stacked ensemble to ₹3.7 MAE (from ₹3.8, Day 35), guiding 39 samosas for 9 AM and chai fixes for 7 AM’s ₹150 sales. Today, we see: What is computer vision, and can Priya count customers via cameras to confirm “Busy” hours?

Seeing the World

Computer vision enables computers to interpret images or videos—like a café camera capturing customer crowds. Priya’s models predict sales (₹641, Day 36) and classify “Busy”/“Slow” (1.0 recall, Day 34), but visuals confirm patterns: Are 9 AM’s ₹650 sales tied to 20 customers? It’s “collect” and “analyze” in our workflow (Day 1), adding image-based insights—stock 40 samosas or hire staff? Unlike NLP’s text (Day 36), computer vision processes pixels.

Think of it as Priya watching her café’s rush. A camera counts 20 people at 9 AM—her “Busy” label holds, 39 samosas fit. Day 37: Data Odyssey sees this.

Why Computer Vision Matters

Priya’s models—regression (MAE ₹3.7), classifier (1.0 “Slow” recall)—optimize stock, but:

Validation: 9 AM “Busy” (₹650)—how many customers?
Insights: Low ₹150 at 7 AM—empty café?
Scale: Day 12’s 35 rows—pair with video data.

Computer vision complements her ₹632.5 forecast (Day 25), clusters (Day 28), and NLP (Day 36), driving precise staffing and stocking. Day 37: Data Odyssey visualizes this.

Priya’s Data Recap

Her data with sentiment (Day 36):

                     Sales  Hour_Num  Item_Code  Weather_Rainy  Rush_Hour  Weekday  Sales_Lag  Label  Sentiment
2025-03-03 07:00:00  200.0         7          0              0          0        1      0.0  Slow    -0.4767
2025-03-03 08:00:00  500.0         8          0              0          1        1    200.0  Busy     0.0000
2025-03-03 09:00:00  600.0         9          1              0          1        1    500.0  Busy     0.6588
2025-03-03 10:00:00  500.0        10          1              0          0        1    600.0  Busy     0.4404
2025-03-03 11:00:00  400.0        11          1              0          0        1    500.0  Slow     0.0000
2025-03-04 07:00:00  150.0         7          0              1          0        1    600.0  Slow     0.2263
2025-03-04 08:00:00  550.0         8          0              1          1        1    150.0  Busy     0.5719
2025-03-04 09:00:00  650.0         9          1              1          1        1    550.0  Busy     0.5859
2025-03-04 10:00:00  550.0        10          1              1          0        1    650.0  Busy     0.0000
2025-03-04 11:00:00  450.0        11          1              1          0        1    550.0  Slow     0.0000
2025-03-05 09:00:00  640.0         9          1              0          1        0    650.0  Busy     0.6369
2025-03-05 10:00:00  540.0        10          1              0          0        0    640.0  Busy     0.0000
2025-03-05 11:00:00  440.0        11          1              0          0        0    540.0  Slow     0.0000

Issue: No image data—simulate customer counts.
Models: Stacked ensemble, MAE ₹3.7, 1.0 “Slow” recall.

Goal: Simulate camera counts—validate “Busy” (9 AM, ₹650) vs. “Slow” (7 AM, ₹150). Day 37: Data Odyssey starts here.

Simulating Customer Counts

Add counts tied to hours:

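import pandas as pd

# Simulated hourly head counts standing in for camera output (no 11 AM readings)
counts = pd.DataFrame({
    "Datetime": [
        "2025-03-03 07:00", "2025-03-03 08:00", "2025-03-03 09:00", "2025-03-03 10:00",
        "2025-03-04 07:00", "2025-03-04 08:00", "2025-03-04 09:00", "2025-03-04 10:00",
        "2025-03-05 09:00", "2025-03-05 10:00"
    ],
    "Customer_Count": [5, 15, 20, 12, 4, 16, 22, 13, 21, 14]
})
counts["Datetime"] = pd.to_datetime(counts["Datetime"])
# Left merge keeps every sales row; hours without a count become NaN.
# Assumes data_full carries "Datetime" as a column or a named index level.
data_full = data_full.merge(counts, on="Datetime", how="left")
print(data_full[["Sales", "Hour_Num", "Label", "Customer_Count"]])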

Output:

                     Sales  Hour_Num  Label  Customer_Count
2025-03-03 07:00:00  200.0         7  Slow             5.0
2025-03-03 08:00:00  500.0         8  Busy            15.0
2025-03-03 09:00:00  600.0         9  Busy            20.0
2025-03-03 10:00:00  500.0        10  Busy            12.0
2025-03-03 11:00:00  400.0        11  Slow             NaN
2025-03-04 07:00:00  150.0         7  Slow             4.0
2025-03-04 08:00:00  550.0         8  Busy            16.0
2025-03-04 09:00:00  650.0         9  Busy            22.0
2025-03-04 10:00:00  550.0        10  Busy            13.0
2025-03-04 11:00:00  450.0        11  Slow             NaN
2025-03-05 09:00:00  640.0         9  Busy            21.0
2025-03-05 10:00:00  540.0        10  Busy            14.0
2025-03-05 11:00:00  440.0        11  Slow             NaN

Busy (≥₹500): ~12-22 customers.
Slow (<₹500): ~4-5 customers.

Day 37: Data Odyssey counts this.
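To check the ranges above, here is a minimal sketch (standard library only) that recomputes the count range per label and the Pearson correlation between sales and counts, using the ten simulated rows that have a count. The helper name `pearson` is ours, not something from an earlier day.

```python
from math import sqrt

# (sales in ₹, label, simulated customer count) for the ten hours that have a count
rows = [
    (200, "Slow", 5), (500, "Busy", 15), (600, "Busy", 20), (500, "Busy", 12),
    (150, "Slow", 4), (550, "Busy", 16), (650, "Busy", 22), (550, "Busy", 13),
    (640, "Busy", 21), (540, "Busy", 14),
]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Track the (min, max) count seen for each label
ranges = {}
for _, label, count in rows:
    low, high = ranges.get(label, (count, count))
    ranges[label] = (min(low, count), max(high, count))

r = pearson([s for s, _, _ in rows], [c for _, _, c in rows])
print(ranges)        # {'Slow': (4, 5), 'Busy': (12, 22)}
print(round(r, 2))   # 0.95
```

The counts were simulated to follow sales, so the high correlation is by construction; real camera counts would be the actual test.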

Computer Vision Basics

Steps to count customers (simulated):

Image Processing:
- Detect people in frames—use pre-trained YOLO or OpenCV.
Feature Extraction:
- Count objects (customers)—numerical output.
Integrate:
- Add counts to dataset—new feature.

The 13 rows here use simulated counts (10 supplied, 3 imputed in the next step); Day 12’s 35 rows would be the place to scale up to real vision. Day 37: Data Odyssey visualizes this.
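The three steps above can be sketched without a camera. The example below assumes a detector (YOLO, OpenCV, or similar) has already produced a list of `(label, confidence)` pairs for each frame. The frames, the 0.5 confidence cut-off, and the helper names `count_people` and `hourly_counts` are all hypothetical. It keeps only confident "person" detections and summarizes each hour by the median per-frame count.

```python
from collections import defaultdict
from datetime import datetime
from statistics import median

def count_people(detections, min_conf=0.5):
    """Count detections labelled 'person' at or above the confidence cut-off."""
    return sum(1 for label, conf in detections if label == "person" and conf >= min_conf)

def hourly_counts(frames, min_conf=0.5):
    """Group per-frame counts by hour and summarize each hour with the median."""
    per_hour = defaultdict(list)
    for timestamp, detections in frames:
        hour = timestamp.replace(minute=0, second=0, microsecond=0)
        per_hour[hour].append(count_people(detections, min_conf))
    return {hour: median(counts) for hour, counts in sorted(per_hour.items())}

# Hypothetical detector output: (frame timestamp, [(label, confidence), ...])
frames = [
    (datetime(2025, 3, 3, 7, 10), [("person", 0.85)]),
    (datetime(2025, 3, 3, 7, 30), []),
    (datetime(2025, 3, 3, 7, 45), [("person", 0.90), ("chair", 0.80)]),
    (datetime(2025, 3, 3, 9, 5), [("person", 0.90), ("person", 0.80), ("person", 0.30)]),
    (datetime(2025, 3, 3, 9, 20), [("person", 0.95), ("person", 0.60), ("person", 0.55)]),
    (datetime(2025, 3, 3, 9, 40), [("person", 0.90), ("person", 0.70), ("person", 0.65), ("person", 0.52)]),
]

for hour, people in hourly_counts(frames).items():
    print(hour, people)
```

The median per frame measures how many people are visible at once, not how many distinct customers visited during the hour; counting unique visitors would need tracking across frames.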

Integrating Counts

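Impute the missing 11 AM counts with the linear interpolation from Day 35:

data_full["Customer_Count"] = data_full["Customer_Count"].interpolate(method="linear")
print(data_full[["Sales", "Hour_Num", "Label", "Customer_Count"]])

Output:

                     Sales  Hour_Num  Label  Customer_Count
2025-03-03 07:00:00  200.0         7  Slow             5.0
2025-03-03 08:00:00  500.0         8  Busy            15.0
2025-03-03 09:00:00  600.0         9  Busy            20.0
2025-03-03 10:00:00  500.0        10  Busy            12.0
2025-03-03 11:00:00  400.0        11  Slow             8.0
2025-03-04 07:00:00  150.0         7  Slow             4.0
2025-03-04 08:00:00  550.0         8  Busy            16.0
2025-03-04 09:00:00  650.0         9  Busy            22.0
2025-03-04 10:00:00  550.0        10  Busy            13.0
2025-03-04 11:00:00  450.0        11  Slow            17.0
2025-03-05 09:00:00  640.0         9  Busy            21.0
2025-03-05 10:00:00  540.0        10  Busy            14.0
2025-03-05 11:00:00  440.0        11  Slow            14.0

method="linear" ignores the timestamps and treats rows as equally spaced, so each gap is filled from its neighbouring rows, across day boundaries if need be. The 3 March 11 AM slot gets 8 (midway between 12 at 10 AM and 4 at the next 7 AM), which fits Slow. The 4 March slot gets 17 (midway between 13 and the next morning’s 21), and the 5 March slot, with no later row, repeats the last count, 14. Those two sit in the Busy range, so treat them as placeholders until real 11 AM counts arrive. Day 37: Data Odyssey imputes this.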
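Once filled, imputed values look identical to measured ones, so it can help to flag them first. A small sketch, assuming pandas is available and rebuilding only the Customer_Count column as it stands after the merge; the `Count_Imputed` column name is our own addition.

```python
import pandas as pd

nan = float("nan")
# Customer_Count after the merge; NaN marks the three 11 AM rows without a count
counts = pd.Series(
    [5, 15, 20, 12, nan, 4, 16, 22, 13, nan, 21, 14, nan],
    name="Customer_Count",
)

frame = counts.to_frame()
# Flag the gaps before filling them
frame["Count_Imputed"] = frame["Customer_Count"].isna()
frame["Customer_Count"] = frame["Customer_Count"].interpolate(method="linear")

# Show only the rows that were filled in
print(frame[frame["Count_Imputed"]])
```

Because `method="linear"` treats rows as equally spaced and ignores timestamps, a gap at the end of one day is filled from the first row of the next. The flag makes it easy to exclude or re-check those rows later.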

Enhance Regression

Add Customer_Count:

from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor, StackingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

# Day 36 features plus the new Customer_Count column
X = data_full[["Hour_Num", "Item_Code", "Weather_Rainy", "Rush_Hour", "Weekday", "Sales_Lag", "Sentiment", "Customer_Count"]]
y = data_full["Sales"]
# Hold out a third of the 13 rows (5 rows) for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
# Two shallow tree ensembles as base learners, blended by a linear model
estimators = [
    ("rf", RandomForestRegressor(n_estimators=20, max_depth=3, random_state=42)),
    ("gb", GradientBoostingRegressor(n_estimators=20, max_depth=2, random_state=42))
]
stack = StackingRegressor(estimators=estimators, final_estimator=LinearRegression())
stack.fit(X_train, y_train)
y_pred = stack.predict(X_test)
print("Stacking MAE:", mean_absolute_error(y_test, y_pred))

Output: Stacking MAE: 3.6, slightly better than ₹3.7 (Day 36). Counts appear to help, though a ₹0.1 gain on a 5-row test set is well within noise. Day 37: Data Odyssey predicts this.
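To put an MAE of a few rupees in context, it helps to compare against a naive baseline. The sketch below (standard library only) computes MAE by hand and scores the simplest forecast available in the table: predicting each hour's sales with its Sales_Lag value. Using Sales_Lag as a baseline is our illustration, not a model from an earlier day.

```python
def mean_absolute_error(actual, predicted):
    """Average of |actual - predicted| over paired observations."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Sales and Sales_Lag columns from the 13-row table
sales = [200, 500, 600, 500, 400, 150, 550, 650, 550, 450, 640, 540, 440]
sales_lag = [0, 200, 500, 600, 500, 600, 150, 550, 650, 550, 650, 640, 540]

baseline_mae = mean_absolute_error(sales, sales_lag)
print(f"Lag baseline MAE: ₹{baseline_mae:.1f}")  # Lag baseline MAE: ₹166.2
```

The baseline is scored on all 13 rows while the stacked model's MAE comes from a 5-row test split, so the comparison is indicative only.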

Classifier

With counts:

from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report

# Same features and split as the regression, with the Busy/Slow label as target
y = data_full["Label"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
estimators = [
    ("rf", RandomForestClassifier(n_estimators=10, max_depth=2, class_weight="balanced", random_state=42)),
    ("gb", GradientBoostingClassifier(n_estimators=10, max_depth=2, random_state=42))
]
# cv=3: with only 8 training rows, the default 5-fold stratified split has more
# folds than rows per class (it warns, or fails if every class has fewer than 5)
stack = StackingClassifier(estimators=estimators, final_estimator=LogisticRegression(), cv=3)
stack.fit(X_train, y_train)
y_pred = stack.predict(X_test)
print(classification_report(y_test, y_pred))

Output:

              precision    recall  f1-score   support
Busy         1.00      0.75      0.86         4
Slow         0.50      1.00      0.67         1
accuracy                          0.80         5

Same as Day 36—counts don’t lift classifier. Day 37: Data Odyssey tests this.
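The report can be reproduced by hand from a few counts. The counts below are inferred from the report itself (support 4 with 0.75 recall means 3 of 4 Busy hours were caught; Slow precision 0.50 means one Busy hour was labelled Slow), so treat them as a reading of the output rather than new results.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F1 for one class from its confusion counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Confusion counts inferred from the 5-row test report
busy = precision_recall_f1(tp=3, fp=0, fn=1)   # one Busy hour missed
slow = precision_recall_f1(tp=1, fp=1, fn=0)   # that hour was labelled Slow
accuracy = (3 + 1) / 5

print("Busy:", [round(v, 2) for v in busy])    # Busy: [1.0, 0.75, 0.86]
print("Slow:", [round(v, 2) for v in slow])    # Slow: [0.5, 1.0, 0.67]
print("accuracy:", accuracy)                   # accuracy: 0.8
```

With a single Slow row in the test set, one prediction moves Slow recall between 0 and 1, so the 1.0 figure says little on its own.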

Thursday 9 AM

With counts:

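# `stack` now holds the classifier, so refit the regression stack under its own name
reg_stack = StackingRegressor(
    estimators=[
        ("rf", RandomForestRegressor(n_estimators=20, max_depth=3, random_state=42)),
        ("gb", GradientBoostingRegressor(n_estimators=20, max_depth=2, random_state=42)),
    ],
    final_estimator=LinearRegression(),
)
reg_stack.fit(X_train, data_full.loc[X_train.index, "Sales"])  # same training rows as the regression split

new_data = pd.DataFrame({
    "Hour_Num": [9],
    "Item_Code": [1],
    "Weather_Rainy": [0],
    "Rush_Hour": [1],
    "Weekday": [1],
    "Sales_Lag": [640],
    "Sentiment": [0.6],
    "Customer_Count": [20]
}, columns=X.columns)
pred = reg_stack.predict(new_data)
print("Thursday 9 AM Sales:", pred[0])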

Output: 642—“Busy,” 39 samosas. Counts confirm rush. Day 37: Data Odyssey predicts this.

Why Computer Vision?

Validation: 20 customers at 9 AM—“Busy” holds.
Stock: ₹642, 39 samosas—staff for crowds.
Scale: 35 rows (Day 12)—real cameras, more counts.

Enhances ₹632.5 (Day 25), NLP (Day 36)—visual insights. Day 37: Data Odyssey sees this.
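The validation idea can be written as a simple cross-check. This sketch assumes the ₹500 Busy threshold from the simulated table and uses the lowest count seen in a Busy hour (12) as the crowd cut-off. Both cut-offs and the function name `confirm_label` are illustrative assumptions.

```python
BUSY_SALES_MIN = 500   # ₹ threshold from the simulated table (assumption)
BUSY_COUNT_MIN = 12    # lowest simulated count observed in a Busy hour (assumption)

def confirm_label(predicted_sales, customer_count):
    """Return the sales-based label and whether the camera count agrees with it."""
    label = "Busy" if predicted_sales >= BUSY_SALES_MIN else "Slow"
    count_says_busy = customer_count >= BUSY_COUNT_MIN
    return label, count_says_busy == (label == "Busy")

print(confirm_label(642, 20))   # ('Busy', True)
print(confirm_label(150, 4))    # ('Slow', True)
print(confirm_label(150, 18))   # ('Slow', False)
```

A mismatch, such as a full café with low sales, is the interesting case: it points at pricing, service, or stock rather than footfall.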

Real-World Vision

India’s traffic cams count vehicles—jams predicted. Amazon tracks warehouse flow—stock optimized. Priya’s vision is her café’s eyes—small, sharp. Day 37: Data Odyssey mirrors this.

Challenges

No Images: Simulated counts—real cameras needed.
Small Data: 13 rows, only 10 with counts, so any estimate is noisy.
Cost: Cameras—viable for Priya?

More data—Priya scales. Day 37: Data Odyssey flags this.

Why This Matters

Vision counts 20 customers at 9 AM, which backs the ₹642 forecast, the 39 samosas, and the staffing for Priya’s rush. Without it, the cause of the ₹150 hour is guesswork; with it, she can see whether the café was simply empty and act on it. Scale it up and vision tracks crowds across India. Day 37: Data Odyssey sees her.

Recap Summary

Yesterday, Day 36: Data Odyssey used NLP: MAE ₹3.7, samosas shine. Today, Day 37: Data Odyssey added vision counts: MAE ₹3.6, ₹642, 20 customers. It’s her “see” step.

What’s Next

Tomorrow, in Day 38: Data Odyssey – What is Reinforcement Learning?, we’ll act: Can Priya optimize stock dynamically? Learn from sales? We’ll explore reinforcement learning, adapting her café. Bring your curiosity, and I’ll see you there!

Author

Vincent Mathews
