
Day 37: Data Odyssey – What is Computer Vision?

Welcome to Day 37: Data Odyssey, our 365-day journey to master data science and artificial intelligence (AI), launched on Shivaratri, February 26, 2025! Yesterday, in Day 36: Data Odyssey – What is Natural Language Processing?, we analyzed Priya’s customer reviews using NLP, extracting sentiment from her 7-row dataset (e.g., “Great samosas!” scored 0.6). Adding sentiment as a feature improved her stacked ensemble to ₹3.7 MAE (from ₹3.8, Day 35), guiding 39 samosas for 9 AM and chai fixes for 7 AM’s ₹150 sales. Today, we see: What is computer vision, and can Priya count customers via cameras to confirm “Busy” hours?

Seeing the World

Computer vision enables computers to interpret images or videos, such as a café camera capturing customer crowds. Priya’s models predict sales (₹641, Day 36) and classify hours as “Busy” or “Slow” (1.0 recall, Day 34), but visuals can confirm the patterns behind those numbers: are 9 AM’s ₹650 sales tied to 20 customers? In our workflow (Day 1), this sits in the “collect” and “analyze” steps, adding image-based insights that inform decisions such as whether to stock 40 samosas or hire staff. Unlike NLP, which works on text (Day 36), computer vision works on pixels.

Think of it as Priya watching her café’s rush. A camera counts 20 people at 9 AM—her “Busy” label holds, 39 samosas fit. Day 37: Data Odyssey sees this.
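
To make “works on pixels” concrete, here is a toy sketch, not a real detector. A grayscale frame is just a grid of brightness values from 0 to 255, and even a crude rule such as counting pixels above a threshold turns an image into a number. The 4×4 frame, the `count_bright_pixels` helper, and the threshold of 128 are all made up for illustration.

```python
# A toy 4x4 grayscale "frame": each number is one pixel's brightness (0 = black, 255 = white).
# The values and the threshold are hypothetical, chosen only to illustrate the idea.
frame = [
    [10, 12, 200, 210],
    [11, 13, 205, 220],
    [9, 10, 14, 12],
    [8, 180, 11, 10],
]


def count_bright_pixels(image, threshold=128):
    """Count the pixels whose brightness is above the threshold."""
    return sum(1 for row in image for pixel in row if pixel > threshold)


bright = count_bright_pixels(frame)
print("Bright pixels:", bright, "of", len(frame) * len(frame[0]))
```

Real people detectors learn far richer patterns than a brightness threshold, but the principle is the same: pixels in, numbers out.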

Why Computer Vision Matters

Priya’s models, the regression (MAE ₹3.7) and the classifier (1.0 “Slow” recall), optimize stock, but they leave questions open:

  • Validation: 9 AM is labeled “Busy” (₹650), but how many customers does that mean?
  • Insights: Sales drop to ₹150 at 7 AM. Is the café empty?
  • Scale: Day 12’s 35 rows could be paired with video data.

Computer vision complements her ₹632.5 forecast (Day 25), clusters (Day 28), and NLP (Day 36), driving precise staffing and stocking. Day 37: Data Odyssey visualizes this.

Priya’s Data Recap

Her data with sentiment (Day 36):

                     Sales  Hour_Num  Item_Code  Weather_Rainy  Rush_Hour  Weekday  Sales_Lag  Label  Sentiment
2025-03-03 07:00:00  200.0         7          0              0          0        1      0.0  Slow    -0.4767
2025-03-03 08:00:00  500.0         8          0              0          1        1    200.0  Busy     0.0000
2025-03-03 09:00:00  600.0         9          1              0          1        1    500.0  Busy     0.6588
2025-03-03 10:00:00  500.0        10          1              0          0        1    600.0  Busy     0.4404
2025-03-03 11:00:00  400.0        11          1              0          0        1    500.0  Slow     0.0000
2025-03-04 07:00:00  150.0         7          0              1          0        1    600.0  Slow     0.2263
2025-03-04 08:00:00  550.0         8          0              1          1        1    150.0  Busy     0.5719
2025-03-04 09:00:00  650.0         9          1              1          1        1    550.0  Busy     0.5859
2025-03-04 10:00:00  550.0        10          1              1          0        1    650.0  Busy     0.0000
2025-03-04 11:00:00  450.0        11          1              1          0        1    550.0  Slow     0.0000
2025-03-05 09:00:00  640.0         9          1              0          1        0    650.0  Busy     0.6369
2025-03-05 10:00:00  540.0        10          1              0          0        0    640.0  Busy     0.0000
2025-03-05 11:00:00  440.0        11          1              0          0        0    540.0  Slow     0.0000
  • Issue: No image data—simulate customer counts.
  • Models: Stacked ensemble, MAE ₹3.7, 1.0 “Slow” recall.

Goal: Simulate camera counts—validate “Busy” (9 AM, ₹650) vs. “Slow” (7 AM, ₹150). Day 37: Data Odyssey starts here.

Simulating Customer Counts

Add counts tied to hours:

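import pandas as pd

# Simulated hourly head counts standing in for real camera output (no 11 AM readings)
counts = pd.DataFrame({
    "Datetime": [
        "2025-03-03 07:00", "2025-03-03 08:00", "2025-03-03 09:00", "2025-03-03 10:00",
        "2025-03-04 07:00", "2025-03-04 08:00", "2025-03-04 09:00", "2025-03-04 10:00",
        "2025-03-05 09:00", "2025-03-05 10:00"
    ],
    "Customer_Count": [5, 15, 20, 12, 4, 16, 22, 13, 21, 14]
})
# Parse the strings so they match the timestamps in data_full
counts["Datetime"] = pd.to_datetime(counts["Datetime"])
# Assumes data_full (Day 36) exposes its timestamps under the name "Datetime".
# A left join keeps all 13 rows; hours without a count become NaN.
data_full = data_full.merge(counts, on="Datetime", how="left")
print(data_full[["Sales", "Hour_Num", "Label", "Customer_Count"]])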

Output:

                     Sales  Hour_Num  Label  Customer_Count
2025-03-03 07:00:00  200.0         7  Slow             5.0
2025-03-03 08:00:00  500.0         8  Busy            15.0
2025-03-03 09:00:00  600.0         9  Busy            20.0
2025-03-03 10:00:00  500.0        10  Busy            12.0
2025-03-03 11:00:00  400.0        11  Slow             NaN
2025-03-04 07:00:00  150.0         7  Slow             4.0
2025-03-04 08:00:00  550.0         8  Busy            16.0
2025-03-04 09:00:00  650.0         9  Busy            22.0
2025-03-04 10:00:00  550.0        10  Busy            13.0
2025-03-04 11:00:00  450.0        11  Slow             NaN
2025-03-05 09:00:00  640.0         9  Busy            21.0
2025-03-05 10:00:00  540.0        10  Busy            14.0
2025-03-05 11:00:00  440.0        11  Slow             NaN
  • Busy (≥₹500): ~12-22 customers.
  • Slow (<₹500): ~4-5 customers.

Day 37: Data Odyssey counts this.
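
As a quick check on those ranges, the sketch below rebuilds the ten rows that have a simulated count (values copied from the output table) and summarizes them per label. The `df`, `ranges`, and `corr` names are illustrative and not part of Priya’s pipeline.

```python
import pandas as pd

# The ten rows that have a simulated count, copied from the output table
df = pd.DataFrame({
    "Sales": [200, 500, 600, 500, 150, 550, 650, 550, 640, 540],
    "Label": ["Slow", "Busy", "Busy", "Busy", "Slow",
              "Busy", "Busy", "Busy", "Busy", "Busy"],
    "Customer_Count": [5, 15, 20, 12, 4, 16, 22, 13, 21, 14],
})

# Range of counts per label: do Busy and Slow hours overlap?
ranges = df.groupby("Label")["Customer_Count"].agg(["min", "max"])
print(ranges)

# Pearson correlation between head count and sales
corr = df["Customer_Count"].corr(df["Sales"])
print("Correlation:", round(corr, 2))
```

On these rows the two labels do not overlap and the correlation is strongly positive, but ten simulated points are a sanity check, not evidence.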

Computer Vision Basics

Steps to count customers (simulated):

  1. Image Processing:
    • Detect people in frames—use pre-trained YOLO or OpenCV.
  2. Feature Extraction:
    • Count objects (customers)—numerical output.
  3. Integrate:
    • Add counts to dataset—new feature.
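
The three steps above can be sketched without a camera. The snippet assumes a detector (for example, a pre-trained YOLO model) has already returned a list of detections for each frame. The dictionary format, the `count_people` helper, and the 0.5 confidence cutoff are hypothetical and do not reflect any library’s actual API.

```python
from statistics import median


def count_people(detections, min_confidence=0.5):
    """Step 2: turn one frame's detections into a single number."""
    return sum(
        1 for d in detections
        if d["label"] == "person" and d["confidence"] >= min_confidence
    )


# Step 1 stand-in: made-up detector output for three frames from the same hour
frames = [
    [{"label": "person", "confidence": 0.91},
     {"label": "person", "confidence": 0.40},
     {"label": "chair", "confidence": 0.88}],
    [{"label": "person", "confidence": 0.85},
     {"label": "person", "confidence": 0.77}],
    [{"label": "person", "confidence": 0.95},
     {"label": "cup", "confidence": 0.60},
     {"label": "person", "confidence": 0.66}],
]

per_frame = [count_people(f) for f in frames]

# Step 3: one value per hour; the median is less sensitive to a single bad frame
hourly_count = median(per_frame)
print(per_frame, hourly_count)
```

The hourly value is what would land in a column such as Customer_Count.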

Here, 10 of the 13 rows carry simulated counts; with real camera output, the same steps would extend to Day 12’s 35 rows. Day 37: Data Odyssey visualizes this.

Integrating Counts

Impute missing counts (11 AM, Day 35’s interpolation):

data_full["Customer_Count"] = data_full["Customer_Count"].interpolate(method="linear")
print(data_full[["Sales", "Hour_Num", "Label", "Customer_Count"]])

Output:

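                     Sales  Hour_Num  Label  Customer_Count
2025-03-03 07:00:00  200.0         7  Slow             5.0
2025-03-03 08:00:00  500.0         8  Busy            15.0
2025-03-03 09:00:00  600.0         9  Busy            20.0
2025-03-03 10:00:00  500.0        10  Busy            12.0
2025-03-03 11:00:00  400.0        11  Slow             8.0
2025-03-04 07:00:00  150.0         7  Slow             4.0
2025-03-04 08:00:00  550.0         8  Busy            16.0
2025-03-04 09:00:00  650.0         9  Busy            22.0
2025-03-04 10:00:00  550.0        10  Busy            13.0
2025-03-04 11:00:00  450.0        11  Slow            17.0
2025-03-05 09:00:00  640.0         9  Busy            21.0
2025-03-05 10:00:00  540.0        10  Busy            14.0
2025-03-05 11:00:00  440.0        11  Slow            14.0

The first 11 AM gap fills to 8 customers, midway between 12 (10 AM) and 4 (the next morning’s 7 AM), which fits Slow. The other two do not: method="linear" treats rows as evenly spaced, so 2025-03-04 11:00 lands midway between 13 and the next day’s 21, and the final row simply repeats the last observed 14. Both sit in the Busy range, so treat them as rough placeholders until real 11 AM counts exist. Day 37: Data Odyssey imputes this.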
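
How linear interpolation behaves at a day boundary can be checked in isolation. The sketch below rebuilds only the Customer_Count column and compares plain linear interpolation with a per-day variant. The `Linear` and `Per_Day` column names are illustrative, and the per-day variant is an alternative offered for comparison, not the method used above.

```python
import pandas as pd

NA = float("nan")  # missing 11 AM readings

df = pd.DataFrame({
    "Datetime": pd.to_datetime([
        "2025-03-03 07:00", "2025-03-03 08:00", "2025-03-03 09:00",
        "2025-03-03 10:00", "2025-03-03 11:00",
        "2025-03-04 07:00", "2025-03-04 08:00", "2025-03-04 09:00",
        "2025-03-04 10:00", "2025-03-04 11:00",
        "2025-03-05 09:00", "2025-03-05 10:00", "2025-03-05 11:00",
    ]),
    "Customer_Count": [5, 15, 20, 12, NA, 4, 16, 22, 13, NA, 21, 14, NA],
})

# Plain linear interpolation: rows are treated as evenly spaced, day boundaries ignored
df["Linear"] = df["Customer_Count"].interpolate(method="linear")

# Per-day variant: interpolate inside each calendar day only
df["Per_Day"] = (
    df.groupby(df["Datetime"].dt.date)["Customer_Count"]
      .transform(lambda s: s.interpolate(method="linear"))
)

eleven_am = df[df["Datetime"].dt.hour == 11]
print(eleven_am[["Datetime", "Linear", "Per_Day"]])
```

Because 11 AM is the last hour of each day, the per-day variant can only carry the 10 AM count forward. Either way the imputed values lean on neighbouring hours, which is a reason to prefer real counts once cameras exist.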

Enhance Regression

Add Customer_Count:

from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor, StackingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

X = data_full[["Hour_Num", "Item_Code", "Weather_Rainy", "Rush_Hour", "Weekday", "Sales_Lag", "Sentiment", "Customer_Count"]]
y = data_full["Sales"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
estimators = [
    ("rf", RandomForestRegressor(n_estimators=20, max_depth=3, random_state=42)),
    ("gb", GradientBoostingRegressor(n_estimators=20, max_depth=2, random_state=42))
]
stack = StackingRegressor(estimators=estimators, final_estimator=LinearRegression())
stack.fit(X_train, y_train)
y_pred = stack.predict(X_test)
print("Stacking MAE:", mean_absolute_error(y_test, y_pred))

Output: Stacking MAE: 3.6, slightly better than ₹3.7 (Day 36). Counts appear to help, though with only 5 test rows a ₹0.1 gain is well within noise. Day 37: Data Odyssey predicts this.
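
For readers who want to see what an MAE of 3.6 means, here is the calculation by hand. The `actual` and `predicted` values are hypothetical numbers picked so the arithmetic is easy to follow; they are not the model’s real test predictions, and `mae` is an illustrative helper rather than the scikit-learn function.

```python
def mae(y_true, y_pred):
    """Mean absolute error: the average size of the misses, ignoring direction."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical sales (in rupees) for five test hours
actual = [600, 150, 550, 440, 500]
predicted = [604, 147, 553, 436, 504]

# Absolute errors: 4, 3, 3, 4, 4, so the mean is 18 / 5
print("MAE:", mae(actual, predicted))
```

An MAE of ₹3.6 therefore means the forecast is off by a few rupees per hour on average, in either direction.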

Classifier

With counts:

from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report

# Same features and split as the regression, but the target is now the Busy/Slow label
y = data_full["Label"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
estimators = [
    ("rf", RandomForestClassifier(n_estimators=10, max_depth=2, class_weight="balanced", random_state=42)),
    ("gb", GradientBoostingClassifier(n_estimators=10, max_depth=2, random_state=42))
]
# cv=3: the default 5-fold stratified CV raises an error when every class
# has fewer than 5 training rows, which is likely with only 8 training rows
stack = StackingClassifier(estimators=estimators, final_estimator=LogisticRegression(), cv=3)
stack.fit(X_train, y_train)
y_pred = stack.predict(X_test)
print(classification_report(y_test, y_pred))

Output:

              precision    recall  f1-score   support
Busy         1.00      0.75      0.86         4
Slow         0.50      1.00      0.67         1
accuracy                          0.80         5

Same as Day 36: adding counts does not lift the classifier. Day 37: Data Odyssey tests this.
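
The report’s numbers can be reproduced by hand. The label lists below are a reconstruction: one assignment of five test rows that is consistent with the report (four Busy rows with one mistaken for Slow, plus one Slow row), not the model’s actual predictions. The `precision_recall` helper is illustrative.

```python
def precision_recall(y_true, y_pred, positive):
    """Precision and recall for one class, computed from raw label lists."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    return tp / (tp + fp), tp / (tp + fn)


# Reconstructed test rows: one Busy hour is misread as Slow
y_true = ["Busy", "Busy", "Busy", "Busy", "Slow"]
y_pred = ["Busy", "Busy", "Busy", "Slow", "Slow"]

busy = precision_recall(y_true, y_pred, "Busy")
slow = precision_recall(y_true, y_pred, "Slow")
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print("Busy:", busy, "Slow:", slow, "Accuracy:", accuracy)
```

With a single Slow row in the test set, one mislabeled hour moves Slow precision from 1.00 to 0.50, so these scores should be read loosely.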

Thursday 9 AM

With counts:

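# `stack` was reassigned to the classifier above, so refit the regression stack on Sales
X_train, _, y_train, _ = train_test_split(X, data_full["Sales"], test_size=0.33, random_state=42)
reg_stack = StackingRegressor(estimators=[
    ("rf", RandomForestRegressor(n_estimators=20, max_depth=3, random_state=42)),
    ("gb", GradientBoostingRegressor(n_estimators=20, max_depth=2, random_state=42)),
], final_estimator=LinearRegression()).fit(X_train, y_train)

new_data = pd.DataFrame({
    "Hour_Num": [9],
    "Item_Code": [1],
    "Weather_Rainy": [0],
    "Rush_Hour": [1],
    "Weekday": [1],
    "Sales_Lag": [640],
    "Sentiment": [0.6],
    "Customer_Count": [20]
}, columns=X.columns)
pred = reg_stack.predict(new_data)  # predicted sales in rupees
print("Thursday 9 AM Sales:", pred[0])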

Output: 642, a “Busy” hour that calls for 39 samosas. The 20-customer count is consistent with the rush. Day 37: Data Odyssey predicts this.

Why Computer Vision?

  • Validation: 20 customers at 9 AM support the “Busy” label.
  • Stock: A ₹642 forecast means 39 samosas and enough staff for the crowd.
  • Scale: With real cameras, Day 12’s 35 rows gain more counts.

Vision adds visual insight on top of the ₹632.5 forecast (Day 25) and NLP (Day 36). Day 37: Data Odyssey sees this.

Real-World Vision

India’s traffic cameras count vehicles to help anticipate jams, and Amazon tracks flow through its warehouses to optimize stock. Priya’s vision system is her café’s eyes: small but sharp. Day 37: Data Odyssey mirrors this.

Challenges

  • No Images: The counts are simulated; real cameras are needed.
  • Small Data: With only 13 rows, any vision signal is noisy.
  • Cost: Are cameras viable for Priya?

With more data, Priya can scale. Day 37: Data Odyssey flags this.

Why This Matters

Vision counts 20 customers, which confirms Priya’s rush: ₹642 in sales, 39 samosas, and the right staffing. Without it, the cause of the ₹150 hour is a guess; with it, she can see whether the café was empty and act on it, which protects her profit. At scale, the same idea tracks crowds across India. Day 37: Data Odyssey sees her.

Recap Summary

Yesterday, Day 36: Data Odyssey used NLP and reached an MAE of ₹3.7, with samosas shining in the reviews. Today, Day 37: Data Odyssey added vision counts: MAE ₹3.6, a ₹642 forecast, and 20 customers at 9 AM. It’s her “see” step.

What’s Next

Tomorrow, in Day 38: Data Odyssey – What is Reinforcement Learning?, we’ll act: Can Priya optimize stock dynamically? Learn from sales? We’ll explore reinforcement learning, adapting her café. Bring your curiosity, and I’ll see you there!
