Welcome to Day 27: Data Odyssey, our 365-day journey to master data science and artificial intelligence (AI), launched on Shivaratri, February 26, 2025! Yesterday, in Day 26: Data Odyssey – What is Anomaly Detection?, we pinpointed oddities in Priya’s 7-row time series. Tuesday’s 7 AM sales of ₹150 were flagged as an anomaly, explained by rain (Day 21’s feature), against her 9 AM average of ₹630 and forecasts like ₹632.5 (Day 25). Today, we explore patterns: What is clustering, and how can Priya group her café’s hours, such as 7 AM versus 9 AM?
Finding Natural Groups
Clustering is unsupervised machine learning (Day 14): it groups data by similarity without labels, unlike Priya’s “Busy”/“Slow” classifier (Day 19), which learned from labeled examples. It is the “analyze” step in our workflow (Day 1), finding hidden structure: Are 7 AM hours alike? Is 8-9 AM a rush cluster? Anomaly detection (Day 26) singled out one odd point (₹150); clustering instead sorts every hour into a group, and stocking strategies follow from those groups.
Think of it as Priya sorting her café’s vibe. 9 AM buzzes, 7 AM’s quiet—clustering spots these tribes naturally. Day 27: Data Odyssey groups this.
Why Clustering Matters
Priya’s models predict sales (₹642, Day 23) or label hours “Busy” (95% cross-validation score, Day 22), but they don’t group hours:
- Patterns: 8-9 AM rush vs. 7 AM lull—stock differently?
- Insights: Rainy 7 AMs cluster—adjust chai prep?
- Scale: Day 12’s 35 rows—group days, items.
Her 7 rows hint at 7 AM (₹150-200) vs. 9 AM (₹600-650)—clustering confirms. Day 27: Data Odyssey reveals this.
Priya’s Data Recap
Her time series (Day 26):
Datetime Sales Hour_Num Item_Code Weather_Rainy
2025-03-03 07:00:00 200 7 0 0
2025-03-03 08:00:00 500 8 0 0
2025-03-03 09:00:00 600 9 1 0
2025-03-04 07:00:00 150 7 0 1
2025-03-04 08:00:00 550 8 0 1
2025-03-04 09:00:00 650 9 1 1
2025-03-05 09:00:00 640 9 1 0
- Features: Sales, Hour_Num, Item_Code, Weather_Rainy.
- Patterns: 7 AM low, 8-9 AM high (Day 24).
Goal: Cluster hours—7 AM vs. 8-9 AM? Day 27: Data Odyssey starts here.
Clustering Methods
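We’ll use K-Means, a simple method that groups points by distance:
- Choose k, the number of groups (e.g., 2: low and high sales).
- Assign each point to its nearest group center, move each center to the average of its points, and repeat until the groups stop changing; this minimizes the squared distances within groups.
With only 7 rows, a small k fits best; Day 12’s 35 rows can support more groups later. Day 27: Data Odyssey picks this.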
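To see what KMeans does under the hood, here is a minimal from-scratch sketch of Lloyd’s algorithm (the classic K-Means loop) in NumPy. It assumes Priya’s 7 (Sales, Hour_Num) pairs and starts from randomly chosen rows rather than scikit-learn’s k-means++ initialization; `kmeans_lloyd` and `assign` are hypothetical helper names for illustration, not library functions.

```python
import numpy as np

# Priya's 7 hourly readings as (Sales, Hour_Num), copied from the table above
X = np.array([[200, 7], [500, 8], [600, 9], [150, 7],
              [550, 8], [650, 9], [640, 9]], dtype=float)

def assign(X, centers):
    """Assignment step: index of the nearest center (Euclidean) for every row."""
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1)

def kmeans_lloyd(X, k, n_iter=100, seed=0):
    """Hypothetical helper: plain Lloyd's algorithm with a random-row start."""
    rng = np.random.default_rng(seed)
    # Start from k distinct rows picked at random (simpler than k-means++)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        labels = assign(X, centers)
        # Update step: move each center to the mean of its members
        # (an empty cluster keeps its old center)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break  # converged: centers stopped moving
        centers = new_centers
    labels = assign(X, centers)
    # Within-cluster sum of squared distances: the quantity K-Means minimizes
    inertia = float(((X - centers[labels]) ** 2).sum())
    return labels, centers, inertia

labels, centers, inertia = kmeans_lloyd(X, k=2)
print("Labels: ", labels)
print("Centers:", centers)
print("Inertia:", inertia)
```

On this data the loop settles on the 7 AM rows versus everything else, with centers at (₹175, hour 7) and (₹588, hour 8.6). scikit-learn’s KMeans wraps the same assign/update loop with smarter initialization and multiple restarts.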
K-Means Clustering
Use Sales, Hour_Num:
import pandas as pd
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt
# Data
data = pd.DataFrame({
"Datetime": ["2025-03-03 07:00", "2025-03-03 08:00", "2025-03-03 09:00",
"2025-03-04 07:00", "2025-03-04 08:00", "2025-03-04 09:00",
"2025-03-05 09:00"],
"Sales": [200, 500, 600, 150, 550, 650, 640],
"Hour_Num": [7, 8, 9, 7, 8, 9, 9],
"Item_Code": [0, 0, 1, 0, 0, 1, 1],
"Weather_Rainy": [0, 0, 0, 1, 1, 1, 0]
})
data["Datetime"] = pd.to_datetime(data["Datetime"])
data.set_index("Datetime", inplace=True)
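# Cluster on Sales and Hour_Num (unscaled, so Sales dominates the distances)
X = data[["Sales", "Hour_Num"]]
# n_init=10 fixes the number of restarts across scikit-learn versions
kmeans = KMeans(n_clusters=2, n_init=10, random_state=42)
data["Cluster"] = kmeans.fit_predict(X)  # cluster numbers 0/1 are arbitrary labels
print(data[["Sales", "Hour_Num", "Cluster"]])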
Output:
Sales Hour_Num Cluster
2025-03-03 07:00:00 200 7 0
2025-03-03 08:00:00 500 8 1
2025-03-03 09:00:00 600 9 1
2025-03-04 07:00:00 150 7 0
2025-03-04 08:00:00 550 8 1
2025-03-04 09:00:00 650 9 1
2025-03-05 09:00:00 640 9 1
- Cluster 0: 7 AM—₹150, ₹200 (low).
- Cluster 1: 8-9 AM—₹500-650 (high).
Rush vs. quiet—nailed! Day 27: Data Odyssey groups this.
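One caveat on the output above: K-Means numbers its clusters arbitrarily, so a different random_state or scikit-learn version could swap which group is 0 and which is 1. Here is a small sketch, assuming scikit-learn’s `KMeans`, that renumbers clusters by center Sales so 0 always means the quiet group, then uses `predict` to place a new reading. The ₹180 7 AM reading is a hypothetical example, not one of Priya’s rows.

```python
import pandas as pd
from sklearn.cluster import KMeans

data = pd.DataFrame({
    "Sales": [200, 500, 600, 150, 550, 650, 640],
    "Hour_Num": [7, 8, 9, 7, 8, 9, 9],
})
X = data[["Sales", "Hour_Num"]]
kmeans = KMeans(n_clusters=2, n_init=10, random_state=42).fit(X)

# Rank clusters by their center's Sales (column 0) so 0 = quiet, 1 = rush
order = kmeans.cluster_centers_[:, 0].argsort()
rank = {int(old): new for new, old in enumerate(order)}
data["Cluster"] = [rank[int(c)] for c in kmeans.labels_]

# Hypothetical new reading (not in Priya's data): 7 AM, ₹180
new_hour = pd.DataFrame({"Sales": [180], "Hour_Num": [7]})
new_cluster = rank[int(kmeans.predict(new_hour)[0])]

print(data)
print("New 7 AM reading goes to cluster", new_cluster)
```

Ranking by center Sales makes the labels stable across runs, which matters once cluster numbers feed a stocking plan.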
Visualizing Clusters
Plot it:
plt.scatter(data["Hour_Num"], data["Sales"], c=data["Cluster"], cmap="viridis", label="Data")
plt.scatter(kmeans.cluster_centers_[:, 1], kmeans.cluster_centers_[:, 0], c="red", marker="x", s=200, label="Centers")
plt.xlabel("Hour")
plt.ylabel("Sales (₹)")
plt.title("Clusters of Priya’s Sales")
plt.legend()
plt.show()
- Purple (0): 7 AM, low sales.
- Yellow (1): 8-9 AM, high sales.
- Red X’s: Centers (₹175 at hour 7; ₹588 at hour 8.6, the average of the 8 and 9 AM rows).
Clear split—stock light at 7 AM, heavy at 8-9 AM. Day 27: Data Odyssey sees this.
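At convergence, each K-Means center is simply the mean of its cluster’s members, so a quick pandas `groupby` can double-check the red X’s. This sketch hard-codes the 2-cluster labels from the output above rather than refitting.

```python
import pandas as pd

# The 2-cluster split printed above: 7 AM rows in cluster 0, 8-9 AM rows in cluster 1
data = pd.DataFrame({
    "Sales": [200, 500, 600, 150, 550, 650, 640],
    "Hour_Num": [7, 8, 9, 7, 8, 9, 9],
    "Cluster": [0, 1, 1, 0, 1, 1, 1],
})

# Mean Sales and hour per cluster = the K-Means centers
centers = data.groupby("Cluster")[["Sales", "Hour_Num"]].mean()
print(centers)
```

The high cluster’s hour lands at 8.6 rather than 8.5 because it holds three 9 AM rows and only two 8 AM rows.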
Adding Features
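Now include Weather_Rainy and Item_Code, standardizing all four features first so Sales (hundreds of rupees) doesn’t drown out the 0/1 columns:
X = data[["Sales", "Hour_Num", "Weather_Rainy", "Item_Code"]]
X = (X - X.mean()) / X.std()  # z-score scaling: each column gets mean 0, std 1
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)  # try 3 clusters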
data["Cluster"] = kmeans.fit_predict(X)
print(data[["Sales", "Hour_Num", "Weather_Rainy", "Item_Code", "Cluster"]])
Output:
Sales Hour_Num Weather_Rainy Item_Code Cluster
2025-03-03 07:00:00 200 7 0 0 0
2025-03-03 08:00:00 500 8 0 0 1
2025-03-03 09:00:00 600 9 0 1 2
2025-03-04 07:00:00 150 7 1 0 0
2025-03-04 08:00:00 550 8 1 0 1
2025-03-04 09:00:00 650 9 1 1 2
2025-03-05 09:00:00 640 9 0 1 2
- Cluster 0: 7 AM—low, chai, mixed weather.
- Cluster 1: 8 AM—mid, chai, mixed weather.
- Cluster 2: 9 AM—high, samosa, mixed weather.
9 AM samosas stand out—stock 40! Day 27: Data Odyssey refines this.
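Reading cluster descriptions off a printed table gets harder as rows grow, so here is a sketch that summarizes each cluster in original units: average Sales and hour, plus the share of rainy hours and of Item_Code 1 rows (treated as samosas, following the descriptions above; that mapping is an assumption). It uses the same z-score scaling and scikit-learn `KMeans` with `n_init=10`; the cluster numbers it prints may differ from the 0/1/2 above, but the grouping should match.

```python
import pandas as pd
from sklearn.cluster import KMeans

data = pd.DataFrame({
    "Sales": [200, 500, 600, 150, 550, 650, 640],
    "Hour_Num": [7, 8, 9, 7, 8, 9, 9],
    "Weather_Rainy": [0, 0, 0, 1, 1, 1, 0],
    "Item_Code": [0, 0, 1, 0, 0, 1, 1],
})
features = ["Sales", "Hour_Num", "Weather_Rainy", "Item_Code"]

# Standardize every column (z-score) so Sales can't drown out the 0/1 columns
X = (data[features] - data[features].mean()) / data[features].std()
data["Cluster"] = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(X)

# Profile in original units; means of 0/1 columns are shares
# (rainy hours; Item_Code 1 rows, assumed to be samosas)
profile = data.groupby("Cluster")[features].mean().round(2)
profile["Rows"] = data.groupby("Cluster").size()
print(profile.sort_values("Sales"))
```

Sorting by Sales lines the profile up as quiet, middle, and rush clusters regardless of how K-Means numbered them.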
Evaluating Clusters
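With no labels to check against, use the silhouette score, which compares each point’s distance to its own cluster with its distance to the nearest other cluster: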
from sklearn.metrics import silhouette_score
score = silhouette_score(X, data["Cluster"])
print("Silhouette Score:", score)
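Output: Silhouette Score: about 0.33. Scores run from -1 to 1: near 1 means tight, well-separated clusters, near 0 means overlapping ones, and negative means points sit closer to another cluster than their own. At 0.33 the fit is modest: the 9 AM cluster is tight, but each 7 AM and 8 AM pair mixes a dry and a rainy hour, which stretches those clusters. Day 27: Data Odyssey scores this.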
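A single average can hide weak spots, so scikit-learn’s `silhouette_samples` gives one score per row. This sketch hard-codes the 3-cluster labels printed above and repeats the z-score scaling; each row’s score is (b - a) / max(a, b), where a is its mean distance to its own cluster and b its mean distance to the nearest other cluster.

```python
import pandas as pd
from sklearn.metrics import silhouette_samples, silhouette_score

data = pd.DataFrame({
    "Sales": [200, 500, 600, 150, 550, 650, 640],
    "Hour_Num": [7, 8, 9, 7, 8, 9, 9],
    "Weather_Rainy": [0, 0, 0, 1, 1, 1, 0],
    "Item_Code": [0, 0, 1, 0, 0, 1, 1],
})
X = (data - data.mean()) / data.std()   # same z-score scaling as above
labels = [0, 1, 2, 0, 1, 2, 2]          # the 3-cluster assignment printed above

# One silhouette value per row, then the overall mean
data["Silhouette"] = silhouette_samples(X, labels).round(2)
print(data)
print("Mean silhouette:", round(silhouette_score(X, labels), 2))
```

The two dry 9 AM rows, nearly identical, score highest; the 7 AM and 8 AM rows score lowest because each of those pairs mixes one dry and one rainy hour.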
Why Cluster?
- Stock: Cluster 0 (7 AM)—15 chais; Cluster 2 (9 AM)—40 samosas.
- Plan: 8 AM (Cluster 1)—30 chais, prep for rush.
- Grow: 35 rows (Day 12)—cluster days, items.
It complements the ₹632.5 forecast (Day 25): group first, then predict. Day 27: Data Odyssey organizes this.
Real-World Clustering
India’s retailers cluster stores—urban vs. rural stock. Amazon groups buyers—high vs. low spenders. Priya’s hours cluster—small café, big insight. Day 27: Data Odyssey aligns this.
Challenges
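- Small data: with 7 rows, 3 clusters leave just 2-3 points each, so one new row can reshape them.
- Features: Sales and Hour_Num move together and drive the grouping; balance them against Weather_Rainy and Item_Code, and standardize, since unscaled Sales (a ₹500 range) swamps Hour_Num (a 2-hour range).
- Choosing k: 2 or 3? Day 12’s 35 rows will make the choice clearer.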
More data—Priya scales. Day 27: Data Odyssey flags this.
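To make the “2 or 3?” question concrete, a common approach is to fit K-Means for several values of k and compare silhouette scores. Here is a sketch on the same standardized four features, assuming scikit-learn’s `KMeans` and `silhouette_score`. With 7 rows, silhouette is only defined for k from 2 to 6, and scores on so small a sample can swing, so treat the winner as a hint until the 35-row data arrives.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

data = pd.DataFrame({
    "Sales": [200, 500, 600, 150, 550, 650, 640],
    "Hour_Num": [7, 8, 9, 7, 8, 9, 9],
    "Weather_Rainy": [0, 0, 0, 1, 1, 1, 0],
    "Item_Code": [0, 0, 1, 0, 0, 1, 1],
})
X = (data - data.mean()) / data.std()   # same z-score scaling as in "Adding Features"

# Silhouette needs 2 <= k <= n_samples - 1, so 7 rows allow k = 2..6
scores = {}
for k in range(2, len(X)):
    labels = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(X)
    scores[k] = silhouette_score(X, labels)
    print(f"k={k}: silhouette={scores[k]:.2f}")

best_k = max(scores, key=scores.get)
print("Highest silhouette at k =", best_k)
```

The same loop runs unchanged on the 35-row data; only the upper bound on k grows with the number of rows.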
Why This Matters
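Clustering separates Priya’s 7 AM hours (15 chais) from her 9 AM hours (40 samosas), so stock matches demand and less goes to waste. Without it, she leans on a single number like the ₹632.5 forecast; with it, she targets each group, which should lift profit. Scaled up, the same idea could cluster traffic patterns and help ease India’s jams. Day 27: Data Odyssey groups her.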
Recap Summary
Yesterday, Day 26: Data Odyssey flagged ₹150—rainy 7 AM anomaly. Today, Day 27: Data Odyssey clustered—7 AM low, 8-9 AM high, 9 AM samosas distinct. It’s her pattern step.
What’s Next
Tomorrow, in Day 28: Data Odyssey – How Do We Evaluate Clustering?, we’ll assess: Are Priya’s clusters solid? How tight is 9 AM? We’ll measure her groups, refining insights. Bring your curiosity, and I’ll see you there!