Welcome to Day 26: Data Odyssey, our 365-day journey to master data science and artificial intelligence (AI), launched on Shivaratri, February 26, 2025! Yesterday, in Day 25: Data Odyssey – How Do We Forecast with Time Series?, we forecasted Priya’s Thursday 9 AM sales using her 7-row time series. Moving average gave ₹630, exponential smoothing ₹632.5, and trend extension ₹650—settling near ₹632.5 to stock 41 samosas, aligning with her Random Forest’s ₹642 (Day 23). Today, we shift focus: What is anomaly detection, and why did Tuesday’s 7 AM sales drop to ₹150?
Spotting the Unusual
Anomaly detection finds oddities in data—points that don’t fit patterns. Day 24’s time series showed Priya’s 9 AM peak (₹630 avg) and 7 AM low (₹175 avg), but Tuesday’s ₹150 at 7 AM stands out against Monday’s ₹200. It’s “analyze” in our workflow (Day 1), flagging outliers that skew forecasts (Day 25) or signal issues—rain, a late start?
Think of it as Priya checking her café’s pulse. Most hours hum—₹500-650 at 9 AM—but ₹150 jars. Is it noise or a clue? Day 26: Data Odyssey hunts this.
Why Anomaly Detection Matters
Priya’s forecasts (₹632.5) and models (₹642, MAE ₹4) assume consistency. Anomalies like ₹150:
- Skew: Pull averages down—7 AM ₹175 vs. ₹200 expected.
- Signal: Rainy Tuesday (Day 21’s feature)—fewer chai sales?
- Fix: Clean data or adjust stock—15 chais, not 20.
Her 7 rows hide few oddities—Day 12’s 35 rows reveal more. Day 26: Data Odyssey spots this.
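To make the skew concrete, here is a minimal sketch that uses only the seven sales values from this post. It compares averages with and without Tuesday's ₹150. Setting the reading aside is just one illustrative option, not a recommendation.

```python
from statistics import mean

# Priya's 7 sales readings (the same values as the table in this post)
all_sales = [200, 500, 600, 150, 550, 650, 640]
seven_am = [200, 150]  # Monday and Tuesday at 7 AM

# Averages with the ₹150 reading included
avg_all = mean(all_sales)
avg_7am = mean(seven_am)

# Averages after setting the ₹150 reading aside
avg_all_clean = mean([x for x in all_sales if x != 150])
avg_7am_clean = mean([x for x in seven_am if x != 150])

print(f"All hours: {avg_all:.1f} with, {avg_all_clean:.1f} without")
print(f"7 AM only: {avg_7am:.1f} with, {avg_7am_clean:.1f} without")
```

One low reading moves the 7 AM average from ₹200 to ₹175, which is the gap the "Skew" bullet describes.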
Priya’s Time Series Recap
Her data (Day 24):
Sales
2025-03-03 07:00:00 200
2025-03-03 08:00:00 500
2025-03-03 09:00:00 600
2025-03-04 07:00:00 150
2025-03-04 08:00:00 550
2025-03-04 09:00:00 650
2025-03-05 09:00:00 640
- Pattern: 7 AM low (₹150-200), 8-9 AM high (₹500-650).
- Oddity: Tuesday 7 AM ₹150—below Monday’s ₹200, hourly avg ₹175.
Goal: Flag ₹150—why? Day 26: Data Odyssey starts here.
Anomaly Detection Methods
Simple tricks for 7 rows:
- Threshold: Mean ± standard deviation; anything outside is odd.
- Rolling Statistics: Compare to a moving average; big deviations get flagged.
- Isolation Forest: ML isolates outliers; scalable later.
Her sparse data suits basics—35 rows (Day 12) unlock ML. Day 26: Data Odyssey tries these.
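For readers curious about the third option, the following is a rough sketch of how an Isolation Forest could be called, assuming scikit-learn is installed. Using hour and sales as the two features is an assumption made for illustration. With only 7 rows the labels are not reliable, so no output is claimed here.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Hour of day and sales for Priya's 7 rows (this feature choice is an assumption)
hours = np.array([7, 8, 9, 7, 8, 9, 9], dtype=float)
sales = np.array([200, 500, 600, 150, 550, 650, 640], dtype=float)
X = np.column_stack([hours, sales])

# random_state fixes the random splits so reruns give the same result
model = IsolationForest(n_estimators=100, contamination="auto", random_state=0)
labels = model.fit_predict(X)    # -1 = outlier, 1 = inlier
scores = model.score_samples(X)  # lower score = easier to isolate

for h, s, label, score in zip(hours, sales, labels, scores):
    print(f"{int(h)} AM  ₹{int(s)}  label={int(label):+d}  score={score:.3f}")
```

The model becomes more informative once there are enough rows for the random splits to separate typical hours from unusual ones.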
Threshold Method
Mean and std for all sales:
import pandas as pd
data = pd.DataFrame({
    "Datetime": ["2025-03-03 07:00", "2025-03-03 08:00", "2025-03-03 09:00",
                 "2025-03-04 07:00", "2025-03-04 08:00", "2025-03-04 09:00",
                 "2025-03-05 09:00"],
    "Sales": [200, 500, 600, 150, 550, 650, 640]
})
data["Datetime"] = pd.to_datetime(data["Datetime"])
data.set_index("Datetime", inplace=True)
mean = data["Sales"].mean() # 470
std = data["Sales"].std() # ~208.5 (sample std)
lower = mean - 2 * std # ~53
upper = mean + 2 * std # ~887
data["Anomaly"] = (data["Sales"] < lower) | (data["Sales"] > upper)
print(data[["Sales", "Anomaly"]])
Output:
Sales Anomaly
2025-03-03 07:00:00 200 False
2025-03-03 08:00:00 500 False
2025-03-03 09:00:00 600 False
2025-03-04 07:00:00 150 False
2025-03-04 08:00:00 550 False
2025-03-04 09:00:00 650 False
2025-03-05 09:00:00 640 False
₹150 sits within 53-887, so no flag. The band is too broad because it mixes 7 AM lows with 9 AM highs. Day 26: Data Odyssey adjusts this.
Hourly Threshold
Group by hour:
# Name the hour column first so the grouped table's index is labeled "Hour"
data["Hour"] = data.index.hour
hourly = data.groupby("Hour")["Sales"].agg(["mean", "std"])
hourly["Lower"] = hourly["mean"] - 2 * hourly["std"]
hourly["Upper"] = hourly["mean"] + 2 * hourly["std"]
print(hourly)
# Attach each hour's band to its rows
data = data.merge(hourly[["Lower", "Upper"]], left_on="Hour", right_index=True)
data["Anomaly"] = (data["Sales"] < data["Lower"]) | (data["Sales"] > data["Upper"])
print(data[["Sales", "Anomaly"]])
Output:
mean std Lower Upper
Hour
7 175.0 35.355339 104.3 245.7
8 525.0 35.355339 454.3 595.7
9 630.0 26.457513 577.1 682.9
Sales Anomaly
2025-03-03 07:00:00 200 False
2025-03-03 08:00:00 500 False
2025-03-03 09:00:00 600 False
2025-03-04 07:00:00 150 False
2025-03-04 08:00:00 550 False
2025-03-04 09:00:00 650 False
2025-03-05 09:00:00 640 False
- 7 AM: ₹150 vs. 104.3-245.7: okay.
- 9 AM: ₹600 and ₹650 sit inside 577.1-682.9: okay too.
Nothing is flagged. With only two or three readings per hour, each reading helps set its own band, so no value can land more than about 1.15 standard deviations from its group mean, and a 2*std band cannot fire. Day 26: Data Odyssey narrows this.
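To see how far each reading sits from its own hour's average, a per-row z-score can be computed directly. This is a small sketch on the same seven rows. It uses `groupby(...).transform`, which is one of several ways to do this, and a flag under the 2*std rule corresponds to a z-score beyond ±2.

```python
import pandas as pd

sales = pd.Series(
    [200, 500, 600, 150, 550, 650, 640],
    index=pd.to_datetime([
        "2025-03-03 07:00", "2025-03-03 08:00", "2025-03-03 09:00",
        "2025-03-04 07:00", "2025-03-04 08:00", "2025-03-04 09:00",
        "2025-03-05 09:00",
    ]),
    name="Sales",
)

# Group rows by hour of day, then broadcast each group's mean and
# sample std back onto the original rows
by_hour = sales.groupby(sales.index.hour)
z = (sales - by_hour.transform("mean")) / by_hour.transform("std")

print(z.round(3))
print("Largest |z|:", round(z.abs().max(), 3))
```

On these rows the largest |z| comes out near 1.13 (Monday's ₹600 at 9 AM), and ₹150 sits at about -0.71, well short of 2.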
Rolling Statistics
Moving average deviation:
rolling_mean = data["Sales"].rolling(window=3, min_periods=1).mean()
rolling_std = data["Sales"].rolling(window=3, min_periods=1).std()
data["Lower"] = rolling_mean - 2 * rolling_std
data["Upper"] = rolling_mean + 2 * rolling_std
data["Anomaly"] = (data["Sales"] < data["Lower"]) | (data["Sales"] > data["Upper"])
print(data[["Sales", "Anomaly"]])
Output:
Sales Anomaly
2025-03-03 07:00:00 200 False
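2025-03-03 08:00:00 500 False
2025-03-03 09:00:00 600 False
2025-03-04 07:00:00 150 False
2025-03-04 08:00:00 550 False
2025-03-04 09:00:00 650 False
2025-03-05 09:00:00 640 False
- ₹500: Jumps from ₹200, but its window (200, 500) gives 350 ± 424, so it stays inside.
- ₹150: Drops from ₹600, but its window (500, 600, 150) gives about 417 ± 473, so it stays inside too.
No flags here either. Each 3-row window mixes 7 AM lows with 9 AM highs, which inflates the std, and every reading sits inside its own window. ₹150 is still the lowest value in the table and 25% below Monday’s 7 AM, so it deserves a closer look. Day 26: Data Odyssey rolls this.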
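A complementary check that compares like with like is to measure each reading against the previous day's reading at the same hour. The sketch below is an illustration under stated assumptions, not a method from earlier days in this series. The 20% cutoff (`max_change`) is a hypothetical value picked for the example.

```python
import pandas as pd

sales = pd.Series(
    [200, 500, 600, 150, 550, 650, 640],
    index=pd.to_datetime([
        "2025-03-03 07:00", "2025-03-03 08:00", "2025-03-03 09:00",
        "2025-03-04 07:00", "2025-03-04 08:00", "2025-03-04 09:00",
        "2025-03-05 09:00",
    ]),
    name="Sales",
)

max_change = 0.20  # hypothetical cutoff: flag moves larger than 20%

# Previous reading at the same hour (NaN for each hour's first day)
previous = sales.groupby(sales.index.hour).shift(1)
change = (sales - previous) / previous

flags = change.abs() > max_change  # NaN compares as False, so first days pass
print(pd.DataFrame({"Sales": sales, "Change": change.round(3), "Flag": flags}))
```

Only Tuesday's 7 AM row is flagged: ₹150 is 25% below Monday's ₹200, while the 8 AM and 9 AM readings move by 10% or less.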
Plotting Anomalies
Visualize:
import matplotlib.pyplot as plt
plt.plot(data.index, data["Sales"], marker="o", color="teal", label="Sales")
plt.fill_between(data.index, data["Lower"], data["Upper"], color="gray", alpha=0.2, label="Normal Range")
plt.scatter(data[data["Anomaly"]].index, data[data["Anomaly"]]["Sales"], color="red", label="Anomaly")
plt.title("Priya’s Sales with Anomalies")
plt.xlabel("Date and Hour")
plt.ylabel("Sales (₹)")
plt.legend()
plt.show()
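Red dots mark rows where Anomaly is True. With the 2*std rolling band no row qualifies, so the chart shows the teal line inside the gray range, with ₹150 as its lowest point. Day 26: Data Odyssey sees this.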
Why ₹150?
Check features (Day 21):
- Tuesday 7 AM: Weather_Rainy = 1.
- Rain slows chai—₹150 vs. ₹200 sunny.
Anomaly signals rain—adjust stock! Day 26: Data Odyssey explains this.
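The rain comparison and the stock adjustment can be sketched in a few lines. Tuesday's `Weather_Rainy = 1` comes from this post; Monday's 0 is inferred from the "₹200 sunny" comparison. Scaling stock in proportion to sales is an assumption made for illustration.

```python
import pandas as pd

# The two 7 AM rows. Weather_Rainy = 1 for Tuesday is from the post;
# 0 for Monday is inferred from the "₹200 sunny" comparison.
seven_am = pd.DataFrame(
    {"Sales": [200, 150], "Weather_Rainy": [0, 1]},
    index=pd.to_datetime(["2025-03-03 07:00", "2025-03-04 07:00"]),
)

# Average 7 AM sales on sunny (0) and rainy (1) mornings
by_weather = seven_am.groupby("Weather_Rainy")["Sales"].mean()
rain_ratio = by_weather.loc[1] / by_weather.loc[0]

usual_chais = 20  # usual 7 AM stock mentioned in this post
# Assumption: stock scales in proportion to sales
rainy_chais = int(round(usual_chais * rain_ratio))

print(f"Rainy/sunny sales ratio: {rain_ratio:.2f}")
print(f"Suggested rainy-day stock: {rainy_chais} chais")
```

With one sunny and one rainy morning the ratio is 0.75, which gives 15 chais. Two rows cannot prove that rain is the cause, so this is a hint to confirm once more days are logged.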
Why Detect?
- Clean: ₹150 skews ₹632.5—remove or adjust.
- Insight: Rain dips 7 AM—15 chais, not 20.
- Scale: 35 rows (Day 12)—catch ₹5000 typos.
Priya’s ₹150—fix forecasts. Day 26: Data Odyssey flags this.
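To illustrate the "Scale" bullet, the sketch below appends one hypothetical mistyped row of ₹5000 to the seven real readings and applies the same mean ± 2*std rule. The ₹5000 value is invented for the example and is not part of Priya's data.

```python
from statistics import mean, stdev

sales = [200, 500, 600, 150, 550, 650, 640]
with_typo = sales + [5000]  # hypothetical mistyped row, not in Priya's data

center = mean(with_typo)
spread = stdev(with_typo)  # sample standard deviation

# Keep every value lying more than 2 standard deviations from the mean
flagged = [x for x in with_typo if abs(x - center) > 2 * spread]

print(f"Band: {center - 2 * spread:.0f} to {center + 2 * spread:.0f}")
print("Flagged:", flagged)
```

Only the ₹5000 row is flagged. The typo also inflates the mean and std it is judged against, which is one reason more rows make this check more dependable.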
Real-World Anomalies
India’s power grid spots usage spikes—fixes outages. Amazon catches sales drops—restocks fast. Priya’s ₹150 is her café’s alert—small, critical. Day 26: Data Odyssey ties this.
Challenges
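- Sparse: 7 rows, so every 2*std band above is too wide to flag anything.
- Threshold: 2*std or 1.5? At 1.5 the all-sales band starts near ₹157, and ₹150 falls just below it.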
- Context: Rain explains—features key.
More data (35 rows) refines—Priya grows. Day 26: Data Odyssey notes this.
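The effect of the multiplier can be checked on the seven readings. This is a minimal sketch: `flag_outside` is a hypothetical helper written for this example, and the two multipliers are the ones mentioned in the list above.

```python
from statistics import mean, stdev

sales = [200, 500, 600, 150, 550, 650, 640]

def flag_outside(values, k):
    """Return the values lying more than k sample std devs from the mean."""
    center = mean(values)
    spread = stdev(values)
    return [x for x in values if abs(x - center) > k * spread]

center, spread = mean(sales), stdev(sales)
for k in (2.0, 1.5):
    low, high = center - k * spread, center + k * spread
    print(f"k={k}: band {low:.1f} to {high:.1f}, flagged {flag_outside(sales, k)}")
```

At k=2 nothing is flagged; at k=1.5 the lower bound rises to about ₹157 and ₹150 is the only value outside. A tighter band will also raise more false alarms as the data grows, so the multiplier is worth revisiting with 35 rows.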
Why This Matters
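Detecting ₹150 as a rainy-day dip means stocking 15 chais instead of 20, so fewer go to waste, and her ₹632.5 forecast rests on cleaner data. Without that check, forecasts drift; with it, she adapts and protects her profit. Scale it up: the same idea helps India’s power grid catch unusual usage before it becomes an outage. Day 26: Data Odyssey guards her.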
Recap Summary
Yesterday, Day 25: Data Odyssey forecasted Priya’s 9 AM: ₹632.5 via ES. Today, Day 26: Data Odyssey put ₹150 at 7 AM through threshold, hourly, and rolling checks; the 2*std bands stayed quiet on 7 rows, and the rain feature explained the dip. It’s her alert step.
What’s Next
Tomorrow, in Day 27: Data Odyssey – What is Clustering?, we’ll group: How do Priya’s hours cluster? 7 AM vs. 9 AM? We’ll use her data to find patterns, no labels needed. Bring your curiosity, and I’ll see you there!










