Chapter 54
AI Monitoring
Monitoring AI Systems
📡 Deploying the model is not the end
In production, data drift, accuracy decay, latency spikes, and hallucinations all happen silently. Without monitoring, you will never know when the model failed.
Four Types of Metrics
- System: CPU, GPU, RAM, QPS, latency p50/p95/p99.
- Model: accuracy, F1, RMSE (once ground truth arrives).
- Data: input distribution, missing values, drift score.
- Business: conversion, click-through, revenue per request.
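To make the p50/p95/p99 notation concrete, here is a minimal sketch that computes nearest-rank percentiles from raw request latencies. The function names and the sample data are hypothetical; a real system would normally read these values from its metrics backend rather than compute them by hand.

```python
import math


def percentile(values, q):
    """Nearest-rank percentile: the smallest value with at least q% of samples at or below it."""
    if not values:
        raise ValueError("values must be non-empty")
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


def latency_summary(latencies_ms):
    """Summarize request latencies (in milliseconds) as p50, p95, and p99."""
    return {f"p{q}": percentile(latencies_ms, q) for q in (50, 95, 99)}


# Hypothetical sample: 100 requests, mostly fast, with a slow tail.
sample = [20] * 90 + [80] * 8 + [500] * 2
summary = latency_summary(sample)
print(summary)
```

The mean of this sample is 34.4 ms, which hides the 500 ms tail that p99 exposes. That is why latency SLAs are usually stated on high percentiles rather than averages.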
Data Drift Detection
- Covariate drift: P(X) has changed.
- Concept drift: P(Y|X) has changed.
- Tests: KS-test, PSI (Population Stability Index), Wasserstein distance.
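As an illustration of one of these tests, the following is a minimal pure-Python sketch of PSI for a single numeric feature. It assumes bin edges taken from the reference quantiles and a small `eps` floor to avoid taking the log of zero; both are common choices but not the only ones, and the function name `psi` is just a label for this sketch.

```python
import math


def psi(reference, current, bins=10, eps=1e-6):
    """Population Stability Index between a reference sample and a current sample."""
    ref_sorted = sorted(reference)
    # Interior bin edges at reference quantiles, so each bin holds roughly equal reference mass.
    edges = [ref_sorted[int(len(ref_sorted) * i / bins)] for i in range(1, bins)]

    def proportions(sample):
        counts = [0] * bins
        for v in sample:
            idx = sum(1 for e in edges if v >= e)  # bin index = number of edges at or below v
            counts[idx] += 1
        # Floor empty bins at eps so the log term below stays finite.
        return [max(c / len(sample), eps) for c in counts]

    expected = proportions(reference)
    actual = proportions(current)
    return sum((a - e) * math.log(a / e) for a, e in zip(actual, expected))


# Hypothetical data: the production feature is shifted upward by 300.
train_values = list(range(1000))
prod_values = [v + 300 for v in train_values]
print(psi(train_values, train_values))  # identical samples give 0.0
print(psi(train_values, prod_values))
```

A widely quoted rule of thumb treats PSI below 0.1 as stable, 0.1 to 0.25 as moderate shift, and above 0.25 as significant shift. These cutoffs are conventions, so tune them per feature.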
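# Drift report with Evidently (import paths as in older Evidently releases; recent versions reorganized them)
from evidently.report import Report
from evidently.metric_preset import DataDriftPreset
# train_df and prod_df are assumed to be pandas DataFrames with the same columns
report = Report(metrics=[DataDriftPreset()])
report.run(reference_data=train_df, current_data=prod_df)
report.save_html("drift.html")
Tooling Stack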
- Metrics: Prometheus + Grafana.
- Logs: Loki / ELK / CloudWatch.
- Traces: OpenTelemetry + Jaeger.
- ML-specific: Evidently, Arize, WhyLabs, Fiddler.
- LLM Observability: LangSmith, Langfuse, Helicone.
- Errors: Sentry.
FastAPI + Prometheus
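from fastapi import FastAPI
from prometheus_client import Histogram
from prometheus_fastapi_instrumentator import Instrumentator
app = FastAPI()
# Instrument all routes and expose the metrics endpoint (/metrics by default)
Instrumentator().instrument(app).expose(app)
# Custom metric: histogram of model inference time in seconds
INFER_TIME = Histogram("inference_seconds", "Model inference latency")
@INFER_TIME.time()
def run(x):
    # model is assumed to be a callable loaded elsewhere
    return model(x)
Alerts You Should Always Have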
- p99 latency > SLA.
- Error rate > 1%.
- GPU OOM / restart loop.
- Drift score crosses its threshold.
- Token/cost spike (LLM).
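The alert conditions above amount to comparing a metric against a threshold. The sketch below shows only that logic. The metric names and every threshold except the 1% error rate are hypothetical placeholders. In a Prometheus setup such conditions would normally be written as alerting rules rather than application code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlertRule:
    name: str
    metric: str
    threshold: float


def firing_alerts(metrics, rules):
    """Return the names of rules whose metric is present and above its threshold."""
    return [
        r.name
        for r in rules
        if r.metric in metrics and metrics[r.metric] > r.threshold
    ]


# Thresholds are illustrative assumptions, apart from the 1% error rate.
RULES = [
    AlertRule("HighP99Latency", "latency_p99_ms", 800.0),  # stand-in for the SLA
    AlertRule("HighErrorRate", "error_rate", 0.01),        # error rate > 1%
    AlertRule("RestartLoop", "restarts_last_10m", 3),      # repeated OOM restarts
    AlertRule("DataDrift", "drift_score", 0.25),
    AlertRule("TokenCostSpike", "cost_per_hour_usd", 50.0),
]

# Hypothetical snapshot of current metric values.
snapshot = {
    "latency_p99_ms": 950.0,
    "error_rate": 0.004,
    "restarts_last_10m": 5,
    "drift_score": 0.31,
    "cost_per_hour_usd": 12.0,
}
print(firing_alerts(snapshot, RULES))
```

This sketch silently skips a metric that is missing from the snapshot. In practice a missing metric is often worth its own alert, because it usually means the exporter or the service is down.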
Feedback Loop
Production prediction + actual outcome → store → periodic retraining pipeline (Airflow/Prefect). This is the foundation of continuous learning.
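One way to picture the "store" step is a pair of tables joined on a request ID: predictions are logged at serving time, outcomes arrive later, and the join yields labeled examples for retraining. The sketch below uses an in-memory SQLite database; the table and function names are assumptions for illustration, not a prescribed schema.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE predictions (request_id TEXT PRIMARY KEY, features TEXT, predicted INTEGER)"
)
conn.execute("CREATE TABLE outcomes (request_id TEXT PRIMARY KEY, actual INTEGER)")


def log_prediction(request_id, features, predicted):
    """Called at serving time, before the true outcome is known."""
    conn.execute(
        "INSERT INTO predictions VALUES (?, ?, ?)",
        (request_id, json.dumps(features), predicted),
    )


def log_outcome(request_id, actual):
    """Called later, once the ground-truth outcome arrives."""
    conn.execute("INSERT INTO outcomes VALUES (?, ?)", (request_id, actual))


def labeled_examples():
    """Join predictions with outcomes; only requests that have both are returned."""
    rows = conn.execute(
        "SELECT p.features, p.predicted, o.actual "
        "FROM predictions p JOIN outcomes o ON o.request_id = p.request_id "
        "ORDER BY p.request_id"
    ).fetchall()
    return [(json.loads(f), predicted, actual) for f, predicted, actual in rows]


def live_accuracy(examples):
    """Fraction of joined examples where the prediction matched the outcome."""
    if not examples:
        return None
    return sum(pred == actual for _, pred, actual in examples) / len(examples)


# Hypothetical traffic: three predictions, two of which have outcomes so far.
log_prediction("r1", {"amount": 120}, 1)
log_prediction("r2", {"amount": 15}, 0)
log_prediction("r3", {"amount": 60}, 1)
log_outcome("r1", 1)
log_outcome("r2", 1)

examples = labeled_examples()
print(len(examples), live_accuracy(examples))
```

A scheduled job (for example an Airflow or Prefect task) could call something like `labeled_examples()` to build the next training set and track `live_accuracy` over time.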
LLM-specific Monitoring
- Hallucination rate (LLM-as-judge / RAGAS).
- Token usage, cost per request.
- Prompt + response logging (with PII redacted).
- User feedback (thumbs up/down).
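The logging items above can be combined into one record per request. The following is a rough sketch: the regular expressions only catch simple email and phone patterns, and the per-token prices are made-up placeholders, so treat both as assumptions to replace with a proper PII detector and your provider's actual pricing.

```python
import re

# Deliberately simple patterns; real PII detection needs much broader coverage.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

# Hypothetical prices in USD per 1,000 tokens.
PRICE_PER_1K = {"input": 0.0005, "output": 0.0015}


def redact(text):
    """Replace email addresses and phone-like numbers with placeholders."""
    text = EMAIL.sub("[EMAIL]", text)
    return PHONE.sub("[PHONE]", text)


def request_cost(input_tokens, output_tokens):
    """Estimated cost of one request in USD under the assumed prices."""
    total = input_tokens * PRICE_PER_1K["input"] + output_tokens * PRICE_PER_1K["output"]
    return total / 1000


def build_log_record(prompt, response, input_tokens, output_tokens, feedback=None):
    """Assemble a log entry with redacted text, token usage, cost, and optional user feedback."""
    return {
        "prompt": redact(prompt),
        "response": redact(response),
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "cost_usd": request_cost(input_tokens, output_tokens),
        "feedback": feedback,  # e.g. "up", "down", or None if the user gave none
    }


record = build_log_record(
    "Mail me at rina@example.com or call +880 1712-345678",
    "Sure, I will follow up.",
    input_tokens=1000,
    output_tokens=500,
    feedback="up",
)
print(record["prompt"])
```

Redacting before the record is written matters more than redacting at read time, since logs tend to be copied into many downstream systems.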
💡 Shadow Deployment
The new model receives a copy of production traffic, but its responses are never sent to the user. Comparing old and new outputs on the same requests makes for a safe rollout.
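The idea can be sketched in a few lines: the primary model answers the request, the shadow model sees the same input, and only the comparison is recorded. The names `serve_with_shadow` and `agreement_rate` are hypothetical, and the two lambda "models" are stand-ins for real ones.

```python
import logging

logger = logging.getLogger("shadow")


def serve_with_shadow(x, primary, shadow, comparisons):
    """Return the primary model's answer; run the shadow model on the same input for comparison only."""
    result = primary(x)
    try:
        shadow_result = shadow(x)
        comparisons.append(
            {"input": x, "primary": result, "shadow": shadow_result, "agree": result == shadow_result}
        )
    except Exception:
        # A failing shadow model must never affect the user-facing response.
        logger.exception("shadow model failed")
    return result


def agreement_rate(comparisons):
    """Fraction of recorded requests where both models produced the same output."""
    if not comparisons:
        return None
    return sum(c["agree"] for c in comparisons) / len(comparisons)


# Stand-in models: the candidate differs from the current one only at x == 0.
old_model = lambda x: x > 0
new_model = lambda x: x >= 0

comparisons = []
served = [serve_with_shadow(x, old_model, new_model, comparisons) for x in (-1, 0, 1)]
print(served, agreement_rate(comparisons))
```

This sketch calls the shadow model inline for simplicity. In a real service the shadow call would typically run asynchronously or on mirrored traffic so that it adds no latency to the user's request.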
Summary
✨ What we learned in this chapter
- System + Model + Data + Business: the four types of metrics.
- Drift detection (PSI, KS-test, Evidently).
- Prometheus, Grafana, LangSmith: the modern stack.
- Alerts, feedback loop, shadow deployment.