Chapter 50
FastAPI for AI
⚡ Why FastAPI for AI
FastAPI = async + Pydantic validation + automatic OpenAPI docs. For serving ML models it is typically faster than Flask under concurrent load, type-safe, and production-ready.
Minimal Inference API
# main.py
import joblib
from fastapi import FastAPI
from pydantic import BaseModel, Field

app = FastAPI(title="Iris Classifier")
model = joblib.load("model.pkl")  # loaded once, when the module is imported


class Features(BaseModel):
    # Every measurement must be strictly between 0 and 10.
    sepal_length: float = Field(gt=0, lt=10)
    sepal_width: float = Field(gt=0, lt=10)
    petal_length: float = Field(gt=0, lt=10)
    petal_width: float = Field(gt=0, lt=10)


@app.get("/health")
def health():
    return {"status": "ok"}


@app.post("/predict")
def predict(x: Features):
    pred = model.predict([[x.sepal_length, x.sepal_width,
                           x.petal_length, x.petal_width]])
    return {"class": int(pred[0])}

Run the server:

uvicorn main:app --host 0.0.0.0 --port 8000 --workers 4

Lifespan: load the model only once
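from contextlib import asynccontextmanager

import torch
from fastapi import FastAPI


@asynccontextmanager
async def lifespan(app: FastAPI):
    # Startup: runs once per worker process, before the first request.
    # Assumes model.pt holds a fully pickled model; recent PyTorch releases
    # default to weights_only=True, so only load files you trust this way.
    app.state.model = torch.load("model.pt", map_location="cuda", weights_only=False)
    app.state.model.eval()
    yield
    # Shutdown: cleanup goes here.


app = FastAPI(lifespan=lifespan)

Async + Batching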
Batch requests that arrive at about the same time so the model handles them in a single forward pass, which raises GPU utilization.
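The full round trip can be sketched with the standard library alone. This is a minimal illustration, not a production batcher: `fake_model`, `batch_worker`, `submit`, and `demo` are hypothetical names, and `fake_model` only doubles its inputs in place of a real batched forward pass. Each caller puts its input and a future on a shared queue, and a single worker drains the queue in batches and resolves the futures.

```python
import asyncio

MAX_BATCH = 4  # kept small so the demo produces more than one batch


def fake_model(inputs):
    """Stand-in for a batched forward pass: doubles every input."""
    return [2 * x for x in inputs]


async def batch_worker(queue, batch_sizes):
    """Drain the queue in batches and resolve each caller's future."""
    while True:
        batch = [await queue.get()]  # block until the first request arrives
        while len(batch) < MAX_BATCH and not queue.empty():
            batch.append(queue.get_nowait())  # take what is already waiting
        batch_sizes.append(len(batch))
        outputs = fake_model([x for x, _ in batch])
        for (_, fut), out in zip(batch, outputs):
            if not fut.done():  # skip callers that were cancelled meanwhile
                fut.set_result(out)


async def submit(queue, x):
    """What a request handler would do: enqueue the input, await the result."""
    fut = asyncio.get_running_loop().create_future()
    await queue.put((x, fut))
    return await fut


async def demo(n=10):
    queue = asyncio.Queue()
    batch_sizes = []
    worker_task = asyncio.create_task(batch_worker(queue, batch_sizes))
    results = await asyncio.gather(*(submit(queue, i) for i in range(n)))
    worker_task.cancel()
    return results, batch_sizes


if __name__ == "__main__":
    results, batch_sizes = asyncio.run(demo())
    print(results)      # each input doubled, in submission order
    print(batch_sizes)  # how the requests were grouped
```

In a FastAPI app the worker task would be started in the lifespan handler and `submit` would be awaited inside the endpoint. A real batcher usually also waits a few milliseconds after the first item so that more requests can join the batch.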
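# sketch: micro-batching with asyncio.Queue
import asyncio

MAX_BATCH = 32
queue = asyncio.Queue()  # holds (input, future) pairs


async def worker():
    while True:
        # Block until at least one request arrives.
        batch = [await queue.get()]
        try:
            # Then take whatever is already waiting, up to MAX_BATCH items.
            while len(batch) < MAX_BATCH:
                batch.append(queue.get_nowait())
        except asyncio.QueueEmpty:
            pass
        # Run the model once on the whole batch, then resolve each future.
        outputs = run_model([x for x, _ in batch])  # run_model: hypothetical batched inference call
        for (_, fut), out in zip(batch, outputs):
            fut.set_result(out)

Streaming Response (LLM)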
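from fastapi import Body
from fastapi.responses import StreamingResponse


@app.post("/chat")
def chat(prompt: str = Body(..., embed=True)):  # expects JSON {"prompt": "..."}
    def gen():
        # `llm` stands for your LLM client; .stream() is assumed to yield text chunks.
        for token in llm.stream(prompt):
            yield token

    return StreamingResponse(gen(), media_type="text/plain")

Production Essentials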
- CORS middleware.
- Auth (API key / JWT).
- Rate limiting (slowapi).
- Request ID + structured logging.
- Prometheus metrics (prometheus-fastapi-instrumentator).
- Gunicorn + uvicorn workers.
💡 Workers with a GPU
For a GPU model, keep --workers 1: every worker process loads its own copy of the model into VRAM. To raise throughput, use batching, not a higher worker count.
Summary
✨ What we learned in this chapter
- Fast, type-safe ML APIs with FastAPI.
- Lifespan, async batching, streaming responses.
- Production checklist: auth, metrics, rate limiting.