Chapter 58

AI Infrastructure

🏗️ The full stack behind running AI

AI infrastructure = compute + storage + network + orchestration + MLOps tooling. Without good infrastructure, even the best model fails in production.

Compute Layer

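GPU: NVIDIA A100/H100/B200 for training; L4/T4/L40 for inference.
TPU: Google's accelerator; works especially well with JAX and TensorFlow.
Inferentia/Trainium: AWS custom silicon (Inferentia for inference, Trainium for training).
CPU: lightweight models and embeddings; ARM-based Graviton instances are cost-effective.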
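
One reason training and inference land on different GPU classes is memory. The sketch below estimates memory from parameter count using common rules of thumb; the bytes-per-parameter figures are assumptions (weights only for fp16 inference, and weights + gradients + Adam state for mixed-precision training), and the function name is hypothetical.

```python
# Rule-of-thumb bytes per parameter (assumptions, not measurements):
#   fp16 inference: 2 bytes for the weights only
#   mixed-precision Adam training: 2 (fp16 weights) + 2 (fp16 gradients)
#     + 12 (fp32 master weights, momentum, variance) = 16 bytes
BYTES_PER_PARAM = {
    "inference_fp16": 2,
    "training_adam_mixed": 16,
}


def model_memory_gb(num_params: float, mode: str) -> float:
    """Approximate memory in GB (1 GB = 1e9 bytes).

    Excludes activations, KV cache, and framework overhead, so real
    usage will be higher than this lower bound.
    """
    return num_params * BYTES_PER_PARAM[mode] / 1e9
```

Under these assumptions a 7-billion-parameter model needs about 14 GB just to hold fp16 weights for inference, but about 112 GB of weight, gradient, and optimizer state for training, which is why training is typically sharded across several large-memory GPUs.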

Storage Layer

Object store: S3, GCS for datasets and model artifacts.
Parallel FS: Lustre, WekaIO for high-throughput reads during training.
Local NVMe: hot cache for the dataloader.
Vector DB: Pinecone, Weaviate, Qdrant, pgvector.
Feature Store: Feast, Tecton.
Warehouse: BigQuery, Snowflake, Databricks.
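
The "hot cache" idea for local NVMe can be sketched as a copy-on-first-use helper. This is a minimal illustration, assuming the shared storage is visible as a mounted path; `cached_path` and both directory arguments are hypothetical names, and a real dataloader would also handle eviction and filename collisions.

```python
import shutil
from pathlib import Path


def cached_path(remote: Path, cache_dir: Path) -> Path:
    """Return a local copy of `remote`, copying it into `cache_dir` on first use."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    local = cache_dir / remote.name
    if not local.exists():
        tmp = local.with_name(local.name + ".tmp")
        shutil.copyfile(remote, tmp)  # slow path: read from shared storage once
        tmp.replace(local)            # rename so readers never see a partial file
    return local                      # fast path: later epochs read from local disk
```

A dataset class would call `cached_path(shard, Path("/mnt/nvme/cache"))` before opening each shard (the mount point is an assumption), so only the first epoch pays the remote read.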

Network

NVLink/NVSwitch: intra-node GPU-to-GPU.
InfiniBand HDR/NDR: inter-node, 200 Gbps (HDR) to 400 Gbps (NDR) per port.
CDN: model artifacts and static assets.
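
To see what the link speeds above mean in practice, the arithmetic below converts a payload size into an ideal transfer time. It is a back-of-the-envelope sketch: the function name and the `efficiency` parameter are assumptions, and it ignores latency, protocol overhead, and collective-communication patterns.

```python
def transfer_seconds(size_gb: float, link_gbps: float, efficiency: float = 1.0) -> float:
    """Ideal seconds to move `size_gb` gigabytes over a `link_gbps` link.

    Link speeds are quoted in gigabits per second, so the payload is
    multiplied by 8 bits per byte. `efficiency` (0 to 1] scales the
    usable bandwidth; 1.0 means the full line rate.
    """
    return size_gb * 8 / (link_gbps * efficiency)
```

For a 14 GB payload (for example, fp16 gradients of a 7-billion-parameter model at 2 bytes each), `transfer_seconds(14, 200)` is about 0.56 s at HDR line rate and `transfer_seconds(14, 400)` about 0.28 s at NDR.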

Orchestration — Kubernetes

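# Inference pod (GPU)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: llm                       # assumed name; not given in the original
spec:
  replicas: 1
  selector:
    matchLabels: { app: llm }     # required; must match the template labels
  template:
    metadata:
      labels: { app: llm }
    spec:
      containers:
      - name: llm
        image: ghcr.io/org/llm:1.4
        ports:
        - containerPort: 8000     # same port the readiness probe targets
        resources:
          limits: { nvidia.com/gpu: 1 }   # one whole GPU; needs the NVIDIA device plugin
        readinessProbe:
          httpGet: { path: /health, port: 8000 }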
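
The readiness probe in the manifest expects the container to answer `GET /health` on port 8000. A minimal standard-library sketch of such an endpoint follows; the `model_ready` flag, `health_response`, and `make_server` are hypothetical names, and a production service would more likely expose this route from its serving framework.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def health_response(ready: bool) -> tuple[int, bytes]:
    """Map readiness to an HTTP status and JSON body.

    Kubernetes treats any status from 200 to 399 as probe success,
    so 503 keeps the pod out of the Service until the model is loaded.
    """
    status = 200 if ready else 503
    return status, json.dumps({"ready": ready}).encode()


class HealthHandler(BaseHTTPRequestHandler):
    model_ready = False  # hypothetical flag: set to True once weights are loaded

    def do_GET(self):
        if self.path != "/health":
            self.send_error(404)
            return
        status, body = health_response(HealthHandler.model_ready)
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep periodic probe requests out of the logs


def make_server(host: str = "0.0.0.0", port: int = 8000) -> HTTPServer:
    """Bind the health server; call .serve_forever() on the result to run it."""
    return HTTPServer((host, port), HealthHandler)
```

Returning 503 while the model is still loading matters for large models: without it, traffic can be routed to a pod that has started but cannot yet serve requests.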

Key Add-ons

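KServe / Seldon: model serving on K8s.
Kubeflow: ML workflows, training operator.
Argo Workflows: pipeline DAGs.
Karpenter: node autoscaler that provisions right-sized nodes for pending pods.
NVIDIA GPU Operator: installs and manages drivers, the device plugin, MIG partitioning, and DCGM monitoring.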

MLOps Tooling

Experiment tracking: MLflow, Weights & Biases, Neptune.
Data versioning: DVC, LakeFS.
Pipeline: Airflow, Prefect, Dagster.
CI/CD: GitHub Actions, GitLab CI, Buildkite.
Secrets: Vault, AWS Secrets Manager.
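
What experiment tracking and data versioning buy you can be shown without any of the tools above: each run records its parameters, metrics, and a content hash of the exact dataset it used. The sketch below is a standard-library illustration only, not how MLflow or DVC are implemented; `file_sha256`, `log_run`, and the JSON-lines log format are assumptions.

```python
import hashlib
import json
import time
from pathlib import Path


def file_sha256(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large datasets do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def log_run(log_file: Path, params: dict, metrics: dict, dataset: Path) -> dict:
    """Append one run record (one JSON object per line) and return it."""
    record = {
        "time": time.time(),
        "params": params,
        "metrics": metrics,
        "dataset_sha256": file_sha256(dataset),  # pins the exact data version
    }
    with log_file.open("a") as f:
        f.write(json.dumps(record) + "\n")
    return record
```

If two runs with the same parameters report different metrics, comparing `dataset_sha256` shows immediately whether the data changed between them.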

Build vs Buy

Buy (managed): SageMaker, Vertex AI, Modal, Replicate, Anyscale. Fast to start, but costly.
Build (self-host): K8s + GPU nodes + open-source stack. Cheap at scale, but ops-heavy.
Hybrid: self-host training (cheaper), use managed serving (uptime).
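
"Cheap at scale" can be made concrete with a break-even calculation: self-hosting adds a fixed monthly operations cost but lowers the hourly GPU rate. The sketch below assumes a simple linear cost model; the function names are hypothetical and any numbers plugged in are illustrative, not real prices.

```python
def monthly_cost(gpu_hours: float, rate_per_hour: float, fixed: float = 0.0) -> float:
    """Linear cost model: fixed monthly overhead plus hourly GPU spend."""
    return fixed + gpu_hours * rate_per_hour


def break_even_hours(managed_rate: float, self_rate: float, self_fixed: float) -> float:
    """GPU-hours per month above which self-hosting becomes cheaper.

    Solves managed_rate * h = self_fixed + self_rate * h for h.
    """
    if managed_rate <= self_rate:
        raise ValueError("self-hosting never breaks even if its hourly rate is not lower")
    return self_fixed / (managed_rate - self_rate)
```

With made-up figures of 4 per managed GPU-hour, 2 per self-hosted GPU-hour, and 6000 per month in ops overhead, `break_even_hours(4.0, 2.0, 6000.0)` gives 3000 GPU-hours; below that usage the managed option costs less.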

Security & Governance

VPC isolation, private endpoints.
IAM least-privilege.
PII redaction in logs.
Model cards + data lineage.
EU AI Act, SOC 2, HIPAA compliance.
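
PII redaction in logs can be enforced at the logging layer so that individual call sites cannot forget it. The sketch below uses Python's `logging.Filter`; the two regular expressions are deliberately crude assumptions (emails and phone-like digit runs only) and would need tuning and broader coverage for real compliance work.

```python
import logging
import re

# Crude illustrative patterns; real PII detection needs far more coverage.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE = re.compile(r"\+?\d[\d\s()-]{7,}\d")


def redact(text: str) -> str:
    """Replace email addresses and phone-like digit runs with placeholders."""
    text = EMAIL.sub("[EMAIL]", text)
    return PHONE.sub("[PHONE]", text)


class RedactPII(logging.Filter):
    """Rewrite each record's final message before any handler formats it."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.msg = redact(record.getMessage())  # format args first, then redact
        record.args = ()                          # args are already merged into msg
        return True                               # never drop the record itself
```

Attaching the filter to a handler, for example `handler.addFilter(RedactPII())`, applies it to every record that handler emits, including records propagated from child loggers.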

💡 Start with a small stack

Day-1 stack: Postgres + S3 + 1 GPU VM + FastAPI + Docker + Grafana. As you scale, add Kafka, a vector DB, K8s, and Kubeflow.

Summary

✨ What we learned in this chapter

Compute, storage, network, orchestration: the four pillars.
K8s + GPU Operator + KServe: modern serving infrastructure.
MLOps tools: from experiment through production.
Build vs Buy: a context-dependent trade-off.
