Chapter 64

SOTA মডেল

SOTA Models

🏆 SOTA মানে কী?

State-of-the-Art = একটি specific benchmark-এ বর্তমান best model। কিন্তু "SOTA" cherry-pick করা সহজ — তাই compute, data, এবং generalization context সহ দেখা জরুরি।

2025-এর SOTA Landscape (snapshot)

Task	Leading Model	Metric	Note
Language (general)	GPT-5 / Claude 4.5 / Gemini 2.5	MMLU ~92	Closed; open: Llama-4, Qwen-3
Reasoning	o3 / DeepSeek-R1	AIME 90+	RL-based long CoT
Code	Claude 4.5 Sonnet	SWE-bench ~70%	Agentic coding
Vision (cls)	EVA-02 / DINOv3	ImageNet 90+%	Self-supervised
Detection	DETR / YOLOv11	COCO mAP 60+	Real-time: YOLO
Segmentation	SAM 2 / Mask2Former	zero-shot	Promptable
Image-gen	Flux.1 / SD3.5 / Imagen 4	FID ↓	Open: Flux
Video-gen	Veo 3 / Sora 2 / Kling	—	Closed mostly
Speech	Whisper-v3 / SeamlessM4T	WER ↓	100+ languages
3D	Gaussian Splatting / TRELLIS	—	Real-time render
Multimodal	GPT-4o / Gemini 2.5 / Qwen2.5-VL	MMMU 70+	Image+text+audio

SOTA Track করার উপায়

Papers With Code — task-wise leaderboard।
Hugging Face Open LLM Leaderboard — open model।
LMSYS Chatbot Arena — human preference Elo।
MTEB — embedding model।
SWE-bench / WebArena — agent benchmark।
Twitter/X — @_akhaliq, @arankomatsuzaki, @abacaj।
Newsletter — Sebastian Raschka, Lilian Weng blog, Latent Space, Import AI।

SOTA Critique — যেভাবে evaluate করবেন

Benchmark contamination — test set কি training data-তে leak হয়েছে?
Compute — 1000× compute দিয়ে 1% improvement worth?
Generalization — multiple benchmark-এ vs একটা cherry-pick?
Cost — inference latency/token cost।
Reproducibility — code/weight release আছে?
Real-world — your domain-এ test করেছেন?

SOTA-এর পেছনে চলমান Trend

1. Scaling → Efficiency

2020-23 ছিল scaling era। 2024+ MoE, distillation, quantization, এবং small-but-mighty model (Phi, Gemma, Qwen) — efficiency ফোকাস।

2. Reasoning Era (o1, R1, o3)

RL দিয়ে long chain-of-thought train করে test-time compute scale করা — math, code, science-এ breakthrough।

3. Agentic AI

Tool use, computer use, browser agent — Claude Computer Use, OpenAI Operator, Devin।

4. Multimodal Native

Late fusion থেকে native multimodal (GPT-4o, Gemini 2.5) — image, audio, video একসাথে।

5. World Models

Sora, Genie 3, V-JEPA — video থেকে physics learn করে simulation।

⚠️ SOTA Trap-এ পড়বেন না

Production-এ সবসময় SOTA চাইবেন না। GPT-4o-mini, Llama-3-8B, Mistral-7B দিয়ে 95% use case সম্ভব — 10× cheap এবং fast। SOTA শুধু সেখানেই যেখানে accuracy-ই বটমলাইন।

সারসংক্ষেপ

✨ এই অধ্যায়ে যা শিখলাম

2025-এর SOTA snapshot — domain-wise।
Track tools — PWC, HF, Arena, MTEB।
SOTA-কে critique করার ৬টি প্রশ্ন।
Mega-trend — reasoning, agent, multimodal, world model।

পূর্ববর্তী

পেপার রিপ্রোডিউস

পরবর্তী

ওপেন-সোর্স কন্ট্রিবিউশন