Chapter 23

Object Detection

🎬 What + Where

Object detection means finding not only "what is in the image" but also "where it is". For every object, the detector outputs a bounding box, a class label, and a confidence score.

Classification vs Detection vs Segmentation

Task            Output            Example
Classification  A single label    "cat"
Detection       Box + label       "cat at (x, y, w, h)"
Segmentation    Pixel-wise mask   Every pixel belonging to the cat

Bounding Box Formats

COCO:       [x_min, y_min, width, height]
Pascal VOC: [x_min, y_min, x_max, y_max]
YOLO:       [x_center, y_center, width, height]   (normalized to 0–1)
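Mixing up these formats is a common source of bugs. Below is a minimal plain-Python sketch of the conversions. It assumes COCO and Pascal VOC boxes are in pixels and that YOLO values are normalized by the image width and height. The function names and the `img_w`/`img_h` parameters are illustrative and do not come from any particular library.

```python
def coco_to_voc(box):
    """[x_min, y_min, width, height] -> [x_min, y_min, x_max, y_max]"""
    x, y, w, h = box
    return [x, y, x + w, y + h]


def voc_to_coco(box):
    """[x_min, y_min, x_max, y_max] -> [x_min, y_min, width, height]"""
    x1, y1, x2, y2 = box
    return [x1, y1, x2 - x1, y2 - y1]


def voc_to_yolo(box, img_w, img_h):
    """Pixel [x_min, y_min, x_max, y_max] -> normalized [x_center, y_center, width, height]"""
    x1, y1, x2, y2 = box
    return [(x1 + x2) / 2 / img_w, (y1 + y2) / 2 / img_h,
            (x2 - x1) / img_w, (y2 - y1) / img_h]


def yolo_to_voc(box, img_w, img_h):
    """Normalized [x_center, y_center, width, height] -> pixel [x_min, y_min, x_max, y_max]"""
    xc, yc, w, h = box
    xc, yc, w, h = xc * img_w, yc * img_h, w * img_w, h * img_h  # back to pixels
    return [xc - w / 2, yc - h / 2, xc + w / 2, yc + h / 2]


# A 100x50 box whose top-left corner is (20, 40), inside a 200x100 image
coco = [20, 40, 100, 50]
voc = coco_to_voc(coco)            # [20, 40, 120, 90]
yolo = voc_to_yolo(voc, 200, 100)  # [0.35, 0.65, 0.5, 0.5]
```

Going through Pascal VOC as the intermediate format keeps the number of conversion functions small.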

Evaluation Metrics

IoU (Intersection over Union)

IoU = |A ∩ B| / |A ∪ B|

A detection is typically counted as correct (a true positive) when its class is right and its IoU with a ground-truth box exceeds 0.5.
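The formula maps almost directly onto code. Here is a minimal sketch for two boxes in Pascal VOC format [x_min, y_min, x_max, y_max]. It assumes continuous coordinates, so it does not add the +1 to widths that some codebases use for inclusive pixel indices. The name `box_iou` is illustrative.

```python
def box_iou(a, b):
    """IoU of two boxes given as [x_min, y_min, x_max, y_max]."""
    # Corners of the intersection rectangle
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    # Clamp at zero so non-overlapping boxes get an empty intersection
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter  # |A ∪ B| = |A| + |B| - |A ∩ B|
    return inter / union if union > 0 else 0.0


print(box_iou([0, 0, 10, 10], [5, 5, 15, 15]))  # 25 / 175 ≈ 0.1429
print(box_iou([0, 0, 10, 10], [0, 0, 10, 10]))  # 1.0
```

At the usual 0.5 threshold, the first pair would not count as a match even though the two boxes clearly overlap.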

mAP (mean Average Precision)

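For each class, AP (Average Precision) is computed from that class's precision-recall curve as the area under it. mAP is then the mean of AP over all classes. mAP@0.5 means the IoU threshold for a match is 0.5. COCO's main metric, mAP@[0.5:0.95], averages over 10 IoU thresholds (0.50, 0.55, ..., 0.95).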
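The sketch below computes AP for one class. It assumes detections have already been matched to ground truth at a fixed IoU threshold, so each one is flagged TP (1) or FP (0). It uses all-point interpolation, the approach Pascal VOC adopted from 2010 onward. COCO's official evaluator samples 101 recall points instead, so its numbers can differ slightly. The function name and inputs are illustrative.

```python
import numpy as np


def average_precision(scores, is_tp, num_gt):
    """AP for one class from scored detections.

    scores: confidence of each detection
    is_tp:  1 if the detection matched a not-yet-matched ground-truth box, else 0
    num_gt: number of ground-truth boxes of this class (assumed > 0)
    """
    order = np.argsort(-np.asarray(scores, dtype=float))  # highest confidence first
    tp = np.asarray(is_tp, dtype=float)[order]
    tp_cum = np.cumsum(tp)
    fp_cum = np.cumsum(1.0 - tp)
    recall = tp_cum / num_gt
    precision = tp_cum / (tp_cum + fp_cum)

    # Add sentinels, then make precision non-increasing: each point takes
    # the maximum precision found at any equal or higher recall
    r = np.concatenate(([0.0], recall, [1.0]))
    p = np.concatenate(([0.0], precision, [0.0]))
    p = np.maximum.accumulate(p[::-1])[::-1]

    # Area under the resulting step curve: sum of (recall step) * precision
    idx = np.where(r[1:] != r[:-1])[0]
    return float(np.sum((r[idx + 1] - r[idx]) * p[idx + 1]))


ap = average_precision([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 1], num_gt=4)
print(ap)  # 0.625

# mAP = mean of the per-class APs. For COCO's mAP@[0.5:0.95], the TP/FP
# matching (and hence AP) is repeated at each of these IoU thresholds:
coco_thresholds = np.linspace(0.5, 0.95, 10)  # 0.50, 0.55, ..., 0.95
```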

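Two Families of Detectors

Two-stage: the R-CNN family

R-CNN (2014): Selective Search → CNN → SVM. Slow, because the CNN runs separately on each of roughly 2,000 region proposals.
Fast R-CNN (2015): runs the CNN once over the whole image, then uses RoI pooling to extract a fixed-size feature for each proposal from that shared feature map.
Faster R-CNN (2015): replaces Selective Search with a learned Region Proposal Network (RPN), so the whole detector can be trained end-to-end.
Mask R-CNN (2017): Faster R-CNN plus a segmentation head that predicts a pixel mask for each detected object.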

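One-stage: fast, real-time

YOLO (v1–v8): grid-based prediction; boxes and class scores come out of a single forward pass.
SSD: predicts boxes from several feature maps at different scales.
RetinaNet: focal loss addresses the extreme imbalance between background and object examples.
DETR: Transformer-based and anchor-free; it predicts a fixed set of boxes matched one-to-one to the ground truth, so it does not need NMS.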
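Focal loss multiplies the cross-entropy by (1 − p_t)^γ. As a result, easy, well-classified examples (mostly background) contribute very little to the total. Here p_t is the predicted probability of the true class. Below is a minimal NumPy sketch of the binary form FL(p_t) = −α_t (1 − p_t)^γ log(p_t), using the RetinaNet paper's defaults α = 0.25 and γ = 2. The function name and the clipping epsilon are illustrative choices.

```python
import numpy as np


def focal_loss(p, y, alpha=0.25, gamma=2.0, eps=1e-7):
    """Mean binary focal loss.

    p: predicted probability of the positive (object) class
    y: ground-truth label, 1 for object and 0 for background
    """
    p = np.clip(np.asarray(p, dtype=float), eps, 1 - eps)  # avoid log(0)
    y = np.asarray(y, dtype=float)
    p_t = np.where(y == 1, p, 1 - p)              # probability of the true class
    alpha_t = np.where(y == 1, alpha, 1 - alpha)  # class-balancing weight
    return float(np.mean(-alpha_t * (1 - p_t) ** gamma * np.log(p_t)))


# An easy background example (p = 0.01) vs. a hard one (p = 0.9), both y = 0
easy = focal_loss([0.01], [0])
hard = focal_loss([0.9], [0])
print(easy, hard)  # the easy example's loss is several orders of magnitude smaller
```

With γ = 0 the modulating factor becomes 1, and the function reduces to α-weighted binary cross-entropy.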

Non-Maximum Suppression (NMS)

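import numpy as np
def iou(box, boxes):
    # IoU of one box vs. an (N, 4) NumPy array, all in [x_min, y_min, x_max, y_max]
    x1, y1 = np.maximum(box[0], boxes[:, 0]), np.maximum(box[1], boxes[:, 1])
    x2, y2 = np.minimum(box[2], boxes[:, 2]), np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = lambda b: (b[..., 2] - b[..., 0]) * (b[..., 3] - b[..., 1])
    return inter / (area(box) + area(boxes) - inter)

def nms(boxes, scores, iou_thresh=0.5):
    idxs = scores.argsort()[::-1]           # sort by score, highest first
    keep = []
    while len(idxs):
        i = idxs[0]
        keep.append(i)                      # keep the best remaining box
        ious = iou(boxes[i], boxes[idxs[1:]])
        idxs = idxs[1:][ious < iou_thresh]  # discard boxes that overlap it too much
    return keep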

🔑 Anchor Box

Anchors are predefined boxes of various sizes and aspect ratios. Instead of predicting a box from scratch, the model learns offsets relative to these anchors. Modern anchor-free models such as DETR drop this concept.
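To make "offsets relative to an anchor" concrete, the sketch below does two things. First, it generates anchors of several scales and aspect ratios around one point. Second, it decodes predicted offsets using the R-CNN-style parameterization t_x = (x − x_a)/w_a, t_y = (y − y_a)/h_a, t_w = log(w/w_a), t_h = log(h/h_a). This parameterization is one common choice, used for example by Faster R-CNN; other detectors define offsets differently. The scales, ratios, and function names here are illustrative.

```python
import numpy as np


def make_anchors(cx, cy, scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Anchors centered at (cx, cy) as [x_center, y_center, width, height].

    Each anchor has area scale**2 and aspect ratio height / width = ratio.
    """
    anchors = []
    for s in scales:
        for r in ratios:
            w = s / np.sqrt(r)
            h = s * np.sqrt(r)
            anchors.append([cx, cy, w, h])
    return np.array(anchors)


def decode(anchors, deltas):
    """Apply predicted offsets (t_x, t_y, t_w, t_h) to anchors in center format."""
    xa, ya, wa, ha = anchors.T
    tx, ty, tw, th = deltas.T
    x = xa + tx * wa     # shift the center in proportion to the anchor size
    y = ya + ty * ha
    w = wa * np.exp(tw)  # width and height are scaled in log space
    h = ha * np.exp(th)
    return np.stack([x, y, w, h], axis=1)


anchors = make_anchors(300, 200)  # 3 scales x 3 ratios = 9 anchors
print(anchors.shape)              # (9, 4)
```

With all offsets set to zero, `decode` returns the anchors unchanged. The network therefore only has to learn small corrections to a reasonable starting box.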

A Quick Start with a Pretrained Model

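# torchvision (the weights enums require torchvision >= 0.13)
import torch
from torchvision.io import read_image
from torchvision.models.detection import (
    fasterrcnn_resnet50_fpn, FasterRCNN_ResNet50_FPN_Weights)
weights = FasterRCNN_ResNet50_FPN_Weights.DEFAULT
model = fasterrcnn_resnet50_fpn(weights=weights).eval()
# "my_image.jpg" is a placeholder; read_image -> uint8 (C, H, W), transforms() -> float in [0, 1]
image_tensor = weights.transforms()(read_image("my_image.jpg"))

with torch.no_grad():
    out = model([image_tensor])  # one dict per input image
# boxes are [x_min, y_min, x_max, y_max] in pixels; labels index weights.meta["categories"]
print(out[0]["boxes"], out[0]["labels"], out[0]["scores"])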

Exercises

1. Run inference on your own image with a pretrained Faster R-CNN.

2. Write a function that computes the IoU of two boxes.

3. Implement NMS and compare the output before and after applying it.

Summary

✨ What We Learned in This Chapter

Detection = box + label + score.
Two-stage: higher accuracy; one-stage: higher speed.
IoU, NMS, and mAP: the three core tools.