
[CV] Object detection

by SteadyForDeep 2021. 9. 12.

Fundamental image recognition tasks[Kirillov et al., CVPR 2019]

- Semantic segmentation [instance recognition : X | semantic recognition : O]

- Instance segmentation [instance recognition : O | semantic recognition : X]

- Panoptic segmentation [instance recognition : O | semantic recognition : O]

 

Further topic

- Object detection [classification + Box localization]

- OCR

 

Traditional method (hand crafted techniques)

- Gradient method

  - Average gradient : edge detection, gradient-based detectors (e.g. HOG)

  - Maximum positive (or negative) SVM weight

  - R-HOG descriptor, or R-HOG descriptor weighted by the SVM weights

- Selective search [Uijlings et al., IJCV 2013]
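
The gradient-based idea behind HOG can be sketched as a magnitude-weighted histogram of gradient orientations over one cell. This is a simplified illustration only: the function name, the 9-bin choice, and the central-difference gradient are assumptions made here, and bin interpolation and block normalization are left out.

```python
import math


def orientation_histogram(cell, bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations
    (0 to 180 degrees) over the interior pixels of one cell."""
    h, w = len(cell), len(cell[0])
    hist = [0.0] * bins
    bin_width = 180.0 / bins
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Central differences approximate the image gradient.
            gx = cell[y][x + 1] - cell[y][x - 1]
            gy = cell[y + 1][x] - cell[y - 1][x]
            magnitude = math.hypot(gx, gy)
            # Unsigned orientation: opposite directions share a bin.
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[int(angle / bin_width) % bins] += magnitude
    return hist
```

A cell containing a vertical edge puts all of its weight into the first bin (orientation near 0 degrees), while a horizontal edge lands in the middle bin (near 90 degrees).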

 

R-CNN[Girshick et al., CVPR 2014]

- Region with CNN features

- Extract regions (e.g. with selective search) and warp them -> CNN features -> Classifier

- Region proposals come from a traditional (hand-crafted) preprocessing step : performance limitation

- The model runs a forward pass for every region proposal : heavy computation
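
The per-region pipeline above can be sketched as follows. This is a minimal sketch, not the original implementation: `warp_region`, `rcnn_detect`, and the `cnn_features` / `classifier` callables are hypothetical stand-ins, the image is a plain 2D list, and warping uses nearest-neighbor sampling for brevity.

```python
def warp_region(image, box, size=4):
    """Crop box (x1, y1, x2, y2; end-exclusive) and resize it to
    size x size with nearest-neighbor sampling."""
    x1, y1, x2, y2 = box
    h, w = y2 - y1, x2 - x1
    return [
        [image[y1 + (i * h) // size][x1 + (j * w) // size] for j in range(size)]
        for i in range(size)
    ]


def rcnn_detect(image, proposals, cnn_features, classifier, size=4):
    """Warp -> CNN features -> classifier, repeated for each proposal."""
    results = []
    for box in proposals:
        patch = warp_region(image, box, size)  # fixed-size input for the CNN
        features = cnn_features(patch)         # one forward pass per proposal
        results.append((box, classifier(features)))
    return results
```

Because `cnn_features` is called once per proposal, the cost grows linearly with the number of proposals, which is the heavy-computation issue noted above.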

 

Fast R-CNN[Girshick, ICCV 2015]

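- Conv feature map computed once for the whole image (independent of the original image size; no separate feature extraction per region)

- RoI (Region of Interest) pooling : extract each region's features from the shared feature map and resample them to a fixed size

- Region proposal is still a hand-crafted algorithm -> limited performance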
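
A rough sketch of RoI feature extraction and resampling, assuming max pooling over a single-channel feature map stored as a 2D list. The function name, the 2 x 2 output size, and the bin-splitting rule are illustrative assumptions; the RoI is given in feature-map cells, end-exclusive, and must be at least one cell wide and tall.

```python
def roi_max_pool(feature_map, roi, out_h=2, out_w=2):
    """Max-pool the RoI (x1, y1, x2, y2) into a fixed out_h x out_w grid."""
    x1, y1, x2, y2 = roi
    h, w = y2 - y1, x2 - x1
    pooled = []
    for i in range(out_h):
        # Bin i spans rows [r0, r1): floor for the start, ceil for the end,
        # and always at least one row.
        r0 = y1 + (i * h) // out_h
        r1 = max(y1 + ((i + 1) * h + out_h - 1) // out_h, r0 + 1)
        row = []
        for j in range(out_w):
            c0 = x1 + (j * w) // out_w
            c1 = max(x1 + ((j + 1) * w + out_w - 1) // out_w, c0 + 1)
            row.append(max(feature_map[r][c]
                           for r in range(r0, r1)
                           for c in range(c0, c1)))
        pooled.append(row)
    return pooled
```

RoIs of different sizes all map to the same output shape, so a single classifier head can follow the shared feature map.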

 

Faster R-CNN[Ren et al., NeurIPS 2015]

- IoU = Intersection / Union (the higher, the better)

- Region proposal

  - Anchor boxes (a set of pre-defined bounding boxes)

  - IoU between the ground truth and each anchor box is the criterion for labeling it positive or negative

  - Region Proposal Network(RPN)

- Non-Maximum Suppression (NMS)
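
The three ingredients above (IoU, anchor labeling by IoU, and NMS) can be sketched in a few lines. Boxes are assumed to be `(x1, y1, x2, y2)` tuples; the function names and the NMS threshold of 0.5 are assumptions for illustration, and the 0.7 / 0.3 labeling thresholds are not stated in these notes but follow the values commonly quoted for RPN training.

```python
def iou(a, b):
    """Intersection over Union of two boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def label_anchor(anchor, gt_boxes, pos_thr=0.7, neg_thr=0.3):
    """Return 1 (positive), 0 (negative), or -1 (ignored) for one anchor,
    based on its best IoU with any ground-truth box."""
    best = max((iou(anchor, gt) for gt in gt_boxes), default=0.0)
    if best >= pos_thr:
        return 1
    if best < neg_thr:
        return 0
    return -1


def nms(boxes, scores, iou_thr=0.5):
    """Greedy NMS: keep the top-scoring box, drop boxes that overlap it
    by more than iou_thr, and repeat. Returns the kept indices."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_thr]
    return keep
```

Anchors whose best IoU falls between the two thresholds are labeled -1 here, meaning they would simply be left out of the training loss.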

 

One-stage(single-stage) detector

- [Ndonhon et al., Offshore Technology Conference 2019]

- No explicit RoI pooling

- You Only Look Once (YOLO)[Redmon et al., CVPR 2016]

- Single Shot MultiBox Detector (SSD)[Liu et al., ECCV 2016]

 

Two-stage vs. One-stage

- Focal loss

  - Class imbalance problem in single-stage detectors (# of negative anchor boxes >> # of positive anchor boxes)

  - Extension of cross-entropy loss that down-weights easy (well-classified) examples
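
A minimal sketch of the binary focal loss for a single prediction, written as FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t). The function name is hypothetical, and the defaults gamma = 2 and alpha = 0.25 are assumptions not given in these notes (they are the values usually quoted for this loss).

```python
import math


def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss for one example.

    p: predicted probability of the positive class, y: label (1 or 0).
    """
    p_t = p if y == 1 else 1.0 - p             # probability of the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha  # class-balancing weight
    p_t = min(max(p_t, 1e-12), 1.0)             # guard against log(0)
    # (1 - p_t) ** gamma shrinks the loss of well-classified examples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
```

With gamma = 0 this reduces to alpha-weighted cross entropy. With gamma = 2, an easy negative predicted at p = 0.01 has its loss scaled by (1 - 0.99)^2 = 0.0001, so the many easy negative anchors no longer dominate training.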

Detector with Transformer

- ViT by Google

- DeiT by Facebook

- DETR[Carion et al., ECCV 2020]

  - Object query : learned positional encodings used as the decoder's queries

 

 

 

###

 

Peer session

 

https://www.notion.so/Bilinear-resize-convolution-c893a921898f4987aded25f85674c730

 

 

 
