[Paper Review Study] Momentum Contrast for Unsupervised Visual Representation Learning [MoCo, CVPR’20]

by 이듄 2022. 11. 24. 19:55

Author: 이승은 (15th cohort)

Contrastive Learning

Train the model so that the differences between samples show up clearly -> measure of difference: distance

Note. SimCLR

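Positive pairs are generated through image augmentation -> self-supervised image recognition via contrastive learning (contrastive loss): the model learns features that can classify images without being given ground-truth targets

(Compared with MoCo below, the loss is computed against every image in the batch before each update, so the large batches it needs can cause out-of-memory errors)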

deep metric learning: metric learning in which a deep neural network finds a suitable manifold
contrastive loss

After dimension reduction (embedding), minimize the distance (loss) for positive pairs and maximize it for negative pairs, under a chosen metric (mostly Euclidean distance or cosine similarity)

Train so that negative pairs end up at least a margin apart, so that similar data cluster together
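
A minimal sketch of the margin-based contrastive loss described above, assuming Euclidean distance and one common squared form (a positive pair pays d^2, a negative pair pays max(0, margin - d)^2). The function names and the default margin are illustrative assumptions, not something taken from the paper or this post.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two embedding vectors (plain lists)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(a, b, is_positive, margin=1.0):
    """Pairwise contrastive loss (assumed squared form).

    Positive pair: penalize any distance, pulling the pair together.
    Negative pair: penalize only while the pair is closer than `margin`.
    """
    d = euclidean(a, b)
    if is_positive:
        return d ** 2
    return max(0.0, margin - d) ** 2


# A negative pair already farther apart than the margin contributes nothing.
far_negative = contrastive_loss([0.0, 0.0], [3.0, 4.0], is_positive=False)
# The same two points treated as a positive pair are penalized by d^2.
far_positive = contrastive_loss([0.0, 0.0], [3.0, 4.0], is_positive=True)
```

Because the negative term is clipped at zero, only negatives inside the margin produce a gradient, which is what pushes dissimilar samples out while leaving well-separated ones alone.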

MoCo Encoder

Define Contrastive Learning as Dynamic Dictionary System

keep the dictionary large – the dictionary is implemented as a queue
progressively updating the encoder – momentum update

minibatch image data -> augmentation (generate pairs)

augmentation version 1 -> q encoder (each minibatch image is input as one query -> compute its feature)

augmentation version 2 -> k encoder (the whole minibatch is input -> compute features)

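The queue (dictionary) accumulates N features per step (N = minibatch size), all produced by the k encoder, and the contrastive loss is computed between the features in the queue and the features from the q encoder

This is repeated for each of the N queries: one loss term for the positive pair plus one term for every negative key held in the queue
Sum everything into the final loss and backpropagate: backpropagation updates the q encoder, while the k encoder is updated as a momentum (moving) average of the q encoder
Then move on to the next minibatch: until the queue overflows, keys from earlier minibatches are kept; once it overflows, the oldest keys are dequeued (the dictionary stays larger than the minibatch size)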
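momentum (m = 0.999 trains well): the k encoder is made to move as slowly as possible
memory bank: stores the features of every image and samples from them (the sampled features come from different update steps, so the keys are not consistent)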
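
The queue and the momentum update above can be sketched in plain Python. This is only an illustration under stated assumptions: encoder parameters are flat lists of floats, keys are placeholder strings rather than feature vectors, and the names `momentum_update` and `enqueue_keys` and the queue size of 4 are hypothetical.

```python
from collections import deque


def momentum_update(theta_k, theta_q, m=0.999):
    """Return m * theta_k + (1 - m) * theta_q, element-wise.

    theta_q is the query encoder (updated by backpropagation);
    theta_k is the key encoder, which only follows it slowly.
    """
    return [m * k + (1.0 - m) * q for k, q in zip(theta_k, theta_q)]


def enqueue_keys(queue, keys):
    """Push the newest minibatch of keys into the dictionary.

    A deque created with maxlen drops its oldest entries on overflow,
    which plays the role of the dequeue step.
    """
    queue.extend(keys)
    return queue


# Hypothetical sizes: dictionary of 4 keys, minibatches of 3 keys.
queue = deque(maxlen=4)
enqueue_keys(queue, ["k0", "k1", "k2"])   # no overflow yet
enqueue_keys(queue, ["k3", "k4", "k5"])   # overflow: k0 and k1 are dropped
# queue now holds the 4 newest keys: k2, k3, k4, k5

# With m = 0.999 the key encoder moves only 0.1% of the way toward
# the query encoder per step.
theta_k = momentum_update([1.0, 1.0], [0.0, 2.0])
```

Decoupling the dictionary size from the minibatch size is the point of the queue: the number of negatives is set by `maxlen`, not by how many images fit on the GPU at once.
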
InfoNCE loss [Noise-Contrastive Estimation]

-log of (output of the positive pair) / (sum of the outputs of the positive and negative pairs)
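
Written out, this ratio is the InfoNCE loss as it appears in the MoCo paper. The symbols are the paper's, not this post's: q is the query feature, k_+ the positive key, k_0 ... k_K the one positive plus K negative keys, and \tau a temperature.

```latex
\mathcal{L}_q = -\log \frac{\exp(q \cdot k_+ / \tau)}{\sum_{i=0}^{K} \exp(q \cdot k_i / \tau)}
```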

Making the positive pair's output large pushes the value inside the log toward 1, so the loss as a whole goes to 0.
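
This behavior can be checked numerically with a small sketch. It assumes dot-product similarity on plain Python lists and a temperature of 0.07; `dot` and `info_nce` are hypothetical helper names, and the vectors are toy values chosen for illustration.

```python
import math


def dot(a, b):
    """Dot product of two feature vectors (plain lists)."""
    return sum(x * y for x, y in zip(a, b))


def info_nce(q, k_pos, negatives, tau=0.07):
    """-log( exp(q.k_pos / tau) / sum over all keys of exp(q.k / tau) )."""
    logits = [dot(q, k_pos) / tau] + [dot(q, k) / tau for k in negatives]
    # Log-sum-exp with the max subtracted, for numerical stability.
    mx = max(logits)
    log_sum = mx + math.log(sum(math.exp(l - mx) for l in logits))
    return log_sum - logits[0]


q = [1.0, 0.0]
negatives = [[0.0, 1.0], [-1.0, 0.0]]

# Positive key aligned with the query: the ratio inside the log is
# almost 1, so the loss is close to 0.
aligned = info_nce(q, [1.0, 0.0], negatives)

# Positive key no more similar than the negatives (all dot products 0):
# the ratio is 1/3, so the loss is log(3).
tied = info_nce(q, [0.0, 1.0], [[0.0, 1.0], [0.0, 1.0]])
```

The loss is the cross-entropy of a (K+1)-way classification in which the positive key is the correct class, which is why it falls to 0 only when the positive dominates every negative.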

More

DCL [Decoupled Contrastive Learning], NNCLR, MoCo v2

Reference

https://github.com/mi2rl/MI2RLNet_V2

https://arxiv.org/abs/1911.05722

https://89douner.tistory.com/m/334

https://ffighting.tistory.com/entry/CVPR-2020-MoCo-리뷰

https://towardsdatascience.com/contrastive-learning-in-3-minutes-89d9a7db5a28
