Technical Talk: SELF-supervised Learning for Image Embedding

COGO Working Space, 4th Floor, Sun Grand City Ancora Residence, No.3 Luong Yen Street, Hai Ba Trung District, Hanoi, Vietnam


11:00 am - 12:30 pm

Trieu H. Trinh



We introduce a pretraining technique called Selfie, which stands for SELF-supervised Image Embedding. Selfie generalizes the concept of masked language modeling to continuous data, such as images. Given masked-out patches in an input image, our method learns to select the correct patch, among other "distractor" patches sampled from the same image, to fill in the masked location. This classification objective sidesteps the need to predict exact pixel values of the target patches. The pretraining architecture consists of a network of convolutional blocks that processes patches, followed by an attention pooling network that summarizes the content of the unmasked patches before predicting the masked ones. During finetuning, we reuse the convolutional weights found by pretraining. We evaluate our method on three benchmarks (CIFAR-10, ImageNet 32 x 32, and ImageNet 224 x 224) with varying amounts of labeled data, from 5% to 100% of the training sets. Our pretraining method provides consistent improvements to ResNet-50 across all settings compared to standard supervised training of the same network. Notably, on ImageNet 224 x 224 with 60 examples per class (5%), our method improves the mean accuracy of ResNet-50 from 35.6% to 46.7%, a gain of 11.1 points in absolute accuracy. Our pretraining method also improves the training stability of ResNet-50, especially in the low-data regime, by significantly lowering the standard deviation of test accuracies across datasets.
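The masked-patch classification objective described above can be sketched in a few lines of NumPy. This is a simplified illustration, not the paper's implementation: mean pooling stands in for the attention pooling network, the patch embeddings are random stand-ins for the output of the convolutional blocks, and all function and variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def selfie_logits(unmasked, candidates, pos_emb):
    """Score candidate patches for one masked location.

    unmasked:   (n, d) embeddings of the visible patches
    candidates: (k, d) embeddings of the true patch plus k-1 distractors
    pos_emb:    (d,)   embedding identifying the masked location
    """
    # Summarize the visible patches; mean pooling is a stand-in for
    # the attention pooling network used in the actual method.
    summary = unmasked.mean(axis=0) + pos_emb
    # One similarity score (logit) per candidate patch.
    return candidates @ summary

def selfie_loss(logits, true_idx):
    # Softmax cross-entropy: classify which candidate fills the mask,
    # rather than regressing the masked patch's pixel values.
    z = logits - logits.max()               # numerical stability
    log_probs = z - np.log(np.exp(z).sum())
    return -log_probs[true_idx]

# Toy example: 8 visible patches, 4 candidates (index 0 is the true patch).
d = 16
unmasked = rng.normal(size=(8, d))
cands = rng.normal(size=(4, d))
logits = selfie_logits(unmasked, cands, rng.normal(size=d))
loss = selfie_loss(logits, 0)
```

Because the loss compares the true patch only against distractors sampled from the same image, the model never has to reconstruct pixels, which is what makes the objective a classification rather than a regression problem.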

About the Speaker

Trieu H. Trinh, Google Brain Resident

Trieu graduated in 2017 with a Bachelor's degree in Computer Science from HCM University of Science. After graduation, he joined the Google Brain team in Mountain View, CA to work on Machine Learning and Deep Learning research under the mentorship of Quoc V. Le and Thang Luong. His research interests center around unsupervised learning and its applications. His work on using unsupervised learning to aid the learning of long-term dependencies in RNNs was accepted to ICML 2018. In later work using language models, Trieu and collaborators achieved state-of-the-art zero-shot transfer learning on the hard commonsense reasoning benchmark, the Winograd Schema Challenge, prefiguring the success of GPT-2. His most recent collaboration explores how to extend the success of self-supervised learning in NLP to Computer Vision.

Follow us on Facebook for further events and seminars!

Event Schedule

11:00 am – 12:30 pm.

Monday, July 22, 2019
