Self Supervised Learning: What is Next?

ECCV 2024, September 29th 9AM-1PM, Space 2 MiCo Milan

Summary

Training robust visual models from unlabeled data is a long-standing problem, as human-provided annotations are often costly, error-prone, and incomplete. Consequently, self-supervised learning (SSL) has become an attractive research direction for learning generic visual representations that are useful for a wide range of downstream tasks such as classification and semantic segmentation. SSL operates on the principle of pretext tasks: self-generated challenges that encourage models to learn from the data's inherent structure. Initially, these tasks were relatively simple, such as predicting the rotation of an image or restoring its original colors. These early experiments laid the groundwork for more sophisticated techniques that extract a deeper understanding from visual content through, e.g., contrastive learning or deep clustering.
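To make the pretext-task principle concrete, here is a minimal sketch of the rotation-prediction task mentioned above: labels are generated from the data itself, with no human annotation. The function name and the use of NumPy are illustrative, not tied to any particular SSL implementation.

```python
import numpy as np

def rotation_pretext_batch(images, rng):
    """Rotate each image in a batch (N, H, W, C) by a random multiple
    of 90 degrees and return (rotated_images, labels), where labels[i]
    in {0, 1, 2, 3} is the number of quarter turns applied.
    A model trained to predict these labels must pick up on the
    structure of the image content -- the core idea of a pretext task."""
    labels = rng.integers(0, 4, size=len(images))
    rotated = np.stack([np.rot90(img, int(k)) for img, k in zip(images, labels)])
    return rotated, labels

# Usage: the "annotations" come for free from the transformation itself.
rng = np.random.default_rng(0)
batch = rng.random((8, 32, 32, 3))  # dummy batch of 8 RGB images
x, y = rotation_pretext_batch(batch, rng)
```

The rotated batch `x` and labels `y` would then feed a standard classifier; the pretext labels are discarded after pretraining and only the learned representation is kept.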

Although these methods already outperform supervised representations on numerous downstream tasks, the field continues to advance at an unprecedented pace, introducing many new techniques. Among these directions are predictive architectures that remove the need for augmentations, masked image modeling, auto-regressive approaches, leveraging the self-supervision signals present in videos, and exploiting the representations of generative models.

With so many new techniques flooding the field, it is important to pause and discuss how we can make optimal use of self-supervised representations in applications, as well as what the remaining obstacles are and how they might be tackled. The workshop aims to give space to ask and discuss such fundamental, longer-term questions with researchers leading this area.

This is the third iteration of the SSL-WIN workshop. The workshop will be organized as a half-day event where a series of invited speakers will present their views on how the field needs to evolve in the coming years.

Invited Speakers

Schedule

The workshop is a half-day event consisting of a series of invited talks on recent developments in self-supervised learning from leading experts in academia and industry, along with a poster session highlighting recent papers in the field.

Time           Speaker             Talk Title                                                                       Slides
09:00          Opening
09:00 - 09:30  Oriane Siméoni      From unsupervised object localization to open-vocabulary semantic segmentation  Slides
09:30 - 10:00  Ishan Misra         What world priors do generative visual models learn?                             ArXiv
10:00 - 10:30  Xinlei Chen         Diffusion Models for Self-Supervised Learning: A Deconstructive Journey          Slides
10:30 - 11:20  Poster Session
11:20 - 11:55  Olivier J. Hénaff   Data curation is the next frontier of self-supervised learning                   Slides
11:55 - 12:30  Yuki M. Asano       Vision Foundation Models (with academic compute)                                 Slides
12:30 - 13:00  Yutong Bai          Listening to the Data: Visual Learning from the Bottom Up                        Slides

Poster Session

List of accepted papers for presentation:

Organizers