The year 2020 has seen major advances in self-supervised representation learning, with many new methods achieving strong performance on standard benchmarks. With better losses and augmentation strategies, this trend will surely continue to advance the field incrementally. However, it is also apparent that major challenges remain unresolved, and it is not clear what the next step-change will be. In this half-day workshop we want to highlight, and provide a forum to discuss, potential research directions, from radically new self-supervision tasks to downstream applications of self-supervised and semi-supervised learning.
As the methods mature, the field has reached the point where we must start discussing how to make optimal use of self-supervised representations in applications, as well as what the remaining obstacles are and how they might be tackled. The workshop aims to give space to ask and discuss fundamental, longer-term questions with researchers who are leading this area. Key questions we aim to tackle include:
- What can be learned by the current generation of self-supervised techniques? And what, instead, still requires manual supervision?
- How can we make optimal use of self-supervised learning?
- Is combining tasks the new way forward?
- Are images enough? Video and multi-modal data are becoming popular sources of self-supervision.
- Why do contrastive losses work well, and do they scale?
- How do we evaluate the quality of the learned representations?
- How can we move to meaningful downstream tasks that benefit from feature learning?
- Why do methods such as clustering or contrastive losses work better than those based on interpretable, image-specific tasks, for example colorization or jigsaw puzzles?