In a Sciences et Avenir (Science and Future) interview on July 2, 2018, Yann LeCun¹ discussed the current abilities and limitations of supervised learning and presented self-supervised learning as both the next step and a significant challenge for AI over the next decade.
What is Self-Supervised Learning?
Self-supervised learning is autonomous supervised learning. It is a representation learning approach that removes the need for humans to label data by extracting naturally available context and embedded metadata and using them as the supervisory signal.
For example, the meaning of the word “orange” can be learned from the words around it: it means something different when it appears near “t-shirt”, “fridge”, “county”, or “mobile”.
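As a minimal sketch of how surrounding words can act as a supervisory signal, the snippet below turns raw sentences into (target, context) pairs without any human labeling. The function name `context_pairs`, the two sentences, and the window size are illustrative assumptions, not part of any particular library.

```python
# Illustrative sketch: derive (target, context) training pairs from raw text.
# The sentences and the window size are made-up assumptions.

def context_pairs(tokens, window=3):
    """Return (target, context) pairs for every token and its neighbours."""
    pairs = []
    for i, target in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                pairs.append((target, tokens[j]))
    return pairs


sentences = [
    "i wear an orange t-shirt".split(),
    "put the orange in the fridge".split(),
]

for tokens in sentences:
    # Collect the context words observed around "orange" in each sentence.
    contexts = [c for t, c in context_pairs(tokens) if t == "orange"]
    print(contexts)
```

The two printed context lists differ, and that difference is the only signal a model trained on such pairs would have for telling the color from the fruit.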
Although self-supervised learning is not limited to learning from visual cues or associated metadata in cat images or videos, cats continue to play an important role in everything significant in machine learning. One example, Unsupervised Visual Representation Learning by Context Prediction, is based on predicting the position of one rectangular patch of an image relative to another. For instance, the right ear of a cat would be in the top-right position relative to the eyes of the cat. This approach allows learning about cats, dogs, or buses without prior explicit semantic labeling. The research draws on the idea behind word2vec, which learns the meaning of a word from its surrounding words: “orange” surrounded by “wear” and “t-shirt” is semantically different from “orange” surrounded by “fridge”, as shown earlier.
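The relative-position idea can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's implementation: the image is a plain list of pixel rows, the helper names `crop` and `sample_patch_pair` are hypothetical, and refinements of the published method are left out. The point is only that the label (which of the eight neighboring positions the second patch came from) is produced by the sampling procedure itself.

```python
import random

# Offsets (row, col) of the eight neighbours around a centre patch.
# The index of an offset in this list serves as the class label.
NEIGHBOUR_OFFSETS = [(-1, -1), (-1, 0), (-1, 1),
                     (0, -1),           (0, 1),
                     (1, -1),  (1, 0),  (1, 1)]


def crop(image, top, left, size):
    """Cut a size x size patch out of an image stored as a list of rows."""
    return [row[left:left + size] for row in image[top:top + size]]


def sample_patch_pair(image, patch_size, rng):
    """Return (centre patch, neighbour patch, label) from an unlabeled image."""
    height, width = len(image), len(image[0])
    # Place the centre patch so that all eight neighbours fit inside the image.
    top = rng.randrange(patch_size, height - 2 * patch_size + 1)
    left = rng.randrange(patch_size, width - 2 * patch_size + 1)
    label = rng.randrange(len(NEIGHBOUR_OFFSETS))
    d_row, d_col = NEIGHBOUR_OFFSETS[label]
    centre = crop(image, top, left, patch_size)
    neighbour = crop(image, top + d_row * patch_size,
                     left + d_col * patch_size, patch_size)
    return centre, neighbour, label


# A synthetic 16x16 "image" whose pixel value encodes its own position.
image = [[r * 16 + c for c in range(16)] for r in range(16)]
centre, neighbour, label = sample_patch_pair(image, 4, random.Random(0))
print("label (neighbour position index):", label)
```

A classifier trained on many such triples would have to learn something about object layout to predict the label, which is where the useful representation comes from.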
Self-Supervised vs. Supervised Learning
Self-Supervised Learning is Supervised Learning because its goal is to learn a function from pairs of inputs and labeled outputs. In Self-Supervised Learning, however, the labeled outputs are not provided explicitly. Instead, supervisory signals, embedded metadata, or domain knowledge available within the input are extracted implicitly and autonomously from the data. Like Supervised Learning, Self-Supervised Learning may be used for regression and classification.
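To make the "labels come from the data" point concrete for regression, here is a small sketch that converts an unlabeled sequence into (input, target) pairs by treating the next value as the label. The function name `make_next_value_examples` and the sample readings are assumptions chosen for illustration, and next-value prediction is only one of many possible ways to derive a target.

```python
def make_next_value_examples(series, window):
    """Turn an unlabeled sequence into (input window, target) pairs."""
    examples = []
    for start in range(len(series) - window):
        inputs = series[start:start + window]
        target = series[start + window]  # the "label" is taken from the data itself
        examples.append((inputs, target))
    return examples


# Made-up sensor readings; no human annotated any of them.
readings = [3.0, 3.5, 4.1, 4.0, 4.6, 5.2]
for inputs, target in make_next_value_examples(readings, window=3):
    print(inputs, "->", target)
```

Any ordinary supervised regressor could then be fitted to these pairs, which is why the setup still counts as supervised learning.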
Self-Supervised vs. Unsupervised Learning
Self-Supervised learning is like Unsupervised Learning because no external labeling needs to be provided. The system learns without using explicitly provided labels. It differs from unsupervised learning because the goal is not to learn the inherent structure of the data. It is not centered around clustering and grouping, dimensionality reduction, recommendation engines, density estimation, or anomaly detection.
Self-Supervised vs. Semi-Supervised Learning
Semi-supervised learning algorithms are trained on a combination of labeled and unlabeled data. Usually, a small amount of labeled data used in conjunction with a large amount of unlabeled data can speed up learning tasks. Self-supervised Learning is different because the system learns entirely without explicitly provided labels.
Relevance of Self-Supervised Learning
Self-Supervised learning is essential for many reasons, but particularly because supervised learning falls short in both its approach and its scalability.
Supervised Learning is an arduous process: it requires collecting massive amounts of data, cleaning it up, manually labeling it, training and perfecting a model purpose-built for the classification or regression use case at hand, and then using that model to predict labels for unknown data. For instance, with images, we collect a large image data set, label the objects in the images manually, train the network, and then use it for one specific use case. This is very different from how humans learn. Human learning is trial-based, perpetual, multi-sourced, and simultaneous across multiple tasks. We learn mostly in an unsupervised manner, through experiments and curiosity. We also learn in a supervised manner, but we can learn from far fewer samples and we generalize extremely well.
For supervised learning, we have spent years collecting and professionally annotating tens of millions of labeled bounding boxes or polygons and image-level annotations. Yet these datasets (Open Images, PASCAL Visual Object Classes, ImageNet, and Microsoft COCO) collectively pale in comparison to the billions of images generated daily on social media, or the millions of videos requiring object detection or depth perception in autonomous driving. Similar scalability arguments exist for common sense knowledge.
Self-Supervised Reinforcement Learning
A dog trainer can reward a dog for positive behavior and punish it for negative behavior. Over time, the dog figures out which actions earned a reward. Similarly, in Reinforcement Learning, a navigating robot learns how to navigate a course when it is rewarded for staying on course and punished when it collides with something in the environment. In both cases, this reward and punishment feedback reinforces which actions to perform and which to avoid. Reinforcement Learning works well when a feedback system for rewards is present. It also requires a comprehensive set of training data and may be impractical given the time and number of iterations required before it succeeds.
In the absence of a reward-based feedback system, a dog or a navigating robot may learn on its own by curiously exploring the environment. Researchers from BAIR created an “Intrinsic Curiosity Module,” a Self-Supervised Reinforcement Learning system that can work even in the absence of explicit feedback. It uses curiosity as a natural reward signal to enable the agent to explore its environment and learn skills for use later in its life. See Curiosity-driven Exploration by Self-supervised Prediction.
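One way to read "curiosity as a reward signal" is as the error of the agent's own prediction of what happens next: transitions it cannot yet predict are rewarding, and familiar ones are not. The sketch below illustrates that idea under heavy simplifying assumptions. The `TabularForwardModel` class is a hypothetical stand-in for a learned network, it works on raw states rather than learned features, and the scaling factor `eta` is an illustrative parameter.

```python
def intrinsic_reward(predicted_next, actual_next, eta=1.0):
    """Curiosity bonus: scaled squared error of the forward-model prediction."""
    squared_error = sum((p - a) ** 2 for p, a in zip(predicted_next, actual_next))
    return 0.5 * eta * squared_error


class TabularForwardModel:
    """Toy forward model that memorises the outcome of each (state, action)."""

    def __init__(self):
        self.memory = {}

    def predict(self, state, action):
        # For transitions never seen before, assume that nothing changes.
        return self.memory.get((state, action), state)

    def update(self, state, action, next_state):
        self.memory[(state, action)] = next_state


model = TabularForwardModel()
state, action, next_state = (0.0, 0.0), "right", (1.0, 0.0)

# First visit: the model is surprised, so the curiosity reward is positive.
first = intrinsic_reward(model.predict(state, action), next_state)
model.update(state, action, next_state)
# Second visit: the outcome is now predictable, so the reward vanishes.
second = intrinsic_reward(model.predict(state, action), next_state)
print(first, second)
```

Because the reward shrinks as the model improves, the agent is pushed toward parts of the environment it has not yet figured out, without any external feedback.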
Use cases of Self-Supervised Learning
Self-Supervised Learning has found success in:
- Estimating relative scene depths without human supervision, by using motion segmentation techniques to determine relative depth from geometric constraints between the scene’s motion field and the camera motion
- Dense Depth Estimation in Monocular Endoscopy
- Terrain Roughness Estimator for Off-Road Autonomous Driving
- Robotic Surgery – Siamese Learning on Stereo Image Pairs for Depth Estimation
- Depth Completion from LiDAR and Monocular Camera
Latest papers on additional Self-Supervised Learning use cases can be found here.
Using Self-Supervised learning, machines are able to predict the natural evolution of their environment and the consequences of their own actions, similar to how newborns are able to learn incredible amounts of information in their first weeks and months of life by observing and being curious. Self-Supervised Learning has the potential to scale learning to the levels required by new use cases, including but not limited to medicine, autonomous driving, robotics, language understanding, and image recognition.
1: Yann LeCun – Chief AI Scientist for Facebook AI Research (FAIR) and Director of AI Research at Facebook, Professor at NYU, known for Convolutional Neural Networks. https://www.sciencesetavenir.fr/videos/yann-lecun-explique-lintelligence-artificielle-et-ses-defis-a-venir_kzrzpf