What is Transfer Learning and Data Augmentation in Computer Vision?

Transfer Learning and Data Augmentation tackle two different but critical issues in Computer Vision. Using these techniques, researchers can work around some of the drawbacks of traditional Machine Learning methods. Let’s begin with a brief introduction to Computer Vision. After that, we will describe Transfer Learning and Data Augmentation with a few examples.

What is Computer Vision?

Computer Vision is arguably one of the most widespread areas of Deep Learning. We use it to tackle image-based tasks, including Image Classification, Object Detection, Semantic Segmentation, Video Analysis, and much more. As the name suggests, we give the computer algorithms, or “models,” that learn from an image dataset we supply. These models learn to detect edges and patterns in the images and derive conclusions from them.

Wait, isn’t this human vision? Yes! And that is what we try to replicate in this field. We give computers a basis for working in a way loosely modeled on our own vision: we provide digital image data and ask for decisions, just as we make them in our daily lives.

Figure: Human Vision and Computer Vision

What is Transfer Learning?

Transfer Learning is a Machine Learning method in which you use a pre-trained model as the base model for a new task. The model has already been trained on a large dataset, often millions of samples, so it has already “learned” useful features. We therefore do not need to create and train a model from scratch for our task. We take the pre-trained model as the starting point, adapt it with our own sample data, and then predict our results.

Transfer Learning is popular because most tasks come with a small dataset, and it is very hard to develop a model that learns good features from such a tiny dataset alone. Instead, practitioners start from a model that has been trained generically on millions of samples (for example, a network pre-trained on the ImageNet dataset). Thanks to the large training set, it has learned general features that often carry over to the new task better than a small model trained from scratch. Transfer Learning saves time and effort and often produces better results. Take a look at the example below for more clarity:

Figure: Traditional ML Model v/s Transfer Learning

Types of Transfer Learning

There are three major types of Transfer Learning. We choose one over the others based on the application domain, the kind of task, and the availability of labeled data. Before diving into the discussion, let’s clarify two terms. The source (domain, task, model, and so on) is the setting in which the model was pre-trained. The target is the setting where we apply the pre-trained model, that is, our own domain and task.

Inductive Transfer Learning

In this type of Transfer Learning, the source and target domains are the same, although the individual tasks differ. We have a dedicated set of labeled data in the target domain. The algorithm can perform much better on the target task since it already knows the relevant features. This is easier than creating a new model and training it on the target data from scratch. We can divide Inductive Transfer Learning into two categories based on the presence of labeled source data: Self-taught Learning (no labeled source data) and Multi-task Learning (labeled source data available).

Transductive Transfer Learning

Here, the source and target domains differ, although they are usually related, while the tasks performed by the source and target are the same. This form of learning is most used when we have extensive labeled source data but no labeled target data. We can divide Transductive Transfer Learning into two categories: Domain Adaptation (different domains but a single task) and Covariate Shift (a single domain and a single task, but a different distribution of inputs).

Unsupervised Transfer Learning

We use this method when we have labeled data in neither the source nor the target. The algorithms used here focus specifically on unsupervised tasks, such as clustering and dimensionality reduction.

Figure: Types of Transfer Learning

How to implement Transfer Learning

Let us look at the basic steps to implement Transfer Learning in your project:

  1. Select the pre-trained model: We need to select a pre-trained model that suits our purpose. Here are a few suggestions:
    • Computer Vision (models commonly available with weights pre-trained on the ImageNet dataset):
      • VGG-16
      • VGG-19
      • ResNet-50
    • Natural Language Processing: models such as BERT, Word2Vec, and XLNet (see “Uses of Transfer Learning” below)
  2. Create your model with the pre-trained model as the starting point: We build our model on top of the pre-trained one. This brings in all of its learned features, and our custom model can now work with them.
  3. Freeze the starting layers of the pre-trained model: This is perhaps the most crucial step in the entire process. We “freeze” the initial layers so that their weights are not updated during training, which preserves what they have learned. If we skip this and train as usual, the updates can overwrite the learned features, and we lose much of the benefit of starting from a pre-trained model.
  4. Add new layers to the end of the model according to your task: Pre-trained models generally have output classes and final layers that differ from what you need. So it is best to add new layers, customized to your needs, at the end of the model.
  5. Train these new layers on your dataset: Once added, the new layers need training before you can use them for predictions.
  6. Fine-tune the custom model and improve your accuracy: Voila! Your model is ready! The last step is to fine-tune the model and tune the hyperparameters to improve your accuracy.
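
The steps above can be sketched in a few lines. This is a minimal illustration in NumPy, not a production recipe: the “pre-trained” backbone is just a fixed random matrix standing in for real pre-trained weights (an assumption made so that the example runs without downloading anything), the data and labels are synthetic, and names such as W_backbone, extract_features, and train_head are hypothetical. It covers steps 2 to 5: reuse the backbone, keep it frozen, add a new head, and train only that head.

```python
import numpy as np

rng = np.random.default_rng(0)

# Steps 1-2: stand-in for a pre-trained backbone (assumption: fixed random
# weights play the role of weights learned on a large source dataset).
W_backbone = rng.normal(size=(64, 16)) / 8.0

def extract_features(x):
    """Step 3: frozen layers. W_backbone is only read here, never updated."""
    return np.maximum(x @ W_backbone, 0.0)  # linear layer followed by ReLU

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_head(feats, y, lr=0.5, steps=1000):
    """Steps 4-5: add a new logistic-regression head and train only it."""
    w = np.zeros(feats.shape[1])
    b = 0.0
    for _ in range(steps):
        p = sigmoid(feats @ w + b)
        w -= lr * (feats.T @ (p - y)) / len(y)  # gradient of the mean log loss
        b -= lr * np.mean(p - y)
    return w, b

# Hypothetical small target dataset: 200 samples, 64 inputs, 2 classes.
X = rng.normal(size=(200, 64))
feats = extract_features(X)
scores = feats @ rng.normal(size=16)
y = (scores > np.median(scores)).astype(float)  # synthetic labels

backbone_before = W_backbone.copy()
head_w, head_b = train_head(feats, y)

predictions = sigmoid(feats @ head_w + head_b) > 0.5
accuracy = np.mean(predictions == (y == 1.0))
print(f"training accuracy of the new head: {accuracy:.2f}")
```

Fine-tuning (step 6) would go one step further and also update some of the backbone weights, usually with a small learning rate. In a deep learning framework, freezing typically amounts to excluding the backbone parameters from the gradient updates, which is what leaving W_backbone untouched does here.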

Uses of Transfer Learning

Transfer Learning is used in many major Deep Learning fields. We will go over some of them below:

Natural Language Processing (NLP)

NLP relies heavily on Transfer Learning. Pre-trained models such as BERT, Word2Vec, and XLNet are widely used across NLP tasks such as machine translation, question answering, and autocorrect.

Figure: Transfer Learning in NLP

Computer Vision

We use pre-trained deep neural networks in Image Processing, Image Classification, Object Detection, and Segmentation. Because these models come with millions of already-learned parameters, tricky tasks can often be implemented in just a few lines of code.

What is Data Augmentation?

Data Augmentation in Computer Vision addresses the troubling issue of small datasets. Generally, people do not have the resources to collect the thousands or millions of images required to train a model to the desired accuracy. Yes, we can use Transfer Learning, as seen above. But sometimes the dataset is too small even for that. As a result, we need some way to enlarge our dataset without spending a lot of human effort. That’s where Data Augmentation comes in.

Data Augmentation is the process of artificially increasing the size of the dataset by creating new data from the existing data. This may mean applying geometric transformations to individual images to obtain modified copies (Augmented Data). Or we can create entirely new images from existing ones by using Generative Adversarial Networks (GANs) (Synthetic Data).

Why do we need Data Augmentation?

As seen above, Data Augmentation addresses the ever-present issue of data shortage. But more than that, it also adds diversity to the dataset, exposing our model to different kinds of data. In doing so, we improve the model’s ability to make predictions on images it has not seen before.

Another important benefit of this technique is that it saves human resources. It saves the time and cost of obtaining extra data, labeling it, making sure it is model-ready, and so on. Since an algorithm handles the process completely, it is quick, and the new data stays consistent with the original dataset.

Figure: Data Augmentation

Data Augmentation in Computer Vision

Now, we will see some techniques to implement Data Augmentation in Computer Vision tasks. Two of them (position and color augmentation) transform the existing images, and one (synthetic augmentation) creates new images from the current data.

Position Augmentation

We apply position augmentation to modify the geometric properties of the image:

  • Crop: We can perform a “Center Crop” of the image to a size given by the user, or a “Random Crop” of that size at a random location in the picture.
  • Flip: A probability controls the randomness in this method. We specify the probability with which the given image is flipped; otherwise, it is left unchanged. “Random Vertical Flip” and “Random Horizontal Flip” are the two types of flips possible.
  • Random Rotation: It rotates the given image by an angle chosen at random from a specified range.
  • Resize: The user specifies a target size, and the image is resized to it.
  • Random Affine: This applies a random combination of basic transformations to the image, such as scaling, rotation, translation, and shearing.
Figure: Position Augmentation
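
To make the position transformations concrete, here is a small sketch using NumPy arrays as images. The function names are hypothetical and chosen only for this example. Two simplifications are assumed: rotation is restricted to multiples of 90 degrees, and resizing uses nearest-neighbor sampling, so that no interpolation code is needed. Random affine transforms are left out for the same reason.

```python
import numpy as np

def center_crop(img, size):
    """Cut a size x size window from the middle of the image."""
    h, w = img.shape[:2]
    top, left = (h - size) // 2, (w - size) // 2
    return img[top:top + size, left:left + size]

def random_crop(img, size, rng):
    """Cut a size x size window at a random location."""
    h, w = img.shape[:2]
    top = rng.integers(0, h - size + 1)
    left = rng.integers(0, w - size + 1)
    return img[top:top + size, left:left + size]

def random_horizontal_flip(img, p, rng):
    """Mirror left-right with probability p, otherwise return unchanged."""
    return img[:, ::-1] if rng.random() < p else img

def random_vertical_flip(img, p, rng):
    """Mirror top-bottom with probability p, otherwise return unchanged."""
    return img[::-1] if rng.random() < p else img

def random_rot90(img, rng):
    """Simplified rotation: a random multiple of 90 degrees."""
    return np.rot90(img, k=int(rng.integers(0, 4)))

def resize_nearest(img, new_h, new_w):
    """Resize by picking the nearest source pixel for each target pixel."""
    h, w = img.shape[:2]
    rows = np.arange(new_h) * h // new_h
    cols = np.arange(new_w) * w // new_w
    return img[rows][:, cols]

rng = np.random.default_rng(0)
img = np.arange(36).reshape(6, 6)  # tiny grayscale "image"

cropped = center_crop(img, 4)
patch = random_crop(img, 4, rng)
flipped = random_horizontal_flip(img, p=1.0, rng=rng)
rotated = random_rot90(img, rng)
small = resize_nearest(img, 3, 3)
```

In a training loop, these functions would typically be applied afresh to each image every epoch, so the model rarely sees exactly the same input twice.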

Color Augmentation

Color Augmentation varies the dataset in terms of color intensity, brightness, and similar properties. Adjusting brightness, contrast, saturation, and hue, and applying color normalization, are some ways to implement it.

Figure: Color Augmentation
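
A few of these color adjustments can be sketched in NumPy as follows. The example assumes RGB images stored as floating-point arrays with values in [0, 1], and the function names are hypothetical. As a simplification, the grayscale reference for saturation is the plain channel average rather than a perceptually weighted one, and hue adjustment is omitted.

```python
import numpy as np

def adjust_brightness(img, factor):
    """Scale all pixel values; factor > 1 brightens, factor < 1 darkens."""
    return np.clip(img * factor, 0.0, 1.0)

def adjust_contrast(img, factor):
    """Stretch or shrink pixel values around the image mean."""
    mean = img.mean()
    return np.clip((img - mean) * factor + mean, 0.0, 1.0)

def adjust_saturation(img, factor):
    """Blend each pixel with its gray value; factor 0 gives grayscale."""
    gray = img.mean(axis=-1, keepdims=True)  # simplified: channel average
    return np.clip((img - gray) * factor + gray, 0.0, 1.0)

def normalize(img, mean, std):
    """Color normalization: subtract a per-channel mean, divide by std."""
    return (img - np.asarray(mean)) / np.asarray(std)

rng = np.random.default_rng(0)
img = rng.random((4, 4, 3))  # tiny random RGB "image"

brighter = adjust_brightness(img, 1.5)
flatter = adjust_contrast(img, 0.5)
gray_img = adjust_saturation(img, 0.0)
normalized = normalize(img, img.mean(axis=(0, 1)), img.std(axis=(0, 1)))
```

For augmentation, the factor would usually be drawn at random from a small range around 1 for each image, so that every training pass sees slightly different colors.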

Synthetic Augmentation

This is an exciting category. As we have seen earlier, synthetic augmentation is used to create new images from the existing dataset. We will look at some techniques widely used in this type of augmentation:

  • Adversarial Machine Learning: We apply pixel-level changes to modify the image. The changes are designed to be imperceptible to the human eye. Such images form the idea of an “adversarial attack” on the system. We perturb the images until our model classifies them incorrectly. Then, we add these altered images, with their correct labels, to our dataset to make our model more robust.
  • Generative Adversarial Networks (GANs): A GAN comprises a generator and a discriminator, and training is a game between the two. The generator produces images that resemble the dataset, and the discriminator tries to tell them apart from real images. The discriminator’s feedback is used to update the generator, and the cycle repeats. The most significant disadvantage of this method is its high computational cost.
  • Neural Style Transfer: We use a CNN to separate the content and style of an image. Then we merge the content of one image with the style of another to create an augmented image. Here, the content remains the same, but the style changes, adding robustness to the system.
Figure: Generative Adversarial Networks (GANs)
Figure: Neural Style Transfer
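
Of the three techniques, the adversarial one is the easiest to illustrate in a few lines. The sketch below takes a single signed-gradient step, in the style of the fast gradient sign method, against a logistic-regression classifier. This specific attack and the model are assumptions chosen for brevity and are not prescribed by the text above; the weights are random stand-ins for a trained model, and the name fgsm_perturb is hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm_perturb(x, y, w, b, eps):
    """One signed-gradient step that increases the log loss for (x, y).

    Each pixel moves by at most eps, so the change stays small.
    """
    p = sigmoid(x @ w + b)
    grad_x = (p - y) * w  # gradient of binary cross-entropy w.r.t. x
    return np.clip(x + eps * np.sign(grad_x), 0.0, 1.0)

def margin(x, y, w, b):
    """Signed score for label y: positive means the model agrees with y."""
    return (2.0 * y - 1.0) * (x @ w + b)

rng = np.random.default_rng(0)
w = rng.normal(size=64)  # stand-in for trained weights (assumption)
b = 0.0
x = rng.random(64)       # flattened 8x8 "image", pixels in [0, 1]
y = 1.0 if sigmoid(x @ w + b) > 0.5 else 0.0  # label the model predicts now

x_adv = fgsm_perturb(x, y, w, b, eps=0.05)

print("max pixel change:", np.max(np.abs(x_adv - x)))
print("margin before:", margin(x, y, w, b))
print("margin after: ", margin(x_adv, y, w, b))
```

A single step only pushes the model toward a mistake; whether the prediction actually flips depends on eps and on how confident the model was. For augmentation, the perturbed image would be added to the training set with the original label y.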

Uses of Data Augmentation

We use Data Augmentation in healthcare to extend the medical dataset at hand. This helps models locate the diseased region in scans more accurately, which can speed up the diagnosis process. Another widespread use case is self-driving cars, which need to be tried and tested before trials on real roads. Training on a dataset extended with the help of data augmentation helps make their models more robust and precise.


I hope this was a clear and concise guide to Transfer Learning and Data Augmentation in Computer Vision. Reading articles and blogs only gets you so far with these concepts, so I highly suggest heading to your coding playground. Try things out and test them as much as you can, because the more you try, the more you observe, and the more you observe, the more you learn. Happy Coding!

Written by Sanchet Sandesh Nagarnaik.

