What Is Semantic Segmentation In Computer Vision?

Semantic Segmentation is a crucial technique in the arsenal of any Machine Learning enthusiast pursuing Computer Vision. Therefore, this article will first cover a brief introduction to Semantic Segmentation. It will be followed by a few methods to implement it. Lastly, it’s practical, real-life uses.

What is Segmentation?

We know about Object Detection. Take the example given below:

image segmentation — Reference: *cs231n.stanford.edu*

Consider the image on the left. Yes, you guessed it right. That is Object Detection in action. The brief idea in this technique is that we use a “Principle of Localization” to detect the rough outline of the specific object in the image. So, as you can see in the image on the left, rectangular “boxes” are constructed around each animal. This indicates their respective position in the picture. However, is all the information in the red box dog related if we go on pixel-level image analysis? No, right? As we can see, some parts of the red mattress, some of the cushion behind, and more such “non-dog” information.

Thus, we must devise an efficient method that eliminates noise and only highlights the necessary information about an object. This is where Segmentation comes into the picture. Segmentation classifies the entire image on a pixel level and assigns each pixel a specific class. Therefore, all the pixels belonging to a particular form of an object are one “segment.”

What is Semantic Segmentation?

There are multiple types of Segmentation. We will not discuss these details, but just a quick gloss-over. Semantic Segmentation classifies all the pixels of all the entities of one specific class with the same color. On the contrary, we use Instance Segmentation to classify different entities of the same class with different colors. And Panoptic Segmentation combines the results of the previous two methods to produce exceptional results.

As we can see in the example above, the image on the left uses Semantic Segmentation. However, the image on the right uses Instance Segmentation. Here we can see that the image on the left has segmented all the people with the same color—the same color as the color of the “Person” class. But the image on the right shows that all five individuals have distinctive colors. Despite all five objects belonging to the same category, “Person.”

Semantic Segmentation Implementation

Now, let’s discuss a few ways to implement Semantic Segmentation and how these methods differ.

Convolutional Encoder Decoder

The first and foremost thought occurring in any Computer Vision enthusiast’s mind is using a CNN. Convolutional Neural Networks are an essential tool in image-based tasks. But the usual method of employing CNNs as we do in Image Classification tasks will not work. We need to separate different features in the image. Then group them into segments, and classify this segment as a whole (on a pixel level). Therefore, instead of a Fully Connected Layer as the last layer, we will extract features. And these extracted features, after grouping into segments, will be interpolated to reconstruct the original image. However, each pixel will carry a specific class label this time.

Thus, we encode the image into features, and based on these features; we decode this image differently. In this way, we get the concept of Encode-Decode. The first part of this structure makes up the “Encoder Network.” And the second part comprises the “Decoder Network.”

Convolutional Encode Decode
*Reference: https://onlinelibrary.wiley.com/doi/10.1002/mrm.26841*

Pyramid Scene Parsing Network

Pyramid Scene Parsing Network, also known as “PSPNet,” was created to gain an understanding of the scenic context in an image. Since we individually classify objects in an image, scenic context becomes essential. Take a look at the below example for more clarity:

PSPNet helps us exploit the picture’s global context rather than attribute classes to pixels. This becomes important because, as you can see in the image above, information is lost because of spatial similarity with noise. PSPNet helps us catch that loss and transmit it as valuable data, which Convolutional Encoder-Decoder fails to do.

DeepLab

Developed by Google, DeepLab also places its base belief on CNNs. It uses features provided by the last convolutional block before upsampling it. It employs the “Atrous Convolution” for upsampling.

Now a natural question arises, what is Atrous Convolution? Atrous convolution uses defined gaps in its convolution computation. The dilation factor “k” represents how many pixels to skip between two successive pixel selections for calculation. Confusing right? Have a look at an illustration below:

Here, the “Rate” is nothing but the dilation factor “k.” So as we can see, for k=1, the convolution proceeds like usual. But when k=2, it skips one pixel while selecting the next pixel, which is consistent for all selections. As you can guess, this “skip selection” helps us to reduce computation costs while tracking more information.

Semantic Segmentation Real-Life Applications

Semantic Segmentation finds a lot of day-to-day use which we don’t even perceive. Let’s discuss a few examples to appreciate the need to study Semantic Segmentation:

Self-driving cars

Self-driving cars need to map their surroundings to make precise decisions accurately. We cannot use Object Detection as every bounding box may include noise and confuse the vehicle about the actual boundary of the object. Thus, Segmentation becomes an inescapable necessity. Take a look at the following example:

Semantic Segmentation in self-driving car — Semantic Segmentation in Self-Driving Car

We can portray the road by a specific color and pedestrians and cars as one “obstruction” color. Doing this can ensure when the vehicle moves and when it should stop.

Diagnosis of Medical Issues in Medical Scans and Images

Complex medical images, MRI scans, and CT scans can make it tricky for the doctor to analyze the issue. However, technology can assist the person in this sense. We can use Semantic Segmentation to pinpoint the specific area of desire within the scan with sufficient accuracy. This can save a lot of time on initial analysis and speed up the doctor’s actual process of treatment. Take a look at the image below to gain more clarity:

Semantic Segmentation in medical diagnosis — Semantic Segmentation in Medical Diagnosis

Understanding Scenic Contexts

As we have seen earlier, Semantic Segmentation can also help understand scenic contexts. We can accurately separate different segments of an image based purely on the context they carry. This can be helpful in filmmaking, designing new projects, construction, and many more.

Drone View Processing

In this real-life application, we will focus on specific information. You might be curious about what drone view is. A drone view is the view of a region or area vertically over it. For this to work, the image captured must be from quite a distance above. Semantic Segmentation can help determine the region’s general terrain, group localities or communities together, and target specific spots for operation. Therefore, this is particularly helpful in military operations, farming purposes, construction designs, etc. For instance, consider the example below:

Semantic Segmentation in drone view — Semantic Segmentation in Drone View Analysis

Conclusion

I hope I have given a successful overview of Semantic Segmentation. Its applications in Medical Analysis particularly intrigue me. With proper research and commitment to this area, it can prosper as a fantastic technique. I recommend practicing the algorithms on your own on test datasets. Participate in as many Kaggle competitions as you can regarding Segmentation to truly grasp its core and implementation. This will develop your coding skills and help you get the theory behind it. Happy Coding!