Data annotation is the process of labeling training data to make it usable in supervised learning tasks. In its 2018 survey What AI can and can't do (yet) for your business, McKinsey identifies the lack of labeled data as the first limitation to AI applications. To tackle this limitation we created the Kili Technology annotation platform, which provides companies with relevant, high-quality labeled data. Indeed, while AI allows us to automate more and more human tasks, we cannot get rid of the "human in the loop" when it comes to data annotation. Human data annotation requires an organized workforce and software that can deliver the quality and quantity of labeled data needed for industrial AI applications.
Labeled images are the backbone of AI systems such as self-driving cars and automated medical imagery analysis, which now require tens or hundreds of thousands, even millions, of images to train. The acquisition cost of this data therefore cannot be neglected. Earlier annotation techniques such as bounding boxes, while cheap, limit the performance of deep learning models: bounding boxes struggle with overlapping entities and non-rectangular objects. As shown below, the bounding box annotating a crack in a wall covers the entire image! Pixel-accurate labeling has since become the norm, as it removes most of the noise that bounding boxes introduce into the data. But because annotating an image at the pixel level is far more time consuming than drawing a bounding box, it can cost up to 10x more!
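To make that noise concrete, here is a minimal numpy sketch (the function name and the synthetic diagonal "crack" are illustrative assumptions) measuring the fraction of pixels inside a tight bounding box that do not belong to the object:

```python
import numpy as np

def bbox_noise_ratio(mask: np.ndarray) -> float:
    """Fraction of pixels inside the object's tight bounding box
    that do NOT belong to the object (the label noise a box introduces)."""
    ys, xs = np.nonzero(mask)  # coordinates of object pixels
    if len(ys) == 0:
        return 0.0
    box_area = (ys.max() - ys.min() + 1) * (xs.max() - xs.min() + 1)
    return 1.0 - mask.sum() / box_area

# A thin diagonal "crack": its bounding box covers the whole image,
# yet almost every pixel inside the box is background.
crack = np.eye(100, dtype=bool)
print(f"noise ratio: {bbox_noise_ratio(crack):.2%}")  # -> 99.00%
```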
In this article we focus on image segmentation, compare the segmentation tools currently available, and examine how they can reduce annotation time and cost.
Image segmentation is the process of partitioning an image into multiple segments, where every pixel within a segment shares the same semantic label. Three segmentation tasks are found in the industry: semantic segmentation (each pixel gets a class label), instance segmentation (each object gets its own mask), and panoptic segmentation (which combines both).
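A minimal sketch of how the outputs of these three tasks are commonly represented (the toy image size, class ids, and "car" objects are illustrative assumptions):

```python
import numpy as np

H, W = 4, 8  # toy image size

# Semantic segmentation: one class id per pixel (0 = background, 1 = car).
# Two cars of the same class are indistinguishable in this map.
semantic = np.zeros((H, W), dtype=np.uint8)
semantic[1:3, 1:3] = 1   # car on the left
semantic[1:3, 5:7] = 1   # car on the right

# Instance segmentation: one binary mask per object, so the two cars
# stay distinguishable.
car_left = np.zeros((H, W), dtype=bool); car_left[1:3, 1:3] = True
car_right = np.zeros((H, W), dtype=bool); car_right[1:3, 5:7] = True
instances = [(1, car_left), (1, car_right)]  # (class_id, mask) pairs

# Panoptic segmentation: every pixel carries (class_id, instance_id),
# unifying background "stuff" and countable "things".
panoptic = np.zeros((H, W, 2), dtype=np.uint8)
panoptic[car_left] = (1, 1)
panoptic[car_right] = (1, 2)
```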
In addition to being time consuming, image segmentation is also prone to human error, especially as annotators grow tired after labeling many images!
We can classify the tools used to perform image segmentation into three categories: manual drawing tools, deep-learning-assisted tools, and superpixel-based tools.
In its most classic form, pixel-accurate segmentation is obtained using a digital pen or brush that allows the user to manually annotate the different entities of an image. When considering such a tool, check that the drawn boundaries of objects are automatically adjusted when they overlap. This functionality saves a lot of time when annotating, as it can be really tedious to perfectly annotate objects with shared boundaries.
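A minimal sketch of that boundary-adjustment idea (the function name and the numpy mask representation are assumptions, not any specific tool's API): when a new brush stroke overlaps already-annotated objects, the overlap is simply subtracted so boundaries stay consistent:

```python
import numpy as np

def add_stroke(existing_masks: list, stroke: np.ndarray) -> np.ndarray:
    """Return the new object's mask, clipped against already-annotated objects.

    existing_masks: binary masks of previously annotated objects
    stroke: binary mask painted by the user's brush
    """
    new_mask = stroke.copy()
    for other in existing_masks:
        new_mask &= ~other  # drop pixels already owned by another object
    return new_mask
```

The opposite convention, letting the newest stroke steal pixels from earlier objects, is equally valid; what matters is that no pixel ends up with two labels.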
Recent progress in deep learning and image segmentation, such as the Polygon-RNN++ and DEXTR papers, has enabled deep-learning-based tools for image segmentation. These tools let the user generate a pixel-accurate annotation of an object either by placing a bounding box around it (left) or by clicking a few points along its edges (right).
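As a sketch of how the click-based variant works: DEXTR encodes the user's extreme points as a Gaussian heatmap concatenated to the RGB image as an extra input channel. The snippet below (function name and parameters are our assumptions) builds that channel:

```python
import numpy as np

def extreme_points_heatmap(shape, points, sigma=10.0):
    """Encode user clicks as a Gaussian heatmap, the extra input
    channel that DEXTR-style models concatenate to the RGB image."""
    H, W = shape
    ys, xs = np.mgrid[0:H, 0:W]
    heat = np.zeros((H, W), dtype=np.float32)
    for (py, px) in points:
        g = np.exp(-((ys - py) ** 2 + (xs - px) ** 2) / (2 * sigma ** 2))
        heat = np.maximum(heat, g)  # one Gaussian bump per click
    return heat

# Four extreme points (top, bottom, left, right) clicked by the annotator.
clicks = [(10, 50), (90, 50), (50, 5), (50, 95)]
channel = extreme_points_heatmap((100, 100), clicks)
# model_input = np.concatenate([rgb_image, channel[..., None]], axis=-1)
```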
Tools based on superpixel segmentation display pre-computed clusters of pixels on the image, allowing users to annotate an object in only a couple of clicks. To be relevant, these tools must generate superpixels that precisely separate the different objects of an image. One of the best-known superpixel algorithms is SLIC, which segments the image based on pixel color and position but performs quite poorly on real-life images (right). Newer superpixel techniques can quickly produce highly accurate segmentations of the entities within an image, considerably speeding up the annotation process (left).
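For reference, here is a minimal example of SLIC using scikit-image's implementation (the sample image and parameter values are arbitrary choices for illustration):

```python
from skimage import data
from skimage.segmentation import slic, mark_boundaries

image = data.astronaut()  # sample RGB image shipped with scikit-image

# SLIC clusters pixels by color and position; compactness trades
# color adherence against regular, square-ish superpixels.
segments = slic(image, n_segments=250, compactness=10, start_label=1)

print(f"{segments.max()} superpixels")      # label map, same H x W as image
overlay = mark_boundaries(image, segments)  # visualize superpixel edges
```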
As shown in the previous pictures, adaptive superpixels allow for high-precision segmentation of the entities within a picture. Using this tool relieves the annotator from drawing the complex boundaries of an object, increasing both annotation speed and quality.
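The annotation step itself then reduces to selecting clusters rather than drawing. A minimal sketch (function name and click coordinates are assumptions) of turning clicks on a superpixel label map into an object mask:

```python
import numpy as np

def mask_from_clicks(segments: np.ndarray, clicks) -> np.ndarray:
    """Build an object mask by clicking superpixels instead of drawing:
    each click selects the whole pre-computed cluster under the cursor."""
    selected = {segments[y, x] for (y, x) in clicks}
    return np.isin(segments, list(selected))

# A couple of clicks label an object that spans a couple of superpixels:
# mask = mask_from_clicks(segments, clicks=[(120, 80), (140, 95)])
```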
In this part we compare the three categories of tools by annotating pictures and measuring the time needed to perform image or instance segmentation on different use cases. In the first use cases we performed instance segmentation, identifying only some entities within the image, whereas for the last application we performed full image segmentation. For each task we measured the time taken to annotate images at the pixel level until reaching a similar quality of annotation.
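"Similar quality" can be made measurable; a common choice (our assumption here, the article does not name its metric) is the intersection-over-union between the produced mask and a reference annotation:

```python
import numpy as np

def iou(pred: np.ndarray, reference: np.ndarray) -> float:
    """Intersection-over-union between two binary masks;
    1.0 means the annotation matches the reference exactly."""
    inter = np.logical_and(pred, reference).sum()
    union = np.logical_or(pred, reference).sum()
    return float(inter / union) if union else 1.0
```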
From the presented experiments we can conclude the following: