I’ll give you four reasons.
To build artificial intelligence, you need three components:
The first is computing power. With the cloud and GPUs, it is now widely available, easily scalable and relatively inexpensive, and it keeps growing exponentially. Your iPhone has more computing power than the entire Apollo program!
The second is algorithms. The state of the art is largely available on GitHub thanks to publications from Google and Facebook; you can now build a translation stack that embeds Google’s latest Transformer architectures. Open source, benefiting from the network effect of the internet, is also developing exponentially. And deep learning algorithms are becoming more and more hungry for training data.
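To give an idea of how accessible this has become, here is a minimal sketch that loads a pretrained Transformer translation model in a few lines. It assumes the open-source Hugging Face `transformers` library; the model name is just an illustrative choice.

```python
# Minimal sketch: translating a sentence with a pretrained Transformer model.
# Assumes the Hugging Face `transformers` library is installed (pip install transformers).
from transformers import pipeline

translator = pipeline("translation_en_to_fr", model="t5-small")
result = translator("The state of the art is available as open source.")
print(result[0]["translation_text"])
```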
The third is data, and there is a lot of it. Since the digital revolution, pretty much all of the world’s information has been recorded in digital format. But this data has to be annotated. That is what Facebook does today when it asks you to comment on a picture or to identify a friend in a photo. Every day we upload more than a billion pictures to Facebook and annotate them. We produce a huge training dataset for Facebook, which is how they have been able to develop models that can identify people from behind. But in most companies, the data is siloed and unstructured.
That’s what Kili Technology is all about 🙂
At BNPP, to solve this lack of training data, we set up annotation interfaces. But in doing so, we ran into the following problems:
For example, a retail customer is working on a project to provide a customer experience similar to Amazon Go. To do this, more than a million images need to be annotated, so you need to be able to onboard people and measure their throughput and the quality of their work. To give you an order of magnitude, with a basic, not very powerful tool, 10k images means 3 months of work for 1 person. So for a project like this you need a lot of annotators, and you have to coordinate them. How do you manage data access rights?
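Just to make the coordination problem concrete, here is a rough sketch of splitting such a volume into batches across a team of annotators; the image IDs, batch size and annotator names below are purely hypothetical.

```python
# Rough sketch: distributing a large annotation project across annotators in fixed-size batches.
image_ids = [f"img_{i:07d}.jpg" for i in range(1_000_000)]  # hypothetical asset list
annotators = ["alice", "bob", "carol"]                       # hypothetical team
batch_size = 500

assignments = {name: [] for name in annotators}
for start in range(0, len(image_ids), batch_size):
    batch = image_ids[start:start + batch_size]
    assignee = annotators[(start // batch_size) % len(annotators)]
    assignments[assignee].append(batch)

# Each annotator ends up with roughly a third of the 2,000 batches.
print({name: len(batches) for name, batches in assignments.items()})
```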
Even with good interfaces and well-coordinated annotators, annotation remains a long process. So it has to be accelerated intelligently, by getting the best out of what humans and machines each do well: the machine is very good at repetitive tasks, where humans tire quickly, and humans are good at distinguishing and assimilating new nuances.
Typically, we are going to want to do three things:
First, online learning: that is, start training a model as the data is being annotated, in order to pre-annotate the remaining data.
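A minimal sketch of that idea, assuming a simple scikit-learn text classifier; the `labeled` and `unlabeled` lists below are hypothetical stand-ins for the already-annotated and not-yet-annotated data.

```python
# Minimal sketch: train on what has been annotated so far, then pre-annotate the rest.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

labeled = [("please ship 3 bales of steel", "order"),
           ("invoice attached for last month", "billing")]   # hypothetical annotated data
unlabeled = ["need 12 tons of metal bales",
             "question about my last invoice"]                # hypothetical unannotated data

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit([text for text, _ in labeled], [label for _, label in labeled])

# These predictions become pre-annotations the annotator only confirms or corrects.
print(list(zip(unlabeled, model.predict(unlabeled))))
```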
Second, prioritization: that is, deciding which assets to annotate first. Typically, if you are doing deep learning, you want maximum diversity from the very beginning of training. How do you manage an optimal prioritization queue that is not simply the alphabetical order of the files to annotate?
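One possible way to build such a queue is to cluster the unannotated pool and annotate one representative per cluster first. The sketch below assumes each asset has already been turned into a feature vector; random vectors stand in for real embeddings.

```python
# Minimal sketch: diversity-driven prioritization via clustering.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(1000, 64))  # one hypothetical vector per unannotated asset

# Cluster the pool and queue the asset closest to each centroid first,
# rather than annotating files in alphabetical order.
kmeans = KMeans(n_clusters=10, n_init=10, random_state=0).fit(embeddings)
priority_queue = [int(np.argmin(np.linalg.norm(embeddings - center, axis=1)))
                  for center in kmeans.cluster_centers_]
print("annotate these assets first:", priority_queue)
```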
Third, pre-annotation with business rules: how do you use business rules to pre-annotate massively? For example, if I need to annotate product names in text and I can extract a dictionary of names from my product repository, how do I use that dictionary to pre-annotate?
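A minimal sketch of that kind of rule-based pre-annotation; the product dictionary below is hypothetical, standing in for names extracted from a product repository.

```python
# Minimal sketch: dictionary-based pre-annotation of product names in text.
import re

product_dictionary = ["metal bale", "steel coil", "aluminium sheet"]  # hypothetical dictionary
text = "Please ship two metal bales and one steel coil before Friday."

pre_annotations = []
for name in product_dictionary:
    for match in re.finditer(re.escape(name), text, flags=re.IGNORECASE):
        pre_annotations.append({"label": "PRODUCT",
                                "start": match.start(),
                                "end": match.end(),
                                "text": match.group(0)})

# The annotator reviews these spans instead of labelling everything from scratch.
print(pre_annotations)
```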
Even if we manage to produce an annotated dataset of sufficient quantity and quality to train a good model with acceptable results, we never reach 100% performance.
Indeed, all of this only covers the initial training: we extract the data from your systems and annotate it to create the dataset.
Imagine automatically entering customer orders read from emails directly into the CRM, when that entry launches the production of a 12-ton metal bale.
On customer order emails, for example, we achieve well over 95% accuracy on classification and over 80% on named entity recognition, which is already very good. But to enter only 100% reliable data into the systems, and thus guarantee the integrity of the orders, it is essential to orchestrate human supervision in production.
You have to be able to keep the human in the loop, capturing their feedback so that the models in production keep learning through annotation.
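One common way to orchestrate this, sketched below with hypothetical field names and an illustrative confidence threshold, is to auto-accept only high-confidence predictions and route everything else to a human reviewer; the reviewed examples can then be fed back as new training data.

```python
# Minimal sketch: route low-confidence predictions to human review instead of the CRM.
CONFIDENCE_THRESHOLD = 0.98  # illustrative value, tuned per use case in practice

def route(prediction, confidence, auto_accepted, review_queue):
    """Auto-accept confident predictions; queue the rest for a human to validate or correct."""
    if confidence >= CONFIDENCE_THRESHOLD:
        auto_accepted.append(prediction)   # goes straight into the CRM
    else:
        review_queue.append(prediction)    # a human checks it before it enters the CRM

auto_accepted, review_queue = [], []
route({"field": "quantity", "value": "12 tons"}, 0.99, auto_accepted, review_queue)
route({"field": "product", "value": "metal bale"}, 0.72, auto_accepted, review_queue)
print(len(auto_accepted), "auto-accepted,", len(review_queue), "sent to human review")
```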
It’s key to keep humans in the loop when you want to put AI into production. #HumanInTheLoop