Artificial Intelligence (AI) has become part of our daily lives, but understanding how machine learning actually works is a completely different story. Traditionally, machine learning models have been developed by experts in the AI field with access to high computing power. Microsoft has lowered these barriers with Azure Cognitive Services. These services are available to anyone with an Azure cloud subscription and make it easy for developers to add AI features to their own applications. Today, I will be discussing Custom Vision, Microsoft’s easy-to-use image classification service.
I first came across Custom Vision while developing an application to generate fashionable outfits. The idea was to be able to upload an image of an outfit (found on Pinterest or a fashion blog) and then have the application find similar items at a handful of websites I typically shop at. Fashion is not an easy subject to teach an AI model, but an image classifier was the perfect solution to pick out key features of an outfit and use them to find similar products.
So how does Custom Vision work? Well, the best part about Custom Vision is that you don’t need to know. Custom Vision provides a simple UI and API for creating, training, and testing a model and applying it to an application.
Let’s learn just how easy it is to build a model. The process can be broken down into four steps:
- Collect training images
- Upload and tag images
- Train the model
- Analyze and test the model
For the purposes of this post, I will create a shoe classifier that classifies heels, sneakers, and boots. First things first, you need to set up your project. You can refer to Microsoft’s documentation to do so. After your project has been set up, you can start collecting training images.
1. Collect Training Images
To train your model, you need to provide it with images. The images you train your model with are essential to its success. Common issues with image classifiers are underfitting and overfitting. Underfitting is when the model has not learned enough and performs poorly on both the training and testing images. Overfitting is the opposite and occurs when the model has learned too much: it has learned the training images so well that it “memorizes” aspects of them that are not actually important. For example, if every photo of sneakers has green grass in the background, the model may associate green grass with the tag “sneakers.” When the model is tested with new images, it may then tag every image with grass in the background as “sneakers,” so it will also perform poorly on the testing images. Here are some key points to keep in mind to avoid underfitting and overfitting:
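One practical way to catch overfitting is to hold a portion of your images aside before uploading, then check that the trained model performs comparably on those held-out images. Here is a minimal sketch of that split in Python; the filenames and tags are hypothetical stand-ins for your collected images:

```python
import random

# Hypothetical labeled dataset: (filename, tag) pairs for the three shoe styles.
images = ([(f"heels_{i}.jpg", "heels") for i in range(50)]
          + [(f"sneakers_{i}.jpg", "sneakers") for i in range(50)]
          + [(f"boots_{i}.jpg", "boots") for i in range(50)])

random.seed(42)          # reproducible shuffle
random.shuffle(images)

# Keep ~20% aside for testing; upload only the remaining 80% for training.
split = int(len(images) * 0.8)
train, holdout = images[:split], images[split:]

print(len(train), len(holdout))  # 120 30
```

Anything in `holdout` never reaches Custom Vision during training, so it gives an honest picture of how the model handles unseen photos.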
The more the better!
Generally, the more data provided to the model, the better. Microsoft encourages users to upload at least 50 images per tag to ensure the model has a strong understanding of that tag.
When training the model on specific tags, it is important to upload a balanced number of images per tag. If I uploaded 800 pictures of heels and 100 pictures of boots, I would not expect the model to have a strong understanding of both tags. A more balanced dataset will ensure the model can accurately predict every tag.
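A quick way to sanity-check balance is to count your images per tag before uploading. A small sketch, using the imbalanced counts from the scenario above:

```python
# Images collected per tag (counts from the imbalanced example above;
# the sneaker count is a made-up middle value).
tag_counts = {"heels": 800, "boots": 100, "sneakers": 450}

# Flag any tag with fewer than half as many images as the best-covered tag.
largest = max(tag_counts.values())
underrepresented = [tag for tag, n in tag_counts.items() if n < largest / 2]

print(underrepresented)  # ['boots'] -- collect more boots before training
```

The "half of the largest tag" cutoff is just an illustrative rule of thumb; the closer the counts are to each other, the better.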
Diverse set of images
It is essential that the model is trained with images that vary in angle, lighting, background, etc. This will make the model much more accurate so it can determine which parts of the image are most important and which parts are just background or noise.
2. Upload and tag images
Now it is time to upload the training images to the model and tag them. Assuming you have already set up a new project in Custom Vision your screen will look like this:
Tags are what the model will use to identify the image. For each image, the model will determine how likely that tag applies to the image. You can add as many tags as you want, but keep in mind each tag should be sufficiently represented in your training images.
You have the option to add tags and then upload images, or you can upload images and tag them individually. For this demo I am going to create all the tags first. This will make it easy for me to upload images in bulk for each tag. On the left side of the screen is the option to add tags.
By clicking on the “+” button a window to add new tags will appear. I added heels, sneakers, and boots as my three tags for my model.
Next, upload the training images. By clicking “Add Images” I can upload pictures in bulk and assign them all one or more tags. Below you can see I added 50 pictures of heels and tagged them all as “Heels.” I repeated this process with 50 images of boots and 50 images of sneakers.
3. Train the Model
After the images are uploaded it is time to train the model. Just click “Train” at the top of the window and Custom Vision does all the work for you.
4. Analyze and test the model
When training is complete, Custom Vision provides three metrics regarding the performance of your model: precision, recall, and AP. It is important to understand what each of these numbers means:
- Precision is the percent of the model’s classifications that were correct. If my model identified 100 images as heels and only 90 of those images were actually heels, it would have a precision of 90%.
- Recall is the percent of actual instances of a tag that the model correctly identified. In other words, if my model identified 80 images as heels but there were actually 100 images of heels in the training data, it would have a recall of 80%.
- AP, or average precision, is an overall measure of the model’s performance. It averages the precision of the model over different probability thresholds.
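Using the numbers from the examples above, both metrics are simple ratios. A quick sketch:

```python
def precision(true_positives, false_positives):
    """Of everything the model tagged, what fraction was correct?"""
    return true_positives / (true_positives + false_positives)

def recall(true_positives, false_negatives):
    """Of everything that should have been tagged, what fraction was found?"""
    return true_positives / (true_positives + false_negatives)

# 100 images tagged "heels", 90 of them actually heels -> 90 TP, 10 FP.
print(precision(90, 10))  # 0.9
# 80 heels found out of 100 actual heels -> 80 TP, 20 FN.
print(recall(80, 20))     # 0.8
```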
What is the probability threshold? You can find it on the left side of your screen, and it should automatically be set to 50%.
This means that when the model is 50% sure that a tag applies to an image it will classify the image with that tag. So, as you increase the probability threshold the model will tend to be more accurate as it will only classify when it is more confident in the tag. As a result, the precision of the model will be higher. The recall of the model will decrease because the model is strictly tagging photos it is most confident in, and many images will go undetected. On the other hand, decreasing the probability threshold will decrease the precision of your model, but the model will have a much higher recall as it classifies more photos.
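This trade-off is easy to see by sweeping the threshold over a handful of hypothetical prediction scores (the confidences below are invented for illustration, not output from my model):

```python
# Hypothetical scores for the tag "heels": (confidence, is_actually_heels).
predictions = [(0.95, True), (0.90, True), (0.80, False),
               (0.60, True), (0.55, False), (0.30, True)]

def precision_recall(preds, threshold):
    tagged = [actual for score, actual in preds if score >= threshold]
    tp = sum(tagged)                          # tags that were correct
    actual_total = sum(a for _, a in preds)   # all real heels in the set
    precision = tp / len(tagged) if tagged else 0.0
    recall = tp / actual_total
    return precision, recall

for t in (0.5, 0.75):
    p, r = precision_recall(predictions, t)
    print(f"threshold={t}: precision={p:.2f}, recall={r:.2f}")
```

Raising the threshold from 0.5 to 0.75 drops the low-confidence tags, so precision climbs while recall falls, exactly the behavior described above.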
Finally, it is time for the fun part: testing the trained model. At the top of the screen, click “Quick Test.”
This will direct you to a screen where you can upload images to test your model. To get accurate results, you must use images the model has never seen before, so do not use any images from the training data.
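Beyond the Quick Test UI, you can run the same test programmatically against the Custom Vision prediction REST endpoint once an iteration is published. A hedged sketch; the endpoint, project ID, iteration name, and key below are placeholders you would copy from your own project’s settings:

```python
# Placeholders -- copy the real values from your Custom Vision project settings.
ENDPOINT = "https://YOUR_RESOURCE.cognitiveservices.azure.com"
PROJECT_ID = "YOUR_PROJECT_ID"
ITERATION = "Iteration1"          # the published iteration name
PREDICTION_KEY = "YOUR_PREDICTION_KEY"

# Classify a local image file with the v3.0 prediction API.
url = (f"{ENDPOINT}/customvision/v3.0/Prediction/"
       f"{PROJECT_ID}/classify/iterations/{ITERATION}/image")
headers = {"Prediction-Key": PREDICTION_KEY,
           "Content-Type": "application/octet-stream"}

# To actually send the request (needs the `requests` package and real values):
# import requests
# with open("test_shoe.jpg", "rb") as f:
#     resp = requests.post(url, headers=headers, data=f.read())
# for pred in resp.json()["predictions"]:
#     print(pred["tagName"], pred["probability"])

print(url)
```

The response lists every tag with its probability, which is the same output the Quick Test screen renders for you.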
First, I tested an image of heels, sneakers, and boots.
The Custom Vision model identified each style of shoe very accurately. It can clearly see the difference between a pair of Christian Louboutin stilettos, New Balance running sneakers, and Steve Madden boots. But what would happen if we presented it with a shoe that looks like a combination of boots and heels?
Boots with heels test:
This test is especially impressive: the model identified characteristics of both heels and boots in the image, even though it was never trained on a pair of shoes like this.
In just four simple steps I was able to create an accurate image classifier for different shoe styles using Azure’s Custom Vision service. A task that would have traditionally taken many hours of programming and a large amount of data was achieved using only 50 images for each style and one iteration of training. So, whether you’re classifying images to help you with your next online purchase or you have a new application in mind that would benefit from an image classifier, Custom Vision can help you reach your goal.