Written by David Stout
We know that an AI model is only as good as the data used to train it. The capabilities of webAI™ make this particularly true, so training your model on a large number of data points that mixes in poor-quality or poorly labeled ones won’t help you achieve high-performance results. Building a data set that can train a model effectively means collecting high-quality images and labeling them accurately; quantity should not be the focus.
Use Case
Understanding your use case is key to determining what modality of data you should use and how many representations are needed to build & train your model accurately.
A Classifier learns from everything it sees in an image, including the background, and associates the whole image (represented as a vector) with the label you provide.
An Object Detector learns based on the area of interest that you identify with a bounding box.
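To make the difference concrete, here is a minimal Python sketch of what a label might look like in each case. The class and field names (`ClassificationLabel`, `DetectionLabel`, `BoundingBox`, `labeled_fraction`) are hypothetical and chosen only for illustration; they are not webAI™'s actual data format.

```python
from dataclasses import dataclass


@dataclass
class ClassificationLabel:
    """One label for the whole image: background pixels are learned too."""
    image_path: str
    label: str


@dataclass
class BoundingBox:
    """A labeled area of interest, in pixel coordinates."""
    label: str
    x_min: int
    y_min: int
    x_max: int
    y_max: int

    def area(self) -> int:
        width = max(0, self.x_max - self.x_min)
        height = max(0, self.y_max - self.y_min)
        return width * height


@dataclass
class DetectionLabel:
    """One image with zero or more labeled boxes."""
    image_path: str
    image_width: int
    image_height: int
    boxes: list[BoundingBox]

    def labeled_fraction(self) -> float:
        # Share of the image covered by boxes (overlapping boxes are
        # counted twice in this simple sketch).
        total = self.image_width * self.image_height
        return sum(box.area() for box in self.boxes) / total


whole_image = ClassificationLabel("shelf_001.jpg", "Cheerios")
boxed_image = DetectionLabel(
    "shelf_001.jpg", 100, 100,
    [BoundingBox("Cheerios", 0, 0, 50, 100)],
)
print(boxed_image.labeled_fraction())
```

In the classification record the label applies to every pixel, while in the detection record only the boxed half of the image is tied to the label.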
When collecting & labeling data, it is also helpful to think about the context of the use case; think of realistic scenarios when capturing/curating your data.
Example: in retail, you may not want a person who sees a portion of a yellow box to decide that the box contains “Cheerios” until they’ve seen more of the box (e.g. the logo). Likewise, your model shouldn’t identify a portion of a yellow box as “Cheerios” either.
A complete data set will contain enough images to train and test your model effectively; webAI™ will then split your data into a training and validation set, typically with an 80/20 ratio (note this is customizable).
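As a rough illustration of what an 80/20 split does, the sketch below shuffles a list of image names and cuts it at the chosen ratio. The function name `split_dataset` and its parameters are assumptions made for this example; it is not webAI™'s implementation, which may shuffle or stratify differently.

```python
import random


def split_dataset(items, train_fraction=0.8, seed=0):
    """Shuffle items and split them into (training, validation) lists."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be between 0 and 1")
    shuffled = list(items)
    # A fixed seed keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


images = [f"img_{i:03d}.jpg" for i in range(100)]
train_set, validation_set = split_dataset(images)
print(len(train_set), len(validation_set))
```

Shuffling before the cut matters: if images were collected in batches (for example, one lighting condition at a time), an unshuffled split could leave the validation set with conditions the training set never saw.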
Remember that issues often arise from the data set itself, which makes assessing a model very difficult. As such, you should consider the following best practices for accurate data labeling:
Building a model for a use case allows you to focus on a specific domain. High-quality images reflect the model's environment, including sensor/camera quality, pixel density, lighting conditions, etc. Good data will also allow some augmentations (rotation, darkening/brightening, etc.) that help a network generalize better.
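The augmentations mentioned here can be sketched in a few lines. The example below is a simplified illustration that treats a grayscale image as a nested list of 0-255 pixel values; the helper names `adjust_brightness` and `rotate_90_clockwise` are hypothetical, and a real pipeline would normally rely on an image library rather than hand-written loops.

```python
def adjust_brightness(image, factor):
    """Scale every pixel by factor, clamping the result to 0-255.

    factor > 1 brightens the image, factor < 1 darkens it.
    """
    return [
        [min(255, max(0, round(pixel * factor))) for pixel in row]
        for row in image
    ]


def rotate_90_clockwise(image):
    """Rotate the image a quarter turn clockwise."""
    # Reversing the rows and then transposing gives a clockwise rotation.
    return [list(row) for row in zip(*image[::-1])]


image = [
    [100, 200],
    [50, 0],
]
brighter = adjust_brightness(image, 1.5)
darker = adjust_brightness(image, 0.5)
rotated = rotate_90_clockwise(image)
```

Each augmented copy keeps the original label, so the network sees the same object under slightly different conditions without any extra labeling work.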
Remember that neural networks resize images during training, so your model is working with a ratio of the original image. As a result, higher resolution doesn't necessarily mean higher quality.
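A quick calculation shows why. The sketch below assumes the image is simply stretched to the network's input size, and it uses a 224x224 input purely as an example; the actual input size and resize method depend on the model, and the function name `object_size_after_resize` is made up for this illustration.

```python
def object_size_after_resize(image_size, object_size, input_size):
    """Return the (width, height) an object occupies after the resize.

    All sizes are (width, height) tuples in pixels. Assumes a plain
    stretch to input_size with no cropping or padding.
    """
    image_w, image_h = image_size
    object_w, object_h = object_size
    input_w, input_h = input_size
    return (object_w * input_w / image_w, object_h * input_h / image_h)


network_input = (224, 224)  # assumed example input size

# The same scene captured at two resolutions: the object fills
# one tenth of the frame in each direction in both photos.
high_res = object_size_after_resize((4000, 3000), (400, 300), network_input)
low_res = object_size_after_resize((1000, 750), (100, 75), network_input)
print(high_res, low_res)
```

Both photos give the network an object of the same size, because what survives the resize is the proportion of the frame the object fills, not the original pixel count.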