Computer vision trains machines to interpret and understand the visual world. In essence, it makes it possible for machines to “see,” bringing to life some of the world’s most innovative technology. Use cases that leverage visual data to train neural networks are growing – from smartphone applications that identify animals to contactless food delivery and precision farming. Computer vision holds great promise for organizations around the world to introduce innovative solutions and disrupt entire industries.

In computer vision, the opportunity and the challenge are the same. There is a vast amount of data available for developing computer vision models, with seemingly endless possibilities for creating visual AI. However, that data must be accurately annotated, or labeled, to be useful in supervised machine learning. The choices you make about the workforce that prepares your data for computer vision are critical to the success of your project.

We’ve created this guide to be a handy reference about computer vision applications, data quality, and the workforce. Feel free to bookmark and revisit this page if you find it helpful.

Read the full guide below, or download a PDF version of the guide you can reference later.

    In this guide, we’ll cover computer vision using supervised learning.

    First, we’ll explain computer vision in greater detail, introducing you to key terms and concepts. Next, we’ll explore common computer vision applications in the real world. We’ll also cover data quality, including the kind of data used to create computer vision and the importance of data quality for your models.

    Finally, we’ll share why decisions about your workforce choice may determine the success of your computer vision project. We’ll give you considerations for selecting the right workforce and share best practices for the workforce that prepares your data for machine learning.

    Introduction

    Will this guide be helpful to me?

    This guide will be helpful to you if:

    • You are determining if computer vision is the right technique to solve a problem, innovate a product, or provide a service.
    • You are getting started on a computer vision project and want to learn more about how data annotation quality can affect your AI model’s performance.
    • You want to learn about best practices for data quality when you are building computer vision models.

    The Basics

    Computer Vision and Visual Data

    What is computer vision?

    Computer vision is a form of artificial intelligence (AI) that trains machines to interpret and understand the visual world. Using visual data from the real world, machines can be taught to accurately identify and classify objects, and make a decision or take some action based on what they “see.”

    We interact with computer vision applications and algorithms every day, often without knowing it: when we shop in a retail store, use a touchless delivery service, or bite into an apple that was produced and distributed by a farm that uses AI.

    Some applications put the power of computer vision in our hands. When you use your smartphone to scan a retail receipt, for example to get a reimbursement or a refund, optical character recognition (OCR) can transcribe the text on the receipt so your request can be approved or rejected automatically. The free app Seek, by iNaturalist, uses computer vision to identify plants, animals, and insects when you point your device’s camera at the object of interest.

    The U.S. National Aeronautics and Space Administration’s (NASA) free Globe Observer app invites you to make and submit your environmental observations about trees, clouds, mosquitoes, and land cover. NASA uses these images to enrich its satellite observations to help scientists who study Earth and our global environment.

    What kind of data is used for computer vision?

    Images, multi-frame images (i.e., video), and sensor data (e.g., satellite data) can be labeled to train and refresh machine learning models for computer vision. The most common types of data used to train computer vision models are:

    • Two-dimensional (2-D) images and video (multi-frame) from cameras or other imaging technology, such as an SLR (single-lens reflex) camera, thermal (infrared) camera, optical microscope, or hyperspectral imaging (HSI) device
    • Three-dimensional (3-D) images and video (multi-frame), including data from cameras, scanners, or other imaging technology, such as electron, ion, or scanning probe microscopes
    • Sensor data captured with remote technology, such as satellite, RADAR (Radio Detection and Ranging), LiDAR (Light Detection and Ranging), or SAR (Synthetic Aperture Radar). A point cloud is an example of sensor data.
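
    As a rough illustration of the last item, a point cloud can be thought of as a list of (x, y, z) coordinates. The sketch below is only an assumption about how such data might be held in memory; the sample coordinates and the `extents` helper are hypothetical and do not come from any particular sensor or tool.

```python
from typing import List, Tuple

# One point in 3-D space: (x, y, z). Units are assumed to be meters.
Point = Tuple[float, float, float]


def extents(cloud: List[Point]) -> Tuple[Point, Point]:
    """Return the lowest and highest corner of the space the points occupy."""
    xs, ys, zs = zip(*cloud)
    return (min(xs), min(ys), min(zs)), (max(xs), max(ys), max(zs))


# Hypothetical sample: three points returned by a scanner.
cloud = [(1.0, 2.0, 0.5), (1.5, 2.2, 0.0), (0.8, 3.0, 1.2)]
low, high = extents(cloud)
```

    The two corners describe an axis-aligned box around the points, which is one simple starting point for labeling an object in 3-D data.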

    In supervised learning, data is annotated, or labeled, to teach the machine to recognize the objects it is designed to detect. In unsupervised learning, unlabeled data is used to find patterns in the data. There are hybrid machine learning models that allow you to use a combination of supervised and unsupervised learning.
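
    To make the difference concrete, here is a minimal sketch in plain Python. It assumes each image has already been reduced to two made-up features (say, how red and how green it is); the feature values, the labels, and the function names are all hypothetical. The supervised part learns from labeled samples using a simple nearest-centroid rule, while the unsupervised part groups the same points without ever seeing a label.

```python
from collections import defaultdict


def dist2(a, b):
    """Squared Euclidean distance between two 2-D points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def mean(points):
    """Average position of a non-empty list of 2-D points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


# Supervised: every sample pairs features with a human-provided label.
labeled = [
    ((0.9, 0.1), "apple"),
    ((0.8, 0.2), "apple"),
    ((0.1, 0.9), "leaf"),
    ((0.2, 0.8), "leaf"),
]


def fit_centroids(samples):
    """Compute one average point (centroid) per label."""
    by_label = defaultdict(list)
    for features, label in samples:
        by_label[label].append(features)
    return {label: mean(points) for label, points in by_label.items()}


def predict(centroids, features):
    """Pick the label whose centroid is closest to the new sample."""
    return min(centroids, key=lambda label: dist2(centroids[label], features))


centroids = fit_centroids(labeled)
prediction = predict(centroids, (0.85, 0.2))


# Unsupervised: the same points, but with the labels thrown away.
unlabeled = [features for features, _ in labeled]


def two_means(points, iterations=10):
    """Split points into two groups by repeatedly re-centering (2-means)."""
    c0, c1 = points[0], points[-1]  # naive starting centers
    g0, g1 = [], []
    for _ in range(iterations):
        g0 = [p for p in points if dist2(p, c0) <= dist2(p, c1)]
        g1 = [p for p in points if dist2(p, c0) > dist2(p, c1)]
        if g0:
            c0 = mean(g0)
        if g1:
            c1 = mean(g1)
    return g0, g1


group_a, group_b = two_means(unlabeled)
```

    The supervised model can name what it sees ("apple") because people supplied the names. The unsupervised grouping finds the same two clusters but has no way to say what they are.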

    How is data annotated for computer vision?

    You can annotate images using data annotation tools you build yourself. Or, you can use commercially available, open source, or freeware tools. With computer vision, you’ll be working with a tremendous amount of data, so you’ll likely need a trained workforce to annotate the images.

    Data annotation tools provide feature sets with various combinations of capabilities, which can be used by your workforce to annotate images or multi-frame images. Video can be annotated as a stream or frame by frame.

    What annotation techniques are used in computer vision?

    Annotating visual data for computer vision is called image annotation. Here are nine of the most common image annotation techniques for computer vision:

    1. Bounding box - This is used to draw a box around a target object in visual data. Bounding boxes can be 2-D or 3-D.
    2. Landmarking (key point annotation) - This is used to plot characteristics in the data, such as eyes and nose in an image used for facial recognition.
    3. Wireframe - This is a more complex version of landmarking that is used to annotate geometric features, straight lines, and their intersections to assemble 3-D structures within a scene.

     

    This image depicts a keypoint schema of straight lines, occlusions, and intersections for identifying a basketball player's position within space. Source: CloudFactory using its internal data annotation tool.

    4. Masking - This applies semantic or instance segmentation to conceal areas in an image and reveal other areas of interest. Image masking makes it easier to focus on certain areas of an image over other areas.
    5. 3-D cuboids - This refers to the use of 3-D bounding boxes to annotate and/or measure many points on an external surface of an object. These typically are generated using 3-D laser scanners, RADAR sensors, and LiDAR sensors.
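
    The sketch below shows one way a few of these annotations might be stored once a person has drawn them. The class names, field names, and pixel-coordinate convention are assumptions made for illustration, not the format of any specific annotation tool. It also includes intersection over union (IoU), a common overlap score that is brought in here as an example of how two annotators' boxes for the same object could be compared.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class BoundingBox:
    """A 2-D box in pixel coordinates, with (0, 0) at the top-left corner."""
    label: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def area(self) -> float:
        width = max(0.0, self.x_max - self.x_min)
        height = max(0.0, self.y_max - self.y_min)
        return width * height


@dataclass(frozen=True)
class Keypoint:
    """A single landmark, such as an eye or the tip of a nose."""
    name: str
    x: float
    y: float
    visible: bool = True  # False when the point is occluded


def iou(a: BoundingBox, b: BoundingBox) -> float:
    """Intersection over union: 1.0 for identical boxes, 0.0 for no overlap."""
    overlap_w = max(0.0, min(a.x_max, b.x_max) - max(a.x_min, b.x_min))
    overlap_h = max(0.0, min(a.y_max, b.y_max) - max(a.y_min, b.y_min))
    intersection = overlap_w * overlap_h
    union = a.area() + b.area() - intersection
    return intersection / union if union > 0 else 0.0


def mask_to_box(label: str, mask: List[List[int]]) -> Optional[BoundingBox]:
    """Find the tightest box around the pixels marked 1 in a binary mask."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for row in mask for c, value in enumerate(row) if value]
    if not rows:
        return None  # the mask is empty
    return BoundingBox(label, min(cols), min(rows), max(cols) + 1, max(rows) + 1)


# Hypothetical example: two annotators box the same player differently.
annotator_a = BoundingBox("player", 0, 0, 4, 4)
annotator_b = BoundingBox("player", 2, 2, 6, 6)
agreement = iou(annotator_a, annotator_b)

# A small binary mask (1 = pixel belongs to the object) and its tight box.
mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
box_from_mask = mask_to_box("player", mask)
nose = Keypoint("nose", x=2.0, y=1.0)
```

    A low overlap score between two annotators, as in this example, is one signal that the labeling instructions may need to be clearer.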

      Frequently Asked Questions

      In supervised learning, humans are in the loop. They annotate, or label, visual data that can be used to teach the machine to recognize, and sometimes track, the objects it is designed to detect. In unsupervised learning, unlabeled data is used to find patterns in the data.

      Convolutional neural networks (CNNs or ConvNets) are commonly used in deep learning for computer vision, along with other algorithms. Common applications of computer vision include AgTech (or farmtech) to optimize food production and distribution, medical AI for detecting disease, device security, and autonomous vehicles.
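
      The "convolutional" part of a CNN refers to sliding a small grid of weights, called a kernel, across an image to produce a feature map. The following is a minimal sketch of that one operation in plain Python. The tiny image and the hand-picked kernel are made up for illustration; a real network learns its kernel values from labeled data, and this version uses no padding or stride.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image and sum the element-wise products.

    Like most deep learning libraries, this does not flip the kernel,
    so strictly speaking it computes a cross-correlation.
    """
    kernel_h, kernel_w = len(kernel), len(kernel[0])
    out_h = len(image) - kernel_h + 1
    out_w = len(image[0]) - kernel_w + 1
    output = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            total = 0
            for i in range(kernel_h):
                for j in range(kernel_w):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        output.append(row)
    return output


# A 3x4 grayscale image: dark (0) on the left, bright (1) on the right.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# A kernel that responds where brightness increases from left to right.
kernel = [[-1, 1]]

feature_map = convolve2d(image, kernel)
```

      The feature map is nonzero only in the column where the image changes from dark to bright, which is how a kernel like this one highlights a vertical edge.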

      One of the most cited references on this topic is the book Computer Vision: Algorithms and Applications, written by Richard Szeliski and based on his lectures at the University of Washington and Stanford University. Although the first edition is dated 2010, it remains an excellent resource for foundational knowledge about the algorithms and applications of computer vision.