The Essential Guide to Computer Vision

Table of contents

Will this guide be helpful to me?

Computer Vision and Visual Data

Introduction

Will this guide be helpful to me?

The Basics

Computer Vision and Visual Data

What is computer vision?

Computer vision is a field of artificial intelligence (AI) that trains machines to interpret and understand the visual world. Using visual data from the real world, machines can be taught to identify and classify objects accurately, and then make a decision or take some action based on what they “see.”

We interact with computer vision applications and algorithms every day, often without knowing it: every time we shop in a retail store, use a touchless delivery service, or bite into an apple that was produced and distributed by a farm that uses AI.

Some applications put the power of computer vision in our hands. When you use your smartphone to scan a retail receipt, for example to get a reimbursement or a refund, optical character recognition (OCR) can transcribe the text on the receipt so that your request can be approved or rejected automatically. The free app Seek, by iNaturalist, uses computer vision to identify plants, animals, and insects when you simply point your device’s camera at the object of interest.

The free GLOBE Observer app from the U.S. National Aeronautics and Space Administration (NASA) invites you to make and submit environmental observations about trees, clouds, mosquitoes, and land cover. NASA uses these images to enrich its satellite observations and to help scientists who study Earth and our global environment.

What kind of data is used for computer vision?

Images, multi-frame images (i.e., video), and sensor data (e.g., from satellites) can be labeled to train and refresh machine learning models for computer vision. The most common types of data used to train computer vision models are:

Two-dimensional (2-D) images and video (multi-frame) from cameras or other imaging technology, such as an SLR (single-lens reflex) camera, a thermal (infrared) camera, an optical microscope, or a hyperspectral imaging (HSI) device
Three-dimensional (3-D) images and video (multi-frame), including data from cameras, scanners, or other imaging technology, such as electron, ion, or scanning probe microscopes
Sensor data captured with remote sensing technology, such as satellite, RADAR (Radio Detection and Ranging), LiDAR (Light Detection and Ranging), or SAR (Synthetic Aperture Radar). A point cloud is an example of sensor data.
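
As a rough illustration of how these data types differ in shape, the sketch below models each one with plain nested Python lists. The toy dimensions and the helper name `shape` are assumptions made for this example; real pipelines typically use an array library and far larger data.

```python
# Illustrative only: tiny synthetic structures standing in for real visual data.

def shape(data):
    """Return the dimensions of a regularly nested list, e.g. [[1, 2], [3, 4]] -> (2, 2)."""
    dims = []
    while isinstance(data, list):
        dims.append(len(data))
        if not data:  # stop at an empty list
            break
        data = data[0]
    return tuple(dims)

HEIGHT, WIDTH, FRAMES = 4, 6, 5  # assumed toy sizes

def blank_image():
    """A 2-D image: rows x columns x color channels (RGB values 0-255)."""
    return [[[0, 0, 0] for _ in range(WIDTH)] for _ in range(HEIGHT)]

image = blank_image()

# Video (multi-frame): a sequence of frames, each frame an image.
video = [blank_image() for _ in range(FRAMES)]

# Point cloud (for example from LiDAR): a list of (x, y, z) points.
point_cloud = [[0.0, 1.5, 2.0], [0.1, 1.4, 2.2], [3.0, 0.2, 0.9]]

print(shape(image))        # (4, 6, 3)
print(shape(video))        # (5, 4, 6, 3)
print(shape(point_cloud))  # (3, 3)
```

The extra leading dimension is what distinguishes video from a single image, while a point cloud has no pixel grid at all, only coordinates.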

In supervised learning, data is annotated, or labeled, to teach the machine to recognize the objects it is designed to detect. In unsupervised learning, unlabeled data is used to find patterns in the data. There are hybrid machine learning models that allow you to use a combination of supervised and unsupervised learning.
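
The difference can be sketched with a toy example. Below, a nearest-centroid classifier stands in for supervised learning and a very small k-means routine stands in for unsupervised learning. Both algorithms, the two-number "features", and the labels `apple` and `leaf` are assumptions chosen for illustration; production computer vision models are far more complex.

```python
from math import dist  # Python 3.8+

def centroid(points):
    """Mean of a list of equal-length numeric tuples."""
    return tuple(sum(axis) / len(points) for axis in zip(*points))

# --- Supervised: every example carries a label supplied by an annotator ---
labeled = [((1.0, 1.0), "apple"), ((1.2, 0.8), "apple"),
           ((5.0, 5.2), "leaf"), ((4.8, 5.0), "leaf")]

def fit_nearest_centroid(examples):
    """Learn one centroid per label from labeled examples."""
    by_label = {}
    for features, label in examples:
        by_label.setdefault(label, []).append(features)
    return {label: centroid(pts) for label, pts in by_label.items()}

def predict(model, features):
    """Return the label whose centroid is closest to the features."""
    return min(model, key=lambda label: dist(model[label], features))

# --- Unsupervised: no labels, the algorithm only groups similar points ---
def k_means(points, k, iterations=10):
    """Very small k-means; the first k points seed the cluster centers."""
    centers = list(points[:k])
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(centers[i], p))
            clusters[nearest].append(p)
        # Keep the old center if a cluster ends up empty.
        centers = [centroid(c) if c else centers[i] for i, c in enumerate(clusters)]
    return clusters

model = fit_nearest_centroid(labeled)
print(predict(model, (1.1, 0.9)))  # apple

unlabeled = [features for features, _ in labeled]
print(k_means(unlabeled, k=2))     # two groups of points, with no names attached
```

The supervised model can name what it sees because the labels told it what the groups mean. The unsupervised routine finds the same two groups but cannot name them.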

How is data annotated for computer vision?

You can annotate images using data annotation tools you build yourself. Or, you can use commercially available, open source, or freeware tools. With computer vision, you’ll be working with a tremendous amount of data, so you’ll likely need a trained workforce to annotate the images.

Data annotation tools provide feature sets with various combinations of capabilities, which can be used by your workforce to annotate images or multi-frame images. Video can be annotated as a stream or frame by frame.

What annotation techniques are used in computer vision?

Annotating visual data for computer vision is called image annotation. Here are some of the most common image annotation techniques for computer vision:

Bounding box - This is used to draw a box around a target object in visual data. Bounding boxes can be 2-D or 3-D.
Landmarking (key point annotation) - This is used to plot characteristics in the data, such as eyes and nose in an image used for facial recognition.
Wireframe - This is a more complex version of landmarking that is used to annotate geometric features, straight lines, and their intersections to assemble 3-D structures within a scene.
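
To make these three techniques concrete, the sketch below shows one way the resulting annotations might be stored. The class names, fields, and pixel coordinates are hypothetical and do not follow the export format of any particular annotation tool. It assumes Python 3.9 or later.

```python
from dataclasses import dataclass, field

@dataclass
class BoundingBox:
    """2-D box given by its top-left corner plus width and height, in pixels."""
    label: str
    x: float
    y: float
    width: float
    height: float

    def area(self) -> float:
        return self.width * self.height

@dataclass
class Keypoint:
    """A single named landmark, such as an eye or the tip of the nose."""
    name: str
    x: float
    y: float

@dataclass
class Wireframe:
    """Landmarks (vertices) plus the straight lines (edges) that connect them."""
    vertices: list[Keypoint] = field(default_factory=list)
    # Each edge is a pair of indexes into `vertices`.
    edges: list[tuple[int, int]] = field(default_factory=list)

face_box = BoundingBox(label="face", x=40, y=30, width=120, height=160)
landmarks = [
    Keypoint("left_eye", 80, 90),
    Keypoint("right_eye", 130, 90),
    Keypoint("nose", 105, 125),
]
outline = Wireframe(vertices=landmarks, edges=[(0, 1), (1, 2), (2, 0)])

print(face_box.area())  # 19200
```

A bounding box records only where an object is, keypoints record individual features inside it, and a wireframe adds the connections between those features.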

Frequently Asked Questions

In supervised learning, humans are in the loop. They annotate, or label, visual data that can be used to teach the machine to recognize, and sometimes track, the objects it is designed to detect. In unsupervised learning, unlabeled data is used to find patterns in the data.

Convolutional neural networks (CNNs or ConvNets) are commonly used in deep learning for computer vision, along with other algorithms. Common applications of computer vision include AgTech (or farmtech) to optimize food production and distribution, medical AI for detecting disease, device security, and autonomous vehicles.
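
The core operation in a CNN can be sketched in a few lines. The example below slides a small kernel over a grayscale image, with no padding and a stride of 1. The function name `convolve2d`, the image, and the hand-written kernel are assumptions for illustration; in a trained network the kernel weights are learned from data rather than written by hand.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image with no padding and stride 1.

    As in most deep learning libraries, this is cross-correlation:
    the kernel is not flipped.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]

# 4x4 grayscale image: dark left half (0), bright right half (9).
image = [
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
]

# Hand-written kernel that responds to dark-to-bright vertical edges.
kernel = [
    [-1, 1],
    [-1, 1],
]

print(convolve2d(image, kernel))  # [[0, 18, 0], [0, 18, 0], [0, 18, 0]]
```

The output is large only in the middle column, where the image changes from dark to bright. A CNN stacks many such filters so that later layers can respond to increasingly complex patterns.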

One of the most cited references on this topic is the book Computer Vision: Algorithms and Applications by Richard Szeliski, based on his lectures at the University of Washington and Stanford University. Although the first edition dates from 2010, it remains an excellent resource for foundational knowledge about the algorithms and applications of computer vision.
