18/08/2021

There’s a special indignity that comes with chasing an evasive mosquito through your house. That’s why one Israeli start-up has created what its founder dubs “the Iron Dome for mosquitoes”.

When Bzigo’s ‘autonomous mosquito detection solution’ spots a mosquito, it uses a low-energy laser to pinpoint the pest in situ and sends an alert to your phone. While the current version leaves dispatching the mosquito to you, the company is developing a model that will zap the pesky insects straight out of the sky!

And while this particular product has a certain ‘As Seen On TV’ novelty to it, the technology behind it, computer vision, also happens to be one of the most promising fields of AI.

Computer vision

Put simply, computer vision gives computers the gift of sight, enabling them to recognise what is going on in a given piece of visual media.

Stanford’s recent Artificial Intelligence Index Report identifies computer vision as one of the most rapidly industrialising fields of machine learning, with some computer vision systems reaching human-level performance in recent years. This accelerated development can be attributed to the shift toward more autonomous, deep learning methods, as well as better (and cheaper) hardware.

Top-1 accuracy tests how well an AI system can assign the correct label to an image: specifically, whether its single most probable prediction (out of all possible labels) matches the target label. On this measure, AI improved from around 70% accuracy in 2014 to 90.2% in 2021.
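
To make the metric concrete, here is a minimal sketch of how top-1 accuracy can be computed; the scores and labels below are invented purely for illustration.

```python
import numpy as np

def top1_accuracy(scores: np.ndarray, targets: np.ndarray) -> float:
    """Fraction of images whose single highest-scoring prediction
    matches the target label."""
    predictions = scores.argmax(axis=1)  # most probable label per image
    return float((predictions == targets).mean())

# Toy example: four images, three possible labels (scores are made up).
scores = np.array([
    [0.1, 0.7, 0.2],   # model's best guess: label 1
    [0.8, 0.1, 0.1],   # label 0
    [0.3, 0.3, 0.4],   # label 2
    [0.2, 0.5, 0.3],   # label 1
])
targets = np.array([1, 0, 1, 1])
print(top1_accuracy(scores, targets))  # 0.75 — three of four correct
```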

While the term ‘computer vision’ may seem like a new entrant in the mainstream digital lexicon, it encompasses and underpins a wide array of more familiar technological advancements. From facial recognition on our phones to object detection in most new cars, many applications of computer vision have become decidedly mainstream in the last decade. Below are three examples of computer vision in action:

Computer image generation

A subset of computer vision, computer image generation involves the creation of synthetic images that bear an uncanny resemblance to the “real” deal.

Deepfakes remain the most notable (or notorious) example of AI image generation, and generally involve superimposing one person’s face onto another person’s body. And while malicious uses of deepfakes for misinformation and the creation of (predominantly misogynistic) pornography capture most headlines, the Stanford report identifies many legitimate uses:

“Image generation systems have a variety of uses, ranging from augmenting search capabilities (it is easier to search for a specific image if you can generate other images like it) to serving as an aid for other generative uses (e.g., editing images, creating content for specific purposes, generating multiple variations of a single image to help designers brainstorm, and so on).”

To get a sense of progress, the Stanford Index compiled the following evolution in the quality of synthetically generated images over time, with a computer-generated photo (i.e. not of a real human) for each year from 2014 to 2020.

Image: a series of AI-generated faces, one per year from 2014 to 2020, growing steadily more realistic.

Thankfully, AI has also become much better at recognising deepfakes generated by other AI systems. In one study, AI’s ability to detect deepfakes improved two- to three-fold over the course of the first quarter of 2021 alone. It is not clear whether the ‘good’ AI or the ‘bad’ AI will win this race.

What human activity is hardest for AI to recognise?

Activity recognition measures how well AI can label and categorise human behaviours in videos. Progress here is much slower than with still images, and AI still struggles to ‘understand’ videos of human activity that are only a few seconds long.

The hardest human activity for AI to recognise is drinking coffee, with a mean average precision in recognition of just 6% in 2020, and little improvement since. AI is roughly twice as good at recognising taking curlers out, running a marathon, shot put and throwing darts, but all still with a mean average precision below 15%.
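
For readers curious what ‘mean average precision’ measures here: for each activity class, the system’s predictions are ranked by confidence, precision is averaged over the ranks where true examples appear, and the result is then averaged across all classes. A minimal sketch of the common ranked-retrieval form of average precision (benchmarks differ in the fine details), with invented scores:

```python
import numpy as np

def average_precision(scores: np.ndarray, is_positive: np.ndarray) -> float:
    """AP for one class: mean precision at each rank where a true example sits."""
    order = np.argsort(-scores)              # rank predictions by confidence
    hits = is_positive[order].astype(float)
    precision_at_k = np.cumsum(hits) / np.arange(1, len(hits) + 1)
    return float((precision_at_k * hits).sum() / hits.sum())

# Toy class 'drinking coffee': five video clips, two truly show the activity.
scores = np.array([0.9, 0.8, 0.6, 0.4, 0.2])   # model confidence per clip
is_positive = np.array([0, 1, 0, 1, 0])        # ground truth per clip
print(average_precision(scores, is_positive))  # 0.5

# Mean average precision is simply the mean of AP across all activity classes.
```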

AI in the real visual world

The day-to-day world in which we humans live presents AI with dynamic, complex, at times bewildering visual environments. Semantic segmentation is the task of assigning every pixel in an image to a particular label, such as ‘person’ or ‘cat’. It is a foundational input technology for self-driving cars (identifying and isolating objects on roads), image analysis, medical applications, and more.
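
As a rough illustration of what this looks like in practice (not the benchmark setup used in the report), a minimal sketch using torchvision’s pretrained DeepLabV3 model; the file name photo.jpg is a placeholder:

```python
import torch
from PIL import Image
from torchvision import models, transforms

# Load a pretrained semantic segmentation model (21 Pascal VOC classes,
# including 'person', 'cat', 'car', ...).
model = models.segmentation.deeplabv3_resnet50(pretrained=True).eval()

preprocess = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],   # standard ImageNet stats
                         std=[0.229, 0.224, 0.225]),
])

image = Image.open("photo.jpg").convert("RGB")   # placeholder image path
batch = preprocess(image).unsqueeze(0)           # shape [1, 3, H, W]

with torch.no_grad():
    logits = model(batch)["out"]                 # shape [1, 21, H, W]

# One class label per pixel — this is the semantic segmentation map.
label_map = logits.argmax(dim=1).squeeze(0)      # shape [H, W]
print(label_map.shape, label_map.unique())
```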

One test of AI’s ability to handle streetscapes is scored using ‘intersection over union’ (IoU): the overlap between the region a model predicts for an object and the region the object actually occupies, divided by the total area covered by either; a higher IoU score is better. Between 2014 and 2020, the mean IoU increased by 35%, to 85.1%. Pretty good, but probably still short of the standard needed to let AI loose on our streets.
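
IoU itself is simple to compute. A minimal sketch on toy binary masks (one mask per class, invented here for illustration):

```python
import numpy as np

def iou(pred: np.ndarray, truth: np.ndarray) -> float:
    """Intersection over union of two binary masks (1 = pixel belongs to class)."""
    intersection = np.logical_and(pred, truth).sum()
    union = np.logical_or(pred, truth).sum()
    return float(intersection / union)

# Toy 4x4 masks for one class, e.g. 'road'.
pred = np.array([[1, 1, 0, 0],
                 [1, 1, 0, 0],
                 [0, 0, 0, 0],
                 [0, 0, 0, 0]])
truth = np.array([[1, 1, 1, 0],
                  [1, 1, 1, 0],
                  [0, 0, 0, 0],
                  [0, 0, 0, 0]])
print(iou(pred, truth))  # 4 / 6 ≈ 0.67

# Mean IoU averages this score across every class in the scene.
```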

Imperfect vision

Computer vision is far from a fleeting trend: Forbes reports that the market is expected to reach US$48.6 billion by next year. But reporting on computer vision hasn’t been all mosquito weapons and shiny valuations.

When AI suffers a fall from grace, computer vision is usually at the root of the controversy. It has been almost five years since a computer vision algorithm charged with judging an international beauty contest turned out to have an undeniable bias toward light-skinned entrants, though you may be more familiar with the time Google Photos inadvertently labelled BIPOC (Black, Indigenous, and People of Colour) people as primates. The threat posed by biased computer vision extends beyond offensive mislabelling, too: autonomous vehicles, for example, are more likely to be involved in a traffic accident with BIPOC pedestrians, as well as with people with disabilities.

Given how these issues underscore the faults of computer vision, it’s ominous that the Stanford report shows a rapid commercialisation of facial recognition:

“Facial detection and recognition is one of the use-cases for AI that has a sizable commercial market and has generated significant interest from governments and militaries. Therefore, progress in this category gives us a sense of the rate of advancement in economically significant parts of AI development.”

Conclusion

As the Stanford report notes, ‘[i]n spite of the progress in performance…, current computer vision systems are still not perfect.’ It is certainly incumbent on those wishing to commercialise computer vision to properly identify and remediate its blind spots. But there are circumstances where it’s not just a question of whether AI is seeing the full picture, but whether AI should be relied upon to see at all.

While it may be neat to shoot laser beams at mosquitoes, the most impressive developments in the future of computer vision will be those that genuinely tackle its blind spots and potential for harm.


Read more: 2021 AI Index Report