When the Google Pixel 2 came out, it was extremely impressive. Despite having only a single rear camera, it beat out everything else on DxOMark and held that position for quite a while. When the Pixel 3 was announced, it too had only a single rear camera, and despite one or two issues with the phone overall, Google made some great improvements with it. Google has published a blog post explaining how it taught the Pixel 3 to predict depth for its Portrait Mode. Given that the phone has just one camera, rather than the dual cameras of other brands, this seems like it shouldn’t be possible, but the techniques used are pretty interesting, and they involve a case that holds five separate cameras.

Essentially, the Pixel 3’s portrait mode is based on a lot of very AI-educated guesswork combined with phase-detection autofocus (PDAF). It works a little like the Dual Pixel autofocus in Canon DSLRs and mirrorless cameras: each pixel on the sensor is split into two photodiodes, and the slight offset between them captures two very slightly different views of the scene.
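
To make that idea concrete, here’s a minimal sketch, assuming NumPy and a toy block-matching approach, of how two nearly identical views (like the pair a dual-pixel sensor produces) can be turned into a rough disparity map. This illustrates the general technique, not the Pixel’s actual algorithm, and the function name, block size, and search range are all made up.

```python
import numpy as np

def disparity_map(left, right, block=8, max_disp=4):
    """Estimate per-block horizontal disparity between two grayscale views.

    left, right: 2D float arrays of the same shape (the two sub-pixel views).
    block:       size of the square patch compared between the views.
    max_disp:    largest horizontal shift searched; PDAF baselines are tiny,
                 so only a few pixels of disparity are expected.
    """
    h, w = left.shape
    disp = np.zeros((h // block, w // block))
    for by in range(h // block):
        for bx in range(w // block):
            y, x = by * block, bx * block
            patch = left[y:y + block, x:x + block]
            best_cost, best_d = np.inf, 0
            for d in range(max_disp + 1):
                if x + d + block > w:
                    break
                cand = right[y:y + block, x + d:x + d + block]
                cost = np.sum((patch - cand) ** 2)  # sum of squared differences
                if cost < best_cost:
                    best_cost, best_d = cost, d
            disp[by, bx] = best_d  # larger disparity roughly means closer
    return disp

# Toy usage: a bright square shifted two pixels between the views.
left = np.zeros((32, 32)); left[8:16, 8:16] = 1.0
right = np.zeros((32, 32)); right[8:16, 10:18] = 1.0
print(disparity_map(left, right))
```

With such a tiny baseline between the two views, real disparities are small and noisy, which is exactly why the raw PDAF signal needed help from a trained network.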

This was the system introduced on the Pixel 2 and Pixel 2 XL, and it was very good, but it wasn’t perfect. To fix some of its issues and overcome the limitations of the PDAF system, Google trained a neural network to predict depth from what it sees in the scene. But to teach this network, they used a weird contraption; the sketch below gives a rough idea of what that kind of training looks like.
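
This is a minimal sketch, assuming PyTorch, a made-up tiny architecture, and random stand-in data, of the general setup: the two PDAF views go in as input channels, and depth maps captured by the multi-camera rig described below act as the training target. None of this is Google’s actual model or code.

```python
import torch
from torch import nn

# Hypothetical batch: two PDAF half-views per example as 64x64 crops,
# plus a matching "ground truth" depth map from the multi-camera rig.
pdaf_views = torch.rand(8, 2, 64, 64)   # (batch, views, height, width)
rig_depth = torch.rand(8, 1, 64, 64)    # target depth for each crop

model = nn.Sequential(                  # tiny stand-in for the real CNN
    nn.Conv2d(2, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(16, 1, kernel_size=3, padding=1),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for step in range(100):                 # nudge the network toward the rig's depth
    optimizer.zero_grad()
    predicted = model(pdaf_views)
    loss = loss_fn(predicted, rig_depth)
    loss.backward()
    optimizer.step()
```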

The case above holds five Pixel 3 phones and has a WiFi-based triggering system that lets the operator capture a scene with all five cameras simultaneously (within about 2 milliseconds of each other). Google can then feed the network sets of images showing the same moment in time from multiple angles, helping it learn to separate foreground from background elements. The middle animated image above shows five such images. The shot on the right is the generated depth map, with black pixels marking areas of low confidence (basically, there wasn’t enough detail to track between the images), an idea the sketch below illustrates in toy form. With the phone trained to recognise not only objects but also their scale, and with the tiny parallax between PDAF pixels, the Pixel 3 is much more accurate than its predecessor at understanding depth. The comparison image below shows the more traditional stereo depth map versus the one produced by the trained network.
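
Here is a minimal sketch of the “low confidence” idea, not Google’s pipeline: it uses local texture variance as a stand-in for “enough detail to track between the images” and blacks out depth values wherever the reference image is nearly flat. The function name, window size, and threshold are all assumptions.

```python
import numpy as np

def mask_low_confidence(depth, reference, window=5, min_variance=1e-3):
    """Zero out depth estimates where the reference image has little texture.

    depth:        2D array of estimated depths (e.g. from multi-view matching).
    reference:    2D grayscale image the depths were computed from.
    window:       side length of the neighbourhood used to judge texture.
    min_variance: patches flatter than this are treated as untrackable and
                  their depth is set to 0 (rendered black).
    """
    h, w = reference.shape
    confident = np.zeros_like(depth, dtype=bool)
    half = window // 2
    for y in range(half, h - half):          # borders stay masked for simplicity
        for x in range(half, w - half):
            patch = reference[y - half:y + half + 1, x - half:x + half + 1]
            confident[y, x] = patch.var() >= min_variance
    return np.where(confident, depth, 0.0)

# Toy usage: textured left half, featureless right half.
rng = np.random.default_rng(0)
ref = np.zeros((20, 20)); ref[:, :10] = rng.random((20, 10))
depth = np.full((20, 20), 2.5)           # pretend everything matched at 2.5 m
masked = mask_low_confidence(depth, ref)
print(masked[10, 3], masked[10, 16])     # 2.5 (textured) vs 0.0 (flat, masked)
```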

What’s most impressive of all, though, is that this is all calculated in a fraction of a second on a supercomputer that fits in our pockets. Head on over to the Google blog to find out more about how it works. [via Android Police]