Computer vision is fundamental for a broad set of Internet of Things (IoT) applications. Household monitoring systems use cameras to provide family members with a view of what's going on at home. Robots and drones use vision processing to map their environment and avoid obstacles in flight. Augmented reality glasses use computer vision to overlay important information on the user's view, and cars stitch images from multiple cameras mounted on the vehicle to provide drivers with a surround or "bird's eye" view that helps prevent collisions. The list goes on.
Over the years, exponential improvements in device capabilities, including computing power, memory capacity, power consumption, image sensor resolution, and optics, have improved the performance and cost-effectiveness of computer vision in IoT applications. This has been accompanied by the development and refinement of sophisticated software algorithms for tasks such as face detection and recognition, object detection and classification, and simultaneous localization and mapping (SLAM).
The rise and challenges of machine learning
More recently, advancements in artificial intelligence (AI), particularly in deep learning, have further accelerated the proliferation of vision-based applications in the IoT. Compared to traditional computer vision techniques, deep learning provides IoT developers with greater accuracy in tasks such as object classification. Since the neural networks used in deep learning are "trained" rather than "programmed," applications using this approach are often easier to develop and take better advantage of the enormous amount of imaging and video data available in today's systems. Deep learning also provides superior versatility, because neural network research and frameworks can be reused across a larger variety of use cases than computer vision algorithms, which tend to be more purpose-specific.
But the benefits delivered by deep learning don't come without trade-offs and challenges. Deep learning requires an enormous amount of computing resources, for both the training and inference stages. Recent research shows a tight relationship between a deep learning model's accuracy and the compute it requires: going from 75% to 80% accuracy in a vision-based application can require billions of additional math operations.
Vision processing results using deep learning also depend on image resolution. Achieving adequate performance in object classification, for example, requires high-resolution images or video, with a consequent increase in the amount of data that needs to be processed, stored, and transferred. Image resolution is especially important for applications that must detect and classify objects at a distance, such as enterprise security cameras.
Mixing computer vision with machine learning for better performance
There are clear trade-offs between traditional computer vision and deep learning-based approaches. Classic computer vision algorithms are mature, proven, and optimized for performance and power efficiency, while deep learning offers greater accuracy and versatility, but demands large amounts of computing resources.
Those looking to implement high performance systems quickly are finding that hybrid approaches, which combine traditional computer vision and deep learning, can offer the best of both worlds. For example, in a security camera, a computer vision algorithm can efficiently detect faces or moving objects in the scene. Then, a smaller segment of the image where the face or object was detected is processed through deep learning for identity verification or object classification – saving significant computing resources compared to using deep learning over the entire scene, on every frame.
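The gating idea behind this hybrid approach can be illustrated with a minimal, self-contained sketch. This is not Qualcomm's implementation; it is a hypothetical toy in which a cheap frame-differencing stage (standing in for a classic computer vision detector) decides where changes occurred, and only those regions are passed to an expensive classifier (standing in for deep learning inference). Frames are modeled as small grids of grayscale values; the function names and thresholds are all illustrative assumptions.

```python
# Hybrid pipeline sketch (illustration only): a cheap CV stage gates an
# expensive "deep learning" stage so the heavy model runs only on regions
# of interest, not the whole frame.

def detect_motion(prev_frame, frame, threshold=10):
    """Cheap CV stage: return (row, col) positions whose pixel value
    changed by more than `threshold` between two grayscale frames."""
    return [
        (r, c)
        for r, row in enumerate(frame)
        for c, px in enumerate(row)
        if abs(px - prev_frame[r][c]) > threshold
    ]

def classify(frame, region):
    """Stand-in for an expensive deep learning inference on one region.
    A real system would crop the region and run a neural network here."""
    r, c = region
    return "object" if frame[r][c] > 128 else "background"

def hybrid_pipeline(prev_frame, frame):
    """Run the heavy classifier only where the cheap stage found changes,
    saving the cost of classifying every pixel of every frame."""
    return {region: classify(frame, region)
            for region in detect_motion(prev_frame, frame)}

# Example: a 3x3 "frame" in which a single bright pixel appeared.
prev = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
curr = [[0, 0, 0], [0, 200, 0], [0, 0, 0]]
print(hybrid_pipeline(prev, curr))  # {(1, 1): 'object'}
```

In this toy, an unchanged frame triggers zero classifier calls, which is the whole point: the expensive stage's cost scales with scene activity rather than with frame size and frame rate.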
At the Embedded Vision Europe conference in October, I presented a hybrid vision processing implementation from Qualcomm Technologies, which combines computer vision and deep learning. The hybrid approach delivers a 130X-1,000X reduction in multiply-accumulate operations and about 10X improvement in frame rates compared to a pure deep learning solution. Furthermore, the hybrid implementation uses about half of the memory bandwidth and requires significantly lower CPU resources. This is a significant performance advantage for manufacturers and developers choosing to implement this strategy.
Making best use of edge computing
Just like pure deep learning, hybrid approaches to vision processing take great advantage of the heterogeneous computing capabilities available at the edge. A heterogeneous compute architecture helps improve vision processing performance and power efficiency by assigning each workload to the most efficient compute engine. Test implementations show a 10x latency reduction in object detection when deep learning inferences are executed on a DSP rather than a CPU.
Running algorithms and neural network inferences on the IoT device itself also helps lower latency and bandwidth requirements compared to cloud-based implementations. Edge computing can also reduce costs by cutting cloud storage and processing requirements, while protecting user privacy and security by avoiding transmission of sensitive or identifiable data over the network.
Deep learning innovations, along with hybrid techniques that combine them with traditional algorithms, are driving exciting breakthroughs for the IoT. Vision processing is just a start, as the same principles can be applied to other areas such as audio analytics. As devices on the edge get smarter and more capable, innovators can start building products and applications never before possible. These are truly exciting times for the IoT.
This article is published as part of the IDG Contributor Network.