It’s safe to say the Internet of Things (IoT) era has arrived: things are being connected at a pace never seen before. Cars, video cameras, parking meters, building facilities and anything else one can think of are being connected to the internet, generating massive quantities of data.
The question is how one interprets that data and understands what it means. Clearly, trying to process this much data manually doesn’t work, which is why most web-scale companies have embraced artificial intelligence (AI) as a way to create new services that leverage the data. These include speech recognition, natural language processing, real-time translation, predictive services and contextual recommendations. Every major cloud provider and many large enterprises have AI initiatives underway.
However, many data centers aren’t outfitted with enough processing power for AI inferencing. For those not familiar with the different phases of AI, training is teaching the AI new capabilities from an existing set of data. Inferencing is applying that learning to new data sets. Facebook’s image recognition and Amazon’s recommendation engine are both good examples of inferencing.
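To make the two phases concrete, here is a toy sketch in Python. It is only an illustration under simplifying assumptions: a one-variable linear model stands in for a neural network, and the function names `train` and `infer` are hypothetical, not part of TensorRT or any Nvidia product. Training is the expensive loop that learns parameters from an existing, labeled data set; inferencing is the cheap step that applies those learned parameters to new data.

```python
def train(xs, ys, lr=0.01, epochs=5000):
    """Training: learn slope w and intercept b from existing labeled data
    using plain gradient descent on mean squared error (a toy stand-in
    for training a neural network)."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradients of the mean squared error with respect to w and b
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def infer(params, new_xs):
    """Inferencing: apply the already-learned parameters to new inputs.
    No learning happens here, just one multiply-add per input."""
    w, b = params
    return [w * x + b for x in new_xs]


if __name__ == "__main__":
    # Existing data set (generated from y = 2x + 1) used for training
    params = train([1, 2, 3, 4, 5], [3, 5, 7, 9, 11])
    # New data the model has never seen
    print(infer(params, [6, 10]))
```

Training runs thousands of passes over the data, while each inference is a single pass per input. In production that per-input work is repeated for every photo tagged or recommendation served, which is why inferencing throughput matters so much to data centers.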
This week at its GPU Technology Conference (GTC) in China, Nvidia announced TensorRT 3, which promises to improve the performance and cut the cost of inferencing. TensorRT 3 takes trained neural networks, however complex, and optimizes and compiles them to get the best possible inferencing performance. The graphic below shows that it acts as AI “middleware,” so models built in any framework can be deployed to any Nvidia GPU. In an earlier post, I explained why GPUs are much better suited to AI applications than CPUs. Nvidia offers a wide range of GPUs to match the type of application and the processing power required.
Unlike other GPU vendors, Nvidia doesn’t rely on great silicon alone. Instead, it takes an architectural approach, combining software, development tools and hardware into an end-to-end solution.
During his keynote, CEO Jensen Huang presented figures showing that TensorRT 3 running on Nvidia GPUs delivered 150x the performance of CPU-based systems for translation and 40x for images. If those gains hold up, they will save customers huge amounts of money and deliver a better quality of service. I have no way of proving or disproving those numbers, but I suspect they’re accurate, because no other vendor has a high-performance compiler, run-time engine and GPU optimized to work together.
Other Nvidia announcements
- DeepStream SDK introduced. It delivers low-latency, real-time video analytics. Video inferencing has become a key part of smart cities, and it is also being used in entertainment, retail and other industries.
- An upgrade to CUDA, Nvidia’s accelerated computing software platform. Version 9 is now optimized for the new Tesla V100 GPU accelerators, Nvidia’s highest-end GPUs, which are ideal for AI, HPC and graphics-intensive applications such as virtual reality.
- Huawei, Inspur and Lenovo are using Nvidia’s HGX reference architecture to offer Volta-based systems. The server manufacturers will get early access to the HGX architecture for data centers, along with design guidelines. HGX is the same architecture Microsoft and Facebook use today, meaning Asia-Pacific organizations can have access to the same GPU-based servers as the leading web-scale cloud providers.
The world is changing quickly, and it’s my belief that market leaders will be defined by the organizations that have the most data and the technologies to interpret that data. Core to that are GPU-based machine learning and AI systems, which can analyze data far faster than people can.