We’ve been using compute-intensive machine learning in our products for the past 15 years. We use it so much that we even designed an entirely new class of custom machine learning accelerator, the Tensor Processing Unit. Just how fast is the TPU, actually? Today, in conjunction with a TPU talk for a National Academy of Engineering meeting at the Computer History Museum in Silicon Valley, we’re releasing a study that shares new details on these custom chips, which have been running machine learning applications in our data centers since 2015. This first generation of TPUs targeted inference (the use of an already trained model, as opposed to the training phase of a model, which has somewhat different characteristics), and here are some of the results we’ve seen:
- On our production AI workloads that utilize neural network inference, the TPU is 15x to 30x faster than contemporary GPUs and CPUs.
- The TPU also achieves much better energy efficiency than conventional chips, achieving 30x to 80x improvement in TOPS/Watt measure (tera-operations [trillion or 1012 operations] of computation per Watt of energy consumed).
- The neural networks powering these applications require a surprisingly small amount of code: just 100 to 1500 lines. The code is based on TensorFlow, our popular open-source machine learning framework.
- More than 70 authors contributed to this report. It really does take a village to design, verify, implement and deploy the hardware and software of a system like this.