Tesla has unveiled the latest version of its Dojo supercomputer, which is powerful enough that it tripped the power grid in Palo Alto during testing.
The automaker already has a large NVIDIA GPU-based supercomputer that is one of the most powerful in the world. The custom-built Dojo, however, uses chips and supporting infrastructure designed entirely by Tesla engineers.
The purpose-built supercomputer is expected to enhance Tesla’s ability to train neural networks using video data, which is critical to its computer vision technology enabling autonomous driving.
The system was first presented at last year’s Tesla AI Day event. Back then, however, Tesla had only its first chip and training tiles, and the team was still working on building a complete Dojo cabinet and an “Exapod” cluster.
This year, Tesla has moved from a chip and a tile to a system tray and a full-fledged cabinet. Tesla has said it can replace six GPU units with a single Dojo tile, which the company claims costs less than one GPU unit. There are six such tiles on the tray:
One tray is claimed to be equivalent to “3-4 fully loaded supercomputer racks”. The company integrates its host interface directly into the system tray to create a large, complete host assembly:
Tesla can fit two of these system trays, together with their host assemblies, into a single Dojo cabinet.
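The figures Tesla quotes can be chained together to see what a single cabinet would amount to. The short Python sketch below does only that arithmetic. The constant and function names are illustrative, and the numbers are Tesla's own claims as reported here, not measured results.

```python
# Figures as claimed by Tesla in this article; none are independently measured.
GPUS_REPLACED_PER_TILE = 6   # "replace six GPU units with a single Dojo tile"
TILES_PER_TRAY = 6           # "six such tiles on the tray"
TRAYS_PER_CABINET = 2        # two system trays fit in one cabinet


def gpu_equivalents(tiles: int) -> int:
    """GPU units that `tiles` Dojo tiles would replace, per Tesla's claim."""
    return tiles * GPUS_REPLACED_PER_TILE


tiles_per_cabinet = TILES_PER_TRAY * TRAYS_PER_CABINET
per_tray = gpu_equivalents(TILES_PER_TRAY)
per_cabinet = gpu_equivalents(tiles_per_cabinet)

print(f"Per tray: {TILES_PER_TRAY} tiles, {per_tray} GPU equivalents")
print(f"Per cabinet: {tiles_per_cabinet} tiles, {per_cabinet} GPU equivalents")
```

Taken at face value, that is 36 GPU equivalents per tray and 72 per cabinet. This is a rough comparison only, since the article does not say which GPU model or workload the claim refers to.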
Here’s what the Dojo cabinet looks like, closed and open:
Tesla is still developing and testing the infrastructure needed to combine multiple cabinets into the first Dojo Exapod.
The company had to develop its own high-capacity cooling and power systems for the Dojo cabinets. While testing that infrastructure earlier this year, it even tripped the local power grid.
“Earlier this year, we began load testing our power and cooling infrastructure, and we managed to push it past 2 MW before we tripped our substation and received a call from the city,” says Bill Chang, Tesla's Principal System Engineer for Dojo.
And here is how the Tesla Dojo Exapod should look open and closed:
Tesla also released the key specifications of the Dojo Exapod: 1.1 exaFLOPS of compute, 1.3 TB of SRAM, and 13 TB of high-bandwidth DRAM.
1 Exapod = 1.1 exaFLOPs of machine learning compute pic.twitter.com/jfSX2BFmye
— Tesla (@Tesla) October 1, 2022
According to the company’s schedule, the first complete cluster will be ready in the first quarter of 2023. Tesla plans to install seven Dojo Exapods in Palo Alto.
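For a sense of scale, the published per-Exapod figures can be multiplied out across the seven planned units. The sketch below assumes that capacity simply adds up linearly across Exapods, which Tesla has not stated. The `ExapodSpec` class and its field names are hypothetical, introduced only for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExapodSpec:
    """Headline figures for one Dojo Exapod, as published by Tesla."""
    exaflops: float  # machine learning compute, in exaFLOPS
    sram_tb: float   # SRAM, in terabytes
    dram_tb: float   # high-bandwidth DRAM, in terabytes

    def scaled(self, count: int) -> "ExapodSpec":
        """Naive linear total for `count` Exapods (an assumption, not a Tesla figure)."""
        return ExapodSpec(
            exaflops=round(self.exaflops * count, 1),
            sram_tb=round(self.sram_tb * count, 1),
            dram_tb=round(self.dram_tb * count, 1),
        )


ONE_EXAPOD = ExapodSpec(exaflops=1.1, sram_tb=1.3, dram_tb=13.0)
PLANNED_EXAPODS = 7  # Palo Alto plan mentioned in the article

total = ONE_EXAPOD.scaled(PLANNED_EXAPODS)
print(f"{PLANNED_EXAPODS} Exapods: {total.exaflops} exaFLOPS, "
      f"{total.sram_tb} TB SRAM, {total.dram_tb} TB DRAM")
```

Under that linear assumption, seven Exapods would total roughly 7.7 exaFLOPS, 9.1 TB of SRAM, and 91 TB of DRAM. Real aggregate performance would depend on how the Exapods are interconnected and used.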
Musk says it makes sense to offer Dojo as a service, much as Amazon does with AWS, arguing that the service “will be available online, where companies can train their models faster and for less money.”
However, it’s Tesla that needs Dojo first and foremost to automatically tag training videos from its fleet and train its neural networks to build its own autonomous driving system.
Below is a recording of the Tesla AI Day presentation broadcast:
Also at the presentation, Elon Musk showed a prototype of the humanoid robot Optimus, which has a target price of $20,000 and is equipped with a 2.3 kWh battery, a Tesla SoC, and Wi-Fi and LTE modules.
Source: Electrek