
With NVIDIA’s fall GTC event in full swing, the company touched upon most of its core businesses in one way or another in this morning’s keynote. On the enterprise side of things, one of the longest-awaited updates was the shipment status of NVIDIA’s H100 “Hopper” accelerator, which at its introduction was slated to land in Q3 of this year. As it turns out, with Q3 already nearly over, H100 isn’t going to make its Q3 availability date. However, according to NVIDIA the accelerator is in full production, and the first systems will be shipping from OEMs in October.
First revealed back in March at NVIDIA’s annual spring GTC event, the H100 is NVIDIA’s next-generation high-performance accelerator for servers, hyperscalers, and similar markets. Based on the Hopper architecture and built on TSMC’s 4nm “4N” process, H100 is the follow-up to NVIDIA’s very successful A100 accelerator. Among other changes, the company’s latest accelerator implements HBM3 memory, support for transformer models within its tensor cores, support for dynamic programming, an updated version of multi-instance GPU with more robust isolation, and a whole lot more computational throughput for both vector and tensor datatypes. Based around NVIDIA’s hefty 80 billion transistor GH100 GPU, the H100 accelerator is also pushing the envelope in terms of power consumption, with a maximum TDP of 700 Watts.
Given that NVIDIA’s spring GTC event didn’t precisely align with their manufacturing window for this generation, the H100 announcement earlier this year stated that NVIDIA would be shipping the first H100 systems in Q3. However, NVIDIA’s updated delivery goals outlined today mean that the Q3 date has slipped. The good news is that H100 is in “full production”, as NVIDIA terms it. The bad news is that it would seem production and integration didn’t start quite on time; at this point the company doesn’t expect the first production systems to reach customers until October, the start of Q4.
Throwing a further spanner into things, the order in which systems and products are rolling out is essentially being reversed from NVIDIA’s usual strategy. Rather than starting with systems based on their highest-performance SXM form factor parts first, NVIDIA’s partners are instead starting with the lower-performing PCIe cards. That is to say that the first systems shipping in October will be using the PCIe cards, and it will only be later in the year that NVIDIA’s partners ship systems that integrate the faster SXM modules and their HGX carrier board.
NVIDIA Accelerator Specification Comparison

| | H100 SXM | H100 PCIe | A100 SXM | A100 PCIe |
|---|---|---|---|---|
| FP32 CUDA Cores | 16896 | 14592 | 6912 | 6912 |
| Tensor Cores | 528 | 456 | 432 | 432 |
| Boost Clock | ~1.78GHz (not finalized) | ~1.64GHz (not finalized) | 1.41GHz | 1.41GHz |
| Memory Clock | 4.8Gbps HBM3 | 3.2Gbps HBM2e | 3.2Gbps HBM2e | 3.0Gbps HBM2e |
| Memory Bus Width | 5120-bit | 5120-bit | 5120-bit | 5120-bit |
| Memory Bandwidth | 3TB/sec | 2TB/sec | 2TB/sec | 2TB/sec |
| VRAM | 80GB | 80GB | 80GB | 80GB |
| FP32 Vector | 60 TFLOPS | 48 TFLOPS | 19.5 TFLOPS | 19.5 TFLOPS |
| FP64 Vector | 30 TFLOPS | 24 TFLOPS | 9.7 TFLOPS (1/2 FP32 rate) | 9.7 TFLOPS (1/2 FP32 rate) |
| INT8 Tensor | 2000 TOPS | 1600 TOPS | 624 TOPS | 624 TOPS |
| FP16 Tensor | 1000 TFLOPS | 800 TFLOPS | 312 TFLOPS | 312 TFLOPS |
| TF32 Tensor | 500 TFLOPS | 400 TFLOPS | 156 TFLOPS | 156 TFLOPS |
| FP64 Tensor | 60 TFLOPS | 48 TFLOPS | 19.5 TFLOPS | 19.5 TFLOPS |
| Interconnect | NVLink 4, 18 Links (900GB/sec) | NVLink 4 (600GB/sec) | NVLink 3, 12 Links (600GB/sec) | NVLink 3, 12 Links (600GB/sec) |
| GPU | GH100 (814mm²) | GH100 (814mm²) | GA100 (826mm²) | GA100 (826mm²) |
| Transistor Count | 80B | 80B | 54.2B | 54.2B |
| TDP | 700W | 350W | 400W | 300W |
| Manufacturing Process | TSMC 4N | TSMC 4N | TSMC 7N | TSMC 7N |
| Interface | SXM5 | PCIe 5.0 (dual slot) | SXM4 | PCIe 4.0 (dual slot) |
| Architecture | Hopper | Hopper | Ampere | Ampere |
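As a sanity check on the table, the headline memory bandwidth and FP32 vector figures fall out of two standard back-of-the-envelope formulas. The sketch below is ours, not an official NVIDIA calculation; in particular, the assumption of one FMA (2 FLOPs) per CUDA core per clock is a rule of thumb.

```python
# Back-of-the-envelope checks on figures from the spec table above.
# Assumed rules of thumb (not official NVIDIA formulas):
#   peak bandwidth (GB/sec) = bus width (bits) x per-pin rate (Gbps) / 8
#   FP32 vector TFLOPS      = CUDA cores x 2 FLOPs/clock (FMA) x boost clock (GHz) / 1000

def peak_bandwidth_gbs(bus_width_bits: int, pin_rate_gbps: float) -> float:
    """Peak memory bandwidth in GB/sec."""
    return bus_width_bits * pin_rate_gbps / 8

def fp32_vector_tflops(cuda_cores: int, boost_clock_ghz: float) -> float:
    """Peak FP32 vector throughput in TFLOPS, assuming one FMA (2 FLOPs) per core per clock."""
    return cuda_cores * 2 * boost_clock_ghz / 1000

# H100 SXM: 5120-bit bus of 4.8Gbps HBM3 -> 3072 GB/sec, i.e. the ~3TB/sec in the table
print(peak_bandwidth_gbs(5120, 4.8))
# H100 PCIe / A100 SXM: 5120-bit bus of 3.2Gbps HBM2e -> 2048 GB/sec, ~2TB/sec
print(peak_bandwidth_gbs(5120, 3.2))
# H100 SXM: 16896 cores at ~1.78GHz -> ~60 TFLOPS; A100: 6912 at 1.41GHz -> ~19.5 TFLOPS
print(fp32_vector_tflops(16896, 1.78))
print(fp32_vector_tflops(6912, 1.41))
```

Note that H100 SXM’s boost clock is still listed as not finalized, so the FP32 figure only lands near 60 TFLOPS rather than exactly on it.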
Meanwhile, NVIDIA’s flagship DGX systems, which are based on their HGX platform and are typically among the very first systems to ship, are now going to be among the last. NVIDIA is opening pre-orders for DGX H100 systems today, with delivery slated for Q1 of 2023 – 4 to 7 months from now. This is good news for NVIDIA’s server partners, who in the last couple of generations have had to wait to go after NVIDIA, but it also means that H100 as a product won’t be able to put its best foot forward when it starts shipping in systems next month.
In a pre-briefing with the press, NVIDIA didn’t offer a detailed explanation as to why H100 has ended up delayed. Though speaking at a high level, company representatives did state that the delay was not for component reasons. Meanwhile, the company cited the relative simplicity of the PCIe cards as the reason PCIe systems are shipping first; these are largely plug-and-play within generic PCIe infrastructure, whereas the H100 HGX/SXM systems are more complex and took longer to finish.
There are some notable feature differences between the two form factors as well. The SXM version is the only one that uses HBM3 memory (the PCIe version uses HBM2e), and the PCIe version requires fewer working SMs (114 vs. 132). So there is some wiggle room here for NVIDIA to hide early yield issues, if indeed that is even a factor.
Complicating matters for NVIDIA, the CPU side of DGX H100 is based on Intel’s repeatedly delayed 4th generation Xeon Scalable processors (Sapphire Rapids), which at the moment still do not have a launch date fully nailed down. Less optimistic projections have it launching in Q1, which does align with NVIDIA’s own delivery date – though this may very well just be coincidence. Either way, the lack of general availability for Sapphire Rapids isn’t doing NVIDIA any favors here.
Ultimately, with NVIDIA unable to ship DGX until next year, NVIDIA’s server partners aren’t only going to beat them to the punch with PCIe-based systems, but they will be the first out the door with HGX-based systems as well. Presumably these initial systems will be using current-generation hosts, or possibly AMD’s Genoa platform if it’s ready in time. Among the companies slated to ship H100 systems are the usual suspects, including Supermicro, Dell, HPE, Gigabyte, Fujitsu, Cisco, and Atos.
Meanwhile, for customers who are eager to try out H100 before they buy any hardware, H100 is now available on NVIDIA’s LaunchPad service.
Finally, while we’re on the subject of H100, NVIDIA is also using this week’s GTC to announce an update to licensing for their NVIDIA AI Enterprise software stack. H100 now comes with a 5-year license for the software, which is notable since a 5-year subscription normally runs $8,000 per CPU socket.