Hot Chips 2020 Live Blog: Baidu Kunlun AI Processor (4:30pm PT)

AnandTech Live Blog: The newest updates are at the top.

07:55PM EDT – Q: Is INT4 throughput the same as INT8? A: INT4 is the same as INT8, but INT4 can leverage more of the capabilities

07:54PM EDT – Q: hardware image/video decode? A: No

07:53PM EDT – Q&A time

07:52PM EDT – Available in Baidu Cloud

07:52PM EDT – Mask RCNN

07:51PM EDT – Mask inspection

07:51PM EDT – big edge = industrial

07:51PM EDT – These benchmarks are very odd

07:48PM EDT – 256 TOPs for 4096x4096x4096 GEMM INT8 inference
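
(Editor's back-of-envelope, not from the talk: to put the GEMM figure in perspective, the sketch below counts the operations in one 4096x4096x4096 matrix multiply and the time it would take at the quoted 256 TOPs. It assumes one multiply-accumulate counts as two operations, which is a common convention but was not stated on the slide; the function names are purely illustrative.)

```python
# Back-of-envelope: one 4096x4096x4096 INT8 GEMM at the quoted 256 TOPs.
# Assumption (not from the talk): 1 multiply-accumulate = 2 operations.

def gemm_ops(m: int, n: int, k: int) -> int:
    """Operation count for an (m x k) @ (k x n) multiply, 2 ops per MAC."""
    return 2 * m * n * k


def ideal_time_s(ops: int, rate_tops: float) -> float:
    """Time in seconds to execute `ops` operations at `rate_tops` tera-ops/s."""
    return ops / (rate_tops * 1e12)


ops = gemm_ops(4096, 4096, 4096)
t = ideal_time_s(ops, 256.0)
print(f"{ops:,} ops, about {t * 1e3:.3f} ms at 256 TOPs")
```

Under that assumption a single GEMM of this size is roughly 0.14 tera-operations, so it would complete in about half a millisecond at the quoted rate.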

07:48PM EDT – XPU C/C++ for custom kernels

07:47PM EDT – Supports PaddlePaddle, TensorFlow, PyTorch

07:47PM EDT – Graph compiler

07:47PM EDT – (what are the tiny cores?)

07:46PM EDT – Each unit has 16 MB on-chip memory

07:46PM EDT – Each cluster has 16 tiny cores

07:46PM EDT – XPU-Cluster does scalar and vector

07:46PM EDT – XPU-SDNN does tensor and vector


07:45PM EDT – Software defined neural network engine

07:45PM EDT – XPU cluster

07:45PM EDT – Same layout as XPUv1 shown at Hot Chips 2017

07:44PM EDT – Passive cooling

07:44PM EDT – 16 GB HBM

07:44PM EDT – 256 TOPs for INT8

07:43PM EDT – PCIe card

07:43PM EDT – 150W / 256 TOPs
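
(Editor's arithmetic, not from the slide: the two headline numbers give a peak efficiency figure. This is a quick sketch that takes both values at face value as peak INT8 throughput and board power; the helper name is illustrative.)

```python
# Peak efficiency from the two quoted figures (both taken at face value).
PEAK_INT8_TOPS = 256.0   # quoted peak INT8 throughput
BOARD_POWER_W = 150.0    # quoted power


def tops_per_watt(tops: float, watts: float) -> float:
    """Peak tera-operations per second delivered per watt."""
    return tops / watts


eff = tops_per_watt(PEAK_INT8_TOPS, BOARD_POWER_W)
print(f"{eff:.2f} TOPs/W peak")
```

That works out to roughly 1.7 TOPs per watt at peak.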

07:43PM EDT – PCIe 4.0 x8

07:43PM EDT – Interposer package, 2 HBM, 512 GB/s
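
(Editor's note, not from the talk: a rough sketch of what the memory figures imply. It assumes the 512 GB/s is the aggregate across both HBM stacks and uses the 256 TOPs INT8 peak quoted elsewhere in this blog; the helper names are illustrative.)

```python
# Assumption: 512 GB/s is the total across both HBM stacks.
HBM_STACKS = 2
TOTAL_BW_GBPS = 512.0    # GB/s, quoted
PEAK_INT8_TOPS = 256.0   # tera-ops/s, quoted elsewhere in the blog


def per_stack_bandwidth(total_gbps: float, stacks: int) -> float:
    """Bandwidth per HBM stack in GB/s, assuming an even split."""
    return total_gbps / stacks


def ops_per_byte(peak_tops: float, bw_gbps: float) -> float:
    """Operations needed per byte of HBM traffic to stay compute-bound at peak."""
    return (peak_tops * 1e12) / (bw_gbps * 1e9)


per_stack = per_stack_bandwidth(TOTAL_BW_GBPS, HBM_STACKS)
balance = ops_per_byte(PEAK_INT8_TOPS, TOTAL_BW_GBPS)
print(f"{per_stack:.0f} GB/s per stack, {balance:.0f} ops per byte")
```

Under these assumptions each stack supplies 256 GB/s, and a kernel needs on the order of 500 operations per byte fetched from HBM to reach the peak rate, which is why large on-chip memory matters for a design like this.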

07:43PM EDT – Samsung Foundry 14nm

07:43PM EDT – Now some detail

07:42PM EDT – (the presenter is a bit slow fyi)

07:39PM EDT – 256 TOPs in 2019

07:38PM EDT – Moved from FPGA to ASIC

07:38PM EDT – Need flexible, programmable, high performance

07:38PM EDT – Kunlun (Kun-loon)

07:36PM EDT – Design and implementation

07:36PM EDT – The challenge is the type of compute

07:36PM EDT – Try to explore market volume as much as possible

07:35PM EDT – High-end AI chips cost a lot to create

07:34PM EDT – Traditional AI computing is performed in Cloud, Datacenter, HPC, Smart Industry, Smart City

07:33PM EDT – All these systems are a priority inside Baidu

07:33PM EDT – NLP = Natural Language Processing

07:33PM EDT – Need a processor to cover a diversified AI workflow

07:32PM EDT – Baidu and Samsung build the chip together

07:30PM EDT – We first heard about Baidu’s Kunlun a few months ago, when a press release from the company and Samsung stated that the silicon uses Interposer-Cube 2.5D packaging and HBM2, packing 260 TOPs into 150 W.

07:29PM EDT – Last session of Hot Chips is all about ML inference, starting with Baidu and its Kunlun AI processor.