Hot Chips 2020 Live Blog: Manticore 4096-core RISC-V (3:30pm PT)

06:56PM EDT – This is only a small prototype, not the full chiplet

06:56PM EDT – Forward Body Biasing

06:56PM EDT – 22nm FDX

06:56PM EDT – 9 mm² prototype made

06:55PM EDT – Measured performance closely tracks the roofline model
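A quick refresher on what tracking the roofline means: attainable throughput is the lower of the compute peak and memory bandwidth times arithmetic intensity (FLOPs per byte moved). The sketch below uses made-up peak and bandwidth figures, not Manticore's, and counts a double-precision dot product as 2 FLOPs per two 8-byte loads, which is why it lands on the memory-bound side of the roof.

```python
def ridge_point(peak_gflops: float, bandwidth_gbs: float) -> float:
    """Arithmetic intensity (FLOP/byte) at which a kernel stops being memory-bound."""
    return peak_gflops / bandwidth_gbs


def attainable_gflops(peak_gflops: float, bandwidth_gbs: float,
                      intensity: float) -> float:
    """Roofline: capped by the compute peak or by bandwidth * intensity."""
    return min(peak_gflops, bandwidth_gbs * intensity)


# Hypothetical machine: 16 GFLOP/s peak, 32 GB/s memory bandwidth.
PEAK, BW = 16.0, 32.0

# Double-precision dot product: 2 FLOPs (multiply + add) per two 8-byte loads.
dot_intensity = 2 / (2 * 8)  # 0.125 FLOP/byte

print(ridge_point(PEAK, BW))                       # 0.5
print(attainable_gflops(PEAK, BW, dot_intensity))  # 4.0 -> memory-bound
print(attainable_gflops(PEAK, BW, 4.0))            # 16.0 -> compute-bound
```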

06:54PM EDT – Up to 80 DP GFLOPs/W per cluster
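For scale, an efficiency figure in GFLOP/s per watt inverts directly into energy per operation. The helper below is only that unit conversion; it says nothing about how the 80 GFLOPs/W figure was measured.

```python
def picojoules_per_flop(gflops_per_watt: float) -> float:
    """Convert GFLOP/s per watt into energy per FLOP in picojoules.

    1 W = 1 J/s, so J/FLOP = 1 / (FLOP/s per W). Scaling by 1e12 (pJ) and
    1e-9 (the 'G' prefix) leaves a factor of 1000.
    """
    return 1000.0 / gflops_per_watt


print(picojoules_per_flop(80.0))  # 12.5
```

At the quoted peak, that works out to 12.5 pJ per double-precision FLOP.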

06:53PM EDT – Increased utilization for matmul and dot product kernels that might be memory-bound

06:52PM EDT – FREP acts as an instruction amplifier

06:52PM EDT – IPC > 1

06:52PM EDT – A single-issue core can saturate an FPU

06:50PM EDT – For example, reduction!
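Why reductions are the interesting case: when every iteration accumulates into the same register, each operation has to wait for the previous result to come out of the FPU pipeline, so a repeated single-instruction loop stalls. A common remedy is to rotate across several partial accumulators and combine them at the end. The toy model below assumes one FP issue per cycle and a fixed pipeline latency (the 4 cycles used is hypothetical). Whether Manticore's FREP handles this by renaming accumulators in hardware is not covered in these notes, so treat this as a generic illustration.

```python
def reduction_cycles(n: int, latency: int, accumulators: int) -> int:
    """Cycles to issue n accumulate ops and drain the pipeline (toy model).

    Assumes one FP issue per cycle, a fully pipelined FPU with a fixed
    latency, and round-robin assignment of iterations to accumulators.
    An op may only issue once its accumulator's previous result is ready.
    """
    ready = [0] * accumulators  # cycle at which each accumulator's value is available
    next_issue = 0              # earliest cycle the FPU can accept another op
    for i in range(n):
        acc = i % accumulators
        issue = max(next_issue, ready[acc])
        ready[acc] = issue + latency
        next_issue = issue + 1
    return max(ready)


def staggered_sum(values, accumulators):
    """Same round-robin split computed for real; partials are combined at the end."""
    partial = [0.0] * accumulators
    for i, v in enumerate(values):
        partial[i % accumulators] += v
    return sum(partial)


print(reduction_cycles(8, latency=4, accumulators=1))  # 32: every op waits for the last
print(reduction_cycles(8, latency=4, accumulators=4))  # 11: one op per cycle
print(staggered_sum([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0], 4))  # 36.0
```

Splitting a floating-point sum reorders the additions, so for general inputs the result can differ from a strictly sequential sum in the last bits. The final combine of the partial sums is also not counted in the cycle model.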

06:49PM EDT – FREP marks the loop

06:49PM EDT – SSRs only work on float-only hardware loops

06:48PM EDT – ‘Pseudo-dual issue’, as the integer core can keep working at the same time

06:48PM EDT – A custom instruction indicates the start of a hardware loop block

06:47PM EDT – XFREP – Floating Point Repetition Buffer (programmable micro-loop buffer)
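A toy model of the pseudo-dual issue described here: the FP loop body is buffered once and replayed by a sequencer, while the integer pipeline keeps retiring its own instructions in parallel. The function name and the instruction mix below are hypothetical, and the model ignores the cycles spent issuing the repeat instruction and filling the buffer, as well as all data hazards. It does show how instructions retired per cycle can exceed 1 on a single-issue core.

```python
from collections import deque


def frep_trace(int_stream, fp_body, repetitions):
    """Toy pseudo-dual-issue model of a floating-point repetition buffer.

    The FP loop body is replayed `repetitions` times by a sequencer, one FP
    instruction per cycle, while the integer pipeline retires one instruction
    per cycle from its own stream. Returns (trace, ipc).
    """
    fp_queue = deque(insn for _ in range(repetitions) for insn in fp_body)
    int_queue = deque(int_stream)
    trace = []  # one (integer insn, FP insn) pair per cycle; None = idle slot
    while int_queue or fp_queue:
        int_insn = int_queue.popleft() if int_queue else None
        fp_insn = fp_queue.popleft() if fp_queue else None
        trace.append((int_insn, fp_insn))
    retired = len(int_stream) + repetitions * len(fp_body)
    ipc = retired / len(trace) if trace else 0.0
    return trace, ipc


# Hypothetical mix: one fused multiply-add repeated 8 times while the integer
# core runs 4 unrelated instructions (e.g. bookkeeping for the next block).
trace, ipc = frep_trace(["addi", "addi", "sw", "bne"], ["fmadd.d"], 8)
for cycle, slot in enumerate(trace):
    print(cycle, slot)
print("IPC:", ipc)  # 12 instructions in 8 cycles -> 1.5
```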

06:47PM EDT – Latency tolerant approach

06:46PM EDT – Extension in the core register file

06:45PM EDT – Increases FPU/ALU utilization by 3x-5x

06:44PM EDT – Turn register read/writes into implicit memory load/stores

06:44PM EDT – XSSR – Stream Semantic Registers
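A minimal sketch of the stream semantic register idea, assuming a single-dimension affine access pattern (base plus a fixed stride): reading the register pops the next element from memory, so a dot-product loop body shrinks to the multiply-accumulate alone. The class, the ft0/ft1 naming, and the element-addressed memory list are illustrative assumptions; the real configuration interface and the number of dimensions the hardware supports are not modeled.

```python
class StreamSemanticRegister:
    """Toy model of a read stream bound to a register.

    Reading returns the next element of the pattern base + i * stride
    instead of a stored value, so the loop needs no explicit loads.
    """

    def __init__(self, memory, base, stride, count):
        self.memory = memory
        self.addr = base
        self.stride = stride
        self.remaining = count

    def read(self):
        if self.remaining == 0:
            raise RuntimeError("stream exhausted")
        value = self.memory[self.addr]
        self.addr += self.stride  # address generator advances implicitly
        self.remaining -= 1
        return value


def dot_with_ssr(memory, a_base, b_base, n, stride_b=1):
    ft0 = StreamSemanticRegister(memory, a_base, 1, n)
    ft1 = StreamSemanticRegister(memory, b_base, stride_b, n)
    acc = 0.0
    for _ in range(n):
        acc += ft0.read() * ft1.read()  # loop body is just the multiply-accumulate
    return acc


# Flat memory: vector a at 0..3, a 4x2 row-major matrix at 4..11.
mem = [1.0, 2.0, 3.0, 4.0,
       5.0, 0.0, 6.0, 0.0, 7.0, 0.0, 8.0, 0.0]
print(dot_with_ssr(mem, 0, 4, 4, stride_b=2))  # a . first column = 70.0
```

The stride of 2 walks down a matrix column with no index arithmetic in the loop body, which is the kind of bookkeeping the integer core no longer has to issue.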

06:44PM EDT – Compute runs asynchronously with the DMA engine
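The notes only say that the cores work asynchronously with a DMA engine. One generic way to exploit that is double buffering: while the cores work on one tile, the DMA fetches the next tile into a second buffer. The timing sketch below assumes two buffers and arbitrary per-tile durations; it is not a description of Manticore's software stack.

```python
def sequential_time(load, compute):
    """No overlap: each tile is fetched, then processed."""
    return sum(load) + sum(compute)


def double_buffered_time(load, compute):
    """Two tile buffers: the DMA fetches tile i+1 while the cores work on tile i.

    load[i] and compute[i] are durations for tile i (any consistent unit).
    A fetch may start only once the DMA is idle and the tile previously held
    in the same buffer (tile i-2) has been consumed.
    """
    n = len(load)
    load_end = [0] * n
    compute_end = [0] * n
    for i in range(n):
        start = load_end[i - 1] if i >= 1 else 0
        if i >= 2:
            start = max(start, compute_end[i - 2])  # wait for the buffer to free up
        load_end[i] = start + load[i]
        compute_start = max(load_end[i], compute_end[i - 1] if i >= 1 else 0)
        compute_end[i] = compute_start + compute[i]
    return compute_end[-1] if n else 0


tiles_load = [2, 2, 2]
tiles_compute = [3, 3, 3]
print(sequential_time(tiles_load, tiles_compute))       # 15
print(double_buffered_time(tiles_load, tiles_compute))  # 11
```

Once compute per tile exceeds transfer time, only the first fetch stays exposed and the rest hide behind computation.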

06:44PM EDT – Goal was to maximize compute/control die area ratio

06:42PM EDT – Custom ISA extensions

06:42PM EDT – Supports half precision, bfloat16, and FP8
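To picture what a multi-format SIMD unit buys, assume the narrower formats are packed into the 64-bit FP registers that RV32G's D extension implies: that gives 2 FP32, 4 FP16 or bfloat16, or 8 FP8 lanes per register. The sketch below packs four IEEE half-precision values into one 64-bit word with Python's `struct` module (format `e`) and does a lane-wise add. The packing layout is an assumption, and bfloat16 and FP8 are not modeled since `struct` has no format codes for them.

```python
import struct

FLEN = 64  # bits per FP register, assuming the D extension (RV32G includes D)


def lanes(element_bits: int) -> int:
    """How many packed elements fit in one FP register."""
    return FLEN // element_bits


def pack_fp16x4(values):
    """Pack four floats as IEEE 754 half precision into one 64-bit word."""
    return int.from_bytes(struct.pack("<4e", *values), "little")


def unpack_fp16x4(word):
    return list(struct.unpack("<4e", word.to_bytes(8, "little")))


def simd_add_fp16x4(a, b):
    """Lane-wise add; each lane's sum is rounded back to FP16 when repacked."""
    return pack_fp16x4([x + y for x, y in zip(unpack_fp16x4(a), unpack_fp16x4(b))])


formats = [("FP64", 64), ("FP32", 32), ("FP16", 16), ("bfloat16", 16), ("FP8", 8)]
print({name: lanes(bits) for name, bits in formats})
a = pack_fp16x4([1.0, 2.0, 0.5, -3.0])
b = pack_fp16x4([1.0, 1.0, 0.25, 3.0])
print(unpack_fp16x4(simd_add_fp16x4(a, b)))  # [2.0, 3.0, 0.75, 0.0]
```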

06:42PM EDT – Each core has a multi-format SIMD compute unit

06:42PM EDT – Each compute cluster has 8 RV32G Snitch cores

06:41PM EDT – Support a lot of cluster-to-cluster traffic

06:41PM EDT – Bandwidth thinning scheme to optimize bandwidth to HBM without affecting floorplan

06:41PM EDT – 4x L1 quadrants share an L1 cache

06:40PM EDT – Clusters can talk to each other at 64 TB/s

06:40PM EDT – Four quadrants of 32 clusters per chiplet

06:39PM EDT – 8 GB of HBM2 per die, private to that die

06:39PM EDT – Die-to-die serial link to each of the other dies

06:38PM EDT – Four chiplets
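Putting the numbers from this talk together confirms the headline core count: four chiplets, four quadrants of 32 clusters each, and 8 cores per cluster. The HBM2 total is only an aggregate across dies, since each die's 8 GB is private to it.

```python
# Figures taken from the entries in this live blog.
CHIPLETS = 4
QUADRANTS_PER_CHIPLET = 4
CLUSTERS_PER_QUADRANT = 32
CORES_PER_CLUSTER = 8
HBM2_GB_PER_DIE = 8

clusters_per_chiplet = QUADRANTS_PER_CHIPLET * CLUSTERS_PER_QUADRANT  # 128
cores_per_chiplet = clusters_per_chiplet * CORES_PER_CLUSTER            # 1024
total_cores = CHIPLETS * cores_per_chiplet                               # 4096
# HBM2 is private to each die, so this is an aggregate, not one shared pool.
total_hbm2_gb = CHIPLETS * HBM2_GB_PER_DIE                               # 32

print(total_cores, total_hbm2_gb)  # 4096 32
```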

06:38PM EDT – (estimated in GlobalFoundries 22FDX)

06:38PM EDT – 220 mm² per chip

06:38PM EDT – Now for Manticore

06:38PM EDT – Maximise compute datapath area with respect to control

06:37PM EDT – Lots of CPUs burn power on superfluous elements of out-of-order execution

06:36PM EDT – Energy efficiency is critical

06:36PM EDT – Ever growing demand for compute

06:35PM EDT – Who wants all the RISC-V cores?!?