Hot Chips 2020 Live Blog: Intel’s Xe GPU Architecture (5:30pm PT)

AnandTech Live Blog: The newest updates are at the top.

08:57PM EDT – Xe will spread across different nodes and manufacturing

08:56PM EDT – Shows XeHP can scale

08:56PM EDT – 4-tile can do ~42k GFLOPS FP32

08:56PM EDT – 2-tile can do 21161 GFLOPS FP32

08:55PM EDT – 1-tile can do 10.6 TFLOPS FP32
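
A quick sanity check on how close the tile numbers are to linear scaling. This is only a sketch: it takes the 1-tile figure as 10.6 TFLOPS (10,600 GFLOPS), which is what the 2-tile and 4-tile numbers imply, and it treats the rounded "~42k" as exactly 42,000 GFLOPS. The function name is ours, not Intel's.

```python
# Rough scaling check for the XeHP tile figures quoted in this blog.
# Assumption: the 1-tile figure is 10.6 TFLOPS = 10,600 GFLOPS.
# Assumption: "~42k" is treated as exactly 42,000 GFLOPS.
ONE_TILE_GFLOPS = 10_600

quoted_gflops = {1: 10_600, 2: 21_161, 4: 42_000}


def scaling_efficiency(tiles, gflops, base=ONE_TILE_GFLOPS):
    """Fraction of perfect linear scaling achieved at a given tile count."""
    return gflops / (tiles * base)


for tiles, gflops in quoted_gflops.items():
    print(f"{tiles} tile(s): {scaling_efficiency(tiles, gflops):.1%} of linear")
```

With these inputs, both the 2-tile and 4-tile figures land at roughly 99% of linear or better, which matches the "XeHP can scale" message.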

08:55PM EDT – XeHP up to 4 tiles

08:54PM EDT – XeHP parts in the lab

08:54PM EDT – AV1 support

08:52PM EDT – Each subslice has one L1, and up to 16 MB L3

08:52PM EDT – 2xINT16 and INT32 rates, fast INT8 dot-product that accumulates into one INT32 result

08:52PM EDT – Pairs of EUs run in lockstep due to shared thread control

08:51PM EDT – Software scoreboarding per EU

08:51PM EDT – Tiger Lake Xe has greater dynamic range

08:50PM EDT – Frequency is also 1.5x

08:49PM EDT – 96 EUs, 1536 32-bit ops/clock
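
The 1536 figure can be reproduced with simple arithmetic. This is a sketch, assuming 8 SIMD lanes per EU (from the "16 EUs = 128 SIMD lanes" entry elsewhere in this blog) and assuming a fused multiply-add is counted as two operations; the second assumption is ours and was not stated in the talk.

```python
# Back-of-envelope check of "96 EUs, 1536 32-bit ops/clock".
EUS = 96            # from the slide
LANES_PER_EU = 8    # 128 SIMD lanes / 16 EUs, per the sub-slice entry
OPS_PER_FMA = 2     # assumption: one FMA counts as a multiply plus an add


def ops_per_clock(eus, lanes_per_eu=LANES_PER_EU, ops_per_lane=OPS_PER_FMA):
    """Peak 32-bit operations per clock across all EUs."""
    return eus * lanes_per_eu * ops_per_lane


print(ops_per_clock(EUS))  # 1536
```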

08:49PM EDT – 1.5x larger GPU EUs with scaled assets

08:49PM EDT – Tiger Lake goal was to increase perf 2x in graphics

08:48PM EDT – Tiger Lake, SG1, and DG1 will all be XeLP

08:48PM EDT – XeLP is low power optimized

08:48PM EDT – XeHP with HBM2e

08:47PM EDT – Xe Link enables XeMF from GPU-to-GPU

08:47PM EDT – EMIB does XeMF

08:47PM EDT – Multiple tiles work as separate GPUs or a single GPU

08:46PM EDT – Low level tile disaggregation

08:46PM EDT – Requires multiple dies

08:46PM EDT – Allows scaling to 1000s of EUs

08:45PM EDT – Lots of optional stuff here

08:45PM EDT – L3 and Rambo cache

08:45PM EDT – Xe Memory Fabric

08:45PM EDT – Can distribute a stream across multiple slices

08:44PM EDT – De-noise, de-interlace, tone mapping are all here

08:44PM EDT – Media processing can be scaled as well with media slices

08:44PM EDT – 8 INT/FP ports, 2 complex math

08:43PM EDT – Xe Execution Unit

08:43PM EDT – L1 scratch pad

08:43PM EDT – XeHPG with ray tracing is in the lab today

08:43PM EDT – hardware blocks for ray tracing

08:42PM EDT – 16 EUs = 128 SIMD lanes

08:42PM EDT – Fixed function units (optional based on segment)

08:42PM EDT – Sub-slice has 16 EUs

08:42PM EDT – Slice size is adjustable

08:42PM EDT – Geometry has moved inside the slice and is now distributed

08:41PM EDT – (Each compute slice is 96 EUs)
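
The slice numbers in these entries fit together as follows. This is a sketch using only the figures quoted in this blog (16 EUs per sub-slice, 128 SIMD lanes per sub-slice, 96 EUs per compute slice); the class and field names are hypothetical, and since slice size is adjustable, 96 EUs is just the configuration shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SliceConfig:
    """Hypothetical container for the slice figures quoted in the talk."""
    eus_per_subslice: int = 16      # "Sub-slice has 16 EUs"
    lanes_per_subslice: int = 128   # "16 EUs = 128 SIMD lanes"
    eus_per_slice: int = 96         # "Each compute slice is 96 EUs"

    @property
    def lanes_per_eu(self) -> int:
        return self.lanes_per_subslice // self.eus_per_subslice

    @property
    def subslices_per_slice(self) -> int:
        return self.eus_per_slice // self.eus_per_subslice

    @property
    def lanes_per_slice(self) -> int:
        return self.eus_per_slice * self.lanes_per_eu


cfg = SliceConfig()
print(cfg.lanes_per_eu, cfg.subslices_per_slice, cfg.lanes_per_slice)  # 8 6 768
```

So a 96-EU slice works out to six sub-slices and 768 SIMD lanes, which shows why reaching "1000s" of lanes needs more than one slice.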

08:41PM EDT – programmable shaders

08:41PM EDT – Each slice has sub-slices

08:41PM EDT – 3D/Compute slice, media slice, memory fabric

08:41PM EDT – There is a high-level Xe architecture

08:40PM EDT – HPC is exascale

08:40PM EDT – HP is Datacenter and AI

08:40PM EDT – HPG is Mid-range and Enthusiast

08:40PM EDT – LP is integrated and entry

08:40PM EDT – Such as ray tracing, media, FP64 etc

08:40PM EDT – Going beyond just adding execution units – but optimizing each segment with individual requirements

08:39PM EDT – Optimized for different market requirements

08:39PM EDT – Xe will scale from LP to HPG, HP, HPC

08:39PM EDT – Required a lot of new design over Gen11

08:38PM EDT – Also PPA improvements

08:38PM EDT – Add new capabilities – matrix tensors, ray tracing, virtualization, etc

08:38PM EDT – Goals: increase SIMD lanes from 10s to 1000s

08:38PM EDT – Moving from Gen to Xe -> exascale for everyone

08:38PM EDT – Going further in architecture than previously covered by integrated GPUs

08:37PM EDT – David did the Intel Architecture Xe talk

08:37PM EDT – Intel’s Xe talk, by David Blythe