AnandTech Live Blog: The newest updates are at the top.
08:57PM EDT – Xe will spread across different nodes and manufacturing
08:56PM EDT – Shows XeHP can scale
08:56PM EDT – 4-tile can do ~42k GFLOPS FP32
08:56PM EDT – 2-tile can do 21161 GFLOPS FP32
08:55PM EDT – 1-tile can do ~10.6 TFLOPS FP32
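A quick sanity check on how close those tile numbers are to linear scaling. This is only a sketch: it assumes the 1-tile figure is roughly 10.6 TFLOPS (10,600 GFLOPS, which is what the 2-tile number implies) and uses the rounded ~42k figure for 4 tiles. The names `quoted_gflops` and `scaling_efficiency` are made up for illustration.

```python
# Peak FP32 throughput per tile count, in GFLOPS, as quoted in the demo.
# Assumptions: 1 tile is ~10.6 TFLOPS (10,600 GFLOPS) and the 4-tile
# figure is the rounded "~42k" value, so results are approximate.
quoted_gflops = {1: 10_600, 2: 21_161, 4: 42_000}


def scaling_efficiency(tiles: int) -> float:
    """Quoted throughput divided by perfect linear scaling from one tile."""
    ideal = quoted_gflops[1] * tiles
    return quoted_gflops[tiles] / ideal


for tiles in sorted(quoted_gflops):
    print(f"{tiles} tile(s): {scaling_efficiency(tiles):.3f} of linear")
```

With these inputs both multi-tile configurations land within a couple of percent of linear, which matches the "XeHP can scale" message.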
08:55PM EDT – XeHP up to 4 tiles
08:54PM EDT – XeHP parts in the lab
08:54PM EDT – AV1 support
08:52PM EDT – Each subslice has one L1, and up to 16 MB L3
08:52PM EDT – 2xINT16 and INT32 rates, fast INT8 dot-product that accumulates into one INT32 result
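For readers unfamiliar with the idea, an INT8 dot product that accumulates into one INT32 result can be emulated in a few lines. This is a sketch of the general technique, not Intel's instruction semantics: the four-element width, signed inputs, and wrap-around on overflow are all assumptions, and `to_int32` and `int8_dot_accumulate` are hypothetical helper names.

```python
def to_int32(x: int) -> int:
    """Wrap a Python int to a signed 32-bit two's complement value."""
    x &= 0xFFFFFFFF
    return x - 0x1_0000_0000 if x & 0x8000_0000 else x


def int8_dot_accumulate(a, b, acc=0):
    """Multiply four INT8 pairs and add the products to one INT32 accumulator.

    Assumed behaviour: signed inputs, four elements, wrap on overflow.
    """
    assert len(a) == len(b) == 4, "this sketch assumes four elements"
    for v in (*a, *b):
        assert -128 <= v <= 127, "inputs must fit in signed INT8"
    return to_int32(acc + sum(x * y for x, y in zip(a, b)))


print(int8_dot_accumulate([1, -2, 3, 4], [5, 6, -7, 8], acc=10))  # prints 14
```

The point of such an operation is that four multiplies and the adds collapse into one step with a wider accumulator, which is why it matters for inference workloads.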
08:52PM EDT – Pairs of EUs run in lockstep due to shared thread control
08:51PM EDT – Software scoreboarding per EU
08:51PM EDT – Tiger Lake Xe has greater dynamic range
08:50PM EDT – Frequency is also 1.5x
08:49PM EDT – 96 EUs, 1536 32-bit ops/clock
08:49PM EDT – 1.5x larger GPU EUs with scaled assets
08:49PM EDT – Tiger Lake goal was to increase perf 2x in graphics
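The Tiger Lake numbers hang together if you do the arithmetic. The sketch below assumes 8 SIMD lanes per EU (from the "16 EUs = 128 SIMD lanes" entry) and that a fused multiply-add counts as two operations; that second assumption is mine, not something stated in the talk. The constant names are illustrative only.

```python
EUS = 96            # quoted EU count for Tiger Lake
LANES_PER_EU = 8    # from "16 EUs = 128 SIMD lanes"
OPS_PER_FMA = 2     # assumption: one fused multiply-add counts as two ops

ops_per_clock = EUS * LANES_PER_EU * OPS_PER_FMA
print(ops_per_clock)  # prints 1536, matching the quoted 32-bit ops/clock

EU_SCALE = 1.5      # "1.5x larger GPU"
FREQ_SCALE = 1.5    # "Frequency is also 1.5x"

# Upper bound on peak throughput gain if both factors multiply cleanly.
peak_scale = EU_SCALE * FREQ_SCALE
print(peak_scale)  # prints 2.25
```

A 2.25x theoretical ceiling leaves some headroom over the stated 2x goal; real gains depend on memory bandwidth and power limits, which this arithmetic ignores.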
08:48PM EDT – Tiger Lake, SG1, and DG1 will all be XeLP
08:48PM EDT – XeLP is low power optimized
08:48PM EDT – XeHP with HBM2e
08:47PM EDT – Xe Link enables XeMF from GPU-to-GPU
08:47PM EDT – EMIB does XeMF
08:47PM EDT – Multiple tiles work as separate GPUs or a single GPU
08:46PM EDT – Low level tile disaggregation
08:46PM EDT – Requires multiple dies
08:46PM EDT – Allows scaling to 1000s of EUs
08:45PM EDT – Lots of optional stuff here
08:45PM EDT – L3 and Rambo cache
08:45PM EDT – Xe Memory Fabric
08:45PM EDT – Can distribute a stream across multiple slices
08:44PM EDT – de-noise, de-interlace, tone mapping is all here
08:44PM EDT – Media processing can be scaled as well with media slices
08:44PM EDT – 8 INT/FP ports, 2 complex math
08:43PM EDT – Xe Execution Unit
08:43PM EDT – L1 scratch pad
08:43PM EDT – XeHPG with ray tracing is in the lab today
08:43PM EDT – hardware blocks for ray tracing
08:42PM EDT – 16 EUs = 128 SIMD lanes
08:42PM EDT – Fixed function units (optional based on segment)
08:42PM EDT – Sub-slice has 16 EUs
08:42PM EDT – Slice size is adjustable
08:42PM EDT – Geometry has moved inside the slice and is now distributed
08:41PM EDT – (Each compute slice is 96 EUs)
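The slice, sub-slice, and EU numbers quoted here can be tied together with a little arithmetic. This sketch assumes a fixed 96-EU slice, even though slice size is described as adjustable, and the `slices_needed` helper is hypothetical.

```python
EUS_PER_SUBSLICE = 16      # "Sub-slice has 16 EUs"
LANES_PER_SUBSLICE = 128   # "16 EUs = 128 SIMD lanes"
EUS_PER_SLICE = 96         # "Each compute slice is 96 EUs" (adjustable in practice)

lanes_per_eu = LANES_PER_SUBSLICE // EUS_PER_SUBSLICE
subslices_per_slice = EUS_PER_SLICE // EUS_PER_SUBSLICE
lanes_per_slice = EUS_PER_SLICE * lanes_per_eu


def slices_needed(target_lanes: int) -> int:
    """Smallest number of 96-EU slices providing at least target_lanes lanes."""
    return -(-target_lanes // lanes_per_slice)  # ceiling division


print(lanes_per_eu, subslices_per_slice, lanes_per_slice)  # prints 8 6 768
print(slices_needed(1000))  # prints 2
```

Under these assumptions a single slice already has hundreds of lanes, so the "10s to 1000s" goal is reached with just a couple of slices.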
08:41PM EDT – programmable shaders
08:41PM EDT – Each slice has sub-slices
08:41PM EDT – 3D/Compute slice, media slice, memory fabric
08:41PM EDT – There is a high-level Xe architecture
08:40PM EDT – HPC is exascale
08:40PM EDT – HP is Datacenter and AI
08:40PM EDT – HPG is Mid-range and Enthusiast
08:40PM EDT – LP is integrated and entry
08:40PM EDT – Such as ray tracing, media, FP64 etc
08:40PM EDT – Going beyond just adding execution units – but optimizing each segment with individual requirements
08:39PM EDT – Optimized for different market requirements
08:39PM EDT – Xe will scale from LP to HPG, HP, HPC
08:39PM EDT – Required a lot of new design over Gen11
08:38PM EDT – Also PPA improvements
08:38PM EDT – Add new capabilities – matrix tensors, ray tracing, virtualization, etc
08:38PM EDT – Goals: increase SIMD lanes from 10s to 1000s
08:38PM EDT – Moving from Gen to Xe -> exascale for everyone
08:38PM EDT – Going further in architecture than previously covered by integrated GPUs
08:37PM EDT – David did the Intel Architecture Xe talk
08:37PM EDT – Intel’s Xe talk, by David Blythe