Discussion – Steam Deck 2 handheld?

I think this is what I remembered:

That still leaves a lot of open questions. For one, are NPUs being dropped next gen in all products based on Zen 6 + RDNA 5 or later? If it applies to mobile too, how will AMD address the concerns around time to first token (execution latency) and power efficiency? Shipping a brand-new product with worse battery life than last gen would just be unacceptable.
Are we talking about customizations to the ML hardware and the core in RDNA 5 to effectively emulate an “NPU mode” to save power? That could perhaps mean very fine-grained power gating, architectural changes to the ML HW, even special modes of operation, and more generally massive architectural changes to cache/memory and data locality. Just a bit of spitballing.

This processing-in-cache patent should increase ML and RT performance sizeably: https://patents.google.com/patent/US20240264942A1
Really any branchy (path tracing, for example) or memory-hungry workload should benefit, as long as the BW-heavy instructions are offloaded to the CCUs. Perhaps this is the processing-in-cache patent that was mentioned in January?

Obviously there's no confirmation, but it might be reasonable to expect this is roughly how the NPU in the MediaTek Dimensity 9500 SoC achieves CIM (compute-in-memory). IIRC CIM is touted as one of the big reasons why the NPU is so power efficient.