Samsung’s PCIe Gen 4 Enterprise SSDs Get Reliability & Performance Boost

Almost a year after Samsung outlined its first roadmap for PCIe 4.0 SSDs, the company’s first two models are in mass production: the PM1733 and PM1735 high-end datacenter SSDs. Details about these new models have been slow to come out, but Samsung is now talking about three major improvements they bring over earlier SSDs in addition to the raw performance increases enabled by PCIe 4.0. The list includes fail-in-place (FIP) technology to boost drive reliability, SSD virtualization technology to guarantee consistent performance for VDI and similar use cases, and V-NAND machine learning technology to predict and verify the characteristics of NAND cells.

Fail-in-Place

Samsung’s fail-in-place (FIP) technology promises to let the SSD robustly handle hardware failures that would otherwise be fatal to the drive, up to the failure of an entire NAND die. The highest-capacity 30.72 TB PM1733 can keep running more or less normally even after losing any one of its 512 NAND flash dies. The drive will scan for corrupted or lost data, reconstruct it, relocate it to a still-working flash chip, and continue to operate with high throughput and QoS. In essence, this is like a RAID-5/6 array continuing to run in degraded mode instead of the whole array going offline. It’s still wise to eventually replace an SSD after it suffers such a severe malfunction, but Samsung’s FIP technology means that replacement can be done at the operator’s convenience instead of the problem causing immediate downtime.
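
Samsung has not described how FIP rebuilds data internally, so the RAID-5 comparison is the only mechanism we can illustrate. The Python sketch below shows the general idea of single-parity reconstruction: one stripe of data is spread across several dies along with an XOR parity chunk, and the chunk on a failed die is recomputed from the survivors. The function names, the four-die stripe, and the chunk contents are all hypothetical and chosen for illustration; this is not Samsung’s firmware logic.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte strings together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def rebuild_lost_die(stripe, parity, lost_index):
    """Recover the chunk held by a failed die from the surviving chunks plus parity."""
    survivors = [chunk for i, chunk in enumerate(stripe) if i != lost_index]
    return xor_blocks(survivors + [parity])

# One stripe spread across four hypothetical data dies, plus one parity chunk.
stripe = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]
parity = xor_blocks(stripe)

# Pretend die 2 has failed: its chunk is recomputed from the other three dies
# and the parity chunk, after which it could be rewritten to a healthy die.
recovered = rebuild_lost_die(stripe, parity, lost_index=2)
print(recovered)
```

Single parity of this kind tolerates the loss of exactly one chunk per stripe, which lines up with the "any one die" limit quoted for the PM1733, though the actual redundancy scheme may well differ.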

The addition of fail-in-place doesn’t change the fact that the PM1733 and PM1735 have write endurance ratings of 1 and 3 drive writes per day, respectively. The overall lifespan is still comparable to the previous generation of drives, but the chance of a premature death due to causes other than normal NAND wear has been greatly reduced.
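
To put the drive-writes-per-day ratings in perspective, the arithmetic below converts a DWPD figure into total data written over the rated period. The rating period is not given in this article, so the five-year value is purely an assumption for illustration, as is the helper name `lifetime_writes_tb`.

```python
def lifetime_writes_tb(capacity_tb, dwpd, rated_years):
    """Total terabytes the rating allows to be written over the rated period."""
    return capacity_tb * dwpd * 365 * rated_years

ASSUMED_RATED_YEARS = 5  # assumption: the article does not state the rating period

# Largest PM1733: 30.72 TB at 1 drive write per day.
pm1733_tb = lifetime_writes_tb(30.72, 1, ASSUMED_RATED_YEARS)
print(f"PM1733 30.72 TB at 1 DWPD: about {pm1733_tb / 1000:.1f} PB written")
```

The same function applied with a DWPD of 3 shows why the PM1735 gives up usable capacity: tripling the rated writes per day on the same flash requires more spare area.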

SSD Virtualization

Next up, Samsung has added virtualization technology to the PM1733 and PM1735 SSDs. Samsung has implemented the optional NVMe virtualization features based on Single-Root I/O Virtualization (SR-IOV), allowing a single NVMe SSD controller to provide numerous virtual controllers (up to 64 in the case of Samsung’s drives). Each virtual controller can be assigned to a different VM running on the host system, and provide storage to that VM with no CPU overhead—the same as if the entire drive had been assigned to a single VM with PCIe passthrough. Storage capacity on each SSD can be flexibly allocated to different namespaces that can in turn be attached to the relevant virtual controller.
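
The relationship between capacity, namespaces, and virtual controllers can be sketched as a small bookkeeping model. This is a conceptual illustration only: the `Drive` class, its method names, and the namespace sizes are hypothetical, the one-namespace-per-controller mapping is a simplification, and real drives are managed through NVMe admin commands rather than anything resembling this code. Only the 64-controller limit comes from Samsung’s stated figures.

```python
from dataclasses import dataclass, field

MAX_VIRTUAL_CONTROLLERS = 64  # limit quoted for Samsung's drives

@dataclass
class Drive:
    capacity_gb: int
    namespaces: dict = field(default_factory=dict)   # namespace id -> size in GB
    attachments: dict = field(default_factory=dict)  # namespace id -> virtual controller id

    def free_gb(self):
        """Capacity not yet allocated to any namespace."""
        return self.capacity_gb - sum(self.namespaces.values())

    def create_namespace(self, size_gb):
        """Carve a new namespace out of unallocated capacity and return its id."""
        if size_gb > self.free_gb():
            raise ValueError("not enough unallocated capacity")
        nsid = len(self.namespaces) + 1
        self.namespaces[nsid] = size_gb
        return nsid

    def attach(self, nsid, controller_id):
        """Attach an existing namespace to one virtual controller."""
        if nsid not in self.namespaces:
            raise KeyError(f"unknown namespace {nsid}")
        if not 0 <= controller_id < MAX_VIRTUAL_CONTROLLERS:
            raise ValueError("virtual controller id out of range")
        self.attachments[nsid] = controller_id

# Three VMs, each given its own namespace behind its own virtual controller.
drive = Drive(capacity_gb=30_720)
for vm_controller, size_gb in enumerate([4_000, 8_000, 2_000]):
    nsid = drive.create_namespace(size_gb)
    drive.attach(nsid, vm_controller)

print(drive.attachments, drive.free_gb())
```

The point of the model is that capacity is divided at the namespace level while isolation between VMs comes from the virtual controllers, so the two can be sized independently.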

Machine Learning

The third technology introduced by Samsung is V-NAND machine learning. The company has not disclosed precise details about how it is making use of machine learning, saying only that it is used to predict and analyze characteristics of flash cells, including by detecting variations among circuit patterns. With 3D NAND, it is increasingly difficult to get by with one-size-fits-all strategies for cell programming, reading and error correction. Even tracking the P/E cycles each block has been through isn’t enough; there can be significant variation between layers near the top and bottom of the 3D stack, and from one die to another. Samsung is hardly alone in turning to machine learning strategies to tackle these complexities. The new capability will ensure consistent performance and improved reliability of today’s drives powered by TLC V-NAND, but its importance will grow dramatically in the case of QLC V-NAND-based drives.

The first drives that can take advantage of the new features are already shipping to interested parties. The PM1733 and PM1735 are based on a common hardware platform. The PM1733 is rated for 1 DWPD and offers capacities up to 30.72 TB, while the PM1735 has more overprovisioning and lower usable capacities to reach 3 DWPD. Both models are available in either U.2 or PCIe add-in card form factors. The U.2 form factor gives a few more capacity options, while the add-in card versions have a PCIe 4.0 x8 interface to enable 25% higher sequential read performance (for other workloads, PCIe 4.0 x4 is fast enough not to be the bottleneck).
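
For context on the x4 versus x8 difference, the short calculation below works out the theoretical link ceilings from the PCIe 4.0 signalling rate (16 GT/s per lane) and its 128b/130b encoding. It ignores packet and protocol overhead, so real-world throughput lands somewhat lower; the function name is our own.

```python
GEN4_GT_PER_S = 16.0              # PCIe 4.0 raw signalling rate per lane
ENCODING_EFFICIENCY = 128 / 130   # 128b/130b line encoding

def link_ceiling_gb_per_s(lanes):
    """Theoretical one-direction bandwidth in GB/s, before packet overhead."""
    return GEN4_GT_PER_S * ENCODING_EFFICIENCY * lanes / 8

x4 = link_ceiling_gb_per_s(4)
x8 = link_ceiling_gb_per_s(8)
print(f"x4: {x4:.2f} GB/s, x8: {x8:.2f} GB/s")
```

The x8 link doubles the ceiling, yet the quoted gain is only 25%, which suggests that with the wider link the limit shifts from the interface to the drive itself.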


Source: Samsung