Kingston DataTraveler Max UFD Review: NVMe Performance in a USB Thumb Drive

Rapid advancements in flash technology and continued improvements in high-speed interfaces have driven the growth of small, bus-powered portable SSDs. Over the last few years, these types of drives have relied on a dual-chip solution – typically placing a SATA or NVMe SSD behind a USB bridge chip. SSD controller vendors such as Phison and Silicon Motion have recognized the growth potential in the portable SSD market and come out with USB Flash Drive (UFD) controllers employing high-speed direct-attach interfaces on the upstream side, and directly talking to the flash packages downstream. These controllers have now created a new category of portable SSDs by lowering the cost without sacrificing performance.

Kingston’s DataTraveler Max was introduced in August 2021 as a USB-C flash drive capable of hitting 1GBps speeds. The claimed performance numbers justify calling the thumb drive a portable SSD. While Kingston did not publicly disclose the internals of the drive, the form-factor and performance numbers point to the use of a native UFD controller. Kingston is not the first to the market with such a high-performance portable SSD. Crucial’s X6 (updated in 2021 with Phison’s U17 UFD controller) reaches speeds of 800MBps+, but it retains the industrial design of the older version (which was a SATA drive behind a USB – SATA bridge).

To that end, today we’re digging into the 1TB version of the DataTraveler Max (referred to from here on as the DT Max), which Kingston has provided. We’ll be taking a look at the performance, power efficiency, and value proposition of the DT Max. We’ve also cracked the drive open in order to confirm which UFD controller Kingston is using.

External bus-powered storage devices have grown both in storage capacity as well as speeds over the last decade. Thanks to rapid advancements in flash technology (including the advent of 3D NAND and NVMe) as well as faster host interfaces (such as Thunderbolt 3 and USB 3.2 Gen 2×2), we now have palm-sized flash-based storage devices capable of delivering 2GBps+ speeds.

The thumb drive form factor is attractive for multiple reasons – there is no separate cable to carry around, and the casing can be designed to include a keyring loop for portability. Vendors such as Corsair and Mushkin briefly experimented with SATA SSDs behind a USB bridge chip, but the thermal solution and size made the UFDs slightly unwieldy. While the weight was fine for a Type-A male connector, putting such drives behind a Type-C connector would have required extensive redesign. The introduction of high-performance native UFD controllers has made this category viable again.

Kingston’s DT Max retains the traditional DataTraveler thumb drive form-factor. However, it takes full advantage of the USB 3.2 Gen 2 Type-C male connector by promising 1GBps speeds. The drive is available in three capacities – 256GB, 512GB, and 1TB – and Kingston says that it can deliver those high speeds across all three SKUs.

The industrial design is slightly different from other DataTraveler UFDs. The Type-C male connector is protected by a sliding cap. Retracting and covering it again can be done with a single hand. There is also a blue LED indicator and a keyring loop at the end. The thumb drive measures 82.6 mm x 22.3 mm x 9.5 mm and tips the scales at around 12.5 grams.

Gallery: Kingston DataTraveler Max UFD Case Design and Teardown

Tearing down the UFD is a simple matter of popping off the sliding cap and prying out the internal cover. There are no screws in the drive. The bare board inside has no special thermal solution. We see Silicon Motion’s SM2320 UFD controller here, but the package seems to be different from the one we saw in the SM2320 USB 3.2 Gen 2×2 reference design reviewed earlier this month.


Kingston XS2000 (SM2320 Reference Design) [top] vs. Kingston DT Max (SM2321?) [bottom]

For comparison purposes, we have only limited 1TB results with our new test suite and testbed. Hence, we only present metrics from two USB 3.2 Gen 2 NVMe bridges from Akasa – one using ASMedia’s ASM2362, and another using Realtek’s RTL9210B.

CrystalDiskInfo provides a quick overview of the capabilities of the internal storage device. Since the program handles each bridge chip / controller differently, and the SM2320 is quite new, many of the entries are marked as vendor-specific, and some of the capabilities (such as the interface) are deciphered incorrectly. The temperature monitoring worked well, though.

S.M.A.R.T Passthrough – CrystalDiskInfo

The table below presents a comparative view of the specifications of the different storage bridges presented in this review.

Comparative Direct-Attached Storage Devices Configuration
Aspect | Kingston DT Max 1TB | Akasa AK-ENU3M2-03
Downstream Port | Native Flash | 1x PCIe 3.0 x2 (M.2 NVMe)
Upstream Port | USB 3.2 Gen 2 Type-C (Male) | USB 3.2 Gen 2 Type-C
Bridge Chip | Silicon Motion SM2320 | ASMedia ASM2362
Power | Bus Powered | Bus Powered
Use Case | 1GBps-class, compact USB thumb drive with retractable cover for Type-C connector | M.2 2230 / 2242 / 2260 / 2280 NVMe SSD aluminum enclosure; DIY 1GBps-class portable SSD with a USB flash drive-like form-factor
Physical Dimensions | 82.6 mm x 22.3 mm x 9.5 mm | 125 mm x 32 mm x 10.8 mm
Weight | 12.5 grams | 52 grams (without SSD)
Cable | N/A | 30 cm USB 3.2 Gen 2 Type-C to Type-C
S.M.A.R.T Passthrough | Yes | Yes
UASP Support | Yes | Yes
TRIM Passthrough | Yes | Yes
Hardware Encryption | Not Available | SSD-dependent
Evaluated Storage | Micron 96L 3D TLC | SK hynix P31 PCIe 3.0 x4 NVMe SSD; SK hynix 128L 3D TLC
Price | USD 180 | GBP 60 (Scan)
Review Link | Kingston DT Max 1TB Review | Akasa AK-ENU3M2-03 Review

Prior to looking at the benchmark numbers, power consumption, and thermal solution effectiveness, a description of the testbed setup and evaluation methodology is provided.

Testbed Setup and Evaluation Methodology

Direct-attached storage devices (including thumb drives) are evaluated using the Quartz Canyon NUC (essentially, the Xeon / ECC version of the Ghost Canyon NUC) configured with 2x 16GB DDR4-2667 ECC SODIMMs and a PCIe 3.0 x4 NVMe SSD – the IM2P33E8 1TB from ADATA.

The most attractive aspect of the Quartz Canyon NUC is the presence of two PCIe slots (electrically, x16 and x4) for add-in cards. In the absence of a discrete GPU – for which there is no need in a DAS testbed – both slots are available. In fact, we also added a spare SanDisk Extreme PRO M.2 NVMe SSD to the CPU direct-attached M.2 22110 slot in the baseboard in order to avoid DMI bottlenecks when evaluating Thunderbolt 3 devices. This still allows for two add-in cards operating at x8 (x16 electrical) and x4 (x4 electrical). Since the Quartz Canyon NUC doesn’t have a native USB 3.2 Gen 2×2 port, Silverstone’s SST-ECU06 add-in card was installed in the x4 slot. All non-Thunderbolt devices are tested using the Type-C port enabled by the SST-ECU06.

The specifications of the testbed are summarized in the table below:

The 2021 AnandTech DAS Testbed Configuration
System | Intel Quartz Canyon NUC9vXQNX
CPU | Intel Xeon E-2286M
Memory | ADATA Industrial AD4B3200716G22, 32 GB (2x 16GB), DDR4-3200 ECC @ 22-22-22-52
OS Drive | ADATA Industrial IM2P33E8 NVMe 1TB
Secondary Drive | SanDisk Extreme PRO M.2 NVMe 3D SSD 1TB
Add-on Card | SilverStone Tek SST-ECU06 USB 3.2 Gen 2×2 Type-C Host
OS | Windows 10 Enterprise x64 (21H1)
Thanks to ADATA, Intel, and SilverStone Tek for the build components

The testbed hardware is only one segment of the evaluation. Over the last few years, the typical direct-attached storage workloads for memory cards have also evolved. High bit-rate 4K videos at 60fps have become quite common, and 8K videos are starting to make an appearance. Game install sizes have also grown steadily even in portable game consoles, thanks to high resolution textures and artwork. Keeping these in mind, our evaluation scheme for portable SSDs and UFDs involves multiple workloads which are described in detail in the corresponding sections.

  • Synthetic workloads using CrystalDiskMark and ATTO
  • Real-world access traces using PCMark 10’s storage benchmark
  • Custom robocopy workloads reflective of typical DAS usage
  • Sequential write stress test

In the next section, we have an overview of the performance of the Kingston DT Max in these benchmarks. Prior to providing concluding remarks, we also have some observations on the UFD’s power consumption numbers and thermal solution.

Benchmarks such as ATTO and CrystalDiskMark help provide a quick look at the performance of the direct-attached storage device. The results translate to the instantaneous performance numbers that consumers can expect for specific workloads, but do not account for changes in behavior when the unit is subject to long-term conditioning and/or thermal throttling. Yet another use of these synthetic benchmarks is the ability to gather information regarding support for specific storage device features that affect performance.

Synthetic Benchmark – ATTO

Kingston claims read and write speeds of 1000 MBps and 900 MBps respectively, and these are backed up by the ATTO benchmarks provided below – in fact, the measured numbers are higher than the claimed ones. ATTO benchmarking is restricted to a single configuration in terms of queue depth, and is only representative of a small subset of real-world workloads. It does allow the visualization of change in transfer rates as the I/O size changes, with optimal performance being reached around 512 KB for a queue depth of 4. The performance is slightly behind the bridge solutions in terms of raw numbers.

ATTO Benchmarks

Synthetic Benchmark – CrystalDiskMark

CrystalDiskMark, for example, uses four different access traces for reads and writes over a configurable region size. Two of the traces are sequential accesses, while two are 4K random accesses. Internally, CrystalDiskMark uses the Microsoft DiskSpd storage testing tool. The ‘Seq128K Q32T1’ sequential traces use a 128K block size with a queue depth of 32 from a single thread, while the ‘4K Q32T16’ one does random 4K accesses with the same queue depth, but from multiple threads. The ‘Seq1M’ traces use a 1MiB block size. The plain ‘Rnd4K’ one (‘4K Q1T1’) uses only a single queue and a single thread. Comparing the ‘4K Q32T16’ and ‘4K Q1T1’ numbers can quickly tell us whether the storage device supports NCQ (native command queuing) / UASP (USB-attached SCSI protocol). If the numbers for the two access traces are in the same ballpark, NCQ / UASP is not supported. This assumes that the host port / drivers on the PC support UASP.
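The ‘same ballpark’ comparison described above can be expressed as a simple ratio check. The following is a minimal Python sketch and not part of CrystalDiskMark; the function name, the 2x threshold, and the sample numbers are assumptions chosen purely for illustration.

```python
def queuing_likely_supported(rnd4k_q32t16_mbps: float,
                             rnd4k_q1t1_mbps: float,
                             min_ratio: float = 2.0) -> bool:
    """Guess whether command queuing (NCQ / UASP) is in effect.

    If the deep-queue random 4K result is not clearly higher than the
    Q1T1 result, queued commands are probably not reaching the device.
    min_ratio is an arbitrary illustrative threshold, not a standard value.
    """
    if rnd4k_q1t1_mbps <= 0:
        raise ValueError("Q1T1 throughput must be positive")
    return rnd4k_q32t16_mbps / rnd4k_q1t1_mbps >= min_ratio


# Hypothetical numbers, not measurements from this review.
print(queuing_likely_supported(250.0, 30.0))  # deep queue is much faster
print(queuing_likely_supported(32.0, 30.0))   # both results in the same ballpark
```

As noted above, a negative result only points at the device if the host port and drivers are known to support UASP.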

CrystalDiskMark Benchmarks

The sequential workloads’ numbers are essentially the same for both the UFD and bridge solutions. However, the benefits of a real SSD controller are evident in the high queue-depth random access performance, where the DT Max’s performance is cut in half compared to the ASMedia bridge solution.

AnandTech DAS Suite – Benchmarking for Performance Consistency

Our testing methodology for storage bridges / direct-attached storage units takes into consideration the usual use-case for such devices. The most common usage scenario is transfer of large amounts of photos and videos to and from the unit. Other usage scenarios include the use of the unit as a download or install location for games and importing files directly from it into a multimedia editing program such as Adobe Photoshop. Some users may even opt to boot an OS off an external storage device.

The AnandTech DAS Suite tackles the first use-case. The evaluation involves processing five different workloads:

  • AV: Multimedia content with audio and video files totalling 24.03 GB over 1263 files in 109 sub-folders
  • Home: Photos and document files totalling 18.86 GB over 7627 files in 382 sub-folders
  • BR: Blu-ray folder structure totalling 23.09 GB over 111 files in 10 sub-folders
  • ISOs: OS installation files (ISOs) totalling 28.61 GB over 4 files in one folder
  • Disk-to-Disk: Addition of 223.32 GB spread over 171 files in 29 sub-folders to the above four workloads (total of 317.91 GB over 9176 files in 535 sub-folders)

Except for the ‘Disk-to-Disk’ workload, each data set is first placed in a 29GB RAM drive, and a robocopy command is issued to transfer it to the external storage unit (formatted in exFAT for flash-based units, and NTFS for HDD-based units).

robocopy /NP /MIR /NFL /J /NDL /MT:32 $SRC_PATH $DEST_PATH

Upon completion of the transfer (write test), the contents from the unit are read back into the RAM drive (read test) after a 10 second idling interval. This process is repeated three times for each workload. Read and write speeds, as well as the time taken to complete each pass are recorded. Whenever possible, the temperature of the external storage device is recorded during the idling intervals. Bandwidth for each data set is computed as the average of all three passes.
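The copy command and the three-pass averaging described above can be sketched in Python as follows. This is an illustrative outline rather than the actual test harness: the helper names, the drive paths, and the pass timings are hypothetical, and the conversion assumes 1 GB = 1000 MB.

```python
from statistics import mean


def robocopy_args(src: str, dest: str, threads: int = 32) -> list[str]:
    """Build the argument list for the robocopy invocation quoted above."""
    return [
        "robocopy",
        "/NP",              # no per-file progress percentage
        "/MIR",             # mirror the source tree at the destination
        "/NFL",             # do not log file names
        "/J",               # unbuffered I/O, intended for large files
        "/NDL",             # do not log directory names
        f"/MT:{threads}",   # multi-threaded copy
        src,
        dest,
    ]


def average_bandwidth_mbps(size_gb: float, pass_seconds: list[float]) -> float:
    """Average the per-pass bandwidth (in MBps) over all passes."""
    if not pass_seconds or any(t <= 0 for t in pass_seconds):
        raise ValueError("need at least one pass with a positive duration")
    return mean(size_gb * 1000.0 / t for t in pass_seconds)


# Hypothetical paths and timings for the 24.03 GB AV data set.
print(" ".join(robocopy_args(r"R:\AV", r"E:\AV")))
print(round(average_bandwidth_mbps(24.03, [30.0, 30.0, 30.0]), 1))
```

Averaging the per-pass bandwidths, rather than dividing the total data by the total time, matches the description above of computing bandwidth as the average of all three passes.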

The ‘Disk-to-Disk’ workload involves a similar process, but with one iteration only. The data is copied to the external unit from the CPU-attached NVMe drive, and then copied back to the internal drive. It does include a larger amount of continuous data transfer in a single direction, as data that doesn’t fit in the RAM drive is also part of the workload set.

Audio and Video Read

The workloads are processed in sequence, and the initial ones – particularly the read workloads – show great performance from the DT Max. However, with more and more traffic, as the workloads shift to the ‘disk-to-disk’ sections, the SLC cache runs out. Performance numbers are equivalent to what one might expect from bus-powered portable hard drives.

As long as the workload is contained within the SLC cache (more on the sizing further down in this section), it can be seen that there is no significant gulf in the numbers between the different units. For all practical purposes, casual users will not notice the difference between them in the course of normal usage. However, power users may want to dig deeper to understand the limits of each device. To address this concern, we also instrumented our evaluation scheme for determining performance consistency.

Performance Consistency

Aspects influencing the performance consistency include SLC caching and thermal throttling / firmware caps on access rates to avoid overheating. This is important for power users, as the last thing that they want to see when copying over 100s of GB of data is the transfer rate going down to USB 2.0 speeds.

In addition to tracking the instantaneous read and write speeds of the DAS when processing the AnandTech DAS Suite, the temperature of the drive was also recorded. In earlier reviews, we used to track the temperature all through. However, we have observed that SMART read-outs for the temperature in NVMe SSDs using USB 3.2 Gen 2 bridge chips end up negatively affecting the actual transfer rates. To avoid this problem, we have restricted ourselves to recording the temperature only during the idling intervals. The graphs below present the recorded data.

AnandTech DAS Suite – Performance Consistency

The first three sets of writes and reads correspond to the AV suite. A small gap (for the transfer of the video suite from the internal SSD to the RAM drive) is followed by three sets for the Home suite. Another small RAM-drive transfer gap is followed by three sets for the Blu-ray folder. This is followed up with the large-sized ISO files set. Finally, we have the single disk-to-disk transfer set. It appears that either enough SLC cache is available or it is recovered fast enough to cover almost 11 transfer sets. Once the cache is out, the performance goes down from close to 1GBps to around 80 MBps.
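The point at which the cache runs out can also be located programmatically in a log of instantaneous write speeds. The sketch below is one possible approach under stated assumptions: it expects only write-phase samples (idle gaps and read phases removed), and the function name, threshold, run length, and sample trace are all hypothetical.

```python
from typing import Optional, Sequence


def first_sustained_drop(rates_mbps: Sequence[float],
                         threshold_mbps: float,
                         min_run: int = 5) -> Optional[int]:
    """Return the index where throughput first stays below the threshold
    for min_run consecutive samples, or None if that never happens."""
    run = 0
    for i, rate in enumerate(rates_mbps):
        if rate < threshold_mbps:
            run += 1
            if run == min_run:
                return i - min_run + 1  # first sample of the slow run
        else:
            run = 0  # a brief dip does not count as cache exhaustion
    return None


# Made-up trace: fast writes, one brief dip, then a sustained slowdown.
trace = [950.0] * 10 + [400.0] + [950.0] * 3 + [80.0] * 6
print(first_sustained_drop(trace, threshold_mbps=500.0))
```

Requiring several consecutive slow samples keeps short dips, such as those between files, from being mistaken for the end of the SLC cache.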

PCMark 10 Storage Bench – Real-World Access Traces

There are a number of storage benchmarks that can subject a device to artificial access traces by varying the mix of reads and writes, the access block sizes, and the queue depth / number of outstanding data requests. We saw results from two popular ones – ATTO, and CrystalDiskMark – in a previous section. More serious benchmarks, however, actually replicate access traces from real-world workloads to determine the suitability of a particular device for a particular workload. Real-world access traces may be used for simulating the behavior of computing activities that are limited by storage performance. Examples include booting an operating system or loading a particular game from the disk.

PCMark 10’s storage bench (introduced in v2.1.2153) includes four storage benchmarks that use relevant real-world traces from popular applications and common tasks to fully test the performance of the latest modern drives:

  • The Full System Drive Benchmark uses a wide-ranging set of real-world traces from popular applications and common tasks to fully test the performance of the fastest modern drives. It involves a total of 204 GB of write traffic.
  • The Quick System Drive Benchmark is a shorter test with a smaller set of less demanding real-world traces. It subjects the device to 23 GB of writes.
  • The Data Drive Benchmark is designed to test drives that are used for storing files rather than applications. These typically include NAS drives, USB sticks, memory cards, and other external storage devices. The device is subjected to 15 GB of writes.
  • The Drive Performance Consistency Test is a long-running and extremely demanding test with a heavy, continuous load for expert users. In-depth reporting shows how the performance of the drive varies under different conditions. This writes more than 23 TB of data to the drive.

Despite the data drive benchmark appearing most suitable for testing direct-attached storage, we opt to run the full system drive benchmark as part of our evaluation flow. Many of us use portable flash drives as boot drives and storage for Steam games. These types of use-cases are addressed only in the full system drive benchmark.

The Full System Drive Benchmark comprises 23 different traces. For the purpose of presenting results, we classify them under five different categories:

  • Boot: Replay of storage access trace recorded while booting Windows 10
  • Creative: Replay of storage access traces recorded during the start up and usage of Adobe applications such as Acrobat, After Effects, Illustrator, Premiere Pro, Lightroom, and Photoshop.
  • Office: Replay of storage access traces recorded during the usage of Microsoft Office applications such as Excel and PowerPoint.
  • Gaming: Replay of storage access traces recorded during the start up of games such as Battlefield V, Call of Duty Black Ops 4, and Overwatch.
  • File Transfers: Replay of storage access traces (Write-Only, Read-Write, and Read-Only) recorded during the transfer of data such as ISOs and photographs.

PCMark 10 also generates an overall score, bandwidth, and average latency number for quick comparison of different drives. The sub-sections in the rest of the page reference the access traces specified in the PCMark 10 Technical Guide.

Booting Windows 10

The read-write bandwidth recorded for each drive in the boo access trace is presented below.

Windows 10 Boot

Performance numbers are roughly in line with what can be expected from a 1TB PCIe 3.0 x2 NVMe drive behind a USB 3.2 Gen 2 bridge. Since our comparison drives both use PCIe 3.0 x4 SSDs, there is a significant gulf in performance.

Creative Workloads

The read-write bandwidth recorded for each drive in the sacr, saft, sill, spre, slig, sps, aft, exc, ill, ind, psh, and psl access traces is presented below.

Startup - Adobe Acrobat

Surprisingly, despite the usage of a PCIe 3.0 x4 NVMe drive inside the Akasa enclosures, the Kingston DT Max manages to come in the middle of the pack in most workloads.

Office Workloads

The read-write bandwidth recorded for each drive in the exc and pow access traces is presented below.

Usage - Microsoft Excel

The DT Max performs very similarly to the other drives for the Excel and PowerPoint storage traces.

Gaming Workloads

The read-write bandwidth recorded for each drive in the bf, cod, and ow access traces is presented below.

Startup - Battlefield V

Certain games such as Call of Duty perform very well in this read-intensive benchmark. However, for load times, the bridge solutions work out better for most cases.

File Transfer Workloads

The read-write bandwidth recorded for each drive in the cp1, cp2, cp3, cps1, cps2, and cps3 access traces is presented below.

Duplicating ISOs (Read-Write)

The PCIe 3.0 x4 NVMe SSDs in the Akasa enclosure deliver better performance for these stressful sequential workloads, with the Kingston DT Max consistently falling in the lower half.

Overall Scores

PCMark 10 reports an overall score based on the observed bandwidth and access times for the full workload set. The score, bandwidth, and average access latency for each of the drives are presented below.

Full System Drive Benchmark Bandwidth (MBps)

Given the analysis of the aforementioned results, it is clear that a bridge solution delivers a better experience in most cases. However, the Kingston DT Max delivers respectable numbers equivalent to those of a DRAM-less PCIe 3.0 x2 NVMe SSD behind a USB 3.2 Gen 2 bridge.

The performance of the Kingston DT Max in various real-world access traces as well as synthetic workloads was brought out in the previous section. We also looked at the performance consistency for these cases. Power users may also be interested in performance consistency under worst-case conditions, as well as drive power consumption. The latter is also important when used with battery powered devices such as notebooks and smartphones. Pricing is also an important aspect. We analyze each of these in detail below.

Worst-Case Performance Consistency

Flash-based storage devices tend to slow down in unpredictable ways when subject to a large number of small-sized random writes. Many benchmarks use that scheme to pre-condition devices prior to the actual testing in order to get a worst-case representative number. Fortunately, such workloads are uncommon for direct-attached storage devices, where workloads are largely sequential in nature. Use of SLC caching as well as firmware caps to prevent overheating may cause a drop in write speeds when a flash-based DAS device is subject to sustained sequential writes.

Our Sequential Writes Performance Consistency Test configures the device as a raw physical disk (after deleting configured volumes). A fio workload is set up to write sequential data to the raw drive with a block size of 128K and iodepth of 32 to cover 90% of the drive capacity. The internal temperature is recorded at either end of the workload, while the instantaneous write data rate and cumulative total write data amount are recorded at 1-second intervals.
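A fio job along these lines could look like the output of the Python sketch below. Only the 128K block size, the iodepth of 32, and the 90% coverage come from the description above; the job name, the device path, the Windows I/O engine, direct I/O, and the bandwidth-logging options are assumptions, and the actual job file used for this review may differ. Running such a job against a raw disk destroys its contents.

```python
def fio_job_text(device: str, fill_percent: int = 90) -> str:
    """Render a fio job file for a sequential-write fill of a raw device."""
    if not 1 <= fill_percent <= 100:
        raise ValueError("fill_percent must be between 1 and 100")
    lines = [
        "[seq-write-fill]",               # job name (assumed)
        f"filename={device}",             # raw physical disk (assumed path)
        "rw=write",                       # sequential writes
        "bs=128k",                        # block size from the description
        "iodepth=32",                     # queue depth from the description
        "ioengine=windowsaio",            # assumed: async I/O on Windows
        "direct=1",                       # assumed: bypass the OS cache
        f"size={fill_percent}%",          # portion of the device to cover
        "write_bw_log=seq-write-fill",    # assumed: log bandwidth to a file
        "log_avg_msec=1000",              # assumed: one log entry per second
    ]
    return "\n".join(lines) + "\n"


# Hypothetical device path; double-check the disk number before running fio.
print(fio_job_text(r"\\.\PhysicalDrive1"))
```

The one-second log averaging mirrors the 1-second recording interval mentioned above, so the resulting log can be plotted directly as write rate against time.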

Sequential Writes to 90% Capacity – Performance Consistency

The Kingston DT Max sustains the maximum write transfer rate for close to 100 seconds – around 95GB of data – before the ‘direct-to-TLC’ writes start. Beyond that, the write speeds drop down to an average of around 80 MBps. This type of SLC caching behavior / direct-to-TLC penalty is not seen in the dual-chip bridge solutions. It must also be noted that the temperature read-out was 92°C at the end of the process. Given the lack of a thermal solution inside the UFD, and the transfer rates involved, it is not much of a surprise to see the high temperature.
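The figures above lend themselves to a quick back-of-the-envelope check. The Python sketch below derives the implied cached write rate and estimates how long the rest of a 90% fill would take; it assumes a decimal 1TB (1000 GB) capacity, 1 GB = 1000 MB, and a constant 80 MBps after the cache is exhausted, so the result is only a rough estimate.

```python
def cache_fill_summary(cache_gb: float, cache_seconds: float,
                       post_cache_mbps: float, capacity_gb: float,
                       fill_fraction: float) -> tuple[float, float, float]:
    """Return (cached write rate in MBps, GB left after the cache is used,
    seconds needed to write the remainder at the post-cache rate)."""
    fast_mbps = cache_gb * 1000.0 / cache_seconds
    remaining_gb = capacity_gb * fill_fraction - cache_gb
    slow_seconds = remaining_gb * 1000.0 / post_cache_mbps
    return fast_mbps, remaining_gb, slow_seconds


# 95 GB in ~100 s, then ~80 MBps, filling 90% of an assumed 1000 GB drive.
fast, remaining, slow = cache_fill_summary(95.0, 100.0, 80.0, 1000.0, 0.9)
print(f"cached rate: {fast:.0f} MBps")
print(f"remaining after cache: {remaining:.0f} GB")
print(f"time at post-cache rate: {slow / 3600:.1f} hours")
```

Under these assumptions, the cached phase works out to roughly 950 MBps, while the remaining 805 GB or so would take close to three hours at the post-cache rate.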

Power Consumption

Bus-powered devices can configure themselves to operate within the power delivery constraints of the host port. While Thunderbolt ports are guaranteed to supply up to 15W for client devices, USB 2.0 ports are guaranteed to deliver only 2.5W (500mA @ 5V). In this context, it is interesting to have a fine-grained look at the power consumption profile of the various external drives. Using the Plugable USBC-TKEY, the bus power consumption of the drives was tracked while processing the CrystalDiskMark workloads (separated by 5s intervals). The graphs below plot the instantaneous bus power consumption against time, while singling out the maximum and minimum power consumption numbers.

CrystalDiskMark Workloads – Power Consumption

While bridge chip-based solutions operate at around 2.5W with peaks of 5W+, the Kingston DT Max is much more power efficient. The UFD enters a deep sleep state drawing around 8 – 9mW after around 20 minutes without traffic. Average power usage is around 1.4W, and the peak is only 3.04W.
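The minimum, maximum, and average figures quoted above are simple reductions over a series of bus voltage and current readings. The Python sketch below shows that reduction; the function name and the sample readings are hypothetical, and the sketch does not reflect how the Plugable USBC-TKEY readings are actually captured.

```python
from typing import Dict, Sequence, Tuple


def summarize_bus_power(samples: Sequence[Tuple[float, float]]) -> Dict[str, float]:
    """Summarize (volts, amps) readings as min / max / average power in watts."""
    if not samples:
        raise ValueError("at least one sample is required")
    powers = [volts * amps for volts, amps in samples]  # P = V * I
    return {
        "min_w": min(powers),
        "max_w": max(powers),
        "avg_w": sum(powers) / len(powers),
    }


# Made-up readings; 5 V at 500 mA is the 2.5W USB 2.0 budget mentioned above.
readings = [(5.0, 0.5), (5.0, 0.1), (5.0, 0.3)]
print(summarize_bus_power(readings))
```

Note that the plain average assumes evenly spaced samples; unevenly spaced readings would need time weighting.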

Final Words

The Kingston DT Max is available for pre-order today – hovering around $166 on B&H. It appears that retailers are happy to tack on a premium to this product for its uniqueness, given that the 2GBps-capable Kingston XS2000 is priced lower at $160. Given that the 256GB and 512GB versions are priced at $58 and $97, a sub-$150 price for the 1TB DataTraveler Max would work better in the current market. The real competition here is the dual-interface Akasa AK-ENU3M2-04 with its male connectors and user-replaceable SSD. Including a 1TB SSD would still land it at a sub-$140 price point.

Overall, the Kingston DataTraveler Max is a hands-down winner in terms of industrial design, form-factor, performance, power efficiency, and uniqueness. The SLC cache is big enough for users to not worry too much about the sub-100 MBps performance numbers seen for extreme workloads. That said, there is some scope for improvement in thermal design. Additionally, a dual-connector (Type-C + Type-A) product could be a great addition to the family. Given what we have seen of Silicon Motion’s SM2320 in the Kingston DataTraveler Max, it is clear that USB thumb drives are approaching portable SSDs in performance – that is good from a consumer viewpoint.