The Best Surprise Is No Surprise
When the first production machines started arriving in September, it was only natural to have a little apprehension and think, “Does this actually work? What if it breaks?!” It was raw, completely new technology—it was terrifying. And then ... nothing happened. That is, everything simply worked.
Not that this was much of a surprise. The lead-up to this event was the classic Dropbox story: Sweat the details, build confidence by relentlessly bashing out the bugs—and then roll the project out on a large scale.
For the past three years, the Dropbox teams involved in the changeover to SMR have been doing a lot of innovation, and more testing than any of us care to remember. The hardware team worked closely with the Magic Pocket software team, envisioning each possible edge case and running through every imaginable scenario. That diligent work helped ensure that the migration to SMR went off without a hitch.
To prepare for running SMR, we had to make substantial changes to the software layer. We optimized the software by improving the throughput of the networking stack to the disk. We also added an SSD cache: since you can only write to the SMR disks in fixed-size zones, and you can only write to them sequentially, we knew we needed an area to stage the writes. Adding support for this SSD staging layer to the software was specifically targeted for our transition to SMR, but it has helped latency in other cases, as well.
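To make the staging idea concrete, here is a minimal in-memory sketch, not the Magic Pocket implementation. The `Zone` and `StagingBuffer` classes and their methods are hypothetical names invented for illustration. The zone only accepts a write that starts at its current write pointer, and the staging buffer (standing in for the SSD cache) holds incoming blocks until they can be appended in order.

```python
from collections import deque


class Zone:
    """Hypothetical in-memory model of one sequential-write zone."""

    def __init__(self, size):
        self.size = size          # zone capacity in bytes
        self.data = bytearray()   # bytes written so far

    @property
    def write_pointer(self):
        # The next writable offset is always the end of the written data.
        return len(self.data)

    def append(self, offset, payload):
        # A sequential zone rejects any write that does not start
        # exactly at the write pointer.
        if offset != self.write_pointer:
            raise ValueError("non-sequential write rejected")
        if offset + len(payload) > self.size:
            raise ValueError("write would overflow the zone")
        self.data += payload


class StagingBuffer:
    """Hypothetical stand-in for the SSD staging area."""

    def __init__(self):
        self.pending = deque()    # blocks waiting to reach the disk

    def stage(self, block):
        self.pending.append(block)

    def flush_to(self, zone):
        """Append staged blocks to the zone in arrival order.

        Stops at the first block that no longer fits and returns the
        number of blocks written.
        """
        written = 0
        while self.pending:
            block = self.pending[0]
            if zone.write_pointer + len(block) > zone.size:
                break
            zone.append(zone.write_pointer, block)
            self.pending.popleft()
            written += 1
        return written
```

In this sketch a block that does not fit simply stays staged until a fresh zone is available, which is the role the staging area plays in the description above.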
Fortunately, we made a significant portion of the software changes required for sequential writing ahead of time and tested them on our existing fleet of PMR disks. Before we even began to build the new architecture, we made sure it would support both PMR and SMR. This meant that the whole stack was thoroughly tested by the hundreds of thousands of installed disks before we even started bringing the SMR machines online, which removed a considerable amount of risk from the equation. All we had to do once we received the new machines was change the actual disk we were writing to.
In the end, one aspect of our original design helped smooth the transition to SMR. The generic Dropbox infrastructure handles data in immutable blocks up to 4 MB in size, which was convenient for SMR: because blocks are never modified in place, what would otherwise be random writes can be laid down sequentially as new blocks. And the size of the write zones we’ve set up in Magic Pocket, with 1 GB extents of data, fits neatly with the 256 MB zones used to split up SMR drives, since one extent spans exactly four zones.
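The arithmetic behind that fit can be sketched in a few lines. This is an illustration only: it assumes binary units (1 GB taken as 1024 MiB), assumes extents are laid out back to back starting at zone 0, and uses hypothetical names (`zones_per_extent`, `locate`) that do not come from Magic Pocket.

```python
MIB = 1024 * 1024

BLOCK_MAX = 4 * MIB        # largest immutable block
ZONE_SIZE = 256 * MIB      # SMR zone size
EXTENT_SIZE = 1024 * MIB   # extent size, assuming 1 GB means 1024 MiB


def zones_per_extent(extent_size=EXTENT_SIZE, zone_size=ZONE_SIZE):
    """Return how many whole zones one extent occupies."""
    if extent_size % zone_size != 0:
        raise ValueError("extent does not align to zone boundaries")
    return extent_size // zone_size


def locate(extent_index, offset_in_extent):
    """Map an extent-relative offset to (zone index, offset within zone).

    Assumes extents are packed contiguously from zone 0, which is a
    simplification made for this example.
    """
    if not 0 <= offset_in_extent < EXTENT_SIZE:
        raise ValueError("offset outside the extent")
    zone = extent_index * zones_per_extent() + offset_in_extent // ZONE_SIZE
    return zone, offset_in_extent % ZONE_SIZE
```

Under these assumptions an extent never straddles a partial zone, and a zone holds a whole number of maximum-size blocks (`ZONE_SIZE // BLOCK_MAX`).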
Initially, SMR was a proof-of-concept case: can we actually make it function the way we want? From a hardware point of view, turning to SMR would help us build data storage density quicker than with PMR. What we found was that the use-case for SMR matched up very well with the way we’ve already architected Magic Pocket.
But in comparison to the control we have over our own software stack for SMR, the hardware team had a massive hill to climb in terms of learning everything that went on in the background. One of the biggest challenges in enabling SMR for Dropbox was that it is a new technology in the data-center context. It was the healthy working relationship between the hardware team and the Magic Pocket team that allowed the project to be as successful as it turned out to be. Dropbox is not the only large tech company that’s working on calibrating and fine-tuning their software for SMR, but the use-case is so natural for us that we’ve been motivated to move quickly.
Still, being first had its challenges—not least being the sheer amount of data we already manage. Our hardware team had very limited support when it came to preparing for SMR. The vendors selling the drives didn’t have the chassis configuration that we have—our current test cluster is about six racks, and there are 48 systems, or close to 5,000 drives. So when we iterated through our revisions, we were able to obtain a far better signal, which led to a stronger test process. And that helped put us at the bleeding edge of the technology: few companies have really invested in SMR, so we often ended up doing a lot of the testing for our vendors, which kept us a step ahead.
Increasing Our Density
On the software side of things, we opted for more capacity, performance and flexibility by writing directly to the disks without a filesystem. Some operating systems are adding support for that, but when we were working with it, it wasn’t an option. So in order to talk to the disk, we used libzbc, which is essentially a wrapper for sending commands directly to the disk without going through the Linux block device stack. But during testing, we ran into the issue of the disk simply failing, over and over. It turned out the failures were due to a hardcoded loop: since we were bypassing the Linux kernel code, which includes retry logic, we had to implement our own retry logic around accessing the disk.
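As a rough idea of what retry logic around disk access can look like, here is a generic sketch with bounded attempts and exponential backoff. It is not our production code and it does not call libzbc: `TransientDiskError`, `with_retries`, and the `op` callable are hypothetical placeholders for whatever command is being sent to the disk.

```python
import time


class TransientDiskError(Exception):
    """Hypothetical error for a disk command that may succeed if retried."""


def with_retries(op, attempts=5, base_delay=0.01, sleep=time.sleep):
    """Call op() until it succeeds or the attempts run out.

    The delay doubles after each failure. The final failure is re-raised
    so the caller can treat the disk as genuinely unhealthy.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return op()
        except TransientDiskError:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
```

Only errors marked as transient are retried in this sketch; anything else propagates immediately, so a real fault is not hidden behind repeated attempts.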
Firmware was another issue when it came to getting the SMR drive technology to work on the existing platform, largely because the components came from various vendors. We work with multiple hard-drive vendors, as well as various kinds of intermediary technologies, such as the host bus adapter that connects multiple drives to a system. Each of these components, as well as the server chassis itself, operated with its own firmware.
There were a lot of moving pieces, so the first initiative on the hardware side was to get our various partners and vendors to talk to each other. We then worked with each individual vendor to identify and resolve any issues early on, and all of the vendors have come forward and engaged with us.
But we are convinced that this will pay dividends in the long term. Opting to be multi-source across all the components, for example, insulates us against any single points of failure or too much reliance on a single supplier from a supply chain perspective.
Cold Storage and SMR
One of the latest developments at Dropbox is the incorporation of a new cold storage tier, designed for less frequently accessed data. Depending on the replication scheme, we’ve managed to cut down on disk usage by 25 to 33 percent with no noticeable change to the end-user experience. Our cold storage system also uses our current mix of SMR and PMR drives, which translates to additional cost savings without any difference in performance.
If you want to learn more about how we set up the cold tier, read Preslav Le’s recent blog post.
What the Future Holds
The simplicity of our infrastructure has also set us up to take advantage of future innovations in data storage technology. The beauty of this approach, we believe, is that future technologies will likely use the same or a similar API as SMR, where disks achieve greater densities by writing continuously to a limited number of zones of the disk at any given time. They may be microwave-assisted magnetic recording (MAMR) drives or heat-assisted magnetic recording (HAMR) drives, but we expect them to have the same interface, so we’ll be able to use the same software architecture.
We’re already working on further improving densities with our 2020 storage designs. By jumping to SMR, we’ve opened the door for whatever emerging technologies are coming. For Dropbox, the end result is more cost-efficient storage with a smaller energy footprint and no sacrifice in reliability or performance. For the industry overall, our efforts will pay dividends as the underlying architecture is adopted for future enhancements in HDD technology.
Acknowledgements: Refugio Fernandez, Preslav Le, Victor Li, and Alexander Sosa contributed to this article.
Interested in how our SMR technology will change the future of cloud storage? We’re hiring!