For nearly a decade, Dropbox has run its own hardware infrastructure. We have one of the few exabyte-scale storage systems in the world, and infrastructure at this scale takes a lot of time and planning to build and maintain. Our Hardware team architects, ships, and maintains all of the physical hardware in the data centers that keep Dropbox online, and we recently hit an exciting milestone: we shipped our sixth generation hardware designs.
As our workloads grow and workflows evolve, each new generation of server hardware helps us stay ahead of Dropbox users’ needs. And by revisiting these configurations on a regular cadence, we ensure we’re making decisions that provide the best user experience possible. Often, that means making innovative bets on new and emerging technologies. We were early adopters of Shingled Magnetic Recording (SMR) storage drives and new CPU technology from AMD, and we’re excited about the possibility of leveraging machine learning accelerators and heat-assisted magnetic recording in the future.
But first, let’s take a closer look at what’s running in our data centers today.
Who are y’all and what do you do at Dropbox?
When we started work on Magic Pocket in 2013, the hardware team decided to do all of our infrastructure engineering in-house. This gave us the chance to control our own destiny in terms of cost and performance, which allows us to serve our customers better.
There are many different approaches to designing server configurations. One approach is to tailor the server to each service. This is sort of like designing a high-performance sports car that’s optimized for speed, but not as well suited for running errands or dropping your kids off at school. If you don’t have many services, this approach makes sense. But at Dropbox there are more than 100 services, and that number keeps increasing, which means we would have to design and maintain more than 100 unique hardware configurations at any one time. In the hardware industry we call these snowflakes.
To tackle the increasing size and breadth of our service diversity, our approach has been to focus on a few general-purpose system designs and to avoid hardware snowflakes. This means our hardware building blocks are more utilitarian—and in keeping with our analogy above, more like designing a family sedan, which is better suited for all around tasks.
These hardware building blocks are classified into three server tiers for core applications, and one tier for serving our Edge network:
- Storage runs Magic Pocket (our immutable block storage system) and hosts all of our customer data
- Database powers our persistent storage layer (think MySQL and metadata)
- Compute provides general purpose compute for applications
- Edge connects our customers to our data centers through our distributed POP sites
A look inside our sixth generation servers
As in any engineering project, we spent a lot of time discussing the code names for each tier. We usually try to stick to cartoon names, and each name has to begin with the same letter as its server tier: Storage, Database, and Compute. (Edge is the exception, because we saw it as a branch of the Compute design.) For our sixth generation hardware, we chose Scooby, Diego, Cartman, and Coco.
Storage (Scooby)
These servers are designed to efficiently store our customer data. We co-designed this server with the Magic Pocket team around a range of high-capacity 3.5” HDDs. This generation allows us to scale up to more than 2 PB per server, and more than 20 PB per rack. Each chassis contains more than 100 drives, a compute enclosure that runs the Magic Pocket software, and a 100 Gb NIC for connectivity. We can fit eight of these chassis in a rack, an amazing amount of density that lets us get even more performance per rack than previous generations.
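As a rough sanity check on these density figures, the rack-level arithmetic can be sketched as follows. The drive and chassis counts come from the text above; the per-drive capacity is an assumed illustrative value, since only chassis and rack totals are given:

```python
# Back-of-the-envelope storage rack capacity.
# Drives per chassis and chassis per rack come from the text above;
# the per-drive capacity is an assumption for illustration only.
DRIVES_PER_CHASSIS = 100   # "more than 100 drives" per chassis
DRIVE_CAPACITY_TB = 20     # assumed high-capacity 3.5" HDD size
CHASSIS_PER_RACK = 8

chassis_pb = DRIVES_PER_CHASSIS * DRIVE_CAPACITY_TB / 1000  # TB -> PB
rack_pb = chassis_pb * CHASSIS_PER_RACK

print(f"{chassis_pb:.1f} PB per chassis, {rack_pb:.1f} PB per rack")
# -> 2.0 PB per chassis, 16.0 PB per rack; with more than 100 drives
#    per chassis and larger drives, this crosses the 2 PB / 20 PB marks.
```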
Database (Diego)
These servers power our persistent storage layer, which supports services like Edgestore, File Journal, and Magic Pocket. With Diego, we’ve effectively doubled our rack density. Each chassis has 60% more CPU cores and 16 TB of NVMe SSDs, twice the flash storage of our previous generation. Flash storage technology is the key to providing very fast input/output and low latency access to metadata.
Compute (Cartman)
These servers handle our general compute needs. Each node is a low-end server, housed in a 1U enclosure, stacked 46 to a rack, and interconnected through a top-of-rack switch. Cartman gives our Compute tier a 3x improvement in speed and power, far beyond any prior generation. The leap in compute was driven primarily by multi-tenancy, meaning multiple applications sharing the same hardware, and by CPU industry trends that give us more performance across our fleet with fewer CPUs. Inside the box is a 48-core processor, 256 GB of DDR4 RAM, a 1 TB SSD for boot, and a 25 Gb NIC.
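Taking the per-node specs above at face value, a quick sketch of the rack-level totals for this tier:

```python
# Rack-level totals for the Compute tier, from the per-node specs above.
NODES_PER_RACK = 46        # 1U nodes stacked 46 to a rack
CORES_PER_NODE = 48        # 48-core processor per node
RAM_GB_PER_NODE = 256      # 256 GB DDR4 per node

rack_cores = NODES_PER_RACK * CORES_PER_NODE           # total cores per rack
rack_ram_tb = NODES_PER_RACK * RAM_GB_PER_NODE / 1024  # total RAM in TB

print(f"{rack_cores} cores and {rack_ram_tb:.1f} TB of RAM per rack")
# -> 2208 cores and 11.5 TB of RAM per rack
```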
Edge (Coco)
These servers are responsible for accepting, encrypting, and decrypting traffic in and out of our data centers, and they form the tier closest to our users. Our Coco refresh has allowed us to scale up the rack density of our POPs by 17%, increase our core count by 50%, increase network speeds by 25%, and triple our total memory. These improvements have helped us cut down on latency and improve the user experience when connecting to a Dropbox data center.
What’s next: Seventh generation hardware
Now that our sixth-generation hardware is up and running, we’ve been working closely with our colleagues on the software team to monitor how our new servers are performing. Not only can the software team see the actual user impact of our decisions in action, but their insights directly influence future hardware designs. Co-designing hardware with our software teams allows us to take full advantage of the latest technologies, and differentiate our infrastructure from our competitors, ensuring we provide the best service to our users, generation after generation.
Dropbox has a strong track record of deploying new hardware technologies. Some highlights include partnering with AMD to launch their Naples and Rome processors, and deploying SMR drives for Magic Pocket. We plan to continue with this theme by exploring new areas such as machine learning and heat-assisted magnetic recording (HAMR) hard drives.
Machine learning workloads will let us take advantage of hardware accelerators, which are new to our fleet. These accelerators offer huge performance gains when running inference and regression-type workloads, and will help us provide Dropbox users with an even better experience when using some of the new features we’ve rolled out in the past few years.
Heat-assisted magnetic recording is the next logical step in unlocking new densities in the most important storage destination at Dropbox: where customer data lives. HAMR drives use a small laser to briefly heat a spot on the platter while writing new bits into place, which allows data to be packed more densely. We’re excited to work closely with our suppliers to deploy these new drives in our fleet.
These are just a few of the technologies that will allow us to keep improving the Dropbox user experience. In fact, we’re already at work on our seventh generation server hardware, which will also mark our 10th anniversary of designing these configurations in-house! We plan to continue scaling our best-in-class infrastructure by rolling out new storage solutions, optimizing for multi-tenancy, and unlocking new features for our database tier.
If you’re interested in helping us build world class infrastructure, we’re hiring! Dropbox is now Virtual First, which means that remote work will be the primary experience for all Dropboxers. To learn more, visit our jobs site and see all of our currently open positions.