Lepton image compression: saving 22% losslessly from images at 15MB/s

// By Daniel Reiter Horn • Jul 14, 2016

This open-source project is no longer maintained or supported by Dropbox. Please refer to Lepton’s GitHub page for more information.

~ ~ ~

We are pleased to announce the open source release of Lepton, our new streaming image compression format, under the Apache license.

Lepton achieves a 22% savings reduction for existing JPEG images, by predicting coefficients in JPEG blocks and feeding those predictions as context into an arithmetic coder. Lepton preserves the original file bit-for-bit perfectly. It compresses JPEG files at a rate of 5 megabytes per second and decodes them back to the original bits at 15 megabytes per second, securely, deterministically, and in under 24 megabytes of memory.

We have used Lepton to encode 16 billion images saved to Dropbox, and are rapidly recoding our older images. Lepton has already saved Dropbox multiple petabytes of space.

Community participation and improvement to this new compression algorithm is welcome and encouraged!

Lepton at scale

At Dropbox, the security and durability of your data are our highest priorities. As an added security layer, Lepton runs within seccomp to disable all system calls except read and write of already-open file descriptors. Lepton has gone through a rigorous automated testing process demonstrating determinism on over 4 billion photos and counting. This means that once we verify an image decodes back to its original bits the first time, we can always get back to the original file in future decodes.

All of our compression algorithms, including Lepton, decode every compressed file at least once and compare the result to the input, bit-for-bit, before persisting that file. Compressed files are placed into kernel-protected, read-only, memory before the bit-for-bit comparison to guarantee they are immutable during the full verification process.

A few details about how Lepton works

The JPEG format encodes an image by dividing it into a series of 8×8 pixel blocks, represented as 64 signed 10-bit coefficients. Thus the following 16×16 image would be encoded as 4 JPEG blocks.

Instead of encoding pixel values directly, the signed 10-bit coefficients are the result of a reversible transformation called the Discrete Cosine Transform, or DCT for short.

One of the 64 coefficients in a block, known as the DC, represents the brightness of the entire block of 8x8 pixels. The collection of all DC values in a whole JPEG can be viewed as a thumbnail that is a factor of 8 smaller than the original image in each dimension. The other 63 values, known as the AC coefficients, describe the fine detail going on in the 8x8 pixel block: for example, the texture of pebbles on a beach, or the pattern on a plaid shirt.

Here’s an animation of the letter A becoming ever clearer as its AC coefficients are added one by one. The animation starts with the DC and adds each AC in turn which brings out the detail of the A.

By Hanakus, CC BY-SA 3.0, via Wikimedia Commons

Encoding a block in Lepton

To encode the 63 AC coefficients, Lepton first serializes the number of non-zeros in the block, and then zigzags through the 8x8 block of coefficients, writing out each value using the efficient representation described below.

Instead of writing bits as zeros and ones, Lepton encodes the data using the VP8 arithmetic coder, which can be very efficient, if supplied with carefully chosen context information from previous sections of the image. Stay tuned to future blog posts for more about the context Lepton feeds into the arithmetic coder.

To encode an AC coefficient, first Lepton writes how long that coefficient is in binary representation, by using unary. Unary is a numerical representation that is as simple as counting off on your fingers. For example, three would be 1110. The extra zero at the end is the signal to stop counting. So, five would be 111110 and zero would just be 0.

Next Lepton writes a 1 if the coefficient is positive or 0 if it is negative. Finally, Lepton writes the the absolute value of the coefficient in standard binary. Lepton saves a bit of space by omitting the leading 1, since any number greater than zero doesn’t start with zero.

For example, here’s 47 represented in Lepton:

For the coefficients we observed in JPEG files, the Lepton representation results in significantly fewer symbols than other encodings, such as pure unary, or fixed length two's complement.

The DC coefficient (brightness in each 8x8 block) takes up a lot of room (over 8%) in a typical iPhone photograph so it’s important to compress it well. Most image formats put the DC coefficients before any AC coefficients in the file format. Lepton gets a compression advantage by coding the DC as the last value in each block.

Since the DCs are serialized last, there is a wealth of information from the AC coefficients available to predict the DC coefficient. By defining a good and reproducible prediction, we can subtract the actual DC coefficient from the predicted DC coefficient, and only encode the delta. Then in the future we can use the prediction along with the saved delta to get the original DC coefficient. In almost all cases, this technique results in a significantly reduced number of symbols to feed into our arithmetic coder.

Example image decompression using Lepton

Let’s assume we are in the middle of decoding a JPEG file using Lepton, row by row.

Here’s an example of a block in the middle of the image without a known brightness, only having the AC coefficients (the details) so far:

Since we are decoding from left-to-right and top-to-bottom and have already decoded some blocks, we know all the pixels of the block directly above and the block to the left:

One approach to predict the DC value could be to compute the overall brightness that minimizes the differences between all 16 pairs of pixels at the border of the current 8x8 block and both the left and top neighbors. This can be used as a first cut at the prediction.

If we average based on the median 8 pixels and ignore the 8 of the outlier pixels, the above technique shrinks the DC by roughly 30% over baseline JPEG. Applying this technique to the example above results in this complete icon of a Dropbox:

However, in reality, images tend to have smooth gradients, in the same way as the sky fades from blue to orange towards the horizon during a sunset. If we simply tried to reduce the difference between neighbor pixels, then we would be essentially predicting that any smooth gradients abruptly stop at 8x8 boundaries.

A better prediction is to actually continue all gradients smoothly as in this illustration:

Since we are encoding the DC after all the AC coefficients, and since the difference between a pair of pixels (the gradient) does not depend on the brightness (DC), we can utilize the gradients in the current block to help compute a prediction. We also have the gradients from all pairs of pixels in the neighbor blocks, because they have been completely decoded.

Thus, we can compute the gradient from the second row of the current block to the edge, and from the neighbors back to the edge, meeting in the middle, as illustrated:

Where these two gradients meet in the middle, between these two pixels is the prediction point that Lepton uses to predict the DC of the current 8×8 block. The delta of this prediction is written in the same manner as the AC’s, using length, followed by sign and residual.

For those familiar with Season 1 of Silicon Valley, this is essentially a “middle-out” algorithm.
Pied Piper presents their “MIDDLE OUT!!’ algorithm at TechCrunch Disrupt, photo from HBO

The overall savings from only encoding the delta reduces the stored size of the DC coefficients all the way down to 61% of their original size, saving 39%. DC coefficients occupy just 8% of an average JPEG file, but this middle-out algorithm still manages to contribute a significant reduction to the overall file size.

Summing it up

This is just one of the many techniques we use to save 22% on each JPEG file. The savings are surprisingly consistent over a wide range of images captured by modern cameras and cell phones.
Lepton compression applied to 10,000 images

Lepton can decompress significantly faster than line-speed for typical consumer and business connections. Lepton is a fully streamable format, meaning the decompression can be applied to any file as that file is being transferred over the network. Hence, streaming overlaps the computational work of the decompression with the file transfer itself, hiding latency from the user.

Lepton decode rate when decoding 10,000 images on an Intel Xeon E5 2650 v2 at 2.6GHz
Lepton encode rate when encoding 10,000 images on an Intel Xeon E5 2650 v2 at 2.6GHz

The code for Lepton is open source and available today. It provides lossless, bit-exact storage for any type of photo, whether it be for archival purposes, or for serving live. Stay tuned for more details about how Lepton works and the challenges of compressing images at Dropbox scale.

// Copy link