This has come up in conversation a number of times since I joined Nimble, where data compression is such a key component of our system design.  It may be old news to a lot of folks, but there's bound to be someone out there who'll benefit from it.

 

Thanks to the analytics engine behind InfoSight, we have really good real-world data indicating how much compression you can get out of typical enterprise workloads.  This is almost always expressed as a compression ratio such as 1.5X, which is really shorthand for 1.5 to 1 or simply 1.5:1.  But what most people want to know is, "By how much will the data footprint of my workload be reduced?  How much space do I really need for this particular application once Nimble's compression has worked on my data?"

 

To answer this, I found it helpful to "rephrase" the compression ratio concept.  Mathematically, a data compression ratio is defined as the size of the uncompressed data divided by the size of the compressed data:

 

                         Uncompressed data
Compression ratio  =  ---------------------
                          Compressed data

 

 

So if an application had 3 TB of uncompressed data but it compressed down to 2 TB, the resulting compression ratio would be 3 TB / 2 TB = 1.5:1 or 1.5X.
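The arithmetic is simple enough to sketch in a few lines of code.  Here's a minimal Python version of the definition above, using the same 3 TB / 2 TB example (the function name and TB units are just for illustration):

```python
def compression_ratio(uncompressed_tb, compressed_tb):
    """Compression ratio = uncompressed size / compressed size."""
    return uncompressed_tb / compressed_tb

# 3 TB of raw application data that compresses down to 2 TB:
ratio = compression_ratio(3, 2)
print(f"{ratio:.1f}:1")  # 1.5:1
```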

 

But many people find it more useful to think of data compression in terms of space savings or reduction percentages, which you would calculate as follows:

 

                         Uncompressed data - Compressed data            Compressed data
Reduction percentage  =  -----------------------------------  =  1 - -------------------
                                 Uncompressed data                    Uncompressed data

 

Using our previous example, we calculate the space savings in going from 3 TB uncompressed down to 2 TB compressed as 1 - (2 TB / 3 TB) ≈ 0.33, or 33%.
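And here's a quick sketch of the same calculation in Python, including a helper that converts a compression ratio (the number you'll usually see quoted) directly into a space-savings percentage (again, the function names are mine, not anything from our products):

```python
def reduction_percentage(uncompressed_tb, compressed_tb):
    """Space savings as a fraction: 1 - compressed / uncompressed."""
    return 1 - compressed_tb / uncompressed_tb

def reduction_from_ratio(ratio):
    """Convert a compression ratio such as 1.5 (i.e. 1.5:1) to savings."""
    return 1 - 1 / ratio

print(f"{reduction_percentage(3, 2):.0%}")  # 33%
print(f"{reduction_from_ratio(1.5):.0%}")   # 33%
```

Both paths give the same answer, which is the point: a 1.5X compression ratio and a 33% footprint reduction are two ways of describing the same result.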

 
