Not surprisingly, usable capacity is a key metric by which customers buy storage; what may be surprising, however, is how poorly it is understood even within the storage community. Before we get too far, it's worth explaining what a terabyte is. "Isn't 1TB just one trillion bytes?" That's a perfectly reasonable conclusion to draw: tera is the standard prefix for trillion in base 10. The thing to remember is that computers don't do decimal (base 10) math like we do; they do binary (base 2) math. As a computer understands it, 1TB is 2^{40}, or 1,099,511,627,776 bytes. Similarly, 1GB is actually 2^{30}, or 1,073,741,824 bytes.
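A quick sketch of the arithmetic makes the gap concrete; this is just the two definitions side by side:

```python
# Decimal (SI) vs binary interpretation of "1 TB"
decimal_tb = 10 ** 12   # marketing terabyte: one trillion bytes
binary_tb = 2 ** 40     # binary terabyte (tebibyte): what the computer means

print(binary_tb)               # 1099511627776
print(binary_tb - decimal_tb)  # 99511627776 bytes "missing" per marketed TB
```

Nearly 100 billion bytes of difference, per terabyte, before any filesystem or array overhead enters the picture.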

Surely then, when one buys a 6TB drive in a Nimble array or from the local electronics store, it must contain 6,597,069,766,656 bytes, right? Wrong. If you refer to the specifications of nearly any disk on the market today, you'll see the capacity footnoted with text something like "One gigabyte, or GB, equals one billion bytes and one terabyte, or TB, equals one trillion bytes when referring to drive capacity." Technically this is correct, since the SI prefixes 'giga' and 'tera' do denote billions and trillions, but it falls 597,069,766,656 bytes short of the expectation. Interestingly, system memory has always been marketed in power-of-2 capacities as a result of the way it is addressed, meaning this capacity shortfall will only ever apply to disks.

So now let's see how the gap magnifies as the scale gets bigger. The table below, adapted from a Wikipedia article, compares SI units (the standard base 10 kilo, mega, giga, tera, peta prefixes) with their binary (base 2) counterparts. What we can see is that the ratio between the two grows as the units grow bigger: a 1TB disk that you buy today actually contains a little over 931GiB of space, and if we could manufacture a 1PB disk, it would contain something like 909TiB of space.

**Multiples of bytes**

| Name (symbol) | Standard SI | Binary usage | Ratio SI/binary | IEC name (symbol) | IEC value |
|---|---|---|---|---|---|
| kilobyte (kB) | 10^{3} | 2^{10} | 0.9766 | kibibyte (KiB) | 2^{10} |
| megabyte (MB) | 10^{6} | 2^{20} | 0.9537 | mebibyte (MiB) | 2^{20} |
| gigabyte (GB) | 10^{9} | 2^{30} | 0.9313 | gibibyte (GiB) | 2^{30} |
| terabyte (TB) | 10^{12} | 2^{40} | 0.9095 | tebibyte (TiB) | 2^{40} |
| petabyte (PB) | 10^{15} | 2^{50} | 0.8882 | pebibyte (PiB) | 2^{50} |
| exabyte (EB) | 10^{18} | 2^{60} | 0.8674 | exbibyte (EiB) | 2^{60} |
| zettabyte (ZB) | 10^{21} | 2^{70} | 0.8470 | zebibyte (ZiB) | 2^{70} |
| yottabyte (YB) | 10^{24} | 2^{80} | 0.8272 | yobibyte (YiB) | 2^{80} |
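If you want to verify these ratios yourself rather than take the table's word for it, they fall out of a few lines of arithmetic:

```python
# Recompute the SI/binary ratio for each prefix in the table above
prefixes = ["kilo", "mega", "giga", "tera", "peta", "exa", "zetta", "yotta"]

for i, name in enumerate(prefixes, start=1):
    si = 10 ** (3 * i)       # decimal (SI) value of the prefix
    binary = 2 ** (10 * i)   # corresponding binary (IEC) value
    print(f"{name}byte: ratio = {si / binary:.4f}")
```

The first line prints 0.9766 for the kilobyte and the last prints 0.8272 for the yottabyte, matching the table: each step up in scale loses roughly another 2% to the binary conversion.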

As you can see above, a new naming convention has been created to describe storage capacity in binary terms. So a "terabyte" of binary capacity is actually not a terabyte, but rather a tebibyte, or TiB. Sometimes I feel like the only guy using the term; that may be why I feel like I'm having this conversation all the time.

To make matters worse, in enterprise storage systems there is additional overhead consumed by things like storage system metadata and array system software. In some cases this overhead can consume more than 10% of the purchased capacity in TiB. We have all seen the aggregate effect of both the TB-to-TiB short-change and the system overhead with things like mobile devices. The screenshot below is from the "About" screen on my iPhone (236 apps! I should delete some...). The 64GB model has just 55.9GiB usable after binary conversion and reserving space for the operating system. Ironically, Apple doesn't bother with the GiB prefix either.
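The iPhone numbers decompose neatly into the two effects. A minimal sketch (the 55.9GiB figure comes from the screenshot above; the split between conversion loss and OS reserve is the calculation, not Apple's published breakdown):

```python
# Where does a "64 GB" phone's capacity go?
advertised_bytes = 64 * 10 ** 9        # 64 GB as marketed (decimal)
gib = advertised_bytes / 2 ** 30       # the same bytes expressed in binary GiB

print(f"{gib:.1f} GiB")                # 59.6 GiB: ~4.4 "GB" lost to conversion alone
print(f"{gib - 55.9:.1f} GiB")         # 3.7 GiB: remainder reserved by the OS
```

So of the roughly 8GB gap the user perceives, more than half is the unit conversion and the rest is system software.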

Another reality of modern storage systems is that they tend to have a soft capacity limit, beyond which performance begins to suffer, in addition to a hard capacity limit, where no more data can be stored. In many cases the soft limit is as low as 60-80% of the system's true capacity, further limiting its usable capacity. For some use cases this performance threshold won't matter: if you're archiving data, you want to store every last possible byte without concern for performance implications. For many use cases, though, guaranteeing that performance doesn't degrade will be critical, and the soft limit is worth understanding.
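Putting all three factors together, here is a hypothetical back-of-envelope estimator. The 10% metadata overhead and 80% soft limit are illustrative assumptions drawn from the ranges above, not figures for any particular vendor:

```python
# Hypothetical sketch: estimating truly usable capacity of a marketed purchase
def usable_tib(raw_tb, metadata_overhead=0.10, soft_limit=0.80):
    raw_tib = raw_tb * 10 ** 12 / 2 ** 40       # TB -> TiB conversion
    after_overhead = raw_tib * (1 - metadata_overhead)  # minus system metadata
    return after_overhead * soft_limit          # capacity before performance drops

print(f"{usable_tib(100):.1f} TiB")             # 100 "TB" marketed -> 65.5 TiB
```

Under these assumptions, roughly a third of the marketed capacity is gone before the first performance-sensitive byte is written.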

I doubt there's any changing the disk industry at this point, but you can challenge your enterprise storage system suppliers to tell you about the true usable capacity of their system in tebibytes after accounting for system overhead and performance thresholds - and when they ask what a tebibyte is (and they likely will), send them here. I've helped my customers decode the real capacity of my competitors' systems while those competitors struggled to accurately describe a terabyte.
