Every environment is different, and so are priorities (capacity vs. performance), so it's impossible to say for sure whether it's "worth it" in yours. The change rate of the data will ultimately determine how much cache churn there is. If it's performance you seek, then the capacity savings you gain are likely not worth the dedupe overhead. However, if performance is not a factor and capacity is most important, then I'd recommend leaving dedupe on and disabling caching for the volume(s) in question. If you want both (and depending on the amount of data we're talking about here), you could consider pinning the volume(s) to cache (NOS 2.3+ only).
Something else to think about is snapshots.
I have a 1.5TB data volume for a file server, and Windows 2012 R2 dedupe was saving me about 375GB of space on the volume, with the ratio improving the longer it ran. However, I found that the snapshots for this volume were far too big for the churn: at the end of a week, disk usage might have increased by 10GB, but the snapshot size was 120GB. I am guessing the issue has something to do with how chunklets or metadata related to deduplication get stored.
I have since disabled dedupe on the volume and rehydrated all the files, and snapshots are now more reasonably sized. However, with the array's compression alone, I am now only saving about 100GB on the volume.
So, to sum up: if you are protecting the volume with the array's regular snapshots, unexpected space usage is something to look out for. Since I have room to spare on the array, I am OK with the trade-off at the moment, but I'm hoping the deduplication features the Nimble salespeople have promised this fall will make more efficient use of space.
We currently have a Windows 2012 R2 file server with a 3.8TB drive. We have managed to save 1.99TB, leaving 1.88TB free on the drive. Definitely significant space/cost savings there for us.
Thanks for bringing up the topic of snapshots. We haven't yet enabled snapshots on our file servers, but thanks for pointing out that the snapshot size may not accurately reflect the change rate on the drive once we do. This is definitely something for us to keep in mind if we decide to do this in the future...
Of course, if Nimble dedupe ever comes along, then there won't be much need for Windows Server-based dedupe, I guess!
Jason Emery wrote:
[..]Is it worth the trade off on the hit to cache? From what I understand when the dedupe process runs it would potentially cause all of those files to be cached. I would hate to see the cache unnecessarily impacted.
This may actually be a good thing, at least in theory. Imagine the case of a file server with 1,000 copies of the same file (perhaps you're storing roaming profiles on it, and every user has a copy of the same file). Before deduplication, that's 1,000 different sets of blocks on disk, and all the users reading that data look to the Nimble array like they're reading different data; each block is read infrequently enough that it never gets cached, so every read has to go to spinning disk. All those reads impact your other reads that can't be deduplicated and also have to go to spinning disk. Now imagine the deduplicated scenario: there is only one block of data (caveat: that's not entirely true, since Microsoft dedupe may store it several times for redundancy and hot-spot purposes, but let's assume it is). To the Nimble array, that looks like a frequently accessed block and gets cached, so reads of that data (which is truly frequently used) come out of cache. Your spinning disk is left more available for the other stuff that's less frequently used. What does the community think?
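To make the intuition concrete, here's a toy model in Python (not Nimble's actual caching algorithm, just an assumed frequency-based promotion rule: a block is served from cache once it has been read more than a threshold number of times). It compares the scenario where 1,000 users each read their own copy of a file against the deduplicated scenario where every read lands on one shared block:

```python
from collections import Counter

def cache_hit_rate(reads, hot_threshold=3):
    """Toy model: a block is served from cache once it has been read
    more than `hot_threshold` times (frequency-based promotion).
    This is an illustrative assumption, not the array's real policy."""
    counts = Counter()
    hits = 0
    for block in reads:
        counts[block] += 1
        if counts[block] > hot_threshold:
            hits += 1
    return hits / len(reads)

# Before dedupe: 1,000 users each read their own copy 3 times.
# 1,000 distinct blocks, none read often enough to be promoted.
before = [f"user{u}-block" for u in range(1000) for _ in range(3)]

# After dedupe: the same 3,000 reads all land on one shared block,
# which becomes "hot" almost immediately.
after = ["shared-block"] * 3000

print(cache_hit_rate(before))  # 0.0 -- every read goes to spinning disk
print(cache_hit_rate(after))   # ~0.999 -- nearly all reads come from cache
```

The point is simply that deduplication concentrates the same logical workload onto fewer physical blocks, which raises per-block access frequency and makes the data look cache-worthy to a frequency-driven cache.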
Check out the following site one of my teammates wrote up. It's terrific and shows the value of using both features. Hope this helps!