I have been having some adventures with Windows 2012 R2 Deduplication and thought I would share my experiences here for others who might be working on similar migrations.  As a long-term NetApp user it's been interesting getting to grips with something different.  The biggest change for me is the loss of the NAS functionality, which means I need something to fill that NAS role in the Nimble world.  Step up Windows 2012 R2 and, more importantly, its deduplication feature.  Note that a lot of the information here is not unique to a Nimble experience.


So, background first

 

We are still in the process of migrating to Nimble, although a significant portion of the data has now been moved (thanks, Storage vMotion).  The largest dataset was our file shares (previously serviced by the NetApp storage), which moved to a new Windows 2012 R2 file server hosted on Nimble.  We are not using much in-guest iSCSI as we have adopted Veeam as our backup tool, so it is much easier to service those disks as VMware disks rather than direct-attached Nimble disks (via iSCSI).  An area of weakness in my mind is the ability to get data from Nimble to tape etc.  Nimble to Nimble is incredibly easy, and for those brave enough to ditch tape its snapshotting is both efficient and simple to configure.  For my environment Veeam was the most applicable replacement for NetApp's data protection products (SnapDrive, SnapManager, SnapMirror and SnapVault).  We moved from SnapMirror-to-tape to Veeam and repositories.  My hope is that Nimble improves in a few areas (specifically authentication and SnapDrive-like features in NCM), but we are where we are and the performance of these boxes vastly outweighs any problems I have with the software stack.  I'm sure this stuff will come and when it does we will really be cooking on gas.


On to my migration.

 

The Process

 

I moved each NetApp file share to a new disk on a Windows 2012 R2 file server I created.  Each share was given its own disk (I'm still on ESX 5.1 so maximum disk size is limited to 2TB).  There was a lot of data (nearly 10TB), but I seeded it using Robocopy, then did the final sync and the rest of the magic on a weekend.  As part of the migration I also enabled the Microsoft Deduplication feature.  As Nimble didn’t have this feature I thought it would be worth a go, and we did see some significant savings.
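
For anyone wanting to follow the same route, the seed, final sync and deduplication steps might look roughly like the sketch below.  This is not the exact set of commands I ran: the share name, drive letter and log paths are placeholders, and /SEC is included here only because of the permissions lesson described later in this post.

```powershell
# Placeholder names: swap in the real filer share and target disk
$source      = "\\oldfiler\share1"
$destination = "E:\share1"

# Seed pass, run ahead of the cutover.
# /E copies subfolders, /SEC carries NTFS permissions, /R and /W keep retries short.
robocopy $source $destination /E /SEC /R:1 /W:1 /LOG:C:\Logs\share1-seed.log

# Final sync at the cutover.
# /MIR also purges anything deleted at the source since the seed pass.
robocopy $source $destination /MIR /SEC /R:1 /W:1 /LOG:C:\Logs\share1-final.log

# Install the deduplication feature and enable it on the new volume
Install-WindowsFeature -Name FS-Data-Deduplication
Enable-DedupVolume -Volume "E:"

# Kick off an optimisation job now rather than waiting for the schedule,
# then list running and queued jobs
Start-DedupJob -Volume "E:" -Type Optimization
Get-DedupJob
```

Be careful with /MIR: it deletes anything in the destination that is not in the source, so the target folder needs to be the right one.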

 

Currently – Used disk space is 3.3TB

Saved – 5.4TB

Unoptimised – 9.6TB

 

Those are pretty good numbers.  Unfortunately they are not reflected in the Nimble used space.  Nimble reports 6.8TB used, including savings from compression.  It's not the 9.6TB of raw data, but also not the tidy 3.3TB Windows thinks it is using.
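
The Windows-side figures can be pulled per volume with the deduplication cmdlets.  A minimal sketch, assuming the three numbers above correspond to the UsedSpace, SavedSpace and UnoptimizedSize properties (which are reported in bytes):

```powershell
# Summarise deduplication figures for every dedup-enabled volume, converted to TB
Get-DedupVolume |
    Select-Object Volume,
        @{ Name = "UsedTB";        Expression = { [math]::Round($_.UsedSpace / 1TB, 2) } },
        @{ Name = "SavedTB";       Expression = { [math]::Round($_.SavedSpace / 1TB, 2) } },
        @{ Name = "UnoptimisedTB"; Expression = { [math]::Round($_.UnoptimizedSize / 1TB, 2) } },
        SavingsRate |
    Format-Table -AutoSize
```

Comparing this output with the array's own used-space figure is what shows up the gap described above.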

 

Remediation

 

It seems pretty clear that we are dealing with UNMAP here.  VMware (at least on my ESX 5.1) doesn’t support guest UNMAP (it won’t pass UNMAP requests from the guest through to the storage), so the dirty blocks (pre-deduplication) are still being counted as used storage.  I’ve seen some information about ESX 6 having some capability for guest OS UNMAP support, but I think they are being very careful because some arrays (not Nimble’s) can be saturated by such requests.  This left me a bit stuck, as there is ~3TB out there being used by what are effectively deletions (pre-Windows dedupe data).  I did some digging on the Nimble Connect forums and found some information regarding reclaim (sdelete and a PowerShell alternative).  It was very focused on the layout where Nimble disks are direct attached to the guest (or physical machine), but I thought I would give the reclaimer a quick spin on one of the smaller (but well serviced by deduplication) disks.

 

I believe there are some caveats here.

1.       No snapshots should be enabled on the Nimble volume (you will just end up with a big snapshot)

2.       No VMware snapshots should exist on the machine (same as above)
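
With those caveats in mind, the reclaim pass itself, assuming the Sysinternals sdelete route rather than the PowerShell alternative from the forums, might be as simple as the sketch below.  The drive letter and the location of sdelete.exe are placeholders.

```powershell
# Placeholder drive letter for one of the deduplicated data disks
$drive = "F:"

# -z asks SDelete to zero the free space on the volume.
# Switch behaviour has differed between SDelete releases, so check the
# usage output of the version you have before running this.
& "C:\Tools\sdelete.exe" -z $drive
```

Bear in mind that writing zeros across the free space of a thin-provisioned VMDK will grow that VMDK, so this is best tried on a small disk first.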

 

The space is coming back to me now, as I guess the dirty blocks are being zeroed and Nimble is compressing them (lots of zeros compressed is probably a couple of zeros).  My hope is that I can get below 3.3TB used on the Nimble (3.3TB is the currently used disk space reported by Windows, so compression should be able to reduce this further).  Obviously I am running this as a one-off operation so things can get dirty again from now on, but it should give me a clean base and a better correlation between OS used, hypervisor used and Nimble used.


I finally managed to get down to 3.46TB used, pretty close to my 3.3TB target (I really needed to add the OS, so 3.3TB was optimistic).  There is something to note here though: the compression ratio for the volume holding all of this data is 1.04x.  That ratio is poor, which suggests the data reaching the array is already compressed.  I think Windows 2012 R2 Deduplication is a lie: it certainly looks like they are compressing this stuff too (not just deduplicating it).  Still, the results are good and far beyond what NetApp Deduplication gave me (although I didn't use compression there).


What if I used iSCSI in the Guest?


I believe, and might need correcting if I am wrong, that if I had the disks directly attached to the guest, Nimble would leverage the OS UNMAP and TRIM functionality to recover the space from the volume.  Windows 2012 supports these features and I would expect different results.  If and when Nimble's data protection features get to a point where this is feasible for me I might change how the file server is architected.
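
I have not tested this layout myself, but on a guest with direct-attached disks the following sketch shows how one could check that Windows is sending delete notifications and then ask it to re-send them for the whole volume.  The drive letter is a placeholder.

```powershell
# DisableDeleteNotify = 0 means TRIM/UNMAP notifications are enabled
fsutil behavior query DisableDeleteNotify

# Re-send TRIM/UNMAP for all free space on the volume (placeholder drive letter)
Optimize-Volume -DriveLetter F -ReTrim -Verbose
```

Whether the space actually comes back still depends on the storage underneath honouring those requests.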


Lessons For Others


I think most of the info can be gleaned from above, however I did make an unrelated ***** up.  When I first did my Robocopy I didn't use the /SEC switch, so the permissions didn't carry across.  I ended up in a position where all the files had default file system permissions, but the directories had the correct permissions.  If I had picked this up right away I would have run Robocopy with /SECFIX, but I didn't.  I'm kicking myself for missing it and it's caused a few issues for the users accessing the data (although it really highlights that I probably migrated years of data that no-one cares about, as it has taken nearly a month for issues to show up).  Here is a handy PowerShell script that takes the folder security and applies it to the files underneath (this enables inheritance on the files only, for those who say I could have just gone to folder properties and replaced file permissions on all child objects).  I went through several revisions (initially calling icacls.exe /reset for each file) but this is the fastest I could make it.  It might help a googler trying to fix the same issue.


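# Recurse($path)
# Applies the folder's ACL (with inheritance enabled) to every file in $path,
# then repeats for each subfolder.
function Recurse ([string]$path) {
  if (-not (Test-Path $path)) {
    Write-Error "$path is an invalid path."
    return $false
  }

  # Echo the folder being processed so progress is visible
  $path

  # Take the folder's ACL and clear the protection flag so inheritance is on
  $inheritance = Get-Acl $path
  $inheritance.SetAccessRuleProtection($false, $false)

  # Files only: apply that ACL to each file in this folder
  Get-ChildItem -Path $path -Attributes !Directory | Set-Acl -AclObject $inheritance

  # Folders only: recurse into each subfolder
  $directories = Get-ChildItem -Path $path -Attributes Directory | Select-Object FullName
  foreach ($directory in $directories) {
    Recurse $directory.FullName
  }
}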
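
For anyone who catches a missing /SEC while the source is still online, the Robocopy route mentioned above might look something like the sketch below.  I did not go this way myself, so treat it as untested; the paths are placeholders.

```powershell
# Placeholder names: the original share and the migrated copy
$source      = "\\oldfiler\share1"
$destination = "E:\share1"

# /SEC says which security information to copy; /SECFIX re-applies it to
# every file, including ones Robocopy would otherwise skip as unchanged.
# Any files that have changed at the source will also be copied again.
robocopy $source $destination /E /SEC /SECFIX /R:1 /W:1 /LOG:C:\Logs\share1-secfix.log
```

This only helps while the source still holds the correct permissions, which is why the script above was the better fit once users had been working on the new server for a month.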