Migrating From NetApp To Windows File Servers With PowerShell – Part 1

We are retiring our NetApp filer this year. It was nice knowing you, NetApp. Thank you for the no-hassle performance, agile volume management, and excellent customer support. We will not miss your insane pricing or your subtle incompatibilities with modern Windows clients.

In this multi-part series, I will be sharing PowerShell code developed to assist with our migration. In part one, we will look at bulk copy operations with RoboCopy. In part two, we will look at a situation where RoboCopy fails to get the job done. In future parts, we will look at automated share and quota management and migration.

Migrating large amounts of data off a NetApp is not particularly straightforward. The only real option we have is to copy data from the filer's CIFS shares to their Windows counterparts. Fortunately, with the multi-threading power utility “robocopy” we can move data between shares pretty quickly. Unfortunately, robocopy only multi-threads file copy operations, not directory search operations. So, while initial data transfers with robocopy take place really quickly, subsequent sync operations are slower than expected. MS also released a utility called “RichCopy” which supports multi-threaded directory searching, but this utility is not supported by MS and has some significant bugs (i.e. it crashes all the time). What to do?

PowerShell to the rescue! Using PowerShell jobs, we can spawn off a separate robocopy job for each subdirectory of a source share, and run an arbitrary number of parallel directory copies. With some experimentation, I determined that I could run ten simultaneous robocopy operations without overwhelming CPU or disk channels on the filer. Under this arrangement, our file sync window has been reduced from almost 48 hours to a mere 2.5 hours.
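Stripped of all logging and path mapping, the core pattern is just a queue of subdirectories fed to a capped pool of robocopy jobs. Here is a minimal sketch of that idea (the share and drive paths are placeholders, and it uses Start-Job's -ArgumentList rather than the [scriptblock]::create() approach in the full script below):

```powershell
# Sketch of the throttled job-queue pattern; $srcRoot and $dstRoot are placeholders.
$srcRoot = '\\files\shared'
$dstRoot = 'S:\shared'
$maxJobs = 10

$queue = Get-ChildItem $srcRoot | Where-Object { $_.PSIsContainer }
$jobs = @()
foreach ($dir in $queue) {
    # Block until a job slot is free:
    while (@($jobs | Where-Object { $_.State -eq 'Running' }).Count -ge $maxJobs) {
        Start-Sleep -Seconds 3
    }
    # One robocopy process per subdirectory:
    $jobs += Start-Job -ScriptBlock {
        param($src, $dst)
        & robocopy.exe $src $dst /e /b /r:0 /mt:4
    } -ArgumentList $dir.FullName, (Join-Path $dstRoot $dir.Name)
}
$jobs | Wait-Job | Receive-Job   # collect robocopy output from all jobs
```

The full script below adds the pieces production needs on top of this: logging, per-share destination mapping, and collection of completed jobs as the queue drains.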

Some tricky bits in the development of this script were:

  • PowerShell jobs and job queuing are critical to completing this script in a timely fashion. The syntax for “Start-Job” is tricky. See my post on backup performance testing for more comments on working with jobs.
  • Robocopy fails to copy a number of source files. This is mitigated through the use of the “/b” switch (backup mode).
  • The PowerShell cmdlet “Receive-Job” fails to capture output from a variety of job commands unless you assign the job to an object. To reliably capture the output of commands within our jobs, I needed to assign the jobs to our $jobs array.
  • I needed to do some post processing on the log file. In doing so, I needed to find UNC paths for our source filer “\\files”. It is important to remember that, when using regular expressions, “\” is the escape character. So, to match for “\”, we need to enter “\\”. To match for “\\” we need to enter “\\\\”, as in: get-content $logfile | select-string -Pattern "\\\\files" | ...
  • Initially I allowed the script to process only one top-level directory at a time (i.e. start with \\files\software, and only proceed to \\files\shared when “software” completes). The problem with this was that it prevented the script from running an optimal job count. Furthermore, a single hung job could bring the whole script to a halt. To combat this, I start the script by building a master queue array “$q”, which holds all of the directories for which I am going to start a job. The result of using a master queue is a considerable improvement in sustained throughput.
  • When building an array with a loop (i.e. while…) you may have trouble with the first item added to the array if you do not initialize the array before starting to loop. In my case, I needed to initialize “[array]$jobs = @()” before using the array to hold job objects in the “while” loop. Failing to do so caused “$jobs” to become a single job object when the number of jobs was equal to one. Bad news, if you are expecting to use array properties such as $jobs.count, or to call an index of the object (i.e. $jobs[0]).
  • ISE programs like the native PowerShell ISE or Quest PowerGUI make script development much easier. However, production environments are not the same as the debug environment, so keep these tips in mind:
    1. Log your script actions! Use lots of Out-File calls. If you are feeling slick, you can enclose these in “if ($debug)” clauses, and set the $debug variable as a script parameter (which I did not do here).
    2. When running in production, watch the log file in real time using “get-content -wait”. I know it is not as cool as the GNU command “tail”, but it is close.
  • Scoping… be careful with the “global” scope. Initially I modified the $jobs and $dc variables in the global scope from within the “collectJobs” function. This worked fine in my ISE and at the PowerShell prompt. However, when running as a scheduled task, these calls failed miserably. I changed the calls to use the “script” scope, and the script now runs as a scheduled task successfully.
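The array-initialization pitfall from the list above is easy to reproduce at the prompt (the variable names here are just for illustration):

```powershell
$a = $null
$a += "job1"            # += on $null produces a single string, not an array
$a -is [array]          # False -- $a.Count and $a[0] will not behave as expected

[array]$b = @()         # initialize as an array first...
$b += "job1"
$b -is [array]          # True
$b.Count                # 1
$b[0]                   # "job1"
```

This is why the script below initializes $jobs as an empty array before the “while” loop, and why collectJobs wraps its rebuild of $jobs in @( ) — both guarantee the variable stays an array even when it holds exactly one job.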

Below is the script I developed for this job… it contains paths specific to our infrastructure, but could easily be modified. Change the “$jobs.count -lt 10” test to set the number of simultaneous robocopy processes to be used by the script…

# FilerSync_jobQueue.ps1
# JGM, 2011-09-29
# Copies all content of the paths specified in the $srcShares arrays to
# corresponding paths on the local server.
# Keeps data on all copy jobs in an array "$q".
# We will use up to 10 simultaneous robocopy operations.

set-psdebug -strict

# Initialize the log file:
[string] $logfile = "s:\files_to_local.log"
remove-item $logfile -Force -ErrorAction SilentlyContinue
[datetime] $startTime = Get-Date
[string] "Start Time: " + $startTime | Out-File $logfile -Append

# Initialize the Source file server root directories:
[String[]] $srcShares1 = "adfs$","JMP$","tsFlexConfig","software","mca","sis","shared"
    #,"R25"
    #R25 removed from this sync process as the "text_comments" directory kills
    #robocopy.  We will sync this structure separately.
[String[]] $srcShares2 = "uvol_t1_1$\q-home","uvol_t1_2$\q-home","uvol_t1_3$\q-home",`
    "uvol_t1_4$\q-home","uvol_t1_5$\q-home","uvol_t2_1$\q-home",`
    "vol1$\qtree-home"

[Array[]] $q = @() #queue array; each entry holds source, destination, and robocopy switches

function collectJobs {
#Detects jobs with status of Completed or Stopped.
#Collects job output to the log file, increments the "done jobs" count,
#then rebuilds the $jobs array to contain only running jobs.
#Modifies variables in the script scope.
    $djs = @(); #Completed jobs array
    $djs += $script:jobs | ? {$_.State -match "Completed|Stopped"} ;
    [string]$('$djs.count = ' + $djs.count + ' ; Possible number of jobs completed in this collection cycle.') | Out-File $logfile -Append;
    if ($djs[0] -ne $null) { #First item in done jobs array should not be null.
        $script:dc += $djs.count; #increment completed job count
        [string]$('$script:dc = ' + $script:dc + ' ; Total number of completed jobs.') | Out-File $logfile -Append;
        $djs | Receive-Job | Out-File $logfile -Append; #log job output to file
        $djs | Remove-Job -Force;
        Remove-Variable djs;
        $script:jobs = @($script:jobs | ? {$_.State -eq "Running"}) ; #rebuild jobs array
        [string]$('$script:jobs.count = ' + $script:jobs.Count + ' ; Exiting function...') | Out-File $logfile -Append
    } else {
        [string]$('$djs[0] is null.  No jobs completed in this cycle.') | Out-File $logfile -Append
    }
}

# Loop though the source directories:
foreach ($rootPath in $srcShares1) {
    [string] $srcPath = "\\files\" + $rootPath # Full Source Directory path.
    #Switch maps the source directory to a destination volume stored in $target
    switch ($rootPath) {
        shared {[string] $target = "S:\shared"}
        software {[string] $target = "S:\software"}
        mca {[string] $target = "S:\mca"}
        sis {[string] $target = "S:\sis"}
        adfs$ {[string] $target = "S:\adfs"}
        tsFlexConfig {[string] $target = "s:\tsFlexConfig"}
        JMP$ {[string] $target = "s:\JMP"}
        R25 {[string] $target = "S:\R25"}
    }
    #Enumerate directories to copy:
    $dirs1 = @()
    $dirs1 += gci $srcPath | sort-object -Property Name `
        | ? {$_.Attributes.tostring() -match "Directory"} `
        | ? {$_.Name -notmatch "~snapshot"}
    #Copy files in the root directory:
    [string] $sd = '"' + $srcPath + '"';
    [string] $dd = '"' + $target + '"';
    $q += ,@($sd,$dd,'"/COPY:DATSO"','"/LEV:1"' )
    # Add to queue:
    if ($dirs1[0] -ne $null) {
        foreach ($d in $dirs1) {
            [string] $sd = '"' + $d.FullName + '"';
            [string] $dd = '"' + $target + "\" + $d.Name + '"';
            $q += ,@($sd,$dd,'"/COPY:DATSO"','"/e"')
        }
    }
}
foreach ($rootPath in $srcShares2) {
    [string] $srcPath = "\\files\" + $rootPath # Full Source Directory path.
    #Switch maps the source directory to a destination volume stored in $target
    switch ($rootPath) {
        uvol_t1_1$\q-home {[string] $target = "H:\homes1"}
        uvol_t1_2$\q-home {[string] $target = "I:\homes1"}
        uvol_t1_3$\q-home {[string] $target = "J:\homes1"}
        uvol_t1_4$\q-home {[string] $target = "K:\homes1"}
        uvol_t1_5$\q-home {[string] $target = "L:\homes1"}
        uvol_t2_1$\q-home {[string] $target = "M:\homes1"}
        vol1$\qtree-home {[string] $target = "J:\homes2"}
    }
    #Enumerate directories to copy:
    [array]$dirs1 = gci -Force $srcPath | sort-object -Property Name `
        | ? {$_.Attributes.tostring() -match "Directory"}
    if ($dirs1[0] -ne $null) {
        foreach ($d in $dirs1) {
            [string] $sd = '"' + $d.FullName + '"'
            [string] $dd = '"' + $target + "\" + $d.Name + '"'
            $q += ,@($sd,$dd,'"/COPY:DAT"','"/e"')
        }
    }
}

[string] $queueFile = "s:\files_to_local_queue.csv"
Remove-Item -Force $queueFile -ErrorAction SilentlyContinue
foreach ($i in $q) {[string]$($i[0]+", "+$i[1]+", "+$i[2]+", "+$i[3]) >> $queueFile }

New-Variable -Name dc -Option AllScope -Value 0
[int] $dc = 0           #Count of completed (done) jobs.
[int] $qc = $q.Count    #Initial count of jobs in the queue
[int] $qi = 0           #Queue Index - current location in queue
[int] $jc = 0           #Job count - number of running jobs
[array] $jobs = @()

while ($qc -gt $qi) { # Problem here as some "done jobs" are not getting captured.
    while (($jobs.count -lt 10) -and ($qi -lt $qc)) { #Fill free job slots, but stop when the queue runs out.
        [string] $('In ($jobs.count -lt 10) loop...') | out-file -Append $logFile
        [string] $('$jobs.count is now: ' + $jobs.count) | out-file -Append $logFile
        [string] $jobName = 'qJob_' + $qi + '_';
        [string] $sd = $q[$qi][0]; [string] $dd = $q[$qi][1];
        [string] $cpo = $q[$qi][2]; [string] $lev = $q[$qi][3];
        [string] $cmd = "& robocopy.exe $lev,$cpo,`"/dcopy:t`",`"/purge`",`"/nfl`",`"/ndl`",`"/np`",`"/r:0`",`"/mt:4`",`"/b`",$sd,$dd";
        [string] $('Starting job with source: ' + $sd + ' and destination: ' + $dd) | out-file -Append $logFile
        $jobs += Start-Job -Name $jobName -ScriptBlock ([scriptblock]::create($cmd))
        [string] $('Job started.  Incrementing $qi to: ' + [string]$($qi + 1)) | out-file -Append $logFile
        $qi++
    }
    [string] $("About to run collectJobs function...") | out-file -Append $logFile
    collectJobs
    [string] $('Function done.  $jobs.count is now: ' + $jobs.count) | out-file -Append $logFile
    [string] $('$jobs.count = ' + $jobs.Count + ' ; Sleeping for three seconds...') | out-file -Append $logFile
    Start-Sleep -Seconds 3
}
#Wait up to two hours for remaining jobs to complete:
[string] $('Started last job in queue. Waiting up to two hours for completion...') | out-file -Append $logFile
$jobs | Wait-Job -Timeout 7200 | Stop-Job
collectJobs

# Complete logging:
[datetime] $endTime = Get-Date
[string] "End Time: " + $endTime | Out-File $logfile -Append
$elapsedTime = $endTime - $startTime
[string] $out = "Elapsed Time: " + [math]::floor($elapsedTime.TotalHours)`
    + " hours, " + $elapsedTime.minutes + " minutes, " + $elapsedTime.seconds`
    + " seconds."
$out | out-file -Append $logfile

#Create an error log from the session log.  Convert error codes to descriptions:
[string] $errFile = 's:\files_to_local.err'
remove-item $errFile -force -ErrorAction SilentlyContinue
[string] $out = "Failed jobs:"; $out | out-file -Append $errFile
$jobs | out-file -Append $errFile
$jobs | % {$_.Command} | out-file -Append $errFile
[string] $out = "Failed files/directories:"; $out | out-file -Append $errFile
Get-Content $logfile | Select-String -Pattern "\\\\files" `
    | select-string -NotMatch -pattern "^   Source" `
    | % {
        [string] $e = ''; #Error description; left blank if no known code matches.
        $a = $_.toString();
        if ($a -match "ERROR 32 ")  {[string]$e = 'fileInUse:        '};
        if ($a -match "ERROR 267 ") {[string]$e = 'directoryInvalid: '};
        if ($a -match "ERROR 112 ") {[string]$e = 'notEnoughSpace:   '};
        if ($a -match "ERROR 5 ")   {[string]$e = 'accessDenied:     '};
        if ($a -match "ERROR 3 ")   {[string]$e = 'cannotFindPath:   '};
        $i = $a.IndexOf("\\f");
        $f = $a.substring($i);
        Write-Output "$e$f" | Out-File $errFile -Force -Append
    }