Microsoft Robocopy vs Linux NAS: Robocopy Pitfalls

Intro

I have been using Microsoft Robocopy for years, as it is an easy (by now even a built-in) way to do incremental backups under Windows.

Until recently, I backed up my data from an internal NTFS hard drive to an external NTFS hard drive. As of late, I’m the proud owner of a DS213+ NAS, which runs a Linux-based OS and uses ext4-formatted hard drives.

As I’m still looking for the perfect backup/versioning client (rsync on Windows?!), I decided to stick with Robocopy in the meantime. Unfortunately, my backup scripts, which have done a splendid job of incrementally backing up my data to an external hard drive for years, now do a full backup to the NAS every time.

As it turns out, there is not only one reason for this behavior, but two:

  1. Timestamp
  2. File size

Here are the solutions that fixed these issues (at least for me), as well as some additional hints on using Robocopy.

1. Timestamp

At first, Robocopy kept telling me NEWER or OLDER for each file (even though the file didn’t change), resulting in copying the file instead of skipping it.

Solution:

First, make sure that both the NAS and the client PC have the same system time (use an NTP server, for example).

If the problem still persists, a good solution is to make Robocopy use FAT file times (/FFT).

This results in a 2-second granularity; that is, a file is only declared NEWER or OLDER when there is a difference of more than two seconds between the source and the destination file. If this option is not set, Robocopy compares the full NTFS time stamps, which have a granularity of 100 nanoseconds. Apparently, the time stamps served by Samba are not as precise, and therefore they hardly ever match.
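
To make the effect of the tolerance concrete, here is a minimal Python sketch. It is not Robocopy's actual implementation, just an illustration of the comparison described above; the function name classify and the fat_times parameter are made up for this example.

```python
from datetime import datetime, timedelta


def classify(src_mtime: datetime, dst_mtime: datetime, fat_times: bool = False) -> str:
    """Label a file by comparing source and destination modification times.

    Illustration only: with fat_times=True, differences of up to two
    seconds are treated as "same", similar in spirit to /FFT.
    """
    tolerance = timedelta(seconds=2) if fat_times else timedelta(0)
    delta = src_mtime - dst_mtime
    if abs(delta) <= tolerance:
        return "same"
    return "NEWER" if delta > timedelta(0) else "OLDER"


# The source time stamp carries sub-second precision, the copy on the NAS does not.
source_time = datetime(2013, 1, 1, 12, 0, 0, 500000)
nas_time = datetime(2013, 1, 1, 12, 0, 0)

print(classify(source_time, nas_time))                  # exact comparison
print(classify(source_time, nas_time, fat_times=True))  # 2-second tolerance
```

With the exact comparison the half-second difference is enough to flag the file, whereas the 2-second tolerance lets it pass as unchanged.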

2. File size

If your incremental backup works by now, skip the rest of the article.

As for me, after solving the above problem, the incremental backups still didn’t work.

Robocopy kept telling me CHANGED for most files. Out of the frying pan into the fire!

What does CHANGED mean? The answer can be found here:

The source and destination files have identical time stamps but
different file sizes. The file is copied; to skip this file, use /XC.

Skipping all files with different sizes? No, that’s a dangerous idea when backing up data. So what now?

But why do they have a different size at all? Here is a file on the client PC:

[Screenshot: SomeFile on the Client PC]

And that’s the same file after transferring to the NAS:

[Screenshot: SomeFile on the NAS]

The attentive observer might have noticed that the size on disk is different.

The reason for this lies in the different block sizes used by the NAS and the client. This puzzled me at first, because I had set up both NTFS and ext4 with a block size of 4K.

However, the Samba server has a default block size of 1K! So configuring Samba with an explicit block size that matches the one of your client PC solves this issue.
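
The following Python sketch shows why the same file can report two different sizes on disk. It assumes that the size on disk is simply the file size rounded up to whole blocks, which is a simplification (real file systems treat very small or sparse files differently); the function name size_on_disk and the 5000-byte example file are made up for illustration.

```python
def size_on_disk(file_size: int, block_size: int) -> int:
    """Round a file size in bytes up to a whole number of blocks."""
    blocks = -(-file_size // block_size)  # ceiling division without floats
    return blocks * block_size


file_size = 5000  # hypothetical file of 5000 bytes

print(size_on_disk(file_size, 1024))  # as reported with a 1K block size
print(size_on_disk(file_size, 4096))  # as reported with a 4K block size
```

The file itself is identical in both cases; only the rounding differs, so the two sides disagree about the allocated size until the block sizes match.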

How?

SSH to your (Synology) NAS.

 vi /usr/syno/etc/smb.conf

Press i.

Enter this line below the [global] tag (use the block size of your host file system, e.g. 4K = 4×1024 = 4096):

         block size = 4096

Press ESC, then enter :wq and press ENTER.

Restart the Samba server with:

/usr/syno/etc/rc.d/S80samba.sh restart
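
If you would rather script the edit than use vi, the change can be sketched in Python. This is only a rough sketch under a few assumptions: the function set_block_size is made up for this example, it works on the text of the configuration rather than on the file itself, and it only recognizes the option when it is spelled exactly "block size".

```python
def set_block_size(conf_text: str, block_size: int = 4096) -> str:
    """Return the smb.conf text with "block size" set in the [global] section."""
    new_line = f"\tblock size = {block_size}"
    out = []
    in_global = False
    replaced = False
    for line in conf_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("["):
            # A section header: remember whether we are inside [global].
            in_global = stripped.lower() == "[global]"
        elif in_global and stripped.split("=")[0].strip().lower() == "block size":
            # The option already exists: overwrite it.
            line = new_line
            replaced = True
        out.append(line)
    if not replaced:
        # The option is missing: insert it right below the [global] header.
        for index, line in enumerate(out):
            if line.strip().lower() == "[global]":
                out.insert(index + 1, new_line)
                break
        else:
            raise ValueError("no [global] section found")
    return "\n".join(out) + "\n"


example = "[global]\n\tworkgroup = WORKGROUP\n[homes]\n\tbrowseable = no\n"
print(set_block_size(example))
```

To apply it, you would read the configuration file, pass its content through the function, and write the result back (after making a copy of the original file). The Samba server still needs to be restarted afterwards.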

That solved my problems and I can now do incremental backups again.
At least until I have finally set up the perfect rsync-for-Windows solution 🙂

Alternative solution for 1. and 2.

There is, however, an alternative to the solutions for 1. and 2.:

Use the archive bit. Each file has an archive bit. Every time you change the file, the bit is set. This behavior can be utilized by Robocopy. Using the /M switch makes Robocopy copy only the files whose archive bit is set and reset the bit on each source file it has copied; all files whose archive bit is not set are skipped. That is, it copies only files that changed since the last backup. No need to care about nasty time stamps or stupid file sizes.

There is one drawback, however. When you want to make a full backup, or you back up your data to several devices, you must not use the /M switch or your backups will be incomplete.
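
The following Python sketch simulates the archive-bit logic and the drawback with a second backup device. It does not touch real files: the attributes are plain integers in a dictionary, and the function name incremental_pass is made up for this example.

```python
ARCHIVE = 0x20  # numeric value of the Windows FILE_ATTRIBUTE_ARCHIVE flag


def incremental_pass(files: dict) -> list:
    """Return the names of files whose archive bit is set and clear the bit.

    Simulation only: this mirrors the behavior described for /M.
    """
    copied = []
    for name, attributes in list(files.items()):
        if attributes & ARCHIVE:
            copied.append(name)
            files[name] = attributes & ~ARCHIVE  # reset the bit on the source
    return copied


# Simulated source files: a.txt was changed since the last backup, b.txt was not.
files = {"a.txt": ARCHIVE, "b.txt": 0}

print(incremental_pass(files))  # backup to the first device
print(incremental_pass(files))  # backup to a second device right afterwards
```

The second pass finds nothing to copy, because the first pass has already reset the archive bit. This is exactly why a second backup device ends up incomplete.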

Additional Hints

While I’m at it: here are the Robocopy options I use. See Robocopy | SS64.com for a complete reference.

robocopy "<source>" "<dest>" /MIR /V /NP /TEE /LOG:"%~f0.log" /Z /R:10 /W:10 /FFT /DCOPY:T
  • /MIR – MIRror a directory tree, that is, copy all subfolders and purge extra files on the destination
  • /V – Verbose output log, showing skipped files
  • /NP – No Progress. Don’t flood log file with % copied for each file.
  • /TEE – Output to console window, as well as the log file.
  • /LOG:"%~f0.log" – Output status to file <name-of-batch-file>.log, overwrite existing one.
  • /Z – Copy files in restartable mode (survive network glitch)
  • /R:10 – Number of Retries on failed copies – default is 1 million. Reducing the number is helpful when trying to copy locked files.
  • /W:10 – Wait time between retries – default is 30 seconds. Reducing the number is helpful when trying to copy locked files.
  • /FFT – Assume FAT File Times, 2-second date/time granularity. See above.
  • /DCOPY:T – Copy Directory Timestamps. Why is this not default? You definitely want to keep your directory timestamps!
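
If you prefer to start the backup from a script instead of a batch file, the same options can be assembled in Python. This is a minimal sketch: the function names build_robocopy_command and run_backup and the example paths are made up, and the log file name has to be passed explicitly because %~f0 only works inside a batch file. The only Robocopy-specific knowledge used is that its exit code is a bit mask in which values of 8 and above indicate failures.

```python
import subprocess


def build_robocopy_command(source: str, dest: str, log_file: str) -> list:
    """Assemble the Robocopy call with the options listed above."""
    return [
        "robocopy", source, dest,
        "/MIR", "/V", "/NP", "/TEE", f"/LOG:{log_file}",
        "/Z", "/R:10", "/W:10", "/FFT", "/DCOPY:T",
    ]


def run_backup(source: str, dest: str, log_file: str) -> bool:
    """Run Robocopy (Windows only) and report whether it finished without failures."""
    result = subprocess.run(build_robocopy_command(source, dest, log_file))
    # Exit codes below 8 mean that nothing failed (files may or may not have been copied).
    return result.returncode < 8


# Hypothetical paths, for illustration only.
print(build_robocopy_command(r"D:\data", r"\\nas\backup\data", "backup.log"))
```

Passing the arguments as a list avoids quoting problems with paths that contain spaces.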

NAS: DS213+ & WD20NPVT – 3. Performance and Encryption

As announced in the first and second post about the Synology DS213+ and the Western Digital WD20NPVT, this post is about the effective data rates achieved by the NAS and the hard drives. It contains the data rates measured when reading files from the DS213+ (download), as well as the ones measured when writing to it (upload), for both unencrypted and encrypted folders on the NAS. For the measurement, both one large file (1 x 50GB) and many small files (100,000 x 10KB) have been transferred to/from the NAS.

Measured Values

The following tables compare the measured data rates to the ones published by Synology.

Large file

Note that for the measurement in this post a 50GB file was used, whereas Synology transferred a 5GB file, which should not make much of a difference.

Operation | Data rate (measured) | Data rate (Synology)
Upload | 51.87 MB/s | 84.31 MB/s
Upload (encrypted) | 21.32 MB/s | 24.65 MB/s
Download | 40.89 MB/s | 110.36 MB/s
Download (encrypted) | 37.21 MB/s | 49.58 MB/s
Client (internal) | 111.55 MB/s | -

Small files

Note that for the measurement in this post 100,000 files of 10KB each were used, whereas Synology transferred 1,000 files of 5MB each. So the rates here cannot really be compared, as transferring a larger number of smaller files results in a bigger overhead and therefore in a lower transfer rate.

Still, it is remarkable that Synology only measured the performance when transferring small files to unencrypted folders. Maybe the data rates measured for encrypted folders didn’t look too good?

Operation | Data rate (measured) | Data rate (Synology)
Upload | 0.44 MB/s | 43.82 MB/s
Upload (encrypted) | 0.05 MB/s | -
Download | 0.75 MB/s | 58.15 MB/s
Download (encrypted) | 0.49 MB/s | -
Client (internal) | 4.52 MB/s | -
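
To get a feeling for what these rates mean in practice, the total transfer time can be estimated from the measured values. The sketch below assumes that 1 KB = 1024 bytes and 1 MB = 1024 x 1024 bytes; the function name transfer_seconds is made up for this example.

```python
KB = 1024
MB = 1024 * 1024


def transfer_seconds(file_count: int, file_size_bytes: int, rate_mb_per_s: float) -> float:
    """Estimate how long a transfer takes at a given average data rate."""
    total_bytes = file_count * file_size_bytes
    return total_bytes / (rate_mb_per_s * MB)


upload = transfer_seconds(100_000, 10 * KB, 0.44)
upload_encrypted = transfer_seconds(100_000, 10 * KB, 0.05)

print(f"Upload: {upload / 60:.0f} minutes, {upload / 100_000 * 1000:.1f} ms per file")
print(f"Upload (encrypted): {upload_encrypted / 3600:.1f} hours")
```

Under these assumptions, uploading just under 1 GB of small files takes about 37 minutes unencrypted and more than five hours encrypted.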

Measurement

All data rates have been measured from the same client PC using Microsoft Robocopy, connecting to the NAS via SMB protocol.
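
If you want to cross-check such numbers without Robocopy, a data rate can also be measured with a few lines of Python. This is a rough sketch and not the method used for the tables above: the function name measure_copy_rate is made up, the small test file is only there to make the example runnable, and operating system caches can distort the result for files this small.

```python
import shutil
import tempfile
import time
from pathlib import Path


def measure_copy_rate(source: Path, dest: Path) -> float:
    """Copy one file and return the achieved data rate in MB/s (1 MB = 1024 * 1024 bytes)."""
    size = source.stat().st_size
    start = time.perf_counter()
    shutil.copyfile(source, dest)
    elapsed = max(time.perf_counter() - start, 1e-9)  # guard against division by zero
    return size / (1024 * 1024) / elapsed


# Self-contained demo with a temporary 8 MB file; point dest at a NAS share for a real test.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "source.bin"
    dest = Path(tmp) / "dest.bin"
    source.write_bytes(b"\0" * (8 * 1024 * 1024))
    print(f"{measure_copy_rate(source, dest):.2f} MB/s")
```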

The Client and the NAS are connected via a Linksys SE2800 switch, using Gigabit Ethernet.

The following table lists the details of the NAS and of the client PC used for the measurements in this post. In addition, the details of the client PC used by Synology are listed in the table.

Component | Synology DS213+ | Client PC | Client PC (Synology)
OS | DSM 4.1-2657 | Windows 8 x64 | Windows 7
CPU | Freescale MPC8544E 2x 1.067GHz | Intel T7250 2x 2.0GHz | Intel Core i5 750 2.67GHz
RAM | 512MB DDR3 | 4GB DDRII (2x2GB at 667MHz) | 4GB DDRIII
SSD/HDD | Western Digital Green WD20NPVT x2, RAID 1 | Samsung 840 Pro (256GB) | SVP200S3 (60GB) SSD x 2, RAID 0

Conclusion / Differences

There obviously are differences between the values measured here and the ones published by Synology. What are the reasons for this?

For the small files, the main reason for the difference surely is the smaller size of the files copied, as mentioned above. Why did I choose this smaller size and bigger number? It was not my main objective to compare the values to the ones measured by Synology. Rather, I was interested in the rate at which small files are actually copied. For me, small files are less than 1MB. Have you ever tried to copy a directory with a large quantity of small files (several KB each), such as an Eclipse workspace or an SVN repo? It takes ages. I never thought, though, that they are copied at a data rate of less than 1 MB/s.

For the large file, I presume the difference between the measured values and the ones by Synology can be found in the differences in the measurement setup. Synology used a faster CPU, faster RAM, a RAID 0 and a direct connection between client PC and NAS.

Moreover, I don’t know what software and protocol Synology used for the transfer. Maybe they used FTP, which might perform better than SMB. In addition, it might be even faster for small files, because they can be transferred using several concurrent connections and not sequentially, as Robocopy does by default.
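
To illustrate the idea of concurrent transfers, here is a minimal Python sketch that copies the files of a directory tree with several worker threads. It is not what Synology or Robocopy do internally; the function name copy_tree_concurrently and the default of eight workers are assumptions made for this example, and whether it is actually faster depends on the network and the NAS.

```python
import shutil
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def copy_tree_concurrently(source_dir: Path, dest_dir: Path, workers: int = 8) -> int:
    """Copy all files below source_dir to dest_dir using a pool of threads."""
    files = [path for path in source_dir.rglob("*") if path.is_file()]

    def copy_one(source: Path) -> None:
        target = dest_dir / source.relative_to(source_dir)
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source, target)  # copy2 also copies the time stamps

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Consuming the results makes exceptions from the workers visible.
        list(pool.map(copy_one, files))
    return len(files)
```

A call such as copy_tree_concurrently(Path("workspace"), Path("backup")) would copy the tree and return the number of files copied; both paths are placeholders.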

Anyway, Synology’s download rate of 110 MB/s is still a mystery to me, as this is almost as fast as writing to my local SSD with Robocopy…

Finally, I must say that it is astonishing that uploading (that is, writing) a large file is faster than downloading it (for unencrypted folders). I repeated the measurement of all four large-file operations several times, but I got nearly the same results every time (± 1 MB/s). This seems to have something to do with Robocopy or SMB, because downloading the exact same file via FTP (FileZilla) yields a data rate of about 65 MB/s.

Maybe I should write another post comparing FTP and SMB, when I have time 🙂