Speed up rsync with Simultaneous/Concurrent File Transfers?

We need to transfer 15TB of data from one server to another as fast as we can. We're currently using rsync, but we're only getting speeds of around 150 Mb/s when our network is capable of 900+ Mb/s (tested with iperf). I've run tests on the disks, network, etc., and figured that rsync transferring only one file at a time is what's causing the slowdown.

I found a script that runs a separate rsync for each folder in a directory tree (allowing you to limit it to x at a time), but I can't get it working; it still just runs one rsync at a time.

I found the script here (copied below).

Our directory tree is like this:

/main
   - /files
      - /1
         - 343
            - 123.wav
            - 76.wav
         - 772
            - 122.wav
         - 55
            - 555.wav
            - 324.wav
            - 1209.wav
         - 43
            - 999.wav
            - 111.wav
            - 222.wav
      - /2
         - 346
            - 9993.wav
         - 4242
            - 827.wav
      - /3
         - 2545
            - 76.wav
            - 199.wav
            - 183.wav
         - 23
            - 33.wav
            - 876.wav
         - 4256
            - 998.wav
            - 1665.wav
            - 332.wav
            - 112.wav
            - 5584.wav

So what I'd like to happen is to create an rsync for each of the directories in /main/files, up to a maximum of, say, 5 at a time. So in this case, 3 rsyncs would run, for /main/files/1, /main/files/2 and /main/files/3.

I tried it like this, but it just runs one rsync at a time (for the /main/files/2 folder):

#!/bin/bash

# Define source, target, maxdepth and cd to source
source="/main/files"
target="/main/filesTest"
depth=1
cd "${source}"

# Set the maximum number of concurrent rsync threads
maxthreads=5
# How long to wait before checking the number of rsync threads again
sleeptime=5

# Find all folders in the source directory within the maxdepth level
find . -maxdepth ${depth} -type d | while IFS= read -r dir
do
    # Make sure to ignore the parent folder
    if [ $(echo "${dir}" | awk -F'/' '{print NF}') -gt ${depth} ]
    then
        # Strip leading dot slash
        subfolder=$(echo "${dir}" | sed 's@^\./@@g')
        if [ ! -d "${target}/${subfolder}" ]
        then
            # Create destination folder and set ownership and permissions to match source
            mkdir -p "${target}/${subfolder}"
            chown --reference="${source}/${subfolder}" "${target}/${subfolder}"
            chmod --reference="${source}/${subfolder}" "${target}/${subfolder}"
        fi
        # Make sure the number of rsync threads running is below the threshold
        while [ $(ps -ef | grep -c "[r]sync") -gt ${maxthreads} ]
        do
            echo "Sleeping ${sleeptime} seconds"
            sleep ${sleeptime}
        done
        # Run rsync in background for the current subfolder and move on to the next one
        nohup rsync -a "${source}/${subfolder}/" "${target}/${subfolder}/" </dev/null >/dev/null 2>&1 &
    fi
done

# Finally, rsync any files at the top level (within maxdepth) that the per-directory rsyncs above did not cover
find . -maxdepth ${depth} -type f -print0 | rsync -a --files-from=- --from0 ./ "${target}/"

Tags: bash, shell, ubuntu-12.04, rsync, simultaneous

Answers:


The shortest version I found uses the --cat option of parallel, as below. This version avoids xargs, relying only on features of parallel:

cat files.txt | \
  parallel -n 500 --lb --pipe --cat rsync --files-from={} user@remote:/dir /dir -avPi

#### Arg explainer
# -n 500           :: split input into chunks of 500 entries
#
# --cat            :: create a tmp file, referenced by {}, holding each
#                     job's 500 entries
#
# user@remote:/dir :: the root relative to which entries in files.txt are considered
#
# /dir             :: local root relative to which files are copied

Sample content from files.txt:

/dir/file-1
/dir/subdir/file-2
....

Note that this doesn't use -j 50 for the job count; that didn't work on my end. Instead I've used -n 500 for the record count per job, calculated as a reasonable number given the total number of records.
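
For completeness, one way to produce a files.txt like the sample above is with find (a sketch; it assumes the tree lives under /dir):

# list every regular file under /dir, one path per line
find /dir -type f > files.txt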


You can use xargs, which supports running multiple processes in parallel. For your case it will be:

ls -1 /main/files | xargs -I {} -P 5 -n 1 rsync -avh /main/files/{} /main/filesTest/
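
For reference, the xargs flags doing the work here (all standard GNU xargs options):

# -I {} : substitute {} with the directory name read from stdin
# -P 5  : keep up to 5 rsync processes running at once
# -n 1  : pass one directory name per rsync invocation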

I've developed a Python package called parallel_sync:

https://pythonhosted.org/parallel_sync/pages/examples.html

Here is some sample code showing how to use it:

from parallel_sync import rsync
creds = {'user': 'myusername', 'key':'~/.ssh/id_rsa', 'host':'192.168.16.31'}
rsync.upload('/tmp/local_dir', '/tmp/remote_dir', creds=creds)

Parallelism is 10 by default; you can increase it:

from parallel_sync import rsync
creds = {'user': 'myusername', 'key':'~/.ssh/id_rsa', 'host':'192.168.16.31'}
rsync.upload('/tmp/local_dir', '/tmp/remote_dir', creds=creds, parallelism=20)

However, note that ssh typically has MaxSessions set to 10 by default, so to increase parallelism beyond 10 you'll have to modify your SSH settings.
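
For example, on the remote host (a sketch; the path assumes a stock OpenSSH install, and 20 is an arbitrary value):

# /etc/ssh/sshd_config -- raise the per-connection session limit
MaxSessions 20

Then reload the SSH daemon for the change to take effect (e.g. sudo systemctl reload sshd, or service ssh reload on older Ubuntu releases).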


There are a number of alternative tools and approaches for doing this listed around the web. For example:

  • The NCSA Blog has a description of using xargs and find to parallelize rsync without having to install any new software for most *nix systems (see the sketch after this list).

  • And parsync provides a feature rich Perl wrapper for parallel rsync.
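
That xargs-and-find pattern looks roughly like this (a sketch using the question's paths; the -P 5 limit mirrors the question's requirement and is an assumption, not the blog's exact command):

# One rsync per top-level directory, at most 5 at a time;
# -print0/-0 keep directory names with spaces intact
find /main/files -mindepth 1 -maxdepth 1 -type d -print0 \
  | xargs -0 -P 5 -I {} rsync -a {} /main/filesTest/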


Have you tried using rclone.org?

With rclone you could do something like

rclone copy "${source}/${subfolder}/" "${target}/${subfolder}/" --progress --multi-thread-streams=N

where --multi-thread-streams=N is the number of parallel streams rclone may use per file transfer.
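
Note that --multi-thread-streams splits individual large files across streams; for a tree of many small files like the one in the question, the rclone flag that controls how many files are copied concurrently is --transfers. A hedged variant (the value 16 is arbitrary):

rclone copy "${source}/${subfolder}/" "${target}/${subfolder}/" --progress --transfers=16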


The simplest approach I've found is using background jobs in the shell:

for d in /main/files/*; do
    rsync -a "$d" remote:/main/files/ &
done

Beware that it doesn't limit the number of jobs! If you're network-bound this isn't really a problem, but if you're waiting on spinning rust it will thrash the disk.

You could add

while [ $(jobs | wc -l | xargs) -gt 10 ]; do sleep 1; done

inside the loop for a primitive form of job control.
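
Put together, a sketch (the limit of 10 and the remote path just mirror the snippets above):

for d in /main/files/*; do
    # throttle: wait while 10 or more background jobs are still running
    while [ $(jobs | wc -l | xargs) -gt 10 ]; do sleep 1; done
    rsync -a "$d" remote:/main/files/ &
done
wait   # block until the last background rsyncs finish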


Updated answer (Jan 2020)

xargs is now the recommended tool to achieve parallel execution. It's pre-installed almost everywhere. For running multiple rsync tasks the command would be:

ls /srv/mail | xargs -n1 -P4 -I% rsync -Pa % myserver.com:/srv/mail/

This will list all folders in /srv/mail, pipe them to xargs, which will read them one by one and run 4 rsync processes at a time. The % character replaces the input argument in each command call.

Original answer using parallel:

ls /srv/mail | parallel -v -j8 rsync -raz --progress {} myserver.com:/srv/mail/{}

rsync transfers files as fast as it can over the network. For example, try using it to copy one large file that doesn't exist at all on the destination; that speed is the maximum speed rsync can transfer data. Compare it with the speed of scp (for example). rsync is even slower at raw transfer when the destination file already exists, because both sides have to have a two-way chat about which parts of the file have changed, but this pays for itself by identifying data that doesn't need to be transferred.
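
A quick way to measure that ceiling (a sketch; the size, paths, and host name are placeholders):

# random data, so compression can't flatter the numbers
dd if=/dev/urandom of=/tmp/bigfile bs=1M count=1024
time rsync -a /tmp/bigfile myserver.com:/tmp/
time scp /tmp/bigfile myserver.com:/tmp/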

A simpler way to run rsync in parallel would be to use parallel. The command below will run up to 5 rsyncs in parallel, each one copying one directory. Be aware that the bottleneck might not be your network but the speed of your CPUs and disks, in which case running things in parallel just makes them all slower, not faster.

run_rsync() {
    # e.g. copies /main/files/blah to /main/filesTest/blah
    # (trailing slash on the source so the directory's contents land in
    # the destination, rather than a nested blah/blah copy)
    rsync -av "$1/" "/main/filesTest/${1#/main/files/}/"
}
export -f run_rsync
parallel -j5 run_rsync ::: /main/files/*
