Is it better to use std::memcpy() or std::copy() in terms of performance?

Question

Is it better to use memcpy as shown below, or is it better to use std::copy() in terms of performance? Why?

char *bits = NULL;

bits = new (std::nothrow) char[((int *) copyMe->bits)[0]];
if (bits == NULL)
{
    cout << "ERROR Not enough memory.\n";
    exit(1);
}

memcpy (bits, copyMe->bits, ((int *) copyMe->bits)[0]);

User · Answer

I'm going to go against the general wisdom here that std::copy will have a slight, almost imperceptible performance loss. I just did a test and found that to be untrue: I did notice a performance difference. However, the winner was std::copy.

I wrote a C++ SHA-2 implementation. In my test, I hash 5 strings using all four SHA-2 versions (224, 256, 384, 512), and I loop 300 times. I measure times using Boost.timer. That 300 loop counter is enough to completely stabilize my results. I ran the test 5 times each, alternating between the memcpy version and the std::copy version. My code takes advantage of grabbing data in as large of chunks as possible (many other implementations operate with char / char *, whereas I operate with T / T *, where T is the largest type in the user's implementation that has correct overflow behavior), so fast memory access on the largest types I can is central to the performance of my algorithm. These are my results:

Time (in seconds) to complete run of SHA-2 tests

std::copy   memcpy   % increase
6.11        6.29     2.86%
6.09        6.28     3.03%
6.10        6.29     3.02%
6.08        6.27     3.03%
6.08        6.27     3.03%

Total average increase in speed of std::copy over memcpy: 2.99%

My compiler is gcc 4.6.3 on Fedora 16 x86_64. My optimization flags are -Ofast -march=native -funsafe-loop-optimizations.

Code for my SHA-2 implementations.

I decided to run a test on my MD5 implementation as well. The results were much less stable, so I decided to do 10 runs. However, after my first few attempts, I got results that varied wildly from one run to the next, so I'm guessing there was some sort of OS activity going on. I decided to start over.

Same compiler settings and flags. There is only one version of MD5, and it's faster than SHA-2, so I did 3000 loops on a similar set of 5 test strings.

These are my final 10 results:

Time (in seconds) to complete run of MD5 tests

std::copy   memcpy   % difference
5.52        5.56      0.72%
5.56        5.55     -0.18%
5.57        5.53     -0.72%
5.57        5.52     -0.91%
5.56        5.57      0.18%
5.56        5.57      0.18%
5.56        5.53     -0.54%
5.53        5.57      0.72%
5.59        5.57     -0.36%
5.57        5.56     -0.18%

Total average decrease in speed of std::copy over memcpy: 0.11%

Code for my MD5 implementation.

These results suggest that there is some optimization that std::copy used in my SHA-2 tests that std::copy could not use in my MD5 tests. In the SHA-2 tests, both arrays were created in the same function that called std::copy / memcpy. In my MD5 tests, one of the arrays was passed in to the function as a function parameter.

I did a little bit more testing to see what I could do to make std::copy faster again. The answer turned out to be simple: turn on link time optimization. These are my results with LTO turned on (option -flto in gcc):

Time (in seconds) to complete run of MD5 tests with -flto

std::copy   memcpy   % difference
5.54        5.57     0.54%
5.50        5.53     0.54%
5.54        5.58     0.72%
5.50        5.57     1.26%
5.54        5.58     0.72%
5.54        5.57     0.54%
5.54        5.56     0.36%
5.54        5.58     0.72%
5.51        5.58     1.25%
5.54        5.57     0.54%

Total average increase in speed of std::copy over memcpy: 0.72%

In summary, there does not appear to be a performance penalty for using std::copy. In fact, there appears to be a performance gain.

Explanation of results

So why might std::copy give a performance boost?

First, I would not expect it to be slower for any implementation, as long as the optimization of inlining is turned on. All compilers inline aggressively; it is possibly the most important optimization because it enables so many other optimizations. std::copy can (and I suspect all real world implementations do) detect that the arguments are trivially copyable and that memory is laid out sequentially. This means that in the worst case, when memcpy is legal, std::copy should perform no worse. The trivial implementation of std::copy that defers to memcpy should meet your compiler's criteria of "always inline this when optimizing for speed or size".

However, std::copy also keeps more of its information. When you call std::copy, the function keeps the types intact. memcpy operates on void *, which discards almost all useful information. For instance, if I pass in an array of std::uint64_t, the compiler or library implementer may be able to take advantage of 64-bit alignment with std::copy, but it may be more difficult to do so with memcpy. Many implementations of algorithms like this work by first working on the unaligned portion at the start of the range, then the aligned portion, then the unaligned portion at the end. If it is all guaranteed to be aligned, then the code becomes simpler and faster, and easier for the branch predictor in your processor to get correct.

Premature optimization?

std::copy is in an interesting position. I expect it to never be slower than memcpy and sometimes faster with any modern optimizing compiler. Moreover, anything that you can memcpy, you can std::copy. memcpy does not allow any overlap in the buffers, whereas std::copy supports overlap in one direction (with std::copy_backward for the other direction of overlap). memcpy only works on pointers; std::copy works on any iterators (std::map, std::vector, std::deque, or my own custom type). In other words, you should just use std::copy when you need to copy chunks of data around.
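As a rough illustration of the "any iterators" point, here is a minimal sketch. The helper names `deque_to_vector` and `copy_words` are made up for this example and do not come from the SHA-2 or MD5 code discussed above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <deque>
#include <vector>

// Hypothetical helper: copy a std::deque into a freshly sized std::vector.
// std::deque storage is not contiguous, so a single memcpy could not do this.
std::vector<std::uint64_t> deque_to_vector(const std::deque<std::uint64_t>& src)
{
    std::vector<std::uint64_t> dst(src.size());  // destination already has room
    std::copy(src.begin(), src.end(), dst.begin());
    return dst;
}

// Same algorithm on raw pointers. The element type stays visible to the
// compiler here instead of being erased to void* as with memcpy.
void copy_words(const std::uint64_t* first, const std::uint64_t* last,
                std::uint64_t* out)
{
    assert(first <= last);  // caller must pass a valid range
    std::copy(first, last, out);
}
```

Whether the pointer version ends up as a memmove call, an inlined loop, or vector instructions is up to the compiler and standard library in use.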

User · Answer

My rule is simple. If you are using C++, prefer C++ libraries and not C.

User · Answer

If you really need maximum copying performance (which you might not), use neither of them.

There's a lot that can be done to optimize memory copying - even more if you're willing to use multiple threads/cores for it. See, for example:

What's missing/sub-optimal in this memcpy implementation?

Both the question and some of the answers there have suggested implementations or links to implementations.

User · Answer

Always use std::copy, because memcpy is limited to C-style POD structures, and the compiler will likely replace calls to std::copy with memcpy if the targets are in fact POD.

Plus, std::copy can be used with many iterator types, not just pointers. std::copy is more flexible for no performance loss and is the clear winner.
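A minimal sketch of the POD restriction, assuming C++11 or later. The names `raw_copy` and `copy_strings` are hypothetical and only serve the illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <type_traits>
#include <vector>

// Hypothetical wrapper: refuses to compile for types memcpy cannot legally copy.
template <typename T>
void raw_copy(const T* src, T* dst, std::size_t n)
{
    static_assert(std::is_trivially_copyable<T>::value,
                  "memcpy is only valid for trivially copyable types");
    if (n == 0) return;                      // nothing to do, pointers may be null
    assert(src != nullptr && dst != nullptr);
    std::memcpy(dst, src, n * sizeof(T));
}

// std::copy has no such restriction: it assigns element by element when needed.
std::vector<std::string> copy_strings(const std::vector<std::string>& src)
{
    std::vector<std::string> dst(src.size());
    std::copy(src.begin(), src.end(), dst.begin());
    return dst;
}
```

Calling `raw_copy` with `std::string` elements would trip the `static_assert`, while `copy_strings` is fine because each element goes through the normal assignment operator.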

User · Answer

Just a minor addition: the speed difference between memcpy() and std::copy() can vary quite a bit depending on whether optimizations are enabled or disabled. With g++ 6.2.0 and without optimizations, memcpy() clearly wins:

Benchmark             Time           CPU Iterations
---------------------------------------------------
bm_memcpy            17 ns         17 ns   40867738
bm_stdcopy           62 ns         62 ns   11176219
bm_stdcopy_n         72 ns         72 ns    9481749

When optimizations are enabled (-O3), everything looks pretty much the same again:

Benchmark             Time           CPU Iterations
---------------------------------------------------
bm_memcpy             3 ns          3 ns  274527617
bm_stdcopy            3 ns          3 ns  272663990
bm_stdcopy_n          3 ns          3 ns  274732792

The bigger the array, the less noticeable the effect gets, but even at N=1000 memcpy() is about twice as fast when optimizations aren't enabled.

Source code (requires Google Benchmark):

#include <string.h>
#include <algorithm>
#include <vector>
#include <benchmark/benchmark.h>

constexpr int N = 10;

void bm_memcpy(benchmark::State& state)
{
  std::vector<int> a(N);
  std::vector<int> r(N);

  while (state.KeepRunning())
  {
    memcpy(r.data(), a.data(), N * sizeof(int));
  }
}

void bm_stdcopy(benchmark::State& state)
{
  std::vector<int> a(N);
  std::vector<int> r(N);

  while (state.KeepRunning())
  {
    std::copy(a.begin(), a.end(), r.begin());
  }
}

void bm_stdcopy_n(benchmark::State& state)
{
  std::vector<int> a(N);
  std::vector<int> r(N);

  while (state.KeepRunning())
  {
    std::copy_n(a.begin(), N, r.begin());
  }
}

BENCHMARK(bm_memcpy);
BENCHMARK(bm_stdcopy);
BENCHMARK(bm_stdcopy_n);

BENCHMARK_MAIN();

User · Answer

Profiling shows that the statement "std::copy() is always as fast as memcpy() or faster" is false.

My system:

HP-Compaq-dx7500-Microtower 3.13.0-24-generic #47-Ubuntu SMP Fri May 2 23:30:00 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

gcc (Ubuntu 4.8.2-19ubuntu1) 4.8.2

The code (language: c++):

    const uint32_t arr_size = (1080 * 720 * 3); // HD image in rgb24
    const uint32_t iterations = 100000;
    uint8_t arr1[arr_size];
    uint8_t arr2[arr_size];
    std::vector<uint8_t> v;

    int main(){
        {
            DPROFILE;
            memcpy(arr1, arr2, sizeof(arr1));
            printf("memcpy()\n");
        }

        v.reserve(sizeof(arr1));
        {
            DPROFILE;
            std::copy(arr1, arr1 + sizeof(arr1), v.begin());
            printf("std::copy()\n");
        }

        {
            time_t t = time(NULL);
            for(uint32_t i = 0; i < iterations; ++i)
                memcpy(arr1, arr2, sizeof(arr1));
            printf("memcpy()    elapsed %d s\n", time(NULL) - t);
        }

        {
            time_t t = time(NULL);
            for(uint32_t i = 0; i < iterations; ++i)
                std::copy(arr1, arr1 + sizeof(arr1), v.begin());
            printf("std::copy() elapsed %d s\n", time(NULL) - t);
        }
    }

g++ -O0 -o test_stdcopy test_stdcopy.cpp

    memcpy()    profile: main:21: now:1422969084:04859 elapsed:2650 us
    std::copy() profile: main:27: now:1422969084:04862 elapsed:2745 us
    memcpy()    elapsed 44 s
    std::copy() elapsed 45 s

g++ -O3 -o test_stdcopy test_stdcopy.cpp

    memcpy()    profile: main:21: now:1422969601:04939 elapsed:2385 us
    std::copy() profile: main:28: now:1422969601:04941 elapsed:2690 us
    memcpy()    elapsed 27 s
    std::copy() elapsed 43 s

Red Alert pointed out that the code uses memcpy from array to array and std::copy from array to vector. That could be a reason for the faster memcpy.

Since there is

    v.reserve(sizeof(arr1));

there shall be no difference in copy to vector or array.

The code is fixed to use an array for both cases. memcpy is still faster:

    {
        time_t t = time(NULL);
        for(uint32_t i = 0; i < iterations; ++i)
            memcpy(arr1, arr2, sizeof(arr1));
        printf("memcpy()    elapsed %ld s\n", time(NULL) - t);
    }

    {
        time_t t = time(NULL);
        for(uint32_t i = 0; i < iterations; ++i)
            std::copy(arr1, arr1 + sizeof(arr1), arr2);
        printf("std::copy() elapsed %ld s\n", time(NULL) - t);
    }

    memcpy()    elapsed 44 s
    std::copy() elapsed 48 s
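One caveat about the array-to-vector variant: reserve() only allocates capacity and leaves size() at 0, so copying through v.begin() afterwards writes to elements that do not exist yet. A minimal sketch of two well-defined alternatives follows. The function names are hypothetical and this is not the benchmark that produced the timings above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <vector>

// Variant 1: resize() creates the elements, so v.begin() refers to valid storage.
std::vector<std::uint8_t> copy_with_resize(const std::uint8_t* src, std::size_t n)
{
    assert(src != nullptr || n == 0);
    std::vector<std::uint8_t> v;
    v.resize(n);
    std::copy(src, src + n, v.begin());
    return v;
}

// Variant 2: reserve() only allocates; back_inserter appends and grows size().
std::vector<std::uint8_t> copy_with_back_inserter(const std::uint8_t* src, std::size_t n)
{
    assert(src != nullptr || n == 0);
    std::vector<std::uint8_t> v;
    v.reserve(n);
    std::copy(src, src + n, std::back_inserter(v));
    return v;
}
```

The back_inserter variant copies through push_back, so it may be harder for a compiler to turn into a single bulk copy. That is a plausible source of timing differences, but it would need to be measured.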

User · Answer

All compilers I know will replace a simple std::copy with a memcpy when it is appropriate, or even better, vectorize the copy so that it would be even faster than a memcpy.

In any case: profile and find out yourself. Different compilers will do different things, and it's quite possible it won't do exactly what you ask.
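For a quick first measurement, a minimal timing sketch using std::chrono might look like the following. The names `time_seconds`, `CopyTimings` and `compare_copies` are made up for this example. An optimizing compiler may remove copies whose results are never read, so treat the numbers as a rough indication only and prefer a real benchmarking library for anything serious.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical harness: runs fn `reps` times and returns the elapsed seconds.
template <typename Fn>
double time_seconds(Fn fn, int reps)
{
    assert(reps > 0);
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < reps; ++i) fn();
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count();
}

struct CopyTimings
{
    double memcpy_s;    // seconds spent in the memcpy loop
    double std_copy_s;  // seconds spent in the std::copy loop
};

// Times both copy styles on the same pair of int buffers of length n.
CopyTimings compare_copies(std::size_t n, int reps)
{
    std::vector<int> src(n, 42), dst(n);
    CopyTimings t{};
    t.memcpy_s = time_seconds(
        [&] { if (n != 0) std::memcpy(dst.data(), src.data(), n * sizeof(int)); },
        reps);
    t.std_copy_s = time_seconds(
        [&] { std::copy(src.begin(), src.end(), dst.begin()); },
        reps);
    return t;
}
```

Running it several times, at the optimization level you actually ship with, gives a more honest picture than a single run.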

See this presentation on compiler optimisations (pdf).

Here's what GCC does for a simple std::copy of a POD type.

#include <algorithm>

struct foo
{
  int x, y;    
};

void bar(foo* a, foo* b, size_t n)
{
  std::copy(a, a + n, b);
}

Here's the disassembly (with only -O optimisation), showing the call to memmove:

bar(foo*, foo*, unsigned long):
    salq    $3, %rdx
    sarq    $3, %rdx
    testq   %rdx, %rdx
    je  .L5
    subq    $8, %rsp
    movq    %rsi, %rax
    salq    $3, %rdx
    movq    %rdi, %rsi
    movq    %rax, %rdi
    call    memmove
    addq    $8, %rsp
.L5:
    rep
    ret

If you change the function signature to

void bar(foo* __restrict a, foo* __restrict b, size_t n)

then the memmove becomes a memcpy for a slight performance improvement. Note that memcpy itself will be heavily vectorised.

User · Answer

In theory, memcpy might have a slight, imperceptible, infinitesimal performance advantage, only because it doesn't have the same requirements as std::copy. From the man page of memcpy:

"To avoid overflows, the size of the arrays pointed by both the destination and source parameters, shall be at least num bytes, and should not overlap (for overlapping memory blocks, memmove is a safer approach)."

In other words, memcpy can ignore the possibility of overlapping data. (Passing overlapping arrays to memcpy is undefined behavior.) So memcpy doesn't need to explicitly check for this condition, whereas std::copy can be used as long as the OutputIterator parameter is not in the source range. Note this is not the same as saying that the source range and destination range can't overlap.

So since std::copy has somewhat different requirements, in theory it should be slightly (with an extreme emphasis on slightly) slower, since it probably will check for overlapping C-arrays, or else delegate the copying of C-arrays to memmove, which needs to perform the check. But in practice, you (and most profilers) probably won't even detect any difference.

Of course, if you're not working with PODs, you can't use memcpy anyway.
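The overlap rule can be made concrete with a small sketch. The helpers `shift_left` and `shift_right` are hypothetical names used only here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Move elements k places towards the front. The output iterator (begin) lies
// outside the source range [begin + k, end), so std::copy is permitted even
// though source and destination ranges overlap.
void shift_left(std::vector<int>& v, std::size_t k)
{
    assert(k <= v.size());
    if (k == 0) return;  // a self-copy would put the output inside the source
    const auto off = static_cast<std::ptrdiff_t>(k);
    std::copy(v.begin() + off, v.end(), v.begin());
}

// Move elements k places towards the back. Here the destination starts inside
// the source range, so std::copy_backward is the matching algorithm.
void shift_right(std::vector<int>& v, std::size_t k)
{
    assert(k <= v.size());
    if (k == 0) return;
    const auto off = static_cast<std::ptrdiff_t>(k);
    std::copy_backward(v.begin(), v.end() - off, v.end());
}
```

For example, shifting {1, 2, 3, 4, 5} left by 2 gives {3, 4, 5, 4, 5}, and shifting it right by 2 gives {1, 2, 1, 2, 3}. With memcpy, neither call would be valid for overlapping buffers; memmove would be the C-level equivalent.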
