Swift Beta performance: sorting arrays

Question

I was implementing an algorithm in Swift Beta and noticed that the performance was very poor. After digging deeper I realized that one of the bottlenecks was something as simple as sorting arrays. The relevant part is here:

```swift
let n = 1000000
var x = [Int](repeating: 0, count: n)
for i in 0..<n {
    x[i] = random()
}
// start clock here
let y = x.sorted()
// stop clock here
```

In C++, a similar operation takes 0.06s on my computer.

In Python, it takes 0.6s (no tricks, just y = sorted(x) for a list of integers).

In Swift it takes 6s if I compile it with the following command:

```
xcrun swift -O3 -sdk `xcrun --show-sdk-path --sdk macosx`
```

And it takes as much as 88s if I compile it with the following command:

```
xcrun swift -O0 -sdk `xcrun --show-sdk-path --sdk macosx`
```

Timings in Xcode with "Release" vs. "Debug" builds are similar.

What is wrong here? I could understand some performance loss in comparison with C++, but not a 10-fold slowdown in comparison with pure Python.

Edit: mweathers noticed that changing -O3 to -Ofast makes this code run almost as fast as the C++ version! However, -Ofast changes the semantics of the language a lot: in my testing, it disabled the checks for integer overflows and array indexing overflows. For example, with -Ofast the following Swift code runs silently without crashing (and prints out some garbage):

```swift
let n = 10000000
print(n*n*n*n*n)
let x = [Int](repeating: 10, count: n)
print(x[n])
```

So -Ofast is not what we want; the whole point of Swift is that we have the safety nets in place. Of course, the safety nets have some impact on the performance, but they should not make the programs 100 times slower. Remember that Java already checks for array bounds, and in typical cases the slowdown is by a factor much less than 2. And in Clang and GCC we have got -ftrapv for checking (signed) integer overflows, and it is not that slow, either.

Hence the question: how can we get reasonable performance in Swift without losing the safety nets?

Edit 2: I did some more benchmarking, with very simple loops along the lines of

```swift
for i in 0..<n {
    x[i] = x[i] ^ 12345678
}
```

(Here the xor operation is there just so that I can more easily find the relevant loop in the assembly code. I tried to pick an operation that is easy to spot but also "harmless" in the sense that it should not require any checks related to integer overflows.)

Again, there was a huge difference in the performance between -O3 and -Ofast. So I had a look at the assembly code:

With -Ofast I get pretty much what I would expect. The relevant part is a loop with 5 machine language instructions.

With -O3 I get something that was beyond my wildest imagination. The inner loop spans 88 lines of assembly code. I did not try to understand all of it, but the most suspicious parts are 13 invocations of "callq _swift_retain" and another 13 invocations of "callq _swift_release". That is, 26 subroutine calls in the inner loop!

Edit 3: In comments, Ferruccio asked for benchmarks that are fair in the sense that they do not rely on built-in functions (e.g. sort). I think the following program is a fairly good example:

```swift
let n = 10000
var x = [Int](repeating: 1, count: n)
for i in 0..<n {
    for j in 0..<n {
        x[i] = x[j]
    }
}
```

There is no arithmetic, so we do not need to worry about integer overflows. The only thing that we do is just lots of array references. And the results are here; Swift -O3 loses by a factor of almost 500 in comparison with -Ofast:

C++ -O3: 0.05 s
C++ -O0: 0.4 s
Java: 0.2 s
Python with PyPy: 0.5 s
Python: 12 s
Swift -Ofast: 0.05 s
Swift -O3: 23 s
Swift -O0: 443 s

(If you are concerned that the compiler might optimize out the pointless loops entirely, you can change it to e.g. x[i] ^= x[j], and add a print statement that outputs x[0]. This does not change anything; the timings will be very similar.)

And yes, here the Python implementation was a stupid pure Python implementation with a list of ints and nested for loops. It should be much slower than unoptimized Swift. Something seems to be seriously broken with Swift and array indexing.

Edit 4: These issues (as well as some other performance issues) seem to have been fixed in Xcode 6 beta 5.

For sorting, I now have the following timings:

clang++ -O3: 0.06 s
swiftc -Ofast: 0.1 s
swiftc -O: 0.1 s
swiftc: 4 s

For nested loops:

clang++ -O3: 0.06 s
swiftc -Ofast: 0.3 s
swiftc -O: 0.4 s
swiftc: 540 s

It seems that there is no reason anymore to use the unsafe -Ofast (a.k.a. -Ounchecked); plain -O produces equally good code.
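For anyone who wants to rerun the question's two measurements (the sort and the Edit 3 nested loop) on a current toolchain, here is a rough Swift 5 sketch. It is a translation under assumptions, not the original program: `sort(x)` becomes `x.sorted()`, `random()` is replaced by `Int.random(in:)`, timing uses `DispatchTime`, and the helper names `measure`, `sortBenchmark` and `nestedLoopBenchmark` are made up for illustration. The sizes in the last two calls are smaller than the question's so that an unoptimized build finishes quickly; compile once with `swiftc -Onone` and once with `swiftc -O` to compare.

```swift
import Dispatch

// Runs `body` once and returns the elapsed wall-clock time in seconds.
func measure(_ body: () -> Void) -> Double {
    let start = DispatchTime.now().uptimeNanoseconds
    body()
    let end = DispatchTime.now().uptimeNanoseconds
    return Double(end - start) / 1_000_000_000
}

// Sort benchmark: fill an array with random integers, then time only the sort.
func sortBenchmark(n: Int) -> [Int] {
    let x = (0..<n).map { _ in Int.random(in: 0...Int(Int32.max)) }
    var y: [Int] = []
    let seconds = measure { y = x.sorted() }
    print("sort, n = \(n): \(seconds) s")
    return y
}

// Edit 3 benchmark: nested loops doing nothing but array reads and writes.
func nestedLoopBenchmark(n: Int) -> [Int] {
    var x = [Int](repeating: 1, count: n)
    let seconds = measure {
        for i in 0..<n {
            for j in 0..<n {
                x[i] = x[j]
            }
        }
    }
    // Printing x[0] keeps the loop's result observable.
    print("nested loops, n = \(n): \(seconds) s, x[0] = \(x[0])")
    return x
}

let sortedResult = sortBenchmark(n: 100_000)
let filled = nestedLoopBenchmark(n: 2_000)
```

Raising the arguments to 1_000_000 and 10_000 reproduces the sizes used in the question.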

User · Answer

```swift
// Lomuto partition: the last element is the pivot; returns the pivot's final index.
func partition(_ list: inout [Int], low: Int, high: Int) -> Int {
    let pivot = list[high]
    var j = low
    var i = j - 1
    while j < high {
        if list[j] <= pivot {
            i += 1
            list.swapAt(i, j)
        }
        j += 1
    }
    list.swapAt(i + 1, high)
    return i + 1
}

func quickSort(_ list: inout [Int], low: Int, high: Int) {
    if low < high {
        let pIndex = partition(&list, low: low, high: high)
        quickSort(&list, low: low, high: pIndex - 1)
        quickSort(&list, low: pIndex + 1, high: high)
    }
}

var list = [7, 3, 15, 10, 0, 8, 2, 4]
quickSort(&list, low: 0, high: list.count - 1)

var list2 = [10, 0, 3, 9, 2, 14, 26, 27, 1, 5, 8, -1, 8]
quickSort(&list2, low: 0, high: list2.count - 1)

var list3 = [1, 3, 9, 8, 2, 7, 5]
quickSort(&list3, low: 0, high: list3.count - 1)
```

This is my blog about Quick Sort, with a GitHub sample (Quick-Sort). You can take a look at Lomuto's partitioning algorithm in "Partitioning the list", written in Swift.
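The Lomuto scheme above does not depend on `Int`. Below is a sketch of a generic variant, assuming any `Comparable` element type; the names `lomutoPartition` and `lomutoQuickSort` are hypothetical, and `swapAt(_:_:)` (Swift 4+) does the element swaps.

```swift
// Generic Lomuto partition: last element is the pivot; returns its final index.
func lomutoPartition<T: Comparable>(_ a: inout [T], low: Int, high: Int) -> Int {
    let pivot = a[high]
    var i = low - 1                 // last index of the "<= pivot" region
    for j in low..<high where a[j] <= pivot {
        i += 1
        a.swapAt(i, j)              // swapAt is a no-op when i == j
    }
    a.swapAt(i + 1, high)           // put the pivot between the two regions
    return i + 1
}

func lomutoQuickSort<T: Comparable>(_ a: inout [T], low: Int, high: Int) {
    guard low < high else { return }
    let p = lomutoPartition(&a, low: low, high: high)
    lomutoQuickSort(&a, low: low, high: p - 1)
    lomutoQuickSort(&a, low: p + 1, high: high)
}

var words = ["pear", "apple", "fig", "banana"]
lomutoQuickSort(&words, low: 0, high: words.count - 1)
print(words)  // ["apple", "banana", "fig", "pear"]
```

Because the pivot is always the last element, this scheme degrades to quadratic time (and deep recursion) on input that is already sorted; choosing a middle or random pivot is the usual mitigation.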

User · Answer

From The Swift Programming Language:

> The Sort Function
>
> Swift's standard library provides a function called sort, which sorts an array of values of a known type, based on the output of a sorting closure that you provide. Once it completes the sorting process, the sort function returns a new array of the same type and size as the old one, with its elements in the correct sorted order.

The sort function has two declarations.

The default declaration, which allows you to specify a comparison closure:

```swift
func sort<T>(array: T[], pred: (T, T) -> Bool) -> T[]
```

And a second declaration that only takes a single parameter (the array) and is "hardcoded to use the less-than comparator":

```swift
func sort<T : Comparable>(array: T[]) -> T[]
```

Example:

```swift
sort(arrayToSort) { $0 > $1 }
```

I tested a modified version of your code in a playground with the closure added on so I could monitor the function a little more closely, and I found that with n set to 1000, the closure was being called about 11,000 times.

```swift
let n = 1000
var x = [Int](repeating: 0, count: n)
for i in 0..<n {
    x[i] = random()
}
let y = x.sorted { $0 > $1 }
```

It is not an efficient function, and I would recommend using a better sorting function implementation.

EDIT:

I took a look at the Quicksort Wikipedia page and wrote a Swift implementation for it. Here is the full program I used (in a playground):

```swift
import Foundation

func quickSort(_ array: inout [Int], begin: Int, end: Int) {
    if begin < end {
        let p = partition(&array, left: begin, right: end)
        quickSort(&array, begin: begin, end: p - 1)
        quickSort(&array, begin: p + 1, end: end)
    }
}

func partition(_ array: inout [Int], left: Int, right: Int) -> Int {
    let numElements = right - left + 1
    let pivotIndex = left + numElements / 2
    let pivotValue = array[pivotIndex]
    array.swapAt(pivotIndex, right)
    var storeIndex = left
    for i in left..<right {
        let a = 1 // <- Used to see how many comparisons are made
        if array[i] <= pivotValue {
            array.swapAt(i, storeIndex)
            storeIndex += 1
        }
    }
    array.swapAt(storeIndex, right) // Move pivot to its final place
    return storeIndex
}

let n = 1000
var x = [Int](repeating: 0, count: n)
for i in 0..<n {
    x[i] = Int(arc4random())
}

quickSort(&x, begin: 0, end: x.count - 1) // <- Does the sorting

for i in 0..<n {
    x[i] // <- Used by the playground to display the results
}
```

Using this with n=1000, I found that

- quickSort() got called about 650 times,
- about 6000 swaps were made,
- and there are about 10,000 comparisons.

It seems that the built-in sort method is (or is close to) quick sort, and is really slow.
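The playground trick of counting closure calls can be repeated in plain Swift 5 by incrementing a counter inside the comparison closure passed to `sorted(by:)`. This is a sketch; the function name `countComparisons` is hypothetical, and the number printed depends on the random input and on whatever algorithm the current standard library uses. As a yardstick, n·log2(n) is roughly 10,000 for n = 1000, which is the order of magnitude expected of an O(n log n) comparison sort.

```swift
// Sorts `values` with the standard library and counts comparator calls.
func countComparisons(_ values: [Int]) -> (sorted: [Int], comparisons: Int) {
    var comparisons = 0
    let result = values.sorted { a, b in
        comparisons += 1   // one increment per comparison the sort performs
        return a < b
    }
    return (result, comparisons)
}

let input = (0..<1000).map { _ in Int.random(in: 0..<1_000_000) }
let (output, comparisonCount) = countComparisons(input)
print("n = \(input.count), comparator calls = \(comparisonCount)")
```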

User · Answer

The main issue that is mentioned by others but not called out enough is that -O3 does nothing at all in Swift (and never has), so when compiled with that it is effectively non-optimised (-Onone).

Option names have changed over time, so some other answers have obsolete flags for the build options. The correct current options (Swift 2.2) are:

-Onone: debug, slow
-O: optimised
-O -whole-module-optimization: optimised across files

Whole module optimisation has a slower compile but can optimise across files within the module, i.e. within each framework and within the actual application code, but not between them. You should use this for anything performance critical.

You can also disable safety checks for even more speed, but then all assertions and preconditions are not just disabled but optimised on the basis that they are correct. If you ever hit an assertion, this means that you are into undefined behaviour. Use with extreme caution, and only if you determine (by testing) that the speed boost is worthwhile for you. If you do find it valuable for some code, I recommend separating that code into a separate framework and only disabling the safety checks for that module.
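A finer-grained alternative to disabling safety checks for a whole module is to opt out locally. The wrapping operators (`&+`, `&*`, ...) never trap on overflow, and `withUnsafeBufferPointer` exposes the array's storage as an `UnsafeBufferPointer`, whose subscript, as far as I know, is bounds-checked only in debug builds. The sketch below (function names hypothetical) contrasts that with the fully checked loop; every other line in the module keeps its checks.

```swift
// Locally unchecked: &+ wraps instead of trapping, and the buffer subscript
// is not bounds-checked in optimized builds.
func wrappingSum(_ values: [Int]) -> Int {
    values.withUnsafeBufferPointer { buffer -> Int in
        var total = 0
        for i in 0..<buffer.count {
            total = total &+ buffer[i]
        }
        return total
    }
}

// Fully checked equivalent: + traps on overflow and values[i] is bounds-checked.
func checkedSum(_ values: [Int]) -> Int {
    var total = 0
    for i in 0..<values.count {
        total += values[i]
    }
    return total
}

let small = [1, 2, 3, 4]
print(wrappingSum(small), checkedSum(small))  // 10 10
print(wrappingSum([Int.max, 1]) == Int.min)   // true: wraps instead of trapping
```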

User · Answer

Swift Array performance revisited:

I wrote my own benchmark comparing Swift with C/Objective-C. My benchmark calculates prime numbers. It uses the array of previous prime numbers to look for prime factors in each new candidate, so it is quite fast. However, it does TONS of array reading, and less writing to arrays.

I originally did this benchmark against Swift 1.2. I decided to update the project and run it against Swift 2.0.

The project lets you select between using normal Swift arrays and using Swift unsafe memory buffers with array semantics.

For C/Objective-C, you can either opt to use NSArrays or C malloc'ed arrays.

The test results seem to be pretty similar with fastest, smallest code optimization (-Os) or fastest, aggressive (-Ofast) optimization.

Swift 2.0 performance is still horrible with code optimization turned off, whereas C/Objective-C performance is only moderately slower.

The bottom line is that C malloc'd array-based calculations are the fastest, by a modest margin.

Swift with unsafe buffers takes around 1.19x to 1.20x longer than C malloc'd arrays when using fastest, smallest code optimization. The difference seems slightly less with fast, aggressive optimization (Swift takes more like 1.18x to 1.16x longer than C).

If you use regular Swift arrays, the difference with C is slightly greater (Swift takes ~1.22x to 1.23x longer).

Regular Swift arrays are DRAMATICALLY faster than they were in Swift 1.2/Xcode 6. Their performance is so close to Swift unsafe buffer based arrays that using unsafe memory buffers does not really seem worth the trouble any more, which is big.

BTW, Objective-C NSArray performance stinks. If you're going to use the native container objects in both languages, Swift is DRAMATICALLY faster.

You can check out my project on GitHub at SwiftPerformanceBenchmark. It has a simple UI that makes collecting stats pretty easy.

It's interesting that sorting seems to be slightly faster in Swift than in C now, but that this prime number algorithm is still faster in C.
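The algorithm described above (trial division of each candidate by the primes found so far) can be sketched in Swift 5 in both styles the project compares: a regular array and manually allocated unsafe memory. This is a hypothetical reconstruction from the description, not the code in SwiftPerformanceBenchmark; the function names are made up.

```swift
// Regular-array version: each odd candidate is divided only by earlier primes,
// stopping once p * p exceeds the candidate.
func firstPrimes(_ count: Int) -> [Int] {
    guard count > 0 else { return [] }
    var primes = [2]
    primes.reserveCapacity(count)
    var candidate = 3
    while primes.count < count {
        var isPrime = true
        for p in primes {
            if p * p > candidate { break }
            if candidate % p == 0 { isPrime = false; break }
        }
        if isPrime { primes.append(candidate) }
        candidate += 2
    }
    return primes
}

// Unsafe-memory version of the same algorithm: manual allocation, no array bounds checks.
func firstPrimesUnsafe(_ count: Int) -> [Int] {
    guard count > 0 else { return [] }
    let storage = UnsafeMutablePointer<Int>.allocate(capacity: count)
    storage.initialize(to: 2)
    var found = 1
    defer {
        storage.deinitialize(count: found)
        storage.deallocate()
    }
    var candidate = 3
    while found < count {
        var isPrime = true
        var k = 0
        while k < found {
            let p = storage[k]
            if p * p > candidate { break }
            if candidate % p == 0 { isPrime = false; break }
            k += 1
        }
        if isPrime {
            (storage + found).initialize(to: candidate)
            found += 1
        }
        candidate += 2
    }
    // Copy the results out before the defer block frees the memory.
    return Array(UnsafeBufferPointer(start: storage, count: found))
}

print(firstPrimes(10))  // [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```

The unsafe version has to manage allocation, initialization and deallocation itself, which is the kind of trouble the answer suggests is no longer worth it.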

User · Answer

Swift 4.1 introduces a new -Osize optimization mode.

> In Swift 4.1 the compiler now supports a new optimization mode which enables dedicated optimizations to reduce code size.
>
> The Swift compiler comes with powerful optimizations. When compiling with -O the compiler tries to transform the code so that it executes with maximum performance. However, this improvement in runtime performance can sometimes come with a tradeoff of increased code size. With the new -Osize optimization mode the user has the choice to compile for minimal code size rather than for maximum speed.
>
> To enable the size optimization mode on the command line, use -Osize instead of -O.

Further reading: https://swift.org/blog/osize

User · Answer

I decided to take a look at this for fun, and here are the timings that I get:

Swift 4.0.2: 0.83s (0.74s with -Ounchecked)
C++ (Apple LLVM 8.0.0): 0.74s

Swift

```swift
// Swift 4.0 code
import Foundation

func doTest() -> Void {
    let arraySize = 10000000
    var randomNumbers = [UInt32]()

    for _ in 0..<arraySize {
        randomNumbers.append(arc4random_uniform(UInt32(arraySize)))
    }

    let start = Date()
    randomNumbers.sort()
    let end = Date()

    print(randomNumbers[0])
    print("Elapsed time: \(end.timeIntervalSince(start))")
}

doTest()
```

Results:

Swift 1.1

```
xcrun swiftc --version
Swift version 1.1 (swift-600.0.54.20)
Target: x86_64-apple-darwin14.0.0

xcrun swiftc -O SwiftSort.swift
./SwiftSort
Elapsed time: 1.02204304933548
```

Swift 1.2

```
xcrun swiftc --version
Apple Swift version 1.2 (swiftlang-602.0.49.6 clang-602.0.49)
Target: x86_64-apple-darwin14.3.0

xcrun -sdk macosx swiftc -O SwiftSort.swift
./SwiftSort
Elapsed time: 0.738763988018036
```

Swift 2.0

```
xcrun swiftc --version
Apple Swift version 2.0 (swiftlang-700.0.59 clang-700.0.72)
Target: x86_64-apple-darwin15.0.0

xcrun -sdk macosx swiftc -O SwiftSort.swift
./SwiftSort
Elapsed time: 0.767306983470917
```

It seems to be the same performance if I compile with -Ounchecked.

Swift 3.0

```
xcrun swiftc --version
Apple Swift version 3.0 (swiftlang-800.0.46.2 clang-800.0.38)
Target: x86_64-apple-macosx10.9

xcrun -sdk macosx swiftc -O SwiftSort.swift
./SwiftSort
Elapsed time: 0.939633965492249

xcrun -sdk macosx swiftc -Ounchecked SwiftSort.swift
./SwiftSort
Elapsed time: 0.866258025169373
```

There seems to have been a performance regression from Swift 2.0 to Swift 3.0, and I'm also seeing a difference between -O and -Ounchecked for the first time.

Swift 4.0

```
xcrun swiftc --version
Apple Swift version 4.0.2 (swiftlang-900.0.69.2 clang-900.0.38)
Target: x86_64-apple-macosx10.9

xcrun -sdk macosx swiftc -O SwiftSort.swift
./SwiftSort
Elapsed time: 0.834299981594086

xcrun -sdk macosx swiftc -Ounchecked SwiftSort.swift
./SwiftSort
Elapsed time: 0.742045998573303
```

Swift 4 improves the performance again, while maintaining a gap between -O and -Ounchecked. -O -whole-module-optimization did not appear to make a difference.

C++

```cpp
#include <algorithm>
#include <chrono>
#include <iostream>
#include <vector>
#include <cstdint>
#include <stdlib.h>

using namespace std;
using namespace std::chrono;

int main(int argc, const char * argv[]) {
    const auto arraySize = 10000000;
    vector<uint32_t> randomNumbers;

    for (int i = 0; i < arraySize; ++i) {
        randomNumbers.emplace_back(arc4random_uniform(arraySize));
    }

    const auto start = high_resolution_clock::now();
    sort(begin(randomNumbers), end(randomNumbers));
    const auto end = high_resolution_clock::now();

    cout << randomNumbers[0] << "\n";
    cout << "Elapsed time: " << duration_cast<duration<double>>(end - start).count() << "\n";

    return 0;
}
```

Results:

Apple Clang 6.0

```
clang++ --version
Apple LLVM version 6.0 (clang-600.0.54) (based on LLVM 3.5svn)
Target: x86_64-apple-darwin14.0.0
Thread model: posix

clang++ -O3 -std=c++11 CppSort.cpp -o CppSort
./CppSort
Elapsed time: 0.688969
```

Apple Clang 6.1.0

```
clang++ --version
Apple LLVM version 6.1.0 (clang-602.0.49) (based on LLVM 3.6.0svn)
Target: x86_64-apple-darwin14.3.0
Thread model: posix

clang++ -O3 -std=c++11 CppSort.cpp -o CppSort
./CppSort
Elapsed time: 0.670652
```

Apple Clang 7.0.0

```
clang++ --version
Apple LLVM version 7.0.0 (clang-700.0.72)
Target: x86_64-apple-darwin15.0.0
Thread model: posix

clang++ -O3 -std=c++11 CppSort.cpp -o CppSort
./CppSort
Elapsed time: 0.690152
```

Apple Clang 8.0.0

```
clang++ --version
Apple LLVM version 8.0.0 (clang-800.0.38)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

clang++ -O3 -std=c++11 CppSort.cpp -o CppSort
./CppSort
Elapsed time: 0.68253
```

Apple Clang 9.0.0

```
clang++ --version
Apple LLVM version 9.0.0 (clang-900.0.38)
Target: x86_64-apple-darwin16.7.0
Thread model: posix

clang++ -O3 -std=c++11 CppSort.cpp -o CppSort
./CppSort
Elapsed time: 0.736784
```

Verdict

As of the time of this writing, Swift's sort is fast, but not yet as fast as C++'s sort when compiled with -O, with the above compilers and libraries. With -Ounchecked, it appears to be as fast as C++ in Swift 4.0.2 and Apple LLVM 9.0.0.

User · Answer

TL;DR: Yes, the only Swift language implementation is slow, right now. If you need fast numeric code (and presumably other types of code), just go with another one. In the future, you should re-evaluate your choice. It might be good enough for most application code that is written at a higher level, though.

From what I'm seeing in SIL and LLVM IR, it seems like they need a bunch of optimizations for removing retains and releases, which might be implemented in Clang (for Objective-C), but they haven't ported them yet. That's the theory I'm going with for now (I still need to confirm that Clang does something about it), since a profiler run on the last test case of this question yields this "pretty" result.

As was said by many others, -Ofast is totally unsafe and changes language semantics. For me, it's at the "If you're going to use that, just use another language" stage. I'll re-evaluate that choice later, if it changes.

-O3 gets us a bunch of swift_retain and swift_release calls that, honestly, don't look like they should be there for this example. The optimizer should have elided (most of) them AFAICT, since it knows most of the information about the array, and knows that it has (at least) a strong reference to it.

It shouldn't emit more retains when it's not even calling functions which might release the objects. I don't think an array constructor can return an array which is smaller than what was asked for, which means that a lot of checks that were emitted are useless. It also knows that the integer will never be above 10k, so the overflow checks can be optimized away. This is not because of -Ofast weirdness, but because of the semantics of the language: nothing else is changing that var nor can access it, and adding up to 10k is safe for the type Int.

The compiler might not be able to unbox the array or the array elements, though, since they're getting passed to sort(), which is an external function and has to get the arguments it's expecting. This will make us have to use the Int values indirectly, which would make it go a bit slower. This could change if the sort() generic function (not in the multi-method way) was available to the compiler and got inlined.

This is a very new (publicly) language, and it is going through what I assume are lots of changes, since there are people (heavily) involved with the Swift language asking for feedback and they all say the language isn't finished and will change.

Code used:

```swift
import Cocoa

let swift_start = NSDate.timeIntervalSinceReferenceDate

let n: Int = 10000
var x = [Int](repeating: 1, count: n)
for i in 0..<n {
    for j in 0..<n {
        let tmp: Int = x[j]
        x[i] = tmp
    }
}

let y: [Int] = x.sorted()

let swift_stop = NSDate.timeIntervalSinceReferenceDate

print("\(swift_stop - swift_start)s")
```

P.S.: I'm not an expert on Objective-C nor all the facilities from Cocoa, Objective-C, or the Swift runtimes. I might also be assuming some things that I didn't write.

User · Answer

tl;dr Swift 1.0 is now as fast as C by this benchmark using the default release optimisation level (-O).

Here is an in-place quicksort in Swift Beta:

```swift
func quicksort_swift(_ a: inout [CInt], _ start: Int, _ end: Int) {
    if end - start < 2 {
        return
    }
    let p = a[start + (end - start) / 2]
    var l = start
    var r = end - 1
    while l <= r {
        if a[l] < p {
            l += 1
            continue
        }
        if a[r] > p {
            r -= 1
            continue
        }
        let t = a[l]
        a[l] = a[r]
        a[r] = t
        l += 1
        r -= 1
    }
    quicksort_swift(&a, start, r + 1)
    quicksort_swift(&a, r + 1, end)
}
```

And the same in C:

```c
void quicksort_c(int *a, int n) {
    if (n < 2)
        return;
    int p = a[n / 2];
    int *l = a;
    int *r = a + n - 1;
    while (l <= r) {
        if (*l < p) {
            l++;
            continue;
        }
        if (*r > p) {
            r--;
            continue;
        }
        int t = *l;
        *l++ = *r;
        *r-- = t;
    }
    quicksort_c(a, r - a + 1);
    quicksort_c(l, a + n - l);
}
```

Both work:

```swift
var a_swift: [CInt] = [0, 5, 2, 8, 1234, -1, 2]
var a_c: [CInt] = [0, 5, 2, 8, 1234, -1, 2]

quicksort_swift(&a_swift, 0, a_swift.count)
quicksort_c(&a_c, CInt(a_c.count))

// [ -1, 0, 2, 2, 5, 8, 1234 ]
// [ -1, 0, 2, 2, 5, 8, 1234 ]
```

Both are called in the same program as written:

```swift
var x_swift = [CInt](repeating: 0, count: n)
var x_c = [CInt](repeating: 0, count: n)
for i in 0..<n {
    x_swift[i] = CInt(random())
    x_c[i] = CInt(random())
}

let swift_start: UInt64 = mach_absolute_time()
quicksort_swift(&x_swift, 0, x_swift.count)
let swift_stop: UInt64 = mach_absolute_time()

let c_start: UInt64 = mach_absolute_time()
quicksort_c(&x_c, CInt(x_c.count))
let c_stop: UInt64 = mach_absolute_time()
```

This converts the absolute times to seconds:

```c
#include <stdint.h>
#include <mach/mach_time.h>

static const uint64_t NANOS_PER_USEC = 1000ULL;
static const uint64_t NANOS_PER_MSEC = 1000ULL * NANOS_PER_USEC;
static const uint64_t NANOS_PER_SEC = 1000ULL * NANOS_PER_MSEC;

mach_timebase_info_data_t timebase_info;

uint64_t abs_to_nanos(uint64_t abs) {
    if (timebase_info.denom == 0) {
        (void)mach_timebase_info(&timebase_info);
    }
    return abs * timebase_info.numer / timebase_info.denom;
}

double abs_to_seconds(uint64_t abs) {
    return abs_to_nanos(abs) / (double)NANOS_PER_SEC;
}
```

Here is a summary of the compiler's optimization levels:

-Onone: no optimizations; the default for debug.
-O: perform optimizations; the default for release.
-Ofast: perform optimizations and disable runtime overflow checks and runtime type checks.

Time in seconds with -Onone for n=10_000:

Swift:          0.895296452
C:              0.001223848

Here is Swift's builtin sort() for n=10_000:

Swift_builtin:  0.77865783

Here is -O for n=10_000:

Swift:          0.045478346
C:              0.000784666
Swift_builtin:  0.032513488

As you can see, Swift's performance improved by a factor of 20.

As per mweathers' answer, setting -Ofast makes the real difference, resulting in these times for n=10_000:

Swift:          0.000706745
C:              0.000742374
Swift_builtin:  0.000603576

And for n=1_000_000:

Swift:          0.107111846
C:              0.114957179
Swift_sort:     0.092688548

For comparison, this is with -Onone for n=1_000_000:

Swift:          142.659763258
C:              0.162065333
Swift_sort:     114.095478272

So Swift with no optimizations was almost 1000x slower than C in this benchmark, at this stage in its development. On the other hand, with both compilers set to -Ofast, Swift actually performed at least as well as, if not slightly better than, C.

It has been pointed out that -Ofast changes the semantics of the language, making it potentially unsafe. This is what Apple states in the Xcode 5.0 release notes:

> A new optimization level -Ofast, available in LLVM, enables aggressive optimizations. -Ofast relaxes some conservative restrictions, mostly for floating-point operations, that are safe for most code. It can yield significant high-performance wins from the compiler.

They all but advocate it. Whether that's wise or not I couldn't say, but from what I can tell it seems reasonable enough to use -Ofast in a release if you're not doing high-precision floating point arithmetic and you're confident no integer or array overflows are possible in your program. If you do need high performance and overflow checks / precise arithmetic, then choose another language for now.

BETA 3 UPDATE:

n=10_000 with -O:

Swift:          0.019697268
C:              0.000718064
Swift_sort:     0.002094721

Swift in general is a bit faster, and it looks like Swift's built-in sort has changed quite significantly.

FINAL UPDATE:

-Onone:

Swift:   0.678056695
C:       0.000973914

-O:

Swift:   0.001158492
C:       0.001192406

-Ounchecked:

Swift:   0.000827764
C:       0.001078914
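Since -Ofast/-Ounchecked drops the overflow traps, a computation that must still notice overflow (such as the n*n*n*n*n line from the question) can check for it explicitly with the `...ReportingOverflow` methods. In the sketch below (function name `fifthPower` hypothetical, 64-bit `Int` assumed), the overflow flag is part of the returned value rather than a trap, so, as far as I know, it does not depend on the optimization mode.

```swift
// Computes n to the fifth power, returning nil instead of trapping on overflow.
// multipliedReportingOverflow(by:) returns the wrapped product plus an overflow flag.
func fifthPower(_ n: Int) -> Int? {
    var result = 1
    for _ in 0..<5 {
        let (partial, overflow) = result.multipliedReportingOverflow(by: n)
        if overflow { return nil }
        result = partial
    }
    return result
}

if let value = fifthPower(10) {
    print(value)  // 100000
}
if fifthPower(10_000_000) == nil {
    print("10_000_000 to the fifth power does not fit in Int")
}
```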

User · Answer

As of Xcode 7 you can turn on "Fast, Whole Module Optimization". This should increase your performance immediately.
