[linux] How can I quickly sum all numbers in a file?

I have a file which contains several thousand numbers, each on its own line:

34
42
11
6
2
99
...

I'm looking to write a script which will print the sum of all numbers in the file. I've got a solution, but it's not very efficient. (It takes several minutes to run.) I'm looking for a more efficient solution. Any suggestions?

Tags: linux, perl, bash, shell, awk

Answers:


In the shell, using awk, I have used the script below to do this:

#!/bin/bash

# Sums the first column of <myfile>, one line at a time, with bc.
# Replace <myfile> with the name of your file.
total=0

for i in $( awk '{ print $1; }' <myfile> )
do
    total=$(echo "$total + $i" | bc)
done

echo "scale=2; $total" | bc

More succinct:

# Ruby
ruby -e 'puts open("random_numbers").map(&:to_i).reduce(:+)'

# Python
python -c 'print(sum(int(l) for l in open("random_numbers")))'

You can use awk:

awk '{ sum += $1 } END { print sum }' file

Bash variant

raw=$(cat file)
echo $(( ${raw//$'\n'/+} ))

$ wc -l file
10000 file

$ time ./test
323390

real    0m3,096s
user    0m3,095s
sys     0m0,000s

Just for fun, let's do it with PDL, Perl's array math engine!

perl -MPDL -E 'say rcols(shift)->sum' datafile

rcols reads columns into a matrix (1D in this case) and sum (surprise) sums all the elements of the matrix.


Is it not easier to replace all newlines with +, add a 0, and send it to the Ruby interpreter?

(sed -e "s/$/+/" file; echo 0)|irb

If you do not have irb, you can send it to bc, but then you have to remove all newlines except the last one (from echo). It is better to use tr for this, unless you have a PhD in sed.

(sed -e "s/$/+/" file|tr -d "\n"; echo 0)|bc

GNU Parallel can presumably be used to improve many of the above answers by spreading the workload across multiple cores.

In the example below we send chunks of 500 numbers (--max-lines=500) to bc processes which are executed in parallel 4 at a time (-j 4). The results are then aggregated by a final bc.

time parallel --max-lines=500 -j 4 --pipe "paste -sd+ - | bc" < random_numbers | paste -sd+ - | bc

The optimal choice of work size and number of parallel processes depends on the machine and problem. Note that this solution only really shines when there's a large number of parallel processes with substantial work each.
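How to split the work is machine-dependent, so here is a hedged sketch of one way to experiment with the chunk size and job count (the --block size and -j value below are arbitrary examples, not recommendations):

# Feed roughly 2 MB chunks to each bc, running one job slot per CPU core (-j +0).
parallel --pipe --block 2M -j +0 "paste -sd+ - | bc" < random_numbers | paste -sd+ - | bc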


I prefer to use GNU datamash for such tasks because it's more succinct and legible than perl or awk. For example:

datamash sum 1 < myfile

where 1 denotes the first column of data.
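datamash can also take several operation/column pairs in one call; a small sketch, assuming a reasonably recent GNU datamash:

datamash sum 1 mean 1 < myfile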


C always wins for speed:

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv) {
    ssize_t read;
    char *line = NULL;
    size_t len = 0;
    double sum = 0.0;

    /* Parentheses are needed around the assignment: without them,
       != binds tighter than = and `read` only ever holds 0 or 1. */
    while ((read = getline(&line, &len, stdin)) != -1) {
        sum += atof(line);
    }

    free(line);
    printf("%f\n", sum);
    return 0;
}

Timing for 1M numbers (same machine/input as my python answer):

$ gcc sum.c -o sum && time ./sum < numbers 
5003371677.000000
real    0m0.188s
user    0m0.180s
sys     0m0.000s

Another for fun

sum=0;for i in $(cat file);do sum=$((sum+$i));done;echo $sum

or another bash-only one

s=0;while read l; do s=$((s+$l));done<file;echo $s

But the awk solution is probably best, as it's the most compact.


You can do it with Alacon, a command-line utility for the Alasql database.

It works with Node.js, so you need to install Node.js and then the Alasql package:

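A minimal sketch of that install step; I am assuming here that the alacon tool ships with the alasql npm package (check the Alasql documentation for the exact package name):

> npm install -g alasql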
To calculate sum from TXT file you can use the following command:

> node alacon "SELECT VALUE SUM([0]) FROM TXT('mydata.txt')"

In Go:

package main

import (
    "bufio"
    "fmt"
    "os"
    "strconv"
)

func main() {
    scanner := bufio.NewScanner(os.Stdin)
    sum := int64(0)
    for scanner.Scan() {
        v, err := strconv.ParseInt(scanner.Text(), 10, 64)
        if err != nil {
            fmt.Fprintf(os.Stderr, "Not an integer: '%s'\n", scanner.Text())
            os.Exit(1)
        }
        sum += v
    }
    fmt.Println(sum)
}

cat nums | perl -ne '$sum += $_ } { print $sum'

(same as brian d foy's answer, without 'END')
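This works because -n wraps the code in a read loop, so the stray } closes that loop and { opens a bare block that runs once afterwards; roughly the equivalent of:

perl -e 'while (<>) { $sum += $_ } { print $sum }' nums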


This is straight Bash:

sum=0
while read -r line
do
    (( sum += line ))
done < file
echo $sum

Another option is to use jq:

$ seq 10|jq -s add
55

-s (--slurp) reads the input lines into an array.
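Assuming each line of the file is a plain number (which jq reads as a stream of JSON values), the same filter can be applied to the file directly:

$ jq -s add file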


Here is a solution using Python with a generator expression (note the Python 2 print statement; add parentheses around it for Python 3). Tested with a million numbers on my old cruddy laptop.

time python -c "import sys; print sum((float(l) for l in sys.stdin))" < file

real    0m0.619s
user    0m0.512s
sys     0m0.028s

Here's another:

open(FIL, "a.txt");

my $sum = 0;
foreach( <FIL> ) {chomp; $sum += $_;}

close(FIL);

print "Sum = $sum\n";

C++ "one-liner":

#include <iostream>
#include <iterator>
#include <numeric>
using namespace std;

int main() {
    cout << accumulate(istream_iterator<int>(cin), istream_iterator<int>(), 0) << endl;
}

I prefer to use R for this:

$ R -e 'sum(scan("filename"))'

I couldn't just pass by... Here's my Haskell one-liner. It's actually quite readable:

sum <$> (read <$>) <$> lines <$> getContents

Unfortunately there's no ghci -e to just run it, so it needs a main function, print, and compilation.

main = (sum <$> (read <$>) <$> lines <$> getContents) >>= print

To clarify: we read the entire input (getContents), split it into lines, read each line as a number, and sum them. <$> is the fmap operator; we use it instead of ordinary function application because all of this happens in IO. read needs an additional fmap because it is applied inside the list.

$ ghc sum.hs
[1 of 1] Compiling Main             ( sum.hs, sum.o )
Linking sum ...
$ ./sum 
1
2
4
^D
7

Here's a strange upgrade to make it work with floats:

main = ((0.0 + ) <$> sum <$> (read <$>) <$> lines <$> getContents) >>= print
$ ./sum 
1.3
2.1
4.2
^D
7.6000000000000005

$ perl -MList::Util=sum -le 'print sum <>' nums.txt

sed ':a;N;s/\n/+/;ta' file|bc

I don't know if you can get a lot better than this, considering you need to read through the whole file.

$sum = 0;
while(<>){
   $sum += $_;
}
print $sum;

Just for fun, let's benchmark it:

$ for ((i=0; i<1000000; i++)) ; do echo $RANDOM; done > random_numbers

$ time perl -nle '$sum += $_ } END { print $sum' random_numbers
16379866392

real    0m0.226s
user    0m0.219s
sys     0m0.002s

$ time awk '{ sum += $1 } END { print sum }' random_numbers
16379866392

real    0m0.311s
user    0m0.304s
sys     0m0.005s

$ time { { tr "\n" + < random_numbers ; echo 0; } | bc; }
16379866392

real    0m0.445s
user    0m0.438s
sys     0m0.024s

$ time { s=0;while read l; do s=$((s+$l));done<random_numbers;echo $s; }
16379866392

real    0m9.309s
user    0m8.404s
sys     0m0.887s

$ time { s=0;while read l; do ((s+=l));done<random_numbers;echo $s; }
16379866392

real    0m7.191s
user    0m6.402s
sys     0m0.776s

$ time { sed ':a;N;s/\n/+/;ta' random_numbers|bc; }
^C

real    4m53.413s
user    4m52.584s
sys 0m0.052s

I aborted the sed run after 5 minutes.


I've been diving into Lua, and it is speedy:

$ time lua -e 'sum=0; for line in io.lines() do sum=sum+line end; print(sum)' < random_numbers
16388542582.0

real    0m0.362s
user    0m0.313s
sys     0m0.063s

and while I'm updating this, ruby:

$ time ruby -e 'sum = 0; File.foreach(ARGV.shift) {|line| sum+=line.to_i}; puts sum' random_numbers
16388542582

real    0m0.378s
user    0m0.297s
sys     0m0.078s

Heed Ed Morton's advice: using $1

$ time awk '{ sum += $1 } END { print sum }' random_numbers
16388542582

real    0m0.421s
user    0m0.359s
sys     0m0.063s

vs using $0

$ time awk '{ sum += $0 } END { print sum }' random_numbers
16388542582

real    0m0.302s
user    0m0.234s
sys     0m0.063s

None of the solutions thus far uses paste. Here's one:

paste -sd+ filename | bc

As an example, calculate the sum of the integers 1 through 100000:

$ seq 100000 | paste -sd+ | bc -l
5000050000

(For the curious, seq n would print a sequence of numbers from 1 to n given a positive number n.)
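For instance, a quick sanity check:

$ seq 3
1
2
3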


Perl 6

say sum lines
~$ perl6 -e '.say for 0..1000000' > test.in

~$ perl6 -e 'say sum lines' < test.in
500000500000

I have not tested this but it should work:

cat f | tr "\n" "+" | sed 's/+$/\n/' | bc

You might have to add a "\n" to the string before it reaches bc (e.g. via echo) if bc doesn't treat EOF as an end of line...
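A sketch of one way to supply that trailing newline while keeping the rest of the pipeline:

# strip the trailing "+" and let echo supply the final newline
( tr "\n" "+" < f | sed 's/+$//' ; echo "" ) | bc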


Just to be ridiculous:

cat f | tr "\n" "+" | perl -pne chop | R --vanilla --slave

With Ruby:

ruby -e "puts File.read('file.txt').split.inject(0){|mem, obj| mem += obj.to_f}"

Here's another one-liner

( echo 0 ; sed 's/$/ +/' foo ; echo p ) | dc

This assumes the numbers are integers. If you need decimals, try

( echo 0 2k ; sed 's/$/ +/' foo ; echo p ) | dc

Adjust 2 to the number of decimals needed.


Running R scripts

I've written an R script that takes a file name as an argument and sums the lines.

#! /usr/local/bin/R
file=commandArgs(trailingOnly=TRUE)[1]
sum(as.numeric(readLines(file)))

This can be sped up with the "data.table" or "vroom" package as follows:

#! /usr/local/bin/R
file=commandArgs(trailingOnly=TRUE)[1]
sum(data.table::fread(file))

#! /usr/local/bin/R
file=commandArgs(trailingOnly=TRUE)[1]
sum(vroom::vroom(file))

Benchmarking

Same benchmarking data as @glenn jackman.

for ((i=0; i<1000000; i++)) ; do echo $RANDOM; done > random_numbers

In comparison to the R call above, running R 3.5.0 as a script is comparable to other methods (on the same Linux Debian server).

$ time R -e 'sum(scan("random_numbers"))'  
 0.37s user
 0.04s system
 86% cpu
 0.478 total

R script with readLines

$ time Rscript sum.R random_numbers
  0.53s user
  0.04s system
  84% cpu
  0.679 total

R script with data.table

$ time Rscript sum.R random_numbers     
 0.30s user
 0.05s system
 77% cpu
 0.453 total

R script with vroom

$ time Rscript sum.R random_numbers     
  0.54s user 
  0.11s system
  93% cpu
  0.696 total

Comparison with other languages

For reference, here are some other methods suggested above, run on the same hardware.

Python 2 (2.7.13)

$ time python2 -c "import sys; print sum((float(l) for l in sys.stdin))" < random_numbers 
 0.27s user 0.00s system 89% cpu 0.298 total

Python 3 (3.6.8)

$ time python3 -c "import sys; print(sum((float(l) for l in sys.stdin)))" < random_numbers
0.37s user 0.02s system 98% cpu 0.393 total

Ruby (2.3.3)

$  time ruby -e 'sum = 0; File.foreach(ARGV.shift) {|line| sum+=line.to_i}; puts sum' random_numbers
 0.42s user
 0.03s system
 72% cpu
 0.625 total

Perl (5.24.1)

$ time perl -nle '$sum += $_ } END { print $sum' random_numbers
 0.24s user
 0.01s system
 99% cpu
 0.249 total

Awk (4.1.4)

$ time awk '{ sum += $0 } END { print sum }' random_numbers
 0.26s user
 0.01s system
 99% cpu
 0.265 total
$ time awk '{ sum += $1 } END { print sum }' random_numbers
 0.34s user
 0.01s system
 99% cpu
 0.354 total

C (clang version 3.3; gcc (Debian 6.3.0-18) 6.3.0 )

 $ gcc sum.c -o sum && time ./sum < random_numbers   
 0.10s user
 0.00s system
 96% cpu
 0.108 total

Update with additional languages

Lua (5.3.5)

$ time lua -e 'sum=0; for line in io.lines() do sum=sum+line end; print(sum)' < random_numbers 
 0.30s user 
 0.01s system
 98% cpu
 0.312 total

tr (8.26), which must be timed in bash (the timing construct below is not compatible with zsh)

$ time { { tr "\n" + < random_numbers ; echo 0; } | bc; }
real    0m0.494s
user    0m0.488s
sys 0m0.044s

sed (4.4), which must be timed in bash (the timing construct below is not compatible with zsh)

$  time { head -n 10000 random_numbers | sed ':a;N;s/\n/+/;ta' |bc; }
real    0m0.631s
user    0m0.628s
sys     0m0.008s
$  time { head -n 100000 random_numbers | sed ':a;N;s/\n/+/;ta' |bc; }
real    1m2.593s
user    1m2.588s
sys     0m0.012s

Note: the sed calls seem to run faster on systems with more memory available (note the smaller datasets used for benchmarking sed).

Julia (0.5.0)

$ time julia -e 'print(sum(readdlm("random_numbers")))'
 3.00s user 
 1.39s system 
 136% cpu 
 3.204 total
$  time julia -e 'print(sum(readtable("random_numbers")))'
 0.63s user 
 0.96s system 
 248% cpu 
 0.638 total

Notice that as in R, file I/O methods have different performance.


One in tcl:

#!/usr/bin/env tclsh
set sum 0
while {[gets stdin num] >= 0} { incr sum $num }
puts $sum
