What do real user and sys mean in the output of time 1

Question

time foo real        0m0 003s user        0m0 000s sys         0m0 004s     What do  real    user  and  sys  mean in the output of time   Which one is meaningful when benchmarking my app

User · Answer

I want to mention some other scenario when the real-time is much much bigger than user + sys. I've created a simple server which respondes after a long time

real 4.784
user 0.01s
sys  0.01s

the issue is that in this scenario the process waits for the response which is not on the user site nor in the system.

Something similar happens when you run the find command. In that case, the time is spent mostly on requesting and getting a response from SSD.

User · Answer

Real  User and Sys process time statistics  One of these things is not like the other   Real refers to actual elapsed time  User and Sys refer to CPU time used only by the process    Real is wall clock time - time from start to finish of the call   This is all elapsed time including time slices used by other processes and time the process spends blocked  for example if it is waiting for I O to complete   User is the amount of CPU time spent in user-mode code  outside the kernel  within the process   This is only actual CPU time used in executing the process   Other processes and time the process spends blocked do not count towards this figure  Sys is the amount of CPU time spent in the kernel within the process   This means executing CPU time spent in system calls within the kernel  as opposed to library code  which is still running in user-space   Like  user   this is only CPU time used by the process   See below for a brief description of kernel mode  also known as  supervisor  mode  and the system call mechanism    User Sys will tell you how much actual CPU time your process used   Note that this is across all CPUs  so if the process has multiple threads  and this process is running on a computer with more than one processor  it could potentially exceed the wall clock time reported by Real  which usually occurs    Note that in the output these figures include the User and Sys time of all child processes  and their descendants  as well when they could have been collected  e g  by wait 2  or waitpid 2   although the underlying system calls return the statistics for the process and its children separately   Origins of the statistics reported by time  1   The statistics reported by time are gathered from various system calls    User  and  Sys  come from wait  2   POSIX  or times  2   POSIX   depending on the particular system    Real  is calculated from a start and end time gathered from the gettimeofday  2  call   Depending on the version of the system  various other statistics such as the number of context switches may also be gathered by time   On a multi-processor machine  a multi-threaded process or a process forking children could have an elapsed time smaller than the total CPU time - as different threads or processes may run in parallel   Also  the time statistics reported come from different origins  so times recorded for very short running tasks may be subject to rounding errors  as the example given by the original poster shows   A brief primer on Kernel vs  User mode  On Unix  or any protected-memory operating system   Kernel  or  Supervisor  mode refers to a privileged mode that the CPU can operate in   Certain privileged actions that could affect security or stability can only be done when the CPU is operating in this mode  these actions are not available to application code   An example of such an action might be manipulation of the MMU to gain access to the address space of another process   Normally  user-mode code cannot do this  with good reason   although it can request shared memory from the kernel  which could be read or written by more than one process   In this case  the shared memory is explicitly requested from the kernel through a secure mechanism and both processes have to explicitly attach to it in order to use it   The privileged mode is usually referred to as  kernel  mode because the kernel is executed by the CPU running in this mode   In order to switch to kernel mode you have to issue a specific instruction  often called a trap  that switches the CPU to running in kernel mode and runs code from a specific location held in a jump table   For security reasons  you cannot switch to kernel mode and execute arbitrary code - the traps are managed through a table of addresses that cannot be written to unless the CPU is running in supervisor mode   You trap with an explicit trap number and the address is looked up in the jump table  the kernel has a finite number of controlled entry points   The  system  calls in the C library  particularly those described in Section 2 of the man pages  have a user-mode component  which is what you actually call from your C program   Behind the scenes  they may issue one or more system calls to the kernel to do specific services such as I O  but they still also have code running in user-mode   It is also quite possible to directly issue a trap to kernel mode from any user space code if desired  although you may need to write a snippet of assembly language to set up the registers correctly for the call   More about  sys   There are things that your code cannot do from user mode - things like allocating memory or accessing hardware  HDD  network  etc    These are under the supervision of the kernel  and it alone can do them  Some operations like malloc orfread fwrite will invoke these kernel functions and that then will count as  sys  time  Unfortunately it s not as simple as  every call to malloc will be counted in  sys  time   The call to malloc will do some processing of its own  still counted in  user  time  and then somewhere along the way it may call the function in kernel  counted in  sys  time   After returning from the kernel call  there will be some more time in  user  and then malloc will return to your code  As for when the switch happens  and how much of it is spent in kernel mode    you cannot say  It depends on the implementation of the library  Also  other seemingly innocent functions might also use malloc and the like in the background  which will again have some time in  sys  then

User · Answer

Real shows total turn-around time for a process  while User shows the execution time for user-defined instructions and Sys is for time for executing system calls   Real time includes the waiting time also  the waiting time for I O etc

User · Answer

Minimal runnable POSIX C examples To make things more concrete  I want to exemplify a few extreme cases of time with some minimal C test programs  All programs can be compiled and run with  gcc -ggdb3 -o main out -pthread -std c99 -pedantic-errors -Wall -Wextra main c time   main out  and have been tested in Ubuntu 18 10  GCC 8 2 0  glibc 2 28  Linux kernel 4 18  ThinkPad P51 laptop  Intel Core i7-7820HQ CPU  4 cores   8 threads   2x Samsung M471A2K43BB1-CRC RAM  2x 16GiB   sleep Non-busy sleep does not count in either user or sys  only real  For example  a program that sleeps for a second   define  XOPEN SOURCE 700  include  lt stdlib h gt   include  lt unistd h gt   int main void        sleep 1       return EXIT SUCCESS     GitHub upstream  outputs something like  real    0m1 003s user    0m0 001s sys     0m0 003s  The same holds for programs blocked on IO becoming available  For example  the following program waits for the user to enter a character and press enter   include  lt stdio h gt   include  lt stdlib h gt   int main void        printf  quot  c n quot   getchar         return EXIT SUCCESS     GitHub upstream  And if you wait for about one second  it outputs just like the sleep example something like  real    0m1 003s user    0m0 001s sys     0m0 003s  For this reason time can help you distinguish between CPU and IO bound programs  What do the terms  quot CPU bound quot  and  quot I O bound quot  mean  Multiple threads The following example does niters iterations of useless purely CPU-bound work on nthreads threads   define  XOPEN SOURCE 700  include  lt assert h gt   include  lt inttypes h gt   include  lt pthread h gt   include  lt stdint h gt   include  lt stdio h gt   include  lt stdlib h gt   include  lt unistd h gt   uint64 t niters   void  my thread void  arg        uint64 t  argument  i  result      argument    uint64 t   arg      result    argument      for  i   0  i  lt  niters    i            result    result   result  -  3   result    1             argument   result      return NULL     int main int argc  char   argv        size t nthreads      pthread t  threads      uint64 t rc  i   thread args          CLI args         if  argc  gt  1            niters   strtoll argv 1   NULL  0         else           niters   1000000000            if  argc  gt  2            nthreads   strtoll argv 2   NULL  0         else           nthreads   1            threads   malloc nthreads   sizeof  threads        thread args   malloc nthreads   sizeof  thread args            Create all threads        for  i   0  i  lt  nthreads    i            thread args i    i          rc   pthread create               amp threads i               NULL              my thread               void   amp thread args i                     assert rc    0                 Wait for all threads to complete        for  i   0  i  lt  nthreads    i            rc   pthread join threads i   NULL           assert rc    0           printf  quot   quot  PRIu64  quot    quot  PRIu64  quot  n quot   i  thread args i               free threads       free thread args       return EXIT SUCCESS     GitHub upstream   plot code  Then we plot wall  user and sys as a function of the number of threads for a fixed 10 10 iterations on my 8 hyperthread CPU   Plot data  From the graph  we see that   for a CPU intensive single core application  wall and user are about the same  for 2 cores  user is about 2x wall  which means that the user time is counted across all threads  user basically doubled  and while wall stayed the same   this continues up to 8 threads  which matches my number of hyperthreads in my computer  After 8  wall starts to increase as well  because we don t have any extra CPUs to put more work in a given amount of time  The ratio plateaus at this point    Note that this graph is only so clear and simple because the work is purely CPU-bound  if it were memory bound  then we would get a fall in performance much earlier with less cores because the memory accesses would be a bottleneck as shown at What do the terms  quot CPU bound quot  and  quot I O bound quot  mean  Quickly checking that wall  lt  user is a simple way to determine that a program is multithreaded  and the closer that ratio is to the number of cores  the more effective the parallelization is  e g    multithreaded linkers  Can gcc use multiple cores when linking  C   parallel sort  Are C  17 Parallel Algorithms implemented already   Sys heavy work with sendfile The heaviest sys workload I could come up with was to use the sendfile  which does a file copy operation on kernel space  Copy a file in a sane  safe and efficient way So I imagined that this in-kernel memcpy will be a CPU intensive operation  First I initialize a large 10GiB random file with  dd if  dev urandom of sendfile in tmp bs 1K count 10M  Then run the code   define  GNU SOURCE  include  lt assert h gt   include  lt fcntl h gt   include  lt stdlib h gt   include  lt sys sendfile h gt   include  lt sys stat h gt   include  lt sys types h gt   include  lt unistd h gt   int main int argc  char   argv        char  source path   dest path      int source  dest      struct stat stat source      if  argc  gt  1            source path   argv 1         else           source path    quot sendfile in tmp quot             if  argc  gt  2            dest path   argv 2         else           dest path    quot sendfile out tmp quot             source   open source path  O RDONLY       assert source    -1       dest   open dest path  O WRONLY   O CREAT   O TRUNC  S IRUSR   S IWUSR       assert dest    -1       assert fstat source   amp stat source     -1       assert sendfile dest  source  0  stat source st size     -1       assert close source     -1       assert close dest     -1       return EXIT SUCCESS     GitHub upstream  which gives basically mostly system time as expected  real    0m2 175s user    0m0 001s sys     0m1 476s  I was also curious to see if time would distinguish between syscalls of different processes  so I tried  time   sendfile out sendfile in1 tmp sendfile out1 tmp  amp  time   sendfile out sendfile in2 tmp sendfile out2 tmp  amp   And the result was  real    0m3 651s user    0m0 000s sys     0m1 516s  real    0m4 948s user    0m0 000s sys     0m1 562s  The sys time is about the same for both as for a single process  but the wall time is larger because the processes are competing for disk read access likely  So it seems that it does in fact account for which process started a given kernel work  Bash source code When you do just time  lt cmd gt  on Ubuntu  it use the Bash keyword as can be seen from  type time  which outputs  time is a shell keyword  So we grep source in the Bash 4 19 source code for the output string  git grep   quot user b   which leads us to execute cmd c function time command  which uses   gettimeofday   and getrusage   if both are available times   otherwise  all of which are Linux system calls and POSIX functions  GNU Coreutils source code If we call it as   usr bin time  then it uses the GNU Coreutils implementation  This one is a bit more complex  but the relevant source seems to be at resuse c and it does   a non-POSIX BSD wait3 call if that is available times and gettimeofday otherwise

User · Answer

To expand on the accepted answer  I just wanted to provide another reason why real   user   sys   Keep in mind that real represents actual elapsed time  while user and sys values represent CPU execution time  As a result  on a multicore system  the user and or sys time  as well as their sum  can actually exceed the real time  For example  on a Java app I m running for class I get this set of values   real    1m47 363s user    2m41 318s sys     0m4 013s

User · Answer

real  The actual time spent in running the process from start to finish  as if it was measured by a human with a stopwatch      user  The cumulative time spent by all the CPUs during the computation      sys  The cumulative time spent by all the CPUs during system-related tasks such as memory allocation      Notice that sometimes user   sys might be greater than real  as   multiple processors may work in parallel

User · Answer

In very simple terms  I like to think about it like this   real is the actual amount of time it took to run the command  as if you had timed it with a stopwatch   user and sys are how much  work  the CPU had to do to execute the command   This  work  is expressed in units of time    Generally speaking   user is how much work the CPU did to run to run the command s code sys is how much work the CPU had to do to handle  system overhead  type tasks  such as allocating memory  file I O  ect   in order to support the running command  Since these last two times are counting  work  done  they don t include time a thread might have spent waiting  such as waiting on another process or for disk I O to finish   real  however  is a measure of actual runtime and not  work   so it does include any time spent waiting

[unix] What do 'real', 'user' and 'sys' mean in the output of time(1)?

Examples related to unix

Examples related to time

Examples related to benchmarking