[unix] What do 'real', 'user' and 'sys' mean in the output of time(1)?

Minimal runnable POSIX C examples

To make things more concrete, I want to exemplify a few extreme cases of time with some minimal C test programs.

All programs can be compiled and run with:

gcc -ggdb3 -o main.out -pthread -std=c99 -pedantic-errors -Wall -Wextra main.c
time ./main.out

and have been tested in Ubuntu 18.10, GCC 8.2.0, glibc 2.28, Linux kernel 4.18, ThinkPad P51 laptop, Intel Core i7-7820HQ CPU (4 cores / 8 threads), 2x Samsung M471A2K43BB1-CRC RAM (2x 16GiB).

sleep

Non-busy sleep does not count in either user or sys, only real.

For example, a program that sleeps for a second:

#define _XOPEN_SOURCE 700
#include <stdlib.h>
#include <unistd.h>

int main(void) {
    sleep(1);
    return EXIT_SUCCESS;
}

GitHub upstream.

outputs something like:

real    0m1.003s
user    0m0.001s
sys     0m0.003s

The same holds for programs blocked on IO becoming available.

For example, the following program waits for the user to enter a character and press enter:

#include <stdio.h>
#include <stdlib.h>

int main(void) {
    printf("%c\n", getchar());
    return EXIT_SUCCESS;
}

GitHub upstream.

And if you wait for about one second, it outputs just like the sleep example something like:

real    0m1.003s
user    0m0.001s
sys     0m0.003s

For this reason time can help you distinguish between CPU and IO bound programs: What do the terms "CPU bound" and "I/O bound" mean?

Multiple threads

The following example does niters iterations of useless purely CPU-bound work on nthreads threads:

#define _XOPEN_SOURCE 700
#include <assert.h>
#include <inttypes.h>
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

uint64_t niters;

void* my_thread(void *arg) {
    uint64_t *argument, i, result;
    argument = (uint64_t *)arg;
    result = *argument;
    for (i = 0; i < niters; ++i) {
        result = (result * result) - (3 * result) + 1;
    }
    *argument = result;
    return NULL;
}

int main(int argc, char **argv) {
    size_t nthreads;
    pthread_t *threads;
    uint64_t rc, i, *thread_args;

    /* CLI args. */
    if (argc > 1) {
        niters = strtoll(argv[1], NULL, 0);
    } else {
        niters = 1000000000;
    }
    if (argc > 2) {
        nthreads = strtoll(argv[2], NULL, 0);
    } else {
        nthreads = 1;
    }
    threads = malloc(nthreads * sizeof(*threads));
    thread_args = malloc(nthreads * sizeof(*thread_args));

    /* Create all threads */
    for (i = 0; i < nthreads; ++i) {
        thread_args[i] = i;
        rc = pthread_create(
            &threads[i],
            NULL,
            my_thread,
            (void*)&thread_args[i]
        );
        assert(rc == 0);
    }

    /* Wait for all threads to complete */
    for (i = 0; i < nthreads; ++i) {
        rc = pthread_join(threads[i], NULL);
        assert(rc == 0);
        printf("%" PRIu64 " %" PRIu64 "\n", i, thread_args[i]);
    }

    free(threads);
    free(thread_args);
    return EXIT_SUCCESS;
}

GitHub upstream + plot code.

Then we plot wall, user and sys as a function of the number of threads for a fixed 10^10 iterations on my 8 hyperthread CPU:

enter image description here

Plot data.

From the graph, we see that:

  • for a CPU intensive single core application, wall and user are about the same

  • for 2 cores, user is about 2x wall, which means that the user time is counted across all threads.

    user basically doubled, and while wall stayed the same.

  • this continues up to 8 threads, which matches my number of hyperthreads in my computer.

    After 8, wall starts to increase as well, because we don't have any extra CPUs to put more work in a given amount of time!

    The ratio plateaus at this point.

Note that this graph is only so clear and simple because the work is purely CPU-bound: if it were memory bound, then we would get a fall in performance much earlier with less cores because the memory accesses would be a bottleneck as shown at What do the terms "CPU bound" and "I/O bound" mean?

Quickly checking that wall < user is a simple way to determine that a program is multithreaded, and the closer that ratio is to the number of cores, the more effective the parallelization is, e.g.:

Sys heavy work with sendfile

The heaviest sys workload I could come up with was to use the sendfile, which does a file copy operation on kernel space: Copy a file in a sane, safe and efficient way

So I imagined that this in-kernel memcpy will be a CPU intensive operation.

First I initialize a large 10GiB random file with:

dd if=/dev/urandom of=sendfile.in.tmp bs=1K count=10M

Then run the code:

#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/sendfile.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

int main(int argc, char **argv) {
    char *source_path, *dest_path;
    int source, dest;
    struct stat stat_source;
    if (argc > 1) {
        source_path = argv[1];
    } else {
        source_path = "sendfile.in.tmp";
    }
    if (argc > 2) {
        dest_path = argv[2];
    } else {
        dest_path = "sendfile.out.tmp";
    }
    source = open(source_path, O_RDONLY);
    assert(source != -1);
    dest = open(dest_path, O_WRONLY | O_CREAT | O_TRUNC, S_IRUSR | S_IWUSR);
    assert(dest != -1);
    assert(fstat(source, &stat_source) != -1);
    assert(sendfile(dest, source, 0, stat_source.st_size) != -1);
    assert(close(source) != -1);
    assert(close(dest) != -1);
    return EXIT_SUCCESS;
}

GitHub upstream.

which gives basically mostly system time as expected:

real    0m2.175s
user    0m0.001s
sys     0m1.476s

I was also curious to see if time would distinguish between syscalls of different processes, so I tried:

time ./sendfile.out sendfile.in1.tmp sendfile.out1.tmp &
time ./sendfile.out sendfile.in2.tmp sendfile.out2.tmp &

And the result was:

real    0m3.651s
user    0m0.000s
sys     0m1.516s

real    0m4.948s
user    0m0.000s
sys     0m1.562s

The sys time is about the same for both as for a single process, but the wall time is larger because the processes are competing for disk read access likely.

So it seems that it does in fact account for which process started a given kernel work.

Bash source code

When you do just time <cmd> on Ubuntu, it use the Bash keyword as can be seen from:

type time

which outputs:

time is a shell keyword

So we grep source in the Bash 4.19 source code for the output string:

git grep '"user\b'

which leads us to execute_cmd.c function time_command, which uses:

  • gettimeofday() and getrusage() if both are available
  • times() otherwise

all of which are Linux system calls and POSIX functions.

GNU Coreutils source code

If we call it as:

/usr/bin/time

then it uses the GNU Coreutils implementation.

This one is a bit more complex, but the relevant source seems to be at resuse.c and it does:

  • a non-POSIX BSD wait3 call if that is available
  • times and gettimeofday otherwise

Examples related to unix

Docker CE on RHEL - Requires: container-selinux >= 2.9 What does `set -x` do? How to find files modified in last x minutes (find -mmin does not work as expected) sudo: npm: command not found How to sort a file in-place How to read a .properties file which contains keys that have a period character using Shell script gpg decryption fails with no secret key error Loop through a comma-separated shell variable Best way to find os name and version in Unix/Linux platform Resource u'tokenizers/punkt/english.pickle' not found

Examples related to time

Date to milliseconds and back to date in Swift How to manage Angular2 "expression has changed after it was checked" exception when a component property depends on current datetime how to sort pandas dataframe from one column Convert time.Time to string How to get current time in python and break up into year, month, day, hour, minute? Xcode swift am/pm time to 24 hour format How to add/subtract time (hours, minutes, etc.) from a Pandas DataFrame.Index whos objects are of type datetime.time? What does this format means T00:00:00.000Z? How can I parse / create a date time stamp formatted with fractional seconds UTC timezone (ISO 8601, RFC 3339) in Swift? Extract time from moment js object

Examples related to benchmarking

Why is 2 * (i * i) faster than 2 * i * i in Java? ab load testing Why is reading lines from stdin much slower in C++ than Python? Execution time of C program How to use clock() in C++ Clang vs GCC - which produces faster binaries? How to Calculate Execution Time of a Code Snippet in C++ Which is faster: multiple single INSERTs or one multiple-row INSERT? What do 'real', 'user' and 'sys' mean in the output of time(1)? How do I write a correct micro-benchmark in Java?