[linux] How do I measure separate CPU core usage for a process?

Is there any way to measure a specific process CPU usage by cores?

I know top is good for measuring the whole system's CPU usage by cores and taskset can provide information about which CPU core is allowed for the process to run on.

But how do I measure a specific process' CPU usage by CPU cores?

This question is related to linux multicore measurement performance

The answer is


The ps solution was nearly what I needed and with some bash thrown in does exactly what the original question asked for: to see per-core usage of specific processes

This shows per-core usage of multi-threaded processes too.

Use like: cpustat `pgrep processname` `pgrep otherprocessname` ...

#!/bin/bash

pids=()
while [ $# != 0 ]; do
        pids=("${pids[@]}" "$1")
        shift
done

if [ -z "${pids[0]}" ]; then
        echo "Usage: $0 <pid1> [pid2] ..."
        exit 1
fi

for pid in "${pids[@]}"; do
        if [ ! -e /proc/$pid ]; then
                echo "Error: pid $pid doesn't exist"
                exit 1
        fi
done

while [ true ]; do
        echo -e "\033[H\033[J"
        for pid in "${pids[@]}"; do
                ps -p $pid -L -o pid,tid,psr,pcpu,comm=
        done
        sleep 1
done

Note: These stats are based on process lifetime, not the last X seconds, so you'll need to restart your process to reset the counter.


You can use:

 mpstat -P ALL 1

It shows how much each core is busy and it updates automatically each second. The output would be something like this (on a quad-core processor):

10:54:41 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest   %idle
10:54:42 PM  all    8.20    0.12    0.75    0.00    0.00    0.00    0.00    0.00   90.93
10:54:42 PM    0   24.00    0.00    2.00    0.00    0.00    0.00    0.00    0.00   74.00
10:54:42 PM    1   22.00    0.00    2.00    0.00    0.00    0.00    0.00    0.00   76.00
10:54:42 PM    2    2.02    1.01    0.00    0.00    0.00    0.00    0.00    0.00   96.97
10:54:42 PM    3    2.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00   98.00
10:54:42 PM    4   14.15    0.00    1.89    0.00    0.00    0.00    0.00    0.00   83.96
10:54:42 PM    5    1.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00   99.00
10:54:42 PM    6    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
10:54:42 PM    7    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00

This command doesn't answer original question though i.e. it does not show CPU core usage for a specific process.


htop gives a nice overview of individual core usage


you can use ps.
e.g. having python process with two busy threads on dual core CPU:

$ ps -p 29492 -L -o pid,tid,psr,pcpu
  PID   TID PSR %CPU
29492 29492   1  0.0
29492 29493   1 48.7
29492 29494   1 51.9

(PSR is CPU id the thread is currently assigned to)

you see that the threads are running on the same cpu core (because of GIL)

running the same python script in jython, we see, that the script is utilizing both cores (and there are many other service or whatever threads, which are almost idle):

$ ps -p 28671 -L -o pid,tid,psr,pcpu
  PID   TID PSR %CPU
28671 28671   1  0.0
28671 28672   0  4.4
28671 28673   0  0.6
28671 28674   0  0.5
28671 28675   0  2.3
28671 28676   0  0.0
28671 28677   1  0.0
28671 28678   1  0.0
28671 28679   0  4.6
28671 28680   0  4.4
28671 28681   1  0.0
28671 28682   1  0.0
28671 28721   1  0.0
28671 28729   0 88.6
28671 28730   1 88.5

you can process the output and calculate the total CPU for each CPU core.

Unfortunately, this approach does not seem to be 100% reliable, sometimes i see that in the first case, the two working threads are reported to be separated to each CPU core, or in the latter case, the two threads are reported to be on the same core..


I thought perf stat is what you need.

It shows a specific usage of a process when you specify a --cpu=list option. Here is an example of monitoring cpu usage of building a project using perf stat --cpu=0-7 --no-aggr -- make all -j command. The output is:

CPU0         119254.719293 task-clock (msec)         #    1.000 CPUs utilized            (100.00%)
CPU1         119254.724776 task-clock (msec)         #    1.000 CPUs utilized            (100.00%)
CPU2         119254.724179 task-clock (msec)         #    1.000 CPUs utilized            (100.00%)
CPU3         119254.720833 task-clock (msec)         #    1.000 CPUs utilized            (100.00%)
CPU4         119254.714109 task-clock (msec)         #    1.000 CPUs utilized            (100.00%)
CPU5         119254.727721 task-clock (msec)         #    1.000 CPUs utilized            (100.00%)
CPU6         119254.723447 task-clock (msec)         #    1.000 CPUs utilized            (100.00%)
CPU7         119254.722418 task-clock (msec)         #    1.000 CPUs utilized            (100.00%)
CPU0                 8,108 context-switches          #    0.068 K/sec                    (100.00%)
CPU1                26,494 context-switches                                              (100.00%)
CPU2                10,193 context-switches                                              (100.00%)
CPU3                12,298 context-switches                                              (100.00%)
CPU4                16,179 context-switches                                              (100.00%)
CPU5                57,389 context-switches                                              (100.00%)
CPU6                 8,485 context-switches                                              (100.00%)
CPU7                10,845 context-switches                                              (100.00%)
CPU0                   167 cpu-migrations            #    0.001 K/sec                    (100.00%)
CPU1                    80 cpu-migrations                                                (100.00%)
CPU2                   165 cpu-migrations                                                (100.00%)
CPU3                   139 cpu-migrations                                                (100.00%)
CPU4                   136 cpu-migrations                                                (100.00%)
CPU5                   175 cpu-migrations                                                (100.00%)
CPU6                   256 cpu-migrations                                                (100.00%)
CPU7                   195 cpu-migrations                                                (100.00%)

The left column is the specific CPU index and the right most column is the usage of the CPU. If you don't specify the --no-aggr option, the result will aggregated together. The --pid=pid option will help if you want to monitor a running process.

Try -a --per-core or -a perf-socket too, which will present more classified information.

More about usage of perf stat can be seen in this tutorial: perf cpu statistic, also perf help stat will help on the meaning of the options.


You can still do this in top. While top is running, press '1' on your keyboard, it will then show CPU usage per core.

Limit the processes shown by having that specific process run under a specific user account and use Type 'u' to limit to that user


dstat -C 0,1,2,3 

Will also give you the CPU usage of first 4 cores. Of course, if you have 32 cores then this command gets a little bit longer but useful if you only interested in few cores.

For example, if you only interested in core 3 and 7 then you could do

dstat -C 3,7

I had just this problem and I found a similar answer here.

The method is to set top the way you want it and then press W (capital W). This saves top's current layout to a configuration file in $HOME/.toprc

Although this might not work if you want to run multiple top's with different configurations.

So via what I consider a work around you can write to different config files / use different config files by doing one of the following...

1) Rename the binary

  ln -s /usr/bin/top top2
  ./top2

Now .top2rc is going to be written to your $HOME

2) Set $HOME to some alternative path, since it will write its config file to the $HOME/.binary-name.rc file

HOME=./
top

Now .toprc is going to be written to the current folder.

Via use of other peoples comments to add the various usage accounting in top you can create a batch output for that information and latter coalesces the information via a script. Maybe not quite as simple as you script but I found top to provide me ALL processes so that later I can recap and capture a state during a long run that I might have missed otherwise (unexplained sudden CPU usage due to stray processes)


Examples related to linux

grep's at sign caught as whitespace How to prevent Google Colab from disconnecting? "E: Unable to locate package python-pip" on Ubuntu 18.04 How to upgrade Python version to 3.7? Install Qt on Ubuntu Get first line of a shell command's output Cannot connect to the Docker daemon at unix:/var/run/docker.sock. Is the docker daemon running? Run bash command on jenkins pipeline How to uninstall an older PHP version from centOS7 How to update-alternatives to Python 3 without breaking apt?

Examples related to multicore

How do I measure separate CPU core usage for a process? Compiling with g++ using multiple cores

Examples related to measurement

Font size relative to the user's screen resolution? How do I measure separate CPU core usage for a process? Easily measure elapsed time

Examples related to performance

Why is 2 * (i * i) faster than 2 * i * i in Java? What is the difference between spark.sql.shuffle.partitions and spark.default.parallelism? How to check if a key exists in Json Object and get its value Why does C++ code for testing the Collatz conjecture run faster than hand-written assembly? Most efficient way to map function over numpy array The most efficient way to remove first N elements in a list? Fastest way to get the first n elements of a List into an Array Why is "1000000000000000 in range(1000000000000001)" so fast in Python 3? pandas loc vs. iloc vs. at vs. iat? Android Recyclerview vs ListView with Viewholder