[shell] How can I shuffle the lines of a text file on the Unix command line or in a shell script?

I want to shuffle the lines of a text file randomly and create a new file. The file may have several thousands of lines.

How can I do that with cat, awk, cut, etc?

This question is related to shell random command-line awk shuffle

The answer is


We have a package to do the very job:

sudo apt-get install randomize-lines

Example:

Create an ordered list of numbers, and save it to 1000.txt:

seq 1000 > 1000.txt

to shuffle it, simply use

rl 1000.txt

Another awk variant:

#!/usr/bin/awk -f
# usage:
# awk -f randomize_lines.awk lines.txt
# usage after "chmod +x randomize_lines.awk":
# randomize_lines.awk lines.txt

BEGIN {
  FS = "\n";
  srand();
}

{
  lines[ rand()] = $0;
}

END {
  for( k in lines ){
    print lines[k];
  }
}

This is a python script that I saved as rand.py in my home folder:

#!/bin/python

import sys
import random

if __name__ == '__main__':
  with open(sys.argv[1], 'r') as f:
    flist = f.readlines()
    random.shuffle(flist)

    for line in flist:
      print line.strip()

On Mac OSX sort -R and shuf are not available so you can alias this in your bash_profile as:

alias shuf='python rand.py'

Perl one-liner would be a simple version of Maxim's solution

perl -MList::Util=shuffle -e 'print shuffle(<STDIN>);' < myfile

here's an awk script

awk 'BEGIN{srand() }
{ lines[++d]=$0 }
END{
    while (1){
    if (e==d) {break}
        RANDOM = int(1 + rand() * d)
        if ( RANDOM in lines  ){
            print lines[RANDOM]
            delete lines[RANDOM]
            ++e
        }
    }
}' file

output

$ cat file
1
2
3
4
5
6
7
8
9
10

$ ./shell.sh
7
5
10
9
6
8
2
1
3
4

Here is a first try that's easy on the coder but hard on the CPU which prepends a random number to each line, sorts them and then strips the random number from each line. In effect, the lines are sorted randomly:

cat myfile | awk 'BEGIN{srand();}{print rand()"\t"$0}' | sort -k1 -n | cut -f2- > myfile.shuffled

Not mentioned as of yet:

  1. The unsort util. Syntax (somewhat playlist oriented):

    unsort [-hvrpncmMsz0l] [--help] [--version] [--random] [--heuristic]
           [--identity] [--filenames[=profile]] [--separator sep] [--concatenate] 
           [--merge] [--merge-random] [--seed integer] [--zero-terminated] [--null] 
           [--linefeed] [file ...]
    
  2. msort can shuffle by line, but it's usually overkill:

    seq 10 | msort -jq -b -l -n 1 -c r
    

If you have Scala installed, here's a one-liner to shuffle the input:

ls -1 | scala -e 'for (l <- util.Random.shuffle(io.Source.stdin.getLines.toList)) println(l)'

I use a tiny perl script, which I call "unsort":

#!/usr/bin/perl
use List::Util 'shuffle';
@list = <STDIN>;
print shuffle(@list);

I've also got a NULL-delimited version, called "unsort0" ... handy for use with find -print0 and so on.

PS: Voted up 'shuf' too, I had no idea that was there in coreutils these days ... the above may still be useful if your systems doesn't have 'shuf'.


Ruby FTW:

ls | ruby -e 'puts STDIN.readlines.shuffle'

A one-liner for python:

python -c "import random, sys; lines = open(sys.argv[1]).readlines(); random.shuffle(lines); print ''.join(lines)," myFile

And for printing just a single random line:

python -c "import random, sys; print random.choice(open(sys.argv[1]).readlines())," myFile

But see this post for the drawbacks of python's random.shuffle(). It won't work well with many (more than 2080) elements.


In windows You may try this batch file to help you to shuffle your data.txt, The usage of the batch code is

C:\> type list.txt | shuffle.bat > maclist_temp.txt

After issuing this command, maclist_temp.txt will contain a randomized list of lines.

Hope this helps.


A simple and intuitive way would be to use shuf.

Example:

Assume words.txt as:

the
an
linux
ubuntu
life
good
breeze

To shuffle the lines, do:

$ shuf words.txt

which would throws the shuffled lines to standard output; So, you've to pipe it to an output file like:

$ shuf words.txt > shuffled_words.txt

One such shuffle run could yield:

breeze
the
linux
an
ubuntu
good
life

This answer complements the many great existing answers in the following ways:

  • The existing answers are packaged into flexible shell functions:

    • The functions take not only stdin input, but alternatively also filename arguments
    • The functions take extra steps to handle SIGPIPE in the usual way (quiet termination with exit code 141), as opposed to breaking noisily. This is important when piping the function output to a pipe that is closed early, such as when piping to head.
  • A performance comparison is made.


  • POSIX-compliant function based on awk, sort, and cut, adapted from the OP's own answer:
shuf() { awk 'BEGIN {srand(); OFMT="%.17f"} {print rand(), $0}' "$@" |
               sort -k1,1n | cut -d ' ' -f2-; }
shuf() { perl -MList::Util=shuffle -e 'print shuffle(<>);' "$@"; }
shuf() { python -c '
import sys, random, fileinput; from signal import signal, SIGPIPE, SIG_DFL;    
signal(SIGPIPE, SIG_DFL); lines=[line for line in fileinput.input()];   
random.shuffle(lines); sys.stdout.write("".join(lines))
' "$@"; }

See the bottom section for a Windows version of this function.

shuf() { ruby -e 'Signal.trap("SIGPIPE", "SYSTEM_DEFAULT");
                     puts ARGF.readlines.shuffle' "$@"; }

Performance comparison:

Note: These numbers were obtained on a late-2012 iMac with 3.2 GHz Intel Core i5 and a Fusion Drive, running OSX 10.10.3. While timings will vary with OS used, machine specs, awk implementation used (e.g., the BSD awk version used on OSX is usually slower than GNU awk and especially mawk), this should provide a general sense of relative performance.

Input file is a 1-million-lines file produced with seq -f 'line %.0f' 1000000.
Times are listed in ascending order (fastest first):

  • shuf
    • 0.090s
  • Ruby 2.0.0
    • 0.289s
  • Perl 5.18.2
    • 0.589s
  • Python
    • 1.342s with Python 2.7.6; 2.407s(!) with Python 3.4.2
  • awk + sort + cut
    • 3.003s with BSD awk; 2.388s with GNU awk (4.1.1); 1.811s with mawk (1.3.4);

For further comparison, the solutions not packaged as functions above:

  • sort -R (not a true shuffle if there are duplicate input lines)
    • 10.661s - allocating more memory doesn't seem to make a difference
  • Scala
    • 24.229s
  • bash loops + sort
    • 32.593s

Conclusions:

  • Use shuf, if you can - it's the fastest by far.
  • Ruby does well, followed by Perl.
  • Python is noticeably slower than Ruby and Perl, and, comparing Python versions, 2.7.6 is quite a bit faster than 3.4.1
  • Use the POSIX-compliant awk + sort + cut combo as a last resort; which awk implementation you use matters (mawk is faster than GNU awk, BSD awk is slowest).
  • Stay away from sort -R, bash loops, and Scala.

Windows versions of the Python solution (the Python code is identical, except for variations in quoting and the removal of the signal-related statements, which aren't supported on Windows):

  • For PowerShell (in Windows PowerShell, you'll have to adjust $OutputEncoding if you want to send non-ASCII characters via the pipeline):
# Call as `shuf someFile.txt` or `Get-Content someFile.txt | shuf`
function shuf {
  $Input | python -c @'
import sys, random, fileinput;
lines=[line for line in fileinput.input()];
random.shuffle(lines); sys.stdout.write(''.join(lines))
'@ $args  
}

Note that PowerShell can natively shuffle via its Get-Random cmdlet (though performance may be a problem); e.g.:
Get-Content someFile.txt | Get-Random -Count ([int]::MaxValue)

  • For cmd.exe (a batch file):

Save to file shuf.cmd, for instance:

@echo off
python -c "import sys, random, fileinput; lines=[line for line in fileinput.input()]; random.shuffle(lines); sys.stdout.write(''.join(lines))" %*

One liner for Python based on scai's answer, but a) takes stdin, b) makes the result repeatable with seed, c) picks out only 200 of all lines.

$ cat file | python -c "import random, sys; 
  random.seed(100); print ''.join(random.sample(sys.stdin.readlines(), 200))," \
  > 200lines.txt

Simple awk-based function will do the job:

shuffle() { 
    awk 'BEGIN{srand();} {printf "%06d %s\n", rand()*1000000, $0;}' | sort -n | cut -c8-
}

usage:

any_command | shuffle

This should work on almost any UNIX. Tested on Linux, Solaris and HP-UX.

Update:

Note, that leading zeros (%06d) and rand() multiplication makes it to work properly also on systems where sort does not understand numbers. It can be sorted via lexicographical order (a.k.a. normal string compare).


If like me you came here to look for an alternate to shuf for macOS then use randomize-lines.

Install randomize-lines(homebrew) package, which has an rl command which has similar functionality to shuf.

brew install randomize-lines

Usage: rl [OPTION]... [FILE]...
Randomize the lines of a file (or stdin).

  -c, --count=N  select N lines from the file
  -r, --reselect lines may be selected multiple times
  -o, --output=FILE
                 send output to file
  -d, --delimiter=DELIM
                 specify line delimiter (one character)
  -0, --null     set line delimiter to null character
                 (useful with find -print0)
  -n, --line-number
                 print line number with output lines
  -q, --quiet, --silent
                 do not output any errors or warnings
  -h, --help     display this help and exit
  -V, --version  output version information and exit

This bash function has the minimal dependency(only sort and bash):

shuf() {
while read -r x;do
    echo $RANDOM$'\x1f'$x
done | sort |
while IFS=$'\x1f' read -r x y;do
    echo $y
done
}

Examples related to shell

Comparing a variable with a string python not working when redirecting from bash script Get first line of a shell command's output How to run shell script file using nodejs? Run bash command on jenkins pipeline Way to create multiline comments in Bash? How to do multiline shell script in Ansible How to check if a file exists in a shell script How to check if an environment variable exists and get its value? Curl to return http status code along with the response docker entrypoint running bash script gets "permission denied"

Examples related to random

How can I get a random number in Kotlin? scikit-learn random state in splitting dataset Random number between 0 and 1 in python In python, what is the difference between random.uniform() and random.random()? Generate random colors (RGB) Random state (Pseudo-random number) in Scikit learn How does one generate a random number in Apple's Swift language? How to generate a random string of a fixed length in Go? Generate 'n' unique random numbers within a range What does random.sample() method in python do?

Examples related to command-line

Git is not working after macOS Update (xcrun: error: invalid active developer path (/Library/Developer/CommandLineTools) Flutter command not found Angular - ng: command not found how to run python files in windows command prompt? How to run .NET Core console app from the command line Copy Paste in Bash on Ubuntu on Windows How to find which version of TensorFlow is installed in my system? How to install JQ on Mac by command-line? Python not working in the command line of git bash Run function in script from command line (Node JS)

Examples related to awk

What are NR and FNR and what does "NR==FNR" imply? awk - concatenate two string variable and assign to a third Printing column separated by comma using Awk command line Insert multiple lines into a file after specified pattern using shell script cut or awk command to print first field of first row How to run an awk commands in Windows? Linux bash script to extract IP address Print line numbers starting at zero using awk Trim leading and trailing spaces from a string in awk Use awk to find average of a column

Examples related to shuffle

Shuffle DataFrame rows What is the purpose of shuffling and sorting phase in the reducer in Map Reduce Programming? Better way to shuffle two numpy arrays in unison How to randomize (shuffle) a JavaScript array? How can I shuffle the lines of a text file on the Unix command line or in a shell script? Random shuffling of an array Shuffling a list of objects Shuffle an array with python, randomize array item order with python