How can I select random files from a directory in bash?

Question

I have a directory with about 2000 files. How can I select a random sample of N files, using either a bash script or a list of piped commands?

User · Answer

Here are a few possibilities that don't parse the output of ls and that are 100% safe regarding files with spaces and funny symbols in their name. All of them will populate an array randf with a list of random files. This array is easily printed with printf '%s\n' "${randf[@]}" if needed.

This one will possibly output the same file several times, and N needs to be known in advance. Here I chose N=42.

    a=( * )
    randf=( "${a[RANDOM%${#a[@]}]"{1..42}"}" )

This feature is not very well documented.

If N is not known in advance, but you really liked the previous possibility, you can use eval. But it's evil, and you must really make sure that N doesn't come directly from user input without being thoroughly checked!

    N=42
    a=( * )
    eval randf=( \"\${a[RANDOM%\${#a[@]}]\"\{1..$N\}\"}\" )

I personally dislike eval and hence this answer!

The same using a more straightforward method (a loop):

    N=42
    a=( * )
    randf=()
    for((i=0;i<N;++i)); do
        randf+=( "${a[RANDOM%${#a[@]}]}" )
    done

If you don't want to possibly have several times the same file:

    N=42
    a=( * )
    randf=()
    for((i=0;i<N && ${#a[@]};++i)); do
        ((j=RANDOM%${#a[@]}))
        randf+=( "${a[j]}" )
        a=( "${a[@]:0:j}" "${a[@]:j+1}" )
    done

Note: This is a late answer to an old post, but the accepted answer links to an external page that shows terrible bash practice, and the other answer is not much better as it also parses the output of ls. A comment to the accepted answer points to an excellent answer by Lhunath which obviously shows good practice, but doesn't exactly answer the OP.
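For reuse, the no-duplicates loop above could be wrapped in a function. This is only a sketch: the name pick_random_files and its two arguments (directory, count) are made up for illustration, and it assumes bash 3.1 or later, a non-recursive listing, and that hidden files should be skipped. It also treats an unmatched glob as an empty directory, which the loops above do not do.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not from the answer above): fill the global array
# `randf` with up to N distinct random entries from directory DIR.
pick_random_files() {
    local dir=$1 n=$2 i j
    local a=( "$dir"/* )
    # An unmatched glob stays as the literal pattern; treat that as "no files".
    [[ -e ${a[0]} || -L ${a[0]} ]] || a=()
    randf=()
    for (( i = 0; i < n && ${#a[@]} > 0; ++i )); do
        j=$(( RANDOM % ${#a[@]} ))           # random index into what is left
        randf+=( "${a[j]}" )
        a=( "${a[@]:0:j}" "${a[@]:j+1}" )    # drop the chosen entry
    done
}
```

Usage would look like pick_random_files /some/dir 5 followed by printf '%s\n' "${randf[@]}". Asking for more entries than the directory holds simply returns all of them in random order.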

User · Answer

I use this. It uses a temporary file, but it descends into a directory tree until it finds a regular file and returns it.

    # find a quasi-random file in a directory tree:

    # directory to start search from:
    ROOT="/"

    tmp=/tmp/mytempfile
    TARGET="$ROOT"
    FILE=""
    n=
    r=
    while [ -e "$TARGET" ]; do
        TARGET="$(readlink -f "${TARGET}/$FILE")"
        if [ -d "$TARGET" ]; then
            ls -1 "$TARGET" 2> /dev/null > "$tmp" || break
            n=$(cat "$tmp" | wc -l)
            if [ "$n" != 0 ]; then
                FILE=$(shuf -n 1 "$tmp")
                # or, if you don't have or don't want to use shuf:
                # r=$(($RANDOM % $n))
                # FILE=$(tail -n +$(( $r + 1 )) "$tmp" | head -n 1)
            fi
        else
            if [ -f "$TARGET" ]; then
                rm -f "$tmp"
                echo "$TARGET"
                break
            else
                # is not a regular file, restart:
                TARGET="$ROOT"
                FILE=""
            fi
        fi
    done

User · Answer

This is an even later response to @gniourf_gniourf's late answer, which I just upvoted because it's by far the best answer, twice over. (Once for avoiding eval and once for safe filename handling.)

But it took me a few minutes to untangle the "not very well documented" feature(s) this answer uses. If your Bash skills are solid enough that you saw immediately how it works, then skip this comment. But I didn't, and having untangled it I think it's worth explaining.

Feature #1 is the shell's own file globbing. a=(*) creates an array, a, whose members are the files in the current directory. Bash understands all the weirdnesses of filenames, so that list is guaranteed correct, guaranteed escaped, etc. No need to worry about properly parsing textual file names returned by ls.

Feature #2 is Bash parameter expansions for arrays, one nested within another. This starts with ${#a[@]}, which expands to the length of the array a.

That expansion is then used to subscript the array. The standard way to get a random number between 0 and N-1 is to take a random number modulo N, and 0 to length-1 is exactly the range of valid indices for our array. Here's the approach, broken into two lines for clarity's sake (the result goes into a separate variable, because assigning to RANDOM would reseed the generator):

    LENGTH=${#a[@]}
    FILE=${a[RANDOM%LENGTH]}

But this solution does it in a single line, removing the unnecessary variable assignment.

Feature #3 is Bash brace expansion, although I have to confess I don't entirely understand it. Brace expansion is used, for instance, to generate a list of 25 files named filename1.txt, filename2.txt, etc.: echo "filename"{1..25}".txt".

The expression inside the array assignment above, "${a[RANDOM%${#a[@]}]"{1..42}"}", uses that trick to produce 42 separate expansions. The brace expansion places a single number in between the ] and the }, which at first I thought was subscripting the array, but if so it would be preceded by a colon. (It would also have returned 42 consecutive items from a random spot in the array, which is not at all the same thing as returning 42 random items from the array.) I think it's just making the shell run the expansion 42 times, thereby returning 42 random items from the array. (But if someone can explain it more fully, I'd love to hear it.)

The reason N has to be hardcoded (to 42) is that brace expansion happens before variable expansion.

Finally, here's Feature #4, if you want to do this recursively for a directory hierarchy:

    shopt -s globstar
    a=( ** )

This turns on a shell option that causes ** to match recursively. Now your a array contains every file (and every directory) in the entire hierarchy.
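To make the globstar and modulo-index ideas concrete, here is a minimal sketch that picks one random regular file from a whole tree. The function name random_file_recursive is invented for this example, and it assumes bash 4 or later (globstar is not available in the bash 3.2 shipped with macOS). Because ** also matches directories, the sketch filters them out before indexing.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print one random regular file found anywhere under DIR.
# Note: this leaves globstar and nullglob switched on in the calling shell.
random_file_recursive() {
    local dir=$1 f
    local files=()
    shopt -s globstar nullglob      # ** recurses; unmatched globs expand to nothing
    for f in "$dir"/**; do
        if [[ -f $f ]]; then        # keep regular files, skip directories
            files+=( "$f" )
        fi
    done
    (( ${#files[@]} > 0 )) || return 1
    # RANDOM % length is always a valid index: 0 .. length-1
    printf '%s\n' "${files[RANDOM % ${#files[@]}]}"
}
```

Calling random_file_recursive ~/Music would print a single path, or return a non-zero status if the tree holds no regular files. Printing with a trailing newline means a filename that itself contains a newline cannot be captured reliably with $(...); the array-based approach avoids that problem as long as the array is used directly.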

User · Answer

Here's a script that uses GNU sort's random option:

    ls | sort -R | tail -n "$N" | while read -r file; do
        # Something involving "$file", or you can leave
        # off the while to just get the filenames
        echo "$file"
    done

User · Answer

If you have many files in your folder, you can use the piped command below, which I found on Unix Stack Exchange:

    find /some/dir/ -type f -print0 | xargs -0 shuf -e -n 8 -z | xargs -0 cp -vt /target/dir/

Here I wanted to copy the files, but if you want to move files or do something else, just change the last command where I have used cp.
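One caveat with the pipeline above: when the file list is too long for a single command line, xargs may run shuf -e more than once, and each run would emit its own sample. A possible variant, sketched below, pipes the NUL-delimited list straight into shuf and collects the result in an array. The function name sample_files and the array name picked are made up for this example; it assumes GNU find, GNU shuf, and bash 4.4 or later (for mapfile -d).

```bash
#!/usr/bin/env bash
# Hypothetical helper: fill the global array `picked` with up to N random
# regular files from under DIR, NUL-delimited from end to end.
sample_files() {
    local dir=$1 n=$2
    picked=()
    # shuf -z reads NUL-terminated records on stdin; mapfile -d '' splits on NUL.
    mapfile -d '' -t picked < <(find "$dir" -type f -print0 | shuf -z -n "$n")
}
```

After sample_files /some/dir 8, something like cp -v -- "${picked[@]}" /target/dir/ would copy the sample, and the last command can be swapped for mv or anything else.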

User · Answer

You can use shuf (from the GNU coreutils package) for that. Just feed it a list of file names and ask it to return the first line from a random permutation:

    ls dirname | shuf -n 1
    # probably faster and more flexible:
    find dirname -type f | shuf -n 1
    # etc.

Adjust the -n, --head-count=COUNT value to return the number of wanted lines. For example, to return 5 random filenames you would use:

    find dirname -type f | shuf -n 5

User · Answer

If you have Python installed (works with either Python 2 or Python 3):

To select one file (or line from an arbitrary command), use

    ls -1 | python -c "import sys; import random; print(random.choice(sys.stdin.readlines()).rstrip())"

To select N files/lines, use (note N is at the end of the command; replace it with a number)

    ls -1 | python -c "import sys; import random; print(''.join(random.sample(sys.stdin.readlines(), int(sys.argv[1]))).rstrip())" N

User · Answer

How about a Perl solution, slightly doctored from Mr. Kang over here: "How can I shuffle the lines of a text file on the Unix command line or in a shell script?"

    $ ls | perl -MList::Util=shuffle -e '@lines = shuffle(<>); print @lines[0..4]'

User · Answer

This is the only script I can get to play nice with bash on MacOS. I combined and edited snippets from the following two links:

ls command: how can I get a recursive full-path listing, one line per file?

http://www.linuxquestions.org/questions/linux-general-1/is-there-a-bash-command-for-picking-a-random-file-678687/

    #!/bin/bash
    # Reads a given directory and picks a random file.

    # The directory you want to use. You could use "$1" instead if you
    # wanted to parametrize it.
    DIR="/path/to/"
    # DIR="$1"

    # Internal Field Separator set to newline, so file names with
    # spaces do not break our script.
    IFS='
    '

    if [[ -d "${DIR}" ]]
    then
      # Runs ls on the given dir, and dumps the output into a matrix,
      # it uses the newline character as a field delimiter, as explained above.
      #  file_matrix=($(ls -LR "${DIR}"))

      file_matrix=($(ls -R "$DIR" | awk '; /:$/&&f{s=$0;f=0}; /:$/&&!f{sub(/:$/,"");s=$0;f=1;next}; NF&&f{ print s"/"$0 }'))
      num_files=${#file_matrix[*]}

      # This is the command you want to run on a random file.
      # Change "ls -l" to anything you want, it's just an example.
      ls -l "${file_matrix[$((RANDOM%num_files))]}"
    fi

    exit 0

User · Answer

MacOS does not have the sort -R and shuf commands, so I needed a bash-only solution that randomizes all files without duplicates and did not find that here. This solution is similar to gniourf_gniourf's solution #4, but hopefully adds better comments.

The script should be easy to modify to stop after N samples using a counter with if, or gniourf_gniourf's for loop with N. $RANDOM is limited to ~32000 files, but that should do for most cases.

    #!/bin/bash

    array=(*)  # this is the array of files to shuffle
    # echo ${array[@]}
    for dummy in "${array[@]}"; do  # do loop length(array) times; once for each file
        length=${#array[@]}
        randomi=$(( $RANDOM % $length ))  # select a random index

        filename=${array[$randomi]}
        echo "Processing: '$filename'"  # do something with the file

        unset -v "array[$randomi]"  # set the element at index $randomi to NULL
        array=("${array[@]}")  # remove NULL elements introduced by unset; copy array
    done
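The ~32000 limit comes from $RANDOM producing only 15-bit values (0 to 32767). If a directory might hold more files than that, one possible workaround is to combine two draws into a 30-bit value before taking the modulus. The helper below is a sketch only; the name big_random is invented here, and the modulo step is slightly biased toward small values, which rarely matters for picking files.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print a pseudo-random integer in 0 .. max-1, where max
# may exceed 32768. Each RANDOM reference is a separate 15-bit draw; the two
# are glued together into one 30-bit number.
big_random() {
    local max=$1
    echo $(( ((RANDOM << 15) | RANDOM) % max ))
}
```

In the script above, the index line could then read randomi=$(big_random "$length") instead of using $RANDOM directly.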

User · Answer

    ls | shuf -n 10 # ten random files

User · Answer

A simple solution for selecting 5 random files without parsing ls. It also works with files containing spaces, newlines and other special characters:

    shuf -ezn 5 * | xargs -0 -n1 echo

Replace echo with the command you want to execute for your files.
