[linux] Use find command but exclude files in two directories

I want to find files that end with _peaks.bed, but exclude files in the tmp and scripts folders.

My command is like this:

 find . -type f \( -name "*_peaks.bed" ! -name "*tmp*" ! -name "*scripts*" \)

But it didn't work. The files in tmp and script folder will still be displayed.

Does anyone have ideas about this?

This question is related to linux shell unix find

The answer is


Here's how you can specify that with find:

find . -type f -name "*_peaks.bed" ! -path "./tmp/*" ! -path "./scripts/*"

Explanation:

  • find . - Start find from current working directory (recursively by default)
  • -type f - Specify to find that you only want files in the results
  • -name "*_peaks.bed" - Look for files with the name ending in _peaks.bed
  • ! -path "./tmp/*" - Exclude all results whose path starts with ./tmp/
  • ! -path "./scripts/*" - Also exclude all results whose path starts with ./scripts/

Testing the Solution:

$ mkdir a b c d e
$ touch a/1 b/2 c/3 d/4 e/5 e/a e/b
$ find . -type f ! -path "./a/*" ! -path "./b/*"

./d/4
./c/3
./e/a
./e/b
./e/5

You were pretty close, the -name option only considers the basename, where as -path considers the entire path =)


for me, this solution didn't worked on a command exec with find, don't really know why, so my solution is

find . -type f -path "./a/*" -prune -o -path "./b/*" -prune -o -exec gzip -f -v {} \;

Explanation: same as sampson-chen one with the additions of

-prune - ignore the proceding path of ...

-o - Then if no match print the results, (prune the directories and print the remaining results)

18:12 $ mkdir a b c d e
18:13 $ touch a/1 b/2 c/3 d/4 e/5 e/a e/b
18:13 $ find . -type f -path "./a/*" -prune -o -path "./b/*" -prune -o -exec gzip -f -v {} \;

gzip: . is a directory -- ignored
gzip: ./a is a directory -- ignored
gzip: ./b is a directory -- ignored
gzip: ./c is a directory -- ignored
./c/3:    0.0% -- replaced with ./c/3.gz
gzip: ./d is a directory -- ignored
./d/4:    0.0% -- replaced with ./d/4.gz
gzip: ./e is a directory -- ignored
./e/5:    0.0% -- replaced with ./e/5.gz
./e/a:    0.0% -- replaced with ./e/a.gz
./e/b:    0.0% -- replaced with ./e/b.gz

Here is one way you could do it...

find . -type f -name "*_peaks.bed" | egrep -v "^(./tmp/|./scripts/)"

Use

find \( -path "./tmp" -o -path "./scripts" \) -prune -o  -name "*_peaks.bed" -print

or

find \( -path "./tmp" -o -path "./scripts" \) -prune -false -o  -name "*_peaks.bed"

or

find \( -path "./tmp" -path "./scripts" \) ! -prune -o  -name "*_peaks.bed"

The order is important. It evaluates from left to right. Always begin with the path exclusion.

Explanation

Do not use -not (or !) to exclude whole directory. Use -prune. As explained in the manual:

-prune    The primary shall always evaluate as  true;  it
          shall  cause  find  not  to descend the current
          pathname if it is a directory.  If  the  -depth
          primary  is specified, the -prune primary shall
          have no effect.

and in the GNU find manual:

-path pattern
              [...]
              To ignore  a  whole
              directory  tree,  use  -prune rather than checking
              every file in the tree.

Indeed, if you use -not -path "./pathname", find will evaluate the expression for each node under "./pathname".

find expressions are just condition evaluation.

  • \( \) - groups operation (you can use -path "./tmp" -prune -o -path "./scripts" -prune -o, but it is more verbose).
  • -path "./script" -prune - if -path returns true and is a directory, return true for that directory and do not descend into it.
  • -path "./script" ! -prune - it evaluates as (-path "./script") AND (! -prune). It revert the "always true" of prune to always false. It avoids printing "./script" as a match.
  • -path "./script" -prune -false - since -prune always returns true, you can follow it with -false to do the same than !.
  • -o - OR operator. If no operator is specified between two expressions, it defaults to AND operator.

Hence, \( -path "./tmp" -o -path "./scripts" \) -prune -o -name "*_peaks.bed" -print is expanded to:

[ (-path "./tmp" OR -path "./script") AND -prune ] OR ( -name "*_peaks.bed" AND print )

The print is important here because without it is expanded to:

{ [ (-path "./tmp" OR -path "./script" )  AND -prune ]  OR (-name "*_peaks.bed" ) } AND print

-print is added by find - that is why most of the time, you do not need to add it in you expression. And since -prune returns true, it will print "./script" and "./tmp".

It is not necessary in the others because we switched -prune to always return false.

Hint: You can use find -D opt expr 2>&1 1>/dev/null to see how it is optimized and expanded,
find -D search expr 2>&1 1>/dev/null to see which path is checked.


Try something like

find . \( -type f -name \*_peaks.bed -print \) -or \( -type d -and \( -name tmp -or -name scripts \) -and -prune \)

and don't be too surprised if I got it a bit wrong. If the goal is an exec (instead of print), just substitute it in place.


You can try below:

find ./ ! \( -path ./tmp -prune \) ! \( -path ./scripts -prune \) -type f -name '*_peaks.bed'

Examples related to linux

grep's at sign caught as whitespace How to prevent Google Colab from disconnecting? "E: Unable to locate package python-pip" on Ubuntu 18.04 How to upgrade Python version to 3.7? Install Qt on Ubuntu Get first line of a shell command's output Cannot connect to the Docker daemon at unix:/var/run/docker.sock. Is the docker daemon running? Run bash command on jenkins pipeline How to uninstall an older PHP version from centOS7 How to update-alternatives to Python 3 without breaking apt?

Examples related to shell

Comparing a variable with a string python not working when redirecting from bash script Get first line of a shell command's output How to run shell script file using nodejs? Run bash command on jenkins pipeline Way to create multiline comments in Bash? How to do multiline shell script in Ansible How to check if a file exists in a shell script How to check if an environment variable exists and get its value? Curl to return http status code along with the response docker entrypoint running bash script gets "permission denied"

Examples related to unix

Docker CE on RHEL - Requires: container-selinux >= 2.9 What does `set -x` do? How to find files modified in last x minutes (find -mmin does not work as expected) sudo: npm: command not found How to sort a file in-place How to read a .properties file which contains keys that have a period character using Shell script gpg decryption fails with no secret key error Loop through a comma-separated shell variable Best way to find os name and version in Unix/Linux platform Resource u'tokenizers/punkt/english.pickle' not found

Examples related to find

Find a file by name in Visual Studio Code Explaining the 'find -mtime' command find files by extension, *.html under a folder in nodejs MongoDB Show all contents from all collections How can I find a file/directory that could be anywhere on linux command line? Get all files modified in last 30 days in a directory FileNotFoundError: [Errno 2] No such file or directory Linux find and grep command together find . -type f -exec chmod 644 {} ; Find all stored procedures that reference a specific column in some table