Unix command to find lines common in two files

Question

I'm sure I once found a Unix command which could print the common lines from two or more files. Does anyone know its name? It was much simpler than diff.

User · Answer

awk 'NR==FNR{a[$1]++;next} a[$1]' file1 file2
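The awk one-liner above keys each line on its first whitespace-separated field ($1), so "x 1" in file1 and "x 9" in file2 count as a match. Here is a minimal sketch that assumes you sometimes want whole-line matches instead. It passes the field number in as an awk variable. common_by_field is a hypothetical helper name, and field 0 means the entire line.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print lines of file2 whose field N also occurs as field N in file1.
# Usage: common_by_field N file1 file2   (N=0 compares entire lines)
common_by_field() {
    awk -v f="$1" '
        NR == FNR { a[$f]++; next }   # first file: count each key
        a[$f]                         # second file: print the line if its key was seen
    ' "$2" "$3"
}
```

For example, common_by_field 0 file1 file2 behaves like the whole-line awk variant, while common_by_field 1 file1 file2 reproduces the one-liner above.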

User · Answer

Just for reference, if someone is still looking for how to do this for multiple files, see the linked answer to Finding matching lines across many files.

Combining these two answers (ans1 and ans2), I think you can get the result you need without sorting the files:

#!/bin/bash
ans="matching_lines"

for file1 in *
do
    for file2 in *
    do
        # Skip the output file and self-comparisons
        if [ "$file1" != "$ans" ] && [ "$file2" != "$ans" ] && [ "$file1" != "$file2" ]; then
            echo "Comparing: $file1 $file2 ..." >> "$ans"
            # Quote the names so files with spaces are handled correctly
            perl -ne 'print if ($seen{$_} .= @ARGV) =~ /10$/' "$file1" "$file2" >> "$ans"
        fi
    done
done

Simply save it (for example as compareFiles.sh), give it execute permission (chmod +x compareFiles.sh) and run it. It takes all the files in the current working directory, compares every file against every other file, and writes the result to the "matching_lines" file.

Things to be improved:

Skip directories
Avoid comparing each pair of files twice (file1 vs file2 and file2 vs file1).
Maybe add the line number next to each matching line
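The first two items on that list can be sketched as follows. This assumes bash 3.1 or newer (for arrays and +=). It also swaps the Perl one-liner for an awk equivalent with a de-duplication guard, so perl is not required. compare_all_pairs and its arguments (directory, output path) are hypothetical. The -f test skips directories, -ef skips the output file, and starting the inner index at i+1 visits each unordered pair only once.

```shell
#!/usr/bin/env bash
# Hypothetical helper: pairwise common lines for every regular file in a directory.
# Usage: compare_all_pairs [dir] [output-file]
compare_all_pairs() {
    local dir=${1:-.} out=${2:-matching_lines}
    local files=() f i j
    : > "$out"                            # create/empty the output file first
    for f in "$dir"/*; do
        [ -f "$f" ] || continue           # skip directories and other non-regular files
        [ "$f" -ef "$out" ] && continue   # skip the output file itself
        files+=("$f")
    done
    for ((i = 0; i < ${#files[@]}; i++)); do
        for ((j = i + 1; j < ${#files[@]}; j++)); do   # each unordered pair once
            echo "Comparing: ${files[i]} ${files[j]} ..." >> "$out"
            # Lines of the second file that also occur in the first, each printed once
            awk 'NR == FNR { a[$0]; next } ($0 in a) && !seen[$0]++' \
                "${files[i]}" "${files[j]}" >> "$out"
        done
    done
}
```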

User · Answer

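While grep -v -f 1.txt 2.txt > 3.txt gives you the differences between two files (what is in 2.txt and not in 1.txt), you could just as easily run grep -f 1.txt 2.txt > 3.txt to collect all common lines, which should provide an easy solution to your problem. Keep in mind that -f treats each line of 1.txt as a regular expression matched anywhere in a line, so add -F -x if you need exact whole-line matches. If you have sorted files, you should use comm nonetheless.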
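Because -f treats every line of 1.txt as a regular expression that may match anywhere in a line, a line such as "ab" in 1.txt also selects "abc" in 2.txt. A minimal sketch of the stricter variant adds -F (fixed strings) and -x (whole-line match), both standard grep options. common_grep is a hypothetical wrapper name.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: print lines of file2 that appear verbatim, as whole lines, in file1.
#   -F  treat each pattern as a fixed string (no regex metacharacters)
#   -x  require the pattern to match the entire line
#   -f  read the patterns from file1, one per line
common_grep() {
    grep -Fxf "$1" "$2"
}
```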

User · Answer

The command you are seeking is comm, e.g.:

comm -12 1.sorted.txt 2.sorted.txt

Here:

-1 : suppress column 1 (lines unique to 1.sorted.txt)
-2 : suppress column 2 (lines unique to 2.sorted.txt)
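The three column selections can be shown side by side in a small sketch. It assumes both inputs are already sorted in the same locale, and the function names are hypothetical. Each -N flag suppresses column N, so keeping a single column means suppressing the other two.

```shell
#!/usr/bin/env bash
# Hypothetical helpers around comm; both arguments must already be sorted.
only_in_first()  { comm -23 "$1" "$2"; }   # keep column 1: lines unique to FILE1
only_in_second() { comm -13 "$1" "$2"; }   # keep column 2: lines unique to FILE2
in_both()        { comm -12 "$1" "$2"; }   # keep column 3: lines common to both
```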

User · Answer

On a limited version of Linux (like the QNAP NAS I was working on):

comm did not exist.
grep -f file1 file2 can cause some problems, as @ChristopherSchultz pointed out, and grep -F -f file1 file2 was really slow (more than 5 minutes without finishing, versus 2-3 seconds with the method below on files over 20MB).

So here is what I did:

sort file1 > file1.sorted
sort file2 > file2.sorted

diff file1.sorted file2.sorted | grep '^< ' | sed 's/^< //' > files.diff
diff file1.sorted files.diff | grep '^< ' | sed 's/^< //' > files.same.sorted

The first diff collects the lines of file1 that are missing from file2. Every line of file1 that is not in that list is common to both files.

If files.same.sorted should be in the same order as the original files, add this line for the same order as file1:

awk 'FNR==NR {a[$0]=$0; next}; $0 in a {print a[$0]}' files.same.sorted file1 > files.same

or, for the same order as file2:

awk 'FNR==NR {a[$0]=$0; next}; $0 in a {print a[$0]}' files.same.sorted file2 > files.same
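The two diff steps above can be wrapped into one function for systems without comm. This is a sketch, not the author's exact script. common_via_diff is a hypothetical name, the function keeps its intermediate files in a mktemp directory, and it uses sed -n 's/^< //p' to select and strip the "< " markers in one step.

```shell
#!/usr/bin/env bash
# Hypothetical helper: common lines of two files using only sort, diff and sed.
common_via_diff() {
    local tmp
    tmp=$(mktemp -d) || return
    sort "$1" > "$tmp/file1.sorted"
    sort "$2" > "$tmp/file2.sorted"
    # Lines of file1 that are missing from file2 (diff prefixes them with "< ")
    diff "$tmp/file1.sorted" "$tmp/file2.sorted" | sed -n 's/^< //p' > "$tmp/files.diff"
    # Lines of file1 that are NOT in that list are the common ones
    diff "$tmp/file1.sorted" "$tmp/files.diff" | sed -n 's/^< //p'
    rm -rf "$tmp"
}
```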

User · Answer

To complement the Perl one-liner, here's its awk equivalent:

awk 'NR==FNR{arr[$0];next} $0 in arr' file1 file2

This reads all lines from file1 into the array arr[]. It then checks each line in file2 to see whether it already exists in the array (i.e. in file1). The lines that are found are printed in the order in which they appear in file2. Note that the comparison in arr uses the entire line from file2 as the index to the array, so it will only report exact matches on entire lines.
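The awk version differs from the Perl one-liner in one respect. The Perl one-liner prints each shared line only at its first occurrence in file2, while the awk version prints every occurrence. The sketch below adds a second array to suppress repeats. common_once is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: lines of file2 that also occur in file1, each printed once,
# in the order of their first appearance in file2.
common_once() {
    awk '
        NR == FNR { in_first[$0]; next }      # first file: remember every line
        ($0 in in_first) && !printed[$0]++    # second file: print each shared line once
    ' "$1" "$2"
}
```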

User · Answer

Maybe you mean comm?

    Compare sorted files FILE1 and FILE2 line by line.

    With no options, produce three-column output. Column one contains lines unique to FILE1, column two contains lines unique to FILE2, and column three contains lines common to both files.

The secret to finding this information is the info pages. For GNU programs, they are much more detailed than the man pages. Try info coreutils and it will list all the small useful utilities.

User · Answer

If the two files are not sorted yet, you can use:

comm -12 <(sort a.txt) <(sort b.txt)

and it will work, avoiding the error message "comm: file 2 is not in sorted order" that you get when running comm -12 a.txt b.txt.
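Both sort and comm compare lines using the current locale's collation order, so they need to run with the same settings. The sketch below pins both to the C locale (plain byte order). This is an assumption about what you want, because byte order is not dictionary order, but it makes the output order predictable regardless of the user's locale. common_unsorted is a hypothetical name, and the function needs bash for process substitution.

```shell
#!/usr/bin/env bash
# Hypothetical helper: common lines of two unsorted files, sorted and compared
# under the same (C) collation so comm never sees "unsorted" input.
common_unsorted() {
    LC_ALL=C comm -12 <(LC_ALL=C sort "$1") <(LC_ALL=C sort "$2")
}
```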

User · Answer

perl -ne 'print if ($seen{$_} .= @ARGV) =~ /10$/' file1 file2

User · Answer

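rm -f file3.out

while IFS= read -r line1
do
        while IFS= read -r line2
        do
                if [[ "$line1" == "$line2" ]]; then
                        printf '%s\n' "$line1" >> file3.out
                fi
        done < file2.out
done < file1.out

This should do it, although it compares every line of file1.out with every line of file2.out, so it gets slow on large files.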

User · Answer

To easily apply the comm command to unsorted files, use Bash's process substitution:

$ bash --version
GNU bash, version 3.2.51(1)-release
Copyright (C) 2007 Free Software Foundation, Inc.
$ cat > abc
123
567
132
$ cat > def
132
777
321

So the files abc and def have one line in common, the one with "132". Using comm on unsorted files:

$ comm abc def
123
        132
567
132
        777
        321
$ comm -12 abc def # No output! The common line is not found
$

The last command produced no output: the common line was not discovered.

Now use comm on sorted files, sorting the files with process substitution:

$ comm <( sort abc ) <( sort def )
123
                132
        321
567
        777
$ comm -12 <( sort abc ) <( sort def )
132

Now we got the 132 line!
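The question asks about two or more files. One way to extend the comm approach is to fold comm -12 over the file list, keeping a running intersection in a temporary file. This sketch assumes that duplicate lines within a single file should be collapsed (hence sort -u) and that byte-order sorting is acceptable. common_all is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: lines common to ALL of the given files (two or more).
common_all() {
    local acc f
    acc=$(mktemp) || return
    LC_ALL=C sort -u "$1" > "$acc"      # running intersection starts as file 1
    shift
    for f in "$@"; do
        # Keep only the lines of the running result that also appear in $f
        LC_ALL=C comm -12 "$acc" <(LC_ALL=C sort -u "$f") > "$acc.next"
        mv "$acc.next" "$acc"
    done
    cat "$acc"
    rm -f "$acc"
}
```

Usage: common_all file1 file2 file3 prints, in sorted order, the lines present in every file.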
