Comparing two files in linux terminal

Question

There are two files called a.txt and b.txt; both contain a list of words. I want to check which words are in a.txt but not in b.txt. I need an efficient algorithm, as I need to compare two dictionaries.

User · Answer

If you have vim installed, try this:

    vimdiff file1 file2

or

    vim -d file1 file2

You will find it fantastic.

User · Answer

You can also use colordiff, which displays the output of diff with colors.

About vimdiff: it allows you to compare files via SSH, for example:

    vimdiff /var/log/secure scp://192.168.1.25/var/log/secure

Extracted from: http://www.sysadmit.com/2016/05/linux-diferencias-entre-dos-archivos.html

User · Answer

If you prefer the diff output style from git diff, you can use it with the --no-index flag to compare files that are not in a git repository:

    git diff --no-index a.txt b.txt

Using a couple of files with around 200k file name strings in each, I benchmarked this approach (with the built-in time command) against some of the other answers here:

    git diff --no-index a.txt b.txt          # 1.2s
    comm -23 <(sort a.txt) <(sort b.txt)     # 0.2s
    diff a.txt b.txt                         # 2.6s
    sdiff a.txt b.txt                        # 2.7s
    vimdiff a.txt b.txt                      # 3.2s

comm seems to be the fastest by far, while git diff --no-index appears to be the fastest approach for diff-style output.

Update 2018-03-25: You can actually omit the --no-index flag unless you are inside a git repository and want to compare untracked files within that repository. From the man pages:

    This form is to compare the given two paths on the filesystem. You can omit the --no-index option when running the command in a working tree controlled by Git and at least one of the paths points outside the working tree, or when running the command outside a working tree controlled by Git.

User · Answer

You can use the diff tool in Linux to compare two files. The --changed-group-format and --unchanged-group-format options let you filter the required data.

The following three values can be used to select the relevant group for each option:

    '%<'  get lines from FILE1
    '%>'  get lines from FILE2
    ''    empty string, for removing lines from both files

E.g.:

    diff --changed-group-format="%<" --unchanged-group-format="" file1.txt file2.txt

    [root@vmoracle11 tmp]# cat file1.txt
    test one
    test two
    test three
    test four
    test eight
    [root@vmoracle11 tmp]# cat file2.txt
    test one
    test three
    test nine
    [root@vmoracle11 tmp]# diff --changed-group-format='%<' --unchanged-group-format='' file1.txt file2.txt
    test two
    test four
    test eight
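
The same pair of options can be pointed at the other file. The sketch below recreates the sample files from this answer in a scratch directory (the `workdir` and `only_in_*` variable names are made up for illustration) and assumes GNU diff, since the group-format options are not part of every diff implementation.

```shell
# Recreate the two sample files from the answer in a scratch directory.
workdir=$(mktemp -d)
printf '%s\n' 'test one' 'test two' 'test three' 'test four' 'test eight' > "$workdir/file1.txt"
printf '%s\n' 'test one' 'test three' 'test nine' > "$workdir/file2.txt"

# '%<' prints the FILE1 side of each changed group; '%>' prints the FILE2 side.
# diff exits with status 1 when the files differ, hence the '|| true'.
only_in_file1=$(diff --changed-group-format='%<' --unchanged-group-format='' \
    "$workdir/file1.txt" "$workdir/file2.txt" || true)
only_in_file2=$(diff --changed-group-format='%>' --unchanged-group-format='' \
    "$workdir/file1.txt" "$workdir/file2.txt" || true)

printf 'Only in file1:\n%s\n' "$only_in_file1"
printf 'Only in file2:\n%s\n' "$only_in_file2"
```

Keep in mind that diff compares lines in order, so a word that appears in both files but at incompatible positions may still be reported. For unordered word lists, a set-based tool such as comm, grep or awk is likely a better fit.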

User · Answer

Also  do not forget about mcdiff - Internal diff viewer of GNU Midnight Commander   For example   mcdiff file1 file2   Enjoy

User · Answer

Use comm -13 (requires sorted files):

    $ cat file1
    one
    two
    three
    $ cat file2
    one
    two
    three
    four
    $ comm -13 <(sort file1) <(sort file2)
    four
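
The `<(...)` form needs a shell with process substitution, such as bash. As a minimal sketch for shells without it, the files can be sorted into temporary copies first. The scratch directory and the `.sorted` file names are assumptions for illustration, and `LC_ALL=C` is set on both commands so that sort and comm agree on the ordering.

```shell
workdir=$(mktemp -d)
printf '%s\n' three one two > "$workdir/file1"
printf '%s\n' two four one three > "$workdir/file2"

# Sort both files with the same collation that comm will use.
LC_ALL=C sort "$workdir/file1" > "$workdir/file1.sorted"
LC_ALL=C sort "$workdir/file2" > "$workdir/file2.sorted"

# -1 hides lines unique to the first file, -3 hides lines common to both,
# leaving only the lines unique to the second file.
only_in_file2=$(LC_ALL=C comm -13 "$workdir/file1.sorted" "$workdir/file2.sorted")
echo "$only_in_file2"
```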

User · Answer

Using awk for it. Test files:

    $ cat a.txt
    one
    two
    three
    four
    four
    $ cat b.txt
    three
    two
    one

The awk:

    $ awk '
    NR==FNR {                    # process b.txt, or the first file
        seen[$0]                 # hash words to hash seen
        next                     # next word in b.txt
    }                            # process a.txt, or all files after the first
    !($0 in seen)' b.txt a.txt   # if word is not hashed to seen, output it

Duplicates are output:

    four
    four

To avoid duplicates, add each newly met word in a.txt to the seen hash:

    $ awk '
    NR==FNR {
        seen[$0]
        next
    }
    !($0 in seen) {              # if word is not hashed to seen
        seen[$0]                 # hash unseen a.txt words to seen to avoid duplicates
        print                    # and output it
    }' b.txt a.txt

Output:

    four

If the word lists are comma-separated, like:

    $ cat a.txt
    four,four,three,three,two,one
    five,six
    $ cat b.txt
    one,two,three

you have to do a couple of extra laps (for loops):

    awk -F, '                    # comma-separated input
    NR==FNR {
        for(i=1;i<=NF;i++)       # loop all comma-separated fields
            seen[$i]
        next
    }
    {
        for(i=1;i<=NF;i++)
            if(!($i in seen)) {
                seen[$i]         # this time we buffer output (below):
                buffer=buffer (buffer==""?"":",") $i
            }
        if(buffer!="") {         # output unempty buffers after each record in a.txt
            print buffer
            buffer=""
        }
    }' b.txt a.txt

Output this time:

    four
    five,six
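
For repeated use, the deduplicating one-word-per-line version can be wrapped in a small shell function. This is only a sketch: the function name `words_only_in` and the scratch directory are hypothetical, and the awk program is the same idea as above.

```shell
# words_only_in FIRST SECOND: print lines of FIRST that are absent from SECOND,
# once each, in the order they appear in FIRST. (Function name is hypothetical.)
words_only_in() {
    awk '
    NR == FNR { seen[$0]; next }         # first file given to awk: remember every line
    !($0 in seen) { seen[$0]; print }    # later files: print each unseen line once
    ' "$2" "$1"
}

workdir=$(mktemp -d)
printf '%s\n' one two three four four > "$workdir/a.txt"
printf '%s\n' three two one > "$workdir/b.txt"

result=$(words_only_in "$workdir/a.txt" "$workdir/b.txt")
echo "$result"
```

No sorting is needed, and the output keeps the order of the first argument. One caveat of the NR==FNR idiom: if the file passed as SECOND is empty, awk treats FIRST as the first file with records, so nothing is printed.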

User · Answer

Try sdiff (man sdiff):

    sdiff -s file1 file2

User · Answer

You can also use:

    sdiff file1 file2

to display differences side by side within your terminal.

User · Answer

Here is my solution for this:

    mkdir temp
    mkdir results
    cp /usr/share/dict/american-english ~/temp/american-english-dictionary
    cp /usr/share/dict/british-english ~/temp/british-english-dictionary
    cat ~/temp/american-english-dictionary | wc -l > ~/results/count-american-english-dictionary
    cat ~/temp/british-english-dictionary | wc -l > ~/results/count-british-english-dictionary
    grep -Fxf ~/temp/american-english-dictionary ~/temp/british-english-dictionary > ~/results/common-english
    grep -Fxvf ~/results/common-english ~/temp/american-english-dictionary > ~/results/unique-american-english
    grep -Fxvf ~/results/common-english ~/temp/british-english-dictionary > ~/results/unique-british-english
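
For the original question (words in a.txt that are not in b.txt), the intermediate "common" file may not be needed: grep can filter a.txt against b.txt directly. A minimal sketch with made-up sample words and a scratch directory:

```shell
workdir=$(mktemp -d)
printf '%s\n' color colour center > "$workdir/a.txt"
printf '%s\n' colour centre > "$workdir/b.txt"

# -F: fixed strings, -x: match whole lines, -f FILE: read patterns from FILE,
# -v: invert, so this prints lines of a.txt that match no line of b.txt.
# grep exits with status 1 when it selects nothing, hence the '|| true'.
only_in_a=$(grep -Fxvf "$workdir/b.txt" "$workdir/a.txt" || true)
common=$(grep -Fxf "$workdir/b.txt" "$workdir/a.txt" || true)

printf 'Only in a.txt:\n%s\n' "$only_in_a"
printf 'In both:\n%s\n' "$common"
```

Like the awk approach, this needs no sorting and keeps the order of a.txt, but duplicate lines in a.txt are printed as many times as they occur.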

User · Answer

Sort them and use comm:

    comm -23 <(sort a.txt) <(sort b.txt)

comm compares (sorted) input files and by default outputs three columns: lines that are unique to a, lines that are unique to b, and lines that are present in both. By specifying -1, -2 and/or -3 you can suppress the corresponding output. Therefore comm -23 a b lists only the entries that are unique to a. I use the <(...) syntax to sort the files on the fly; if they are already sorted, you don't need this.
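
To make the column flags concrete, here is a small sketch that runs all three selections on the same pair of already-sorted files. The file names and words are made up for illustration, and `LC_ALL=C` is an assumption so that comm uses plain byte ordering.

```shell
workdir=$(mktemp -d)
printf '%s\n' apple banana cherry > "$workdir/a.sorted"
printf '%s\n' banana cherry date > "$workdir/b.sorted"

# Suppressing two of the three columns leaves exactly one set.
only_a=$(LC_ALL=C comm -23 "$workdir/a.sorted" "$workdir/b.sorted")   # unique to a
only_b=$(LC_ALL=C comm -13 "$workdir/a.sorted" "$workdir/b.sorted")   # unique to b
both=$(LC_ALL=C comm -12 "$workdir/a.sorted" "$workdir/b.sorted")     # present in both

printf 'Only in a: %s\n' "$only_a"
printf 'Only in b: %s\n' "$only_b"
printf 'In both:\n%s\n' "$both"
```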
