Find duplicate lines in a file and count how many times each line was duplicated

Question

Suppose I have a file similar to the following:

    123
    123
    234
    234
    123
    345

I would like to find how many times 123 was duplicated, how many times 234 was duplicated, etc. So ideally, the output would be like:

    123  3
    234  2
    345  1

User · Answer

Assuming you've got access to a standard Unix shell and/or Cygwin environment:

    tr -s ' ' '\n' < yourfile | sort | uniq -d -c
           ^-- space char

Basically: convert all space characters to line breaks, then sort the translated output, feed that to uniq, and count the duplicate lines.

User · Answer

This will print duplicate lines only, with counts:

    sort FILE | uniq -cd

or, with GNU long options (on Linux):

    sort FILE | uniq --count --repeated

On BSD and OSX you have to use grep to filter out unique lines:

    sort FILE | uniq -c | grep -v '^ *1 '

For the given example, the result would be:

      3 123
      2 234

If you want to print counts for all lines, including those that appear only once:

    sort FILE | uniq -c

or, with GNU long options (on Linux):

    sort FILE | uniq --count

For the given input, the output is:

      3 123
      2 234
      1 345

In order to sort the output with the most frequent lines on top, you can do the following (to get all results):

    sort FILE | uniq -c | sort -nr

or, to get only duplicate lines, most frequent first:

    sort FILE | uniq -cd | sort -nr

On OSX and BSD the final one becomes:

    sort FILE | uniq -c | grep -v '^ *1 ' | sort -nr
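As a sketch of the grep-based variant, the "more than once" filter can also be written as a numeric test in awk, which compares the count as a number instead of matching its text. The helper name count_dups and the temporary sample file are illustrative assumptions, not part of the answer above.

```shell
#!/bin/sh
# count_dups (hypothetical helper): print "count line" for every line that
# occurs more than once in the file given as $1, most frequent first.
count_dups() {
    # uniq -c prefixes each distinct line with its count; awk keeps only
    # rows whose first field (the count) is greater than 1.
    sort "$1" | uniq -c | awk '$1 > 1' | sort -nr
}

# Demo on a temporary file holding the sample input from the question.
sample=$(mktemp)
printf '%s\n' 123 123 234 234 123 345 > "$sample"
count_dups "$sample"
rm -f "$sample"
```

On the sample input this lists 123 with a count of 3 and 234 with a count of 2; the width of the padding before each count depends on the uniq implementation.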

User · Answer

To get the counts in the format you asked for (line first, then count), use the following command:

    sort filename | uniq -c | awk '{print $2, $1}'
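Note that printing $2 only keeps the first word of each line. A minimal sketch that keeps lines containing spaces intact could strip the count prefix instead; the helper name line_then_count is a hypothetical choice, and the regular expression assumes that uniq -c writes optional padding, the count, and a single space before the line.

```shell
#!/bin/sh
# line_then_count (hypothetical helper): print each distinct line of the
# file given as $1 followed by its count, keeping embedded spaces intact.
line_then_count() {
    sort "$1" | uniq -c | awk '{
        count = $1                  # first field is the count from uniq -c
        sub(/^[ \t]*[0-9]+ /, "")   # strip the padded count and one space
        print $0, count
    }'
}

# Demo on a temporary file holding the sample input from the question.
sample=$(mktemp)
printf '%s\n' 123 123 234 234 123 345 > "$sample"
line_then_count "$sample"
rm -f "$sample"
```

For the question's input this prints "123 3", "234 2" and "345 1", one per line.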

User · Answer

Via awk:

    awk '{dups[$1]++} END{for (num in dups) {print num,dups[num]}}' data

In the dups[$1]++ statement, the variable $1 holds the entire contents of the first column, and the square brackets are array access. So, for the first column of each line in the data file, the matching element of the array named dups is incremented.

At the end, we loop over the dups array with num as the loop variable and print each saved value first, followed by its number of occurrences, dups[num].

Note that your input file has spaces at the end of some lines; if you clean those up, you can use $0 in place of $1 in the command above.
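One way to handle the trailing spaces without editing the file is to trim them inside awk before counting whole lines. This is only a sketch: the helper name count_trimmed is hypothetical, and the final sort is there because awk does not guarantee any particular order for a for-in loop over an array.

```shell
#!/bin/sh
# count_trimmed (hypothetical helper): count whole lines of the file given
# as $1, ignoring trailing spaces and tabs, and print "line count" pairs.
count_trimmed() {
    awk '{
        sub(/[ \t]+$/, "")   # drop trailing spaces and tabs
        dups[$0]++           # use the whole trimmed line as the array key
    }
    END {
        for (line in dups) print line, dups[line]
    }' "$1" | sort
}

# Demo: the first line carries a trailing space but is still counted as 123.
sample=$(mktemp)
printf '123 \n123\n234\n123\n' > "$sample"
count_trimmed "$sample"
rm -f "$sample"
```

Because the key is $0 and not $1, lines that contain several words are counted as a whole.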

User · Answer

To find and count duplicate lines in multiple files, you can try the following command:

    sort <files> | uniq -c | sort -nr

or:

    cat <files> | sort | uniq -c | sort -nr
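A small sketch of the first form, wrapped in a function so that any number of files can be passed. The helper name count_across and the files a.txt and b.txt are made up for the demo. It relies on sort accepting several file operands and sorting them together, which is why cat is not needed.

```shell
#!/bin/sh
# count_across (hypothetical helper): count identical lines across all
# files given as arguments, most frequent first.
count_across() {
    sort "$@" | uniq -c | sort -nr
}

# Demo with two throwaway files in a temporary directory.
tmpdir=$(mktemp -d)
printf '%s\n' x y > "$tmpdir/a.txt"
printf '%s\n' x z x y > "$tmpdir/b.txt"
count_across "$tmpdir/a.txt" "$tmpdir/b.txt"
rm -rf "$tmpdir"
```

In the demo, x appears three times across both files, y twice and z once, so x is listed first.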

User · Answer

On Windows, using Windows PowerShell, I used the command below to achieve this:

    Get-Content .\file.txt | Group-Object | Select Name, Count

We can also use the Where-Object cmdlet to filter the result:

    Get-Content .\file.txt | Group-Object | Where-Object { $_.Count -gt 1 } | Select Name, Count

User · Answer

Assuming there is one number per line:

    sort <file> | uniq -c

You can use the more verbose --count flag too with the GNU version, e.g., on Linux:

    sort <file> | uniq --count
