How to count number of unique values of a field in a tab-delimited text file

Question

I have a text file with a large amount of data which is tab delimited. I want to look at the data so that I can see the unique values in a column. For example:

    Red     Ball 1 Sold
    Blue    Bat  5 OnSale

So the first column has colors, and I want to know how many different unique values there are in that column, and I want to be able to do that for each column. I need to do this on a Linux command line, so probably using some bash script, sed, awk or something.

What if I wanted a count of these unique values as well?

Update: I guess I didn't put the second part clearly enough. What I wanted is a count of "each" of these unique values, not how many unique values there are. For instance, in the first column I want to know how many Red, Blue, Green etc. coloured objects there are.

User · Answer

Assuming the data file is actually tab separated, not space aligned:

    <test.tsv awk '{print $4}' | sort | uniq

Where $4 will be:

    $1 - Red
    $2 - Ball
    $3 - 1
    $4 - Sold

User · Answer

You can use awk, sort & uniq to do this. For example, to list all the unique values in the first column:

    awk < test.txt '{print $1}' | sort | uniq

As posted elsewhere, if you want to count how many unique values there are, you can pipe the unique list into wc -l.
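Piping into wc -l gives the number of distinct values, while the question's update asks for the count of each value. A minimal sketch of the difference, assuming a small made-up sample file (the temporary file and its rows are illustrative, not from the question's real data):

```shell
# Hypothetical sample data: tab separated, modelled on the question's example.
sample=$(mktemp)
printf 'Red\tBall\t1\tSold\nBlue\tBat\t5\tOnSale\nRed\tBat\t2\tSold\n' > "$sample"

# How many distinct values column 1 holds (tr strips the padding some wc versions add).
distinct=$(awk -F '\t' '{print $1}' "$sample" | sort | uniq | wc -l | tr -d ' ')

# How often each value occurs in column 1, most frequent first.
per_value=$(awk -F '\t' '{print $1}' "$sample" | sort | uniq -c | sort -rn)

echo "distinct values: $distinct"
echo "$per_value"
```

uniq only collapses adjacent duplicate lines, which is why the sort before it is needed in both pipelines.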

User · Answer

Here is a bash script that fully answers the (revised) original question. That is, given any .tsv file, it provides the synopsis for each of the columns in turn. Apart from bash itself, it only uses standard *ix/Mac tools: sed tr wc cut sort uniq.

    #!/bin/bash
    # Syntax: $0 filename
    # The input is assumed to be a .tsv file

    FILE="$1"

    # Count the tabs in the first line; columns = tabs + 1, so the
    # loop bound below (exclusive) is tabs + 2.
    cols=$(sed -n 1p "$FILE" | tr -cd '\t' | wc -c)
    cols=$((cols + 2))
    i=0
    for ((i=1; i < cols; i++))
    do
      echo "Column $i ::"
      cut -f "$i" < "$FILE" | sort | uniq -c
      echo
    done

User · Answer

You can make use of the cut, sort and uniq commands as follows:

    cat input_file | cut -f 1 | sort | uniq

This gets the unique values in field 1; replacing 1 by 2 will give you the unique values in field 2.

Avoiding UUOC :)

    cut -f 1 input_file | sort | uniq

EDIT:

To count the number of unique occurrences you can add the wc command to the chain:

    cut -f 1 input_file | sort | uniq | wc -l

User · Answer

    # COLUMN is integer column number
    # INPUT_FILE is input file name
    cut -f "${COLUMN}" < "${INPUT_FILE}" | sort -u | wc -l
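If the file starts with a header line, the one-liner above counts the header as a value too. The following is a sketch of one way to wrap it, assuming a hypothetical helper name count_distinct and a made-up sample file; the optional third argument is the number of leading lines to skip.

```shell
# Hypothetical sample data with a header row (tab separated).
sample=$(mktemp)
printf 'colour\titem\nRed\tBall\nBlue\tBat\nRed\tBat\n' > "$sample"

# count_distinct FILE COLUMN [SKIP]
# Prints the number of distinct values in COLUMN of FILE.
# SKIP is the number of leading header lines to ignore (default 0).
count_distinct() {
  tail -n +"$(( ${3:-0} + 1 ))" "$1" | cut -f "$2" | sort -u | wc -l | tr -d ' '
}

count_distinct "$sample" 1      # header line is counted as a value
count_distinct "$sample" 1 1    # header line is skipped
```

With this sample the first call prints 3 and the second prints 2.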

User · Answer

    awk -F '\t' '{ a[$1]++ } END { for (n in a) print n, a[n] }' test.csv
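awk visits the keys of an array in an unspecified order, so the output of the command above is not sorted. A sketch of the same counting idea with the column number passed in and the result ordered by frequency; the function name count_column and the sample rows are assumptions made for illustration:

```shell
# Hypothetical sample data: tab separated, modelled on the question's example.
sample=$(mktemp)
printf 'Red\tBall\t1\tSold\nBlue\tBat\t5\tOnSale\nRed\tBat\t2\tSold\n' > "$sample"

# count_column FILE COLUMN
# Prints "value<TAB>count" for each distinct value in COLUMN,
# highest count first, ties broken alphabetically by value.
count_column() {
  awk -F '\t' -v col="$2" '
    { count[$col]++ }
    END { for (v in count) printf "%s\t%d\n", v, count[v] }
  ' "$1" | sort -t "$(printf '\t')" -k2,2nr -k1,1
}

count_column "$sample" 1
```

Counting inside awk avoids sorting the whole input; only the much shorter list of distinct values goes through sort.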

User · Answer

This script outputs the count of each unique value in each column of a given file. It assumes that the first line of the file is a header line. There is no need to define the number of fields. Simply save the script in a bash file (.sh) and provide the tab-delimited file as a parameter to this script.

Code

    #!/bin/bash

    # Note: the nested arrays (arr[...][...]) require GNU awk 4.0 or later.
    awk '
    (NR==1){
        # Header line: remember the name of each column.
        for(fi=1; fi<=NF; fi++)
            fname[fi]=$fi;
    }
    (NR!=1){
        # Data line: count each value under its column name.
        for(fi=1; fi<=NF; fi++)
            arr[fname[fi]][$fi]++;
    }
    END{
        # One output line per column: name, then value_count pairs.
        for(fi=1; fi<=NF; fi++){
            out=fname[fi];
            for (item in arr[fname[fi]])
                out=out"\t"item"_"arr[fname[fi]][item];
            print(out);
        }
    }
    ' "$1"

Execution Example:

    bash> ./script.sh <path to tab-delimited file>

Output Example:

    isRef    A_15      C_42     G_24     T_18
    isCar    YEA_10    NO_40    NA_50
    isTv     FALSE_33  TRUE_66
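The script above relies on arrays of arrays, which are a GNU awk extension. For systems whose awk is not gawk, the same summary could be sketched with a single array keyed by column index and value joined with SUBSEP. This is an alternative under that assumption, not the answer author's method; the function name summarize_columns and the sample file are hypothetical.

```shell
# Hypothetical sample data with a header row (tab separated).
sample=$(mktemp)
printf 'colour\titem\nRed\tBall\nBlue\tBat\nRed\tBat\n' > "$sample"

# summarize_columns FILE
# Prints one line per column: header name, then value_count pairs.
summarize_columns() {
  awk -F '\t' '
    NR == 1 {
      # Header line: remember column names and the column count.
      for (i = 1; i <= NF; i++) name[i] = $i
      ncols = NF
      next
    }
    {
      # Composite key "column index SUBSEP value" replaces the nested array.
      for (i = 1; i <= NF; i++) count[i SUBSEP $i]++
    }
    END {
      for (i = 1; i <= ncols; i++) {
        out = name[i]
        for (key in count) {
          split(key, part, SUBSEP)
          if (part[1] == i) out = out "\t" part[2] "_" count[key]
        }
        print out
      }
    }
  ' "$1"
}

summarize_columns "$sample"
```

The order of the value_count pairs within a line is unspecified, as in the original script, because it follows awk's array traversal order.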
