[perl] How do I remove duplicate items from an array in Perl?

I have an array in Perl:

my @my_array = ("one","two","three","two","three");

How do I remove the duplicates from the array?

This question is related to perl arrays unique duplicates

The answer is


Try this, seems the uniq function needs a sorted list to work properly.

use strict;

# Helper function to remove duplicates in a list.
sub uniq {
  my %seen;
  grep !$seen{$_}++, @_;
}

my @teststrings = ("one", "two", "three", "one");

my @filtered = uniq @teststrings;
print "uniq: @filtered\n";
my @sorted = sort @teststrings;
print "sort: @sorted\n";
my @sortedfiltered = uniq sort @teststrings;
print "uniq sort : @sortedfiltered\n";

Method 1: Use a hash

Logic: A hash can have only unique keys, so iterate over array, assign any value to each element of array, keeping element as key of that hash. Return keys of the hash, its your unique array.

my @unique = keys {map {$_ => 1} @array};

Method 2: Extension of method 1 for reusability

Better to make a subroutine if we are supposed to use this functionality multiple times in our code.

sub get_unique {
    my %seen;
    grep !$seen{$_}++, @_;
}
my @unique = get_unique(@array);

Method 3: Use module List::MoreUtils

use List::MoreUtils qw(uniq);
my @unique = uniq(@array);

That last one was pretty good. I'd just tweak it a bit:

my @arr;
my @uniqarr;

foreach my $var ( @arr ){
  if ( ! grep( /$var/, @uniqarr ) ){
     push( @uniqarr, $var );
  }
}

I think this is probably the most readable way to do it.


The variable @array is the list with duplicate elements

%seen=();
@unique = grep { ! $seen{$_} ++ } @array;

The Perl documentation comes with a nice collection of FAQs. Your question is frequently asked:

% perldoc -q duplicate

The answer, copy and pasted from the output of the command above, appears below:

Found in /usr/local/lib/perl5/5.10.0/pods/perlfaq4.pod
 How can I remove duplicate elements from a list or array?
   (contributed by brian d foy)

   Use a hash. When you think the words "unique" or "duplicated", think
   "hash keys".

   If you don't care about the order of the elements, you could just
   create the hash then extract the keys. It's not important how you
   create that hash: just that you use "keys" to get the unique elements.

       my %hash   = map { $_, 1 } @array;
       # or a hash slice: @hash{ @array } = ();
       # or a foreach: $hash{$_} = 1 foreach ( @array );

       my @unique = keys %hash;

   If you want to use a module, try the "uniq" function from
   "List::MoreUtils". In list context it returns the unique elements,
   preserving their order in the list. In scalar context, it returns the
   number of unique elements.

       use List::MoreUtils qw(uniq);

       my @unique = uniq( 1, 2, 3, 4, 4, 5, 6, 5, 7 ); # 1,2,3,4,5,6,7
       my $unique = uniq( 1, 2, 3, 4, 4, 5, 6, 5, 7 ); # 7

   You can also go through each element and skip the ones you've seen
   before. Use a hash to keep track. The first time the loop sees an
   element, that element has no key in %Seen. The "next" statement creates
   the key and immediately uses its value, which is "undef", so the loop
   continues to the "push" and increments the value for that key. The next
   time the loop sees that same element, its key exists in the hash and
   the value for that key is true (since it's not 0 or "undef"), so the
   next skips that iteration and the loop goes to the next element.

       my @unique = ();
       my %seen   = ();

       foreach my $elem ( @array )
       {
         next if $seen{ $elem }++;
         push @unique, $elem;
       }

   You can write this more briefly using a grep, which does the same
   thing.

       my %seen = ();
       my @unique = grep { ! $seen{ $_ }++ } @array;

Can be done with a simple Perl one liner.

my @in=qw(1 3 4  6 2 4  3 2 6  3 2 3 4 4 3 2 5 5 32 3); #Sample data 
my @out=keys %{{ map{$_=>1}@in}}; # Perform PFM
print join ' ', sort{$a<=>$b} @out;# Print data back out sorted and in order.

The PFM block does this:

Data in @in is fed into MAP. MAP builds an anonymous hash. Keys are extracted from the hash and feed into @out


Previous answers pretty much summarize the possible ways of accomplishing this task.

However, I suggest a modification for those who don't care about counting the duplicates, but do care about order.

my @record = qw( yeah I mean uh right right uh yeah so well right I maybe );
my %record;
print grep !$record{$_} && ++$record{$_}, @record;

Note that the previously suggested grep !$seen{$_}++ ... increments $seen{$_} before negating, so the increment occurs regardless of whether it has already been %seen or not. The above, however, short-circuits when $record{$_} is true, leaving what's been heard once 'off the %record'.

You could also go for this ridiculousness, which takes advantage of autovivification and existence of hash keys:

...
grep !(exists $record{$_} || undef $record{$_}), @record;

That, however, might lead to some confusion.

And if you care about neither order or duplicate count, you could for another hack using hash slices and the trick I just mentioned:

...
undef @record{@record};
keys %record; # your record, now probably scrambled but at least deduped

Install List::MoreUtils from CPAN

Then in your code:

use strict;
use warnings;
use List::MoreUtils qw(uniq);

my @dup_list = qw(1 1 1 2 3 4 4);

my @uniq_list = uniq(@dup_list);

Using concept of unique hash keys :

my @array  = ("a","b","c","b","a","d","c","a","d");
my %hash   = map { $_ => 1 } @array;
my @unique = keys %hash;
print "@unique","\n";

Output: a c b d


My usual way of doing this is:

my %unique = ();
foreach my $item (@myarray)
{
    $unique{$item} ++;
}
my @myuniquearray = keys %unique;

If you use a hash and add the items to the hash. You also have the bonus of knowing how many times each item appears in the list.


Examples related to perl

The program can't start because api-ms-win-crt-runtime-l1-1-0.dll is missing while starting Apache server on my computer "End of script output before headers" error in Apache Perl - Multiple condition if statement without duplicating code? How to decrypt hash stored by bcrypt Split a string into array in Perl Turning multiple lines into one comma separated line String compare in Perl with "eq" vs "==" how to remove the first two columns in a file using shell (awk, sed, whatever) Find everything between two XML tags with RegEx Difference between \w and \b regular expression meta characters

Examples related to arrays

PHP array value passes to next row Use NSInteger as array index How do I show a message in the foreach loop? Objects are not valid as a React child. If you meant to render a collection of children, use an array instead Iterating over arrays in Python 3 Best way to "push" into C# array Sort Array of object by object field in Angular 6 Checking for duplicate strings in JavaScript array what does numpy ndarray shape do? How to round a numpy array?

Examples related to unique

Count unique values with pandas per groups Find the unique values in a column and then sort them How can I check if the array of objects have duplicate property values? Firebase: how to generate a unique numeric ID for key? pandas unique values multiple columns Select unique values with 'select' function in 'dplyr' library Generate 'n' unique random numbers within a range SQL - select distinct only on one column Can I use VARCHAR as the PRIMARY KEY? Count unique values in a column in Excel

Examples related to duplicates

Remove duplicates from dataframe, based on two columns A,B, keeping row with max value in another column C Remove duplicates from a dataframe in PySpark How to "select distinct" across multiple data frame columns in pandas? How to find duplicate records in PostgreSQL Drop all duplicate rows across multiple columns in Python Pandas Left Join without duplicate rows from left table Finding duplicate integers in an array and display how many times they occurred How do I use SELECT GROUP BY in DataTable.Select(Expression)? How to delete duplicate rows in SQL Server? Python copy files to a new directory and rename if file name already exists