What's the most efficient way to erase duplicates and sort a vector?

Question

I need to take a C++ vector with potentially a lot of elements, erase duplicates, and sort it. I currently have the below code, but it doesn't work.

    vec.erase(
          std::unique(vec.begin(), vec.end()),
          vec.end());
    std::sort(vec.begin(), vec.end());

How can I correctly do this? Additionally, is it faster to erase the duplicates first (similar to coded above) or perform the sort first? If I do perform the sort first, is it guaranteed to remain sorted after std::unique is executed? Or is there another (perhaps more efficient) way to do all this?

User · Answer

unique only removes consecutive duplicate elements (which is necessary for it to run in linear time), so you should perform the sort first. It will remain sorted after the call to unique.
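To make the ordering point concrete, here is a minimal sketch that applies unique+erase with and without a preceding sort. The helper names uniqueOnly and sortThenUnique are hypothetical and exist only for this illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper: unique+erase WITHOUT sorting first.
// Only adjacent duplicates are collapsed, so repeated values that are
// not neighbours survive.
std::vector<int> uniqueOnly(std::vector<int> v) {
    v.erase(std::unique(v.begin(), v.end()), v.end());
    return v;
}

// Hypothetical helper: sort first, then unique+erase.
// After the sort all equal values are adjacent, so every duplicate goes.
std::vector<int> sortThenUnique(std::vector<int> v) {
    std::sort(v.begin(), v.end());
    v.erase(std::unique(v.begin(), v.end()), v.end());
    assert(std::is_sorted(v.begin(), v.end()));  // unique keeps relative order
    return v;
}
```

For the input {1, 2, 1, 1, 2}, uniqueOnly only drops the second of the two adjacent 1s, while sortThenUnique leaves exactly one copy of each value.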

User · Answer

Here's a template to do it for you:

    template<typename T>
    void removeDuplicates(std::vector<T>& vec)
    {
        std::sort(vec.begin(), vec.end());
        vec.erase(std::unique(vec.begin(), vec.end()), vec.end());
    }

call it like:

    removeDuplicates<int>(vectorname);

User · Answer

std::unique only works on consecutive runs of duplicate elements, so you'd better sort first. However, it is stable, so your vector will remain sorted.

User · Answer

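    void removeDuplicates(std::vector<int>& arr) {
        // Nothing to do for an empty vector (arr[0] below would be invalid).
        if (arr.empty())
            return;

        // Simple O(n^2) exchange sort into ascending order.
        for (std::size_t i = 0; i < arr.size(); i++)
        {
            for (std::size_t j = i + 1; j < arr.size(); j++)
            {
                if (arr[i] > arr[j])
                {
                    int temp = arr[i];
                    arr[i] = arr[j];
                    arr[j] = temp;
                }
            }
        }

        // Copy one representative of each run of equal values into y.
        std::vector<int> y;
        int x = arr[0];
        std::size_t i = 0;
        while (i < arr.size())
        {
            if (x != arr[i])
            {
                y.push_back(x);
                x = arr[i];
            }
            i++;
            if (i == arr.size())
                y.push_back(arr[i - 1]);
        }
        arr = y;
    }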

User · Answer

Efficiency is a complicated concept. There's time vs. space considerations, as well as general measurements (where you only get vague answers such as O(n)) vs. specific ones (e.g. bubble sort can be much faster than quicksort, depending on input characteristics).

If you have relatively few duplicates, then sort followed by unique and erase seems the way to go. If you had relatively many duplicates, creating a set from the vector and letting it do the heavy lifting could easily beat it.

Don't just concentrate on time efficiency either. Sort+unique+erase operates in O(1) space, while the set construction operates in O(n) space. And neither directly lends itself to a map-reduce parallelization (for really huge datasets).

User · Answer

As already stated, unique requires a sorted container. Additionally, unique doesn't actually remove elements from the container. Instead, the elements to keep are shifted toward the front, and unique returns an iterator to the new logical end of the range; the elements from that iterator onward are left in a valid but unspecified state, and you are expected to call erase to actually remove them.
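A small sketch of this behaviour: the container's size does not change until erase is called. The struct UniqueReport and the function inspectUnique are hypothetical names used only for this illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical record of what happens around std::unique.
struct UniqueReport {
    std::size_t sizeBeforeErase;  // v.size() right after std::unique
    std::size_t uniqueCount;      // distance from begin() to the returned iterator
    std::size_t sizeAfterErase;   // v.size() after erase(newEnd, end())
};

UniqueReport inspectUnique(std::vector<int> v) {
    std::sort(v.begin(), v.end());
    auto newEnd = std::unique(v.begin(), v.end());

    UniqueReport report;
    report.sizeBeforeErase = v.size();  // still the original size
    report.uniqueCount =
        static_cast<std::size_t>(std::distance(v.begin(), newEnd));
    assert(report.uniqueCount <= report.sizeBeforeErase);

    v.erase(newEnd, v.end());           // this is what actually shrinks v
    report.sizeAfterErase = v.size();
    return report;
}
```

For an input such as {3, 1, 3, 2, 1}, the size stays at 5 after std::unique, the returned iterator marks 3 unique elements, and only the erase call brings the size down to 3.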

User · Answer

std::unique only removes duplicate elements if they're neighbours: you have to sort the vector first before it will work as you intend. std::unique is defined to be stable, so the vector will still be sorted after running unique on it.

User · Answer

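    std::set<int> s;
    std::for_each(v.cbegin(), v.cend(), [&s](int val){ s.insert(val); });
    v.clear();
    // v is empty at this point, so append through a back_inserter (needs <iterator>)
    std::copy(s.cbegin(), s.cend(), std::back_inserter(v));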

User · Answer

You need to sort it before you call unique because unique only removes duplicates that are next to each other.

Edit: 38 seconds...

User · Answer

If you are looking for performance and using std::vector, I recommend the one that this documentation link provides.

    std::vector<int> myvector{10,20,20,20,30,30,20,20,10};             // 10 20 20 20 30 30 20 20 10
    std::sort(myvector.begin(), myvector.end());
    const auto& it = std::unique(myvector.begin(), myvector.end());   // 10 20 30, followed by unspecified values
    myvector.resize(std::distance(myvector.begin(), it));             // 10 20 30

User · Answer

I'm not sure what you are using this for, so I can't say this with 100% certainty, but normally when I think "sorted, unique" container, I think of a std::set. It might be a better fit for your usecase:

    std::set<Foo> foos(vec.begin(), vec.end()); // both sorted & unique already

Otherwise, sorting prior to calling unique (as the other answers pointed out) is the way to go.

User · Answer

The standard approach suggested by Nate Kohl  just using vector  sort   unique   sort  vec begin    vec end      vec erase  unique  vec begin    vec end      vec end        doesn t work for a vector of pointers   Look carefully at this example on cplusplus com   In their example  the  so called duplicates  moved to the end are actually shown as    undefined values   because those  so called duplicates  are SOMETIMES  extra elements  and SOMETIMES there are  missing elements  that were in the original vector   A problem occurs when using std  unique   on a vector of pointers to objects  memory leaks  bad read of data from HEAP   duplicate frees  which cause segmentation faults  etc    Here s my solution to the problem  replace std  unique   with ptgi  unique     See the file ptgi unique hpp below      ptgi  unique         Fix a problem in std  unique    such that none of the original elts in the collection are lost or duplicate     ptgi  unique   has the same interface as std  unique         There is the 2 argument version which calls the default operator   to compare elements        There is the 3 argument version  which you can pass a user defined functor for specialized comparison        ptgi  unique   is an improved version of std  unique   which doesn t looose any of the original data    in the collection  nor does it create duplicates        After ptgi  unique    every old element in the original collection is still present in the re-ordered collection     except that duplicates have been moved to a contiguous range  dupPosition  last  at the end        Thus on output       begin  dupPosition  range are unique elements       dupPosition  last  range are duplicates which can be removed     where         means inclusive  and        means exclusive        In the original std  unique   non-duplicates at end are moved downward toward beginning     In the improved ptgi unique    non-duplicates at end are swapped with duplicates near beginning        In addition if 
you have a collection of ptrs to objects  the regular std  unique   will loose memory     and can possibly delete the same pointer multiple times  leading to SEGMENTATION VIOLATION on Linux machines     but ptgi  unique   won t   Use valgrind 1  to find such memory leak problems          NOTE  IF you have a vector of pointers  that is  std  vector lt Object  gt   then upon return from ptgi  unique      you would normally do the following to get rid of the duplicate objects in the HEAP            delete objects from HEAP     std  vector lt Object  gt  objects      for  iter   dupPosition  iter    objects end      iter                delete   iter                   shrink the vector  But Object   pointers are NOT followed for duplicate deletes  this shrinks the vector size        objects erase dupPosition  objects end          NOTE  But if you have a vector of objects  that is  std  vector lt Object gt   then upon return from ptgi  unique    it    suffices to just call vector erase   as erase will automatically call delete on each object in the     dupPosition  end  range for you         std  vector lt Object gt  objects      objects erase dupPosition  last                                                                                                                      Example of differences between std  unique   vs ptgi  unique           Given          int data      10  11  21          Given this functor  ArrayOfIntegersEqualByTen          A functor which compares two integers a i  and a j  in an int a   array  after division by 10              given an int data   array  remove consecutive duplicates from it         functor used for std  unique  BUGGY  or ptgi  unique IMPROVED            Two numbers equal if  when divided by 10  integer division   the quotients are the same         Hence 50  59 are equal  60  69 are equal  etc      struct ArrayOfIntegersEqualByTen  public std  equal to lt int gt                bool operator    const int amp  arg1  const int amp  
arg2  const                       return   arg1 10      arg2 10                              Now  if we call  problematic  std  unique  data  data 3  ArrayOfIntegersEqualByTen               TEST1  BEFORE UNIQ  10 11 21     TEST1  AFTER UNIQ  10 21 21     DUP INX 2              PROBLEM  11 is lost  and extra 21 has been added           More complicated example           TEST2  BEFORE UNIQ  10 20 21 22 30 31 23 24 11     TEST2  AFTER UNIQ  10 20 30 23 11 31 23 24 11     DUP INX 5              Problem  21 and 22 are deleted          Problem  11 and 23 are duplicated                NOW if ptgi  unique is called instead of std  unique  both problems go away           DEBUG  TEST1  NEW WAY 1     TEST1  BEFORE UNIQ  10 11 21     TEST1  AFTER UNIQ  10 21 11     DUP INX 2          DEBUG  TEST2  NEW WAY 1     TEST2  BEFORE UNIQ  10 20 21 22 30 31 23 24 11     TEST2  AFTER UNIQ  10 20 30 23 11 31 22 24 21     DUP INX 5         SEE  look at the  case study  below to understand which the last  AFTER UNIQ  results with that order      TEST2  AFTER UNIQ  10 20 30 23 11 31 22 24 21                                                                                                                    Case Study  how ptgi  unique   works      Remember we  remove adjacent duplicates       In this example  the input is NOT fully sorted when ptgi unique   is called         I put   separatators  BEFORE UNIQ to illustrate this     10    20 21 22    30 31    23 24   11        In example above  20  21  22 are  same  since dividing by 10 gives 2 quotient      And 30 31 are  same   since  10 quotient is 3      And 23  24 are same  since  10 quotient is 2      And 11 is  group of one  by itself      So there are 5 groups  but the 4th group  23  24  happens to be equal to group 2  20  21  22      So there are 5 groups  and the 5th group  11  is equal to group 1  10         R   result     F   first        10  20  21  22  30  31  23  24  11     R    F        10 is result  and first points to 20  and 
R    F  10    20  so bump R           R          F        Now we hits the  optimized out swap logic        avoid swap because R    F            now bump F until R    F  integer division by 10      10  20  21  22  30  31  23  24  11          R   F                 20    21 in 10x          R       F                 20    22 in 10x          R           F             20    30  so we do a swap of   R and F      Now first hits 21  22  then finally 30  which is different than R  so we swap bump R to 21 and swap with  30      10  20  30  22  21  31  23  24  11     after R  amp  F swap  21 and 30               R       F         10  20  30  22  21  31  23  24  11              R          F              bump F to 31  but R and F are same  30 vs 31               R               F         bump F to 23  R    F  so swap   R with F     10  20  30  22  21  31  23  24  11                     R           F          bump R to 22     10  20  30  23  21  31  22  24  11     after the R  amp  F swap  22  amp  23 swap                      R            F         will swap 22 and 23                     R                F         bump F to 24  but R and F are same in 10x                     R                    F     bump F  R    F  so swap   R  with F                         R                F     R and F are diff  so swap   R  with F  21 and 11      10  20  30  23  11  31  22  24  21                         R                F     aftter swap of old 21 and 11                         R                  F       F now at last    so loop terminates                             R               F      bump R by 1 to point to dupPostion  first duplicate in range         return R which now points to 31                                                                                                                 NOTES     1  the  ifdef IMPROVED STD UNIQUE ALGORITHM documents how we have modified the original std  unique       2  I ve heavily unit tested this code  including using valgrind 1   and it is 
 believed  to be 100  defect-free                                                                                                                     History      130201  dpb dbednar ptgi com created                                                                                                                ifndef PTGI UNIQUE HPP  define PTGI UNIQUE HPP     Created to solve memory leak problems when calling std  unique   on a vector lt Route  gt      Memory leaks discovered with valgrind and unitTesting     include  lt algorithm gt            std  swap     instead of std  myUnique  call this instead  where arg3 is a function ptr       like std  unique  it puts the dups at the end  but it uses swapping to preserve original    vector contents  to avoid memory leaks and duplicate pointers in vector lt Object  gt     ifdef IMPROVED STD UNIQUE ALGORITHM  error the  ifdef for IMPROVED STD UNIQUE ALGORITHM was defined previously   Something is wrong   endif   undef IMPROVED STD UNIQUE ALGORITHM  define IMPROVED STD UNIQUE ALGORITHM     similar to std  unique  except that this version swaps elements  to avoid    memory leaks  when vector contains pointers        Normally the input is sorted     Normal std  unique     10 20 20 20 30   30 20 20 10    a  b  c  d  e    f  g  h  i       10 20 30 20 10   30 20 20 10    a  b  e  g  i    f  g  h  i       Now GONE  c  d     Now DUPS  g  i     This causes memory leaks and segmenation faults due to duplicate deletes of same pointer    namespace ptgi       Return the position of the first in range of duplicates moved to end of vector        uses operator    of class for comparison        param  first  last  is a range to find duplicates within         return the dupPosition position  such that  dupPosition  end  are contiguous    duplicate elements     IF all items are unique  then it would return last     template  lt class ForwardIterator gt  ForwardIterator unique  ForwardIterator first  ForwardIterator last           compare 
iterators  not values     if  first    last          return last          remember the current item that we are looking at for uniqueness     ForwardIterator result   first          result is slow ptr where to store next unique item        first is  fast ptr which is looking at all elts         the first iterator moves over all elements  begin 1  end          while the current item  result  is the same as all elts        to the right   first  keeps going  until you find a different        element pointed to by  first   At that time  we swap them       while    first    last                if     result     first              ifdef IMPROVED STD UNIQUE ALGORITHM                inc result  then swap  result and  first              THIS IS WHAT WE WANT TO DO              BUT THIS COULD SWAP AN ELEMENT WITH ITSELF  UNCECESSARILY                std  swap   first      result                    BUT avoid swapping with itself when both iterators are the same               result              if  result    first                  std  swap   first   result    else                original code found in std  unique                  copies unique down                 result     first   endif                      return   result     template  lt class ForwardIterator  class BinaryPredicate gt  ForwardIterator unique  ForwardIterator first  ForwardIterator last  BinaryPredicate pred        if  first    last          return last          remember the current item that we are looking at for uniqueness     ForwardIterator result   first       while    first    last                if   pred  result  first              ifdef IMPROVED STD UNIQUE ALGORITHM                inc result  then swap  result and  first              THIS COULD SWAP WITH ITSELF UNCECESSARILY             std  swap   first      result                      BUT avoid swapping with itself when both iterators are the same               result              if  result    first                  std  swap   first   result   
  else                original code found in std  unique                  copies unique down                causes memory leaks  and duplicate ptrs                and uncessarily moves in place                  result     first   endif                      return   result        from now on  the  define is no longer needed  so get rid of it  undef IMPROVED STD UNIQUE ALGORITHM       end ptgi   namespace   endif   And here is the UNIT Test program that I used to test it      QUESTION  in test2  I had trouble getting one line to compile which was caused  by the declaration of operator      in the equal to Predicate   I m not sure how to correctly resolve that issue     Look for   OUT lines       Make sure that NOTES in ptgi unique hpp are correct  in how we should  cleanup  duplicates    from both a vector lt Integer gt   test1    and vector lt Integer  gt   test2      Run this with valgrind 1         In test2    IF we use the call to std  unique    we get this problem          dbednar ipeng8 TestSortRoutes     Main7     TEST2  ORIG nums before UNIQUE  10  20  21  22  30  31  23  24  11     TEST2  modified nums AFTER UNIQUE  10  20  30  23  11  31  23  24  11     INFO  dupInx 5     TEST2  uniq   10     TEST2  uniq   20     TEST2  uniq   30     TEST2  uniq   33427744     TEST2  uniq   33427808     Segmentation fault  core dumped        And if we run valgrind we seen various error about  read errors    mismatched free    definitely lost   etc         valgrind --leak-check full   Main7       359   Memcheck  a memory error detector       359   Command    Main7       359   Invalid read of size 4       359   Invalid free     delete   delete         359   HEAP SUMMARY        359       in use at exit  8 bytes in 2 blocks       359   LEAK SUMMARY        359      definitely lost  8 bytes in 2 blocks    But once we replace the call in test2   to use ptgi  unique    all valgrind   error messages disappear        130212   dpb dbednar ptgi com created                               
                                                                                 include  lt iostream gt     std  cout  std  cerr  include  lt string gt   include  lt vector gt       std  vector  include  lt sstream gt      std  ostringstream  include  lt algorithm gt        std  unique    include  lt functional gt       std  equal to    std  binary function    include  lt cassert gt      assert   MACRO   include  ptgi unique hpp      ptgi  unique         Integer is small  wrapper class  around a primitive int     There is no SETTER  so Integer s are IMMUTABLE  just like in JAVA   class Integer   private      int num  public          default CTOR   Integer zero          COMPRENSIVE CTOR    Integer five 5        Integer  int num   0             num num                      COPY CTOR     Integer  const Integer amp  rhs            num rhs num                      assignment  operator   needs nothing special    since all data members are primitives         GETTER for  num  data member        GETTER  are  always  const     int getNum   const               return num                   NO SETTER  because IMMUTABLE  similar to Java s Integer class           return  num         NB  toString   should  always  be a const method               NOTE  it is probably more efficient to call getNum   intead        of toString   when printing a number                BETTER to do this          Integer five 5           std  cout  lt  lt  five getNum    lt  lt    n         than this          std  cout  lt  lt  five toString    lt  lt    n       std  string toString   const               std  ostringstream oss          oss  lt  lt  num          return oss str                 convenience typedef s for iterating over std  vector lt Integer gt  typedef std  vector lt Integer gt   iterator      IntegerVectorIterator  typedef std  vector lt Integer gt   const iterator    ConstIntegerVectorIterator      convenience typedef s for iterating over std  vector lt Integer  gt  typedef std  vector lt 
Integer  gt   iterator     IntegerStarVectorIterator  typedef std  vector lt Integer  gt   const iterator   ConstIntegerStarVectorIterator      functor used for std  unique or ptgi  unique   on a std  vector lt Integer gt     Two numbers equal if  when divided by 10  integer division   the quotients are the same     Hence 50  59 are equal  60  69 are equal  etc  struct IntegerEqualByTen  public std  equal to lt Integer gt        bool operator    const Integer amp  arg1  const Integer amp  arg2  const               return   arg1 getNum   10      arg2 getNum   10                 functor used for std  unique or ptgi  unique on a std  vector lt Integer  gt     Two numbers equal if  when divided by 10  integer division   the quotients are the same     Hence 50  59 are equal  60  69 are equal  etc  struct IntegerEqualByTenPointer  public std  equal to lt Integer  gt           NB  the Integer  amp  looks funny to me         TECHNICAL PROBLEM ELSEWHERE so had to remove the  amp  from   amp    OUT   bool operator    const Integer  amp  arg1  const Integer  amp  arg2  const        bool operator    const Integer  arg1  const Integer  arg2  const               return   arg1- gt getNum   10      arg2- gt getNum   10              void test1    void test2    void printIntegerStarVector  const std  string amp  msg  const std  vector lt Integer  gt  amp  nums     int main         test1        test2        return 0        test1   uses a vector lt Object gt   namely vector lt Integer gt    so there is no problem with memory loss void test1         int data       10  20  21  22  30  31  23  24  11           turn C array into C   vector     std  vector lt Integer gt  nums data  data 9           arg3 is a functor     IntegerVectorIterator dupPosition   ptgi  unique  nums begin    nums end    IntegerEqualByTen           nums erase dupPosition  nums end          nums erase nums begin    dupPosition                                                                                             
 test2   uses a vector lt Integer  gt   so after ptgi unique    we have to be careful in    how we eliminate the duplicate Integer objects stored in the heap                                                                                       void test2         int data       10  20  21  22  30  31  23  24  11           turn C array into C   vector of Integer  pointers     std  vector lt Integer  gt  nums          put data   integers into equivalent Integer  objects in HEAP     for  int inx   0  inx  lt  9    inx                nums push back  new Integer data inx                    print the vector lt Integer  gt  to stdout     printIntegerStarVector   TEST2  ORIG nums before UNIQUE   nums            arg3 is a functor  if 1        corrected version which fixes SEGMENTATION FAULT and all memory leaks reported by valgrind 1         I THINK we want to use new C  11 cbegin   and cend   since the equal to predicate is passed  Integer   amp        DID NOT COMPILE   OUT   IntegerStarVectorIterator dupPosition   ptgi  unique  const cast lt ConstIntegerStarVectorIterator gt  nums begin     const cast lt ConstIntegerStarVectorIterator gt  nums end     IntegerEqualByTenPointer              DID NOT COMPILE when equal to predicate declared  Integer  amp  arg1  Integer  amp   arg2    OUT   IntegerStarVectorIterator dupPosition   ptgi  unique  const cast lt nums  const iterator gt  nums begin     const cast lt nums  const iterator gt  nums end     IntegerEqualByTenPointer               okay when equal to predicate declared  Integer  arg1  Integer   arg2      IntegerStarVectorIterator dupPosition   ptgi  unique nums begin    nums end    IntegerEqualByTenPointer       else        BUGGY version that causes SEGMENTATION FAULT and valgrind 1  errors     IntegerStarVectorIterator dupPosition   std  unique  nums begin    nums end    IntegerEqualByTenPointer       endif      printIntegerStarVector   TEST2  modified nums AFTER UNIQUE   nums        int dupInx   dupPosition - nums begin   
     std  cout  lt  lt   INFO  dupInx    lt  lt  dupInx  lt  lt   n           delete the dup Integer  objects in the  dupPosition  end  range     for  IntegerStarVectorIterator iter   dupPosition  iter    nums end      iter                delete   iter                 shrink the vector        NB  the Integer  ptrs are NOT followed by vector  erase       nums erase dupPosition  nums end              print the uniques  by following the iter to the Integer  pointer     for  IntegerStarVectorIterator iter   nums begin    iter    nums end       iter                std  cout  lt  lt   TEST2  uniq      lt  lt    iter - gt getNum    lt  lt    n                 remove the unique objects from heap     for  IntegerStarVectorIterator iter   nums begin    iter    nums end       iter                delete   iter                 shrink the vector     nums erase nums begin    nums end             the vector should now be completely empty     assert  nums size      0          print to stdout the string   info msg  num1  num2       numN n  void printIntegerStarVector  const std  string amp  msg  const std  vector lt Integer  gt  amp  nums         std  cout  lt  lt  msg  lt  lt            int inx   0      ConstIntegerStarVectorIterator  iter          use const iterator and const range         NB  cbegin   and cend   not supported until LATER  c  11      for  iter   nums begin    inx   0  iter    nums end      iter    inx                   output a comma seperator  AFTER  first         if  inx  gt  0              std  cout  lt  lt                    call Integer  toString           std  cout  lt  lt    iter - gt getNum           send int to stdout         std  cout  lt  lt    iter - gt toString         also works  but is probably slower                in conclusion  add newline     std  cout  lt  lt    n

User · Answer

Assuming that a is a vector, remove the contiguous duplicates using

    a.erase(unique(a.begin(), a.end()), a.end());

which runs in O(n) time.

User · Answer

About alexK7's benchmarks: I tried them and got similar results, but when the range of values is 1 million, the cases using std::sort (f1) and using std::unordered_set (f5) produce similar times. When the range of values is 10 million, f1 is faster than f5.

If the range of values is limited and the values are unsigned int, it is possible to use a std::vector<bool>, the size of which corresponds to the given range. Every value in v must be smaller than range_size. Here is the code:

    void DeleteDuplicates_vector_bool(std::vector<unsigned>& v, unsigned range_size)
    {
        // One flag per possible value; v1[x] is set if x occurs in v.
        std::vector<bool> v1(range_size);
        for (auto& x : v)
        {
            v1[x] = true;
        }
        v.clear();

        // Walk the flags in increasing order, so v comes out sorted.
        unsigned count = 0;
        for (bool x : v1)   // by value: vector<bool> yields proxy objects
        {
            if (x)
            {
                v.push_back(count);
            }
            ++count;
        }
    }

User · Answer

I agree with R. Pate and Todd Gardner; a std::set might be a good idea here. Even if you're stuck using vectors, if you have enough duplicates, you might be better off creating a set to do the dirty work.

Let's compare three approaches:

Just using vector, sort + unique

    sort( vec.begin(), vec.end() );
    vec.erase( unique( vec.begin(), vec.end() ), vec.end() );

Convert to set (manually)

    set<int> s;
    unsigned size = vec.size();
    for( unsigned i = 0; i < size; ++i ) s.insert( vec[i] );
    vec.assign( s.begin(), s.end() );

Convert to set (using a constructor)

    set<int> s( vec.begin(), vec.end() );
    vec.assign( s.begin(), s.end() );

Here's how these perform as the number of duplicates changes:

[figure not reproduced]

Summary: when the number of duplicates is large enough, it's actually faster to convert to a set and then dump the data back into a vector.

And for some reason, doing the set conversion manually seems to be faster than using the set constructor -- at least on the toy random data that I used.
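For anyone who wants to repeat this kind of comparison, the following is a rough timing sketch and not the harness used for the measurements in this answer. The names viaSort, viaSet, makeRandomData, checkAgreement and secondsFor are hypothetical, and the data generator (std::mt19937 with a uniform distribution over [1, maxValue]) is an assumption about what "toy random data" might look like.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <random>
#include <set>
#include <vector>

// Approach 1: sort + unique + erase on a copy of the input.
std::vector<int> viaSort(std::vector<int> vec) {
    std::sort(vec.begin(), vec.end());
    vec.erase(std::unique(vec.begin(), vec.end()), vec.end());
    return vec;
}

// Approach 2: build a std::set, then copy its contents back out.
std::vector<int> viaSet(const std::vector<int>& vec) {
    std::set<int> s(vec.begin(), vec.end());
    return std::vector<int>(s.begin(), s.end());
}

// Assumed test data: 'count' values drawn uniformly from [1, maxValue].
// A smaller maxValue means more duplicates.
std::vector<int> makeRandomData(std::size_t count, int maxValue, unsigned seed) {
    std::mt19937 gen(seed);
    std::uniform_int_distribution<int> dist(1, maxValue);
    std::vector<int> data(count);
    for (int& x : data) x = dist(gen);
    return data;
}

// Sanity check: both approaches must agree before timings are compared.
void checkAgreement(const std::vector<int>& data) {
    assert(viaSort(data) == viaSet(data));
}

// Wall-clock seconds taken by one call of f.
template <typename F>
double secondsFor(F&& f) {
    auto start = std::chrono::steady_clock::now();
    f();
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

A typical use would be to call makeRandomData with a fixed count and several values of maxValue, then compare secondsFor([&]{ viaSort(data); }) against secondsFor([&]{ viaSet(data); }). Results depend heavily on compiler, optimization flags and standard library, so build with optimizations on and repeat each measurement several times.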

User · Answer

More understandable code from: https://en.cppreference.com/w/cpp/algorithm/unique

    #include <iostream>
    #include <algorithm>
    #include <vector>
    #include <string>
    #include <cctype>

    int main()
    {
        // remove duplicate elements
        std::vector<int> v{1,2,3,1,2,3,3,4,5,4,5,6,7};
        std::sort(v.begin(), v.end()); // 1 1 2 2 3 3 3 4 4 5 5 6 7
        auto last = std::unique(v.begin(), v.end());
        // v now holds {1 2 3 4 5 6 7 x x x x x x}, where 'x' is indeterminate
        v.erase(last, v.end());
        for (int i : v)
            std::cout << i << " ";
        std::cout << "\n";
    }

Output:

    1 2 3 4 5 6 7

User · Answer

    void EraseVectorRepeats(vector<int>& v) {
    TOP:
        for (int y = 0; y < v.size(); ++y) {
            for (int z = 0; z < v.size(); ++z) {
                // This if statement makes sure the number that it is on is not
                // erased, just skipped, in order to keep only one copy of a
                // repeated number.
                if (y == z) {
                    continue;
                }
                if (v[y] == v[z]) {
                    // Whenever a number is erased the function goes back to the
                    // start of the first loop because the size of the vector changes.
                    v.erase(v.begin() + z);
                    goto TOP;
                }
            }
        }
    }

This is a function that I created that you can use to delete repeats. The header files needed are just <iostream> and <vector>.
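The function above keeps the first occurrence of each value and leaves the surviving elements in their original order rather than sorting them, but restarting both loops after every erase makes it slow on large inputs. As a hedged alternative with the same result, the sketch below uses a std::unordered_set to remember values already seen; eraseRepeatsKeepOrder is a hypothetical name, and it trades O(n) extra memory for expected linear time.

```cpp
#include <cassert>
#include <unordered_set>
#include <vector>

// Hypothetical alternative: keep the first occurrence of each value,
// preserve the original relative order, do not sort.
void eraseRepeatsKeepOrder(std::vector<int>& v) {
    std::unordered_set<int> seen;
    std::vector<int> kept;
    kept.reserve(v.size());
    for (int x : v) {
        // insert() reports via .second whether x was newly added
        if (seen.insert(x).second) {
            kept.push_back(x);
        }
    }
    assert(kept.size() == seen.size());  // one kept element per distinct value
    v.swap(kept);
}
```

If the result also has to be sorted, as in the original question, a std::sort call on v afterwards (or plain sort followed by unique) is still needed.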

User · Answer

I redid Nate Kohl's profiling and got different results. For my test case, directly sorting the vector is always more efficient than using a set. I added a new more efficient method, using an unordered_set.

Keep in mind that the unordered_set method only works if you have a good hash function for the type you need uniqued and sorted. For ints, this is easy! (The standard library provides a default hash, which in some implementations is simply the identity function.) Also, don't forget to sort at the end since unordered_set is, well, unordered :)

I did some digging inside the set and unordered_set implementation and discovered that the constructor actually constructs a new node for every element, before checking its value to determine if it should actually be inserted (in the Visual Studio implementation, at least).

Here are the 5 methods:

f1: Just using vector, sort + unique

    sort( vec.begin(), vec.end() );
    vec.erase( unique( vec.begin(), vec.end() ), vec.end() );

f2: Convert to set (using a constructor)

    set<int> s( vec.begin(), vec.end() );
    vec.assign( s.begin(), s.end() );

f3: Convert to set (manually)

    set<int> s;
    for (int i : vec)
        s.insert(i);
    vec.assign( s.begin(), s.end() );

f4: Convert to unordered_set (using a constructor)

    unordered_set<int> s( vec.begin(), vec.end() );
    vec.assign( s.begin(), s.end() );
    sort( vec.begin(), vec.end() );

f5: Convert to unordered_set (manually)

    unordered_set<int> s;
    for (int i : vec)
        s.insert(i);
    vec.assign( s.begin(), s.end() );
    sort( vec.begin(), vec.end() );

I did the test with a vector of 100,000,000 ints chosen randomly in ranges [1,10], [1,1000], and [1,100000].

The results (in seconds, smaller is better):

    range         f1       f2       f3       f4      f5
    [1,10]      1.6821   7.6804   2.8232   6.2634  0.7980
    [1,1000]    5.0773  13.3658   8.2235   7.6884  1.9861
    [1,100000]  8.7955  32.1148  26.5485  13.3278  3.9822
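To illustrate the remark about needing a hash function, here is a sketch of the unordered_set approach for a user-defined type. The type Point, the functor PointHash and the function sortedUnique are all hypothetical, and the hash-combining formula is just one common choice, not something taken from the benchmark above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <tuple>
#include <unordered_set>
#include <vector>

// Hypothetical element type; ints need none of this.
struct Point {
    int x;
    int y;
};

// Equality is required by unordered_set; ordering is required by the final sort.
bool operator==(const Point& a, const Point& b) {
    return a.x == b.x && a.y == b.y;
}
bool operator<(const Point& a, const Point& b) {
    return std::tie(a.x, a.y) < std::tie(b.x, b.y);
}

// One common way to combine two hashes (an assumption, not the only option).
struct PointHash {
    std::size_t operator()(const Point& p) const {
        std::size_t h1 = std::hash<int>{}(p.x);
        std::size_t h2 = std::hash<int>{}(p.y);
        return h1 ^ (h2 + 0x9e3779b9u + (h1 << 6) + (h1 >> 2));
    }
};

// Same idea as f4: de-duplicate through an unordered_set, then sort.
std::vector<Point> sortedUnique(const std::vector<Point>& in) {
    std::unordered_set<Point, PointHash> s(in.begin(), in.end());
    std::vector<Point> out(s.begin(), s.end());
    std::sort(out.begin(), out.end());  // the set's iteration order is unspecified
    assert(std::is_sorted(out.begin(), out.end()));
    return out;
}
```

A poor hash (for example one that ignores y) would still give correct results but could push many elements into the same bucket and erase the speed advantage.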

User · Answer

Here's the example of the duplicate delete problem that occurs with std::unique(). On a LINUX machine, the program crashes. Read the comments for details.

    // Main10.cpp
    //
    // Illustration of duplicate delete and memory leak in a vector<int*> after calling std::unique.
    // On a LINUX machine, it crashes the program because of the duplicate delete.
    //
    // INPUT : {1, 2, 2, 3}
    // OUTPUT: {1, 2, 3, 3}
    //
    // The two 3's are actually pointers to the same 3 integer in the HEAP, which is BAD
    // because if you delete both int* pointers, you are deleting the same memory
    // location twice.
    //
    // Never mind the fact that we ignore the "dupPosition" returned by std::unique(),
    // but in any sensible program that "cleans up after itself" you want to call delete
    // on all int* pointers to avoid memory leaks.
    //
    // NOW IF you replace std::unique() with ptgi::unique(), all of the problems disappear.
    // Why? Because ptgi::unique merely reshuffles the data:
    // OUTPUT: {1, 2, 3, 2}
    // The ptgi::unique has swapped the last two elements, so all of the original elements in
    // the INPUT are STILL in the OUTPUT.
    //
    // 130215 dbednar@ptgi.com
    //============================================================================

    #include <iostream>
    #include <string>      // std::string, used by printVector
    #include <vector>
    #include <algorithm>
    #include <functional>

    #include "ptgi_unique.hpp"

    // functor used by std::unique to remove adjacent elts from vector<int*>
    struct EqualToVectorOfIntegerStar : public std::equal_to<int*>
    {
        bool operator()(const int* arg1, const int* arg2) const
        {
            return (*arg1 == *arg2);
        }
    };

    void printVector(const std::string& msg, const std::vector<int*>& vnums);

    int main()
    {
        int inums[] = { 1, 2, 2, 3 };
        std::vector<int*> vnums;

        // convert C array into vector of pointers to integers
        for (size_t inx = 0; inx < 4; ++inx)
            vnums.push_back(new int(inums[inx]));

        printVector("BEFORE UNIQ", vnums);
        // INPUT : 1, 2A, 2B, 3
        std::unique(vnums.begin(), vnums.end(), EqualToVectorOfIntegerStar());
        // OUTPUT: 1, 2A, 3, 3
        printVector("AFTER  UNIQ", vnums);

        // now we delete 3 twice, and we have a memory leak because 2B is not deleted
        for (size_t inx = 0; inx < vnums.size(); ++inx)
        {
            delete vnums[inx];
        }
    }

    // print a line of the form "msg: 1,2,3,..,5,6,7\n", where 1..7 are the numbers in vnums vector
    // PS: you may pass "hello world" (const char*) because of the implicit (automatic)
    // conversion from "const char*" to std::string
    void printVector(const std::string& msg, const std::vector<int*>& vnums)
    {
        std::cout << msg << ": ";

        for (size_t inx = 0; inx < vnums.size(); ++inx)
        {
            // insert comma separator before current elt, but ONLY after first elt
            if (inx > 0)
                std::cout << ",";
            std::cout << *vnums[inx];
        }
        std::cout << "\n";
    }
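
The header for ptgi::unique is not shown here, so its real implementation is unknown. As a rough sketch of the "reshuffle instead of overwrite" idea, a unique-like algorithm can swap elements into place rather than assign over them, so that every original pointer is still somewhere in the vector afterwards. The names swap_unique and dedupe_and_free below are hypothetical and are not the actual ptgi code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical helper (NOT the real ptgi::unique): behaves like std::unique,
// but swaps elements instead of overwriting them, so every original element
// survives somewhere in [first, last).
template <typename ForwardIt, typename BinaryPred>
ForwardIt swap_unique(ForwardIt first, ForwardIt last, BinaryPred pred) {
    if (first == last) return last;
    ForwardIt result = first;  // last element of the unique prefix so far
    while (++first != last) {
        if (!pred(*result, *first)) {
            ++result;
            if (result != first) std::iter_swap(result, first);
        }
    }
    return ++result;  // one past the end of the unique prefix
}

// Usage sketch: deduplicate by pointee, then delete every pointer exactly once.
inline std::size_t dedupe_and_free() {
    const int values[] = {1, 2, 2, 3};
    std::vector<int*> v;
    for (int x : values) v.push_back(new int(x));

    auto new_end = swap_unique(v.begin(), v.end(),
        [](const int* a, const int* b) { return *a == *b; });
    const std::size_t kept =
        static_cast<std::size_t>(std::distance(v.begin(), new_end));
    assert(*v[3] == 2);  // the second 2 was moved to the tail, not overwritten

    for (int* p : v) delete p;  // each allocation appears exactly once
    return kept;
}
```

In new code it is usually simpler to avoid owning raw pointers altogether (store the ints by value, or use smart pointers), which removes the double-delete hazard whichever unique you call.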

User · Answer

If you don't want to modify the vector (erase, sort), then you can use the Newton library. In the algorithm sublibrary there is a function called copy_single:

    template <class INPUT_ITERATOR, typename T>
    void copy_single(INPUT_ITERATOR first, INPUT_ITERATOR last, std::vector<T>& v);

so you can write:

    std::vector<TYPE> copy; // empty vector
    newton::copy_single(first, last, copy);

where copy is the vector into which you want to push_back the copies of the unique elements. Remember that the elements are pushed back onto the vector you pass in; no new vector is created. This is faster because you don't erase() the elements, which takes a lot of time (except when you pop_back()) because of reassignment. I ran some experiments and it is faster. Also, you can use:

    std::vector<TYPE> copy; // empty vector
    newton::copy_single(first, last, copy);
    original = copy;

which is sometimes still faster.
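
If the Newton library is not available, something similar in spirit can be put together from the standard library alone. The sketch below is only an illustration: copy_unique_sorted is a made-up name, and it is an assumption that newton::copy_single behaves comparably (its ordering guarantees and internals are not described above). It leaves the source range untouched and appends the sorted, de-duplicated values to the output vector.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

// Hypothetical stand-in, NOT newton::copy_single: appends the distinct values
// of [first, last) to `out` in ascending order without modifying the source.
template <typename InputIt, typename T>
void copy_unique_sorted(InputIt first, InputIt last, std::vector<T>& out) {
    std::vector<T> scratch(first, last);        // work on a private copy
    std::sort(scratch.begin(), scratch.end());  // make duplicates adjacent
    std::unique_copy(scratch.begin(), scratch.end(), std::back_inserter(out));
}

// Usage sketch: the original vector keeps its order and its duplicates.
inline void demo_copy_unique_sorted() {
    const std::vector<int> original{3, 1, 3, 2, 1};
    std::vector<int> copy;  // empty vector
    copy_unique_sorted(original.begin(), original.end(), copy);
    assert(copy.size() == 3);      // 1 2 3
    assert(original.size() == 5);  // untouched
}
```

This costs an extra O(n) of scratch memory, which is the usual trade-off for not touching the input.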

User · Answer

You can do this as follows:

    std::sort(v.begin(), v.end());
    v.erase(std::unique(v.begin(), v.end()), v.end());
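
To see why the order of the two calls matters, here is a small self-contained comparison. The function names are made up for illustration; the second one reproduces the order used in the question (unique first, then sort).

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Sort first so that equal elements become adjacent, then drop the repeats.
inline std::vector<int> sort_then_unique(std::vector<int> v) {
    std::sort(v.begin(), v.end());
    v.erase(std::unique(v.begin(), v.end()), v.end());
    return v;
}

// The order from the question: std::unique only sees adjacent duplicates,
// so on unsorted input it may remove little or nothing.
inline std::vector<int> unique_then_sort(std::vector<int> v) {
    v.erase(std::unique(v.begin(), v.end()), v.end());
    std::sort(v.begin(), v.end());
    return v;
}

inline void demo_order_matters() {
    const std::vector<int> input{3, 1, 3, 2, 1};
    assert(sort_then_unique(input).size() == 3);  // 1 2 3
    assert(unique_then_sort(input).size() == 5);  // 1 1 2 3 3
}
```

In the second version no two equal elements are neighbours when std::unique runs, so all five elements survive and the sort merely groups the duplicates together.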

User · Answer

With the Ranges v3 library, you can simply use action::unique(vec); Note that it actually removes the duplicate elements rather than just moving them. Like std::unique, it only collapses adjacent duplicates, so sort the vector first. Unfortunately, actions weren't standardized in C++20 as other parts of the ranges library were, so you still have to use the original library even in C++20.

User · Answer

If you do not want to change the order of elements, then you can try this solution:

    template <class T>
    void RemoveDuplicatesInVector(std::vector<T>& vec)
    {
        std::set<T> values;  // needs #include <set>
        // insert(value).second is false when the value has been seen before
        vec.erase(std::remove_if(vec.begin(), vec.end(),
                                 [&](const T& value) { return !values.insert(value).second; }),
                  vec.end());
    }
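
A variation on the same idea, offered only as a sketch: if T is hashable, std::unordered_set can replace std::set, and an explicit loop makes the front-to-back traversal visible instead of relying on the order in which remove_if calls a stateful predicate. The name remove_duplicates_keep_order is made up for this example.

```cpp
#include <cassert>
#include <unordered_set>
#include <utility>
#include <vector>

// Keeps the first occurrence of each value and preserves relative order.
// Assumes T works with std::hash and operator==.
template <typename T>
void remove_duplicates_keep_order(std::vector<T>& vec) {
    std::unordered_set<T> seen;
    auto out = vec.begin();  // next slot for a kept element
    for (auto it = vec.begin(); it != vec.end(); ++it) {
        if (seen.insert(*it).second) {  // true only for a first occurrence
            if (out != it) *out = std::move(*it);
            ++out;
        }
    }
    vec.erase(out, vec.end());
}

// Usage sketch.
inline void demo_keep_order() {
    std::vector<int> v{3, 1, 3, 2, 1};
    remove_duplicates_keep_order(v);
    assert(v.size() == 3);  // 3 1 2
    assert(v[0] == 3 && v[1] == 1 && v[2] == 2);
}
```

Like the std::set version, this uses O(n) extra space for the set of seen values; whether hashing actually beats the ordered set depends on T and the data, so measure before committing to one.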
