Algorithm: efficient way to remove duplicate integers from an array

Question

I got this problem from an interview with Microsoft:

Given an array of random integers, write an algorithm in C that removes duplicated numbers and returns the unique numbers in the original array. E.g. input: {4, 8, 4, 1, 1, 2, 9}, output: {4, 8, 1, 2, 9}.

One caveat is that the expected algorithm should not require the array to be sorted first. And when an element has been removed, the following elements must be shifted forward as well. Anyway, the values of elements at the tail of the array, where elements were shifted forward, are negligible.

Update: The result must be returned in the original array, and a helper data structure (e.g. a hashtable) should not be used. However, I guess order preservation is not necessary.

Update 2: For those who wonder why these impractical constraints: this was an interview question, and all these constraints are discussed during the thinking process to see how I can come up with different ideas.

User · Answer

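Simply take a variable x = arr[0] and do an XOR operation while traversing the rest of the elements. If an element has repeated, then x will become zero, and this way we know that the element has repeated previously. This also takes just O(n) to scan through all of the elements in the original array. (Caveat: a zero running XOR does not reliably signal a repeat. For {1, 2, 3} the value 1 ^ 2 ^ 3 is 0 although nothing repeats, while for {1, 2, 1} it is 2 although 1 repeats.)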

User · Answer

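Here is a Java version:

    int[] removeDuplicate(int[] input) {
        int arrayLen = input.length;
        for (int i = 0; i < arrayLen; i++) {
            for (int j = i + 1; j < arrayLen; j++) {
                // (input[i] ^ input[j]) == 0 is simply input[i] == input[j]
                if (input[i] == input[j]) {
                    input[j] = 0; // 0 marks a removed slot, so the input must not contain 0
                }
                // move the removed slot one step toward the tail
                if (input[j] == 0 && j < arrayLen - 1) {
                    input[j] = input[j + 1];
                    input[j + 1] = 0;
                }
            }
        }
        return input;
    }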

User · Answer

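How about the following?

    int *temp = malloc(sizeof(int) * len);
    int count = 0;
    int x = 0;
    int y = 0;

    for (x = 0; x < len; x++)
    {
        /* is array[x] already among the first count entries of temp? */
        for (y = 0; y < count; y++)
        {
            if (temp[y] == array[x])
                break;
        }

        /* not found: append it */
        if (y == count)
        {
            temp[count] = array[x];
            count++;
        }
    }

    /* copy back only the count initialized entries, then release temp */
    memcpy(array, temp, sizeof(int) * count);
    free(temp);

I declare a temp array and put the elements into it before copying everything back to the original array.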

User · Answer

1. Using O(1) extra space, in O(n log n) time

This is possible, for instance:

- first do an in-place O(n log n) sort
- then walk through the list once, writing the first instance of every value back to the beginning of the list

I believe ejel's partner is correct that the best way to do this would be an in-place merge sort with a simplified merge step, and that that is probably the intent of the question, if you were e.g. writing a new library function to do this as efficiently as possible with no ability to improve the inputs, and there would be cases where it would be useful to do so without a hash table, depending on the sorts of inputs. But I haven't actually checked this.

2. Using O(lots) extra space, in O(n) time

- declare a zeroed array big enough to hold all integers
- walk through the array once
- set the corresponding array element to 1 for each integer
- if it was already 1, skip that integer

This only works if several questionable assumptions hold:

- it's possible to zero memory cheaply, or the size of the ints is small compared to the number of them
- you're happy to ask your OS for 256^sizeof(int) memory
- and it will cache it for you really, really efficiently if it's gigantic

It's a bad answer, but if you have LOTS of input elements that are all 8-bit integers (or maybe even 16-bit integers), it could be the best way.

3. O(little)-ish extra space, O(n)-ish time

As 2, but use a hash table.

4. The clear way

If the number of elements is small, writing an appropriate algorithm is not useful if other code is quicker to write and quicker to read.

E.g. walk through the array for each unique element (i.e. the first element, the second element (duplicates of the first having been removed), etc.), removing all identical elements. O(1) extra space, O(n^2) time.

E.g. use library functions which do this. Efficiency depends on which ones you have easily available.
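Option 1 (sort in place, then one walk that writes the first instance of every value to the front) can be sketched in C as follows. This is an illustration, not the answerer's code: `sort_and_dedup` and `cmp_int` are hypothetical names, and the standard `qsort` stands in for the in-place O(n log n) sort. Note that the C standard guarantees neither `qsort`'s complexity nor that it uses O(1) extra memory, so a hand-written heapsort would be needed to make those bounds strict.

```c
#include <stddef.h>
#include <stdlib.h>

/* qsort comparator: avoids the overflow that `a - b` could cause */
static int cmp_int(const void *pa, const void *pb)
{
    int a = *(const int *)pa;
    int b = *(const int *)pb;
    return (a > b) - (a < b);
}

/* Sorts arr, then keeps one copy of each value at the front.
   Returns the number of unique values; arr[0..result) holds them in
   ascending order. The original order is NOT preserved. */
size_t sort_and_dedup(int *arr, size_t len)
{
    if (len < 2)
        return len;
    qsort(arr, len, sizeof arr[0], cmp_int);

    size_t dst = 1;                       /* arr[0] is always kept */
    for (size_t src = 1; src < len; src++) {
        if (arr[src] != arr[dst - 1])     /* differs from last kept value */
            arr[dst++] = arr[src];
    }
    return dst;
}
```

For {4, 8, 4, 1, 1, 2, 9} this leaves {1, 2, 4, 8, 9} at the front and returns 5.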

User · Answer

You could do this in a single traversal, if you are willing to sacrifice memory. You can simply tally whether you have seen an integer or not in a hash/associative array. If you have already seen a number, remove it as you go, or better yet, move numbers you have not seen into a new array, avoiding any shifting in the original array.

In Perl:

    foreach my $i (@myary) {
        if (!defined $seen{$i}) {
            $seen{$i} = 1;
            push @newary, $i;
        }
    }

User · Answer

This is the naive N*(N-1)/2 solution. It uses constant additional space and maintains the original order. It is similar to the solution by @Byju, but uses no if(){} blocks. It also avoids copying an element onto itself.

    #include <stdio.h>
    #include <stdlib.h>

    int numbers[] = {4, 8, 4, 1, 1, 2, 9};
    #define COUNT (sizeof numbers / sizeof numbers[0])

    size_t undup_it(int array[], size_t len)
    {
        size_t src, dst;

        /* an array of size 1 cannot contain duplicate values */
        if (len < 2) return len;

        /* an array of size > 1 contains at least one unique value */
        for (src = dst = 1; src < len; src++) {
            size_t cur;
            for (cur = 0; cur < dst; cur++) {
                if (array[cur] == array[src]) break;
            }
            if (cur != dst) continue; /* found a duplicate */

            /* array[src] must be new: add it to the list of non-duplicates */
            if (dst < src) array[dst] = array[src]; /* avoid copy-to-self */
            dst++;
        }
        return dst; /* number of valid elements in new array */
    }

    void print_it(int array[], size_t len)
    {
        size_t idx;

        for (idx = 0; idx < len; idx++) {
            printf("%c %d", (idx) ? ',' : '{', array[idx]);
        }
        printf("}\n");
    }

    int main(void)
    {
        size_t cnt = COUNT;

        printf("Before undup:");
        print_it(numbers, cnt);

        cnt = undup_it(numbers, cnt);

        printf("After undup:");
        print_it(numbers, cnt);

        return 0;
    }

User · Answer

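    import java.util.HashSet;
    import java.util.Set;

    Integer[] arrayInteger = {1, 2, 3, 4, 3, 2, 4, 6, 7, 8, 9, 9, 10};

    // HashSet does not keep the original order; LinkedHashSet would
    Set<Integer> set = new HashSet<>();
    for (Integer i : arrayInteger)
        set.add(i);

    System.out.println(set);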

User · Answer

The following example should solve your problem:

    def check_dump(x):
        if x not in t:
            t.append(x)
            return True
        return False

    t = []
    input_list = [4, 8, 4, 1, 1, 2, 9]

    output = list(filter(check_dump, input_list))
    print(output)  # [4, 8, 1, 2, 9]

User · Answer

A solution suggested by my girlfriend is a variation of merge sort. The only modification is that during the merge step, you just disregard duplicated values. This solution would be O(n log n) as well. In this approach, sorting and duplicate removal are combined. However, I'm not sure if that makes any difference.
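A sketch of that merge-sort variation, assuming a caller-supplied scratch buffer of `len` ints (so this version is not in place; an in-place merge is considerably more involved). The names `msort_unique` and `merge_sort_unique` are hypothetical. Each recursive call returns how many unique values its half kept, and the merge step keeps only one copy when the two heads are equal.

```c
#include <stddef.h>
#include <string.h>

/* Sorts a[lo..hi) and drops duplicates during the merge. Afterwards the
   unique values sit in ascending order at a[lo..lo+result). */
static size_t msort_unique(int *a, int *tmp, size_t lo, size_t hi)
{
    if (hi - lo < 2)
        return hi - lo;
    size_t mid = lo + (hi - lo) / 2;
    size_t nl = msort_unique(a, tmp, lo, mid);  /* unique run a[lo..lo+nl) */
    size_t nr = msort_unique(a, tmp, mid, hi);  /* unique run a[mid..mid+nr) */

    size_t i = lo, j = mid, k = 0;
    while (i < lo + nl && j < mid + nr) {
        if (a[i] < a[j])
            tmp[k++] = a[i++];
        else if (a[j] < a[i])
            tmp[k++] = a[j++];
        else {                                  /* equal heads: keep one copy */
            tmp[k++] = a[i++];
            j++;
        }
    }
    while (i < lo + nl)
        tmp[k++] = a[i++];
    while (j < mid + nr)
        tmp[k++] = a[j++];
    memcpy(a + lo, tmp, k * sizeof *a);         /* merged run back into a */
    return k;
}

/* scratch must hold at least len ints. Returns the number of unique values. */
size_t merge_sort_unique(int *arr, int *scratch, size_t len)
{
    return msort_unique(arr, scratch, 0, len);
}
```

Because both halves are already duplicate-free when they are merged, the merged run is duplicate-free too, so no separate removal pass is needed.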

User · Answer

I've posted this once before on SO, but I'll reproduce it here because it's pretty cool. It uses hashing, building something like a hash set in place. It's guaranteed to be O(1) in auxiliary space (the recursion is a tail call), and is typically O(N) time complexity. The algorithm is as follows:

1. Take the first element of the array; this will be the sentinel.
2. Reorder the rest of the array, as much as possible, such that each element is in the position corresponding to its hash. As this step is completed, duplicates will be discovered. Set them equal to the sentinel.
3. Move all elements for which the index is equal to the hash to the beginning of the array.
4. Move all elements that are equal to the sentinel, except the first element of the array, to the end of the array.
5. What's left between the properly hashed elements and the duplicate elements will be the elements that couldn't be placed in the index corresponding to their hash because of a collision. Recurse to deal with these elements.

This can be shown to be O(N) provided there is no pathological scenario in the hashing: even if there are no duplicates, approximately 2/3 of the elements will be eliminated at each recursion. Each level of recursion is O(n), where small n is the number of elements left. The only problem is that, in practice, it's slower than a quick sort when there are few duplicates, i.e. lots of collisions. However, when there are huge amounts of duplicates, it's amazingly fast.

Edit: In current implementations of D, hash_t is 32 bits. Everything about this algorithm assumes that there will be very few, if any, hash collisions in full 32-bit space. Collisions may, however, occur frequently in the modulus space. However, this assumption will in all likelihood be true for any reasonably sized data set. If the key is less than or equal to 32 bits, it can be its own hash, meaning that a collision in full 32-bit space is impossible. If it is larger, you simply can't fit enough of them into 32-bit memory address space for it to be a problem. I assume hash_t will be increased to 64 bits in 64-bit implementations of D, where datasets can be larger. Furthermore, if this ever did prove to be a problem, one could change the hash function at each level of recursion.

Here's an implementation in the D programming language:

    import std.algorithm : swap;

    void uniqueInPlace(T)(ref T[] dataIn) {
        uniqueInPlaceImpl(dataIn, 0);
    }

    void uniqueInPlaceImpl(T)(ref T[] dataIn, size_t start) {
        if(dataIn.length - start < 2)
            return;

        invariant T sentinel = dataIn[start];
        T[] data = dataIn[start + 1..$];

        static hash_t getHash(T elem) {
            static if(is(T == uint) || is(T == int)) {
                return cast(hash_t) elem;
            } else static if(__traits(compiles, elem.toHash)) {
                return elem.toHash;
            } else {
                static auto ti = typeid(typeof(elem));
                return ti.getHash(&elem);
            }
        }

        for(size_t index = 0; index < data.length;) {
            if(data[index] == sentinel) {
                index++;
                continue;
            }

            auto hash = getHash(data[index]) % data.length;
            if(index == hash) {
                index++;
                continue;
            }

            if(data[index] == data[hash]) {
                data[index] = sentinel;
                index++;
                continue;
            }

            if(data[hash] == sentinel) {
                swap(data[hash], data[index]);
                index++;
                continue;
            }

            auto hashHash = getHash(data[hash]) % data.length;
            if(hashHash != hash) {
                swap(data[index], data[hash]);
                if(hash < index)
                    index++;
            } else {
                index++;
            }
        }

        size_t swapPos = 0;
        foreach(i; 0..data.length) {
            if(data[i] != sentinel && i == getHash(data[i]) % data.length) {
                swap(data[i], data[swapPos++]);
            }
        }

        size_t sentinelPos = data.length;
        for(size_t i = swapPos; i < sentinelPos;) {
            if(data[i] == sentinel) {
                swap(data[i], data[--sentinelPos]);
            } else {
                i++;
            }
        }

        dataIn = dataIn[0..sentinelPos + start + 1];
        uniqueInPlaceImpl(dataIn, start + swapPos + 1);
    }

User · Answer

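It'd be cool if you had a good DataStructure that could quickly tell if it contains an integer. Perhaps a tree of some sort.

    // DataStructure is a placeholder for any set type with Contains/Add
    DataStructure elementsSeen = new DataStructure();
    int elementsRemoved = 0;
    for (int i = 0; i < array.Length; i++) {
        if (elementsSeen.Contains(array[i])) {
            elementsRemoved++;
        } else {
            elementsSeen.Add(array[i]);  // remember it, otherwise nothing is ever "seen"
            array[i - elementsRemoved] = array[i];
        }
    }
    array.Length = array.Length - elementsRemoved;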

User · Answer

Well, its basic implementation is quite simple: go through all elements, check whether there are duplicates in the remaining ones, and shift the rest over them.

It's terribly inefficient, and you could speed it up with a helper array for the output or with sorting/binary trees, but this doesn't seem to be allowed.
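A minimal C sketch of that basic approach, following the question's rule that later elements are shifted forward when one is removed. `remove_dups_shift` is a hypothetical name. It keeps the first occurrence of each value in its original order and returns the new logical length; the slots past that length are left as they are, matching the "negligible tail" in the question.

```c
#include <stddef.h>

/* For every element, delete each later copy of it by shifting the
   remaining tail one slot to the left. Uses O(1) extra space; at most
   len deletions each cost O(len), so the total is O(len^2). */
size_t remove_dups_shift(int *arr, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        size_t j = i + 1;
        while (j < len) {
            if (arr[j] == arr[i]) {
                /* delete arr[j]: shift arr[j+1..len) forward by one */
                for (size_t k = j; k + 1 < len; k++)
                    arr[k] = arr[k + 1];
                len--;          /* do not advance j: a new value moved in */
            } else {
                j++;
            }
        }
    }
    return len;
}
```

For {4, 8, 4, 1, 1, 2, 9} the function returns 5 and the array starts with {4, 8, 1, 2, 9}.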

User · Answer

The return value of the function should be the number of unique elements, and they are all stored at the front of the array. Without this additional information, you won't even know if there were any duplicates.

Each iteration of the outer loop processes one element of the array. If it is unique, it stays in the front of the array, and if it is a duplicate, it is overwritten by the last unprocessed element in the array. This solution runs in O(n^2) time.

    #include <stdio.h>
    #include <stdlib.h>

    size_t rmdup(int *arr, size_t len)
    {
      if (len < 2) return len;   /* avoids len - 1 wrapping around for len == 0 */

      size_t prev = 0;
      size_t curr = 1;
      size_t last = len - 1;
      while (curr <= last) {
        /* search the unique prefix arr[0..curr) for arr[curr] */
        for (prev = 0; prev < curr && arr[curr] != arr[prev]; ++prev);
        if (prev == curr) {
          ++curr;                 /* unique: keep it */
        } else {
          arr[curr] = arr[last];  /* duplicate: overwrite with last unprocessed */
          --last;
        }
      }
      return curr;
    }

    void print_array(int *arr, size_t len)
    {
      printf("{");
      size_t curr = 0;
      for (curr = 0; curr < len; ++curr) {
        if (curr > 0) printf(", ");
        printf("%d", arr[curr]);
      }
      printf("}");
    }

    int main()
    {
      int arr[] = {4, 8, 4, 1, 1, 2, 9};
      printf("Before: ");
      size_t len = sizeof (arr) / sizeof (arr[0]);
      print_array(arr, len);
      len = rmdup(arr, len);
      printf("\nAfter: ");
      print_array(arr, len);
      printf("\n");
      return 0;
    }

User · Answer

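For someone who wants a simple solution in C (note that it only removes adjacent duplicates, so the input must already be sorted):

    /* Removes adjacent duplicates from path[start..end] (end inclusive).
       ret is a fixed buffer, so start + element count must not exceed 100. */
    void rmdup(int path[], int start, int end, int *newEnd)
    {
        int ret[100];
        int j = start;
        *newEnd = end;

        for (int i = start; i < end; i++) {
            if (path[i] == path[i + 1]) {
                (*newEnd)--;
                continue;
            }
            ret[j++] = path[i];
        }
        ret[j++] = path[end];

        for (int i = start; i <= *newEnd; i++)
            path[i] = ret[i];
    }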

User · Answer

This can be done in a single pass, in O(N) time in the number of integers in the input list, and O(N) storage in the number of unique integers.

Walk through the list from front to back, with two pointers "dst" and "src" initialized to the first item. Start with an empty hash table of "integers seen". If the integer at src is not present in the hash, write it to the slot at dst and increment dst. Add the integer at src to the hash, then increment src. Repeat until src passes the end of the input list.
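The src/dst walk above, sketched in C with a small open-addressing hash set (linear probing). Everything here is an illustrative assumption: `dedup_hash` is a hypothetical name, the integer mixing function is just one reasonable choice, and for simplicity the table is sized for the whole input rather than for the number of unique integers.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Keeps the first occurrence of each value, in order. Returns the new
   length, or (size_t)-1 if the hash table cannot be allocated. */
size_t dedup_hash(int *arr, size_t len)
{
    size_t cap = 16;
    while (cap < 2 * len)          /* load factor at most 1/2 */
        cap *= 2;                  /* power of two: & (cap - 1) acts as modulo */

    int *keys = malloc(cap * sizeof *keys);
    bool *used = calloc(cap, sizeof *used);
    if (keys == NULL || used == NULL) {
        free(keys);
        free(used);
        return (size_t)-1;
    }

    size_t dst = 0;
    for (size_t src = 0; src < len; src++) {
        int v = arr[src];
        uint32_t x = (uint32_t)v;  /* mix the bits before masking */
        x ^= x >> 16;
        x *= 0x45d9f3bu;
        x ^= x >> 16;
        size_t h = (size_t)x & (cap - 1);
        while (used[h] && keys[h] != v)   /* probe until v or an empty slot */
            h = (h + 1) & (cap - 1);
        if (!used[h]) {            /* not seen yet: record it, write at dst */
            used[h] = true;
            keys[h] = v;
            arr[dst++] = v;
        }
    }
    free(keys);
    free(used);
    return dst;
}
```

This is expected O(N) time, but it relies on exactly the kind of helper structure the question's update rules out.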

User · Answer

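In Java I would solve it like this. I don't know how to write this in C.

    int length = array.length;
    for (int i = 0; i < length; i++) {
        for (int j = i + 1; j < length; j++) {
            if (array[i] == array[j]) {
                int k, l;
                // compact array[j+1..length) onto array[j..], skipping copies of array[i]
                for (k = j + 1, l = j; k < length; k++, l++) {
                    if (array[k] != array[i]) {
                        array[l] = array[k];
                    }
                    else {
                        l--;
                    }
                }
                length = l;
            }
        }
    }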

User · Answer

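How about:

    /* Returns the number of unique values left at the front of the array. */
    int rmdup(int *array, int length)
    {
        if (length < 2) return length;

        int *start = array;
        int *current, *end = array + length - 1;

        for (current = array + 1; array < end; array++, current = array + 1)
        {
            while (current <= end)
            {
                if (*current == *array)
                {
                    *current = *end--;   /* overwrite duplicate with last element */
                }
                else
                {
                    current++;
                }
            }
        }
        return (int)(end - start) + 1;
    }

Should be O(n^2) or less.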

User · Answer

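Here is my solution:

    /* find duplicates in an array and remove them;
       merge_sort (not shown) sorts input[0..n) */
    int unique(int *input, int n)
    {
        if (n == 0) return 0;

        merge_sort(input, 0, n);

        int prev = 0;   /* index of the last unique value kept */

        for (int i = 1; i < n; i++)
        {
            if (input[i] != input[prev])
                input[++prev] = input[i];
        }
        return prev + 1;   /* number of unique values */
    }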

User · Answer

After reviewing the problem, here is my Delphi way, which may help:

    var
      A: Array of Integer;
      I, J, C, K, P: Integer;
    begin
      C := 10;
      SetLength(A, 10);
      A[0] := 1; A[1] := 4; A[2] := 2; A[3] := 6; A[4] := 3;
      A[5] := 4; A[6] := 3; A[7] := 4; A[8] := 2; A[9] := 5;

      for I := 0 to C-1 do
      begin
        for J := I+1 to C-1 do
          if A[I] = A[J] then
          begin
            for K := C-1 downto J do
              if A[J] <> A[K] then
              begin
                P := A[K];
                A[K] := 0;
                A[J] := P;
                C := K;
                break;
              end
              else
              begin
                A[K] := 0;
                C := K;
              end;
          end;
      end;

      // truncate array
      SetLength(A, C);
    end;

User · Answer

Create a BinarySearchTree, inserting each element and skipping values already in the tree; for a balanced tree this has O(n log n) complexity.

User · Answer

Given an array of n elements, write an algorithm to remove all duplicates from the array in time O(n log n).

    Algorithm delete_duplicates(a[1....n])
    // Remove duplicates from the given array
    // input parameters: a[1:n], an array of n elements
    {
        temp[1:n]; // an array of n elements

        temp[i].value = a[i] for i = 1 to n
        temp[i].key = i

        // based on 'value', sort the array temp
        // based on 'value', delete duplicate elements from temp
        // based on 'key', sort the array temp
        // construct an array p using temp:

        p[i] = temp[i].value

        return p
    }

In this approach, the order of elements is maintained in the output array using the 'key'. Consider that the key is of length O(n); the time taken for sorting on the key and on the value is O(n log n). So the time taken to delete all duplicates from the array is O(n log n).
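A C translation of that pseudocode, as a sketch: `struct item`, `dedup_stable_sort` and the two comparators are hypothetical names, and `qsort` is used for both sorts. Because `qsort` is not stable, the first comparator breaks ties on `key`; that way the first item in each run of equal values is the earliest occurrence, which is the one kept.

```c
#include <stddef.h>
#include <stdlib.h>

struct item { int value; size_t key; };   /* key = original index */

static int by_value_then_key(const void *pa, const void *pb)
{
    const struct item *a = pa, *b = pb;
    if (a->value != b->value)
        return (a->value > b->value) - (a->value < b->value);
    return (a->key > b->key) - (a->key < b->key);
}

static int by_key(const void *pa, const void *pb)
{
    const struct item *a = pa, *b = pb;
    return (a->key > b->key) - (a->key < b->key);
}

/* Returns the new length (first occurrences, original order), or
   (size_t)-1 if temp cannot be allocated. */
size_t dedup_stable_sort(int *a, size_t n)
{
    if (n < 2)
        return n;
    struct item *temp = malloc(n * sizeof *temp);
    if (temp == NULL)
        return (size_t)-1;
    for (size_t i = 0; i < n; i++) {
        temp[i].value = a[i];
        temp[i].key = i;
    }
    qsort(temp, n, sizeof *temp, by_value_then_key);

    size_t m = 1;                          /* delete duplicates by value */
    for (size_t i = 1; i < n; i++)
        if (temp[i].value != temp[m - 1].value)
            temp[m++] = temp[i];

    qsort(temp, m, sizeof *temp, by_key);  /* restore original order */
    for (size_t i = 0; i < m; i++)
        a[i] = temp[i].value;
    free(temp);
    return m;
}
```

Unlike the pseudocode, this writes the result back into `a` instead of returning a new array `p`, which matches the question's "in the original array" requirement (though `temp` is still a helper array).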

User · Answer

If you are looking for the superior O-notation, then sorting the array with an O(n log n) sort and then doing an O(n) traversal may be the best route. Without sorting, you are looking at O(n^2).

Edit: if you are just doing integers, then you can also do radix sort to get O(n).
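A sketch of the radix-sort route for 32-bit integers: an LSD radix sort with one byte per pass (four passes), followed by the usual O(n) compaction. The names `radix_sort_i32` and `radix_dedup` are hypothetical, and the sort needs a caller-supplied scratch buffer of n elements, so it is O(n) time but not O(1) extra space. Flipping the sign bit makes unsigned byte order agree with signed order.

```c
#include <stddef.h>
#include <stdint.h>

/* Stable LSD radix sort, 8 bits per pass. After the 4th pass (an even
   number of buffer swaps) the sorted data is back in a. */
static void radix_sort_i32(int32_t *a, int32_t *scratch, size_t n)
{
    int32_t *src = a, *dst = scratch;
    for (int shift = 0; shift < 32; shift += 8) {
        size_t count[257] = {0};
        for (size_t i = 0; i < n; i++) {
            uint32_t key = (uint32_t)src[i] ^ 0x80000000u;
            count[((key >> shift) & 0xFFu) + 1]++;
        }
        for (int b = 0; b < 256; b++)        /* prefix sums: start offsets */
            count[b + 1] += count[b];
        for (size_t i = 0; i < n; i++) {
            uint32_t key = (uint32_t)src[i] ^ 0x80000000u;
            dst[count[(key >> shift) & 0xFFu]++] = src[i];
        }
        int32_t *t = src;                    /* swap buffer roles */
        src = dst;
        dst = t;
    }
}

/* Returns the number of unique values, sorted at the front of a. */
size_t radix_dedup(int32_t *a, int32_t *scratch, size_t n)
{
    if (n < 2)
        return n;
    radix_sort_i32(a, scratch, n);
    size_t dst = 1;
    for (size_t i = 1; i < n; i++)
        if (a[i] != a[dst - 1])
            a[dst++] = a[i];
    return dst;
}
```

The O(n) bound treats the word size as a constant: each pass is O(n + 256), and the number of passes depends only on the width of the integer type.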

User · Answer

First, you should create an array check[n], where n is the number of elements of the array you want to make duplicate-free, and set the value of every element of the check array equal to 1. Using a for loop, traverse the array with the duplicates (say its name is arr), and in the for loop write this:

    for (i = 0; i < n; i++) {
        if (check[arr[i]] != 1) {
            arr[i] = 0;          /* seen before: mark as duplicate */
        } else {
            check[arr[i]] = 0;   /* first time: remember it */
        }
    }

With that, you set every duplicate equal to zero. So the only thing left to do is to traverse the arr array and print everything that's not equal to zero. The order stays, and it takes linear time (3*n). (This assumes every value in arr lies in 0..n-1, so it can index check, and that 0 itself does not occur as a real value.)

User · Answer

One more efficient implementation:

    int i, j;

    /* new length of modified array */
    int NewLength = 1;

    for (i = 1; i < Length; i++) {

        for (j = 0; j < NewLength; j++) {
            if (array[i] == array[j])
                break;
        }

        /* if no value in array[0..NewLength) equals array[i],
           copy the current value to the next new position in array */
        if (j == NewLength)
            array[NewLength++] = array[i];
    }

In this implementation there is no need to sort the array. Also, if a duplicate element is found, there is no need to shift all elements after it by one position.

The output of this code is array[] with size NewLength.

Here we start from the 2nd element in the array and compare it with all the elements in the array up to this point. We hold an extra index variable, NewLength, for modifying the input array. The NewLength variable is initialized to 1.

The element in array[1] will be compared with array[0]. If they are different, then the value in array[NewLength] will be replaced with array[1] and NewLength is incremented. If they are the same, NewLength will not be modified.

So if we have an array [1 2 1 3 1], then:

In the first pass of the 'i' loop, array[1] (2) will be compared with array[0], then 2 will be written to array[NewLength] = array[1], so the array will be [1 2] since NewLength = 2.

In the second pass of the 'i' loop, array[2] (1) will be compared with array[0] and array[1]. Here, since array[2] (1) and array[0] are the same, the loop will break, so the array will be [1 2] since NewLength = 2.

And so on.

User · Answer

If you are allowed to use C++, a call to std::sort followed by a call to std::unique will give you the answer. The time complexity is O(N log N) for the sort and O(N) for the unique traversal.

And if C++ is off the table, there isn't anything that keeps these same algorithms from being written in C.

User · Answer

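    import java.util.ArrayList;

    public class C {

        public static void main(String[] args) {

            // input must be sorted (duplicates adjacent) and must not contain 99999,
            // which is used as the "removed" marker
            int arr[] = {2, 5, 5, 5, 9, 11, 11, 23, 34, 34, 34, 45, 45};

            ArrayList<Integer> arr1 = new ArrayList<Integer>();

            for (int i = 0; i < arr.length - 1; i++) {
                if (arr[i] == arr[i + 1]) {
                    arr[i] = 99999;
                }
            }

            for (int i = 0; i < arr.length; i++) {
                if (arr[i] != 99999) {
                    arr1.add(arr[i]);
                }
            }

            System.out.println(arr1);
        }
    }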

User · Answer

This is what I've got, though it changes the order; we can sort in ascending or descending order to fix it up.

    #include <stdio.h>

    int main(void)
    {
        int x, n, myvar = 0;
        printf("Enter a number: \t");
        scanf("%d", &n);
        int arr[n], changedarr[n];

        for (x = 0; x < n; x++) {
            printf("Enter a number for array[%d]: ", x);
            scanf("%d", &arr[x]);
        }
        printf("\nOriginal Number in an array\n");
        for (x = 0; x < n; x++) {
            printf("%d\t", arr[x]);
        }

        int i = 0, j = 0;
        printf("i\tj\tarr\tchanged\n");

        for (int i = 0; i < n; i++)
        {
            printf("%d\t%d\t%d\t%d\n", i, j, arr[i], changedarr[i]);
            for (int j = 0; j < n; j++)
            {
                if (i == j)
                {
                    continue;
                }
                else if (arr[i] == arr[j]) {
                    changedarr[j] = 0;
                }
                else {
                    changedarr[i] = arr[i];
                }
                printf("%d\t%d\t%d\t%d\n", i, j, arr[i], changedarr[i]);
            }
            myvar += 1;
        }
        printf("\n\nmyvar=%d\n", myvar);
        int count = 0;
        printf("\nThe unique items:\n");
        for (int i = 0; i < myvar; i++)
        {
            if (changedarr[i] != 0) {
                count += 1;
                printf("%d\t", changedarr[i]);
            }
        }
        printf("\n");
    }

User · Answer

Insert all the elements into a binary tree that disregards duplicates: O(n log n). Then extract all of them back into the array by doing a traversal: O(n). I am assuming that you don't need order preservation.
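A C sketch of the tree approach, using a plain unbalanced binary search tree for brevity; that is an assumption, since the O(n log n) bound above only holds for a balanced tree (or random input). On already-sorted input this tree degenerates into a list (O(n^2) time, deep recursion). All names (`struct node`, `bst_insert`, `bst_drain`, `bst_free`, `bst_dedup`) are hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>

struct node {
    int value;
    struct node *left, *right;
};

/* Inserts v unless it is already present (duplicates are disregarded).
   Returns 1 on success, 0 if malloc fails. */
static int bst_insert(struct node **link, int v)
{
    while (*link != NULL) {
        if (v == (*link)->value)
            return 1;
        link = (v < (*link)->value) ? &(*link)->left : &(*link)->right;
    }
    struct node *n = malloc(sizeof *n);
    if (n == NULL)
        return 0;
    n->value = v;
    n->left = n->right = NULL;
    *link = n;
    return 1;
}

/* In-order traversal: writes values to out[pos...] in ascending order,
   freeing each node after visiting it. Returns the next free position. */
static size_t bst_drain(struct node *n, int *out, size_t pos)
{
    if (n == NULL)
        return pos;
    pos = bst_drain(n->left, out, pos);
    out[pos++] = n->value;
    pos = bst_drain(n->right, out, pos);
    free(n);
    return pos;
}

static void bst_free(struct node *n)
{
    if (n == NULL)
        return;
    bst_free(n->left);
    bst_free(n->right);
    free(n);
}

/* Returns the number of unique values (ascending, at the front of arr),
   or (size_t)-1 if a node cannot be allocated (arr is then unchanged). */
size_t bst_dedup(int *arr, size_t len)
{
    struct node *root = NULL;
    for (size_t i = 0; i < len; i++) {
        if (!bst_insert(&root, arr[i])) {
            bst_free(root);
            return (size_t)-1;
        }
    }
    return bst_drain(root, arr, 0);
}
```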

User · Answer

An array should obviously be "traversed" right-to-left to avoid unnecessary copying of values back and forth.

If you have unlimited memory, you can allocate a bit array of 2^(8 * sizeof(type-of-element-in-array)) bits (that many / 8 bytes), so that each bit signifies whether you've already encountered the corresponding value or not.

If you don't, I can't think of anything better than traversing the array, comparing each value with the values that follow it, and, if a duplicate is found, removing those values altogether. This is somewhere near O(n^2) (or O((n^2 - n) / 2)).

IBM has an article on a somewhat related subject.

User · Answer

In Java:

    Integer[] arrayInteger = {1, 2, 3, 4, 3, 2, 4, 6, 7, 8, 9, 9, 10};

    String value = "";

    for (Integer i : arrayInteger) {
        // wrap in commas so that e.g. "1" does not match inside "10"
        if (!("," + value).contains("," + i + ",")) {
            value += i + ",";
        }
    }

    String[] arraySplitToString = value.split(",");
    Integer[] arrayIntResult = new Integer[arraySplitToString.length];
    for (int i = 0; i < arraySplitToString.length; i++) {
        arrayIntResult[i] = Integer.parseInt(arraySplitToString[i]);
    }

Output: {1, 2, 3, 4, 6, 7, 8, 9, 10}

Hope this helps.

User · Answer

Let's see:

- O(N) pass to find min/max
- allocate bit-array for found
- O(N) pass swapping duplicates to end
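A C sketch of those steps. One deviation, labeled here as a design choice: instead of swapping duplicates to the end, the second pass compacts first occurrences to the front, which also preserves the original order. `dedup_bitmap` is a hypothetical name, and the bit array costs (max - min + 1) / 8 bytes, which can be up to 512 MiB for arbitrary 32-bit ints.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Returns the new length, or (size_t)-1 if the bit array cannot be
   allocated. Keeps the first occurrence of each value, in order. */
size_t dedup_bitmap(int *arr, size_t len)
{
    if (len < 2)
        return len;

    int lo = arr[0], hi = arr[0];
    for (size_t i = 1; i < len; i++) {        /* pass 1: find min/max */
        if (arr[i] < lo) lo = arr[i];
        if (arr[i] > hi) hi = arr[i];
    }

    /* hi - lo is computed in 64 bits so it cannot overflow int */
    uint64_t range = (uint64_t)((int64_t)hi - (int64_t)lo) + 1;
    unsigned char *seen = calloc((size_t)((range + 7) / 8), 1);
    if (seen == NULL)
        return (size_t)-1;

    size_t dst = 0;
    for (size_t i = 0; i < len; i++) {        /* pass 2: test-and-set */
        uint64_t off = (uint64_t)((int64_t)arr[i] - (int64_t)lo);
        unsigned char bit = (unsigned char)(1u << (off % 8));
        if (!(seen[off / 8] & bit)) {
            seen[off / 8] |= bit;
            arr[dst++] = arr[i];              /* first occurrence: keep */
        }
    }
    free(seen);
    return dst;
}
```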

User · Answer

This can be done in one pass with an O(N log N) algorithm and no extra storage.

Proceed from element a[1] to a[N]. At each stage i, all of the elements to the left of a[i] comprise a sorted heap of elements a[0] through a[j]. Meanwhile, a second index j, initially 0, keeps track of the size of the heap.

Examine a[i] and insert it into the heap, which now occupies elements a[0] to a[j+1]. As the element is inserted, if a duplicate element a[k] is encountered having the same value, do not insert a[i] into the heap (i.e., discard it); otherwise insert it into the heap, which now grows by one element and comprises a[0] to a[j+1], and increment j.

Continue in this manner, incrementing i until all of the array elements have been examined and inserted into the heap, which ends up occupying a[0] to a[j]. j is the index of the last element of the heap, and the heap contains only unique element values.

    int algorithm(int a[], int n)
    {
        int i, j;

        for (j = 0, i = 1; i < n; i++)
        {
            // Insert a[i] into the heap a[0...j]
            if (heapInsert(a, j, a[i]))
                j++;
        }
        return j;
    }

    bool heapInsert(int a[], int n, int val)
    {
        // Insert val into heap a[0...n]
        ...code omitted for brevity...
        if (duplicate element a[k] == val)
            return false;
        a[k] = val;
        return true;
    }

Looking at the example, this is not exactly what was asked for, since the resulting array does not preserve the original element order. But if this requirement is relaxed, the algorithm above should do the trick.
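The `heapInsert` body is omitted above, and a binary heap by itself cannot tell quickly whether a value is already present. A sketch of the same one-pass, in-place idea with a swapped-in structure: a sorted prefix searched by binary search. `sorted_prefix_unique` is a hypothetical name. It needs O(N log N) comparisons, but shifting larger values right to make room can cost O(N^2) element moves in the worst case (e.g. descending input), so it does not fully meet the O(N log N) claim.

```c
#include <stddef.h>

/* Keeps a[0..j) as a sorted run of unique values. Each a[i] is located
   by binary search: if present it is discarded, otherwise larger values
   are shifted right by one and it is inserted. Returns the count j. */
size_t sorted_prefix_unique(int *a, size_t n)
{
    size_t j = 0;                        /* size of the sorted unique prefix */
    for (size_t i = 0; i < n; i++) {
        int v = a[i];                    /* saved: the shift may overwrite a[i] */
        size_t lo = 0, hi = j;           /* lower_bound of v in a[0..j) */
        while (lo < hi) {
            size_t mid = lo + (hi - lo) / 2;
            if (a[mid] < v)
                lo = mid + 1;
            else
                hi = mid;
        }
        if (lo < j && a[lo] == v)
            continue;                    /* duplicate: discard */
        for (size_t k = j; k > lo; k--)  /* j <= i, so only processed slots move */
            a[k] = a[k - 1];
        a[lo] = v;
        j++;
    }
    return j;
}
```

Note that this returns the count of unique values, whereas `algorithm` above returns the index of the last one.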

User · Answer

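Use a Bloom filter for hashing. This will reduce the memory overhead very significantly. (A Bloom filter can report false positives, though, so a value it claims to have seen must be confirmed by an exact check before it is dropped.)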

User · Answer

Some of the answers written here are pretty trivial (O(n^2), or sorting and traversing in O(N log N)), and I'm assuming that is not what was expected in an interview for Microsoft. Obviously any answer above O(n) wasn't what they were looking for. The update states that there shouldn't be any helper data structures, so any answer that has one (a hash table, tree, bit array or whatever) shouldn't be a valid solution.

If you can allocate additional memory, then Jeff B's answer is probably the easiest way to do it. I have a good answer for questions like these, but MAXINT needs to be bounded by the size of the array.

Example: an array of size 100 may contain any number between 1 and 100. Remove the dups as in the original question.

The answer to this in O(n) time and O(1) memory is:

    // FLAG ALL DUPS IN THE ORIGIN ARRAY
    int maxNumInArray = findMaxNumInArray(arr);
    int dup = findMinNumInArray(arr) - 1;
    for (int i = 0; i < arrLength; ++i) {
        int seekIndex = arr[i] % (maxNumInArray + 1);
        if (arr[seekIndex] > maxNumInArray)
            arr[i] = dup; // invalidate index
        else
            arr[seekIndex] += maxNumInArray;
    }

    // REMOVE EMPTY SPACES
    int i = 0;
    int j = arrLength(arr) - 1;
    while (i < j) {
        while (arr[i] != dup)
            ++i;
        while (arr[j] == dup)
            --j;
        swap(arr[i], arr[j]);
    }

If you don't know the bounds, my answer isn't useful, but you can try and play with it. Oh, and this specific variation won't work with negative numbers, but it's not a problem to fix that.
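One way to make the "store a flag on top of the value" idea work, sketched under stated assumptions: every value is in 1..len, and len < INT_MAX / 2 so the encoded values cannot overflow (neither is checked). As written, the code above adds `maxNumInArray` but decodes modulo `maxNumInArray + 1`, and overwriting `arr[i]` with `dup` discards the flag stored in that slot. In this hypothetical `dedup_bounded`, each slot k holds `value + M * seen(k + 1)` with `M = len + 1`, and a removed duplicate keeps its flag but gets value 0.

```c
#include <stddef.h>

/* Preconditions (assumed, not checked): 1 <= arr[i] <= len and
   len < INT_MAX / 2. O(len) time, O(1) extra space. Returns the new
   length; first occurrences keep their original order. */
size_t dedup_bounded(int *arr, size_t len)
{
    int m = (int)len + 1;
    for (size_t i = 0; i < len; i++) {       /* pass 1: flag duplicates */
        int v = arr[i] % m;                  /* original value, 1..len */
        if (arr[v - 1] / m == 0)
            arr[v - 1] += m;                 /* first time v is seen */
        else
            arr[i] -= v;                     /* later copy: value part -> 0 */
    }
    size_t dst = 0;
    for (size_t i = 0; i < len; i++) {       /* pass 2: compact */
        int v = arr[i] % m;
        if (v != 0)
            arr[dst++] = v;                  /* store the plain value */
    }
    return dst;
}
```

Each slot's value part is changed only while that slot itself is processed, so `arr[i] % m` still yields the original value when pass 1 reaches index i.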
