[java] function to remove duplicate characters in a string

The following code is trying to remove any duplicate characters in a string. I'm not sure if the code is right. Can anybody help me work with the code (i.e whats actually happening when there is a match in characters)?

public static void removeDuplicates(char[] str) {
  if (str == null) return;
  int len = str.length;
  if (len < 2) return;
  int tail = 1;
  for (int i = 1; i < len; ++i) {
    int j;
    for (j = 0; j < tail; ++j) {
      if (str[i] == str[j]) break;
    }
    if (j == tail) {
      str[tail] = str[i];
      ++tail;
    }
  }
  str[tail] = 0;
}

This question is related to java string

The answer is


Using guava you can just do something like Sets.newHashSet(charArray).toArray(); If you are not using any libraries, you can still use new HashSet<Char>() and add your char array there.


Given the following question :

Write code to remove the duplicate characters in a string without using any additional buffer. NOTE: One or two additional variables are fine. An extra copy of the array is not.

Since one or two additional variables are fine but no buffer is allowed, you can simulate the behaviour of a hashmap by using an integer to store bits instead. This simple solution runs at O(n), which is faster than yours. Also, it isn't conceptually complicated and in-place :

    public static void removeDuplicates(char[] str) {
        int map = 0;
        for (int i = 0; i < str.length; i++) {
            if ((map & (1 << (str[i] - 'a'))) > 0) // duplicate detected
                str[i] = 0;
            else // add unique char as a bit '1' to the map
                map |= 1 << (str[i] - 'a');
        }
    }

The drawback is that the duplicates (which are replaced with 0's) will not be placed at the end of the str[] array. However, this can easily be fixed by looping through the array one last time. Also, an integer has the capacity for only regular letters.


I couldn't understand the logic behind the solution so I wrote my simple solution:

  public static void removeDuplicates(char[] str) {

    if (str == null) return; //If the string is null return      
    int length = str.length; //Getting the length of the string
    if (length < 2) return; //Return if the length is 1 or smaller

    for(int i=0; i<length; i++){ //Loop through letters on the array

        int j;

        for(j=i+1;j<length;j++){ //Loop through letters after the checked letters (i) 

            if (str[j]==str[i]){ //If you find duplicates set it to 0

                str[j]=0;
            }
        }
    }
}

private static String removeDuplicateCharactersFromWord(String word) {

    String result = new String("");

    for (int i = 0; i < word.length(); i++) {
        if (!result.contains("" + word.charAt(i))) {
            result += "" + word.charAt(i);
        }
    }

    return result;
}

I understand that this is a Java question, but since I have a nice solution which could inspire someone to convert this into Java, by all means. Also I like answers where multiple language submissions are available to common problems.

So here is a Python solution which is O(n) and also supports the whole ASCII range. Of course it does not treat 'a' and 'A' as the same:

I am using 8 x 32 bits as the hashmap:

Also input is a string array using dedup(list('some string'))

def dedup(str):
    map = [0,0,0,0,0,0,0,0]
    for i in range(len(str)):
        ascii = ord(str[i])
        slot = ascii / 32
        bit = ascii % 32
        bitOn = map[slot] & (1 << bit)
        if bitOn:
            str[i] = ''
        else:
            map[slot] |= 1 << bit

    return ''.join(str)

also a more pythonian way to do this is by using a set:

def dedup(s):
    return ''.join(list(set(s)))

Substringing method. Concatenation is done with .concat() to avoid allocation additional memory for left hand and right hand of +. Note: This removes even duplicate spaces.

private static String withoutDuplicatesSubstringing(String s){

        for(int i = 0; i < s.length(); i++){
          String sub = s.substring(i+1);
          int index = -1;
          while((index = sub.toLowerCase().indexOf(Character.toLowerCase(s.charAt(i)))) > -1 && !sub.isEmpty()){
              sub = sub.substring(0, index).concat(sub.substring(index+1, sub.length()));
          }
          s = s.substring(0, i+1).concat(sub);
        }
        return s;
    }

Test case:

String testCase1 = "nanananaa! baaaaatmaan! batman!";

Output: na! btm


#include <iostream>
#include <string>
using namespace std;

int main() {
    // your code goes here
    string str;
    cin >> str;
    long map = 0;
    for(int  i =0; i < str.length() ; i++){
        if((map & (1L << str[i])) > 0){
            str[i] = 0;
        }
        else{
            map |= 1L << str[i];
        }
    }
    cout << str;
    return 0;
}

This will solve purpose in fast and simple code. It gives result in O(n).

void removeduplicate(char *str)
{
    int checker = 0;
    int cnt = 0;
    for(int i = 0; i < strlen(str); i++)
    {
        int val = *(str + i) - (int)'a';
        if ((checker & (1 << val)) > 0) continue;
        else {
            *(str + cnt) = *(str + i);
            cnt++;
        }
        checker |= (1 << val);
    }
    *(str+cnt) = '\0';
}

public String removeDuplicateChar(String nonUniqueString) {
    String uniqueString = "";
    for (char currentChar : nonUniqueString.toCharArray()) {
        if (!uniqueString.contains("" + currentChar)) {
            uniqueString += currentChar;
        }
    }

    return uniqueString;
}

I have written a piece of code to solve the problem. I have checked with certain values, got the required output.

Note: It's time consuming.

static void removeDuplicate(String s) {

    char s1[] = s.toCharArray();

    Arrays.sort(s1);                    //Sorting is performed, a to z
                //Since adjacent values are compared

    int myLength = s1.length;           //Length of the character array is stored here

    int i = 0;                          //i refers to the position of original char array
    int j = 0;          //j refers to the position of char array after skipping the duplicate values 

    while(i != myLength-1 ){

        if(s1[i]!=s1[i+1]){     //Compares two adjacent characters, if they are not the same
            s1[j] = s1[i];      //if not same, then, first adjacent character is stored in s[j]
            s1[j+1] = s1[i+1];  //Second adjacent character is stored in s[j+1]
            j++;                //j is incremented to move to next location
        }

        i++;                    //i is incremented
    }

    //the length of s is i. i>j

    String s4 = new String (s1);        //Char Array to String

    //s4[0] to s4[j+1] contains the length characters after removing the duplicate
    //s4[j+2] to s4[i] contains the last set of characters of the original char array

    System.out.println(s4.substring(0, j+1));

}

Feel free to run my code with your inputs. Thanks.


public class StringRedundantChars {
    /**
     * @param args
     */
    public static void main(String[] args) {

        //initializing the string to be sorted
        String sent = "I love painting and badminton";

        //Translating the sentence into an array of characters
        char[] chars = sent.toCharArray();

        System.out.println("Before Sorting");
        showLetters(chars);

        //Sorting the characters based on the ASCI character code. 
        java.util.Arrays.sort(chars);

        System.out.println("Post Sorting");
        showLetters(chars);

        System.out.println("Removing Duplicates");
        stripDuplicateLetters(chars);

        System.out.println("Post Removing Duplicates");
        //Sorting to collect all unique characters 
        java.util.Arrays.sort(chars);
        showLetters(chars);

    }

    /**
     * This function prints all valid characters in a given array, except empty values
     * 
     * @param chars Input set of characters to be displayed
     */
    private static void showLetters(char[] chars) {

        int i = 0;
        //The following loop is to ignore all white spaces
        while ('\0' == chars[i]) {
            i++;
        }
        for (; i < chars.length; i++) {
            System.out.print(" " + chars[i]);
        }
        System.out.println();
    }

    private static char[] stripDuplicateLetters(char[] chars) {

        // Basic cursor that is used to traverse through the unique-characters
        int cursor = 0;
        // Probe which is used to traverse the string for redundant characters
        int probe = 1;

        for (; cursor < chars.length - 1;) {

            // Checking if the cursor and probe indices contain the same
            // characters
            if (chars[cursor] == chars[probe]) {
                System.out.println("Removing char : " + chars[probe]);
                // Please feel free to replace the redundant character with
                // character. I have used '\0'
                chars[probe] = '\0';
                // Pushing the probe to the next character
                probe++;
            } else {
                // Since the probe has traversed the chars from cursor it means
                // that there were no unique characters till probe.
                // Hence set cursor to the probe value
                cursor = probe;
                // Push the probe to refer to the next character
                probe++;
            }
        }
        System.out.println();

        return chars;
    }
}

    String s = "Javajk";
    List<Character> charz = new ArrayList<Character>();
    for (Character c : s.toCharArray()) {
        if (!(charz.contains(Character.toUpperCase(c)) || charz
                .contains(Character.toLowerCase(c)))) {
            charz.add(c);
        }
    }
     ListIterator litr = charz.listIterator();
   while (litr.hasNext()) {

       Object element = litr.next();
       System.err.println(":" + element);

   }    }

this will remove the duplicate if the character present in both the case.


Your code is, I'm sorry to say, very C-like.

A Java String is not a char[]. You say you want to remove duplicates from a String, but you take a char[] instead.

Is this char[] \0-terminated? Doesn't look like it because you take the whole .length of the array. But then your algorithm tries to \0-terminate a portion of the array. What happens if the arrays contains no duplicates?

Well, as it is written, your code actually throws an ArrayIndexOutOfBoundsException on the last line! There is no room for the \0 because all slots are used up!

You can add a check not to add \0 in this exceptional case, but then how are you planning to use this code anyway? Are you planning to have a strlen-like function to find the first \0 in the array? And what happens if there isn't any? (due to all-unique exceptional case above?).

What happens if the original String/char[] contains a \0? (which is perfectly legal in Java, by the way, see JLS 10.9 An Array of Characters is Not a String)

The result will be a mess, and all because you want to do everything C-like, and in place without any additional buffer. Are you sure you really need to do this? Why not work with String, indexOf, lastIndexOf, replace, and all the higher-level API of String? Is it provably too slow, or do you only suspect that it is?

"Premature optimization is the root of all evils". I'm sorry but if you can't even understand what the original code does, then figuring out how it will fit in the bigger (and messier) system will be a nightmare.


My minimal suggestion is to do the following:

  • Make the function takes and returns a String, i.e. public static String removeDuplicates(String in)
  • Internally, works with char[] str = in.toCharArray();
  • Replace the last line by return new String(str, 0, tail);

This does use additional buffers, but at least the interface to the rest of the system is much cleaner.


Alternatively, you can use StringBuilder as such:

static String removeDuplicates(String s) {
    StringBuilder noDupes = new StringBuilder();
    for (int i = 0; i < s.length(); i++) {
        String si = s.substring(i, i + 1);
        if (noDupes.indexOf(si) == -1) {
            noDupes.append(si);
        }
    }
    return noDupes.toString();
}

Note that this is essentially the same algorithm as what you had, but much cleaner and without as many little corner cases, etc.


A O(n) solution:

import java.util.*;
import java.io.*;

public class String_Duplicate_Removal
{

public static String duplicate_removal(String s)
    {
        if(s.length()<2)
            return s;

        else if(s.length()==2)
        {
            if(s.charAt(0)==s.charAt(1))
                s = Character.toString(s.charAt(0));
            return s;
        }

        boolean [] arr = new boolean[26];

        for(int i=0;i<s.length();i++)
        {

            if(arr[s.charAt(i)-'a']==false)
                arr[s.charAt(i)-'a']=true;

            else
            {
                s= ((new StringBuilder(s)).deleteCharAt(i)).toString();
                i--;
            }
        }
        return s;
    }   

    public static void main(String [] args)
    {
        String s = "abbashbhqa";
        System.out.println(duplicate_removal(s));
    }
}

An improved version for using bitmask to handle 256 chars:

public static void removeDuplicates3(char[] str) 
{
  long map[] = new long[] {0, 0, 0 ,0};
  long one = 1;

  for (int i = 0; i < str.length; i++) 
  {
    long chBit = (one << (str[i]%64));
    int n = (int) str[i]/64;

    if ((map[n] & chBit ) > 0) // duplicate detected
        str[i] = 0;
    else // add unique char as a bit '1' to the map
        map[n] |= chBit ;
  }

  // get rid of those '\0's
  int wi = 1;
  for (int i=1; i<str.length; i++)
  {
    if (str[i]!=0) str[wi++] = str[i];
  }

  // setting the rest as '\0'
  for (;wi<str.length; wi++) str[wi] = 0;
}

Result: "##1!!ASDJasanwAaw.,;..][,[]==--0" ==> "#1!ASDJasnw.,;][=-0" (double quotes not included)


public class RemoveRepeatedCharacters {

    /**
     * This method removes duplicates in a given string in one single pass.
     * Keeping two indexes, go through all the elements and as long as subsequent characters match, keep
     * moving the indexes in opposite directions. When subsequent characters don't match, copy value at higher index
     * to (lower + 1) index.
     * Time Complexity = O(n)
     * Space  = O(1)
     * 
     */
    public static void removeDuplicateChars(String text) {

        char[] ch = text.toCharArray();
        int i = 0; //first index
        for(int j = 1; j < ch.length; j++) {
            while(i >= 0 && j < ch.length && ch[i] == ch[j]) {
                i--;
                j++;
                System.out.println("i = " + i + " j = " + j);               

            }

            if(j < ch.length) {
                ch[++i] = ch[j];
            }

        }

        //Print the final string
        for(int k = 0; k <= i; k++)
            System.out.print(ch[k]);

    }

    public static void main(String[] args) {

        String text = "abccbdeefgg";
        removeDuplicateChars(text);

    }

}

This would be much easier if you just looped through the array and added all new characters to a list, then retruned that list.

With this approach, you need to reshuffle the array as you step through it and eventually redimension it to the appropriate size in the end.


(Java) Avoiding usage of Map, List data structures:

private String getUniqueStr(String someStr) {
    StringBuilder uniqueStr = new StringBuilder();
            if(someStr != null) {
       for(int i=0; i <someStr.length(); i++)   {
        if(uniqueStr.indexOf(String.valueOf(someStr.charAt(i))) == -1)  {
            uniqueStr.append(someStr.charAt(i));
        }
       }
            }
    return uniqueStr.toString();
}

/* program to remove the duplicate character in string */
/* Author senthilkumar M*/

char *dup_remove(char *str) 
{
    int i = 0, j = 0, l = strlen(str);
    int flag = 0, result = 0;

    for(i = 0; i < l; i++) {
         result = str[i] - 'a';
         if(flag & (1 << result)) {
            */* if duplicate found remove & shift the array*/*
            for(j = i; j < l; j++) {
                  str[j] = str[j+1];
             }
             i--; 
             l--; /* duplicates removed so string length reduced by 1 character*/
             continue;
         }
         flag |= (1 << result);
     }
     return str;
}

Yet another solution, seems to be the most concise so far:

private static String removeDuplicates(String s)
    {   
        String x = new String(s);

        for(int i=0;i<x.length()-1;i++)
            x = x.substring(0,i+1) + (x.substring(i+1)).replace(String.valueOf(x.charAt(i)), "");

        return x;
    }   

char[] chars = s.toCharArray();
    HashSet<Character> charz = new HashSet<Character>();

    for(Character c : s.toCharArray() )
    {
        if(!charz.contains(c))
        {
            charz.add(c);
            //System.out.print(c);
        }
    }

    for(Character c : charz)
    {
        System.out.print(c);
    }

public class RemoveCharsFromString {

static String testcase1 = "No, I am going to Noida";
static String testcase2 = "goings";

public static void main(String args[])throws StringIndexOutOfBoundsException{
    RemoveCharsFromString testInstance= new RemoveCharsFromString();
    String result = testInstance.remove(testcase1,testcase2);
    System.out.println(result);
}

//write your code here
public String remove(String str, String str1)throws StringIndexOutOfBoundsException
    {   String result=null;


       if (str == null)
        return "";


    try
    {
     for (int i = 0; i < str1.length (); i++) 
    {


        char ch1=str1.charAt(i);
        for(int j=0;j<str.length();j++)
        {
            char ch = str.charAt (j);

        if (ch == ch1)
        {
        String s4=String.valueOf(ch);
        String s5= str.replaceAll(s4, "");
        str=s5;


        }
        }

    }
    }
    catch(Exception e)
    {

    }
    result=str;
    return result;
    }
 }

This is my solution.

The algorithm is mainly the same as the one in the book "Cracking the code interview" where this exercise comes from, but I tried to improve it a bit and make the code more understandable:

public static void removeDuplicates(char[] str) {
        // if string has less than 2 characters, it can't contain
        // duplicate values, so there's nothing to do
        if (str == null || str.length < 2) {
            return;
        }

        // variable which indicates the end of the part of the string 
        // which is 'cleaned' (all duplicates removed)
        int tail = 0;

        for (int i = 0; i < str.length; i++) {
            boolean found = false;
            
            // check if character is already present in
            // the part of the array before the current char
            for (int j = 0; j < i; j++) {
                if (str[j] == str[i]) {
                    found = true;
                    break;
                }
            }

            // if char is already present
            // skip this one and do not copy it
            if (found) {
                continue;
            }

            // copy the current char to the index 
            // after the last known unique char in the array
            str[tail] = str[i];
            tail++;
        }
                          
        str[tail] = '\0';
    }

One of the important requirements from the book is to do it in-place (as in my solution), which means that no additional data structure should be used as a helper while processing the string. This improves performance by not wasting memory unnecessarily.


This function removes duplicate from string inline. I have used C# as a coding language and the duplicates are removed inline

 public static void removeDuplicate(char[] inpStr)
        {
            if (inpStr == null) return;
            if (inpStr.Length < 2) return;

        for (int i = 0; i < inpStr.Length; ++i)
        {

            int j, k;
            for (j = 1; j < inpStr.Length; j++)
            {

                if (inpStr[i] == inpStr[j] && i != j)
                {
                    for (k = j; k < inpStr.Length - 1; k++)
                    {
                        inpStr[k] = inpStr[k + 1];
                    }
                    inpStr[k] = ' ';
                }
            }

        }


        Console.WriteLine(inpStr);

    }

This is my solution

public static String removeDup(String inputString){
    if (inputString.length()<2) return inputString;
    if (inputString==null) return null;
    char[] inputBuffer=inputString.toCharArray();
    for (int i=0;i<inputBuffer.length;i++){
        for (int j=i+1;j<inputBuffer.length;j++){
            if (inputBuffer[i]==inputBuffer[j]){
                inputBuffer[j]=0;
            }
        }
    }
    String result=new String(inputBuffer);
    return result;
}

Question: Remove Duplicate characters in a string Method 1 :(Python)

import collections

a = "GiniGinaProtijayi"

aa = collections.OrderedDict().fromkeys(a)
print(''.join(aa))

Method 2 :(Python)

a = "GiniGinaProtijayi"
list = []
aa = [ list.append(ch) for ch in a if  ch  not in list]
print( ''.join(list))

IN Java:

class test2{
    public static void main(String[] args) {

 String a = "GiniGinaProtijayi";
 List<Character> list = new ArrayList<>();

       for(int i = 0 ; i < a.length() ;i++) {
           char ch = a.charAt(i);
           if( list.size() == 0 ) {list.add(ch);}
           if(!list.contains(ch)) {list.add(ch) ;}

       }//for
       StringBuffer sbr = new StringBuffer();

      for( char ch : list) {sbr.append(ch);}
      System.out.println(sbr);

    }//main

}//end

public static void main (String [] args)
    {
        String s = "aabbbeeddsfre";//sample string
        String temp2="";//string with no duplicates
        HashMap<Integer,Character> tc = new HashMap<Integer,Character>();//create a hashmap to store the char's
        char [] charArray = s.toCharArray();
        for (Character c : charArray)//for each char
        {
            if (!tc.containsValue(c))//if the char is not already in the hashmap
                {
                    temp2=temp2+c.toString();//add the char to the output string
                    tc.put(c.hashCode(),c);//and add the char to the hashmap
                }
        }

        System.out.println(temp2);//final string
    }

instead of HashMap I think we can use Set too.


public static void main(String[] args) {

    char[] str = { 'a', 'b', 'a','b','c','e','c' };

    for (int i = 1; i < str.length; i++) {
        for (int j = 0; j < i; j++) {
            if (str[i] == str[j]) {
                str[i] = ' ';
            }
        }

    }
    System.out.println(str);
}

Well I came up with the following solution. Keeping in mind that S and s are not duplicates. Also I have just one hard coded value.. But the code works absolutely fine.

public static String removeDuplicate(String str) {

    StringBuffer rev = new StringBuffer();  
    rev.append(str.charAt(0));

    for(int i=0; i< str.length(); i++)
    {
        int flag = 0;
        for(int j=0; j < rev.length(); j++)
        {
            if(str.charAt(i) == rev.charAt(j))
            {
                flag = 0;
                break;
            }
            else
            {
                flag = 1;
            }
        }
        if(flag == 1)
        {
            rev.append(str.charAt(i));
        }
    }

    return rev.toString();
}

public class RemoveDuplicateInString {
    public static void main(String[] args) {
        String s = "ABCDDCA";
        RemoveDuplicateInString rs = new RemoveDuplicateInString();
        System.out.println(rs.removeDuplicate(s));

    }

    public String removeDuplicate(String s) {
        String retn = null;
        boolean[] b = new boolean[256];

        char[] ch = s.toCharArray();
        for (int i = 0; i < ch.length; i++) {

            if (b[ch[i]]) {
                ch[i]=' ';

            }

            else {
                b[ch[i]] = true;

            }
        }

        retn = new String(ch);
        return retn;

    }

}

package com.java.exercise;

public class RemoveCharacter {

    /**
     * @param args
     */
    public static void main(String[] args) {
        RemoveCharacter rem = new RemoveCharacter();
        char[] ch=rem.GetDuplicates("JavavNNNNNNC".toCharArray());
        char[] desiredString="JavavNNNNNNC".toCharArray();
        System.out.println(rem.RemoveDuplicates(desiredString, ch));

    }
    char[] GetDuplicates(char[] input)
    {
        int ctr=0;
        char[] charDupl=new char[20];
        for (int i = 0; i <input.length; i++)
        {
            char tem=input[i];
            for (int j= 0; j < i; j++)
            {
                if (tem == input[j])
                {
                    charDupl[ctr++] = input[j];
                }

            }
        }


        return charDupl;
    }
     public char[] RemoveDuplicates(char[] input1, char []input2)
     {

         int coutn =0;
         char[] out2 = new char[10];
         boolean flag = false;
         for (int i = 0; i < input1.length; i++)
         {
             for (int j = 0; j < input2.length; j++)
             {

                     if (input1[i] == input2[j])
                     {
                         flag = false;
                         break;
                     }
                     else
                     {
                         flag = true;
                     }

             }
             if (flag)
             {
                 out2[coutn++]=input1[i];
                 flag = false;
             }
         }
         return out2;
     }
}

I resolve a similar exercise of the book : crackring the coding interview using recursion.

package crackingcodeinterview;

public class Exercise {

static String textString = "this is a random text of example!@#$%^(^452464156";

public static void main(String[] args) {

    filterLetters(0, "");
}

public static void filterLetters(int position, String letters) {
    if (position != textString.length()) {

        boolean p = false;

        for (int i = 0; i < letters.length(); i++) {
            if (letters.charAt(i) == textString.charAt(position)) {
                p = true;
                break;
            }
        }

        if (!p) {
            letters += textString.charAt(position);
        }
        position++;
        filterLetters(position, letters);
    } else {
        System.out.println(letters);
    }
  }
}

Other solution using substring and recursion

public class MyClass {
public static void main(String args[]) {

    getUnicLetter("esta es una cadena con letras repetidas","");
}



   public static String getUnicLetter(String originalWord,String finalWord){
        if(originalWord.isEmpty()) return  null;
        System.out.print(finalWord);
        return getUnicLetter(originalWord.replace(originalWord.substring(0,1),""),finalWord.contains(originalWord.substring(0,1)) ? "" : originalWord.substring(0,1));
    }

}