[java] Finding repeated words in a string and counting the repetitions

I need to find repeated words in a string and then count how many times each one is repeated. So basically, if the input string is this:

String s = "House, House, House, Dog, Dog, Dog, Dog";

I need to create a new string without repetitions and store the number of repetitions for each word somewhere else, like this:

New String: "House, Dog"

New Int Array: [3, 4]

Is there an easy way to do this in Java? I've managed to separate the string using s.split(), but how do I count the repetitions and eliminate them from the new string? Thanks!


The answer is


//program to count how many times each word occurs in a string
//Developed by Rahul Lakhmara

import java.util.*;

public class CountWordsInString {
    public static void main(String[] args) {
        String original = "I am rahul am i sunil so i can say am i";
        // making String type of array
        String[] originalSplit = original.split(" ");
        // if word has only one occurrence
        int count = 1;
        // LinkedHashMap will store the word as key and number of occurrence as
        // value
        Map<String, Integer> wordMap = new LinkedHashMap<String, Integer>();

        // visit every word, including the last one (a unique last word was skipped before)
        for (int i = 0; i < originalSplit.length; i++) {
            for (int j = i + 1; j < originalSplit.length; j++) {
                if (originalSplit[i].equals(originalSplit[j])) {
                    // Increment in count, it will count how many time word
                    // occurred
                    count++;
                }
            }
            // if word is already present so we will not add in Map
            if (wordMap.containsKey(originalSplit[i])) {
                count = 1;
            } else {
                wordMap.put(originalSplit[i], count);
                count = 1;
            }
        }

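        // typed entries instead of raw Set/Iterator/Map.Entry, so no casts are needed
        Set<Map.Entry<String, Integer>> word = wordMap.entrySet();
        Iterator<Map.Entry<String, Integer>> itr = word.iterator();
        while (itr.hasNext()) {
            Map.Entry<String, Integer> map = itr.next();
            // Printing
            System.out.println(map.getKey() + " " + map.getValue());
        }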
    }
}

I hope this helps. This version groups the words by how many times they occur:

public void countInPara(String str) {

    Map<Integer,String> strMap = new HashMap<Integer,String>();
    List<String> paraWords = Arrays.asList(str.split(" "));
    Set<String> strSet = new LinkedHashSet<>(paraWords);
    int count;

    for(String word : strSet) {
        count = Collections.frequency(paraWords, word);
        strMap.put(count, strMap.get(count)==null ? word : strMap.get(count).concat(","+word));
    }

    for(Map.Entry<Integer,String> entry : strMap.entrySet())
        System.out.println(entry.getKey() +" :: "+ entry.getValue());
}

Please try this; it may help you.

public static void main(String[] args) {
        String str1="House, House, House, Dog, Dog, Dog, Dog";
        String str2=str1.replace(",", "");
        Map<String,Integer> map=findFrquenciesInString(str2);
        Set<String> keys=map.keySet();
        Collection<Integer> vals=map.values();
        System.out.println(keys);
        System.out.println(vals);
    }

private static Map<String,Integer> findFrquenciesInString(String str1) {
        String[] strArr=str1.split(" ");
        Map<String,Integer> map=new HashMap<>();
        for(int i=0;i<strArr.length;i++) {
            int count=1;
            for(int j=i+1;j<strArr.length;j++) {
                // compare with equals(); != only checks whether both are the same object
                if(strArr[i].equals(strArr[j]) && !"-1".equals(strArr[i])) {
                    strArr[j]="-1";
                    count++;
                }
            }
            if(count>1 && !"-1".equals(strArr[i])) {
                map.put(strArr[i], count);
                strArr[i]="-1";
            }
        }
        return map;
    }

As others have mentioned, use String::split() followed by a map (HashMap or LinkedHashMap) and then merge the results. For completeness, here is the code.

import java.util.*;

public class Genric<E>
{
    public static void main(String[] args) 
    {
        Map<String, Integer> unique = new LinkedHashMap<String, Integer>();
        for (String string : "House, House, House, Dog, Dog, Dog, Dog".split(", ")) {
            if(unique.get(string) == null)
                unique.put(string, 1);
            else
                unique.put(string, unique.get(string) + 1);
        }
        String uniqueString = join(unique.keySet(), ", ");
        List<Integer> value = new ArrayList<Integer>(unique.values());

        System.out.println("Output = " + uniqueString);
        System.out.println("Values = " + value);

    }

    public static String join(Collection<String> s, String delimiter) {
        StringBuffer buffer = new StringBuffer();
        Iterator<String> iter = s.iterator();
        while (iter.hasNext()) {
            buffer.append(iter.next());
            if (iter.hasNext()) {
                buffer.append(delimiter);
            }
        }
        return buffer.toString();
    }
}

New string: Output = House, Dog

Int array (or rather, list): Values = [3, 4]. You can use List::toArray to get an array; note that it returns an Integer[], not an int[].
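A short sketch of both conversions, assuming Java 8+ for the stream version; toIntArray is a hypothetical helper name. List::toArray gives a boxed Integer[], while mapToInt(Integer::intValue) produces the primitive int[] the question asks for (it throws a NullPointerException if the list contains null).

```java
import java.util.Arrays;
import java.util.List;

public class ToIntArrayExample {
    // Hypothetical helper: unboxes each Integer into a primitive int[]
    static int[] toIntArray(List<Integer> values) {
        return values.stream().mapToInt(Integer::intValue).toArray();
    }

    public static void main(String[] args) {
        List<Integer> value = Arrays.asList(3, 4);

        // Boxed array via List::toArray
        Integer[] boxed = value.toArray(new Integer[0]);
        // Primitive array via a stream
        int[] primitive = toIntArray(value);

        System.out.println(Arrays.toString(boxed));     // [3, 4]
        System.out.println(Arrays.toString(primitive)); // [3, 4]
    }
}
```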


It may help you somehow.

String st="I am am not the one who is thinking I one thing at time";
String []ar = st.split("\\s");
Map<String, Integer> mp= new HashMap<String, Integer>();
int count=0;

for(int i=0;i<ar.length;i++){
    count=0;

    for(int j=0;j<ar.length;j++){
        if(ar[i].equals(ar[j])){
        count++;                
        }
    }

    mp.put(ar[i], count);
}

System.out.println(mp);

If you pass a String argument, this method counts the repetitions of each word:

/**
 * @param str the text to scan
 * @return a map containing each word and, as its value, the number of repetitions
 */
public Map<String, Integer> findDuplicateString(String str) {
    String[] stringArrays = str.split(" ");
    Map<String, Integer> map = new HashMap<String, Integer>();
    Set<String> words = new HashSet<String>(Arrays.asList(stringArrays));
    int count = 0;
    for (String word : words) {
        for (String temp : stringArrays) {
            if (word.equals(temp)) {
                ++count;
            }
        }
        map.put(word, count);
        count = 0;
    }

    return map;

}

Output:

{word1=2, word2=4, word3=1, ...}

public static void main(String[] args) {
    String s="sdf sdfsdfsd sdfsdfsd sdfsdfsd sdf sdf sdf ";
    String st[]=s.split(" ");
    System.out.println(st.length);
    Map<String, Integer> mp= new TreeMap<String, Integer>();
    for(int i=0;i<st.length;i++){

        Integer count=mp.get(st[i]);
        if(count == null){
            count=0;
        }           
        mp.put(st[i],++count);
    }
   System.out.println(mp.size());
   System.out.println(mp.get("sdfsdfsd"));


}

If this is homework, then all I can say is: use String.split() and HashMap<String,Integer>.

(I see you've already found split(), so you're on the right track.)
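A minimal sketch of that suggestion, assuming Java 8+ (for Map.merge and String.join) and that words are separated by a comma plus optional whitespace. It uses a LinkedHashMap rather than a plain HashMap so the words come out in first-appearance order, matching the "House, Dog" / [3, 4] output the question asks for. The class and method names are illustrative only.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

public class RepeatedWords {
    // Counts each word, keeping the order in which words first appear
    static Map<String, Integer> countWords(String input) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        // Split on a comma surrounded by optional whitespace
        for (String word : input.trim().split("\\s*,\\s*")) {
            counts.merge(word, 1, Integer::sum); // insert 1, or add 1 to the existing count
        }
        return counts;
    }

    public static void main(String[] args) {
        String s = "House, House, House, Dog, Dog, Dog, Dog";
        Map<String, Integer> counts = countWords(s);

        // Unique words, joined back into a single string
        String unique = String.join(", ", counts.keySet());
        // Repetition counts, in the same order as the words
        int[] repetitions = counts.values().stream()
                .mapToInt(Integer::intValue).toArray();

        System.out.println(unique);                       // House, Dog
        System.out.println(Arrays.toString(repetitions)); // [3, 4]
    }
}
```

Map.merge replaces the usual "get, check for null, put" sequence with one call; swap LinkedHashMap for TreeMap if you want the words sorted instead.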


Since the introduction of streams has changed the way we code, I would like to add a few ways of doing this with them:

    String[] strArray = str.split(" ");
    
    //1. All string value with their occurrences
    Map<String, Long> counterMap = 
            Arrays.stream(strArray).collect(Collectors.groupingBy(e->e, Collectors.counting()));

    //2. only duplicating Strings
    Map<String, Long> temp = counterMap.entrySet().stream().filter(map->map.getValue() > 1).collect(Collectors.toMap(map -> map.getKey(), map -> map.getValue()));
    System.out.println("test : "+temp);
    
    //3. List of Duplicating Strings
    List<String> masterStrings = Arrays.asList(strArray);
    Set<String> duplicatingStrings = 
            masterStrings.stream().filter(i -> Collections.frequency(masterStrings, i) > 1).collect(Collectors.toSet());

You can use a prefix tree (trie) to store the words and keep track of each word's count in the node where that word ends.

  #include <stdexcept>
  #include <string>
  using namespace std;

  #define  ALPHABET_SIZE 26
  // Structure of each node of prefix tree
  struct prefix_tree_node {
    // child() value-initializes every child pointer to nullptr
    prefix_tree_node() : count(0), child() {}
    int count;
    prefix_tree_node *child[ALPHABET_SIZE];
  };

  prefix_tree_node *root = new prefix_tree_node();

  void insert_string_in_prefix_tree(const string &word)
  {
    prefix_tree_node *current = root;
    for (unsigned int i = 0; i < word.size(); ++i) {
      // Assuming it has only alphabetic lowercase characters
      // Note: change this check or convert to lower case first
      const unsigned int letter = static_cast<unsigned int>(word[i] - 'a');

      // Invalid alphabetic character: stop with an error
      // Note: change this condition depending on the scenario
      if (letter >= ALPHABET_SIZE)
        throw runtime_error("Invalid alphabetic character");

      if (current->child[letter] == NULL)
        current->child[letter] = new prefix_tree_node();

      current = current->child[letter];
    }
    current->count++;
    // Insert this string into Max Heap and sort them by counts
  }

  // Data structure for storing in Heap will be something like this
  struct MaxHeapNode {
    int count;
    string word;
  };

After inserting all the words, print each word and its count by repeatedly removing the top of the max-heap.
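A rough Java translation of the trie-plus-max-heap idea, under the same assumption that words contain only lowercase letters a to z (anything else throws). The class and method names are hypothetical. After all insertions, a depth-first walk collects each (word, count) pair into a PriorityQueue ordered by descending count, and polling the queue yields the most frequent words first. Iterating a PriorityQueue directly does not give sorted order; only repeated poll() does.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class TrieWordCounter {
    private static final int ALPHABET_SIZE = 26;

    // One node per prefix; count > 0 marks the end of an inserted word
    private static final class Node {
        final Node[] child = new Node[ALPHABET_SIZE];
        int count;
    }

    // (word, count) pair stored in the max-heap
    static final class WordCount {
        final String word;
        final int count;
        WordCount(String word, int count) { this.word = word; this.count = count; }
        @Override public String toString() { return word + "=" + count; }
    }

    private final Node root = new Node();

    // Walks/creates the path for the word and bumps the count at its last node
    void insert(String word) {
        Node current = root;
        for (int i = 0; i < word.length(); i++) {
            int letter = word.charAt(i) - 'a';
            if (letter < 0 || letter >= ALPHABET_SIZE) {
                throw new IllegalArgumentException("Invalid character in: " + word);
            }
            if (current.child[letter] == null) {
                current.child[letter] = new Node();
            }
            current = current.child[letter];
        }
        current.count++;
    }

    // Pushes every stored word into a max-heap keyed on count, then polls it empty
    List<WordCount> byDescendingCount() {
        PriorityQueue<WordCount> heap = new PriorityQueue<>(
                Comparator.comparingInt((WordCount wc) -> wc.count).reversed());
        collect(root, new StringBuilder(), heap);
        List<WordCount> result = new ArrayList<>();
        while (!heap.isEmpty()) {
            result.add(heap.poll());
        }
        return result;
    }

    // Depth-first walk that rebuilds each word from the path of letters
    private void collect(Node node, StringBuilder prefix, PriorityQueue<WordCount> heap) {
        if (node.count > 0) {
            heap.add(new WordCount(prefix.toString(), node.count));
        }
        for (int i = 0; i < ALPHABET_SIZE; i++) {
            if (node.child[i] != null) {
                prefix.append((char) ('a' + i));
                collect(node.child[i], prefix, heap);
                prefix.deleteCharAt(prefix.length() - 1);
            }
        }
    }

    public static void main(String[] args) {
        TrieWordCounter trie = new TrieWordCounter();
        for (String w : "house house house dog dog dog dog".split(" ")) {
            trie.insert(w);
        }
        System.out.println(trie.byDescendingCount()); // [dog=4, house=3]
    }
}
```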


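public static void main(String[] args) {
    String string = "elamparuthi, elam, elamparuthi";
    String[] s = string.replace(" ", "").split(",");
    String ops = "";

    for (int i = 0; i <= s.length - 1; i++) {
        // Compare whole words: ops.contains(s[i]) would also match substrings,
        // so "elam" would be skipped because "elamparuthi" contains it
        if (!java.util.Arrays.asList(ops.split(", ")).contains(s[i])) {
            if (!ops.isEmpty()) ops += ", ";
            ops += s[i];
        }
    }
    System.out.println(ops);
}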

import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;

public class Counter {

private static final int COMMA_AND_SPACE_PLACE = 2;

private String mTextToCount;
private ArrayList<String> mSeparateWordsList;

public Counter(String mTextToCount) {
    this.mTextToCount = mTextToCount;

    mSeparateWordsList = cutStringIntoSeparateWords(mTextToCount);
}

private ArrayList<String> cutStringIntoSeparateWords(String text)
{
    ArrayList<String> returnedArrayList = new ArrayList<>();


    if(text.indexOf(',') == -1)
    {
        returnedArrayList.add(text);
        return returnedArrayList;
    }

    int position1 = 0;
    int position2 = 0;

    while(position2 < text.length())
    {
        char c = ',';
        if(text.charAt(position2) == c) // charAt avoids copying the whole string on every iteration
        {
            String tmp = text.substring(position1, position2);
            position1 += tmp.length() + COMMA_AND_SPACE_PLACE;
            returnedArrayList.add(tmp);
        }
        position2++;
    }

    if(position1 < position2)
    {
        returnedArrayList.add(text.substring(position1, position2));
    }

    return returnedArrayList;
}

public int[] countWords()
{
    if(mSeparateWordsList == null) return null;


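    // LinkedHashMap keeps the counts in the order the words first appear,
    // so the returned int[] lines up with the input words
    HashMap<String, Integer> wordsMap = new LinkedHashMap<>();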

    for(String s: mSeparateWordsList)
    {
        int cnt;

        if(wordsMap.containsKey(s))
        {
            cnt = wordsMap.get(s);
            cnt++;
        } else {
            cnt = 1;
        }
        wordsMap.put(s, cnt);
    }                
    return printCounterResults(wordsMap);
}

private int[] printCounterResults(HashMap<String, Integer> m)
{        
    int index = 0;
    int[] returnedIntArray = new int[m.size()];

    for(int i: m.values())
    {
        returnedIntArray[index] = i;
        index++;
    }

    return returnedIntArray;

}

}


import java.util.HashMap;
import java.util.Set;

public class StringsCount{

    public static void main(String args[]) {

        String value = "This is testing Program testing Program";

        String item[] = value.split(" ");

        HashMap<String, Integer> map = new HashMap<>();

        for (String t : item) {
            if (map.containsKey(t)) {
                map.put(t, map.get(t) + 1);

            } else {
                map.put(t, 1);
            }
        }
        Set<String> keys = map.keySet();
        for (String key : keys) {
            System.out.println(key);
            System.out.println(map.get(key));
        }

    }
}

import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class DuplicateWord {

    public static void main(String[] args) {
        String para = "this is what it is this is what it can be";
        List<String> paraList = Arrays.asList(para.split(" "));
        System.out.println(paraList);
        int size = paraList.size();

        int i = 0;
        Map < String, Integer > duplicatCountMap = new HashMap < String, Integer > ();
        for (int j = 0; size > j; j++) {
            int count = 0;
            for (i = 0; size > i; i++) {
                if (paraList.get(j).equals(paraList.get(i))) {
                    count++;
                    duplicatCountMap.put(paraList.get(j), count);
                }

            }

        }
        System.out.println(duplicatCountMap);
        List < Integer > myCountList = new ArrayList < > ();
        Set < String > myValueSet = new HashSet < > ();
        for (Map.Entry < String, Integer > entry: duplicatCountMap.entrySet()) {
            myCountList.add(entry.getValue());
            myValueSet.add(entry.getKey());
        }
        System.out.println(myCountList);
        System.out.println(myValueSet);
    }

}

Input: this is what it is this is what it can be

Output:

[this, is, what, it, is, this, is, what, it, can, be]

{can=1, what=2, be=1, this=2, is=3, it=2}

[1, 2, 1, 2, 3, 2]

[can, what, be, this, is, it]


package day2;

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;

public class DuplicateWords {

    public static void main(String[] args) {
        String S1 = "House, House, House, Dog, Dog, Dog, Dog";
        String S2 = S1.toLowerCase();
        // Split on commas and surrounding whitespace so "house," is not kept with its comma
        String[] S3 = S2.split("\\s*,\\s*");

        List<String> a1 = new ArrayList<String>();
        HashMap<String, Integer> hm = new HashMap<>();

        // Loop over every word, including the last one
        for (int i = 0; i < S3.length; i++) {

            if(!a1.contains(S3[i]))
            {
                a1.add(S3[i]);
            }
            else
            {
                continue;
            }

            int Count = 0;

            for (int j = 0; j < S3.length; j++)
            {
                if(S3[j].equals(S3[i]))
                {
                    Count++;
                }
            }

            hm.put(S3[i], Count);
        }

        System.out.println("Duplicate Words and their number of occurrences in String S1 : " + hm);
    }
}

Try this:

import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DuplicateWordSearcher {
public static void main(String[] args) {

    String text = "a r b k c d se f g a d f s s f d s ft gh f ws w f v x s g h d h j j k f sd j e wed a d f";

    List<String> list = Arrays.asList(text.split(" "));

    Set<String> uniqueWords = new HashSet<String>(list);
    for (String word : uniqueWords) {
        System.out.println(word + ": " + Collections.frequency(list, word));
    }
}

}


Using Java 8 streams collectors:

public static Map<String, Integer> countRepetitions(String str) {
    return Arrays.stream(str.split(", "))
        // Integer::sum adds the counts being merged; (a, b) -> a + 1 only works when b is 1
        .collect(Collectors.toMap(s -> s, s -> 1, Integer::sum));
}

Input: "House, House, House, Dog, Dog, Dog, Dog, Cat"

Output: {Cat=1, House=3, Dog=4}


Use Function.identity() inside Collectors.groupingBy and store everything in a Map:

String a  = "Gini Gina Gina Gina Gina Protijayi Protijayi "; 
        Map<String, Long> map11 = Arrays.stream(a.split(" ")).collect(Collectors
                .groupingBy(Function.identity(),Collectors.counting()));
        System.out.println(map11);

// output => {Gina=4, Gini=1, Protijayi=2}

In Python, you can use collections.Counter():

import collections

a = "Roopa Roopi  loves green color Roopa Roopi"
words = a.split()

wordsCount = collections.Counter(words)
for word,count in sorted(wordsCount.items()):
    print('"%s" is repeated %d time%s.' % (word,count,"s" if count > 1 else "" ))

Output:

"Roopa" is repeated 2 times.
"Roopi" is repeated 2 times.
"color" is repeated 1 time.
"green" is repeated 1 time.
"loves" is repeated 1 time.


Please use the code below. In my analysis it is the simplest approach; I hope you like it. (It returns null when no word occurs more than once.)

import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Scanner;
import java.util.Set;

public class MostRepeatingWord {

    String mostRepeatedWord(String s){
        String[] splitted = s.split(" ");
        List<String> listString = Arrays.asList(splitted);
        Set<String> setString = new HashSet<String>(listString);
        int count = 0;
        int maxCount = 1;
        String maxRepeated = null;
        for(String inp: setString){
            count = Collections.frequency(listString, inp);
            if(count > maxCount){
                maxCount = count;
                maxRepeated = inp;
            }
        }
        return maxRepeated;
    }
    public static void main(String[] args) 
    {       
        System.out.println("Enter The Sentence: ");
        Scanner s = new Scanner(System.in);
        String input = s.nextLine();
        MostRepeatingWord mrw = new MostRepeatingWord();
        System.out.println("Most repeated word is: " + mrw.mostRepeatedWord(input));

    }
}

import java.util.HashMap;
import java.util.Scanner;
public class class1 {
public static void main(String[] args) {
    Scanner in = new Scanner(System.in);
    String inpStr = in.nextLine();
    int key;

    HashMap<String,Integer> hm = new HashMap<String,Integer>();
    String[] strArr = inpStr.split(" ");

    for(int i=0;i<strArr.length;i++){
        if(hm.containsKey(strArr[i])){
            key = hm.get(strArr[i]);
            hm.put(strArr[i],key+1);

        }
        else{
            hm.put(strArr[i],1);
        }   
    }
    System.out.println(hm);
}

}


Hope this helps:

public static int countOfStringInAText(String stringToBeSearched, String masterString){

    int count = 0;
    while (masterString.indexOf(stringToBeSearched)>=0){
      count = count + 1;
      masterString = masterString.substring(masterString.indexOf(stringToBeSearched)+1);
    }
    return count;
}
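The method above counts substring occurrences, and because it advances by one character it also counts overlapping matches, so searching for "is" in "this is what it is" returns 3 (the "is" inside "this" included). If you need whole words only, one option, sketched below with a hypothetical countWord helper, is a regular expression with \b word boundaries; Pattern.quote keeps characters in the search word from being read as regex syntax.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WholeWordCount {
    // Counts non-overlapping, whole-word matches of word in text.
    // \b boundaries assume word starts and ends with a letter, digit or underscore.
    static int countWord(String word, String text) {
        Pattern pattern = Pattern.compile("\\b" + Pattern.quote(word) + "\\b");
        Matcher matcher = pattern.matcher(text);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countWord("is", "this is what it is")); // 2
    }
}
```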

package string;

import java.util.HashMap;
import java.util.Map;
import java.util.Set;

public class DublicatewordinanArray {
    public static void main(String[] args) {
        String str = "This is Dileep Dileep Kumar Verma Verma";
        DuplicateString(str);
    }

    public static void DuplicateString(String str) {
        String word[] = str.split(" ");
        Map<String, Integer> map = new HashMap<String, Integer>();
        for (String w : word) {
            if (!map.containsKey(w)) {
                map.put(w, 1);
            } else {
                map.put(w, map.get(w) + 1);
            }
        }
        Set<Map.Entry<String, Integer>> entrySet = map.entrySet();
        for (Map.Entry<String, Integer> entry : entrySet) {
            if (entry.getValue() > 1) {
                System.out.printf("%s : %d %n", entry.getKey(), entry.getValue());
            }
        }
    }
}

Once you have the words from the string, it is easy. From Java 10 onwards (which added var), you can try the following code:

import java.util.Arrays;
import java.util.stream.Collectors;

public class StringFrequencyMap {
    public static void main(String... args) {
        String[] wordArray = {"House", "House", "House", "Dog", "Dog", "Dog", "Dog"};
        var freq = Arrays.stream(wordArray)
                         .collect(Collectors.groupingBy(x -> x, Collectors.counting()));
        System.out.println(freq);
    }
}

Output:

{House=3, Dog=4}

Using Java 8:

private static void findWords(String s, List<String> output, List<Integer> count){
    String[] words = s.split(", ");
    Map<String, Integer> map = new LinkedHashMap<>();
    Arrays.stream(words).forEach(e->map.put(e, map.getOrDefault(e, 0) + 1));
    map.forEach((k,v)->{
        output.add(k);
        count.add(v);
    });
}

Use a LinkedHashMap, as in the method above, if you want to preserve insertion order. Example usage:

private static void findWords(){
    String s = "House, House, House, Dog, Dog, Dog, Dog";
    List<String> output = new ArrayList<>();
    List<Integer> count = new ArrayList<>();
    findWords(s, output, count);
    System.out.println(output);
    System.out.println(count);
}

Output

[House, Dog]
[3, 4]
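The same insertion-order guarantee is available from the stream collector: Collectors.groupingBy has an overload that takes a map factory, so passing LinkedHashMap::new keeps the words in first-appearance order (the two-argument form makes no promise about the type or ordering of the returned map). A sketch, assuming Java 8+ and the ", " separator used above; the class and method names are illustrative.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

public class OrderedGrouping {
    // Counts words in first-appearance order using a LinkedHashMap factory
    static Map<String, Long> countInOrder(String input) {
        return Arrays.stream(input.split(", "))
                .collect(Collectors.groupingBy(
                        Function.identity(),      // key: the word itself
                        LinkedHashMap::new,       // map factory: keeps insertion order
                        Collectors.counting()));  // value: number of occurrences
    }

    public static void main(String[] args) {
        System.out.println(countInOrder("House, House, House, Dog, Dog, Dog, Dog"));
        // {House=3, Dog=4}
    }
}
```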

import java.util.HashMap;
import java.util.LinkedHashMap;

public class CountRepeatedWords {

    public static void main(String[] args) {
          countRepeatedWords("Note that the order of what you get out of keySet is arbitrary. If you need the words to be sorted by when they first appear in your input String, you should use a LinkedHashMap instead.");
    }

    public static void countRepeatedWords(String wordToFind) {
        String[] words = wordToFind.split(" ");
        HashMap<String, Integer> wordMap = new LinkedHashMap<String, Integer>();

        for (String word : words) {
            wordMap.put(word,
                (wordMap.get(word) == null ? 1 : (wordMap.get(word) + 1)));
        }

        System.out.println(wordMap);
    }

}

//program to find number of repeating characters in a string
//Developed by Subash


import java.util.Scanner;

public class NoOfRepeatedChar {

    public static void main(String[] args) {
        // Input through the keyboard
        Scanner sc = new Scanner(System.in);
        System.out.println("Enter a string :");
        String s1 = sc.nextLine();

        // Remove spaces and convert the string to a char array
        String s2 = s1.replace(" ", "");
        char[] ch = s2.toCharArray();

        int counter = 0;

        // Compare each character with the whole character array
        for (int i = 0; i < ch.length; i++) {
            int count = 0;

            for (int j = 0; j < ch.length; j++) {
                if (ch[i] == ch[j])
                    count++; // the character matches this position
            }
            if (count > 1) {
                boolean flag = false;

                // Check whether this character was already counted earlier
                for (int k = i - 1; k >= 0; k--) {
                    if (ch[i] == ch[k]) // already counted
                        flag = true;
                }
                if (!flag)
                    counter = counter + 1;
            }
        }
        if (counter > 0) // there are repeating characters
            System.out.println("Number of repeating characters in the given string: " + counter);
        else
            System.out.println("Sorry, there are no repeating characters in the given string");
    }
}
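The nested loops above compare every character with every other one, which is O(n²). Tallying each character in a map during a single pass gives the same answer (the number of distinct characters that occur more than once) in linear time. A sketch, assuming Java 8+ and, like the original, that spaces are ignored; the class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

public class RepeatedCharCounter {
    // Returns how many distinct characters appear more than once (spaces ignored)
    static int countRepeatingChars(String s) {
        Map<Character, Integer> freq = new HashMap<>();
        for (char c : s.replace(" ", "").toCharArray()) {
            freq.merge(c, 1, Integer::sum); // tally each character
        }
        int repeating = 0;
        for (int n : freq.values()) {
            if (n > 1) {
                repeating++;
            }
        }
        return repeating;
    }

    public static void main(String[] args) {
        System.out.println(countRepeatingChars("hello world")); // 2 ('l' and 'o')
    }
}
```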

For strings without spaces, you can use the code below:

private static void findRecurrence(String input) {
    final Map<String, Integer> map = new LinkedHashMap<>();
    for(int i=0; i<input.length(); ) {
        int pointer = i;
        int startPointer = i;
        boolean pointerHasIncreased = false;
        for(int j=0; j<startPointer; j++){
            if(pointer<input.length() && input.charAt(j)==input.charAt(pointer) && input.charAt(j)!=32){
                pointer++;
                pointerHasIncreased = true;
            }else{
                if(pointerHasIncreased){
                    break;
                }
            }
        }
        if(pointer - startPointer >= 2) {
            String word = input.substring(startPointer, pointer);
            if(map.containsKey(word)){
                map.put(word, map.get(word)+1);
            }else{
                map.put(word, 1);
            }
            i=pointer;
        }else{
            i++;
        }
    }
    for(Map.Entry<String, Integer> entry : map.entrySet()){
        System.out.println(entry.getKey() + " = " + (entry.getValue()+1));
    }
}

Passing input such as "hahaha", "ba na na" or "xxxyyyzzzxxxzzz" gives the desired output.


/* Count the words in a string using a TreeMap. A HashMap would also work, but then the words would not be displayed in sorted order. */

import java.util.*;

public class Genric3
{
    public static void main(String[] args) 
    {
        Map<String, Integer> unique = new TreeMap<String, Integer>();
        String string1="Ram:Ram: Dog: Dog: Dog: Dog:leela:leela:house:house:shayam";
        // split on ':' and drop surrounding spaces, so " Dog" is counted as "Dog"
        String string2[]=string1.split("\\s*:\\s*");

        for (int i=0; i<string2.length; i++)
        {
            String string=string2[i];
            unique.put(string,(unique.get(string) == null?1:(unique.get(string)+1)));
        }

        System.out.println(unique);
    }
}
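A TreeMap also accepts a comparator. Passing String.CASE_INSENSITIVE_ORDER makes keys that differ only in case, such as House and house, share one entry, and the keys stay sorted alphabetically regardless of case. The entry keeps the spelling of the first occurrence that created it. A sketch, assuming Java 8+ and the ':' separator from the answer above; the class name is hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

public class CaseInsensitiveCount {
    // Counts words case-insensitively; keys are kept in case-insensitive sorted order
    static Map<String, Integer> count(String input) {
        Map<String, Integer> counts = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        // Split on ':' with any surrounding whitespace so " Dog" becomes "Dog"
        for (String word : input.trim().split("\\s*:\\s*")) {
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(count("Ram:ram: Dog: dog:House:house:shayam"));
        // {Dog=2, House=2, Ram=2, shayam=1}
    }
}
```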