[java] How can I remove punctuation from input text in Java?

I am trying to get a sentence using input from the user in Java, and i need to make it lowercase and remove all punctuation. Here is my code:

    String[] words = instring.split("\\s+");
    for (int i = 0; i < words.length; i++) {
        words[i] = words[i].toLowerCase();
    }
    String[] wordsout = new String[50];
    Arrays.fill(wordsout,"");
    int e = 0;
    for (int i = 0; i < words.length; i++) {
        if (words[i] != "") {
            wordsout[e] = words[e];
            wordsout[e] = wordsout[e].replaceAll(" ", "");
            e++;
        }
    }
    return wordsout;

I cant seem to find any way to remove all non-letter characters. I have tried using regexes and iterators with no luck. Thanks for any help.

This question is related to java regex string formatting

The answer is


If you don't want to use RegEx (which seems highly unnecessary given your problem), perhaps you should try something like this:

public String modified(final String input){
    final StringBuilder builder = new StringBuilder();
    for(final char c : input.toCharArray())
        if(Character.isLetterOrDigit(c))
            builder.append(Character.isLowerCase(c) ? c : Character.toLowerCase(c));
    return builder.toString();
}

It loops through the underlying char[] in the String and only appends the char if it is a letter or digit (filtering out all symbols, which I am assuming is what you are trying to accomplish) and then appends the lower case version of the char.


I don't like to use regex, so here is another simple solution.

public String removePunctuations(String s) {
    String res = "";
    for (Character c : s.toCharArray()) {
        if(Character.isLetterOrDigit(c))
            res += c;
    }
    return res;
}

Note: This will include both Letters and Digits


You can use following regular expression construct

Punctuation: One of !"#$%&'()*+,-./:;<=>?@[]^_`{|}~

inputString.replaceAll("\\p{Punct}", "");

You may try this:-

Scanner scan = new Scanner(System.in);
System.out.println("Type a sentence and press enter.");
String input = scan.nextLine();
String strippedInput = input.replaceAll("\\W", "");
System.out.println("Your string: " + strippedInput);

[^\w] matches a non-word character, so the above regular expression will match and remove all non-word characters.


Examples related to java

Under what circumstances can I call findViewById with an Options Menu / Action Bar item? How much should a function trust another function How to implement a simple scenario the OO way Two constructors How do I get some variable from another class in Java? this in equals method How to split a string in two and store it in a field How to do perspective fixing? String index out of range: 4 My eclipse won't open, i download the bundle pack it keeps saying error log

Examples related to regex

Why my regexp for hyphenated words doesn't work? grep's at sign caught as whitespace Preg_match backtrack error regex match any single character (one character only) re.sub erroring with "Expected string or bytes-like object" Only numbers. Input number in React Visual Studio Code Search and Replace with Regular Expressions Strip / trim all strings of a dataframe return string with first match Regex How to capture multiple repeated groups?

Examples related to string

How to split a string in two and store it in a field String method cannot be found in a main class method Kotlin - How to correctly concatenate a String Replacing a character from a certain index Remove quotes from String in Python Detect whether a Python string is a number or a letter How does String substring work in Swift How does String.Index work in Swift swift 3.0 Data to String? How to parse JSON string in Typescript

Examples related to formatting

How to add empty spaces into MD markdown readme on GitHub? VBA: Convert Text to Number How to change indentation in Visual Studio Code? How do you change the formatting options in Visual Studio Code? (Excel) Conditional Formatting based on Adjacent Cell Value 80-characters / right margin line in Sublime Text 3 Format certain floating dataframe columns into percentage in pandas Format JavaScript date as yyyy-mm-dd AngularJS format JSON string output converting multiple columns from character to numeric format in r