[regex] Regex to match only uppercase "words" with some exceptions

I have technical strings as the following:

"The thing P1 must connect to the J236 thing in the Foo position."

I would like to match with a regular expression those only-in-uppercase words (namely here P1 and J236). The problem is that I don't want to match the first letter of the sentence when it is a one-letter word.

Example, in:

"A thing P1 must connect ..." 

I want P1 only, not A and P1. By doing that, I know that I can miss a real "word" (like in "X must connect to Y") but I can live with it.

Additionally, I don't want to match uppercase words if the sentence is all uppercase.

Example:

"THING P1 MUST CONNECT TO X2."

Of course, ideally, I would like to match the technical words P1 and X2 here but since they are "hidden" in the all-uppercase sentence and since these technical words have no specific pattern, it's impossible. Again I can live with it because all-uppercase sentences are not so frequent in my files.

Thanks!

This question is related to regex match uppercase

The answer is


Maybe you can run this regex first to see if the line is all caps:

^[A-Z \d\W]+$

That will match only if it's a line like THING P1 MUST CONNECT TO X2.

Otherwise, you should be able to pull out the individual uppercase phrases with this:

[A-Z][A-Z\d]+

That should match "P1" and "J236" in The thing P1 must connect to the J236 thing in the Foo position.


Don't do things like [A-Z] or [0-9]. Do \p{Lu} and \d instead. Of course, this is valid for perl based regex flavours. This includes java.

I would suggest that you don't make some huge regex. First split the text in sentences. then tokenize it (split into words). Use a regex to check each token/word. Skip the first token from sentence. Check if all tokens are uppercase beforehand and skip the whole sentence if so, or alter the regex in this case.


Why do you need to do this in one monster-regex? You can use actual code to implement some of these rules, and doing so would be much easier to modify if those requirements change later.

For example:

if(/^[A-Z0-9\s]*$/)
    # sentence is all uppercase, so just fail out
    return 0;

# Carry on with matching uppercase terms

For the first case you propose you can use: '[[:blank:]]+[A-Z0-9]+[[:blank:]]+', for example:

echo "The thing P1 must connect to the J236 thing in the Foo position" | grep -oE '[[:blank:]]+[A-Z0-9]+[[:blank:]]+'

In the second case maybe you need to use something else and not a regex, maybe a script with a dictionary of technical words...

Cheers, Fernando


I'm not a regex guru by any means. But try:

<[A-Z0-9][A-Z0-9]+>

<           start of word
[A-Z0-9]    one character
[A-Z0-9]+   and one or more of them
>           end of word

I won't try for the bonus points of the whole upper case sentence. hehe


Examples related to regex

Why my regexp for hyphenated words doesn't work? grep's at sign caught as whitespace Preg_match backtrack error regex match any single character (one character only) re.sub erroring with "Expected string or bytes-like object" Only numbers. Input number in React Visual Studio Code Search and Replace with Regular Expressions Strip / trim all strings of a dataframe return string with first match Regex How to capture multiple repeated groups?

Examples related to match

How to test if a string contains one of the substrings in a list, in pandas? Delete rows containing specific strings in R Jquery Value match Regex How to compare two columns in Excel and if match, then copy the cell next to it Search File And Find Exact Match And Print Line? How to test multiple variables against a value? Selecting data frame rows based on partial string match in a column how to check if string contains '+' character PHP compare two arrays and get the matched values not the difference Check whether a string matches a regex in JS

Examples related to uppercase

Capitalize the first letter of string in AngularJs Ignoring upper case and lower case in Java Convert from lowercase to uppercase all values in all character variables in dataframe In Android EditText, how to force writing uppercase? how to convert Lower case letters to upper case letters & and upper case letters to lower case letters How to convert a string from uppercase to lowercase in Bash? How to change a string into uppercase Java Program to test if a character is uppercase/lowercase/number/vowel How do I lowercase a string in Python? Capitalize or change case of an NSString in Objective-C