[regex] How to match "anything up until this sequence of characters" in a regular expression?

Take this regular expression: /^[^abc]/. This will match any single character at the beginning of a string, except a, b, or c.

If you add a * after it (/^[^abc]*/), the regular expression keeps adding each subsequent character to the match until it reaches an a, b, or c.

For example, with the source string "qwerty qwerty whatever abc hello", the expression will match up to "qwerty qwerty wh".
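You can check this behaviour with Python's re module. This is a small sketch; the variable name `text` is only illustrative.

```python
import re

text = "qwerty qwerty whatever abc hello"

# ^[^abc]* : starting at the beginning, take as many characters as possible
# that are NOT 'a', 'b' or 'c'; it stops at the 'a' inside "whatever"
prefix = re.match(r"^[^abc]*", text).group()
print(repr(prefix))  # 'qwerty qwerty wh'
```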

But what if I wanted the matching string to be "qwerty qwerty whatever "?

...In other words, how can I match everything up to (but not including) the exact sequence "abc"?



The $ marks the end of a string, so something like [^abc]*$ should work: it looks for a run of characters that are NOT a, b, or c, but that run has to sit at the end of the string.

Also, if you're using a scripting language with regex support (like PHP or JS), it has plain string-search functions that stop at the first occurrence of a substring, and you can search from the left or from the right (strpos/strrpos in PHP, indexOf/lastIndexOf in JS). In PHP you can also reverse the string with strrev.
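The same non-regex idea in Python uses str.find (search from the left) and str.rfind (search from the right). The helper names `up_to` and `up_to_last` are hypothetical, and returning the whole text when the marker is absent is just one possible choice.

```python
def up_to(text, marker):
    """Everything before the FIRST occurrence of marker (whole text if absent)."""
    i = text.find(marker)  # -1 when marker is not found
    return text if i == -1 else text[:i]


def up_to_last(text, marker):
    """Everything before the LAST occurrence of marker (whole text if absent)."""
    i = text.rfind(marker)  # searches from the right
    return text if i == -1 else text[:i]


s = "qwerty qwerty whatever abc hello abc"
print(repr(up_to(s, "abc")))       # 'qwerty qwerty whatever '
print(repr(up_to_last(s, "abc")))  # 'qwerty qwerty whatever abc hello '
```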


If you're looking to capture everything up to "abc":

/^(.*?)abc/

Explanation:

( ) capture the expression inside the parentheses for access using $1, $2, etc.

^ match start of line

.* match any characters (except newlines, by default); the ? makes it non-greedy (match the minimum number of characters required) [1]

[1] The reason why this is needed is that otherwise, in the following string:

whatever whatever something abc something abc

By default, regexes are greedy, meaning they match as much as possible. Therefore /^(.*)abc/ would capture "whatever whatever something abc something ". Adding the non-greedy quantifier ? makes the regex capture only "whatever whatever something ".
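A quick Python sketch of the greedy versus non-greedy difference on the example string above; group(1) is the part captured by the parentheses (what $1 refers to in other tools).

```python
import re

s = "whatever whatever something abc something abc"

# Greedy .* grabs as much as possible, then backtracks to the LAST "abc"
greedy = re.match(r"^(.*)abc", s).group(1)
# Non-greedy .*? grabs as little as possible, stopping at the FIRST "abc"
lazy = re.match(r"^(.*?)abc", s).group(1)

print(repr(greedy))  # 'whatever whatever something abc something '
print(repr(lazy))    # 'whatever whatever something '
```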


This should make the regex clearer.

  1. The exact words can be captured with the following regex:

/("(.*?)")/g

Here, the g (global) flag finds every piece of text enclosed in double quotes, not just the first one. For example, if our search text is

This is the example for "double quoted" words

then we will get "double quoted" from that sentence.
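Python has no /g flag; re.findall returns every match instead. This sketch reuses the ("(.*?)") grouping from above: with two groups, findall returns one tuple per match (outer group with quotes, inner group without). The second input string is made up for illustration.

```python
import re

text = 'This is the example for "double quoted" words'

# Outer group keeps the quotes, inner group (.*?) drops them
pairs = re.findall(r'("(.*?)")', text)
print(pairs)  # [('"double quoted"', 'double quoted')]

# The lazy .*? stops at the next quote, so each quoted phrase is its own match
words = re.findall(r'"(.*?)"', 'say "one" and "two"')
print(words)  # ['one', 'two']
```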


In Python:

.+?(?=abc) works for the single-line case.

[^]+?(?=abc) does not work, since Python doesn't recognize [^] as a valid regex. To make multiline matching work, you'll need to use the re.DOTALL flag, for example:

re.findall('.+?(?=abc)', data, re.DOTALL)
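A small sketch of what re.DOTALL changes, using a made-up two-line string: without it, . cannot cross the newline, so the match can only start on the line that contains "abc".

```python
import re

data = "line one\nline two abc rest"

# Without DOTALL, '.' does not match '\n'
single_line = re.findall(r".+?(?=abc)", data)
# With DOTALL, '.' also matches '\n', so the match spans both lines
multi_line = re.findall(r".+?(?=abc)", data, re.DOTALL)

print(single_line)  # ['line two ']
print(multi_line)   # ['line one\nline two ']
```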

As @Jared Ng and @Issun pointed out, the key to solving this kind of problem, such as "matching everything up to a certain word or substring" or "matching everything after a certain word or substring", is lookaround: a family of zero-length assertions.

In your particular case, it can be solved with a positive lookahead: .+?(?=abc)

A picture is worth a thousand words; the Regex101 screenshot gives a detailed explanation.

[Regex101 screenshot]
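To see that the lookahead is zero-length, here is a Python sketch on the question's string: (?=abc) checks that "abc" comes next but consumes nothing, so the match ends exactly where "abc" starts.

```python
import re

s = "qwerty qwerty whatever abc hello"
m = re.search(r".+?(?=abc)", s)

print(repr(m.group()))            # 'qwerty qwerty whatever '
# The lookahead consumed no characters: the match ends at the start of "abc"
print(m.end() == s.index("abc"))  # True
```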


What you need is a lookaround assertion, like .+?(?=abc).

See: Lookahead and Lookbehind Zero-Length Assertions

Be aware that [abc] isn't the same as abc. Inside brackets it's not a string: each character is just one of the possibilities. Outside the brackets it's a literal string.
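A short Python sketch of the difference, using the question's string: the character class finds the first single a, b, or c, while the literal finds the first full "abc" sequence.

```python
import re

s = "qwerty qwerty whatever abc hello"

# [abc] is a character class: ONE character that is 'a', 'b' or 'c'
class_pos = re.search(r"[abc]", s).start()
# abc outside brackets is the literal three-character sequence
seq_pos = re.search(r"abc", s).start()

print(class_pos)  # 16, the 'a' in "whatever"
print(seq_pos)    # 23, the start of "abc"
```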


Try this:

.+?efg

Query:

select REGEXP_REPLACE ('abcdefghijklmn','.+?efg', '') FROM dual;

Output:

hijklmn
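A rough Python counterpart of the query above, using re.sub (which replaces every non-overlapping match by default). The second line is added for contrast: with a lookahead, "efg" is not part of the match, so it survives the replacement.

```python
import re

# .+?efg consumes "abcdefg" (efg included), leaving the rest
including = re.sub(r".+?efg", "", "abcdefghijklmn")
# .+?(?=efg) only consumes "abcd"; the lookahead leaves "efg" in place
excluding = re.sub(r".+?(?=efg)", "", "abcdefghijklmn")

print(including)  # hijklmn
print(excluding)  # efghijklmn
```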

I ended up on this Stack Overflow question while looking for help with my problem, but found no solution here :(

So I had to improvise... after some time I managed to build the regex I needed:


I needed everything up to one folder past the "grp-bps" folder, without including the last slash. There also had to be at least one folder after the "grp-bps" folder.

Edit

Text version for copy-paste (replace 'grp-bps' with your own text):

.*\/grp-bps\/[^\/]+
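A Python sketch of the pattern above on two hypothetical paths. In Python the \/ escapes are harmless but optional, since / is not special there. Because .* is greedy, if "grp-bps" appeared more than once the match would extend to its last occurrence.

```python
import re

pattern = re.compile(r".*\/grp-bps\/[^\/]+")

# Hypothetical path: the match stops before the slash after "project"
m = pattern.match("/home/user/grp-bps/project/src/file.txt")
print(m.group())  # /home/user/grp-bps/project

# No folder after grp-bps, so [^\/]+ has nothing to match
no_match = pattern.match("/home/user/grp-bps/")
print(no_match)   # None
```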

For regex in Java, and I believe also in most regex engines, if you want to include the last part this will work:

.+?(abc)

For example, in this line:

I have this very nice senabctence

To select all characters up to "abc" and include "abc" itself, our regex gives: I have this very nice senabc

Test this out: https://regex101.com/r/mX51ru/1
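The same pattern behaves this way in Python too. In this sketch group(0) is the whole match, "abc" included, and group(1) is just the parenthesised "abc".

```python
import re

line = "I have this very nice senabctence"
m = re.search(r".+?(abc)", line)

print(m.group(0))  # I have this very nice senabc
print(m.group(1))  # abc
```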


I believe you need subexpressions. If I remember correctly, you can use ordinary () parentheses for subexpressions.

This part is from the grep manual:

 Back References and Subexpressions
       The back-reference \n, where n is a single digit, matches the substring
       previously matched  by  the  nth  parenthesized  subexpression  of  the
       regular expression.

Something like ^[^(abc)] should do the trick.
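The back-reference behaviour quoted from the grep manual also exists in Python's re module, where \1 refers to the first parenthesised group. A small sketch with made-up strings:

```python
import re

# (abc) is a subexpression; \1 must match the SAME text again right after it
twice = re.search(r"(abc)\1", "xx abcabc yy").group()
print(twice)  # abcabc

# (\w)\1 finds doubled letters; findall returns the captured letter per match
doubled = re.findall(r"(\w)\1", "hello bookkeeper")
print(doubled)  # ['l', 'o', 'k', 'e']
```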