[regex] Using regular expressions to do mass replace in Notepad++ and Vim

So I've got a big text file which looks like the following:

<option value value='1' >A
<option value value='2' >B
<option value value='3' >C
<option value value='4' >D

It's several hundred lines long and I really don't want to do it manually. The expression that I'm trying to use is:

<option value='.{1,}' >

This works as intended when I run it through several online regular expression testers. I basically want to remove everything before A, B, C, etc. The problem is that when I try the expression in Vim and Notepad++, it doesn't find anything.

The answer is


There is a very simple solution to this, unless I have misunderstood the problem. The following regular expression:

(.*)(>)(.*)

matches every line in your post.

So, in Notepad++, search for (.*)(>)(.*) and replace it with \3.

Regular expressions are greedy by default: a bare (.*) would match the whole line. The trick is to break the line into groups so that you can extract the part you want to keep. Here the first group runs up to the last >, and the third group holds what follows it. I have done exactly this and it works fine in Notepad++ and EditPlus 3.
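
As a rough cross-check outside the editor, the same find-and-replace can be sketched with Python's `re` module. Python is an assumption of this example (the answer itself only covers Notepad++ and EditPlus); the sample lines are the ones from the question.

```python
import re

lines = [
    "<option value value='1' >A",
    "<option value value='2' >B",
    "<option value value='3' >C",
    "<option value value='4' >D",
]

# Group 1 is greedy and runs up to the last '>' on the line,
# group 2 is that '>', and group 3 is whatever follows it.
pattern = re.compile(r"(.*)(>)(.*)")

# Keep only group 3, like "Replace with: \3" in the editor.
result = [pattern.sub(r"\3", line) for line in lines]
print(result)
```

Because group 1 is greedy, a label that itself contains `>` would be cut at its last `>`, so this approach assumes the labels are plain text.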


Very simple. Find what:

<option value value=.*?>

Leave "Replace with" empty and click Replace All.
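
To see what the `?` after `.*` changes, here is a small Python sketch (Python's `re` is used only for illustration, and the sample line with a `>` in its label is hypothetical, made up to show the difference):

```python
import re

# Hypothetical line whose label contains a '>' of its own.
line = "<option value value='1' >A > B"

# Lazy: .*? stops at the first '>' it can reach.
lazy = re.sub(r"<option value value=.*?>", "", line)

# Greedy: .* runs on to the last '>' on the line.
greedy = re.sub(r"<option value value=.*>", "", line)

print(repr(lazy))
print(repr(greedy))
```

The lazy form removes only the tag, while the greedy form also eats part of the label.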


Notepad++: Search Mode = Regular expression

Find what: (.*>)(.)

Replace with: \2


It may help to be less specific. Your expression is "greedy", and different programs may handle that differently. Try this in Vim (note the backslash: in Vim's default mode the one-or-more operator is written \+):

:%s/^<[^>]\+>//
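
The idea of the negated character class can be sketched in Python as follows. This is only an illustration under the assumption that Python's `re` is available; the two-digit sample value is made up.

```python
import re

line = "<option value value='12' >B"

# [^>]+ matches one or more characters that are not '>', so the match
# can never run past the end of the first tag. Python's re takes '+'
# without a backslash.
stripped = re.sub(r"^<[^>]+>", "", line)
print(stripped)
```

Since the pattern never mentions the attribute text, it keeps working whether the line says `value` once or twice.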


Here's a nice article on Notepad++ Regular expressions
http://markantoniou.blogspot.com/2008/06/notepad-how-to-use-regular-expressions.html


Vim:

:%s/.* >//


I had the same problem (with jQuery " done..." strings), but only in Notepad++. I asked about it, received friendly replies that made me understand what I had missed, and then took the time to write a detailed step-by-step explanation; see Finding Line Beginning using Regular expression in Notepad++

Versailles, Tue 27 Apr 2010 22:53:25 +0200


In Notepad++ you don't need to use Regular Expressions for this.

Hold down Alt while dragging to select a rectangle of text across multiple rows at once. Select the chunk you want to be rid of, and press Delete.


In Vim, this removes the option tag and leaves just the letters:

:%s/<option.*>//g

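This will work; I tested it in my Vim. The single quotes are the trouble, so each one goes in its own bracket expression. A bare . matches exactly one character, so use .\+ if a value can have more than one digit:

:1,$s/^<option value value=['].\+['] >//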
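
A quick way to check how the wildcard between the quotes behaves on longer values is the Python sketch below. Python and the sample values `7` and `123` are assumptions made for illustration, not part of the original answer.

```python
import re

one_digit = "<option value value='7' >G"
three_digits = "<option value value='123' >X"

# Exactly one character between the quotes.
single = re.compile(r"^<option value value=['].['] >")
# One or more characters between the quotes.
repeated = re.compile(r"^<option value value=['].+['] >")

print(bool(single.match(one_digit)), bool(single.match(three_digits)))
print(repeated.sub("", three_digits))
```

With several hundred lines in the file, most values will have two or three digits, so the repeated form is the safer one.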


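In Vim:

:%s/<option value='.\{1,}' >//

or

:%s/<option value='.\+' >//

In Vim regular expressions you have to escape the one-or-more symbol, capturing parentheses, the curly braces of a bounded quantifier, and some others. (These commands mirror the regex from the question; the sample lines contain "value" twice, so repeat it in the pattern if your file does too.)

See :help /magic to see which special characters need to be escaped (and how to change that).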


There are two problems with your original solution. First, your example text:

<option value value='1' >A

has two occurrences of the word "value". Your regex has only one. Second, you need to escape the opening brace of the quantifier, or Vim will treat it as a literal brace. This regex works:

:%s/<option value value='.\{1,}' >//g
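
The first problem is independent of the editor, which a short Python sketch can show (Python's `re` is an assumption here; it takes the `{1,}` quantifier without any escaping):

```python
import re

line = "<option value value='1' >A"

original = r"<option value='.{1,}' >"         # regex from the question
corrected = r"<option value value='.{1,}' >"  # "value" repeated, as in the file

# The original never matches because "<option value='" does not occur in the line.
print(re.search(original, line))
print(re.sub(corrected, "", line))
```

The first call finds nothing, while the corrected pattern strips the tag and leaves the label.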

A little after the fact, but in case it's useful to anyone: I was able to follow one of the examples on here (by sdgfsdg) and quickly pick up regular expressions for Notepad++.

I had to similarly pull out some redundant data from a list of HTML select dropdown options, of the form:

<select>
  <option value="AC">saint_helena">Ascension Island</option>
  <option value="AD">andorra">Andorra</option>
  <option value="AE">united_arab_emirates">United Arab Emirates</option>
  <option value="AF">afghanistan">Afghanistan</option>
  ...
</select>

And what I really wanted was:

<select>
  <option value="AC">Ascension Island</option>
  <option value="AD">Andorra</option>
  <option value="AE">United Arab Emirates</option>
  <option value="AF">Afghanistan</option>
  ...
</select>

After some hair-pulling I realized that, as of version 5.8.5 (Sep. 2010), Notepad++'s regular expressions still don't seem to allow certain loops (unless there is another syntax). For example, the following would find even ">united_arab_emirated_emirates"> despite its additional separating underscores:

(">)([a-z]+([_]*[a-z]*)*)(">)

This query worked in most generic regex tools, but within Notepad++ I had to account for the maximum number of underscores (which unfortunately was 8) by hand, using the much uglier:

(">)([a-z]+[_]*[a-z]*[_]*[a-z]*[_]*[a-z]*[_]*[a-z]*)[_]*[a-z]*[_]*[a-z]*[_]*[a-z]*[_]*[a-z]*(">)

If someone knows a way to simulate a Regex loop in Notepad++'s replace feature, please let me know.


Find what: (">)([a-z]+[_]*[a-z]*[_]*[a-z]*[_]*[a-z]*[_]*[a-z]*)[_]*[a-z]*[_]*[a-z]*[_]*[a-z]*[_]*[a-z]*(">)


Replace with: ">


Result: 255 occurrences were replaced.
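
For comparison, here is how the looping pattern behaves in an engine that supports nested repetition. This is a minimal sketch in Python, which is an assumption of the example; the group `(?:_[a-z]+)*` is a slightly tightened variant of the loop above, not the exact expression from the answer.

```python
import re

lines = [
    '  <option value="AC">saint_helena">Ascension Island</option>',
    '  <option value="AD">andorra">Andorra</option>',
    '  <option value="AE">united_arab_emirates">United Arab Emirates</option>',
    '  <option value="AF">afghanistan">Afghanistan</option>',
]

# (?:_[a-z]+)* repeats "underscore plus word" any number of times,
# so the number of underscores does not have to be spelled out by hand.
pattern = re.compile(r'">[a-z]+(?:_[a-z]+)*">')

# Replace the whole match with a single '">', as in the answer.
cleaned = [pattern.sub('">', line) for line in lines]
for line in cleaned:
    print(line)
```

Requiring at least one letter after each underscore also keeps the repeated group from matching the empty string.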


In Notepad++ :

<option value value='1' >A
<option value value='2' >B
<option value value='3' >C
<option value value='4' >D


Find what: (.*)(>)(.)
Replace with: \3

Replace All


A
B
C
D

In Notepad++:

Search

(<option value="\w\w">)\w+">(.+)

Replace with

\1\2
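
The same search and replacement can be tried in Python as a rough check (Python's `re` is an assumption of this sketch; the sample line comes from the earlier answer about the select list):

```python
import re

line = '  <option value="AE">united_arab_emirates">United Arab Emirates</option>'

# \w also matches '_', so \w+ covers "united_arab_emirates" in one go.
# Group 1 keeps the opening tag, group 2 keeps the rest of the line.
fixed = re.sub(r'(<option value="\w\w">)\w+">(.+)', r"\1\2", line)
print(fixed)
```

Because `\w` includes the underscore, no loop over the underscore-separated words is needed.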
