[regex] What Regex would capture everything from ' mark to the end of a line?

I have a text file that denotes remarks with a single '.

Some lines have two quotes, but I need to get everything from the first instance of a ' to the line feed.

I AL01                  ' A-LINE                            '091398 GDK 33394178    
         402922 0831850 '                                   '091398 GDK 33394179    
I AL02                  ' A-LINE                            '091398 GDK 33394180    
         400722 0833118 '                                   '091398 GDK 33394181    
I A10A                  ' A-LINE 102                       '  53198 DJ  33394182    
         395335 0832203 '                                  '  53198 DJ  33394183    
I A10B                  ' A-LINE 102                       '  53198 DJ  3339418


The answer is


'.*$

Starting with a single quote ('), match any character (.) zero or more times (*) until the end of the line ($).
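As a quick check, here is a minimal Python sketch of that pattern. The two sample lines are shortened versions of the ones in the question, and the use of Python's re module with re.MULTILINE is my own assumption, since the question does not name a tool.

```python
import re

# Shortened versions of the lines in the question (spacing trimmed for readability).
text = (
    "I AL01  ' A-LINE  '091398 GDK 33394178\n"
    "  402922 0831850 '  '091398 GDK 33394179\n"
)

# re.MULTILINE makes $ match at the end of every line, not only at the end of the text.
remarks = re.findall(r"'.*$", text, flags=re.MULTILINE)

for remark in remarks:
    print(remark)
```

Each match starts at the first ' on its line and runs to the end of that line, so the second quote is simply part of the match.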


When I tried '.* in Notepad++ on Windows, it matched everything from the first ' to the end of the last line.

To capture only up to the end of that line, I used the following:

'.*?\n

This captures only the text from ' up to and including the line feed at the end of that line.
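One likely explanation, assuming the ". matches newline" box was ticked in the Notepad++ Find dialog, is that the dot was allowed to cross line breaks. The sketch below reproduces that behaviour in Python with re.DOTALL; the sample text is made up for illustration.

```python
import re

# Made-up two-line sample; each line contains a ' remark.
text = "a 'one' x\nb 'two' y\n"

# With DOTALL the dot also matches "\n", so the greedy .* runs to the end of the text.
greedy_dotall = re.findall(r"'.*", text, flags=re.DOTALL)

# The lazy .*? stops at the first "\n" it can reach, giving one match per line.
lazy_to_newline = re.findall(r"'.*?\n", text, flags=re.DOTALL)

print(greedy_dotall)
print(lazy_to_newline)
```

Note that the lazy version includes the line feed in each match, and it needs a line feed to be present after the remark.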


https://regex101.com/r/Jjc2xR/1

/(\w*\(Hex\): w*)(.*?)(?= |$)/gm

I'm sure this one works; it captures the hex serial number from the badly structured multiline text below:

     Space Reservation: disabled
         Serial Number: wCVt1]IlvQWv
   Serial Number (Hex): 77435674315d496c76515776
               Comment: new comment

I'm an eternal newbie at regex, but I'll try to explain this one:

(\w*\(Hex\): w*) : the first group, which finds the part of the line containing "(Hex): "

(.*?) : the second captured group, which lazily matches everything after that

(?= |$) : a lookahead that ends the match just before the next space or at the end of the line

So the second group will contain the value.
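Here is a small sketch that runs the pattern above against the sample text, assuming Python's re module (the linked regex101 demo may use a different flavour). The /gm flags are approximated with re.MULTILINE and a single search.

```python
import re

text = (
    "     Space Reservation: disabled\n"
    "         Serial Number: wCVt1]IlvQWv\n"
    "   Serial Number (Hex): 77435674315d496c76515776\n"
    "               Comment: new comment\n"
)

# Same pattern as above; MULTILINE lets $ match at the end of each line.
pattern = re.compile(r"(\w*\(Hex\): w*)(.*?)(?= |$)", re.MULTILINE)

match = pattern.search(text)
label = match.group(1)       # the "(Hex): " part
hex_serial = match.group(2)  # the value that follows it

print(hex_serial)
```

The w* in the first group matches zero or more literal "w" characters, so here it matches nothing and the value ends up entirely in the second group.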


The appropriate regex would be the ' char followed by any number of any chars [including zero chars] ending with an end of string/line token:

'.*$

And if you wanted to capture everything after the ' char but not include it in the output, you would use:

(?<=').*$

This basically says give me all characters that follow the ' char until the end of the line.
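To illustrate the lookbehind version, here is a minimal sketch in Python (my choice of language; the sample line is shortened from the question).

```python
import re

line = "I AL01  ' A-LINE  '091398 GDK 33394178"

# (?<=') asserts that a ' sits just before the match without consuming it.
after_quote = re.search(r"(?<=').*$", line).group(0)

print(repr(after_quote))
```

The match begins right after the first ', so the leading quote is left out while the second quote stays in.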

Edit: It has been noted that $ is implicit when using .* (by default . does not match a line break, so .* already stops at the end of the line) and is therefore not strictly required. The pattern:

'.*

is technically correct. However, it is clearer to be specific and avoid confusion during later code maintenance, hence my use of $. I believe it is always better to declare behaviour explicitly than to rely on implicit behaviour where clarity could be questioned.


This will capture everything up to the ' in backreference 1, and everything after the ' in backreference 2. Depending on the language, you may need to escape the apostrophes (\').

/^([^']*)'?(.*)$/

Quick modification: if the line doesn't have an ' - backreference 1 should still catch the whole line.

^ - start of string
([^']*) - capture any number of not ' characters
'? - match the ' 0 or 1 time
(.*) - capture any number of characters
$ - end of string
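The breakdown above can be tried out with a short sketch like this one, assuming Python and applying the pattern to one line at a time. The sample lines are made up.

```python
import re

pattern = re.compile(r"^([^']*)'?(.*)$")

# A line with a remark: group 1 is the text before the first ', group 2 the text after it.
with_quote = pattern.match("I AL01  ' A-LINE  '091398")
before, after = with_quote.group(1), with_quote.group(2)

# A line without any ': group 1 takes the whole line and group 2 is empty.
without_quote = pattern.match("no remark here")
whole, empty = without_quote.group(1), without_quote.group(2)

print(repr(before), repr(after))
print(repr(whole), repr(empty))
```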

In your example I'd go for the following pattern:

'([^\n]+)$

Use the multiline and global options to match all occurrences.

To include the linefeed in the match you could use:

'[^\n]+\n

But this might miss the last line if it has no linefeed.
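A small sketch of that difference, assuming Python (re.findall stands in for the global option, re.MULTILINE for the multiline option) and a made-up text whose last line has no linefeed:

```python
import re

# The last line deliberately has no trailing "\n".
text = "a 'one\nb 'two"

# Requiring the linefeed skips the final line.
with_linefeed = re.findall(r"'[^\n]+\n", text)

# Anchoring with $ in multiline mode finds the remark on every line.
with_anchor = re.findall(r"'([^\n]+)$", text, flags=re.MULTILINE)

print(with_linefeed)
print(with_anchor)
```

Because the second pattern has a capture group, findall returns only the text after the '.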

For a single line, if you don't need to match the linefeed I'd prefer to use:

'[^$]+$
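One caveat worth checking before relying on this last pattern: inside a character class, $ is a literal dollar sign rather than an end-of-line anchor, so [^$]+ means "one or more characters that are not $", and that includes line breaks. A minimal Python sketch with made-up input shows both effects.

```python
import re

# On multi-line input the class happily crosses the line break.
spans_lines = re.search(r"'[^$]+$", "a 'x\nb 'y")

# A literal dollar sign in the remark prevents any match at all.
with_dollar = re.search(r"'[^$]+$", "a 'cost $5")

print(repr(spans_lines.group(0)))
print(with_dollar)
```

So the pattern behaves as intended only for a single line that contains no $ character; '[^\n]+$ avoids both problems.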