[regex] How to remove carriage returns and new lines in Postgresql?

All,

I am stuck again trying to get my data in a format that I need it in. I have a text field that looks like this.

"deangelo 001 deangelo

local origin of name: italain

from the american name deangelo

meaning: of the angels

emotional spectrum • he is a fountain of joy for all.

personal integrity • his good name is his most precious asset. personality • it’s hard to soar with eagles when you’re surrounded by turkeys! relationships • starts slowly, but a relationship with deangelo builds over time. travel & leisure • a trip of a lifetime is in his future.

career & money • a gifted child, deangelo will need to be challenged constantly.

life’s opportunities • joy and happiness await this blessed person.

deangelo’s lucky numbers: 12 • 38 • 18 • 34 • 29 • 16

"

What would the best way be in Postgresql to remove the carriage returns and new lines? I've tried several things and none of them want to behave.

select regexp_replace(field, E'\r\c', '  ', 'g') from mytable
    WHERE id = 5520805582

SELECT regexp_replace(field, E'[^\(\)\&\/,;\*\:.\>\<[:space:]a-zA-Z0-9-]', ' ')
    FROM mytable
    WHERE field~ E'[^\(\)\&\/,;\*\:.\<\>[:space:]a-zA-Z0-9-]'
    AND id = 5520805582;

Thanks in advance, Adam

This question is related to regex postgresql

The answer is


In the case you need to remove line breaks from the begin or end of the string, you may use this:

UPDATE table 
SET field = regexp_replace(field, E'(^[\\n\\r]+)|([\\n\\r]+$)', '', 'g' );

Have in mind that the hat ^ means the begin of the string and the dollar sign $ means the end of the string.

Hope it help someone.


OP asked specifically about regexes since it would appear there's concern for a number of other characters as well as newlines, but for those just wanting strip out newlines, you don't even need to go to a regex. You can simply do:

select replace(field,E'\n','');

I think this is an SQL-standard behavior, so it should extend back to all but perhaps the very earliest versions of Postgres. The above tested fine for me in 9.4 and 9.2


select regexp_replace(field, E'[\\n\\r\\u2028]+', ' ', 'g' )

I had the same problem in my postgres d/b, but the newline in question wasn't the traditional ascii CRLF, it was a unicode line separator, character U2028. The above code snippet will capture that unicode variation as well.

Update... although I've only ever encountered the aforementioned characters "in the wild", to follow lmichelbacher's advice to translate even more unicode newline-like characters, use this:

select regexp_replace(field, E'[\\n\\r\\f\\u000B\\u0085\\u2028\\u2029]+', ' ', 'g' )