Match whitespace but not newlines

Question

I sometimes want to match whitespace but not newline. So far I've been resorting to `[ \t]`. Is there a less awkward way?

User · Accepted Answer

Perl versions 5.10 and later support subsidiary vertical and horizontal character classes, `\v` and `\h`, as well as the generic whitespace character class `\s`.

The cleanest solution is to use the horizontal whitespace character class `\h`. This will match tab and space from the ASCII set, non-breaking space from extended ASCII, or any of these Unicode characters:

    U+0009 CHARACTER TABULATION
    U+0020 SPACE
    U+00A0 NO-BREAK SPACE (not matched by \s)
    U+1680 OGHAM SPACE MARK
    U+2000 EN QUAD
    U+2001 EM QUAD
    U+2002 EN SPACE
    U+2003 EM SPACE
    U+2004 THREE-PER-EM SPACE
    U+2005 FOUR-PER-EM SPACE
    U+2006 SIX-PER-EM SPACE
    U+2007 FIGURE SPACE
    U+2008 PUNCTUATION SPACE
    U+2009 THIN SPACE
    U+200A HAIR SPACE
    U+202F NARROW NO-BREAK SPACE
    U+205F MEDIUM MATHEMATICAL SPACE
    U+3000 IDEOGRAPHIC SPACE

The vertical space pattern `\v` is less useful, but matches these characters:

    U+000A LINE FEED
    U+000B LINE TABULATION
    U+000C FORM FEED
    U+000D CARRIAGE RETURN
    U+0085 NEXT LINE (not matched by \s)
    U+2028 LINE SEPARATOR
    U+2029 PARAGRAPH SEPARATOR

There are seven vertical whitespace characters which match `\v` and eighteen horizontal ones which match `\h`. `\s` matches twenty-three characters.

All whitespace characters are either vertical or horizontal with no overlap, but they are not proper subsets because `\h` also matches U+00A0 NO-BREAK SPACE, and `\v` also matches U+0085 NEXT LINE, neither of which are matched by `\s`.
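As a rough illustration of `\h` and `\v` in use, here is a minimal sketch assuming Perl 5.10 or later. The helper names `squeeze_horizontal` and `classify` are made up for this example and are not part of Perl.

```perl
use strict;
use warnings;
use 5.010;    # \h and \v need Perl 5.10 or later

# Collapse every run of horizontal whitespace to one space, leaving newlines alone.
# (Hypothetical helper, named for this example only.)
sub squeeze_horizontal {
    my ($text) = @_;
    (my $copy = $text) =~ s/\h+/ /g;
    return $copy;
}

# Classify a single character as horizontal, vertical, or not whitespace.
sub classify {
    my ($char) = @_;
    return 'horizontal' if $char =~ /\A\h\z/;
    return 'vertical'   if $char =~ /\A\v\z/;
    return 'other';
}

my $sample = "alpha \t beta\n\tgamma\x{3000}delta\n";
print squeeze_horizontal($sample);
printf "%s %s %s\n", classify("\t"), classify("\n"), classify("x");
```

Because the substitution uses `\h` rather than `\s`, the line structure of `$sample` should survive while tabs, spaces, and the ideographic space are squeezed.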

User · Answer

The regex below matches a whitespace character that is not a newline:

    (?:(?!\n)\s)

If you also want to exclude carriage return, add `\r` next to `\n` in a character class inside the negative lookahead:

    (?:(?![\n\r])\s)

Add `+` after the non-capturing group to match one or more whitespace characters:

    (?:(?![\n\r])\s)+

I don't know why you people failed to mention the POSIX character class `[[:blank:]]`, which matches any horizontal whitespace (spaces and tabs). This POSIX character class works in BRE (Basic Regular Expressions), ERE (Extended Regular Expressions), and PCRE (Perl Compatible Regular Expressions).
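To compare the two approaches in this answer, here is a small Perl sketch. The helper `count_runs` and the variable names are assumptions made for the example; only the two patterns come from the answer.

```perl
use strict;
use warnings;

# One or more whitespace characters, none of which is LF or CR.
my $ws_lookahead = qr/(?:(?![\n\r])\s)+/;

# POSIX blank: horizontal whitespace such as space and tab.
my $ws_blank = qr/[[:blank:]]+/;

# Count the non-overlapping matches of a pattern in a string.
# (Hypothetical helper, named for this example only.)
sub count_runs {
    my ($re, $text) = @_;
    my $count = () = $text =~ /$re/g;
    return $count;
}

my $text = "a  b\t\tc\r\nd e\n";
printf "lookahead: %d, blank: %d\n",
    count_runs($ws_lookahead, $text), count_runs($ws_blank, $text);
```

The two patterns are not interchangeable: the lookahead version still accepts form feed because `\s` does, while `[[:blank:]]` does not.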

User · Answer

A variation on Greg's answer that includes carriage returns too:

    [^\S\r\n]

This regex is safer than `[^\S\n]` with no `\r`. My reasoning is that Windows uses `\r\n` for newlines, and Mac OS 9 used `\r`. You're unlikely to find `\r` without `\n` nowadays, but if you do find it, it couldn't mean anything but a newline. Thus, since `\r` can mean a newline, we should exclude it too.
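The difference shows up when stripping trailing whitespace from text with mixed line endings. The following is a sketch only; `strip_trailing_ws` and `strip_trailing_ws_lf_only` are hypothetical helpers written for this comparison.

```perl
use strict;
use warnings;

# Remove spaces and tabs that sit directly before a line ending (CRLF, LF, or
# bare CR) or the end of the string. The class excludes \r, so CRLF survives.
sub strip_trailing_ws {
    my ($text) = @_;
    (my $copy = $text) =~ s/[^\S\r\n]+(?=\r?\n|\r|\z)//g;
    return $copy;
}

# Same idea with the class that only excludes \n. Here \r counts as ordinary
# whitespace, so the CR of a CRLF pair is swallowed along with the spaces.
sub strip_trailing_ws_lf_only {
    my ($text) = @_;
    (my $copy = $text) =~ s/[^\S\n]+(?=\n|\z)//g;
    return $copy;
}

my $windows_text = "one \t\r\ntwo  \r\n";
printf "CRs kept: %d vs %d\n",
    scalar(() = strip_trailing_ws($windows_text) =~ /\r/g),
    scalar(() = strip_trailing_ws_lf_only($windows_text) =~ /\r/g);
```

With `[^\S\n]`, a Windows file would silently be converted to LF line endings, which is the kind of surprise the `\r` exclusion avoids.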

User · Answer

`m/ /g`: just put a space between the slashes, `/ /`, and it will work. Or use `\s`, which matches all whitespace characters such as tabs, newlines, and spaces.

User · Answer

Use a double-negative:

    /[^\S\r\n]/

That is, not-not-whitespace (the capital S complements) or not-carriage-return or not-newline. Distributing the outer not (i.e., the complementing `^` in the character class) with De Morgan's law, this is equivalent to "whitespace but not carriage return or newline." Including both `\r` and `\n` in the pattern correctly handles all of Unix (LF), classic Mac OS (CR), and DOS-ish (CR LF) newline conventions.

No need to take my word for it:

    #! /usr/bin/env perl

    use strict;
    use warnings;

    use 5.005;  # for qr//

    my $ws_not_crlf = qr/[^\S\r\n]/;

    for (' ', '\f', '\t', '\r', '\n') {
      # $qq holds the escape as printable source text, e.g. "\f";
      # eval turns it into the actual character for matching.
      my $qq = qq["$_"];
      printf "%-4s => %s\n", $qq,
        (eval $qq) =~ $ws_not_crlf ? "match" : "no match";
    }

Output:

    " "  => match
    "\f" => match
    "\t" => match
    "\r" => no match
    "\n" => no match

Note the exclusion of vertical tab, but this is addressed in v5.18.

Before objecting too harshly, the Perl documentation uses the same technique. A footnote in the "Whitespace" section of perlrecharclass reads:

    Prior to Perl v5.18, \s did not match the vertical tab. [^\S\cK] (obscurely) matches what \s traditionally did.

The same section of perlrecharclass also suggests other approaches that won't offend language teachers' opposition to double-negatives.

Outside locale and Unicode rules or when the `/a` switch is in effect, "`\s` matches `[\t\n\f\r ]` and, starting in Perl v5.18, the vertical tab, `\cK`." Discard `\r` and `\n` to leave `/[\t\f\cK ]/` for matching whitespace but not newline.

If your text is Unicode, use code similar to the sub below to construct a pattern from the table in the aforementioned documentation section.

    sub ws_not_nl {
      local($_) = <<'EOTable';
    0x0009        CHARACTER TABULATION   h s
    0x000a              LINE FEED (LF)    vs
    0x000b             LINE TABULATION    vs  [1]
    0x000c              FORM FEED (FF)    vs
    0x000d        CARRIAGE RETURN (CR)    vs
    0x0020                       SPACE   h s
    0x0085             NEXT LINE (NEL)    vs  [2]
    0x00a0              NO-BREAK SPACE   h s  [2]
    0x1680            OGHAM SPACE MARK   h s
    0x2000                     EN QUAD   h s
    0x2001                     EM QUAD   h s
    0x2002                    EN SPACE   h s
    0x2003                    EM SPACE   h s
    0x2004          THREE-PER-EM SPACE   h s
    0x2005           FOUR-PER-EM SPACE   h s
    0x2006            SIX-PER-EM SPACE   h s
    0x2007                FIGURE SPACE   h s
    0x2008           PUNCTUATION SPACE   h s
    0x2009                  THIN SPACE   h s
    0x200a                  HAIR SPACE   h s
    0x2028              LINE SEPARATOR    vs
    0x2029         PARAGRAPH SEPARATOR    vs
    0x202f       NARROW NO-BREAK SPACE   h s
    0x205f   MEDIUM MATHEMATICAL SPACE   h s
    0x3000           IDEOGRAPHIC SPACE   h s
    EOTable

      my $class;
      # Capture the code point and the name, including any parenthesized
      # abbreviation such as (LF), so the filter below can see it.
      while (/^0x([0-9a-f]{4})\s+([A-Z\s()-]+)/mg) {
        my($hex,$name) = ($1,$2);
        # Skip the newline-like characters: CR, LF, NEL, and the separators.
        next if $name =~ /\b(?:CR|LF|NEL|SEPARATOR)\b/;
        $class .= "\\N{U+$hex}";
      }

      qr/[$class]/u;
    }

Other Applications

The double-negative trick is also handy for matching alphabetic characters. Remember that `\w` matches "word characters": alphabetic characters, digits, and underscore. We ugly-Americans sometimes want to write it as, say,

    if (/[A-Za-z]+/) { ... }

but a double-negative character class can respect the locale:

    if (/[^\W\d_]+/) { ... }

Expressing "a word character but not digit or underscore" this way is a bit opaque. A POSIX character class communicates the intent more directly

    if (/[[:alpha:]]+/) { ... }

or with a Unicode property as szbalint suggested

    if (/\p{Letter}+/) { ... }

User · Answer

What you are looking for is the POSIX blank character class. In Perl it is referenced as:

    [[:blank:]]

in Java (don't forget to enable `UNICODE_CHARACTER_CLASS`):

    \p{Blank}

Compared to the similar `\h`, POSIX blank is supported by a few more regex engines. A major benefit is that its definition is fixed in Annex C: Compatibility Properties of Unicode Regular Expressions and standard across all regex flavors that support Unicode. (In Perl, for example, `\h` chooses to additionally include the MONGOLIAN VOWEL SEPARATOR.) However, an argument in favor of `\h` is that it always detects Unicode characters (even if the engines don't agree on which), while POSIX character classes are often by default ASCII-only (as in Java).

But the problem is that even sticking to Unicode doesn't solve the issue 100%. Consider the following characters, which are not considered whitespace in Unicode:

    U+180E MONGOLIAN VOWEL SEPARATOR
    U+200B ZERO WIDTH SPACE
    U+200C ZERO WIDTH NON-JOINER
    U+200D ZERO WIDTH JOINER
    U+2060 WORD JOINER
    U+FEFF ZERO WIDTH NON-BREAKING SPACE

Taken from https://en.wikipedia.org/wiki/White-space_character

The aforementioned Mongolian vowel separator isn't included for what is probably a good reason. It, along with U+200C and U+200D, occurs within words (AFAIK), and therefore breaks the cardinal rule that all other whitespace obeys: you can tokenize with it. They're more like modifiers. However, ZERO WIDTH SPACE, WORD JOINER, and ZERO WIDTH NON-BREAKING SPACE (if it is used as other than a byte-order mark) fit the whitespace rule in my book. Therefore, I include them in my horizontal whitespace character class.

In Java:

    static public final String HORIZONTAL_WHITESPACE = "[\\p{Blank}\\u200B\\u2060\\uFEFF]";
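For comparison, a rough Perl counterpart to this extended class might look like the sketch below. It assumes Perl 5.10 or later for `\h`; the names `$HORIZONTAL_WS` and `tokenize` are illustrative and do not come from the answer.

```perl
use strict;
use warnings;
use 5.010;    # \h needs Perl 5.10 or later

# Horizontal whitespace plus ZERO WIDTH SPACE, WORD JOINER, and
# ZERO WIDTH NON-BREAKING SPACE, treated here as token separators.
my $HORIZONTAL_WS = qr/[\h\x{200B}\x{2060}\x{FEFF}]/;

# Split text into tokens on runs of the extended horizontal class.
# Newlines are not separators, so they stay inside the tokens.
# (Hypothetical helper, named for this example only.)
sub tokenize {
    my ($text) = @_;
    return grep { length } split /$HORIZONTAL_WS+/, $text;
}

my @tokens = tokenize("foo\x{200B}bar \x{2060}baz\tqux");
print join('|', @tokens), "\n";
```

Whether U+FEFF should count as a separator depends on the data: at the very start of a file it is usually a byte-order mark and is better stripped before tokenizing.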
