Neatest way to remove linebreaks in Perl

Question

I'm maintaining a script that can get its input from various sources and works on it line by line. Depending on the actual source used, linebreaks might be Unix-style, Windows-style or even, for some aggregated input, mixed.

When reading from a file it goes something like this:

    @lines = <IN>;
    process(\@lines);

    ...

    sub process {
        my $lines = shift;    # array reference passed by the caller
        foreach my $line (@{$lines}) {
            chomp $line;
            # Handle line by line
        }
    }

So, what I need to do is replace the chomp with something that removes either Unix-style or Windows-style linebreaks. I'm coming up with way too many ways of solving this, one of the usual drawbacks of Perl.

What's your opinion on the neatest way to chomp off generic linebreaks? What would be the most efficient?

Edit: A small clarification - the method 'process' gets a list of lines from somewhere, not necessarily read from a file. Each line might have:

- No trailing linebreaks
- Unix-style linebreaks
- Windows-style linebreaks
- Just Carriage-Return (when original data has Windows-style linebreaks and is read with $/ = '\n')
- An aggregated set where lines have different styles

User · Answer

To extend Ted Cambron's answer above with something that hasn't been addressed here: if you remove all line breaks indiscriminately from a chunk of entered text, you will end up with paragraphs running into each other without spaces when you output that text later. This is what I use:

    sub cleanLines {
        my $text = shift;
        $text =~ s/\r/ /g;     # replace every \r with a space
        $text =~ s/\n/ /g;     # replace every \n with a space
        $text =~ s/ {2,}/ /g;  # collapse runs of spaces into a single space
        return $text;
    }

The substitutions use the g (global) modifier, so each one keeps going until it has replaced every match in the string. The last one matches any run of two or more spaces, effectively reducing anything longer than a single space to one space.
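As a minimal sketch of the same idea, the two linebreak substitutions can be folded into one. The name `clean_lines_spaced` is hypothetical, and the pattern assumes the input only uses CRLF, lone CR, or LF as line endings.

```perl
use strict;
use warnings;

# Hypothetical helper: turn every linebreak into a space, then squeeze
# runs of spaces so joined paragraphs stay separated by exactly one space.
sub clean_lines_spaced {
    my ($text) = @_;
    $text =~ s/\r\n?|\n/ /g;   # CRLF, lone CR, or LF -> one space
    $text =~ s/ {2,}/ /g;      # any run of 2+ spaces -> one space
    return $text;
}

print clean_lines_spaced("first paragraph\r\n\r\nsecond paragraph\n"), "|\n";
```

A trailing linebreak becomes a trailing space here; trim it separately if that matters for your output.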

User · Answer

After digging through the perlre docs a bit, I'll present my best suggestion so far, which seems to work pretty well. Perl 5.10 added the \R escape as a generalized linebreak:

    $line =~ s/\R//g;

It's the same as:

    (?>\x0D\x0A?|[\x0A-\x0C\x85\x{2028}\x{2029}])

I'll keep this question open a while yet, just to see if there are more nifty ways waiting to be suggested.
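Applied to the cases listed in the question, this could look like the sketch below. `strip_linebreaks` is a hypothetical stand-in for the question's `process`, and it anchors the pattern with `\z` so only trailing linebreaks are removed, which is a variation on the unanchored `s/\R//g` above.

```perl
use strict;
use warnings;
use 5.010;   # \R needs Perl 5.10 or later; this also enables say

# Hypothetical stand-in for the question's process(): takes an array
# reference and returns a new array reference with trailing linebreaks removed.
sub strip_linebreaks {
    my ($lines) = @_;
    my @clean = @$lines;     # copy, so the caller's data is untouched
    s/\R+\z// for @clean;    # \z anchors at the very end of the string
    return \@clean;
}

my $clean = strip_linebreaks([ "none", "unix\n", "windows\r\n", "cr-only\r" ]);
say join '|', @$clean;
```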

User · Answer

Reading perlport, I'd suggest something like

    $line =~ s/\015?\012?$//;

to be safe for whatever platform you're on and whatever linefeed style you may be processing, because what's in \r and \n may differ between Perl flavours.
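A small sketch of that pattern against the usual line-ending styles, wrapped in a hypothetical helper named `chomp_any`:

```perl
use strict;
use warnings;

# Hypothetical helper around the pattern from the answer above.
sub chomp_any {
    my ($line) = @_;
    $line =~ s/\015?\012?$//;   # optional CR, optional LF, at end of string
    return $line;
}

for my $raw ("unix\012", "dos\015\012", "oldmac\015", "bare") {
    printf "[%s]\n", chomp_any($raw);
}
```

Because the pattern is anchored, linebreaks in the middle of a string are left alone; only one trailing line ending is removed.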

User · Answer

    $line =~ s/[\r\n]+//g;

User · Answer

Note from 2017: File::Slurp is not recommended due to design mistakes and unmaintained errors. Use File::Slurper or Path::Tiny instead.

Extending on your answer:

    use File::Slurp ();
    my $value = File::Slurp::slurp($filename);
    $value =~ s/\R*//g;

File::Slurp abstracts away the file IO stuff and just returns a string for you.

NOTE

1. Important to note the addition of /g: without it, given a multi-line string, it will only replace the first offending character.
2. Also, the removal of $, which is redundant for this purpose, as we want to strip all line breaks, not just line breaks before whatever is meant by $ on this OS.
3. In a multi-line string, $ matches the end of the string, and that would be problematic.
4. Point 3 means that point 2 is made with the assumption that you'd also want to use /m; otherwise '$' would be basically meaningless for anything practical in a string with more than one line, or, when doing single-line processing, requires an OS which actually understands $/ and manages to find the \R* that precede the $/.

Examples

    while ( my $line = <$foo> ) {
        $line =~ $regex;
    }

Given the above notation, an OS which does not understand whatever your file's '\n' or '\r' delimiters are, in the default scenario with the OS's default delimiter set for $/, will result in reading your whole file as one contiguous string (unless your string has the OS's delimiters in it, in which case it will delimit by that).

So in this case all of these regexes are useless:

- s/\R*$// : Will only erase the last sequence of \R in the file.
- s/\R*// : Will only erase the first sequence of \R in the file.
- s/\012?\015?// : Will only erase the first \012\015, \012, or \015 sequence; \015\012 will result in either \012 or \015 being emitted.
- s/\R*$// : If there happen to be no byte sequences of '\015$OSDELIMITER' in the file, then NO linebreaks will be removed except for the OS's own ones.

It would appear nobody gets what I'm talking about, so here is example code that is tested to NOT remove line feeds. Run it, and you'll see that it leaves the linefeeds in.

    #!/usr/bin/perl

    use strict;
    use warnings;

    my $fn = 'TestFile.txt';

    my $LF = "\012";
    my $CR = "\015";

    my $UnixNL = $LF;
    my $DOSNL  = $CR . $LF;
    my $MacNL  = $CR;

    # Write 11 records to $filename; the delimiter for each record
    # comes from the $lineDelimiter callback.
    sub generate {
        my $filename      = shift;
        my $lineDelimiter = shift;

        open my $fh, '>', $filename or die "Cannot write $filename: $!";
        for ( 0 .. 10 ) {
            print $fh "{0}";
            print $fh join "", map { chr( int( rand(26) + 60 ) ) } 0 .. 20;
            print $fh "{1}";
            print $fh $lineDelimiter->();
            print $fh "{2}";
        }
        close $fh;
    }

    # Read $filename using $osDelimiter as the input record separator,
    # the way a given OS would by default.
    sub parse {
        my $filename    = shift;
        my $osDelimiter = shift;
        my $message     = shift;
        print "Parsing $message File $filename : \n";

        local $/ = $osDelimiter;

        open my $fh, '<', $filename or die "Cannot read $filename: $!";
        while ( my $line = <$fh> ) {
            $line =~ s/\R*$//;
            print ">|" . $line . "|<";
        }
        print "Done.\n\n";
    }

    my @all = ( $DOSNL, $MacNL, $UnixNL );
    generate 'Windows.txt', sub { $DOSNL };
    generate 'Mac.txt',     sub { $MacNL };
    generate 'Unix.txt',    sub { $UnixNL };
    generate 'Mixed.txt',   sub {
        return $all[ int rand @all ];    # pick any of the three styles
    };

    for my $os ( [ "$MacNL", "On Mac" ], [ "$DOSNL", "On Windows" ], [ "$UnixNL", "On Unix" ] ) {
        for ( qw( Windows Mac Unix Mixed ) ) {
            parse $_ . ".txt", @{$os};
        }
    }

For the CLEARLY unprocessed output, see here: http://pastebin.com/f2c063d74

Note there are certain combinations that of course work, but they are likely the ones you yourself naïvely tested.

Note that in this output, all results must be of the form >|$string|<>|$string|< with NO LINE FEEDS to be considered valid output, and $string is of the general form {0}$data{1}$delimiter{2}, where in all output sources there should be either:

- Nothing between {1} and {2}
- Only |<>| between {1} and {2}
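The core of that argument can be reproduced without temporary files. The following is a minimal sketch using an in-memory filehandle; `read_records` is a hypothetical helper, and the sample data is made up for illustration.

```perl
use strict;
use warnings;
use 5.010;

# Hypothetical helper: read $data through an in-memory filehandle using
# $separator as the input record separator, and return the records.
sub read_records {
    my ($data, $separator) = @_;
    local $/ = $separator;
    open my $fh, '<', \$data or die "cannot open in-memory handle: $!";
    my @records = <$fh>;
    close $fh;
    return @records;
}

my $mac_data = "one\015two\015three\015";        # CR-only line endings
my @records  = read_records($mac_data, "\012");  # reader expects LF

say scalar(@records), " record(s) read";

(my $anchored = $records[0]) =~ s/\R*$//;   # strips only the trailing CR
(my $global   = $records[0]) =~ s/\R//g;    # strips every linebreak
say "anchored still contains CR: ", ($anchored =~ /\015/ ? "yes" : "no");
say "global still contains CR:   ", ($global   =~ /\015/ ? "yes" : "no");
```

When the record separator does not occur in the data, the whole input arrives as a single record, so an anchored substitution only touches the very end of it.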

User · Answer

In your example, you can just go:

    chomp(@lines);

Or:

    $_ = join("", @lines);
    s/[\r\n]+//g;

Or:

    @lines = split /[\r\n]+/, join("", @lines);

Using these directly on a file:

    perl -e '$_=join("",<>); s/[\r\n]+//g; print' <a.txt |less
    perl -e 'chomp(@a=<>);print @a' <a.txt |less
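One caveat worth checking for the first variant: chomp removes only the current value of $/ (by default "\n"), so carriage returns survive it. A short sketch, with the hypothetical helpers `after_chomp` and `after_strip` added purely for comparison:

```perl
use strict;
use warnings;

# Hypothetical helper: what a list looks like after a plain chomp.
sub after_chomp {
    my @lines = @_;
    chomp(@lines);             # removes a trailing $/ only
    return @lines;
}

# Hypothetical helper: strip any trailing mix of CR and LF instead.
sub after_strip {
    my @lines = @_;
    s/[\r\n]+\z// for @lines;
    return @lines;
}

my @input     = ("unix\n", "windows\r\n", "cr-only\r");
my @left_over = grep { /[\r\n]/ } after_chomp(@input);
printf "%d of %d lines still contain CR or LF after chomp\n",
    scalar(@left_over), scalar(@input);

@left_over = grep { /[\r\n]/ } after_strip(@input);
printf "%d of %d lines still contain CR or LF after the substitution\n",
    scalar(@left_over), scalar(@input);
```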

User · Answer

Whenever I go through input and want to remove or replace characters, I run it through little subroutines like this one:

    sub clean {
        my $text = shift;
        $text =~ s/\n//g;
        $text =~ s/\r//g;
        return $text;
    }

It may not be fancy, but this method has been working flawlessly for me for years.
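Since the question also asks about efficiency, one alternative worth trying is the transliteration operator, which deletes both characters in a single pass. This is a sketch rather than a measured recommendation; `clean_tr` is a hypothetical name, and any speed difference should be checked with the core Benchmark module on your own data.

```perl
use strict;
use warnings;

# Hypothetical variant of clean(): delete every CR and LF with tr///d.
sub clean_tr {
    my ($text) = @_;
    $text =~ tr/\r\n//d;   # d = delete the listed characters
    return $text;
}

print clean_tr("one\r\ntwo\nthree\r"), "\n";
```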
