Replace all whitespace with a line break/paragraph mark to make a word list

Question

I am trying to make a vocab list for a Greek text we are translating in class. I want to replace every space or tab character with a paragraph mark so that every word appears on its own line. Can anyone give me the sed command, and explain what it is that I'm doing? I'm still trying to figure sed out.

This question is related to regex sed

User · Answer

This should do the job (with GNU sed):

sed -E -e 's/[ \t]+/\n/g'

[ \t] means a space OR a tab. If you want any kind of space, you could also use \s.

[ \t]+ means as many spaces OR tabs as you want (but at least one). The -E flag turns on extended regular expressions; without it, sed treats + as a literal plus sign.

s/x/y/ means replace the pattern x by y (here \n is a new line).

The g at the end means the replacement is repeated for every occurrence in each line.

User · Answer

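Option 1 (collapses every run of whitespace, including line breaks, into a single space, so all words end up on one line):

echo $(cat testfile)

Option 2 (turns each space into a newline):

tr ' ' '\n' < testfile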
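The tr approach converts single spaces only, so tabs stay in place and a run of several spaces leaves blank lines behind. A minimal sketch of one way around that, assuming a POSIX tr; split_ws is a hypothetical helper name, not something from the answer:

```shell
# split_ws is a hypothetical helper name used only for illustration.
# Reads text on standard input and writes one word per line.
split_ws() {
    # ' \t'  : translate spaces and tabs (tr itself expands the \t escape)
    # '\n\n' : both map to a newline; the two strings have equal length
    # -s     : squeeze each run of resulting newlines into a single newline
    tr -s ' \t' '\n\n'
}

printf 'alpha  beta\tgamma\n' | split_ws
```

Because -s squeezes every run of newlines, blank lines already present in the input are dropped as well.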

User · Answer

You can also do it with xargs:

cat old | xargs -n1 > new

or

xargs -n1 < old > new
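One caveat worth checking before relying on xargs for real text: by default xargs gives quotes and backslashes a special meaning in its input. A small sketch of that behavior, with made-up sample input and a hypothetical helper name split_xargs:

```shell
# split_xargs is a hypothetical helper name used only for illustration.
split_xargs() {
    # -n1 runs the default command (echo) once per argument.
    xargs -n1
}

# The double-quoted phrase is treated as ONE argument and the quotes vanish.
printf '%s\n' 'a "b c" d' | split_xargs
```

A text containing unbalanced apostrophes or quotation marks can make xargs stop with an error, so the sed, tr, or perl approaches are likely safer for prose.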

User · Answer

All of the examples listed above for sed break on one platform or another. None of them work with the version of sed shipped on Macs.

However, Perl's regex works the same on any machine with Perl installed:

perl -pe 's/\s+/\n/g' file.txt

If you want to save the output:

perl -pe 's/\s+/\n/g' file.txt > newfile.txt

If you want only unique occurrences of words:

perl -pe 's/\s+/\n/g' file.txt | sort -u > newfile.txt
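Note that sort -u also reorders the words. If the list should keep the order in which words first appear in the text, a common awk idiom can stand in for it. A minimal sketch, assuming a POSIX awk; unique_in_order is a hypothetical helper name:

```shell
# unique_in_order is a hypothetical helper name used only for illustration.
unique_in_order() {
    # seen[$0]++ evaluates to 0 (false) the first time a line appears, so
    # !seen[$0]++ is true only for first occurrences; awk prints those lines.
    awk '!seen[$0]++'
}

printf 'beta\nalpha\nbeta\ngamma\nalpha\n' | unique_in_order
```

It would slot into the pipeline in place of sort -u, for example: perl -pe 's/\s+/\n/g' file.txt | unique_in_order > newfile.txt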

User · Answer

For reasonably modern versions of sed, edit the standard input to yield the standard output with

sed -E -e 's/[[:blank:]]+/\n/g'

Piping a line of space-separated words into this command prints one word per line.

If your vocabulary words are in files named lesson1 and lesson2, redirect sed's standard output to the file all-vocab with

sed -E -e 's/[[:blank:]]+/\n/g' lesson1 lesson2 > all-vocab

What it means

The character class [[:blank:]] matches either a single space character or a single tab character.

Use [[:space:]] instead to match any single whitespace character (commonly space, tab, newline, carriage return, form-feed, and vertical tab).

The + quantifier means match one or more of the previous pattern. So [[:blank:]]+ is a sequence of one or more characters that are all space or tab.

The \n in the replacement is the newline that you want.

The /g modifier on the end means perform the substitution as many times as possible rather than just once.

The -E option tells sed to use POSIX extended regex syntax and in particular for this case the + quantifier. Without -E, your sed command becomes

sed -e 's/[[:blank:]]\+/\n/g'

Note the use of \+ rather than simple +. The \+ form is a GNU extension to basic regular expressions; the strictly POSIX spelling is \{1,\}.

Perl Compatible Regexes

For those familiar with Perl-compatible regexes and a PCRE-capable sed, use \s+ to match runs of at least one whitespace character, as in

sed -E -e 's/\s+/\n/g' old > new

or

sed -e 's/\s\+/\n/g' old > new

These commands read input from the file old and write the result to a file named new in the current directory.

Maximum portability, maximum cruftiness

Going back to almost any version of sed since Version 7 Unix, the command invocation is a bit more baroque.

sed -e 's/[ \t][ \t]*/\
/g'

Notes

Here we do not even assume the existence of the humble + quantifier and simulate it with a single space-or-tab ([ \t]) followed by zero or more of them ([ \t]*). A sed this old may not understand \t inside brackets either, in which case type a literal tab character in its place.

Similarly, assuming sed does not understand \n for newline, we have to include it on the command line verbatim.

The \ at the end of the first line of the command is a continuation marker that escapes the immediately following newline, and the remainder of the command is on the next line.

Note: There must be no whitespace preceding the escaped newline. That is, the end of the first line must be exactly backslash followed by end-of-line.

This error-prone process helps one appreciate why the world moved to visible characters, and you will want to exercise some care in trying out the command with copy-and-paste.

Note on backslashes and quoting

The commands above all used single quotes ('') rather than double quotes (""). Consider:

$ echo '\\\\' "\\\\"
\\\\ \\

That is, the shell applies different escaping rules to single-quoted strings as compared with double-quoted strings. You typically want to protect all the backslashes common in regexes with single quotes.
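The quoting difference can be checked without depending on how a particular shell's echo treats backslashes. A minimal sketch using printf with %s, which prints its arguments unchanged; the variable names single and double are illustrative only:

```shell
# Variable names are illustrative, not from the original answer.
single='\\\\'   # single quotes keep all four backslashes literally
double="\\\\"   # inside double quotes each \\ collapses to one backslash

# %s prints each argument as-is, one per line.
printf '%s\n' "$single" "$double"

# Report the lengths: 4 characters versus 2.
printf 'single=%s double=%s\n' "${#single}" "${#double}"
```

This is why a regex such as \s+ or a replacement such as \n is normally wrapped in single quotes when handed to sed or perl.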

User · Answer

You could use POSIX [[:blank:]] to match a horizontal white-space character.

sed 's/[[:blank:]]\+/\n/g' file

or you may use [[:space:]] instead of [[:blank:]] also.

Example:

$ echo 'this  is a sentence' | sed 's/[[:blank:]]\+/\n/g'
this
is
a
sentence

User · Answer

The portable way to do this is:

sed -e 's/[ \t][ \t]*/\
/g'

That's an actual newline between the backslash and the slash-g. Many sed implementations don't know about \n, so you need a literal newline. The backslash before the newline prevents sed from getting upset about the newline. (In sed scripts the commands are normally terminated by newlines.)

With GNU sed you can use \n in the substitution, and \s in the regex:

sed -e 's/\s\s*/\n/g'

GNU sed also supports "extended" regular expressions (that's egrep style, not perl-style) if you give it the -r flag, so then you can use +:

sed -r -e 's/\s+/\n/g'

If this is for Linux only, you can probably go with the GNU command, but if you want this to work on systems with a non-GNU sed (e.g. BSD, Mac OS X), you might want to go with the more portable option.
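One extra wrinkle with the portable form: inside a bracket expression, POSIX treats a backslash literally, so some sed implementations do not read \t there as a tab. A hedged sketch that sidesteps this by having the shell supply a literal tab; split_words and tab are hypothetical names chosen for illustration:

```shell
# split_words and tab are hypothetical names used only for illustration.
tab=$(printf '\t')   # a literal tab character, produced by printf

split_words() {
    # Double quotes let the shell expand ${tab}. The \\ becomes a single
    # backslash, and the line break right after it reaches sed unchanged,
    # giving sed the "backslash followed by end-of-line" it needs.
    sed -e "s/[ ${tab}][ ${tab}]*/\\
/g"
}

printf 'alpha  beta\tgamma\n' | split_words
```

The function reads standard input, so it can be used as split_words < old > new.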

User · Answer

Using gawk:

gawk '{$1=$1}1' OFS="\n" file
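In that one-liner, assigning $1=$1 makes awk rebuild the record with the output field separator OFS between fields, and the trailing 1 is an always-true pattern that prints the result. A more explicit sketch of the same idea, assuming any POSIX awk rather than gawk specifically; one_per_line is a hypothetical helper name:

```shell
# one_per_line is a hypothetical helper name used only for illustration.
one_per_line() {
    # awk splits each input line on runs of spaces and tabs by default;
    # the loop prints every field on its own line. Blank lines have
    # NF == 0, so they produce no output.
    awk '{ for (i = 1; i <= NF; i++) print $i }' "$@"
}

printf '  alpha  beta\tgamma\n\n' | one_per_line
```

File names can be passed as arguments, for example one_per_line lesson1 lesson2 > all-vocab.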
