Why should text files end with a newline

Question

I assume everyone here is familiar with the adage that all text files should end with a newline  I ve known of this  rule  for years but I ve always wondered     why

User · Answer

I was always under the impression the rule came from the days when parsing a file without an ending newline was difficult. That is, you would end up writing code where an end of line was defined by the EOL character or EOF. It was just simpler to assume a line ended with EOL.

However I believe the rule is derived from C compilers requiring the newline. And as pointed out on “No newline at end of file” compiler warning, #include will not add a newline.

User · Answer

It s very late here but I just faced one bug in file processing and that came because the files were not ending with empty newline  We were processing text files with sed and sed was omitting the last line from output which was causing invalid json structure and sending rest of the process to fail state    All we were doing was   There is one sample file say  foo txt with some json content inside it           someProp  value          someProp  value     lt -- No newline here   The file was created in widows machine and window scripts were processing that file using PowerShell commands  All good   When we processed same file using sed command sed  s value newValue g  foo txt  gt  foo txt tmp   The newly generated file was          someProp  value          someProp  value   and boom  it failed the rest of the processes because of the invalid JSON   So it s always a good practice to end your file with empty new line

User · Answer

Imagine that the file is being processed while the file is still being generated by another process   It might have to do with that  A flag that indicates that the file is ready to be processed

User · Answer

In addition to the above practical reasons  it wouldn t surprise me if the originators of Unix  Thompson  Ritchie  et al   or their Multics predecessors realized that there is a theoretical reason to use line terminators rather than line separators   With line terminators  you can encode all possible files of lines   With line separators  there s no difference between a file of zero lines and a file containing a single empty line  both of them are encoded as a file containing zero characters   So  the reasons are    Because that s the way POSIX defines it  Because some tools expect it or  misbehave  without it   For example  wc -l will not count a final  line  if it doesn t end with a newline  Because it s simple and convenient   On Unix  cat just works and it works without complication   It just copies the bytes of each file  without any need for interpretation   I don t think there s a DOS equivalent to cat   Using copy a b c will end up merging the last line of file a with the first line of file b  Because a file  or stream  of zero lines can be distinguished from a file of one empty line

User · Answer

This answer is an attempt at a technical answer rather than opinion   If we want to be POSIX purists  we define a line as      A sequence of zero or more non-  lt newline  characters plus a terminating  lt newline  character    Source  https   pubs opengroup org onlinepubs 9699919799 basedefs V1 chap03 html tag 03 206  An incomplete line as      A sequence of one or more non-  lt newline  characters at the end of the file    Source  https   pubs opengroup org onlinepubs 9699919799 basedefs V1 chap03 html tag 03 195  A text file as      A file that contains characters organized into zero or more lines  The lines do not contain NUL characters and none can exceed  LINE MAX  bytes in length  including the  lt newline  character  Although POSIX 1-2008 does not distinguish between text files and binary files  see the ISO C standard   many utilities only produce predictable or meaningful output when operating on text files  The standard utilities that have such restrictions always specify  text files  in their STDIN or INPUT FILES sections    Source  https   pubs opengroup org onlinepubs 9699919799 basedefs V1 chap03 html tag 03 397  A string as      A contiguous sequence of bytes terminated by and including the first null byte    Source  https   pubs opengroup org onlinepubs 9699919799 basedefs V1 chap03 html tag 03 396  From this then  we can derive that the only time we will potentially encounter any type of issues are if we deal with the concept of a line of a file or a file as a text file  being that a text file is an organization of zero or more lines  and a line we know must terminate with a  lt newline       Case in point  wc -l filename   From the wc s manual we read      A line is defined as a string of characters delimited by a  lt newline  character    What are the implications to JavaScript  HTML  and CSS files then being that they are text  files   In browsers  modern IDEs  and other front-end applications there are no issues with skipping EOL at EOF   The applications will parse the files properly   It has to since not all Operating Systems conform to the POSIX standard  so it would be impractical for non-OS tools  e g  browsers  to handle files according to the POSIX standard  or any OS-level standard    As a result  we can be relatively confident that EOL at EOF will have virtually no negative impact at the application level - regardless if it is running on a UNIX OS   At this point we can confidently say that skipping EOL at EOF is safe when dealing with JS  HTML  CSS on the client-side   Actually  we can state that minifying any one of these files  containing no  lt newline  is safe   We can take this one step further and say that as far as NodeJS is concerned it too cannot adhere to the POSIX standard being that it can run in non-POSIX compliant environments   What are we left with then   System level tooling   This means the only issues that may arise are with tools that make an effort to adhere their functionality to the semantics of POSIX  e g  definition of a line as shown in wc    Even so  not all shells will automatically adhere to POSIX   Bash for example does not default to POSIX behavior   There is a switch to enable it  POSIXLY CORRECT    Food for thought on the value of EOL being  lt newline   https   www rfc-editor org old EOLstory txt  Staying on the tooling track  for all practical intents and purposes  let s consider this   Let s work with a file that has no EOL   As of this writing the file in this example is a minified JavaScript with no EOL   curl http   cdnjs cloudflare com ajax libs AniJS 0 5 0 anijs-min js -o x js curl http   cdnjs cloudflare com ajax libs AniJS 0 5 0 anijs-min js -o y js    cat x js y js  gt  z js  -rw-r--r--  1 milanadamovsky   7905 Aug 14 23 17 x js -rw-r--r--  1 milanadamovsky   7905 Aug 14 23 17 y js -rw-r--r--  1 milanadamovsky  15810 Aug 14 23 18 z js   Notice the cat file size is exactly the sum of its individual parts   If the concatenation of JavaScript files is a concern for JS files  the more appropriate concern would be to start each JavaScript file with a semi-colon   As someone else mentioned in this thread  what if you want to cat two files whose output becomes just one line instead of two   In other words  cat does what it s supposed to do   The man of cat only mentions reading input up to EOF  not  lt newline    Note that the -n switch of cat will also print out a non-  lt newline  terminated line  or incomplete line  as a line - being that the count starts at 1  according to the man       -n      Number the output lines  starting at 1    Now that we understand how POSIX defines a line   this behavior becomes ambiguous  or really   non-compliant   Understanding a given tool s purpose and compliance will help in determining how critical it is to end files with an EOL   In C  C    Java  JARs   etc    some standards will dictate a newline for validity - no such standard exists for JS  HTML  CSS   For example  instead of using wc -l filename one could do awk   x   END  print x   filename   and rest assured that the task s success is not jeopardized by a file we may want to process that we did not write  e g  a third party library such as the minified JS we curld  - unless our intent was truly to count lines in the POSIX compliant sense   Conclusion  There will be very few real life use cases where skipping EOL at EOF for certain text files such as JS  HTML  and CSS will have a negative impact - if at all   If we rely on  lt newline  being present  we are restricting the reliability of our tooling only to the files that we author and open ourselves up to potential errors introduced by third party files   Moral of the story   Engineer tooling that does not have the weakness of relying on EOL at EOF   Feel free to post use cases as they apply to JS  HTML and CSS where we can examine how skipping EOL has an adverse effect

User · Answer

IMHO  it s a matter of personal style and opinion    In olden days  I didn t put that newline  A character saved means more speed through that 14 4K modem    Later  I put that newline so that it s easier to select the final line using shift downarrow

User · Answer

Basically there are many programs which will not process files correctly if they don t get the final EOL EOF    GCC warns you about this because it s expected as part of the C standard   section 5 1 1 2 apparently    quot No newline at end of file quot  compiler warning

User · Answer

I ve wondered this myself for years  But i came across a good reason today   Imagine a file with a record on every line  ex  a CSV file   And that the computer was writing records at the end of the file  But it suddenly crashed  Gee was the last line complete   not a nice situation   But if we always terminate the last line  then we would know  simply check if last line is terminated   Otherwise we would probably have to discard the last line every time  just to be safe

User · Answer

I personally like new lines at the end of source code files   It may have its origin with Linux or all UNIX systems for that matter  I remember there compilation errors  gcc if I m not mistaken  because source code files did not end with an empty new line  Why was it made this way one is left to wonder

User · Answer

A separate use case  when your text file is version controlled  in this case specifically under git although it applies to others too   If content is added to the end of the file  then the line that was previously the last line will have been edited to include a newline character  This means that blameing the file to find out when that line was last edited will show the text addition  not the commit before that you actually wanted to see

User · Answer

There s also a practical programming issue with files lacking newlines at the end  The read Bash built-in  I don t know about other read implementations  doesn t work as expected   printf   foo nbar    while read line do     echo  line done   This prints only foo  The reason is that when read encounters the last line  it writes the contents to  line but returns exit code 1 because it reached EOF  This breaks the while loop  so we never reach the echo  line part  If you want to handle this situation  you have to do the following   while read line      -n    line-     do     echo  line done  lt   lt  printf   foo nbar     That is  do the echo if the read failed because of a non-empty line at end of file  Naturally  in this case there will be one extra newline in the output which was not in the input

User · Answer

It may be related to the difference between    text file  each line is supposed to end in an end-of-line  binary file  there are no true  lines  to speak of  and the length of the file must be preserved    If each line does end in an end-of-line  this avoids  for instance  that concatenating two text files would make the last line of the first run into the first line of the second    Plus  an editor can check at load whether the file ends in an end-of-line  saves it in its local option  eol   and uses that when writing the file   A few years back  2005   many editors  ZDE  Eclipse  Scite       did  forget  that final EOL  which was not very appreciated  Not only that  but they interpreted that final EOL incorrectly  as  start a new line   and actually start to display another line as if it already existed  This was very visible with a  proper  text file with a well-behaved text editor like vim  compared to opening it in one of the above editors  It displayed an extra line below the real last line of the file  You see something like this   1 first line 2 middle line 3 last line 4

User · Answer

Why should  text  files end with a newline    As well expressed by many  because     Many programs do not behave well  or fail without it  Even programs that well handle a file lack an ending   n   the tool s functionality may not meet the user s expectations - which can be unclear in this corner case  Programs rarely disallow final   n   I do not know of any       Yet this begs the next question      What should code do about text files without a newline     Most important - Do not write code that assumes a text file ends with a newline   Assuming a file conforms to a format leads to data corruption  hacker attacks and crashes   Example      Bad code while  fgets buf  sizeof buf  instream          What happens if there is no  n  buf   is truncated leading to who knows what   buf strlen buf  - 1      0       attempt to rid trailing  n          If the final trailing   n  is needed  alert the user to its absence and the action taken   IOWs  validate the file s format   Note  This may include a limit to the maximum line length  character encoding  etc  Define clearly  document  the code s handling of a missing final   n   Do not  as possible  generate a file the lacks the ending   n

User · Answer

Some tools expect this  For example  wc expects this     echo -n  Line not ending in a new line    wc -l 0   echo  Line ending with a new line    wc -l 1

User · Answer

Each line should be terminated in a newline character  including the last one  Some programs have problems processing the last line of a file if it isn t newline terminated     GCC warns about it not because it can t process the file  but because it has to as part of the standard      The C language standard says   A source file that is not empty shall end in a new-line character  which shall not be immediately preceded by a backslash character       Since this is a  shall  clause  we must emit a diagnostic message for a violation of this rule       This is in section 2 1 1 2 of the ANSI C 1989 standard  Section 5 1 1 2 of the ISO C 1999 standard  and probably also the ISO C 1990 standard     Reference  The GCC GNU mail archive

User · Answer

Because that   s how the POSIX standard defines a line         3 206 Line     A sequence of zero or more non-  lt newline gt  characters plus a terminating  lt newline gt  character     Therefore  lines not ending in a newline character aren t considered actual lines  That s why some programs have problems processing the last line of a file if it isn t newline terminated   There s at least one hard advantage to this guideline when working on a terminal emulator  All Unix tools expect this convention and work with it  For instance  when concatenating files with cat  a file terminated by newline will have a different effect than one without     more a txt foo   more b txt bar  more c txt baz   cat  a b c  txt foo barbaz  And  as the previous example also demonstrates  when displaying the file on the command line  e g  via more   a newline-terminated file results in a correct display  An improperly terminated file might be garbled  second line    For consistency  it   s very helpful to follow this rule     doing otherwise will incur extra work when dealing with the default Unix tools     Think about it differently  If lines aren   t terminated by newline  making commands such as cat useful is much harder  how do you make a command to concatenate files such that   it puts each file   s start on a new line  which is what you want 95  of the time  but it allows merging the last and first line of two files  as in the example above between b txt and c txt    Of course this is solvable but you need to make the usage of cat more complex  by adding positional command line arguments  e g  cat a txt --no-newline b txt c txt   and now the command rather than each individual file controls how it is pasted together with other files  This is almost certainly not convenient       Or you need to introduce a special sentinel character to mark a line that is supposed to be continued rather than terminated  Well  now you   re stuck with the same situation as on POSIX  except inverted  line continuation rather than line termination character      Now  on non POSIX compliant systems  nowadays that   s mostly Windows   the point is moot  files don   t generally end with a newline  and the  informal  definition of a line might for instance be    text that is separated by newlines     note the emphasis   This is entirely valid  However  for structured data  e g  programming code  it makes parsing minimally more complicated  it generally means that parsers have to be rewritten  If a parser was originally written with the POSIX definition in mind  then it might be easier to modify the token stream rather than the parser      in other words  add an    artificial newline    token to the end of the input

User · Answer

Presumably simply that some parsing code expected it to be there   I m not sure I would consider it a  rule   and it certainly isn t something I adhere to religiously  Most sensible code will know how to parse text  including encodings  line-by-line  any choice of line endings   with-or-without a newline on the last line   Indeed - if you end with a new line  is there  in theory  an empty final line between the EOL and the EOF  One to ponder

User · Answer

This originates from the very early days when simple terminals were used  The newline char was used to trigger a  flush  of the transferred data   Today  the newline char isn t required anymore  Sure  many apps still have problems if the newline isn t there  but I d consider that a bug in those apps   If however you have a text file format where you require the newline  you get simple data verification very cheap  if the file ends with a line that has no newline at the end  you know the file is broken  With only one extra byte for each line  you can detect broken files with high accuracy and almost no CPU time

[file] Why should text files end with a newline?

Examples related to file

Examples related to unix

Examples related to text-files

Examples related to newline