[file] Why should text files end with a newline?

I assume everyone here is familiar with the adage that all text files should end with a newline. I've known of this "rule" for years but I've always wondered — why?

This question is related to file unix text-files newline

The answer is


Basically there are many programs which will not process files correctly if they don't get the final EOL EOF.

GCC warns you about this because it's expected as part of the C standard. (section 5.1.1.2 apparently)

"No newline at end of file" compiler warning


Imagine that the file is being processed while the file is still being generated by another process.

It might have to do with that? A flag that indicates that the file is ready to be processed.


I've wondered this myself for years. But i came across a good reason today.

Imagine a file with a record on every line (ex: a CSV file). And that the computer was writing records at the end of the file. But it suddenly crashed. Gee was the last line complete? (not a nice situation)

But if we always terminate the last line, then we would know (simply check if last line is terminated). Otherwise we would probably have to discard the last line every time, just to be safe.


A separate use case: when your text file is version controlled (in this case specifically under git although it applies to others too). If content is added to the end of the file, then the line that was previously the last line will have been edited to include a newline character. This means that blameing the file to find out when that line was last edited will show the text addition, not the commit before that you actually wanted to see.


Each line should be terminated in a newline character, including the last one. Some programs have problems processing the last line of a file if it isn't newline terminated.

GCC warns about it not because it can't process the file, but because it has to as part of the standard.

The C language standard says A source file that is not empty shall end in a new-line character, which shall not be immediately preceded by a backslash character.

Since this is a "shall" clause, we must emit a diagnostic message for a violation of this rule.

This is in section 2.1.1.2 of the ANSI C 1989 standard. Section 5.1.1.2 of the ISO C 1999 standard (and probably also the ISO C 1990 standard).

Reference: The GCC/GNU mail archive.


IMHO, it's a matter of personal style and opinion.

In olden days, I didn't put that newline. A character saved means more speed through that 14.4K modem.

Later, I put that newline so that it's easier to select the final line using shift+downarrow.


Presumably simply that some parsing code expected it to be there.

I'm not sure I would consider it a "rule", and it certainly isn't something I adhere to religiously. Most sensible code will know how to parse text (including encodings) line-by-line (any choice of line endings), with-or-without a newline on the last line.

Indeed - if you end with a new line: is there (in theory) an empty final line between the EOL and the EOF? One to ponder...


It may be related to the difference between:

  • text file (each line is supposed to end in an end-of-line)
  • binary file (there are no true "lines" to speak of, and the length of the file must be preserved)

If each line does end in an end-of-line, this avoids, for instance, that concatenating two text files would make the last line of the first run into the first line of the second.

Plus, an editor can check at load whether the file ends in an end-of-line, saves it in its local option 'eol', and uses that when writing the file.

A few years back (2005), many editors (ZDE, Eclipse, Scite, ...) did "forget" that final EOL, which was not very appreciated.
Not only that, but they interpreted that final EOL incorrectly, as 'start a new line', and actually start to display another line as if it already existed.
This was very visible with a 'proper' text file with a well-behaved text editor like vim, compared to opening it in one of the above editors. It displayed an extra line below the real last line of the file. You see something like this:

1 first line
2 middle line
3 last line
4

There's also a practical programming issue with files lacking newlines at the end: The read Bash built-in (I don't know about other read implementations) doesn't work as expected:

printf $'foo\nbar' | while read line
do
    echo $line
done

This prints only foo! The reason is that when read encounters the last line, it writes the contents to $line but returns exit code 1 because it reached EOF. This breaks the while loop, so we never reach the echo $line part. If you want to handle this situation, you have to do the following:

while read line || [ -n "${line-}" ]
do
    echo $line
done < <(printf $'foo\nbar')

That is, do the echo if the read failed because of a non-empty line at end of file. Naturally, in this case there will be one extra newline in the output which was not in the input.


I personally like new lines at the end of source code files.

It may have its origin with Linux or all UNIX systems for that matter. I remember there compilation errors (gcc if I'm not mistaken) because source code files did not end with an empty new line. Why was it made this way one is left to wonder.


In addition to the above practical reasons, it wouldn't surprise me if the originators of Unix (Thompson, Ritchie, et al.) or their Multics predecessors realized that there is a theoretical reason to use line terminators rather than line separators: With line terminators, you can encode all possible files of lines. With line separators, there's no difference between a file of zero lines and a file containing a single empty line; both of them are encoded as a file containing zero characters.

So, the reasons are:

  1. Because that's the way POSIX defines it.
  2. Because some tools expect it or "misbehave" without it. For example, wc -l will not count a final "line" if it doesn't end with a newline.
  3. Because it's simple and convenient. On Unix, cat just works and it works without complication. It just copies the bytes of each file, without any need for interpretation. I don't think there's a DOS equivalent to cat. Using copy a+b c will end up merging the last line of file a with the first line of file b.
  4. Because a file (or stream) of zero lines can be distinguished from a file of one empty line.

It's very late here but I just faced one bug in file processing and that came because the files were not ending with empty newline. We were processing text files with sed and sed was omitting the last line from output which was causing invalid json structure and sending rest of the process to fail state.

All we were doing was:

There is one sample file say: foo.txt with some json content inside it.

[{
    someProp: value
},
{
    someProp: value
}] <-- No newline here

The file was created in widows machine and window scripts were processing that file using PowerShell commands. All good.

When we processed same file using sed command sed 's|value|newValue|g' foo.txt > foo.txt.tmp

The newly generated file was

[{
    someProp: value
},
{
    someProp: value

and boom, it failed the rest of the processes because of the invalid JSON.

So it's always a good practice to end your file with empty new line.


This answer is an attempt at a technical answer rather than opinion.

If we want to be POSIX purists, we define a line as:

A sequence of zero or more non- <newline> characters plus a terminating <newline> character.

Source: https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html#tag_03_206

An incomplete line as:

A sequence of one or more non- <newline> characters at the end of the file.

Source: https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html#tag_03_195

A text file as:

A file that contains characters organized into zero or more lines. The lines do not contain NUL characters and none can exceed {LINE_MAX} bytes in length, including the <newline> character. Although POSIX.1-2008 does not distinguish between text files and binary files (see the ISO C standard), many utilities only produce predictable or meaningful output when operating on text files. The standard utilities that have such restrictions always specify "text files" in their STDIN or INPUT FILES sections.

Source: https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html#tag_03_397

A string as:

A contiguous sequence of bytes terminated by and including the first null byte.

Source: https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html#tag_03_396

From this then, we can derive that the only time we will potentially encounter any type of issues are if we deal with the concept of a line of a file or a file as a text file (being that a text file is an organization of zero or more lines, and a line we know must terminate with a <newline>).

Case in point: wc -l filename.

From the wc's manual we read:

A line is defined as a string of characters delimited by a <newline> character.

What are the implications to JavaScript, HTML, and CSS files then being that they are text files?

In browsers, modern IDEs, and other front-end applications there are no issues with skipping EOL at EOF. The applications will parse the files properly. It has to since not all Operating Systems conform to the POSIX standard, so it would be impractical for non-OS tools (e.g. browsers) to handle files according to the POSIX standard (or any OS-level standard).

As a result, we can be relatively confident that EOL at EOF will have virtually no negative impact at the application level - regardless if it is running on a UNIX OS.

At this point we can confidently say that skipping EOL at EOF is safe when dealing with JS, HTML, CSS on the client-side. Actually, we can state that minifying any one of these files, containing no <newline> is safe.

We can take this one step further and say that as far as NodeJS is concerned it too cannot adhere to the POSIX standard being that it can run in non-POSIX compliant environments.

What are we left with then? System level tooling.

This means the only issues that may arise are with tools that make an effort to adhere their functionality to the semantics of POSIX (e.g. definition of a line as shown in wc).

Even so, not all shells will automatically adhere to POSIX. Bash for example does not default to POSIX behavior. There is a switch to enable it: POSIXLY_CORRECT.

Food for thought on the value of EOL being <newline>: https://www.rfc-editor.org/old/EOLstory.txt

Staying on the tooling track, for all practical intents and purposes, let's consider this:

Let's work with a file that has no EOL. As of this writing the file in this example is a minified JavaScript with no EOL.

curl http://cdnjs.cloudflare.com/ajax/libs/AniJS/0.5.0/anijs-min.js -o x.js
curl http://cdnjs.cloudflare.com/ajax/libs/AniJS/0.5.0/anijs-min.js -o y.js

$ cat x.js y.js > z.js

-rw-r--r--  1 milanadamovsky   7905 Aug 14 23:17 x.js
-rw-r--r--  1 milanadamovsky   7905 Aug 14 23:17 y.js
-rw-r--r--  1 milanadamovsky  15810 Aug 14 23:18 z.js

Notice the cat file size is exactly the sum of its individual parts. If the concatenation of JavaScript files is a concern for JS files, the more appropriate concern would be to start each JavaScript file with a semi-colon.

As someone else mentioned in this thread: what if you want to cat two files whose output becomes just one line instead of two? In other words, cat does what it's supposed to do.

The man of cat only mentions reading input up to EOF, not <newline>. Note that the -n switch of cat will also print out a non- <newline> terminated line (or incomplete line) as a line - being that the count starts at 1 (according to the man.)

-n Number the output lines, starting at 1.

Now that we understand how POSIX defines a line , this behavior becomes ambiguous, or really, non-compliant.

Understanding a given tool's purpose and compliance will help in determining how critical it is to end files with an EOL. In C, C++, Java (JARs), etc... some standards will dictate a newline for validity - no such standard exists for JS, HTML, CSS.

For example, instead of using wc -l filename one could do awk '{x++}END{ print x}' filename , and rest assured that the task's success is not jeopardized by a file we may want to process that we did not write (e.g. a third party library such as the minified JS we curld) - unless our intent was truly to count lines in the POSIX compliant sense.

Conclusion

There will be very few real life use cases where skipping EOL at EOF for certain text files such as JS, HTML, and CSS will have a negative impact - if at all. If we rely on <newline> being present, we are restricting the reliability of our tooling only to the files that we author and open ourselves up to potential errors introduced by third party files.

Moral of the story: Engineer tooling that does not have the weakness of relying on EOL at EOF.

Feel free to post use cases as they apply to JS, HTML and CSS where we can examine how skipping EOL has an adverse effect.


This originates from the very early days when simple terminals were used. The newline char was used to trigger a 'flush' of the transferred data.

Today, the newline char isn't required anymore. Sure, many apps still have problems if the newline isn't there, but I'd consider that a bug in those apps.

If however you have a text file format where you require the newline, you get simple data verification very cheap: if the file ends with a line that has no newline at the end, you know the file is broken. With only one extra byte for each line, you can detect broken files with high accuracy and almost no CPU time.


I was always under the impression the rule came from the days when parsing a file without an ending newline was difficult. That is, you would end up writing code where an end of line was defined by the EOL character or EOF. It was just simpler to assume a line ended with EOL.

However I believe the rule is derived from C compilers requiring the newline. And as pointed out on “No newline at end of file” compiler warning, #include will not add a newline.


Some tools expect this. For example, wc expects this:

$ echo -n "Line not ending in a new line" | wc -l
0
$ echo "Line ending with a new line" | wc -l
1

Why should (text) files end with a newline?

As well expressed by many, because:

  1. Many programs do not behave well, or fail without it.

  2. Even programs that well handle a file lack an ending '\n', the tool's functionality may not meet the user's expectations - which can be unclear in this corner case.

  3. Programs rarely disallow final '\n' (I do not know of any).


Yet this begs the next question:

What should code do about text files without a newline?

  1. Most important - Do not write code that assumes a text file ends with a newline. Assuming a file conforms to a format leads to data corruption, hacker attacks and crashes. Example:

    // Bad code
    while (fgets(buf, sizeof buf, instream)) {
      // What happens if there is no \n, buf[] is truncated leading to who knows what
      buf[strlen(buf) - 1] = '\0';  // attempt to rid trailing \n
      ...
    }
    
  2. If the final trailing '\n' is needed, alert the user to its absence and the action taken. IOWs, validate the file's format. Note: This may include a limit to the maximum line length, character encoding, etc.

  3. Define clearly, document, the code's handling of a missing final '\n'.

  4. Do not, as possible, generate a file the lacks the ending '\n'.


Examples related to file

Gradle - Move a folder from ABC to XYZ Difference between opening a file in binary vs text Angular: How to download a file from HttpClient? Python error message io.UnsupportedOperation: not readable java.io.FileNotFoundException: class path resource cannot be opened because it does not exist Writing JSON object to a JSON file with fs.writeFileSync How to read/write files in .Net Core? How to write to a CSV line by line? Writing a dictionary to a text file? What are the pros and cons of parquet format compared to other formats?

Examples related to unix

Docker CE on RHEL - Requires: container-selinux >= 2.9 What does `set -x` do? How to find files modified in last x minutes (find -mmin does not work as expected) sudo: npm: command not found How to sort a file in-place How to read a .properties file which contains keys that have a period character using Shell script gpg decryption fails with no secret key error Loop through a comma-separated shell variable Best way to find os name and version in Unix/Linux platform Resource u'tokenizers/punkt/english.pickle' not found

Examples related to text-files

How to edit a text file in my terminal Reading specific columns from a text file in python Read a local text file using Javascript Split text file into smaller multiple text file using command line How do I read a text file of about 2 GB? Java - How Can I Write My ArrayList to a file, and Read (load) that file to the original ArrayList? Copying from one text file to another using Python Getting all file names from a folder using C# Reading From A Text File - Batch How to import data from text file to mysql database

Examples related to newline

How can I insert a line break into a <Text> component in React Native? Print "\n" or newline characters as part of the output on terminal Using tr to replace newline with space How to write one new line in Bitbucket markdown? Line break in SSRS expression How to insert a new line in Linux shell script? Replace CRLF using powershell How to write new line character to a file in Java What is the newline character in the C language: \r or \n? How to print values separated by spaces instead of new lines in Python 2.7