[php] How do I remove  from the beginning of a file?

I have a CSS file that looks fine when I open it using gedit, but when it's read by PHP (to merge all the CSS files into one), this CSS has the following characters prepended to it: 

PHP removes all whitespace, so a random  in the middle of the code messes up the entire thing. As I mentioned, I can't actually see these characters when I open the file in gedit, so I can't remove them very easily.

I googled the problem, and there is clearly something wrong with the file encoding, which makes sense being as I've been shifting the files around to different Linux/Windows servers via ftp and rsync, with a range of text editors. I don't really know much about character encoding though, so help would be appreciated.

If it helps, the file is being saved in UTF-8 format, and gedit won't let me save it in ISO-8859-15 format (the document contains one or more characters that cannot be encoded using the specified character encoding). I tried saving it with Windows and Linux line endings, but neither helped.

This question is related to php utf-8 character-encoding byte-order-mark mojibake

The answer is


I had the same problem. The problem was because one of my php files was in utf-8 (the most important, the configuaration file which is included in all php files).

In my case, I had 2 different solutions which worked for me :

First, I changed the Apache Configuration by using AddDefaultCharsetDirective in configuration files (or in .htaccess). This solution forces Apache to use the correct encodage.

AddDefaultCharset ISO-8859-1

The second solution was to change the bad encoding of the php file.


You can open it by PhpStorm and right-click on your file and click on Remove BOM...


Same problem, but it only affected one file so I just created a blank file, copy/pasted the code from the original file to the new file, and then replaced the original file. Not fancy but it worked.


Check on your index.php, find "... charset=iso-8859-1" and replace it with "... charset=utf-8".

Maybe it'll work.


For those with shell access here is a little command to find all files with the BOM set in the public_html directory - be sure to change it to what your correct path on your server is

Code:

grep -rl $'\xEF\xBB\xBF' /home/username/public_html

and if you are comfortable with the vi editor, open the file in vi:

vi /path-to-file-name/file.php

And enter the command to remove the BOM:

set nobomb

Save the file:

wq

grep -rl $'\xEF\xBB\xBF' * | xargs vim -e -c 'argdo set fileencoding=utf-8|set encoding=utf-8| set nobomb| wq'


BOM is just a sequence of characters ($EF $BB $BF for UTF-8), so just remove them using scripts or configure the editor so it's not added.

From Removing BOM from UTF-8:

#!/usr/bin/perl
@file=<>;
$file[0] =~ s/^\xEF\xBB\xBF//;
print(@file);

I am sure it translates to PHP easily.


For me, this worked:

<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

If I remove this meta, the  appears again. Hope this helps someone...


In PHPStorm, for multiple files and BOM not necessarily at the beginning of the file, you can search \x{FEFF} (Regular Expression) and replace with nothing.


  1. Copy the text of your filename.css file.
  2. Close your css file.
  3. Rename it filename2.css to avoid a filename clash.
  4. In MS Notepad or Wordpad, create a new file.
  5. Paste the text into it.
  6. Save it as filename.css, selecting UTF-8 from the encoding options.
  7. Upload filename.css.

Here is another good solution for the problem with BOM. These are two VBScript (.vbs) scripts.

One for finding the BOM in a file and one for KILLING the damned BOM in the file. It works pretty fine and is easy to use.

Just create a .vbs file, and paste the following code in it.

You can use the VBScript script simply by dragging and dropping the suspicious file onto the .vbs file. It will tell you if there is a BOM or not.

' Heiko Jendreck - personal helpdesk & webdesign
' http://www.phw-jendreck.de
' 2010.05.10 Vers 1.0
'
' find_BOM.vbs
' ====================
' Kleines Hilfsmittel, welches das BOM finden soll
'
 Const UTF8_BOM = ""
 Const UTF16BE_BOM = "þÿ"
 Const UTF16LE_BOM = "ÿþ"
 Const ForReading = 1
 Const ForWriting = 2
 Dim fso
 Set fso = WScript.CreateObject("Scripting.FileSystemObject")
 Dim f
 f = WScript.Arguments.Item(0)
 Dim t
 t = fso.OpenTextFile(f, ForReading).ReadAll
 If Left(t, 3) = UTF8_BOM Then
     MsgBox "UTF-8-BOM detected!"
 ElseIf Left(t, 2) = UTF16BE_BOM Then
     MsgBox "UTF-16-BOM (Big Endian) detected!"
 ElseIf Left(t, 2) = UTF16LE_BOM Then
     MsgBox "UTF-16-BOM (Little Endian) detected!"
 Else
     MsgBox "No BOM detected!"
 End If

If it tells you there is BOM, go and create the second .vbs file with the following code and drag the suspicios file onto the .vbs file.

' Heiko Jendreck - personal helpdesk & webdesign
' http://www.phw-jendreck.de
' 2010.05.10 Vers 1.0
'
' kill_BOM.vbs
' ====================
' Kleines Hilfmittel, welches das gefundene BOM löschen soll
'
Const UTF8_BOM = ""
Const ForReading = 1
Const ForWriting = 2
Dim fso
Set fso = WScript.CreateObject("Scripting.FileSystemObject")
Dim f
f = WScript.Arguments.Item(0)
Dim t
t = fso.OpenTextFile(f, ForReading).ReadAll
If Left(t, 3) = UTF8_BOM Then
    fso.OpenTextFile(f, ForWriting).Write (Mid(t, 4))
    MsgBox "BOM gelöscht!"
Else
    MsgBox "Kein UTF-8-BOM vorhanden!"
End If

The code is from Heiko Jendreck.


I don't know PHP, so I don't know if this is possible, but the best solution would be to read the file as UTF-8 rather than some other encoding. The BOM is actually a ZERO WIDTH NO BREAK SPACE. This is whitespace, so if the file were being read in the correct encoding (UTF-8), then the BOM would be interpreted as whitespace and it would be ignored in the resulting CSS file.

Also, another advantage of reading the file in the correct encoding is that you don't have to worry about characters being misinterpreted. Your editor is telling you that the code page you want to save it in won't do all the characters that you need. If PHP is then reading the file in the incorrect encoding, then it is very likely that other characters besides the BOM are being silently misinterpreted. Use UTF-8 everywhere, and these problems disappear.


Open your file in Notepad++. From the Encoding menu, select Convert to UTF-8 without BOM, save the file, replace the old file with this new file. And it will work, damn sure.


In Notepad++, choose the "Encoding" menu, then "Encode in UTF-8 without BOM". Then save.

See Stack Overflow question How to make Notepad to save text in UTF-8 without BOM?.


Same problem, different solution.

One line in the PHP file was printing out XML headers (which use the same begin/end tags as PHP). Looks like the code within these tags set the encoding, and was executed within PHP which resulted in the strange characters. Either way here's the solution:

# Original
$xml_string = "&lt;?xml version=\"1.0\" encoding=\"UTF-8\"?&gt;";

# fixed
$xml_string = "<" . "?xml version=\"1.0\" encoding=\"UTF-8\"?" . ">";

I had the same problem with the BOM appearing in some of my PHP files ().

If you use PhpStorm you can set at hotkey to remove it in Settings -> IDE Settings -> Keymap -> Main Menu - > File -> Remove BOM.


You can use

vim -e -c 'argdo set fileencoding=utf-8|set encoding=utf-8| set nobomb| wq'

Replacing with awk seems to work, but it is not in place.


If you need to be able to remove the BOM from UTF-8 encoded files, you first need to get hold of an editor that is aware of them.

I personally use E Text Editor.

In the bottom right, there are options for character encoding, including the BOM tag. Load your file, deselect Byte Order Marker if it is selected, resave, and it should be done.

Alt text http://oth4.com/encoding.png

E is not free, but there is a free trial, and it is an excellent editor (limited TextMate compatibility).


In PHP, you can do the following to remove all non characters including the character in question.

$response = preg_replace('/[\x00-\x1F\x80-\xFF]/', '', $response);

Open the PHP file under question, in Notepad++.

Click on Encoding at the top and change from "Encoding in UTF-8 without BOM" to just "Encoding in UTF-8". Save and overwrite the file on your server.


Use Total Commander to search for all BOMed files:

Elegant way to search for UTF-8 files with BOM?

  • Open these files in some proper editor (that recognizes BOM) like Eclipse.

  • Change the file's encoding to ISO (right click, properties).

  • Cut  from the beginning of the file, save

  • Change the file's encoding back to UTF-8

...and do not even think about using n...d again!


This works for me!

def removeBOMs(fileName):
     BOMs = ['',#Bytes as CP1252 characters
    'þÿ',
    'ÿþ',
    '^@^@þÿ',
    'ÿþ^@^@',
    '+/v',
    '÷dL',
    'Ýsfs',
    'Ýsfs',
    '^Nþÿ',
    'ûî(',
    '„1•3']
     inputFile = open(fileName, 'r')
     contents = inputFile.read()
     for BOM in BOMs:
         if not BOM in contents:#no BOM in the file...
             pass
         else:
             newContents = contents.replace(BOM,'', 1)
             newFile = open(fileName, 'w')
             newFile.write(newContents)
             return None

Examples related to php

I am receiving warning in Facebook Application using PHP SDK Pass PDO prepared statement to variables Parse error: syntax error, unexpected [ Preg_match backtrack error Removing "http://" from a string How do I hide the PHP explode delimiter from submitted form results? Problems with installation of Google App Engine SDK for php in OS X Laravel 4 with Sentry 2 add user to a group on Registration php & mysql query not echoing in html with tags? How do I show a message in the foreach loop?

Examples related to utf-8

error UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte Changing PowerShell's default output encoding to UTF-8 'Malformed UTF-8 characters, possibly incorrectly encoded' in Laravel Encoding Error in Panda read_csv Using Javascript's atob to decode base64 doesn't properly decode utf-8 strings What is the difference between utf8mb4 and utf8 charsets in MySQL? what is <meta charset="utf-8">? Pandas df.to_csv("file.csv" encode="utf-8") still gives trash characters for minus sign UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 23: ordinal not in range(128) Android Studio : unmappable character for encoding UTF-8

Examples related to character-encoding

Changing PowerShell's default output encoding to UTF-8 JsonParseException : Illegal unquoted character ((CTRL-CHAR, code 10) Change the encoding of a file in Visual Studio Code What is the difference between utf8mb4 and utf8 charsets in MySQL? How to open html file? All inclusive Charset to avoid "java.nio.charset.MalformedInputException: Input length = 1"? UTF-8 output from PowerShell ERROR 1115 (42000): Unknown character set: 'utf8mb4' "for line in..." results in UnicodeDecodeError: 'utf-8' codec can't decode byte How to make php display \t \n as tab and new line instead of characters

Examples related to byte-order-mark

Set Encoding of File to UTF8 With BOM in Sublime Text 3 Convert UTF-8 with BOM to UTF-8 with no BOM in Python Using PowerShell to write a file in UTF-8 without the BOM How to detect the character encoding of a text file? How can I output a UTF-8 CSV in PHP that Excel will read properly? How do I remove  from the beginning of a file? What's the difference between UTF-8 and UTF-8 without BOM? Write to UTF-8 file in Python

Examples related to mojibake

How to convert these strange characters? (ë, Ã, ì, ù, Ã) How do I remove  from the beginning of a file? "’" showing on page instead of " ' " How to replace � in a string