Automatically Detecting and Removing the BOM from UTF-8 Files in PHP: An Example

  • 2021-07-24 10:37:15
  • OfStack

This article shows how to detect and remove the byte order mark (BOM) from UTF-8 files automatically with PHP. It is shared for your reference; the implementation is as follows:

The BOM is a short sequence of hidden bytes at the beginning of a file, which some editors use to mark the file as UTF-8 encoded. PHP does not skip these bytes when it reads the file, so the content appears to start with some unrecognized characters.
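The effect can be reproduced with a minimal sketch. The temporary file and its contents below are made up for illustration; the point is only that `file_get_contents` returns the BOM bytes as part of the data.

```php
<?php
// Hypothetical demo: write a small file that starts with the UTF-8 BOM,
// then read it back to show that PHP keeps the BOM bytes in the content.
$path = tempnam(sys_get_temp_dir(), 'bom'); // temporary file used only for this demo
file_put_contents($path, "\xEF\xBB\xBF" . "hello");

$contents = file_get_contents($path);

echo strlen($contents), "\n";                // 8: three BOM bytes plus five letters
echo bin2hex(substr($contents, 0, 3)), "\n"; // efbbbf
var_dump($contents === "hello");             // bool(false)

unlink($path); // clean up the demo file
```

The three extra bytes are invisible in most editors, which is why the mismatch is easy to miss.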

For example, if a PHP file that generates an image is saved as UTF-8 with a BOM, the hidden BOM bytes at the start of the file are sent along with the output, so the generated image data is corrupted and the browser cannot display it.

To detect whether a UTF-8 file contains a BOM, check whether the first three bytes of the file are 0xEF, 0xBB and 0xBF. The following small program traverses all files in a directory and checks whether each one starts with a BOM.

<?php
// Quickly checks whether UTF-8 encoded files start with a BOM, and can remove it automatically.
// By Bob Shen

$basedir = "."; // Directory to check; "." means the current directory
$auto = 1;      // Automatically remove any BOM found: 1 = yes, 0 = no

// Do not change anything below this line
if ($dh = opendir($basedir)) {
    while (($file = readdir($dh)) !== false) {
        // Skip the "." and ".." entries and any subdirectories
        if ($file != '.' && $file != '..' && !is_dir($basedir."/".$file)) {
            echo "filename: $file ".checkBOM("$basedir/$file")." <br>";
        }
    }
    closedir($dh);
}

// Reports whether the file starts with a BOM and removes it when $auto is 1
function checkBOM ($filename) {
    global $auto;
    $contents = file_get_contents($filename);
    $charset[1] = substr($contents, 0, 1);
    $charset[2] = substr($contents, 1, 1);
    $charset[3] = substr($contents, 2, 1);
    // 239, 187, 191 are the decimal values of 0xEF, 0xBB, 0xBF
    if (ord($charset[1]) == 239 && ord($charset[2]) == 187 && ord($charset[3]) == 191) {
        if ($auto == 1) {
            $rest = substr($contents, 3); // everything after the three BOM bytes
            rewrite($filename, $rest);
            return ("<font color=red>BOM found, automatically removed.</font>");
        } else {
            return ("<font color=red>BOM found.</font>");
        }
    }
    else return ("BOM not found.");
}

// Overwrites the file with the given data while holding an exclusive lock
function rewrite ($filename, $data) {
    $filenum = fopen($filename, "w");
    flock($filenum, LOCK_EX);
    fwrite($filenum, $data);
    fclose($filenum);
}

Save the code above as del_bom.php, set the directory to check, and run it. This can help you find which file contains a BOM that causes every page to start with a blank space.
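Because the check only concerns the first three bytes, the whole file does not have to be loaded just to test for a BOM. The following is a minimal sketch of that idea; `hasUtf8Bom` is a hypothetical helper name that does not appear in the original script, and treating unreadable files as "no BOM" is an assumption made for brevity.

```php
<?php
// Hypothetical helper (not part of the original script): returns true when
// the file starts with the UTF-8 BOM bytes 0xEF 0xBB 0xBF.
function hasUtf8Bom($filename) {
    $handle = fopen($filename, 'rb');
    if ($handle === false) {
        return false; // assumption: unreadable files are reported as having no BOM
    }
    $head = fread($handle, 3); // only the first three bytes are needed
    fclose($handle);
    return $head === "\xEF\xBB\xBF";
}
```

A call such as `hasUtf8Bom("index.php")` could replace the `ord()` comparisons when only detection is wanted; removing the BOM still requires reading and rewriting the rest of the file.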

Save the following code as bom.php, and remember to save it in UTF-8 format:

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<h3>Test results for directory: <?php echo htmlspecialchars(isset($_POST["dir"]) ? $_POST["dir"] : ""); ?></h3>
<?php
// Quickly checks whether UTF-8 encoded files start with a BOM, and can remove it automatically.
// By bob Boss
// Revised by Wind Singing
$auto = 1; // Automatically remove any BOM found: 1 = yes, 0 = no

// Do not change anything below this line
if (isset($_POST["dir"])) {
    $dir = str_replace(" ", "|", $_POST["dir"]); // Accept the submitted path (spaces are replaced with "|")
    $basedir = $dir; // Directory to check; "." means the current directory
    if ($dh = opendir($basedir)) {
        while (($file = readdir($dh)) !== false) {
            // Skip the "." and ".." entries and any subdirectories
            if ($file != '.' && $file != '..' && !is_dir($basedir."/".$file)) {
                echo "filename: $file ".checkBOM("$basedir/$file")." <br>";
            }
        }
        closedir($dh);
    }
}

// Reports whether the file starts with a BOM and removes it when $auto is 1
function checkBOM ($filename) {
    global $auto;
    $contents = file_get_contents($filename);
    $charset[1] = substr($contents, 0, 1);
    $charset[2] = substr($contents, 1, 1);
    $charset[3] = substr($contents, 2, 1);
    // 239, 187, 191 are the decimal values of 0xEF, 0xBB, 0xBF
    if (ord($charset[1]) == 239 && ord($charset[2]) == 187 && ord($charset[3]) == 191) {
        if ($auto == 1) {
            $rest = substr($contents, 3); // everything after the three BOM bytes
            rewrite($filename, $rest);
            return ("<font color=red>--BOM removed.</font>");
        } else {
            return ("<font color=red>--BOM found.</font>");
        }
    }
    else return ("--No BOM detected.");
}

// Overwrites the file with the given data while holding an exclusive lock
function rewrite ($filename, $data) {
    $filenum = fopen($filename, "w");
    flock($filenum, LOCK_EX);
    fwrite($filenum, $data);
    fclose($filenum);
}
?>
<form action="" method="POST">
Directory: <input type="text" name="dir" />
<input type="submit" value="Check directory" >
</form>
Enter a folder name such as plugin/fanfou, without a trailing /. To check the root directory, enter "." (a single period), then submit.
<br>
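Both scripts above skip subdirectories, so only the files directly inside the chosen directory are checked. If nested folders also need checking, one possible approach is to walk the tree with PHP's SPL directory iterators. The sketch below is an illustration rather than part of the original code: `findBomFiles` is a hypothetical name, and it only reports matching files without modifying them.

```php
<?php
// Hypothetical helper (not part of the original scripts): collects the paths of
// all files under $basedir, including subdirectories, that start with a UTF-8 BOM.
function findBomFiles($basedir) {
    $found = array();
    $iterator = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($basedir, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($iterator as $file) {
        if (!$file->isFile()) {
            continue; // ignore anything that is not a regular file
        }
        // Read only the first three bytes of each file
        $head = file_get_contents($file->getPathname(), false, null, 0, 3);
        if ($head === "\xEF\xBB\xBF") {
            $found[] = $file->getPathname();
        }
    }
    return $found;
}
```

Calling `findBomFiles(".")` returns an array of paths, which could then be passed one by one to a removal routine such as the `checkBOM` function above.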

I hope this article is helpful for your PHP programming.

