Exporting Word documents with PHP: principles and examples

  • 2020-10-07 18:36:08
  • OfStack

Principle

There are generally two ways to export a doc document. The first is to use COM: enable it on the server as a PHP extension, then create a COM object and call its methods. A server with Office installed can instantiate a COM object named word.application to generate Word documents. I don't recommend this approach because it executes slowly (when I tested it, the server actually opened a Word client while the code was running). Ideally the COM component would have no user interface and would convert the data in the background, but components that work this way usually cost money.
The second way is to have PHP write the document content directly into a file with the .doc suffix. This approach depends on no third-party extensions and executes efficiently.
Word itself is quite powerful: it can open a file in HTML format and preserve the formatting, even if the file suffix is .doc. That is convenient for us, but there is one problem. An HTML file holds only the address of each image; the image data itself is stored elsewhere. So if you write plain HTML into a .doc file, the document will not contain the images. How, then, do we create a doc document that does contain images? We can use the mht format, which is very similar to HTML.
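The HTML-only variant described above can be sketched as follows. This is a minimal illustration, not part of the original article's code: the helper name buildHtmlForWord and the output path are assumptions, and any images referenced by the HTML would still be missing from the resulting file.

```php
<?php
// Hypothetical helper (not from the article): wraps an HTML fragment in a minimal page.
function buildHtmlForWord(string $bodyHtml, string $title = 'Export'): string
{
    return "<html><head><meta charset=\"utf-8\"><title>"
        . htmlspecialchars($title)
        . "</title></head><body>" . $bodyHtml . "</body></html>";
}

$html = buildHtmlForWord('<h1>Report</h1><p>Hello from PHP.</p>');

// Assumed output location; the content is plain HTML, only the suffix is .doc.
$target = sys_get_temp_dir() . '/html-only-example.doc';
file_put_contents($target, $html);
```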
The mht format is similar to HTML, except that externally linked files such as images, JavaScript, and CSS are embedded in the mht file itself, encoded in base64. A single mht file can therefore hold all the resources of a web page, at the cost of being larger than the equivalent HTML.
Can Word recognize the mht format? To check, I saved a web page as mht, changed the suffix to .doc, and opened it in Word. Word recognized the mht content and displayed the pictures.
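To make the base64 embedding concrete, here is a rough sketch of how one resource might be written as a single MIME part. The helper name buildMhtPart, the boundary string, and the sample bytes are assumptions chosen for illustration; the header layout mirrors what the MhtFileMaker class later in this article emits.

```php
<?php
// Hypothetical helper (not from the article): formats one resource as a MIME part.
function buildMhtPart(string $boundary, string $location, string $mimeType, string $bytes): string
{
    return "--{$boundary}\r\n"
        . "Content-Type: {$mimeType}\r\n"
        . "Content-Transfer-Encoding: base64\r\n"
        . "Content-Location: {$location}\r\n"
        . "\r\n"
        // base64 text is wrapped at 76 characters per line, as is usual in MIME
        . chunk_split(base64_encode($bytes), 76, "\r\n");
}

// "GIF89a" stands in for real image bytes in this sketch.
$part = buildMhtPart('EXAMPLE_BOUNDARY', 'images/dot.gif', 'image/gif', "GIF89a");
echo $part;
```

The Content-Location value is what ties the part back to the src attribute in the HTML part of the same file.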
Now that we know Word recognizes mht, the next question is how to put the images into the mht file. The address of each image is written in the src attribute of its img tag, so the image addresses can be obtained by extracting the src values from the HTML code. Some of them may be relative paths; in that case, prefix them with the page's base URL to make them absolute. With the address in hand, we can fetch the image data with the file_get_contents function, encode it with base64_encode, and insert the result at the appropriate place in the mht file.
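The extraction and path-completion steps might look like the sketch below. The helper names extractImageSources and toAbsoluteUrl and the example.com URLs are assumptions made for illustration. The regular expression only handles quoted src values, and the download step is left out so that the example runs without network access.

```php
<?php
// Hypothetical helper (not from the article): src values of all <img> tags with a quoted src.
function extractImageSources(string $html): array
{
    preg_match_all('/<img\s+(?:[^>]*?\s)?src\s*=\s*["\']([^"\']*)["\'][^>]*>/i', $html, $matches);
    // Trim each value and drop empty ones.
    return array_values(array_filter(array_map('trim', $matches[1]), 'strlen'));
}

// Hypothetical helper: prefixes relative paths with a base URL that ends in "/".
function toAbsoluteUrl(string $src, string $baseUrl): string
{
    return preg_match('#^https?://#i', $src) ? $src : $baseUrl . $src;
}

$html = '<p><img src="img/a.png" alt="A"><img alt="B" src="http://example.com/b.jpg" /></p>';
$urls = [];
foreach (extractImageSources($html) as $src) {
    $urls[] = toAbsoluteUrl($src, 'http://example.com/page/');
}
print_r($urls);
```

Each resulting URL could then be passed to file_get_contents, and the returned bytes to base64_encode, as the paragraph above describes.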
Finally, there are two ways to send the file to the client. One is to generate the doc document on the server first, record its address, and then redirect the client to it with a header("Location: ...") call that points at the .doc file, so that the client downloads it. The other is to answer the HTTP request directly: modify the HTTP response headers, setting Content-Type to application/doc and Content-Disposition to attachment followed by the filename, and then send the file content straight to the client, which likewise prompts the client to download the doc document.
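The second delivery option could be sketched roughly as follows. The function names buildDownloadHeaders and sendWordDocument are assumptions, as is the added Content-Length header. The sketch uses application/msword, the registered MIME type for .doc files, where the text above uses application/doc. It also assumes the filename contains no double quotes.

```php
<?php
// Hypothetical helper (not from the article): response headers for a forced download.
function buildDownloadHeaders(string $filename, int $length): array
{
    return [
        'Content-Type: application/msword',
        'Content-Disposition: attachment; filename="' . $filename . '"',
        'Content-Length: ' . $length,
    ];
}

// Hypothetical helper: sends the headers, then the document body.
// Call it before any other output has been sent to the client.
function sendWordDocument(string $fileContent, string $filename): void
{
    foreach (buildDownloadHeaders($filename, strlen($fileContent)) as $line) {
        header($line);
    }
    echo $fileContent;
}

$headers = buildDownloadHeaders('test.doc', 1024);
```

Keeping the header list in its own function makes it possible to inspect the values without actually sending a response.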

Implementation

With the principles above, you should have a preliminary understanding of the implementation. Below is an export function that converts HTML code into the content of an mht document. It takes three parameters, the last two of which are optional:
content: the HTML code to convert.
absolutePath: if the image addresses in the HTML code are relative paths, this is the absolute URL prefix that they are missing.
isEraseLink: whether to remove hyperlinks from the HTML code.
Return value: the content of the mht file as a string, which you can save with file_put_contents as a file with the .doc suffix.
The function mainly analyzes all the image addresses in the HTML code and downloads them in turn. Once the content of an image has been obtained, the function calls the MhtFileMaker class to add the image to the mht file. The details of adding it are encapsulated in the MhtFileMaker class.


/**
 * Generate Word document content from HTML code.
 * The generated content is in essence an mht file: this function analyzes the HTML and downloads the image resources referenced by the page.
 * This function depends on the class MhtFileMaker.
 * This function analyzes img tags and extracts their src attribute values. The src values must be enclosed in quotes, otherwise they cannot be extracted.
 *
 * @param string $content HTML content
 * @param string $absolutePath The absolute URL of the web page. If image paths in the HTML content are relative, fill in this parameter so the function can complete them. It must end with "/".
 * @param bool $isEraseLink Whether to remove the links in the HTML content
 */
function getWordDocument( $content , $absolutePath = "" , $isEraseLink = true )
{
    $mht = new MhtFileMaker();
    if ($isEraseLink)
        $content = preg_replace('/<a\b[^>]*>(.*?)<\/a>/is' , '$1' , $content);   // Remove links but keep their inner content
    $images = array();
    $files = array();
    $matches = array();
    // This pattern requires the value of the src attribute to be enclosed in quotes
    if ( preg_match_all('/<img\s+(?:[^>]*?\s)?src\s*=\s*["\']([^"\']*)["\'][^>]*>/i',$content ,$matches ) )
    {
        $arrPath = $matches[1];
        for ( $i=0;$i<count($arrPath);$i++)
        {
            $path = $arrPath[$i];
            $imgPath = trim( $path );
            if ( $imgPath != "" )
            {
                $files[] = $imgPath;
                if ( preg_match('#^https?://#i', $imgPath) )
                {
                    // Absolute URL (http or https), no prefix needed
                }
                else
                {
                    $imgPath = $absolutePath.$imgPath;
                }
                $images[] = $imgPath;
            }
        }
    }
    $mht->AddContents("tmp.html",$mht->GetMimeType("tmp.html"),$content);

    for ( $i=0;$i<count($images);$i++)
    {
        $image = $images[$i];
        // Read the image once and check the result, without leaving a file handle open
        $imgcontent = @file_get_contents( $image );
        if ( $imgcontent !== false )
        {
            $mht->AddContents($files[$i],$mht->GetMimeType($image),$imgcontent);
        }
        else
        {
            echo "file:".$image." not exist!<br />";
        }
    }

    return $mht->GetFile();
}


Usage:


$fileContent = getWordDocument($content,"https://www.ofstack.com/Music/etc/");
$fp = fopen("test.doc", 'w');
fwrite($fp, $fileContent);
fclose($fp);

The $content variable should hold the HTML source code. The second argument is the URL prefix used to complete the relative image paths in that HTML.
Note that before using this function you need to include the class MhtFileMaker, which generates the mht document for us.


<?php
/***********************************************************************
Class:        Mht File Maker
Version:      1.2 beta
Author:       Wudi <wudicgi@yahoo.de>
Description:  The class can make .mht file.
***********************************************************************/
class MhtFileMaker{
    var $config = array();
    var $headers = array();
    var $headers_exists = array();
    var $files = array();
    var $boundary;
    var $dir_base;
    var $page_first;
    function MhtFile($config = array()){
    }
    function SetHeader($header){
        $this->headers[] = $header;
        $key = strtolower(substr($header, 0, strpos($header, ':')));
        $this->headers_exists[$key] = TRUE;
    }
    function SetFrom($from){
        $this->SetHeader("From: $from");
    }
    function SetSubject($subject){
        $this->SetHeader("Subject: $subject");
    }
    function SetDate($date = NULL, $istimestamp = FALSE){
        if ($date == NULL) {
            $date = time();
        }
        if ($istimestamp == TRUE) {
            $date = date('D, d M Y H:i:s O', $date);
        }
        $this->SetHeader("Date: $date");
    }
    function SetBoundary($boundary = NULL){
        if ($boundary == NULL) {
            $this->boundary = '--' . strtoupper(md5(mt_rand())) . '_MULTIPART_MIXED';
        } else {
            $this->boundary = $boundary;
        }
    }
    function SetBaseDir($dir){
        $this->dir_base = str_replace("\\", "/", realpath($dir));
    }
    function SetFirstPage($filename){
        $this->page_first = str_replace("\\", "/", realpath("{$this->dir_base}/$filename"));
    }
    function AutoAddFiles(){
        if (!isset($this->page_first)) {
            exit ('Not set the first page.');
        }
        $filepath = str_replace($this->dir_base, '', $this->page_first);
        $filepath = 'http://mhtfile' . $filepath;
        $this->AddFile($this->page_first, $filepath, NULL);
        $this->AddDir($this->dir_base);
    }
    function AddDir($dir){
        $handle_dir = opendir($dir);
        while ($filename = readdir($handle_dir)) {
            if (($filename!='.') && ($filename!='..') && ("$dir/$filename"!=$this->page_first)) {
                if (is_dir("$dir/$filename")) {
                    $this->AddDir("$dir/$filename");
                } elseif (is_file("$dir/$filename")) {
                    $filepath = str_replace($this->dir_base, '', "$dir/$filename");
                    $filepath = 'http://mhtfile' . $filepath;
                    $this->AddFile("$dir/$filename", $filepath, NULL);
                }
            }
        }
        closedir($handle_dir);
    }
    function AddFile($filename, $filepath = NULL, $encoding = NULL){
        if ($filepath == NULL) {
            $filepath = $filename;
        }
        $mimetype = $this->GetMimeType($filename);
        $filecont = file_get_contents($filename);
        $this->AddContents($filepath, $mimetype, $filecont, $encoding);
    }
    function AddContents($filepath, $mimetype, $filecont, $encoding = NULL){
        if ($encoding == NULL) {
            $filecont = chunk_split(base64_encode($filecont), 76);
            $encoding = 'base64';
        }
        $this->files[] = array('filepath' => $filepath,
                               'mimetype' => $mimetype,
                               'filecont' => $filecont,
                               'encoding' => $encoding);
    }
    function CheckHeaders(){
        if (!array_key_exists('date', $this->headers_exists)) {
            $this->SetDate(NULL, TRUE);
        }
        if ($this->boundary == NULL) {
            $this->SetBoundary();
        }
    }
    function CheckFiles(){
        if (count($this->files) == 0) {
            return FALSE;
        } else {
            return TRUE;
        }
    }
    function GetFile(){
        $this->CheckHeaders();
        if (!$this->CheckFiles()) {
            exit ('No file was added.');
        }
        $contents = implode("\r\n", $this->headers);
        $contents .= "\r\n";
        $contents .= "MIME-Version: 1.0\r\n";
        $contents .= "Content-Type: multipart/related;\r\n";
        $contents .= "\tboundary=\"{$this->boundary}\";\r\n";
        $contents .= "\ttype=\"" . $this->files[0]['mimetype'] . "\"\r\n";
        $contents .= "X-MimeOLE: Produced By Mht File Maker v1.0 beta\r\n";
        $contents .= "\r\n";
        $contents .= "This is a multi-part message in MIME format.\r\n";
        $contents .= "\r\n";
        foreach ($this->files as $file) {
            $contents .= "--{$this->boundary}\r\n";
            $contents .= "Content-Type: $file[mimetype]\r\n";
            $contents .= "Content-Transfer-Encoding: $file[encoding]\r\n";
            $contents .= "Content-Location: $file[filepath]\r\n";
            $contents .= "\r\n";
            $contents .= $file['filecont'];
            $contents .= "\r\n";
        }
        $contents .= "--{$this->boundary}--\r\n";
        return $contents;
    }
    function MakeFile($filename){
        $contents = $this->GetFile();
        $fp = fopen($filename, 'w');
        fwrite($fp, $contents);
        fclose($fp);
    }
    function GetMimeType($filename){
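        // Lower-case the extension; a missing extension yields '' and falls through to the default
        $extension = strtolower(pathinfo($filename, PATHINFO_EXTENSION));
        switch ($extension) {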
            case 'htm': $mimetype = 'text/html'; break;
            case 'html': $mimetype = 'text/html'; break;
            case 'txt': $mimetype = 'text/plain'; break;
            case 'cgi': $mimetype = 'text/plain'; break;
            case 'php': $mimetype = 'text/plain'; break;
            case 'css': $mimetype = 'text/css'; break;
            case 'jpg': $mimetype = 'image/jpeg'; break;
            case 'jpeg': $mimetype = 'image/jpeg'; break;
            case 'jpe': $mimetype = 'image/jpeg'; break;
            case 'gif': $mimetype = 'image/gif'; break;
            case 'png': $mimetype = 'image/png'; break;
            default: $mimetype = 'application/octet-stream'; break;
        }
        return $mimetype;
    }
}
?>

The above covers how PHP can export the doc format by way of an mht file. This method solves one problem: it lets the exported doc file contain pictures. If you want to include more content, such as CSS stylesheets, you only need to use a regular expression to find the link tags in the HTML code, extract the address of each CSS file, read the file and encode it in base64, and finally add it to the mht file.
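As a rough sketch of that last idea, the stylesheet addresses could be collected like this. The helper name extractStylesheetLinks is an assumption, and the patterns only handle quoted rel and href attributes.

```php
<?php
// Hypothetical helper (not from the article): href of every <link rel="stylesheet"> tag.
function extractStylesheetLinks(string $html): array
{
    $hrefs = [];
    preg_match_all('/<link\b[^>]*>/i', $html, $tags);
    foreach ($tags[0] as $tag) {
        // Keep only stylesheet links, then pull out the quoted href value.
        if (preg_match('/\brel\s*=\s*["\']stylesheet["\']/i', $tag)
            && preg_match('/\bhref\s*=\s*["\']([^"\']+)["\']/i', $tag, $m)) {
            $hrefs[] = trim($m[1]);
        }
    }
    return $hrefs;
}

$html = '<link rel="stylesheet" href="css/site.css"><link rel="icon" href="favicon.ico">';
$styles = extractStylesheetLinks($html);
print_r($styles);
```

Each address found this way could then be read with file_get_contents and handed to AddContents with the text/css MIME type, in the same way getWordDocument handles images.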

