Reading and writing XML in PHP: DOM implementation code

  • 2020-03-31 21:31:37
  • OfStack

Reading and writing extensible markup language (XML) in PHP might seem a bit scary. In fact, XML and all its associated technologies can be scary, but reading and writing XML in PHP isn't necessarily a scary task. First, you need to learn a little about XML -- what it is and what to do with it. Then you need to learn how to read and write XML in PHP, and there are many ways to do this.
This article provides a brief introduction to XML and then explains how to read and write XML in PHP.
What is XML?
XML is a data storage format. It does not define what data to hold, nor does it define the format of the data. XML simply defines tags and their attributes. A well-formed XML tag looks like this:
<name>Jack Herrington</name>
This <name> tag contains some text: Jack Herrington.
An XML tag without text looks like this:
<powerUp />
There is more than one way to write something in XML. For example, this tag is equivalent to the previous one:
<powerUp></powerUp>
You can also add attributes to XML tags. For example, this <name> tag contains first and last attributes:
<name first="Jack" last="Herrington" />
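To make attributes concrete, here is a minimal sketch, assuming PHP's DOM extension is enabled, that loads the <name> tag above from a string and reads its attributes with DOMElement::getAttribute(). The variable names are illustrative only.

```php
<?php
// Load the one-element document and read its attributes.
$doc = new DOMDocument();
$doc->loadXML('<name first="Jack" last="Herrington" />');

$name = $doc->documentElement;             // the <name> element
$first = $name->getAttribute('first');     // "Jack"
$last = $name->getAttribute('last');       // "Herrington"
$missing = $name->getAttribute('middle');  // empty string when the attribute is absent

echo "$first $last\n";
```

Because getAttribute() returns an empty string for a missing attribute, DOMElement::hasAttribute() is the way to tell "absent" apart from "present but empty".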
Special characters can also be encoded in XML. For example, the ampersand is encoded like this:
&amp;
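In PHP, one way to produce these encodings is htmlspecialchars() with the ENT_XML1 and ENT_QUOTES flags. The sketch below assumes UTF-8 input, and the wrapper name xmlEscape is hypothetical, chosen for this example.

```php
<?php
// Hypothetical helper: encode the characters XML treats specially.
// ENT_XML1 applies XML rules (a single quote becomes &apos;),
// ENT_QUOTES encodes both single and double quotes.
function xmlEscape(string $text): string
{
    return htmlspecialchars($text, ENT_QUOTES | ENT_XML1, 'UTF-8');
}

echo xmlEscape("Tom & Jerry's <show>"), "\n";
// prints: Tom &amp; Jerry&apos;s &lt;show&gt;
```

XML predefines entities for only these five characters: &, <, >, " and '.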
An XML file containing tags and attributes is well-formed if it is written like the examples above, which means that every tag is properly closed and nested and special characters are encoded correctly. Listing 1 is an example of well-formed XML.

Listing 1. Sample XML book list
 
<books> 
<book> 
<author>Jack Herrington</author> 
<title>PHP Hacks</title> 
<publisher>O'Reilly</publisher> 
</book> 
<book> 
<author>Jack Herrington</author> 
<title>Podcasting Hacks</title> 
<publisher>O'Reilly</publisher> 
</book> 
</books> 

The XML in listing 1 contains a list of books. The parent <books> tag contains a set of <book> tags, and each <book> tag in turn contains <author>, <title>, and <publisher> tags.
An XML document is valid when its tag structure and content conform to an external schema file. Schema files can be written in different formats. For this article, all you need is well-formed XML.
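The difference between well-formed and valid can be sketched with DOMDocument::schemaValidateSource(). The schema below is a hypothetical XML Schema (XSD) written for this example, not one that ships with the book data; it requires each book to contain author, title, and publisher in that order.

```php
<?php
// Return true if $xml is well-formed AND valid against the illustrative schema.
function isValidBookList(string $xml): bool
{
    $xsd = <<<'XSD'
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="books">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="book" minOccurs="0" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="author" type="xs:string"/>
              <xs:element name="title" type="xs:string"/>
              <xs:element name="publisher" type="xs:string"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
XSD;

    // Collect libxml errors instead of printing warnings.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $valid = $doc->loadXML($xml) && $doc->schemaValidateSource($xsd);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $valid;
}

var_dump(isValidBookList('<books><book><author>Jack Herrington</author>'
    . '<title>PHP Hacks</title><publisher>O\'Reilly</publisher></book></books>'));
```

A document missing its <publisher> tag is still well-formed, but this check rejects it as invalid.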
If XML looks a lot like hypertext markup language (HTML), you're right. XML and HTML are both tag-based languages, and they have many similarities. However, it is important to note that while an XML document can be well-formed HTML, not all HTML documents are well-formed XML. The line-break tag (br) is a good example of the difference between XML and HTML. This line-break tag is well-formed HTML, but not well-formed XML:
<p>This is a paragraph<br>
With a line break</p>
This line-break tag is both well-formed XML and well-formed HTML:
<p>This is a paragraph<br />
With a line break</p>
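An XML parser is an easy way to check the difference. The sketch below assumes the DOM extension is enabled; isWellFormed is a hypothetical helper that tries DOMDocument::loadXML() and suppresses libxml's warnings while doing so.

```php
<?php
// Return true if $xml parses as well-formed XML, false otherwise.
function isWellFormed(string $xml): bool
{
    // Collect parse errors internally instead of printing them as warnings.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $ok = $doc->loadXML($xml);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $ok;
}

var_dump(isWellFormed('<p>This is a paragraph<br>With a line break</p>'));   // bool(false)
var_dump(isWellFormed('<p>This is a paragraph<br />With a line break</p>')); // bool(true)
```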
To write HTML as well-formed XML, follow the W3C's extensible hypertext markup language (XHTML) standard. All modern browsers render XHTML. Also, you can use XML tools to read XHTML and find the data in the document, which is much easier than parsing HTML.
Read the XML with the DOM library
The easiest way to read a well-formed XML file is to use the document object model (DOM) library compiled into some PHP installations. The DOM library reads the entire XML document into memory and represents it as a tree of nodes, as shown in figure 1.
Figure 1. XML DOM tree for book XML
(Image: http://files.jb51.net/upload/201102/20110203151600791.gif)
The books node at the top of the tree has two book child tags. Each book contains three nodes: author, publisher, and title. Each of the author, publisher, and title nodes has a text node as its child.
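To see that tree without the figure, here is a minimal sketch, assuming the DOM extension is enabled, that loads the Listing 1 XML from a string and prints every element and text node indented by depth. The function name dumpTree is hypothetical. Setting preserveWhiteSpace to false before loading drops the whitespace-only text nodes that the indentation would otherwise add between tags.

```php
<?php
// Recursively print element names and text nodes, indented by depth.
function dumpTree(DOMNode $node, int $depth = 0): void
{
    $indent = str_repeat('  ', $depth);
    foreach ($node->childNodes as $child) {
        if ($child->nodeType === XML_ELEMENT_NODE) {
            echo $indent, $child->nodeName, "\n";
            dumpTree($child, $depth + 1);
        } elseif ($child->nodeType === XML_TEXT_NODE) {
            echo $indent, '"', $child->nodeValue, "\"\n";
        }
    }
}

$xml = <<<'XML'
<books>
  <book>
    <author>Jack Herrington</author>
    <title>PHP Hacks</title>
    <publisher>O'Reilly</publisher>
  </book>
  <book>
    <author>Jack Herrington</author>
    <title>Podcasting Hacks</title>
    <publisher>O'Reilly</publisher>
  </book>
</books>
XML;

$doc = new DOMDocument();
$doc->preserveWhiteSpace = false; // must be set before loading
$doc->loadXML($xml);
dumpTree($doc);
```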
The code to read the book XML file and display the content in the DOM is shown in listing 2.
Listing 2. Reading the book XML with the DOM
 
<?php 
$doc = new DOMDocument(); 
$doc->load( 'books.xml' ); 
$books = $doc->getElementsByTagName( "book" ); 
foreach( $books as $book ) 
{ 
$authors = $book->getElementsByTagName( "author" ); 
$author = $authors->item(0)->nodeValue; 
$publishers = $book->getElementsByTagName( "publisher" ); 
$publisher = $publishers->item(0)->nodeValue; 
$titles = $book->getElementsByTagName( "title" ); 
$title = $titles->item(0)->nodeValue; 
echo "$title - $author - $publisher\n"; 
} 
?> 

The script first creates a new DOMDocument object and loads the book XML into it using the load method. The script then uses the getElementsByTagName method to get a list of all the elements with the specified name.
Inside the loop over the book nodes, the script uses the getElementsByTagName method again to get the nodeValue of the author, publisher, and title tags. The nodeValue is the text in the node. The script then displays the values.
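Listing 2 assumes every book has all three child tags. If one is missing, item(0) returns null, and reading nodeValue from it raises a PHP warning or notice and yields an empty value. A minimal sketch of a more forgiving lookup follows; childText is a hypothetical helper name and the fallback text is illustrative.

```php
<?php
// Hypothetical helper: text of the first <$tag> element under $parent,
// or $default when no such element exists.
function childText(DOMElement $parent, string $tag, string $default = ''): string
{
    $node = $parent->getElementsByTagName($tag)->item(0);
    return $node === null ? $default : $node->nodeValue;
}

// A book with no <author> element, to show the fallback.
$doc = new DOMDocument();
$doc->loadXML('<books><book><title>PHP Hacks</title></book></books>');

foreach ($doc->getElementsByTagName('book') as $book) {
    $title = childText($book, 'title');
    $author = childText($book, 'author', '(unknown author)');
    echo "$title - $author\n"; // prints: PHP Hacks - (unknown author)
}
```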
You can run a PHP script on the command line like this:
% php e1.php
PHP Hacks - Jack Herrington - O'Reilly
Podcasting Hacks - Jack Herrington - O'Reilly
%
As you can see, each book block outputs one line. This is a good start. But what if you can't access the XML DOM library?
Read the XML with a SAX parser
Another way to read XML is to use a Simple API for XML (SAX) parser. Most installations of PHP include a SAX parser. The SAX parser uses a callback model: each time a tag is opened or closed, or each time the parser sees text, it calls a user-defined function with information about the node or the text.
The nice thing about the SAX parser is that it's really lightweight. The parser does not keep the document in memory, so it can be used with very large files. The downside is that writing SAX parser callbacks is a hassle. Listing 3 shows code that uses SAX to read the book XML file and display the contents.
Listing 3. Reading the book XML with a SAX parser
 
<?php 
$g_books = array(); 
$g_elem = null; 
function startElement( $parser, $name, $attrs ) 
{ 
global $g_books, $g_elem; 
if ( $name == 'BOOK' ) $g_books []= array(); 
$g_elem = $name; 
} 
function endElement( $parser, $name ) 
{ 
global $g_elem; 
$g_elem = null; 
} 
function textData( $parser, $text ) 
{ 
global $g_books, $g_elem; 
if ( $g_elem == 'AUTHOR' || 
$g_elem == 'PUBLISHER' || 
$g_elem == 'TITLE' ) 
{ 
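// Text can arrive in several pieces, so append instead of overwriting
$i = count( $g_books ) - 1; 
if ( !isset( $g_books[ $i ][ $g_elem ] ) ) $g_books[ $i ][ $g_elem ] = ''; 
$g_books[ $i ][ $g_elem ] .= $text; 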
} 
} 
$parser = xml_parser_create(); 
xml_set_element_handler( $parser, "startElement", "endElement" ); 
xml_set_character_data_handler( $parser, "textData" ); 
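$f = fopen( 'books.xml', 'r' ); 
while( !feof( $f ) ) 
{ 
$data = fread( $f, 4096 ); 
// The third argument tells the parser when the last chunk has been sent
xml_parse( $parser, $data, feof( $f ) ); 
} 
fclose( $f ); 
xml_parser_free( $parser ); 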
foreach( $g_books as $book ) 
{ 
echo $book['TITLE']." - ".$book['AUTHOR']." - "; 
echo $book['PUBLISHER']."\n"; 
} 
?> 

The script first sets up the g_books array, which holds all the books and their information in memory, and the g_elem variable, which holds the name of the tag the script is currently working on. The script then defines the callback functions: startElement, endElement, and textData. The startElement and endElement functions are called when a tag is opened and closed, respectively. The textData function is called for the text between the start and end tags.
In this example, the startElement function looks for the book tag and starts a new element in the g_books array. The textData function then looks at the current element to see whether it is a publisher, title, or author tag. If so, the function adds the current text to the current book. (The tag names arrive in uppercase, such as BOOK, because the parser folds case by default.)
To run the parser, the script creates it with the xml_parser_create function and then sets the callback handlers. After that, the script reads the file and sends it to the parser in chunks. After the file is read, the xml_parser_free function frees the parser. The end of the script prints the contents of the g_books array.
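Listing 3 keeps its state in globals and relies on the parser's default case folding. The sketch below, assuming the XML Parser extension is enabled, reworks the same idea with closures, turns case folding off with XML_OPTION_CASE_FOLDING so tag names stay lowercase, appends character data because the handler can be called several times for one text node, and reports parse errors. The function name parseBooksSax is hypothetical.

```php
<?php
// Parse book XML with the SAX-style xml_* functions, keeping state in closures.
function parseBooksSax(string $xml): array
{
    $books = [];
    $current = null; // name of the field element being read, if any
    $fields = ['author', 'title', 'publisher'];

    $parser = xml_parser_create('UTF-8');
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);
    xml_set_element_handler(
        $parser,
        function ($p, $name, $attrs) use (&$books, &$current, $fields) {
            if ($name === 'book') {
                $books[] = [];
            }
            $current = in_array($name, $fields, true) ? $name : null;
        },
        function ($p, $name) use (&$current) {
            $current = null;
        }
    );
    // Character data may arrive in several pieces, so append rather than assign.
    xml_set_character_data_handler(
        $parser,
        function ($p, $text) use (&$books, &$current) {
            if ($current !== null && $books) {
                $i = count($books) - 1;
                $books[$i][$current] = ($books[$i][$current] ?? '') . $text;
            }
        }
    );

    // The third argument marks this as the final chunk of input.
    if (!xml_parse($parser, $xml, true)) {
        throw new RuntimeException(sprintf(
            'XML error: %s at line %d',
            xml_error_string(xml_get_error_code($parser)),
            xml_get_current_line_number($parser)
        ));
    }
    return $books;
}

$books = parseBooksSax('<books><book><author>Jack Herrington</author>'
    . '<title>PHP Hacks</title><publisher>O\'Reilly</publisher></book></books>');
foreach ($books as $book) {
    echo $book['title'], ' - ', $book['author'], ' - ', $book['publisher'], "\n";
}
```

The closures capture $books and $current by reference, which removes the need for global variables while keeping the same callback structure as Listing 3.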
As you can see, this is much more difficult than writing the same functionality in the DOM. What if you don't have a DOM library or a SAX library? Is there an alternative?
--------------------------------------------------------------------------------
Read the XML with regular expressions
Sure, some engineers will criticize me for even mentioning this approach, but you can parse XML with regular expressions. Listing 4 shows an example of reading the book file using the preg_ functions.
Listing 4. Reading the XML with a regular expression
 
<?php 
$xml = ""; 
$f = fopen( 'books.xml', 'r' ); 
while( $data = fread( $f, 4096 ) ) { $xml .= $data; } 
fclose( $f ); 
// The / in each closing tag is escaped because / is also the pattern delimiter
preg_match_all( '/<book>(.*?)<\/book>/s', 
$xml, $bookblocks ); 
foreach( $bookblocks[1] as $block ) 
{ 
preg_match_all( '/<author>(.*?)<\/author>/', 
$block, $author ); 
preg_match_all( '/<title>(.*?)<\/title>/', 
$block, $title ); 
preg_match_all( '/<publisher>(.*?)<\/publisher>/', 
$block, $publisher ); 
echo( $title[1][0]." - ".$author[1][0]." - ". 
$publisher[1][0]."\n" ); 
} 
?> 


Notice how short this code is. First, it reads the file into one large string. Next, a regular expression extracts each book block. Finally, a foreach loop goes through each book block and extracts the author, title, and publisher.
So what are the drawbacks? The problem with using regular expressions to read XML is that nothing checks whether the XML is well-formed, so a malformed document can produce wrong results without any error. Also, some well-formed XML will not match the regular expressions (a <book> tag that carries an attribute, for example), so you will have to keep adjusting them.
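The attribute problem is easy to demonstrate. This sketch, assuming the DOM extension is enabled, counts book elements in the same XML twice, once as a bare <book> tag and once with a hypothetical id attribute, using the Listing 4 pattern and the DOM side by side.

```php
<?php
// The same book element, once plain and once with an attribute.
$plain = '<books><book><title>PHP Hacks</title></book></books>';
$withAttr = '<books><book id="1"><title>PHP Hacks</title></book></books>';

// The regular expression from Listing 4 only matches a bare <book> tag.
$regexCount = function (string $xml): int {
    return preg_match_all('/<book>(.*?)<\/book>/s', $xml, $m);
};

// The DOM counts elements by name, whatever attributes they carry.
$domCount = function (string $xml): int {
    $doc = new DOMDocument();
    $doc->loadXML($xml);
    return $doc->getElementsByTagName('book')->length;
};

printf("plain:   regex=%d dom=%d\n", $regexCount($plain), $domCount($plain));       // regex=1 dom=1
printf("with id: regex=%d dom=%d\n", $regexCount($withAttr), $domCount($withAttr)); // regex=0 dom=1
```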
I never recommend using regular expressions to read XML, but sometimes it is the most compatible option, because the regular expression functions are always available. Do not use regular expressions to read XML that comes directly from users, because you have no control over the format or structure of such XML. You should always use the DOM library or a SAX parser to read XML from users.
--------------------------------------------------------------------------------
Write XML with the DOM
Reading XML is only part of the picture. How do you write XML? The best way to write XML is with the DOM. Listing 5 shows how to build the book XML file with the DOM.
Listing 5. Writing the book XML in the DOM
 
<?php 
$books = array(); 
$books [] = array( 
'title' => 'PHP Hacks', 
'author' => 'Jack Herrington', 
'publisher' => "O'Reilly" 
); 
$books [] = array( 
'title' => 'Podcasting Hacks', 
'author' => 'Jack Herrington', 
'publisher' => "O'Reilly" 
); 
$doc = new DOMDocument(); 
$doc->formatOutput = true; 
$r = $doc->createElement( "books" ); 
$doc->appendChild( $r ); 
foreach( $books as $book ) 
{ 
$b = $doc->createElement( "book" ); 
$author = $doc->createElement( "author" ); 
$author->appendChild( 
$doc->createTextNode( $book['author'] ) 
); 
$b->appendChild( $author ); 
$title = $doc->createElement( "title" ); 
$title->appendChild( 
$doc->createTextNode( $book['title'] ) 
); 
$b->appendChild( $title ); 
$publisher = $doc->createElement( "publisher" ); 
$publisher->appendChild( 
$doc->createTextNode( $book['publisher'] ) 
); 
$b->appendChild( $publisher ); 
$r->appendChild( $b ); 
} 
echo $doc->saveXML(); 
?> 


At the top of the script, the books array is loaded with some sample books. This data could come from the user or from a database.
After the sample books are loaded, the script creates a new DOMDocument and adds the root node books to it. The script then creates elements for each book's author, title, and publisher and adds a text node to each one. The final step for each book node is to append it to the root node books.
The end of the script outputs the XML to the console using the saveXML method. (You can also write the XML to a file using the save method.) The output of the script is shown in listing 6.
Listing 6. Output from the DOM build script
 
php e4.php 
<?xml version="1.0"?> 
<books> 
<book> 
<author>Jack Herrington</author> 
<title>PHP Hacks</title> 
<publisher>O'Reilly</publisher> 
</book> 
<book> 
<author>Jack Herrington</author> 
<title>Podcasting Hacks</title> 
<publisher>O'Reilly</publisher> 
</book> 
</books> 
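The DOM's well-formedness guarantee comes largely from text nodes: createTextNode() stores the raw text, and special characters are escaped only when the document is serialized. That is one reason to prefer createTextNode(), as Listing 5 does, over passing text as the second argument of createElement(), which per the PHP manual does not escape &. The sketch below assumes the DOM extension is enabled and writes to a temporary file whose name is chosen by tempnam().

```php
<?php
// Build a one-element document whose text contains XML special characters.
$doc = new DOMDocument('1.0', 'UTF-8');
$title = $doc->createElement('title');
// createTextNode() stores the raw text; escaping happens on serialization.
$title->appendChild($doc->createTextNode('Tom & Jerry <Vol. 1>'));
$doc->appendChild($title);

// saveXML() with a node argument serializes only that node.
$fragment = $doc->saveXML($title);
echo $fragment, "\n";

// save() writes the whole document to a file and returns the byte count (false on failure).
$path = tempnam(sys_get_temp_dir(), 'books');
$bytes = $doc->save($path);
echo "wrote $bytes bytes to $path\n";
```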

The real value of using the DOM is that the XML it creates is always well-formed. But what if you can't create XML with the DOM?
--------------------------------------------------------------------------------
Write XML in PHP
If the DOM is not available, you can write XML using PHP's text templating. Listing 7 shows how to build the book XML file with plain PHP.
Listing 7. Writing book XML in PHP
 
<?php 
$books = array(); 
$books [] = array( 
'title' => 'PHP Hacks', 
'author' => 'Jack Herrington', 
'publisher' => "O'Reilly" 
); 
$books [] = array( 
'title' => 'Podcasting Hacks', 
'author' => 'Jack Herrington', 
'publisher' => "O'Reilly" 
); 
?> 
<books> 
<?php 
foreach( $books as $book ) 
{ 
?> 
<book> 
<title><?php echo( $book['title'] ); ?></title> 
<author><?php echo( $book['author'] ); ?></author> 
<publisher><?php echo( $book['publisher'] ); ?></publisher> 
</book> 
<?php 
} 
?> 
</books> 


The top of the script is similar to the DOM script. The bottom of the script opens the books tag, then iterates through each book, creating the book tag and its title, author, and publisher tags.
The problem with this approach is encoding special characters. To make sure they are properly encoded as entities, you must call the htmlentities function on each item, as shown in listing 8, which replaces the template part of listing 7.
Listing 8. Encoding entities using the htmlentities function
 
<books> 
<?php 
foreach( $books as $book ) 
{ 
// ENT_XML1 restricts the output to XML's predefined entities (&amp; &lt; &gt; &quot; &apos;)
$title = htmlentities( $book['title'], ENT_QUOTES | ENT_XML1 ); 
$author = htmlentities( $book['author'], ENT_QUOTES | ENT_XML1 ); 
$publisher = htmlentities( $book['publisher'], ENT_QUOTES | ENT_XML1 ); 
?> 
<book> 
<title><?php echo( $title ); ?></title> 
<author><?php echo( $author ); ?></author> 
<publisher><?php echo( $publisher ); ?></publisher> 
</book> 
<?php 
} 
?> 
</books> 


That's what's so annoying about writing XML in plain PHP: you think you've created perfect XML, but as soon as you try to work with the data, you find that some elements are incorrectly encoded.
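One way to catch those mistakes early, sketched here under the assumption that the DOM extension is available for testing even if it isn't used to build the output, is to escape every value with a single helper and then load the generated string back with DOMDocument::loadXML(). The helper names esc and renderBooks and the sample title are hypothetical.

```php
<?php
// Hypothetical escape helper: XML rules, both quote styles encoded.
function esc(string $text): string
{
    return htmlspecialchars($text, ENT_QUOTES | ENT_XML1, 'UTF-8');
}

// Build the document as a string, escaping every value inserted into it.
function renderBooks(array $books): string
{
    $xml = "<books>\n";
    foreach ($books as $book) {
        $xml .= "  <book>\n"
            . '    <title>' . esc($book['title']) . "</title>\n"
            . '    <author>' . esc($book['author']) . "</author>\n"
            . '    <publisher>' . esc($book['publisher']) . "</publisher>\n"
            . "  </book>\n";
    }
    return $xml . "</books>\n";
}

$books = [
    ['title' => 'Tom & Jerry <Annotated>', 'author' => 'Jack Herrington', 'publisher' => "O'Reilly"],
];
$xml = renderBooks($books);

// Round-trip check: well-formed output loads, and the text comes back unescaped.
$doc = new DOMDocument();
if (!$doc->loadXML($xml)) {
    throw new RuntimeException('Template produced malformed XML');
}
echo $doc->getElementsByTagName('title')->item(0)->nodeValue, "\n"; // prints: Tom & Jerry <Annotated>
```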
--------------------------------------------------------------------------------
Conclusion
There is always a lot of hype and confusion around XML. However, it's not as hard as you might think -- especially in a great language like PHP. Once you understand and implement XML correctly, you will find many powerful tools to use. XPath and XSLT are two such tools worth investigating.
