Detailed explanation of the four methods of generating and parsing XML documents in Java (introduction + comparison of advantages and disadvantages + examples)

  • 2020-05-19 04:48:30
  • OfStack

There are more and more ways to parse XML, but only four are mainstream: DOM, SAX, JDOM, and DOM4J.

The jar package download addresses for these four methods are given first

DOM: ships with the current Java JDK, in the xml-apis.jar package

SAX: http://sourceforge.net/projects/sax/

JDOM: http://jdom.org/downloads/index.html

DOM4J: http://sourceforge.net/projects/dom4j/

1. Introduction and analysis of advantages and disadvantages

1. DOM (Document Object Model)

DOM is the official W3C standard for representing XML documents in a platform- and language-independent manner. DOM is a collection of nodes, or pieces of information, organized in a hierarchy. This hierarchy allows developers to look for specific information in the tree. Analyzing this structure usually requires loading the entire document and constructing the hierarchy before any work can be done. Because it is based on the information hierarchy, DOM is considered either tree-based or object-based.

"Advantages"

Allows applications to make changes to data and structure.

Access is bidirectional: you can navigate up and down the tree at any time, and read or modify any part of the data.

"Disadvantages"

Usually the entire XML document needs to be loaded to construct the hierarchy, which consumes a lot of resources.
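
To make the two advantages above concrete, here is a minimal sketch that uses only the JDK's built-in DOM API. The class name `DomNavigationSketch`, its helper methods, and the inline sample document are illustrative assumptions, not part of the original article. It walks down to a `<name>` element, climbs back up to the root, and then changes both data and structure in place.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical helper class, for illustration only
class DomNavigationSketch {

  // Parses a small in-memory document into a DOM tree (the whole tree is built up front)
  static Document parse(String xml) {
    try {
      return DocumentBuilderFactory.newInstance().newDocumentBuilder()
          .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    } catch (Exception e) {
      throw new IllegalStateException("could not parse sample XML", e);
    }
  }

  // Navigates down to the first <name>, back up to the root, then edits the tree
  static String renameFirstUser(Document doc, String newName) {
    Element name = (Element) doc.getElementsByTagName("name").item(0); // down
    Node user = name.getParentNode();                                  // up one level
    Node root = user.getParentNode();                                  // up to <users>

    name.setTextContent(newName);              // change data
    Element age = doc.createElement("age");    // change structure
    age.setTextContent("23");
    user.appendChild(age);

    return root.getNodeName() + "/" + user.getNodeName() + "/" + name.getTextContent();
  }

  public static void main(String[] args) {
    Document doc = parse("<users><user id=\"0\"><name>Alexia</name></user></users>");
    System.out.println(renameFirstUser(doc, "Edward")); // prints users/user/Edward
    System.out.println(doc.getElementsByTagName("age").getLength()); // prints 1
  }
}
```

The cost mentioned under "Disadvantages" is visible in `parse`: nothing can be read or changed until the whole document has been turned into a tree.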

2. SAX (Simple API for XML)

The advantages of SAX processing are very similar to those of streaming media. Analysis can begin immediately, rather than waiting for all the data to be processed. Also, because the application only inspects the data as it is read, it does not need to store the data in memory. This is a huge advantage for large documents. In fact, the application doesn't even have to parse the entire document; it can stop parsing when a condition is met. In general, SAX is also much faster than its alternative, DOM.

DOM or SAX? Choosing between the DOM and SAX parsing models is a very important design decision for developers who need to write their own code to process XML documents. DOM uses a tree structure to access XML documents, while SAX uses an event model.

The DOM parser converts the XML document into a tree containing its content, which can then be traversed. The advantage of the DOM parsing model is that it is easy to program against: the developer only needs to call the tree-building instructions and then use the navigation APIs to reach the required tree nodes. Elements in the tree are easy to add and modify. However, because the DOM parser has to process the entire XML document, its performance and memory demands are high, especially for large XML files. Because of its traversal capabilities, the DOM parser is often used in services where XML documents need to be changed frequently.

The SAX parser adopts an event-based model: it triggers a series of events as it parses the XML document. When a given tag is found, it invokes a callback method to report that the tag has been found. SAX is generally less memory-intensive because it leaves it up to the developer to decide which tags to process. This scalability is especially useful when the developer only needs to work with a portion of the data in the document. However, coding against a SAX parser can be difficult, and it is hard to access several different parts of the same document at the same time.

"Advantages"

Analysis can begin immediately, without waiting for all the data to be processed.

Data is inspected only as it is read and does not need to be kept in memory.

Parsing can stop as soon as a condition is met, without parsing the entire document.

Efficiency and performance are high, and documents larger than system memory can be parsed.

"Disadvantages"

The application is responsible for the tag-handling logic itself (such as maintaining parent/child relationships). The more complex the document, the more complex the program.

Navigation is one-way and there is no way to locate a position in the document hierarchy, so it is hard to access different parts of the same document at the same time. XPath is not supported.
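
The "stop when a condition is met" advantage and the "application tracks the hierarchy itself" disadvantage can both be seen in one short sketch. It uses only the JDK's SAX API; the class `SaxEarlyStopSketch`, the marker exception `StopParsing`, and the handler `FindUser` are hypothetical names introduced for illustration. Throwing an exception from a callback is one common way to abort a SAX parse, not the only one.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, for illustration only
class SaxEarlyStopSketch {

  // Marker exception thrown on purpose to abort parsing
  static class StopParsing extends SAXException {
    StopParsing() { super("target found"); }
  }

  // Finds the <name> of the user with the given id, then stops reading
  static class FindUser extends DefaultHandler {
    private final String wantedId;
    private boolean inWantedUser; // parent/child state the application must track itself
    private boolean inName;
    private final StringBuilder name = new StringBuilder();
    int elementsSeen;

    FindUser(String wantedId) { this.wantedId = wantedId; }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
      elementsSeen++;
      if (qName.equals("user")) {
        inWantedUser = wantedId.equals(atts.getValue("id"));
      } else if (qName.equals("name") && inWantedUser) {
        inName = true;
      }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
      if (inName) name.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
      if (inName && qName.equals("name")) {
        throw new StopParsing(); // condition met: skip the rest of the document
      }
    }

    String result() { return name.toString(); }
  }

  static FindUser find(String xml, String id) {
    FindUser handler = new FindUser(id);
    try {
      SAXParserFactory.newInstance().newSAXParser()
          .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
    } catch (StopParsing expected) {
      // parsing was aborted deliberately
    } catch (Exception e) {
      throw new IllegalStateException(e);
    }
    return handler;
  }

  public static void main(String[] args) {
    String xml = "<users><user id=\"0\"><name>Alexia</name></user>"
        + "<user id=\"1\"><name>Edward</name></user>"
        + "<user id=\"2\"><name>wjm</name></user></users>";
    FindUser h = find(xml, "1");
    System.out.println(h.result() + " after " + h.elementsSeen + " of 7 elements");
  }
}
```

The two boolean flags are the "processing logic" the disadvantage refers to: SAX never tells the handler which `<user>` a `<name>` belongs to, so the handler has to remember it.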

3. JDOM (Java-based Document Object Model)

The purpose of JDOM is to be a Java-specific document model that simplifies interaction with XML and is faster than working through DOM. As the first Java-specific model, JDOM has been heavily promoted. It is being considered for eventual use as a "Java standard extension" through Java Specification Request JSR-102. JDOM has been in development since early 2000.

There are two main differences between JDOM and DOM. First, JDOM uses only concrete classes, not interfaces. This simplifies the API in some ways, but also limits flexibility. Second, the API makes extensive use of the Collections classes, which makes it easier for Java developers who are already familiar with those classes.

The JDOM documentation states that its purpose is "to solve 80% (or more) of the Java/XML problem with 20% (or less) of effort" (the 20% presumably refers to the learning curve). JDOM is certainly useful for most Java/XML applications, and most developers find its API much easier to understand than DOM's. JDOM also includes fairly extensive checks on program behavior to prevent users from doing anything that makes no sense in XML. However, you still need a solid understanding of XML to do anything beyond the basics (or even, in some cases, to understand the errors). That is probably more worthwhile than learning either the DOM or the JDOM interfaces.

JDOM itself does not contain a parser. It typically uses a SAX2 parser to parse and validate the input XML document (although it can also take a previously constructed DOM representation as input). It includes converters that output the JDOM representation as a SAX2 event stream, a DOM model, or an XML text document. JDOM is open source, released under a variant of the Apache license.

"Advantages"

Using concrete classes instead of interfaces makes the API simpler than DOM's.

Extensive use of the Java collection classes makes it convenient for Java developers.

"Disadvantages"

Flexibility is limited.

Performance is poor.

4. DOM4J (Document Object Model for Java)

Although DOM4J is now a completely independent development effort, it started out as an intelligent fork of JDOM. It incorporates many features beyond basic XML document representation, including integrated XPath support, XML Schema support, and event-based processing for large or streamed documents. It also offers options for building the document representation, with parallel access through the DOM4J API and the standard DOM interfaces. It has been under development since the second half of 2000.

To support all of these capabilities, DOM4J uses interfaces and abstract base classes. DOM4J makes extensive use of the Collections classes in its API, but in many cases it also provides alternatives that allow better performance or a more direct coding style. The upshot is that, although DOM4J pays the price of a more complex API, it offers much greater flexibility than JDOM.

While adding the goals of flexibility, XPath integration, and large-document handling, DOM4J shares JDOM's goals: ease of use and intuitive operation for Java developers. It also aims to be a more complete solution than JDOM, with the goal of addressing essentially all Java/XML problems. In pursuing that goal, it places less emphasis than JDOM on preventing incorrect application behavior.

DOM4J is a very good Java XML API, with excellent performance, powerful features, and great ease of use, and it is open source. More and more Java software uses DOM4J to read and write XML. Notably, Sun's JAXM also uses DOM4J.

"Advantages"

Extensive use of the Java collection classes makes it convenient for Java developers, and alternative methods are provided to improve performance.

Supports XPath.

Has very good performance.

"Disadvantages"

Heavy use of interfaces makes the API more complex.
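
To give a feel for what XPath support buys, here is a small sketch of an XPath query over a `<users>` document. Note the substitution: DOM4J's own XPath calls need the DOM4J jar and its XPath dependency on the classpath, so to stay runnable with the JDK alone this sketch uses the standard `javax.xml.xpath` API over a DOM tree instead. The class `XPathSketch` and the method `namesByAge` are hypothetical names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class; uses the JDK XPath API, not DOM4J
class XPathSketch {

  // Returns the names of all users whose <age> equals the given value
  static List<String> namesByAge(String xml, int age) {
    try {
      Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
          .parse(new InputSource(new StringReader(xml)));
      XPath xpath = XPathFactory.newInstance().newXPath();
      // One expression replaces the nested loops needed with plain tree navigation
      NodeList hits = (NodeList) xpath.evaluate(
          "/users/user[age=" + age + "]/name", doc, XPathConstants.NODESET);
      List<String> names = new ArrayList<>();
      for (int i = 0; i < hits.getLength(); i++) {
        names.add(hits.item(i).getTextContent());
      }
      return names;
    } catch (Exception e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    String xml = "<users>"
        + "<user id=\"0\"><name>Alexia</name><age>23</age></user>"
        + "<user id=\"1\"><name>Edward</name><age>24</age></user>"
        + "<user id=\"2\"><name>wjm</name><age>23</age></user>"
        + "</users>";
    System.out.println(namesByAge(xml, 23)); // prints [Alexia, wjm]
  }
}
```

The point of the comparison is the query string itself: selecting by a child's value takes one expression, where SAX would need hand-written state tracking.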

2. Comparison

1. DOM4J has the best performance; even Sun's JAXM uses it. DOM4J is widely used in open source projects; for example, the well-known Hibernate also uses DOM4J to read its XML configuration files. If portability is not a concern, use DOM4J.

2. JDOM and DOM performed poorly in performance tests and ran out of memory when tested with 10 MB documents, but they are portable. Both are still worth considering for small documents. Although the JDOM developers have said they expect to focus on performance before the official release, it is hard to recommend from a performance standpoint. DOM, on the other hand, remains a very good choice. DOM implementations are widely used across many programming languages. DOM is also the basis for many other XML-related standards, and because it is an official W3C recommendation (unlike the non-standard Java-based models), it may be required in some kinds of projects (for example, where the DOM is used from JavaScript).

3. SAX performs well, which comes from its particular parsing style: it is event-driven. A SAX parser inspects the incoming XML stream without loading it into memory (although, of course, parts of the document are temporarily held in memory as the stream is read).

My opinion: if the XML document is large and portability is not a concern, DOM4J is recommended. If the XML document is small, JDOM is recommended. Consider SAX if you need to process data as it arrives without keeping it. In any case, the old saying holds: what suits you is best. If time permits, try all four methods once and then pick the one that fits.
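
The performance claims above come without measurement code. As a rough way to try the comparison yourself, the sketch below times DOM and SAX on the same synthetic document using only the JDK. The class `ParseTimingSketch`, the generated data, and the document size are all assumptions for illustration; JDOM and DOM4J are left out because they need extra jars, and a single cold run like this is only indicative, not a proper benchmark.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical timing harness, for illustration only
class ParseTimingSketch {

  // Builds a synthetic <users> document with n <user> entries (made-up test data)
  static byte[] sample(int n) {
    StringBuilder sb = new StringBuilder("<users>");
    for (int i = 0; i < n; i++) {
      sb.append("<user id=\"").append(i).append("\"><name>u").append(i)
        .append("</name><age>23</age><sex>Female</sex></user>");
    }
    return sb.append("</users>").toString().getBytes(StandardCharsets.UTF_8);
  }

  // DOM: builds the whole tree, then counts <user> elements
  static int countWithDom(byte[] xml) {
    try {
      return DocumentBuilderFactory.newInstance().newDocumentBuilder()
          .parse(new ByteArrayInputStream(xml))
          .getElementsByTagName("user").getLength();
    } catch (Exception e) {
      throw new IllegalStateException(e);
    }
  }

  // SAX: counts <user> start tags as they stream past, keeping nothing else
  static int countWithSax(byte[] xml) {
    final int[] count = {0};
    try {
      SAXParserFactory.newInstance().newSAXParser().parse(
          new ByteArrayInputStream(xml), new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName,
                Attributes atts) {
              if (qName.equals("user")) count[0]++;
            }
          });
    } catch (Exception e) {
      throw new IllegalStateException(e);
    }
    return count[0];
  }

  public static void main(String[] args) {
    byte[] xml = sample(1000); // raise n to see the difference grow
    long t0 = System.nanoTime();
    int dom = countWithDom(xml);
    long t1 = System.nanoTime();
    int sax = countWithSax(xml);
    long t2 = System.nanoTime();
    System.out.println("DOM: " + dom + " users in " + (t1 - t0) / 1_000_000 + " ms");
    System.out.println("SAX: " + sax + " users in " + (t2 - t1) / 1_000_000 + " ms");
  }
}
```

For a fairer comparison, run each parser several times and discard the first runs, since class loading and JIT warm-up dominate small inputs.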

3. Examples

To save space, the four ways of creating XML documents and their differences are not shown here; only the code for parsing XML documents is given. The complete project would cover building XML documents, parsing XML, and a test comparison.

The following XML content is parsed as an example:


<?xml version="1.0" encoding="UTF-8"?>
<users>
  <user id="0">
    <name>Alexia</name>
    <age>23</age>
    <sex>Female</sex>
  </user>
  <user id="1">
    <name>Edward</name>
    <age>24</age>
    <sex>Male</sex>
  </user>
  <user id="2">
    <name>wjm</name>
    <age>23</age>
    <sex>Female</sex>
  </user>
  <user id="3">
    <name>wh</name>
    <age>24</age>
    <sex>Male</sex>
  </user>
</users>
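
The article leaves out the creation code. As a rough sketch of the generating side, a document shaped like the one above can be built with the JDK's DOM API and serialized with a `Transformer`. The class `DomCreateSketch` and its methods are hypothetical names, and the exact whitespace of the indented output depends on the JDK version.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper class, for illustration only
class DomCreateSketch {

  // Appends <tag>text</tag> to the given parent element
  static void addChild(Document doc, Element parent, String tag, String text) {
    Element child = doc.createElement(tag);
    child.setTextContent(text);
    parent.appendChild(child);
  }

  // Builds a <users> document from rows of {name, age, sex} and returns it as text
  static String build(String[][] rows) {
    try {
      Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
      Element users = doc.createElement("users");
      doc.appendChild(users);

      for (int i = 0; i < rows.length; i++) {
        Element user = doc.createElement("user");
        user.setAttribute("id", String.valueOf(i));
        addChild(doc, user, "name", rows[i][0]);
        addChild(doc, user, "age", rows[i][1]);
        addChild(doc, user, "sex", rows[i][2]);
        users.appendChild(user);
      }

      // Serialize the in-memory tree to XML text
      Transformer transformer = TransformerFactory.newInstance().newTransformer();
      transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
      transformer.setOutputProperty(OutputKeys.INDENT, "yes");
      StringWriter out = new StringWriter();
      transformer.transform(new DOMSource(doc), new StreamResult(out));
      return out.toString();
    } catch (Exception e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(build(new String[][] {
        {"Alexia", "23", "Female"},
        {"Edward", "24", "Male"}
    }));
  }
}
```

To write to a file instead of a string, pass a `StreamResult` wrapping a `File` or an output stream.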

First, define the XML document parsing interface:


/**
 * @author Alexia
 *
 * Defines the interface for parsing XML documents
 */
public interface XmlDocument {

  /**
   * Parses an XML document
   *
   * @param fileName
   *       full path of the file
   */
  public void parserXml(String fileName);
}

1. DOM example


package com.xml;

import java.io.File;
import java.io.IOException;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

/**
 * @author Alexia
 * 
 * Parses an XML document with DOM
 */
public class DomDemo implements XmlDocument {

  public void parserXml(String fileName) {
    try {
      DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
      DocumentBuilder db = dbf.newDocumentBuilder();
      // Parse from a File so the argument is treated as a path, not a URI
      Document document = db.parse(new File(fileName));
      // Children of the root <users>: <user> elements plus whitespace text nodes
      NodeList users = document.getDocumentElement().getChildNodes();
      
      for (int i = 0; i < users.getLength(); i++) {
        Node user = users.item(i);
        if (user.getNodeType() != Node.ELEMENT_NODE) {
          continue; // skip whitespace between <user> elements
        }
        NodeList userInfo = user.getChildNodes();
        
        for (int j = 0; j < userInfo.getLength(); j++) {
          Node node = userInfo.item(j);
          // Compare node types instead of comparing names with !=
          if (node.getNodeType() == Node.ELEMENT_NODE) {
            System.out.println(node.getNodeName()
                + ":" + node.getTextContent());
          }
        }
        
        System.out.println();
      }
      
    } catch (ParserConfigurationException e) {
      e.printStackTrace();
    } catch (SAXException e) {
      e.printStackTrace();
    } catch (IOException e) {
      e.printStackTrace();
    }
  }
}

2. SAX example


package com.xml;

import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/**
 * @author Alexia
 * 
 * Parses an XML document with SAX
 */
public class SaxDemo implements XmlDocument {

  public void parserXml(String fileName) {
    SAXParserFactory saxfac = SAXParserFactory.newInstance();

    // try-with-resources closes the stream even if parsing fails
    try (InputStream is = new FileInputStream(fileName)) {
      SAXParser saxparser = saxfac.newSAXParser();
      saxparser.parse(is, new MySAXHandler());
    } catch (ParserConfigurationException e) {
      e.printStackTrace();
    } catch (SAXException e) {
      e.printStackTrace();
    } catch (IOException e) {
      e.printStackTrace();
    }
  }
}

class MySAXHandler extends DefaultHandler {
  // Text collected for the element currently being read
  private final StringBuilder text = new StringBuilder();

  @Override
  public void startElement(String uri, String localName, String qName,
      Attributes attributes) throws SAXException {
    text.setLength(0);
    if (qName.equals("users") || qName.equals("user")) {
      return;
    }
    // Read attributes here; the parser may reuse the Attributes object later
    for (int i = 0; i < attributes.getLength(); i++) {
      System.out.println(attributes.getQName(i) + ":"
          + attributes.getValue(i));
    }
  }

  @Override
  public void endElement(String uri, String localName, String qName)
      throws SAXException {
    if (qName.equals("user")) {
      System.out.println(); // blank line between users
    } else if (!qName.equals("users")) {
      System.out.println(qName + ":" + text.toString().trim());
    }
    text.setLength(0);
  }

  @Override
  public void characters(char[] ch, int start, int length)
      throws SAXException {
    // May be called several times per element, so accumulate
    text.append(ch, start, length);
  }
}

3. JDOM example


package com.xml;

import java.io.File;
import java.io.IOException;
import java.util.List;

import org.jdom2.Document;
import org.jdom2.Element;
import org.jdom2.JDOMException;
import org.jdom2.input.SAXBuilder;

/**
 * @author Alexia
 * 
 * Parses an XML document with JDOM
 * 
 */
public class JDomDemo implements XmlDocument {

  public void parserXml(String fileName) {
    SAXBuilder builder = new SAXBuilder();

    try {
      // Build from a File so the argument is treated as a path, not a URI
      Document document = builder.build(new File(fileName));
      Element users = document.getRootElement();
      List<Element> userList = users.getChildren("user");

      for (Element user : userList) {
        // Print every child element of the current user as name:value
        for (Element info : user.getChildren()) {
          System.out.println(info.getName() + ":" + info.getValue());
        }
        System.out.println();
      }
    } catch (JDOMException e) {
      e.printStackTrace();
    } catch (IOException e) {
      e.printStackTrace();
    }

  }
}

4. DOM4J example


package com.xml;

import java.io.File;
import java.util.Iterator;

import org.dom4j.Document;
import org.dom4j.DocumentException;
import org.dom4j.Element;
import org.dom4j.io.SAXReader;

/**
 * @author Alexia
 * 
 * Parses an XML document with DOM4J
 */
public class Dom4jDemo implements XmlDocument {

  public void parserXml(String fileName) {
    File inputXml = new File(fileName);
    SAXReader saxReader = new SAXReader();

    try {
      Document document = saxReader.read(inputXml);
      Element users = document.getRootElement();
      // Iterate over each <user> element under the root
      for (Iterator<?> i = users.elementIterator(); i.hasNext();) {
        Element user = (Element) i.next();
        // Print every child element of the current user as name:text
        for (Iterator<?> j = user.elementIterator(); j.hasNext();) {
          Element node = (Element) j.next();
          System.out.println(node.getName() + ":" + node.getText());
        }
        System.out.println();
      }
    } catch (DocumentException e) {
      System.out.println(e.getMessage());
    }
  }

}

