A brief discussion of the problems encountered using the Rapidxml library and the analysis process

  • 2020-05-19 05:18:14
  • OfStack

There are many open-source C++ libraries for parsing XML, so I won't list them here. Today I'll mainly talk about Rapidxml, a library I don't use very often.

Links:

Official site: http://rapidxml.sourceforge.net/

Official manual: http://rapidxml.sourceforge.net/manual.html

I used this library once before and fell into a pitfall. I was short on time then and didn't track it down. Today I'm using the library again, and I don't want to fall into the same pitfall twice, so I decided to find out what was going on.

Here are two examples.

Create XML:


#include <fstream>
#include <iterator>
#include <string>
#include "rapidxml.hpp"
#include "rapidxml_print.hpp"// rapidxml::print

void CreateXml()
{
  rapidxml::xml_document<> doc;
  
  auto nodeDecl = doc.allocate_node(rapidxml::node_declaration);
  nodeDecl->append_attribute(doc.allocate_attribute("version", "1.0"));
  nodeDecl->append_attribute(doc.allocate_attribute("encoding", "UTF-8"));
  doc.append_node(nodeDecl);// add the XML declaration
  
  auto nodeRoot = doc.allocate_node(rapidxml::node_element, "Root");// create the Root node
  nodeRoot->append_node(doc.allocate_node(rapidxml::node_comment, NULL, " A programming language "));// add a comment to Root; a comment has no name, so the second argument is NULL
  auto nodeLanguage = doc.allocate_node(rapidxml::node_element, "language", "This is C language");// create a language node
  nodeLanguage->append_attribute(doc.allocate_attribute("name", "C"));// give it a name attribute
  nodeRoot->append_node(nodeLanguage);// append the language node to Root
  nodeLanguage = doc.allocate_node(rapidxml::node_element, "language", "This is C++ language");// create a second language node
  nodeLanguage->append_attribute(doc.allocate_attribute("name", "C++"));// give it a name attribute
  nodeRoot->append_node(nodeLanguage);// append it to Root

  doc.append_node(nodeRoot);// add the Root node to the document
  std::string buffer;
  rapidxml::print(std::back_inserter(buffer), doc, 0);// serialize the whole document (flags = 0)
  std::ofstream outFile("language.xml");
  outFile << buffer;
  outFile.close();
}

Results:


 <?xml version="1.0" encoding="UTF-8"?>
 <Root>
   <!-- A programming language -->
   <language name="C">This is C language</language>
   <language name="C++">This is C++ language</language>
 </Root>

Modify XML:


#include <fstream>
#include <iterator>
#include <string>
#include "rapidxml.hpp"
#include "rapidxml_print.hpp"// rapidxml::print
#include "rapidxml_utils.hpp"// rapidxml::file

void ModifyXml()
{
  rapidxml::file<> requestFile("language.xml");// load the XML file into memory
  rapidxml::xml_document<> doc;
  doc.parse<0>(requestFile.data());// parse the XML (flags = 0)

  auto nodeRoot = doc.first_node();// the first node, which is Root
  auto nodeLanguage = nodeRoot->first_node("language");// the first language node under Root
  nodeLanguage->first_attribute("name")->value("Motify C");// set its name attribute to "Motify C"
  std::string buffer;
  rapidxml::print(std::back_inserter(buffer), doc, 0);
  std::ofstream outFile("ModifiedLanguage.xml");
  outFile << buffer;
  outFile.close();
}

Results:


 <Root>
   <language name="Motify C">This is C language</language>
   <language name="C++">This is C++ language</language>
 </Root>

Looking at the second result:

The first language element's name attribute did change to the desired value, but the XML declaration and the comment have disappeared. What happened? This puzzled me for a while. Since it is an open-source library, let's look at what it actually does. The code has basically two suspicious places, print and parse: both take a flags argument, and the official tutorial passes 0 to both. Since the output is ultimately produced by print, let's start by tracing print in the debugger.

First, the definition of print:


template<class OutIt, class Ch> 
   inline OutIt print(OutIt out, const xml_node<Ch> &node, int flags = 0)
   {
     return internal::print_node(out, &node, flags, 0);
   }

Stepping into internal::print_node:


// Print node
    template<class OutIt, class Ch>
    inline OutIt print_node(OutIt out, const xml_node<Ch> *node, int flags, int indent)
    {
      // Print proper node type
      switch (node->type())
      {

      // Document
      case node_document:
        out = print_children(out, node, flags, indent);
        break;

      // Element
      case node_element:
        out = print_element_node(out, node, flags, indent);
        break;
      
      // Data
      case node_data:
        out = print_data_node(out, node, flags, indent);
        break;
      
      // CDATA
      case node_cdata:
        out = print_cdata_node(out, node, flags, indent);
        break;

      // Declaration
      case node_declaration:
        out = print_declaration_node(out, node, flags, indent);
        break;

      // Comment
      case node_comment:
        out = print_comment_node(out, node, flags, indent);
        break;
      
      // Doctype
      case node_doctype:
        out = print_doctype_node(out, node, flags, indent);
        break;

      // Pi
      case node_pi:
        out = print_pi_node(out, node, flags, indent);
        break;

        // Unknown
      default:
        assert(0);
        break;
      }
      
      // If indenting not disabled, add line break after node
      if (!(flags & print_no_indenting))
        *out = Ch('\n'), ++out;

      // Return modified iterator
      return out;
    }
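To make the dispatch-and-recurse structure of print_node concrete, here is a minimal, self-contained model of it. The Node, Kind and print_node names below are hypothetical stand-ins, not rapidxml's types, and only documents, elements and comments are modelled. It shows two things: print can only emit nodes that actually exist in the tree, and a flag with the value of print_no_indenting (0x1) only controls tabs and line breaks.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model of rapidxml's print recursion; not the library's types.
enum class Kind { Document, Element, Comment };

constexpr int kPrintNoIndenting = 0x1;  // same value as rapidxml::print_no_indenting

struct Node {
    Kind kind;
    std::string text;            // element name or comment text
    std::vector<Node> children;  // children of a document or element
};

// Same shape as internal::print_node: dispatch on the node kind, recurse
// into children, and end the line unless indenting is disabled.
void print_node(std::string& out, const Node& node, int flags, int indent) {
    assert(indent >= 0);
    const bool indenting = !(flags & kPrintNoIndenting);
    const std::size_t tabs = static_cast<std::size_t>(indent);
    switch (node.kind) {
    case Kind::Document:
        for (const Node& child : node.children)
            print_node(out, child, flags, indent);
        return;  // simplification: no extra line break after the document
    case Kind::Comment:
        if (indenting) out.append(tabs, '\t');
        out += "<!--" + node.text + "-->";
        break;
    case Kind::Element:
        if (indenting) out.append(tabs, '\t');
        out += "<" + node.text + ">";
        if (indenting && !node.children.empty()) out += '\n';
        for (const Node& child : node.children)
            print_node(out, child, flags, indent + 1);
        if (indenting && !node.children.empty()) out.append(tabs, '\t');
        out += "</" + node.text + ">";
        break;
    }
    if (indenting) out += '\n';
}
```

If the comment node is missing from the tree (which is what happens when the parser never creates it), no print flag can bring it back: the printer only walks what is there.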

Stepping into print_children shows that it simply walks the node's children and calls print_node on each one, so printing is recursive. Next, print_element_node:


// Print element node
template<class OutIt, class Ch>
inline OutIt print_element_node(OutIt out, const xml_node<Ch> *node, int flags, int indent)
{
  assert(node->type() == node_element);

  // Print element name and attributes, if any
  if (!(flags & print_no_indenting))
  ...// Omitted code 
  
  return out;
}

The flags are checked with a bitwise & in the if (!(flags & print_no_indenting)) test, so let's look at the definition of print_no_indenting:


// Printing flags
const int print_no_indenting = 0x1;  //!< Printer flag instructing the printer to suppress indenting of XML. See print() function.

So the print flags are bit masks. Given the library's consistent style, parse presumably defines its flags the same way.

I'll skip walking through parse here.

The official documentation confirms this guess. Below are the flag descriptions from the header file; see the official documentation for details.


// Parsing flags

  //! Parse flag instructing the parser to not create data nodes. 
  //! Text of first data node will still be placed in value of parent element, unless rapidxml::parse_no_element_values flag is also specified.
  //! Can be combined with other flags by use of | operator.
  //! <br><br>
  //! See xml_document::parse() function.
  const int parse_no_data_nodes = 0x1;      

  //! Parse flag instructing the parser to not use text of first data node as a value of parent element.
  //! Can be combined with other flags by use of | operator.
  //! Note that child data nodes of element node take precendence over its value when printing. 
  //! That is, if element has one or more child data nodes <em>and</em> a value, the value will be ignored.
  //! Use rapidxml::parse_no_data_nodes flag to prevent creation of data nodes if you want to manipulate data using values of elements.
  //! <br><br>
  //! See xml_document::parse() function.
  const int parse_no_element_values = 0x2;
  
  //! Parse flag instructing the parser to not place zero terminators after strings in the source text.
  //! By default zero terminators are placed, modifying source text.
  //! Can be combined with other flags by use of | operator.
  //! <br><br>
  //! See xml_document::parse() function.
  const int parse_no_string_terminators = 0x4;
  
  //! Parse flag instructing the parser to not translate entities in the source text.
  //! By default entities are translated, modifying source text.
  //! Can be combined with other flags by use of | operator.
  //! <br><br>
  //! See xml_document::parse() function.
  const int parse_no_entity_translation = 0x8;
  
  //! Parse flag instructing the parser to disable UTF-8 handling and assume plain 8 bit characters.
  //! By default, UTF-8 handling is enabled.
  //! Can be combined with other flags by use of | operator.
  //! <br><br>
  //! See xml_document::parse() function.
  const int parse_no_utf8 = 0x10;
  
  //! Parse flag instructing the parser to create XML declaration node.
  //! By default, declaration node is not created.
  //! Can be combined with other flags by use of | operator.
  //! <br><br>
  //! See xml_document::parse() function.
  const int parse_declaration_node = 0x20;
  
  //! Parse flag instructing the parser to create comments nodes.
  //! By default, comment nodes are not created.
  //! Can be combined with other flags by use of | operator.
  //! <br><br>
  //! See xml_document::parse() function.
  const int parse_comment_nodes = 0x40;
  
  //! Parse flag instructing the parser to create DOCTYPE node.
  //! By default, doctype node is not created.
  //! Although W3C specification allows at most one DOCTYPE node, RapidXml will silently accept documents with more than one.
  //! Can be combined with other flags by use of | operator.
  //! <br><br>
  //! See xml_document::parse() function.
  const int parse_doctype_node = 0x80;
  
  //! Parse flag instructing the parser to create PI nodes.
  //! By default, PI nodes are not created.
  //! Can be combined with other flags by use of | operator.
  //! <br><br>
  //! See xml_document::parse() function.
  const int parse_pi_nodes = 0x100;
  
  //! Parse flag instructing the parser to validate closing tag names. 
  //! If not set, name inside closing tag is irrelevant to the parser.
  //! By default, closing tags are not validated.
  //! Can be combined with other flags by use of | operator.
  //! <br><br>
  //! See xml_document::parse() function.
  const int parse_validate_closing_tags = 0x200;
  
  //! Parse flag instructing the parser to trim all leading and trailing whitespace of data nodes.
  //! By default, whitespace is not trimmed. 
  //! This flag does not cause the parser to modify source text.
  //! Can be combined with other flags by use of | operator.
  //! <br><br>
  //! See xml_document::parse() function.
  const int parse_trim_whitespace = 0x400;

  //! Parse flag instructing the parser to condense all whitespace runs of data nodes to a single space character.
  //! Trimming of leading and trailing whitespace of data is controlled by rapidxml::parse_trim_whitespace flag.
  //! By default, whitespace is not normalized. 
  //! If this flag is specified, source text will be modified.
  //! Can be combined with other flags by use of | operator.
  //! <br><br>
  //! See xml_document::parse() function.
  const int parse_normalize_whitespace = 0x800;

  // Compound flags
  
  //! Parse flags which represent default behaviour of the parser. 
  //! This is always equal to 0, so that all other flags can be simply ored together.
  //! Normally there is no need to inconveniently disable flags by anding with their negated (~) values.
  //! This also means that meaning of each flag is a <i>negation</i> of the default setting. 
  //! For example, if flag name is rapidxml::parse_no_utf8, it means that utf-8 is <i>enabled</i> by default,
  //! and using the flag will disable it.
  //! <br><br>
  //! See xml_document::parse() function.
  const int parse_default = 0;
  
  //! A combination of parse flags that forbids any modifications of the source text. 
  //! This also results in faster parsing. However, note that the following will occur:
  //! <ul>
  //! <li>names and values of nodes will not be zero terminated, you have to use xml_base::name_size() and xml_base::value_size() functions to determine where name and value ends</li>
  //! <li>entities will not be translated</li>
  //! <li>whitespace will not be normalized</li>
  //! </ul>
  //! See xml_document::parse() function.
  const int parse_non_destructive = parse_no_string_terminators | parse_no_entity_translation;
  
  //! A combination of parse flags resulting in fastest possible parsing, without sacrificing important data.
  //! <br><br>
  //! See xml_document::parse() function.
  const int parse_fastest = parse_non_destructive | parse_no_data_nodes;
  
  //! A combination of parse flags resulting in largest amount of data being extracted. 
  //! This usually results in slowest parsing.
  //! <br><br>
  //! See xml_document::parse() function.
  const int parse_full = parse_declaration_node | parse_comment_nodes | parse_doctype_node | parse_pi_nodes | parse_validate_closing_tags;
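Because every flag is a single bit and parse_default is 0, parse<0>(...) leaves parse_declaration_node and parse_comment_nodes unset, so those nodes are never created. The compile-only sketch below spells out that arithmetic. The constant values are copied from the header excerpt above and redeclared so it builds without the library; the helper optional_nodes_kept and the name keep_decl_and_comments are hypothetical.

```cpp
#include <cassert>
#include <string>

// Values copied from the rapidxml.hpp excerpt above, redeclared so this
// sketch compiles without the library.
constexpr int parse_no_string_terminators = 0x4;
constexpr int parse_no_entity_translation = 0x8;
constexpr int parse_declaration_node = 0x20;
constexpr int parse_comment_nodes = 0x40;
constexpr int parse_doctype_node = 0x80;
constexpr int parse_pi_nodes = 0x100;
constexpr int parse_validate_closing_tags = 0x200;

constexpr int parse_default = 0;
constexpr int parse_non_destructive = parse_no_string_terminators | parse_no_entity_translation;
constexpr int parse_full = parse_declaration_node | parse_comment_nodes | parse_doctype_node |
                           parse_pi_nodes | parse_validate_closing_tags;

// Hypothetical helper: which optional node kinds a flag set asks parse() to create.
// Each flag is one bit, so membership is tested with &.
std::string optional_nodes_kept(int flags) {
    std::string kept;
    if (flags & parse_declaration_node) kept += "declaration ";
    if (flags & parse_comment_nodes) kept += "comment ";
    if (flags & parse_doctype_node) kept += "doctype ";
    if (flags & parse_pi_nodes) kept += "pi ";
    return kept;
}

// Flags are combined with |: declaration + comments + non-destructive.
constexpr int keep_decl_and_comments =
    parse_declaration_node | parse_comment_nodes | parse_non_destructive;

static_assert(parse_non_destructive == 0xC, "0x4 | 0x8");
static_assert(parse_full == 0x3E0, "0x20 | 0x40 | 0x80 | 0x100 | 0x200");
static_assert(keep_decl_and_comments == 0x6C, "0x20 | 0x40 | 0x4 | 0x8");
```

optional_nodes_kept(parse_default) returns an empty string, which is the whole story behind the missing declaration and comment.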

Based on the information above, change these two lines of the modify example:

 doc.parse<0>(requestFile.data());// parse the XML
 auto nodeRoot = doc.first_node();// the first node, which is Root

to:

 doc.parse<rapidxml::parse_declaration_node | rapidxml::parse_comment_nodes | rapidxml::parse_non_destructive>(requestFile.data());// keep the declaration and comments, leave the input buffer unmodified
 auto nodeRoot = doc.first_node("Root");// find the Root node by name

A word of explanation. First, parse now receives three flags: parse_declaration_node tells the parser to create the declaration node, parse_comment_nodes tells it to create comment nodes, and parse_non_destructive tells it not to modify the buffer that is passed in. Second, once the document has a declaration node, that node is the document's first child, so first_node() no longer returns the Root node we expect; passing the node name lets first_node find the node we need.
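The first_node() behaviour explained above can be modelled without the library. Child, parse_top_level and first_node below are hypothetical simplifications (the real xml_node::first_node also takes optional name_size and case_sensitive parameters), and the declaration node is given an empty name here, which as I understand it matches what rapidxml does for declarations it creates.

```cpp
#include <cassert>
#include <cstring>
#include <vector>

// Hypothetical, simplified top-level node list; not rapidxml's classes.
enum class Kind { Declaration, Element };

struct Child {
    Kind kind;
    const char* name;
};

constexpr int kParseDeclarationNode = 0x20;  // same value as rapidxml::parse_declaration_node

// Models what parse() keeps at the top level of language.xml: the declaration
// node (empty name) exists only if the flag asks for it.
std::vector<Child> parse_top_level(int flags) {
    std::vector<Child> top;
    if (flags & kParseDeclarationNode) top.push_back({Kind::Declaration, ""});
    top.push_back({Kind::Element, "Root"});
    return top;
}

// first_node() returns the first child; first_node(name) the first child with that name.
const Child* first_node(const std::vector<Child>& top, const char* name = nullptr) {
    for (const Child& child : top)
        if (name == nullptr || std::strcmp(child.name, name) == 0) return &child;
    return nullptr;
}
```

With flags 0 the unnamed lookup happens to land on Root; as soon as the declaration flag is set, only the named lookup still finds it.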

Notes:

1. The library does not check whether an item (node, attribute, etc.) already exists when you append it.

2. Modifying items (nodes, attributes, etc.) while looping over them can break the iteration.
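Note 2 is easiest to see with a linked sibling list. The sketch below is a hypothetical, self-contained model, not rapidxml's classes; I believe rapidxml behaves the same way here (remove_node() detaches the node, and next_sibling() asserts that the node still has a parent). The safe pattern is to read the next sibling before removing the current node. The function remove_children_named is hypothetical.

```cpp
#include <cassert>
#include <string>

// Hypothetical node with doubly linked siblings, loosely modelled on rapidxml.
struct Node {
    std::string name;
    Node* parent = nullptr;
    Node* prev = nullptr;
    Node* next = nullptr;
    Node* first = nullptr;  // first child
    Node* last = nullptr;   // last child

    // Walking the siblings of a detached node is a bug, so assert on it.
    Node* next_sibling() const {
        assert(parent != nullptr && "detached node has no siblings");
        return next;
    }

    void append_node(Node* child) {
        child->parent = this;
        child->prev = last;
        child->next = nullptr;
        if (last) last->next = child; else first = child;
        last = child;
    }

    void remove_node(Node* child) {
        assert(child->parent == this);
        if (child->prev) child->prev->next = child->next; else first = child->next;
        if (child->next) child->next->prev = child->prev; else last = child->prev;
        child->parent = nullptr;  // detached: next_sibling() on it now trips the assert
    }
};

// Safe removal during traversal: fetch the next sibling *before* removing.
// Calling node->next_sibling() after remove_node(node) would trip the assert.
void remove_children_named(Node& parent, const std::string& name) {
    Node* node = parent.first;
    while (node != nullptr) {
        Node* next = node->next_sibling();
        if (node->name == name) parent.remove_node(node);
        node = next;
    }
}
```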

Conclusion: libraries written by others always hold a few surprises. These are the only problems I have run into so far; if you find others, feel free to add them.

Thanks to the rapidxml authors for providing us with such an efficient and convenient tool.

