Python tutorial: parsing XML with the xml.dom module (minidom)

  • 2020-06-01 10:20:33
  • OfStack

1. What is XML, and what are its features?

Extensible Markup Language (XML) is a meta-language. It is used to mark up data and define data types, and it lets users define their own markup languages.

Example file: del.xml


<?xml version="1.0" encoding="utf-8"?>
<catalog>
 <maxid>4</maxid>
 <login username="pytest" passwd='123456'>
  <caption>Python</caption>
  <item id="4">
   <caption>test</caption>
  </item>
 </login>
 <item id="2">
  <caption>Zope</caption>
 </item>
</catalog>
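The examples below open del.xml from the current working directory. This sketch creates that file from Python. It assumes the current directory is writable.

```python
from pathlib import Path
from xml.dom import minidom

# The sample document shown above, stored verbatim
DEL_XML = """<?xml version="1.0" encoding="utf-8"?>
<catalog>
 <maxid>4</maxid>
 <login username="pytest" passwd='123456'>
  <caption>Python</caption>
  <item id="4">
   <caption>test</caption>
  </item>
 </login>
 <item id="2">
  <caption>Zope</caption>
 </item>
</catalog>
"""

# Write it to ./del.xml so the later examples can parse it by filename
Path("del.xml").write_text(DEL_XML, encoding="utf-8")

# Sanity check: the file parses and its root element is <catalog>
dom = minidom.parse("del.xml")
print(dom.documentElement.tagName)  # catalog
```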

Structurally, XML looks a lot like HTML, but the two are designed for different purposes. HTML (Hypertext Markup Language) is designed to display data, so it focuses on how the data looks. XML is designed to transport and store data, so it focuses on what the data is.

So it has the following characteristics:

• It is made up of tag pairs: <aa></aa>

• Tags can have attributes: <aa id='123'></aa>

• Tag pairs can enclose data: <aa>abc</aa>

• Tags can contain subtags (XML is hierarchical)
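Each of these four characteristics maps onto the DOM tree that xml.dom.minidom builds. The sketch below parses a made-up <aa> fragment with minidom.parseString. The fragment exists only for illustration.

```python
from xml.dom import minidom

# A tiny document with all four features: a tag pair, an attribute,
# text between the tags, and a nested subtag
doc = minidom.parseString("<aa id='123'>abc<bb>child</bb></aa>")
aa = doc.documentElement

print(aa.tagName)             # tag pair:      aa
print(aa.getAttribute("id"))  # attribute:     123
print(aa.firstChild.data)     # embedded data: abc
print(aa.getElementsByTagName("bb")[0].firstChild.data)  # subtag text: child
```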

2. Get the root element's node properties


import xml.dom.minidom

dom = xml.dom.minidom.parse("del.xml")  # parse the XML file into a Document object

root = dom.documentElement               # the document's root element, <catalog>
print("nodeName:", root.nodeName)        # every node has nodeName, nodeValue and nodeType attributes
print("nodeValue:", root.nodeValue)      # nodeValue only holds a value for text-like nodes; it is None for elements
print("nodeType:", root.nodeType)
print("ELEMENT_NODE:", root.ELEMENT_NODE)

nodeType gives the kind of node. catalog is an ELEMENT_NODE.

The node types defined by the DOM, with their numeric values, are:


ELEMENT_NODE = 1
ATTRIBUTE_NODE = 2
TEXT_NODE = 3
CDATA_SECTION_NODE = 4
ENTITY_REFERENCE_NODE = 5
ENTITY_NODE = 6
PROCESSING_INSTRUCTION_NODE = 7
COMMENT_NODE = 8
DOCUMENT_NODE = 9
DOCUMENT_TYPE_NODE = 10
DOCUMENT_FRAGMENT_NODE = 11
NOTATION_NODE = 12
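The sketch below builds a lookup table from these constants on xml.dom.Node and uses it to name each child of a small <catalog> fragment. The fragment is invented for the example. The output shows something that surprises many readers: the indentation between tags is kept as TEXT_NODE children, and comments appear as COMMENT_NODE children.

```python
import xml.dom
from xml.dom import minidom

# Build a {number: name} table from the *_NODE constants on xml.dom.Node
NODE_TYPE_NAMES = {
    getattr(xml.dom.Node, name): name
    for name in dir(xml.dom.Node)
    if name.endswith("_NODE")
}

doc = minidom.parseString("<catalog>\n <maxid>4</maxid>\n <!-- note -->\n</catalog>")
root = doc.documentElement

# Whitespace between tags becomes text nodes alongside the real elements
for child in root.childNodes:
    print(NODE_TYPE_NAMES[child.nodeType], repr(child.nodeName))
```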

Output:


nodeName: catalog
nodeValue: None
nodeType: 1
ELEMENT_NODE: 1
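parse() returns a Document node, not the <catalog> element. documentElement is the root element inside that Document. The comparison below runs without the file because it parses a shortened inline copy of del.xml.

```python
from xml.dom import minidom

dom = minidom.parseString("<catalog><maxid>4</maxid></catalog>")
root = dom.documentElement

# The Document node wraps the whole file
print(dom.nodeName, dom.nodeType == dom.DOCUMENT_NODE)    # #document True
# documentElement is the outermost element, <catalog>
print(root.nodeName, root.nodeType == root.ELEMENT_NODE)  # catalog True
# The root element's parent is the Document itself
print(root.parentNode is dom)                             # True
```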

3. Get sub-elements by tag name


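import xml.dom.minidom

dom = xml.dom.minidom.parse("del.xml")

root = dom.documentElement
bb = root.getElementsByTagName('maxid')  # NodeList of every <maxid> below root
print(type(bb))
print(bb)
b = bb[0]                                # the first (and only) <maxid> element
print(b.nodeName)
print(b.nodeValue)                       # None: an element has no nodeValue of its own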

Output:


<class 'xml.dom.minicompat.NodeList'>
[<DOM Element: maxid at 0x2707a48>]
maxid
None
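b.nodeValue prints None because <maxid> is an element. The text 4 lives in a separate text node that is the element's child. The sketch below reads that text, using an inline copy of the relevant part of del.xml.

```python
from xml.dom import minidom

dom = minidom.parseString("<catalog><maxid>4</maxid></catalog>")
maxid = dom.documentElement.getElementsByTagName("maxid")[0]

print(maxid.nodeValue)               # None: elements carry no value themselves
text_node = maxid.firstChild         # the TEXT_NODE inside <maxid>
print(text_node.nodeValue)           # 4 (the string "4", not an int)
print(int(text_node.nodeValue) + 1)  # 5, after an explicit conversion
```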

4. Get attribute values


import xml.dom.minidom

dom = xml.dom.minidom.parse("del.xml")

root = dom.documentElement
itemlist = root.getElementsByTagName('login')
item = itemlist[0]
print(item.getAttribute("username"))
print(item.getAttribute("passwd"))

itemlist = root.getElementsByTagName("item")  # every <item> below root, in document order
item = itemlist[0]     # the items are told apart by their position in itemlist
print(item.getAttribute("id"))

item2 = itemlist[1]
print(item2.getAttribute("id"))

Output:


pytest
123456
4
2
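Picking items by list position breaks if the document order changes. The sketch below shows two alternatives. The first selects an element by its attribute value. The second lists all of an element's attributes at once through its attributes map. getAttribute() returns an empty string for a missing attribute, so hasAttribute() is how to tell "absent" apart from "empty". The XML is a trimmed inline copy of del.xml.

```python
from xml.dom import minidom

dom = minidom.parseString(
    '<catalog><login username="pytest" passwd="123456">'
    '<item id="4"/></login><item id="2"/></catalog>'
)
root = dom.documentElement

# Select the <item> whose id is "2" instead of relying on index 1
item2 = next(i for i in root.getElementsByTagName("item") if i.getAttribute("id") == "2")
print(item2.getAttribute("id"))          # 2

login = root.getElementsByTagName("login")[0]
# attributes is a NamedNodeMap; items() gives (name, value) pairs
print(sorted(login.attributes.items()))  # [('passwd', '123456'), ('username', 'pytest')]

# A missing attribute reads as "", so check for it explicitly
print(repr(login.getAttribute("email")), login.hasAttribute("email"))  # '' False
```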

5. Get the text between a tag pair


import xml.dom.minidom

dom = xml.dom.minidom.parse("del.xml")

root = dom.documentElement
itemlist = root.getElementsByTagName('caption')  # every <caption>, in document order

item = itemlist[0]
print(item.firstChild.data)   # firstChild is the text node inside <caption>

item2 = itemlist[1]
print(item2.firstChild.data)

Output:


Python
test
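firstChild.data works here because each <caption> holds exactly one text node. Two cases break it. If the text is split, for example by a comment, firstChild holds only the first piece. If the element is empty, firstChild is None. The sketch below uses a helper, get_text (a name chosen for this sketch, not a minidom method), that joins all of an element's direct text children.

```python
from xml.dom import minidom

def get_text(element):
    """Concatenate the element's direct text and CDATA children."""
    return "".join(
        child.data
        for child in element.childNodes
        if child.nodeType in (child.TEXT_NODE, child.CDATA_SECTION_NODE)
    )

dom = minidom.parseString(
    "<catalog><caption>Py<!-- split -->thon</caption><caption/></catalog>"
)
captions = dom.documentElement.getElementsByTagName("caption")
print(get_text(captions[0]))        # Python
print(repr(get_text(captions[1])))  # '' (firstChild would be None here)
```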

6. Example



Task: print the name, email, age and sex values.
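Here is a sketch of one approach. It assumes a hypothetical input where each <person> element has <name>, <email>, <age> and <sex> children. The element names and sample values are assumptions made for illustration. The code uses only the calls from sections 3 and 5.

```python
from xml.dom import minidom

# Hypothetical input: element names and values are assumptions for illustration
SAMPLE = """<people>
 <person><name>Alice</name><email>alice@example.com</email><age>30</age><sex>F</sex></person>
 <person><name>Bob</name><email>bob@example.com</email><age>25</age><sex>M</sex></person>
</people>"""

FIELDS = ("name", "email", "age", "sex")

def read_people(xml_text):
    """Return one dict per <person>, keyed by the child tag names in FIELDS."""
    root = minidom.parseString(xml_text).documentElement
    people = []
    for person in root.getElementsByTagName("person"):
        record = {}
        for field in FIELDS:
            node = person.getElementsByTagName(field)[0]
            # firstChild is the text node; guard against empty elements
            record[field] = node.firstChild.data if node.firstChild else ""
        people.append(record)
    return people

for p in read_people(SAMPLE):
    print(p["name"], p["email"], p["age"], p["sex"])
```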


7. Summary


minidom.parse(filename)
 Load and parse an XML file; returns a Document object

doc.documentElement
 Get the document's root element

node.getAttribute(AttributeName)
 Get the value of an element's attribute ("" if it is missing)

node.getElementsByTagName(TagName)
 Get a NodeList of all descendant elements with the given tag name

node.childNodes
 Get the list of child nodes

node.childNodes[index].nodeValue
 Get a child node's value (its text, for a text node)

node.firstChild
 Get the first child node; equivalent to node.childNodes[0] when the node has children

doc = minidom.parse(filename)
doc.toxml('UTF-8')
 Return the node's XML as text (as bytes in that encoding when an encoding is given)

a = node.attributes["id"]
a.name   # the attribute's name, "id"
a.value  # the attribute's value
 Access an element's attribute as an Attr node
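The sketch below exercises several of the summary calls on a one-element document. In Python 3, toxml() returns a str when called without arguments. It returns bytes when an encoding such as 'UTF-8' is passed.

```python
from xml.dom import minidom

doc = minidom.parseString('<item id="4"><caption>test</caption></item>')
node = doc.documentElement

a = node.attributes["id"]          # an Attr node
print(a.name, a.value)             # id 4

print(node.childNodes[0] is node.firstChild)    # True
print(node.firstChild.childNodes[0].nodeValue)  # test

print(node.toxml())            # str: <item id="4"><caption>test</caption></item>
print(doc.toxml("UTF-8")[:5])  # bytes: b'<?xml'
```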
