Tutorial: parsing XML in Python with the xml.dom module
- 2020-06-01 10:20:33
- OfStack
1. What is XML? What are its features?
Extensible Markup Language (XML) is used to mark up data and define data types. It is a meta-language that lets users define their own markup languages.
Example: del.xml
<?xml version="1.0" encoding="utf-8"?>
<catalog>
    <maxid>4</maxid>
    <login username="pytest" passwd='123456'>
        <caption>Python</caption>
        <item id="4">
            <caption>test</caption>
        </item>
    </login>
    <item id="2">
        <caption>Zope</caption>
    </item>
</catalog>
Structurally, XML looks a lot like HTML, but the two are designed for different purposes. HTML (Hypertext Markup Language) is designed to display data and focuses on how the data looks. XML is designed to transport and store data and focuses on what the data is.
So XML has the following characteristics:
• It is made up of tag pairs:
<aa></aa>
• Tags can have attributes:
<aa id='123'></aa>
• Tag pairs can enclose data:
<aa>abc</aa>
• Tags can contain subtags, forming a hierarchy.
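The four characteristics above can be observed directly with xml.dom.minidom. The following is a minimal sketch, assuming a small inline document (the SAMPLE string is made up for illustration and is not part of del.xml):

```python
import xml.dom.minidom

# A tiny document exercising all four characteristics listed above.
SAMPLE = "<aa id='123'>abc<bb></bb></aa>"

dom = xml.dom.minidom.parseString(SAMPLE)
aa = dom.documentElement

tag_name = aa.tagName                 # the tag pair <aa>...</aa>
attr_value = aa.getAttribute("id")    # the attribute id='123'
text = aa.firstChild.data             # the enclosed data "abc"
subtags = [n.tagName for n in aa.childNodes
           if n.nodeType == n.ELEMENT_NODE]  # the nested <bb> subtag

print(tag_name, attr_value, text, subtags)
```

parseString is used here instead of parse so that the example does not depend on a file on disk.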
2. Get node properties
# coding: utf-8
import xml.dom.minidom
dom = xml.dom.minidom.parse("del.xml")  # open and parse the XML document
root = dom.documentElement  # get the root element of the document
print("nodeName:", root.nodeName)  # every node has nodeName, nodeValue, and nodeType attributes
print("nodeValue:", root.nodeValue)  # nodeValue is the node's value; it is None for element nodes
print("nodeType:", root.nodeType)
print("ELEMENT_NODE:", root.ELEMENT_NODE)
nodeType is the type of the node; catalog is of type ELEMENT_NODE.
The following node type constants are defined:
'ATTRIBUTE_NODE'
'CDATA_SECTION_NODE'
'COMMENT_NODE'
'DOCUMENT_FRAGMENT_NODE'
'DOCUMENT_NODE'
'DOCUMENT_TYPE_NODE'
'ELEMENT_NODE'
'ENTITY_NODE'
'ENTITY_REFERENCE_NODE'
'NOTATION_NODE'
'PROCESSING_INSTRUCTION_NODE'
'TEXT_NODE'
Result:
nodeName: catalog
nodeValue: None
nodeType: 1
ELEMENT_NODE: 1
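To see several node types at once, one can walk the children of an element and map each numeric nodeType back to its constant name. This is only a sketch: the SAMPLE document and the NODE_TYPE_NAMES lookup table are assumptions introduced for illustration, and the table covers just the three types that occur here.

```python
import xml.dom.minidom

# Illustrative input (not del.xml): a comment and an element inside the root.
SAMPLE = """<catalog>
<!-- a comment -->
<maxid>4</maxid>
</catalog>"""

# Map the numeric nodeType back to its constant name (only the types used here).
NODE_TYPE_NAMES = {
    xml.dom.Node.ELEMENT_NODE: "ELEMENT_NODE",
    xml.dom.Node.TEXT_NODE: "TEXT_NODE",
    xml.dom.Node.COMMENT_NODE: "COMMENT_NODE",
}

root = xml.dom.minidom.parseString(SAMPLE).documentElement
child_types = [NODE_TYPE_NAMES[child.nodeType] for child in root.childNodes]
print(child_types)
```

Note that the line breaks between tags show up as TEXT_NODE children, so childNodes usually contains more entries than the visible elements.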
3. Get subtags
# coding: utf-8
import xml.dom.minidom
dom = xml.dom.minidom.parse("del.xml")
root = dom.documentElement
bb = root.getElementsByTagName('maxid')  # returns a NodeList of matching elements
print(type(bb))
print(bb)
b = bb[0]
print(b.nodeName)
print(b.nodeValue)  # None, because b is an element node
Result:
<class 'xml.dom.minicompat.NodeList'>
[<DOM Element: maxid at 0x2707a48>]
maxid
None
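getElementsByTagName searches all descendants, not just direct children. When only direct children are wanted, childNodes can be filtered by hand. A minimal sketch, assuming a trimmed-down inline copy of the catalog document:

```python
import xml.dom.minidom

# Trimmed-down variant of del.xml, inlined so the example is self-contained.
SAMPLE = """<catalog>
<login username="pytest" passwd="123456">
<item id="4"><caption>test</caption></item>
</login>
<item id="2"><caption>Zope</caption></item>
</catalog>"""

root = xml.dom.minidom.parseString(SAMPLE).documentElement

# All <item> descendants, in document order (nested ones included).
all_items = root.getElementsByTagName("item")

# Only <item> elements that are direct children of the root.
direct_items = [n for n in root.childNodes
                if n.nodeType == n.ELEMENT_NODE and n.tagName == "item"]

print(len(all_items), len(direct_items))
```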
4. Get tag attribute values
# coding: utf-8
import xml.dom.minidom
dom = xml.dom.minidom.parse("del.xml")
root = dom.documentElement
itemlist = root.getElementsByTagName('login')
item = itemlist[0]
print(item.getAttribute("username"))
print(item.getAttribute("passwd"))
itemlist = root.getElementsByTagName("item")
item = itemlist[0]  # elements are distinguished by their position in itemlist
print(item.getAttribute("id"))
item2 = itemlist[1]  # the second <item> in document order
print(item2.getAttribute("id"))
Result:
pytest
123456
4
2
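One detail worth knowing is that getAttribute returns an empty string when the attribute does not exist, so it cannot distinguish a missing attribute from an empty one. The sketch below, assuming an inline login element and a made-up attribute name "email", shows hasAttribute and the attributes map as alternatives:

```python
import xml.dom.minidom

SAMPLE = '<login username="pytest" passwd="123456"></login>'
login = xml.dom.minidom.parseString(SAMPLE).documentElement

# getAttribute returns an empty string for an attribute that is not present,
# so hasAttribute is the way to tell "missing" from "empty".
# ("email" is a hypothetical attribute name that the element does not carry.)
missing = login.getAttribute("email")
has_email = login.hasAttribute("email")

# attributes is a NamedNodeMap; items() yields (name, value) pairs.
all_attrs = dict(login.attributes.items())

print(repr(missing), has_email, all_attrs)
```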
5. Get the data between a tag pair
# coding: utf-8
import xml.dom.minidom
dom = xml.dom.minidom.parse("del.xml")
root = dom.documentElement
itemlist = root.getElementsByTagName('caption')
item = itemlist[0]
print(item.firstChild.data)  # firstChild is the text node inside <caption>
item2 = itemlist[1]
print(item2.firstChild.data)
Result:
Python
test
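firstChild.data works when the element contains exactly one text node, but firstChild is None for an empty element. A slightly more defensive helper might look like the sketch below; get_text is a hypothetical name introduced here, not part of minidom.

```python
import xml.dom.minidom

def get_text(element):
    """Concatenate the data of all direct text/CDATA children of element."""
    parts = [n.data for n in element.childNodes
             if n.nodeType in (n.TEXT_NODE, n.CDATA_SECTION_NODE)]
    return "".join(parts)

# Illustrative input: the second <caption> is empty on purpose.
SAMPLE = "<catalog><caption>Python</caption><caption></caption></catalog>"
root = xml.dom.minidom.parseString(SAMPLE).documentElement
captions = [get_text(c) for c in root.getElementsByTagName("caption")]
print(captions)
```

For the empty caption the helper returns an empty string, whereas firstChild.data would raise an AttributeError.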
6. Example
Exercise: output the name, email, age, and sex.
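For the exercise in section 6 (output name, email, age, sex), one possible approach is sketched below. The input document is hypothetical: the users and user element names, the sample values, and the field_text helper are all assumptions made for illustration.

```python
import xml.dom.minidom

# Hypothetical input: the element names and values are assumptions.
USERS_XML = """<users>
<user><name>Alice</name><email>alice@example.com</email><age>30</age><sex>F</sex></user>
<user><name>Bob</name><email>bob@example.com</email><age>25</age><sex>M</sex></user>
</users>"""

FIELDS = ("name", "email", "age", "sex")

def field_text(user, tag):
    """Return the text of the first <tag> descendant of user, or '' if absent or empty."""
    nodes = user.getElementsByTagName(tag)
    if not nodes or nodes[0].firstChild is None:
        return ""
    return nodes[0].firstChild.data

root = xml.dom.minidom.parseString(USERS_XML).documentElement
records = [{f: field_text(u, f) for f in FIELDS}
           for u in root.getElementsByTagName("user")]

for rec in records:
    print(rec["name"], rec["email"], rec["age"], rec["sex"])
```

Collecting the fields into dictionaries first keeps the parsing separate from the printing, which makes the result easier to reuse.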
7. Summary
minidom.parse(filename)
Loads and parses an XML file, returning a document object.
doc.documentElement
Gets the root element of the document.
node.getAttribute(AttributeName)
Gets the value of an attribute of an element node.
node.getElementsByTagName(TagName)
Gets the collection (NodeList) of descendant elements with the given tag name.
node.childNodes
Returns the list of child nodes.
node.childNodes[index].nodeValue
Gets the value of a child node (meaningful for text nodes; None for element nodes).
node.firstChild
Accesses the first child node. Equivalent to node.childNodes[0] when the node has children; None otherwise.
doc = minidom.parse(filename)
doc.toxml('UTF-8')
Returns the XML text representation of the node.
a = node.attributes["id"]
a.name   # the attribute name, "id" above
a.value  # the value of the attribute
Accesses an element's attributes.
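The attribute access and serialization calls from the summary can be tried together. A minimal sketch under Python 3, assuming a small inline document:

```python
import xml.dom.minidom

SAMPLE = '<catalog><item id="4"><caption>test</caption></item></catalog>'
doc = xml.dom.minidom.parseString(SAMPLE)
item = doc.getElementsByTagName("item")[0]

# attributes["id"] returns an Attr node with name and value fields.
attr = item.attributes["id"]
name_and_value = (attr.name, attr.value)

# toxml() on a node returns a str; with an encoding argument it returns bytes.
item_xml = item.toxml()
doc_bytes = doc.toxml("UTF-8")

print(name_and_value, item_xml, type(doc_bytes).__name__)
```

In Python 3, passing an encoding to toxml yields a bytes object that begins with the XML declaration, so it can be written to a file opened in binary mode.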