What exactly is <Element a at 0x39a9a80> when using XPath in Python?

  • 2020-06-23 00:56:06
  • OfStack

Preface

While learning to write Python crawlers, you will run into one particular problem. The syntax tutorials I read explained things in great detail, and I read them carefully, but the problem still came up. Without further ado, let's take a look at the details.

What is Element

Let's get back to the topic. After reading through all that syntax, we can't wait to write something. Then some of you may have run into a value like this:

<Element a at 0x39a9a80>

or <Element a at 0x??????> with some other address. We take the problem to a search engine, the results are all in English and a complete mess, and readers whose English is weak give up. So here I will analyze it in detail.

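The value that xpath() returns is actually a list, and each item in the list is an Element object. Printing an Element only shows its default representation, not its contents. In a sense each Element works like a small record: it carries a tag name, a dictionary of attributes, and a text value.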
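To see this for yourself without hitting a real website, here is a minimal sketch. The SAMPLE_HTML string and its two magnet links are made up for illustration; only etree.HTML and xpath come from the example in this post.

```python
from lxml import etree

# Hypothetical HTML snippet standing in for a downloaded page.
SAMPLE_HTML = """
<html><body>
  <a class="download" href="magnet:?xt=urn:btih:aaa">Magnetic link</a>
  <a class="download" href="magnet:?xt=urn:btih:bbb">Magnetic link</a>
</body></html>
"""

dom_tree = etree.HTML(SAMPLE_HTML)
links = dom_tree.xpath("//a[@class='download']")

print(type(links))   # xpath() hands back a plain Python list
print(len(links))    # one entry per matching <a> node

first = links[0]
print(first)         # default repr, of the form <Element a at 0x...>
print(type(first))   # the Element class from lxml.etree
```

The hexadecimal number in the printed value is just the object's identity in memory, so it changes from run to run and carries no information about the page.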

Look at this half-finished example, which proves that I am very good at combining learning with fun and solving my own everyday needs. (Funny face)


from lxml import etree
import requests

gjc = 'SHKD-700'
# Build the search URL from the keyword
url = "http://www.btanv.com/search/" + gjc + "-hot-desc-1"
# Fetch the page and decode the response body as UTF-8
html = requests.get(url).content.decode('utf-8')
# Parse the HTML into an element tree
dom_tree = etree.HTML(html)
# Locate the nodes with XPath; xpath() returns a list of Element objects
links = dom_tree.xpath("//a[@class='download']")
for index in range(len(links)):
    # links[index] is one Element from the list; only every second one is printed
    if (index % 2) == 0:
        print(links[index].tag)
        print(links[index].attrib)
        print(links[index].text)

Example analysis

Now let's focus on this part of the code:


  print(links[index])         # the Element itself
  print(type(links[index]))   # the type of the Element
  print(links[index].tag)     # the tag name of <a>, which is a
  print(links[index].attrib)  # the attributes of the <a> tag: href and class
  print(links[index].text)    # the text inside the <a> tag

This prints:


<Element a at 0x3866a58>
<class 'lxml.etree._Element'>
a
{'href': 'magnet:?xt=urn:btih:7502edea0dfe9c2774f95118db3208a108fe10ca', 'class': 'download'}
 Magnetic link 

The HTML code for this node is:


<a href="magnet:?xt=urn:btih:7502edea0dfe9c2774f95118db3208a108fe10ca" class="download"> Magnetic link </a>

Having seen this, you should now be excited to understand what the three properties are for.

Conclusion

An Element is of type lxml.etree._Element, which in some sense also behaves like a list. To get what we need from it, we use three different properties:

  • .tag gets the tag name, which is a string
  • .attrib gets the attributes of the node (here the <a> tag) as a dictionary
  • .text gets the text of the tag, which is a string
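Putting the three properties together, here is a small sketch. The helper name describe_links and the SAMPLE_HTML string are my own inventions for illustration and are not part of lxml.

```python
from lxml import etree

# Hypothetical HTML snippet; the magnet link is a placeholder.
SAMPLE_HTML = (
    '<div><a class="download" href="magnet:?xt=urn:btih:aaa">'
    ' Magnetic link </a></div>'
)

def describe_links(html_text):
    """Return (tag, attributes, text) for every <a class="download"> node."""
    tree = etree.HTML(html_text)
    results = []
    for node in tree.xpath("//a[@class='download']"):
        # .text is None when a tag has no text, so fall back to "".
        text = (node.text or "").strip()
        # dict() copies the attributes into an ordinary dictionary.
        results.append((node.tag, dict(node.attrib), text))
    return results

rows = describe_links(SAMPLE_HTML)
for tag, attrs, text in rows:
    # attrs.get returns None instead of raising if href is missing.
    print(tag, attrs.get("href"), text)
```

Copying .attrib into a plain dict is optional; it just makes the result easier to store or compare later.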

Feel free to bookmark and like this post, but please do not repost it. I am self-taught and still exploring, and this is only my current understanding of things. Some of it is bound to be inaccurate, and I do not want to mislead others.

