Modifying content with the Beautiful Soup module in Python: method examples
- 2020-05-27 06:09:18
- OfStack
Preface
In addition to searching and navigating, the Beautiful Soup module can also modify the content of HTML/XML documents: it can add or remove tags, change tag names, change tag attribute values, modify text content, and so on. This article walks through the methods Beautiful Soup provides for modifying content in Python, each with an example.
Modify the tag
The sample HTML document used is as follows:
html_markup="""
<div class="ecopyramid">
<ul id="producers">
<li class="producerlist">
<div class="name">plants</div>
<div class="number">100000</div>
</li>
<li class="producerlist">
<div class="name">algae</div>
<div class="number">100000</div>
</li>
</ul>
</div>
"""
Modify the tag name
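from bs4 import BeautifulSoup

soup = BeautifulSoup(html_markup, 'lxml')
producer_entries = soup.ul
# Print the current tag name ("ul")
print(producer_entries.name)
# Rename the tag from ul to div
producer_entries.name = "div"
print(producer_entries.prettify())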
Modify the tag attribute value
# Modify tag attributes
# Update an existing attribute value
producer_entries['id'] = "producers_new_value"
print(producer_entries.prettify())
# Add a new attribute to the tag
producer_entries['class'] = "newclass"
print(producer_entries.prettify())
# Delete an attribute from the tag
del producer_entries['class']
print(producer_entries.prettify())
Add a new tag
We can use the new_tag() method to create a new tag and then add it to the HTML tree with append(), insert(), insert_after(), or insert_before().
For example, to add an li tag to the ul tag of the HTML document above, we first create the new li tag, then insert it into the HTML tree, and finally insert the corresponding div tags into the li tag.
# Add a new tag
# Re-parse the document first, because the ul tag was renamed to div earlier
soup = BeautifulSoup(html_markup, 'lxml')
producer_entries = soup.ul
# new_tag() creates a new tag object
new_li_tag = soup.new_tag("li")
# Attributes can be passed to new_tag() as keyword arguments
new_atag = soup.new_tag("a", href="www.example.com")
# Attributes can also be set through the .attrs dictionary
new_li_tag.attrs = {'class': 'producerlist'}
# Use append() to add the new tag at the end of the ul tag
producer_entries.append(new_li_tag)
print(producer_entries.prettify())
# Create two div tags to insert into the li tag
new_div_name_tag = soup.new_tag("div")
new_div_name_tag['class'] = "name"
new_div_number_tag = soup.new_tag("div")
new_div_number_tag["class"] = "number"
# Use insert() to place each tag at a specific position
new_li_tag.insert(0, new_div_name_tag)
new_li_tag.insert(1, new_div_number_tag)
print(new_li_tag.prettify())
Modify string content
String content can be modified with the .string attribute or with the new_string(), append(), and insert() methods.
# Modify string content
# Assigning to .string replaces the tag's content with the given string
new_div_name_tag.string = 'new_div_name'
# Use append() to add string content at the end
new_div_name_tag.append("producer")
# Use the soup object's new_string() method to create a string
new_string_toappend = soup.new_string("producer")
new_div_name_tag.append(new_string_toappend)
# Use insert() to insert a string at a specific position
new_string_toinsert = soup.new_string("10000")
new_div_number_tag.insert(0, new_string_toinsert)
print(producer_entries.prettify())
Delete tag nodes
The Beautiful Soup module provides the decompose() and extract() methods for deleting nodes.
decompose() removes a node from the tree and destroys it together with all of its child nodes.
extract() removes a tag or string from the HTML tree and returns the removed element.
# Remove nodes
# Select the third li tag (the one added earlier)
third_producer = soup.find_all("li")[2]
# Use decompose() to remove the first div node inside it
div_name = third_producer.div
div_name.decompose()
print(third_producer.prettify())
# Use extract() to remove the li node; the removed node is returned
third_producer_removed = third_producer.extract()
print(soup.prettify())
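The difference between the two methods is easiest to see from what remains after the call. The following is a minimal sketch, not part of the original example: it uses a shortened copy of the sample markup and Python's built-in html.parser instead of lxml (an assumption made only to keep the snippet free of extra dependencies).

```python
from bs4 import BeautifulSoup

markup = ('<ul id="producers"><li><div class="name">plants</div>'
          '<div class="number">100000</div></li></ul>')
soup = BeautifulSoup(markup, "html.parser")

# extract() detaches the node from the tree and returns it
name_div = soup.find("div", class_="name").extract()
detached_parent = name_div.parent
print(name_div)         # <div class="name">plants</div>
print(detached_parent)  # None

# decompose() removes the node and destroys it; nothing usable is returned
soup.find("div", class_="number").decompose()
print(soup.ul)          # <ul id="producers"><li></li></ul>

# An extracted node is still intact, so it can be put back into the tree
soup.li.append(name_div)
print(soup.ul)          # <ul id="producers"><li><div class="name">plants</div></li></ul>
```

In short, extract() is the choice when the removed node is needed again, while decompose() suits nodes that should be discarded for good.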
Delete tag content
A tag may have NavigableString objects or Tag objects as its child nodes. The clear() method removes all of these child nodes, leaving the tag itself in place with no content.
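As a small illustration of clear(), the sketch below empties a single li tag. The markup is a trimmed copy of the sample document, and html.parser is used instead of lxml purely as an assumption to avoid the extra dependency.

```python
from bs4 import BeautifulSoup

markup = """<li class="producerlist">
<div class="name">plants</div>
<div class="number">100000</div>
</li>"""
soup = BeautifulSoup(markup, "html.parser")
li_tag = soup.li

# Before clear(): two div tags plus the whitespace strings around them
children_before = len(li_tag.contents)
print(children_before)  # 5

# clear() removes every child node but keeps the tag and its attributes
li_tag.clear()
print(li_tag.contents)  # []
print(li_tag)           # <li class="producerlist"></li>
```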
Other ways to modify content
In addition to the methods mentioned above, there are other methods to modify the content.
The insert_after() and insert_before() methods
These two methods insert a tag or string immediately after or before an existing tag or string. Each takes the object to insert as its argument, which can be either a NavigableString object or a Tag object.
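A short sketch of both methods follows, one call with a Tag and one with a NavigableString. The one-line markup and the "green " prefix are made up for illustration, and html.parser stands in for lxml as an assumption.

```python
from bs4 import BeautifulSoup

markup = '<li class="producerlist"><div class="name">plants</div></li>'
soup = BeautifulSoup(markup, "html.parser")
name_div = soup.find("div", class_="name")

# insert_after() with a Tag: add a sibling div right after the name div
number_div = soup.new_tag("div")
number_div["class"] = "number"
number_div.string = "100000"
name_div.insert_after(number_div)

# insert_before() with a NavigableString: put text in front of an existing string
text_node = name_div.string
text_node.insert_before(soup.new_string("green "))

print(soup.li)
# <li class="producerlist"><div class="name">green plants</div><div class="number">100000</div></li>
```

Note that both methods are called on the reference node itself, not on its parent, which is what distinguishes them from append() and insert().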
The replace_with() method
This method replaces the original tag or string with a new tag or string, which it takes as its argument.
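The sketch below shows replace_with() applied once to a string and once to a tag. The replacement values ("phytoplankton", a span holding "200000") are invented for illustration, and html.parser is an assumption used in place of lxml.

```python
from bs4 import BeautifulSoup

markup = ('<li class="producerlist"><div class="name">plants</div>'
          '<div class="number">100000</div></li>')
soup = BeautifulSoup(markup, "html.parser")

# Replace a string with a new string
name_div = soup.find("div", class_="name")
name_div.string.replace_with(soup.new_string("phytoplankton"))

# Replace a whole tag with a new tag; the replaced element is returned
number_div = soup.find("div", class_="number")
new_span = soup.new_tag("span")
new_span.string = "200000"
replaced = number_div.replace_with(new_span)

print(soup.li)
# <li class="producerlist"><div class="name">phytoplankton</div><span>200000</span></li>
print(replaced)
# <div class="number">100000</div>
```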
The wrap() and unwrap() methods
wrap() wraps a tag or string in another tag.
unwrap() does the opposite: it removes a tag but keeps its contents in place.
# wrap(): wrap every li tag in a new div tag
li_tags = soup.find_all('li')
for li in li_tags:
    new_div_tag = soup.new_tag('div')
    li.wrap(new_div_tag)
print(soup.prettify())
# unwrap(): remove the wrapper div around each li tag, keeping the li tag itself
li_tags = soup.find_all("li")
for li in li_tags:
    li.parent.unwrap()
print(soup.prettify())
Conclusion