Example of using Python regular expressions to grab news headlines and links
- 2020-05-30 20:26:59
- OfStack
This article shows how Python can use a regular expression to grab news headlines and their links from a web page. The example is shared here for your reference:
# -*- coding: utf-8 -*-
# Note: this is Python 2 code; in Python 3, urlopen lives in urllib.request.
import re
from urllib import urlopen

# Fetch the page content
doc = urlopen("http://www.itongji.cn/news/").read()  # A big data news website

# Grab news headlines and links
def extract_title(info):
    # Capture everything between <h3><a target="_blank" and </a></h3>
    pat = '<h3><a target="_blank"(.*?)</a></h3>'
    title = re.findall(pat, info)
    titles = '\n'.join(title)
    # print titles
    # Rewrite the captured attributes into plain 'url:' and 'title:' text
    titles1 = titles.replace('class="title"', 'title')
    titles2 = titles1.replace('>', ':')
    titles3 = titles2.replace('href', 'url:')
    titles4 = titles3.replace('="/', '"http://www.itongji.cn/')
    # Write the result to a file
    save = open('xinwen.txt', 'w')
    save.write(titles4)
    save.close()
    # Return the text so the assignment below receives a value
    return titles4

titles = extract_title(doc)
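The script above targets Python 2, where `urlopen` is imported from `urllib`. As a minimal sketch of the same idea for Python 3, the version below separates parsing from downloading so it can be tried on a string of HTML without network access. The function name `extract_links`, the constant names, and the markup in `SAMPLE` are illustrative assumptions; the live site's markup may differ.

```python
import re
from urllib.parse import urljoin

BASE_URL = "http://www.itongji.cn/news/"  # page fetched in the article

# Assumed markup: <h3><a target="_blank" href="..." ...>headline</a></h3>
LINK_PATTERN = re.compile(
    r'<h3><a target="_blank"[^>]*?href="(?P<href>[^"]*)"[^>]*>'
    r'(?P<title>.*?)</a></h3>',
    re.S,
)


def extract_links(html, base_url=BASE_URL):
    """Return a list of (title, absolute_url) tuples found in html."""
    results = []
    for match in LINK_PATTERN.finditer(html):
        title = match.group("title").strip()
        # urljoin turns a site-relative href into an absolute URL
        url = urljoin(base_url, match.group("href"))
        results.append((title, url))
    return results


# Hypothetical sample input, written to resemble the assumed markup.
SAMPLE = (
    '<h3><a target="_blank" href="/news/1.html" class="title">First headline</a></h3>\n'
    '<h3><a target="_blank" href="/news/2.html" class="title">Second headline</a></h3>'
)

for title, url in extract_links(SAMPLE):
    print(title, url)
```

To run this against the live page, the HTML could be downloaded with `urllib.request.urlopen(BASE_URL).read().decode("utf-8")`, assuming the page is UTF-8 encoded. Using named groups and `urljoin` avoids the chain of string replacements, at the cost of a slightly longer pattern.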
PS: Here are two handy online regular expression tools:
JavaScript regular expression online testing tool:
http://tools.ofstack.com/regex/javascript
Online regular expression generation tool:
http://tools.ofstack.com/regex/create_reg
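A pattern can also be checked directly in Python before it is run against a full page. The sketch below applies the article's pattern to a made-up snippet and contrasts the non-greedy `(.*?)` with a greedy `(.*)`. The snippet is hypothetical and only meant to show why the non-greedy form is used.

```python
import re

# Hypothetical one-line snippet containing two headline links.
snippet = (
    '<h3><a target="_blank" href="/news/1.html">A</a></h3>'
    '<h3><a target="_blank" href="/news/2.html">B</a></h3>'
)

# Non-greedy: stops at the first closing </a></h3> after each opening tag.
lazy = re.findall(r'<h3><a target="_blank"(.*?)</a></h3>', snippet)

# Greedy: runs on to the last closing </a></h3> on the line.
greedy = re.findall(r'<h3><a target="_blank"(.*)</a></h3>', snippet)

print(len(lazy))    # one entry per link
print(len(greedy))  # a single entry spanning both links
```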
For more Python-related content, see these topics on this site: "Python regular expression usage summary", "Python data structure and algorithm tutorial", "Python Socket programming skills summary", "Python function using techniques", "Python string skills summary", "Python introduction and advanced tutorial" and "Python file and directory skills summary".
I hope this article helps you with your Python programming.