Example of using Python regular expressions to grab news headlines and links

  • 2020-05-30 20:26:59
  • OfStack

This article shows an example of how to grab news headlines and links in Python with a regular expression. It is shared here for your reference:


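# -*- coding: utf-8 -*-
# Ported to Python 3: the original used Python 2's urllib.urlopen and an unused urlretrieve import
import re
from urllib.request import urlopen

# Fetch the page (a big-data news site picked by the author)
# Assumption: the page is UTF-8 encoded; adjust the codec if it is not
doc = urlopen("http://www.itongji.cn/news/").read().decode("utf-8")

# Grab news headlines and links
def extract_title(info):
  # Capture everything between the opening <a target="_blank" and </a> inside each <h3>
  pat = r'<h3><a target="_blank"(.*?)</a></h3>'
  title = re.findall(pat, info)
  titles = '\n'.join(title)
  # Rewrite the captured attribute text into a readable "url: ... title: ..." form
  titles1 = titles.replace('class="title"', 'title')
  titles2 = titles1.replace('>', ':')
  titles3 = titles2.replace('href', 'url:')
  titles4 = titles3.replace('="/', '"http://www.itongji.cn/')
  # Write the result to a file
  with open('xinwen.txt', 'w', encoding='utf-8') as save:
    save.write(titles4)
  # Return the text so the caller's assignment is not None
  return titles4

titles = extract_title(doc)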
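
The chain of replace() calls can also be avoided by capturing the link and the headline text directly with two groups. The following is only a minimal sketch, not the author's method: the sample HTML, the helper name extract_links, and the assumption that each link carries an href attribute inside the opening tag are all illustrative and were not checked against the live site.

```python
import re
from urllib.parse import urljoin

BASE_URL = "http://www.itongji.cn/"  # site root used in the article

# Hypothetical sample markup, modeled on what the article's pattern expects
SAMPLE_HTML = (
    '<h3><a target="_blank" href="/news/1.html" class="title">First headline</a></h3>\n'
    '<h3><a target="_blank" href="/news/2.html" class="title">Second headline</a></h3>'
)

# Two capture groups: the href value and the link text
LINK_PATTERN = re.compile(
    r'<h3><a target="_blank"[^>]*?href="([^"]*)"[^>]*>(.*?)</a></h3>'
)

def extract_links(html):
    """Return a list of (title, absolute_url) tuples found in html."""
    return [
        (title.strip(), urljoin(BASE_URL, href))
        for href, title in LINK_PATTERN.findall(html)
    ]

for title, url in extract_links(SAMPLE_HTML):
    print(title, url)
```

Returning (title, url) pairs keeps the extraction separate from the output format, so the same list could be written to xinwen.txt or handled some other way. For pages with less regular markup, an HTML parser is usually more robust than a regular expression.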

PS: Here are two handy regular expression tools:

JavaScript regular expression online testing tool:
http://tools.ofstack.com/regex/javascript

Online regular expression generation tool:
http://tools.ofstack.com/regex/create_reg
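
A pattern can also be tried locally in Python before it is run against a live page. The sketch below uses a hypothetical two-headline sample string to show why the article's pattern uses the non-greedy (.*?) rather than the greedy (.*): the greedy form swallows everything between the first opening tag and the last closing tag.

```python
import re

# Hypothetical sample: two headlines on a single line
sample = ('<h3><a target="_blank" href="/a">A</a></h3>'
          '<h3><a target="_blank" href="/b">B</a></h3>')

# Non-greedy group, as in the article: stops at the first </a></h3>
lazy = re.findall(r'<h3><a target="_blank"(.*?)</a></h3>', sample)

# Greedy group: runs on to the last </a></h3> in the string
greedy = re.findall(r'<h3><a target="_blank"(.*)</a></h3>', sample)

print(len(lazy))    # one match per headline
print(len(greedy))  # a single match spanning both headlines
```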

For more Python-related content, see this site's topics: "Python regular expression usage summary", "Python data structure and algorithm tutorial", "Python Socket programming skills summary", "Python function using techniques", "Python string skills summary", "Python introduction and advanced tutorial" and "Python file and directory skills summary".

I hope this article helps you with your Python programming.

