python gets the code for how to encode web pages

  • 2020-05-27 05:56:35
  • OfStack

python gets the code for the implementation of web page encoding


<span style="font-family: Arial, Helvetica, sans-serif; background-color: rgb(255, 255, 255);">
  </span><span style="font-family: Arial, Helvetica, sans-serif; background-color: rgb(255, 255, 255);">
python Development, automatic access to web code used chardet Library, character set detection, this class in python2.7 No, it needs to be downloaded from the official website. 
 I'll download it here chardet-2.3.0.tar.gz Compress the package file, only need to decompress the compressed package file chardet Files in python Install under the package 
python27/lib/site-packages/ Down, that's it. </span> 

Then import chardet

An automated detection function is written below to detect Url connections and return the encoding of url for the web page.


import chardet # Character set detection  
import urllib 
 
url="http://www.jd.com" 
 
 
def automatic_detect(url): 
  content=urllib.urlopen(url).read() 
  result=chardet.detect(content) 
 
  encoding=result['encoding'] 
 
  return encoding 
 
urls=['http://www.baidu.com','http://www.163.com','http://dangdang.com'] 
for url in urls: 
  print url,automatic_detect(url) 

The detect method of the chardet class is used above, the dictionary is returned, and the encoding encoding is pulled out

Thank you for reading, I hope to help you, thank you for your support of this site!


Related articles: