Scrape administrative division codes with Python

  • 2020-05-17 05:48:52
  • OfStack

Preface

The website of the National Bureau of Statistics publishes a fairly tidy list of administrative division codes. Many websites rely on this as basic reference data, so I wrote a Python program to scrape it.

Note: the scraped output still needs some simple manual cleanup afterwards.

Sample code:


# -*- coding: utf-8 -*-
'''
Fetch the administrative division codes published by the National Bureau of Statistics.
'''
import re

import requests

base_url = 'http://www.stats.gov.cn/tjsj/tjbz/xzqhdm/201504/t20150415_712722.html'


def get_xzqh():
    # Decode the response body so the regex runs on text (Python 3).
    html_data = requests.get(base_url).content.decode('utf-8')
    # Group 1 is the numeric code; group 2 is the name, including its padding spaces.
    pattern = re.compile(
        r'<p class="MsoNormal" style=".*?"><span lang="EN-US" style=".*?">(\d+)'
        r'<span>.*?</span></span><span style=".*?">(.*?)</span></p>')
    areas = pattern.findall(html_data)
    print("code,name,level")
    for code, name in areas:
        # The number of padding spaces in the name gives the level; output is comma-separated to match the header.
        print(code, name.replace(' ', ''), name.count(' '), sep=',')


if __name__ == '__main__':
    get_xzqh()
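
The parsing step can also be tried offline, without requesting the page. The following is a minimal Python 3 sketch under stated assumptions: parse_rows, ROW_PATTERN and SAMPLE_HTML are names introduced here for illustration, the sample markup is hand-written to fit the regular expression rather than copied from the site, and the padding character is assumed to be an ordinary space (pass a different pad if the real page indents names with another character).

```python
import re

# Same regular expression as in the sample code, compiled once.
ROW_PATTERN = re.compile(
    r'<p class="MsoNormal" style=".*?"><span lang="EN-US" style=".*?">(\d+)'
    r'<span>.*?</span></span><span style=".*?">(.*?)</span></p>'
)


def parse_rows(html_text, pad=' '):
    """Return (code, name, level) tuples; level is the number of pad characters in the name."""
    rows = []
    for code, raw_name in ROW_PATTERN.findall(html_text):
        rows.append((code, raw_name.replace(pad, ''), raw_name.count(pad)))
    return rows


# Hypothetical markup shaped to fit the pattern; not copied from the real page.
SAMPLE_HTML = '\n'.join([
    '<p class="MsoNormal" style="margin:0"><span lang="EN-US" style="font-size:12pt">110000'
    '<span> </span></span><span style="font-family:SimSun">北京市</span></p>',
    '<p class="MsoNormal" style="margin:0"><span lang="EN-US" style="font-size:12pt">110100'
    '<span> </span></span><span style="font-family:SimSun"> 市辖区</span></p>',
    '<p class="MsoNormal" style="margin:0"><span lang="EN-US" style="font-size:12pt">110101'
    '<span> </span></span><span style="font-family:SimSun">  东城区</span></p>',
])

for row in parse_rows(SAMPLE_HTML):
    print(','.join(str(field) for field in row))
```

Keeping the parsing separate from the HTTP request makes it easy to check the level calculation against a few known rows before running it on the whole page.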

Notes:

There is also another way to obtain country and region data: the country and region table that ships with the QQ client (file name LocList.xml). It is stored at: C:\Program Files\Tencent\QQ\I18N\2052

Install the Chinese version of QQ if you need the Chinese table, or the English version if you need the English one. The international edition is in the 1033 directory.

The codes follow the ISO 3166 standard and are easy to import into a database.
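
As a rough illustration of the import step, the sketch below flattens a nested XML file into rows and loads them into an in-memory SQLite table. It rests on assumptions: the layout of LocList.xml is not described here, so SAMPLE_XML, its tag names, and the Name and Code attributes are hypothetical stand-ins, and the location table, flatten and load_into_sqlite are names made up for this example.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical stand-in for LocList.xml; the real file's tags and attributes may differ.
SAMPLE_XML = """<Location>
  <CountryRegion Name="China" Code="1">
    <State Name="Beijing" Code="11">
      <City Name="Dongcheng" Code="1"/>
    </State>
  </CountryRegion>
</Location>"""


def flatten(element, depth=0, parent=None):
    """Yield (depth, tag, name, code, parent_name) for every element below the given one."""
    for child in element:
        name = child.get('Name')  # assumed attribute name
        yield (depth, child.tag, name, child.get('Code'), parent)
        yield from flatten(child, depth + 1, name)


def load_into_sqlite(rows):
    """Create an in-memory table and insert the flattened rows."""
    conn = sqlite3.connect(':memory:')
    conn.execute(
        'CREATE TABLE location (depth INTEGER, tag TEXT, name TEXT, code TEXT, parent TEXT)')
    conn.executemany('INSERT INTO location VALUES (?, ?, ?, ?, ?)', rows)
    conn.commit()
    return conn


rows = list(flatten(ET.fromstring(SAMPLE_XML)))
conn = load_into_sqlite(rows)
for record in conn.execute('SELECT depth, tag, name, code, parent FROM location'):
    print(record)
```

To try it on the real file, replace ET.fromstring(SAMPLE_XML) with ET.parse(path).getroot(), where path points at LocList.xml in the directory mentioned above, and adjust the attribute names to whatever the file actually uses.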

Conclusion

That is everything needed to obtain the administrative division codes with Python. I hope this article helps you learn or use Python; if you have any questions, feel free to leave a comment.

