Scraping administrative division codes with Python
- 2020-05-17 05:48:52
- OfStack
Preface
The website of the National Bureau of Statistics publishes a fairly tidy list of administrative division codes. Many websites need this as basic reference data, so I wrote a Python program to scrape it.
Note: after scraping, the output still needs some simple manual cleanup.
Sample code:
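# -*- coding: utf-8 -*-
"""
Fetch the administrative division codes from the National Bureau of Statistics.
"""
import re

import requests

base_url = 'http://www.stats.gov.cn/tjsj/tjbz/xzqhdm/201504/t20150415_712722.html'

# Group 1 is the numeric code, group 2 is the name including its leading indentation.
pattern = re.compile(r'<p class="MsoNormal" style=".*?"><span lang="EN-US" style=".*?">(\d+)<span>.*?</span></span><span style=".*?">(.*?)</span></p>')

# Character that indents the names; change it if the page uses a different one
# (for example a full-width space).
INDENT = ' '


def get_xzqh():
    response = requests.get(base_url)
    response.encoding = 'utf-8'  # the page is decoded as UTF-8
    areas = pattern.findall(response.text)
    print("code,name,level")
    for code, name in areas:
        # The number of indentation characters gives the level of the entry.
        print(code, name.replace(INDENT, ''), name.count(INDENT), sep=',')


if __name__ == '__main__':
    get_xzqh()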
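To see how the regular expression and the indentation count work without hitting the network, here is a minimal offline sketch. The HTML fragment is hand-written for this example to imitate the markup the pattern expects; it is not copied from the live page, and the helper names parse_rows and to_csv are hypothetical. The indentation character is a parameter because the real page may use something other than an ASCII space.

```python
import csv
import io
import re

# Same pattern as the sample code: group 1 is the code, group 2 the indented name.
ROW_PATTERN = re.compile(
    r'<p class="MsoNormal" style=".*?"><span lang="EN-US" style=".*?">(\d+)'
    r'<span>.*?</span></span><span style=".*?">(.*?)</span></p>'
)

# Hypothetical fragment written for this example; not copied from the live page.
SAMPLE_HTML = (
    '<p class="MsoNormal" style="a"><span lang="EN-US" style="b">110000'
    '<span> </span></span><span style="c">北京市</span></p>\n'
    '<p class="MsoNormal" style="a"><span lang="EN-US" style="b">110100'
    '<span> </span></span><span style="c"> 市辖区</span></p>\n'
    '<p class="MsoNormal" style="a"><span lang="EN-US" style="b">110101'
    '<span> </span></span><span style="c">  东城区</span></p>\n'
)


def parse_rows(html, indent=' '):
    """Return (code, name, level) tuples; level is the number of indent characters."""
    rows = []
    for code, raw_name in ROW_PATTERN.findall(html):
        level = raw_name.count(indent)
        rows.append((code, raw_name.replace(indent, ''), level))
    return rows


def to_csv(rows):
    """Serialize the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator='\n')
    writer.writerow(['code', 'name', 'level'])
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == '__main__':
    print(to_csv(parse_rows(SAMPLE_HTML)), end='')
```

Writing the rows through the csv module keeps the output consistent with the header, which may reduce the manual cleanup mentioned in the note.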
Notes:
There is also another way to obtain country and region tables: the country and region information file that ships with the QQ client (file name LocList.xml). It is stored in:
C:\Program Files\Tencent\QQ\I18N\2052
Install the Chinese version of QQ if you need the Chinese data, or the English version if you need the English data. The international edition keeps the file in the 1033 directory.
The codes follow the ISO 3166 standard and are easy to import into a database.
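As a rough illustration of importing such a file into a database, the sketch below flattens a nested XML tree into rows and loads them into SQLite. The article does not show the layout of LocList.xml, so everything about the structure here is an assumption: the sample XML, the element names, and the Name and Code attributes are hypothetical stand-ins, and the attribute names are parameters so they can be adjusted to the real file.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical stand-in for LocList.xml; the real element and attribute
# names may differ and should be checked against the actual file.
SAMPLE_XML = """<Location>
  <CountryRegion Name="China" Code="CN">
    <State Name="Beijing" Code="11"/>
  </CountryRegion>
  <CountryRegion Name="Japan" Code="JP"/>
</Location>"""


def flatten(element, depth=0, parent_code=None, code_attr='Code', name_attr='Name'):
    """Yield (depth, tag, code, name, parent_code) for every element below `element`."""
    for child in element:
        code = child.get(code_attr)
        yield (depth, child.tag, code, child.get(name_attr), parent_code)
        # Recurse so nested entries are recorded with their parent's code.
        yield from flatten(child, depth + 1, code, code_attr, name_attr)


def load_into_sqlite(xml_text, connection):
    """Parse the XML text and insert one row per element into a `location` table."""
    root = ET.fromstring(xml_text)
    connection.execute(
        'CREATE TABLE IF NOT EXISTS location '
        '(depth INTEGER, tag TEXT, code TEXT, name TEXT, parent_code TEXT)'
    )
    connection.executemany('INSERT INTO location VALUES (?, ?, ?, ?, ?)', flatten(root))
    connection.commit()


if __name__ == '__main__':
    conn = sqlite3.connect(':memory:')
    load_into_sqlite(SAMPLE_XML, conn)
    for row in conn.execute('SELECT depth, code, name FROM location ORDER BY rowid'):
        print(row)
```

For the real file, ET.parse(path).getroot() can replace ET.fromstring, with path pointing at the LocList.xml location given above.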
Conclusion
That is all there is to using Python to fetch the administrative division codes. I hope this article helps you learn or use Python. If you have any questions, feel free to leave a comment.