Explanation of the Python Crawler selenium Module
- 2021-10-16 01:58:02
- OfStack
Contents
selenium module
- selenium Basic Concepts
- Basic use
- Operations based on browser automation
- selenium handling iframes
- selenium simulating a login to QQ Space
- Headless Browser and Evasion Detection
selenium module
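selenium Basic Concepts
selenium is a module for browser-based automation. Through a browser driver it controls a real browser, so a script can open pages, click, type and run JavaScript the way a user would.
selenium Advantages
- Easy access to data that a website loads dynamically
- Convenient simulation of logins
selenium usage process: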
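1. Install the environment:
pip install selenium
2. Download the browser driver that matches your browser version (chromedriver for Google Chrome). With Selenium 4.6 or later, Selenium Manager can download a matching driver automatically.
3. Instantiate a browser object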
Basic use
Code
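from time import sleep

from lxml import etree
from selenium import webdriver
from selenium.webdriver.chrome.service import Service

if __name__ == '__main__':
    # Path to the local chromedriver executable (adjust for your machine)
    bro = webdriver.Chrome(service=Service(r"E:\google\Chrome\Application\chromedriver.exe"))
    bro.get(url='http://scxk.nmpa.gov.cn:81/xk/')
    # page_source is the DOM as currently rendered, including content added by JavaScript that has already run
    page_text = bro.page_source
    tree = etree.HTML(page_text)
    li_list = tree.xpath('//*[@id="gzlist"]/li')
    for li in li_list:
        name = li.xpath('./dl/@title')[0]
        print(name)
    sleep(5)
    bro.quit()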
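Because page_source is just an HTML string, the parsing half of the example can be tried without a browser. The following is a minimal sketch, assuming a hypothetical HTML snippet shaped the way the XPath expects (a ul with id "gzlist" whose li items contain a dl with a title attribute); the real page's markup may differ, and extract_titles is a hypothetical helper name.

```python
from lxml import etree

# Hypothetical HTML standing in for bro.page_source (not the real page's markup)
SAMPLE_HTML = """
<html><body>
  <ul id="gzlist">
    <li><dl title="Company A"><dt>A</dt></dl></li>
    <li><dl title="Company B"><dt>B</dt></dl></li>
  </ul>
</body></html>
"""


def extract_titles(page_text):
    """Return the title attribute of the <dl> inside each <li> of #gzlist."""
    tree = etree.HTML(page_text)
    names = []
    for li in tree.xpath('//*[@id="gzlist"]/li'):
        titles = li.xpath('./dl/@title')
        if titles:  # skip items without <dl title=...> instead of raising IndexError
            names.append(titles[0])
    return names


if __name__ == '__main__':
    print(extract_titles(SAMPLE_HTML))
```

Guarding the [0] lookup is a small change from the browser version: a list item without the expected dl is skipped rather than stopping the whole crawl.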
Operations based on browser automation
Writing code based on browser automation:
- Initiate a request: get(url)
- Locate tags: find_element() / find_elements() with a By locator (the find_element_by_* series in older releases)
- Interact with tags: send_keys('xxx'), click()
- Execute a JavaScript program: execute_script('jsCode')
- Go back and forward: back(), forward()
- Close the browser: quit()
Code
# https://www.taobao.com/
from time import sleep

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

bro = webdriver.Chrome(service=Service(r"E:\google\Chrome\Application\chromedriver.exe"))
bro.get(url='https://www.taobao.com/')
# Tag positioning: find the search box by its id
search_input = bro.find_element(By.ID, 'q')
sleep(2)
# Execute a snippet of JavaScript to scroll to the bottom of the page
bro.execute_script('window.scrollTo(0, document.body.scrollHeight)')
sleep(2)
# Tag interaction: type a search keyword, then click the search button
search_input.send_keys("Women's clothing")
button = bro.find_element(By.CLASS_NAME, 'btn-search')
button.click()
# Open another page, then go back and forward in the history
bro.get('https://www.baidu.com')
sleep(2)
bro.back()
sleep(2)
bro.forward()
sleep(5)
bro.quit()
selenium handling iframes:
- If the tag you want to locate is inside an iframe, you must first switch into it with switch_to.frame(id); switch_to.default_content() returns to the main document
- Action chain (drag): from selenium.webdriver import ActionChains
- Instantiate an action chain object: action = ActionChains(bro)
- click_and_hold(div): press and hold the mouse button on an element
- move_by_offset(x, y): move the mouse by the given offset in pixels
- perform(): execute the queued actions immediately
- action.release(): release the held mouse button (like the other actions, it only runs when perform() is called)
Code
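# https://www.runoob.com/try/try.php?filename=jqueryui-api-droppable
from time import sleep

from selenium import webdriver
from selenium.webdriver import ActionChains
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

bro = webdriver.Chrome(service=Service(r"E:\google\Chrome\Application\chromedriver.exe"))
bro.get('https://www.runoob.com/try/try.php?filename=jqueryui-api-droppable')
# The draggable element lives inside the iframe with id "iframeResult"
bro.switch_to.frame('iframeResult')
div = bro.find_element(By.ID, 'draggable')
# Action chain: press and hold on the element, then drag it right in small steps
action = ActionChains(bro)
action.click_and_hold(div)
for i in range(5):
    action.move_by_offset(17, 0).perform()
    sleep(0.3)
# Release the mouse button; release() only queues the action, so perform() is needed
action.release().perform()
bro.quit()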
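The loop above drags the element 5 × 17 = 85 pixels in equal steps. To drag a given total distance instead, a small helper can split it into integer offsets for move_by_offset(). This is a sketch; split_offset is a hypothetical function, not part of selenium.

```python
def split_offset(total, steps):
    """Split a total pixel distance into `steps` integer offsets that sum to `total`.

    The remainder is spread one pixel at a time over the first steps,
    so no two offsets differ by more than one pixel.
    """
    if steps <= 0:
        raise ValueError("steps must be positive")
    base, remainder = divmod(total, steps)
    return [base + 1 if i < remainder else base for i in range(steps)]


if __name__ == '__main__':
    print(split_offset(85, 5))
    # In the selenium example each value would be used as:
    #     action.move_by_offset(dx, 0).perform()
```

Keeping every offset an integer matters because pixel offsets are whole numbers, and spreading the remainder keeps the movement even.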
selenium simulating a login to QQ Space
Code
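# https://qzone.qq.com/
from time import sleep

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

bro = webdriver.Chrome(service=Service(r"E:\google\Chrome\Application\chromedriver.exe"))
bro.get('https://qzone.qq.com/')
# The login form is inside the iframe "login_frame"
bro.switch_to.frame("login_frame")
# Click the link that switches to account/password login
switcher = bro.find_element(By.ID, 'switcher_plogin')
switcher.click()
user_tag = bro.find_element(By.ID, 'u')
password_tag = bro.find_element(By.ID, 'p')
# Placeholder credentials: replace with your own account
user_tag.send_keys('1234455')
password_tag.send_keys('qwer123')
sleep(1)
but = bro.find_element(By.ID, 'login_button')
but.click()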
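The login example types a hard-coded account and password. A common alternative is to read them from environment variables so they never live in the script. A sketch, where the variable names QQ_USER and QQ_PASSWORD and the helper load_credentials are hypothetical:

```python
import os


def load_credentials(user_var='QQ_USER', password_var='QQ_PASSWORD'):
    """Read the login account and password from (hypothetical) environment variables."""
    user = os.environ.get(user_var)
    password = os.environ.get(password_var)
    if not user or not password:
        raise RuntimeError(f'set {user_var} and {password_var} before running the login script')
    return user, password

# In the selenium example these would replace the literal strings:
#     user, password = load_credentials()
#     user_tag.send_keys(user)
#     password_tag.send_keys(password)
```

Failing early with a clear message is preferable to sending empty strings to the login form and wondering why the login did not succeed.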
Headless Browser and Evasion Detection
Code
from time import sleep

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service

# selenium.webdriver.ChromeOptions is the same class as Options; the driver accepts only
# ONE options object, so both settings must be added to it (passing two would drop one)
options = Options()
# Realize no visual interface (headless mode)
options.add_argument('--headless')
options.add_argument('--disable-gpu')
# Realize evasion detection: drop Chrome's enable-automation switch
options.add_experimental_option('excludeSwitches', ['enable-automation'])
bro = webdriver.Chrome(service=Service(r"E:\google\Chrome\Application\chromedriver.exe"), options=options)
bro.get('https://www.baidu.com')
print(bro.page_source)
sleep(2)
bro.quit()