Using the selenium Module in Python Crawlers

  • 2021-10-16 01:58:02
  • OfStack

Contents

selenium module
- selenium basic concepts
- Basic use
- Browser automation operations
- Handling iframes with selenium
- Simulating a QQ Zone login with selenium
- Headless browser and evasion detection

selenium module

selenium Basic Concepts

selenium advantages

- Easy access to data that a website loads dynamically
- A convenient way to simulate a login

selenium usage process:

1. Install the package: pip install selenium

2. Download the driver that matches your browser and its version (here, chromedriver for Google Chrome)

3. Instantiate a browser object
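
The three steps above can be sketched as one small helper. This is a minimal sketch that assumes Selenium 4, where the driver path is passed through a Service object. The name open_chrome is made up for illustration, and the path is wherever you saved chromedriver.

```python
from pathlib import Path


def open_chrome(driver_path):
    """Start Chrome through a downloaded chromedriver (hypothetical helper)."""
    driver_file = Path(driver_path)
    # Step 2: the driver must already be downloaded; fail early with a clear message
    if not driver_file.is_file():
        raise FileNotFoundError(f"chromedriver not found at {driver_file}")
    # Step 1 (pip install selenium) must be done by now; the import sits here
    # so that the path check above still runs when the package is missing
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    # Step 3: instantiate the browser object
    return webdriver.Chrome(service=Service(str(driver_file)))
```

A call such as bro = open_chrome(r"E:\google\Chrome\Application\chromedriver.exe") would then return the same kind of browser object that the examples in this article work with. Remember to call bro.quit() when you are done.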

Basic use

Code


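from time import sleep

from lxml import etree
from selenium import webdriver
from selenium.webdriver.chrome.service import Service

if __name__ == '__main__':
    # Selenium 4 takes the driver path through a Service object; adjust the path to your machine
    service = Service(r"E:\google\Chrome\Application\chromedriver.exe")
    bro = webdriver.Chrome(service=service)
    bro.get('http://scxk.nmpa.gov.cn:81/xk/')

    # page_source is the HTML as rendered by the browser, including dynamically loaded data
    page_text = bro.page_source
    tree = etree.HTML(page_text)
    li_list = tree.xpath('//*[@id="gzlist"]/li')
    for li in li_list:
        name = li.xpath('./dl/@title')[0]
        print(name)
    sleep(5)
    bro.quit()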

Browser automation operations

Code


# Common calls for writing browser automation code

- Send a request: get(url)
- Locate a tag: the find_element family of methods
- Interact with a tag: send_keys('xxx')
- Execute a JavaScript program: execute_script('jsCode')
- Go back, go forward: back(), forward()
- Close the browser: quit()

Code

https://www.taobao.com/


from time import sleep

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

bro = webdriver.Chrome(service=Service(r"E:\google\Chrome\Application\chromedriver.exe"))

bro.get('https://www.taobao.com/')

# Locate the search box
search_input = bro.find_element(By.ID, 'q')
sleep(2)
# Execute a piece of JavaScript that scrolls to the bottom of the page
bro.execute_script('window.scrollTo(0, document.body.scrollHeight)')
sleep(2)
# Interact with the tags: type a search term, then click the search button
search_input.send_keys("Women's clothing")
button = bro.find_element(By.CLASS_NAME, 'btn-search')
button.click()

# Open a second page, then step back and forward through the history
bro.get('https://www.baidu.com')
sleep(2)
bro.back()
sleep(2)
bro.forward()
sleep(5)
bro.quit()
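
The sleep() calls in the script above pause for a fixed time whether or not the page is ready. One alternative is to poll a condition until it holds. The sketch below shows that idea in plain Python so that it runs without a browser. wait_until and its parameters are illustrative names and not part of selenium, which provides its own WebDriverWait class for this purpose.

```python
import time


def wait_until(condition, timeout=10.0, poll=0.5):
    """Call condition() repeatedly until it returns a truthy value or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        # Short pause between checks so the loop does not spin at full speed
        time.sleep(poll)
```

With a real browser, the condition could be a lambda that asks the driver for matching elements through find_elements(). That call returns an empty list, which counts as false, until the element exists.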

Handling iframes with selenium


-  If the tag you want to locate is inside an iframe tag, you must first switch into that frame with switch_to.frame(id)

-  Action chain (drag): from selenium.webdriver import ActionChains
	-  Instantiate an action chain object: action = ActionChains(bro)
	- click_and_hold(div): press and hold the mouse button on the element
	- move_by_offset(x, y): move the mouse x pixels horizontally and y pixels vertically
	- perform(): execute the queued actions immediately
	- release(): release the held mouse button; like the other actions, it only takes effect once perform() is called
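
After switch_to.frame() the driver stays inside that frame until it is told otherwise, so it can help to pair every switch with a switch back to the top-level page. Here is a minimal sketch using a context manager. inside_frame is a hypothetical helper name, and the driver argument is assumed to be a selenium WebDriver such as the bro object used in this article.

```python
from contextlib import contextmanager


@contextmanager
def inside_frame(driver, frame_reference):
    """Switch into an iframe for the duration of a with-block, then switch back."""
    driver.switch_to.frame(frame_reference)
    try:
        yield driver
    finally:
        # Return to the top-level document even if the block raised an error
        driver.switch_to.default_content()


# Possible usage with a real browser object named bro:
#     with inside_frame(bro, 'iframeResult'):
#         ...locate and use tags that live inside the iframe...
```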

Code

https://www.runoob.com/try/try.php?filename=jqueryui-api-droppable


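from time import sleep

from selenium import webdriver
from selenium.webdriver import ActionChains
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

bro = webdriver.Chrome(service=Service(r"E:\google\Chrome\Application\chromedriver.exe"))

bro.get('https://www.runoob.com/try/try.php?filename=jqueryui-api-droppable')

# The draggable element lives inside an iframe, so switch into it first
bro.switch_to.frame('iframeResult')

div = bro.find_element(By.ID, 'draggable')

# Action chain: press and hold the element
action = ActionChains(bro)
action.click_and_hold(div)

# Drag to the right in five small steps
for i in range(5):
    action.move_by_offset(17, 0).perform()
    sleep(0.3)

# Release the mouse button; release() is only queued until perform() runs
action.release().perform()

bro.quit()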
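
The drag above moves 17 pixels five times, which is 85 pixels in total. The step sizes can also be computed from the total distance instead of being hard-coded. This sketch assumes that the action argument is an ActionChains object and that a purely horizontal drag is enough. split_offsets and drag_in_steps are illustrative names.

```python
def split_offsets(total, steps):
    """Split a total distance in pixels into `steps` integer moves that sum to it."""
    base, remainder = divmod(total, steps)
    # The first `remainder` moves get one extra pixel so nothing is lost to rounding
    return [base + 1 if i < remainder else base for i in range(steps)]


def drag_in_steps(action, element, total_x, steps=5):
    """Press on element, move right by total_x pixels in small steps, then let go."""
    action.click_and_hold(element)
    for dx in split_offsets(total_x, steps):
        # perform() sends the actions queued so far to the browser
        action.move_by_offset(dx, 0).perform()
    # release() is only queued, so it needs its own perform()
    action.release().perform()
```

With the objects from the script above, drag_in_steps(ActionChains(bro), div, 85) would perform the same five moves of 17 pixels each.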

Simulating a QQ Zone login with selenium

Code

https://qzone.qq.com/


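from time import sleep

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

bro = webdriver.Chrome(service=Service(r"E:\google\Chrome\Application\chromedriver.exe"))
bro.get('https://qzone.qq.com/')
# The login form is inside an iframe, so switch into it first
bro.switch_to.frame("login_frame")

# Switch to the account and password login form
switcher = bro.find_element(By.ID, 'switcher_plogin')
switcher.click()

# Fill in the account and password (placeholder values)
user_tag = bro.find_element(By.ID, 'u')
password_tag = bro.find_element(By.ID, 'p')
user_tag.send_keys('1234455')
password_tag.send_keys('qwer123')
sleep(1)

# Click the login button
but = bro.find_element(By.ID, 'login_button')
but.click()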

Headless Browser and Evasion Detection

Code


from time import sleep

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service

# One Options object carries both groups of settings; passing two separate
# objects to the constructor would leave one of them unused
chrome_options = Options()
# Run without a visible window
chrome_options.add_argument('--headless')
chrome_options.add_argument('--disable-gpu')
# Evade detection by excluding the enable-automation switch
chrome_options.add_experimental_option('excludeSwitches', ['enable-automation'])

bro = webdriver.Chrome(service=Service(r"E:\google\Chrome\Application\chromedriver.exe"), options=chrome_options)

bro.get('https://www.baidu.com')
print(bro.page_source)
sleep(2)
bro.quit()
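
Both groups of settings belong on the same kind of options object, so they can be applied in one place. This sketch assumes that the options argument is a selenium Chrome Options instance. configure_chrome and its two flags are hypothetical names introduced here.

```python
def configure_chrome(options, headless=True, hide_automation=True):
    """Apply the headless and evasion settings from this section to one options object."""
    if headless:
        # No visible browser window
        options.add_argument('--headless')
        options.add_argument('--disable-gpu')
    if hide_automation:
        # Exclude the enable-automation switch that the driver adds by default
        options.add_experimental_option('excludeSwitches', ['enable-automation'])
    return options
```

The returned object is the one to hand to webdriver.Chrome through its options parameter, for example options=configure_chrome(Options()).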
