An Analysis of How to Crawl Data in PHP
- 2021-11-01 02:47:29
- OfStack
Official website: a simple, flexible and powerful PHP collection tool that makes data collection a little simpler.
Brief introduction
QueryList uses jQuery selectors for collection, so you can say goodbye to complex regular expressions. It offers jQuery-like DOM manipulation, HTTP networking, handling of garbled (mis-encoded) text, content filtering and extensibility, and it makes complex network requests such as simulated login, browser spoofing and HTTP proxies easy. It has a rich set of plugins, supports multi-threaded collection, and can use PhantomJS to collect pages rendered dynamically by JavaScript.
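QueryList handles browser spoofing and proxies through its own HTTP layer. As a rough illustration of what those two features mean in plain PHP, the sketch below builds a stream context that sends a browser-like `User-Agent` header and can optionally route the request through an HTTP proxy. The helper `buildFetchOptions`, the User-Agent string and the proxy address are hypothetical, not part of QueryList's API.

```php
<?php
// Hypothetical helper (not part of QueryList): builds stream context options
// that make a plain PHP request look like it comes from a browser and,
// optionally, send it through an HTTP proxy.
function buildFetchOptions(string $userAgent, ?string $proxy = null): array
{
    $http = [
        'method'  => 'GET',
        'header'  => "User-Agent: {$userAgent}\r\n", // browser-like identification
        'timeout' => 10,                             // seconds
    ];
    if ($proxy !== null) {
        // PHP's http wrapper expects the proxy as tcp://host:port
        $http['proxy'] = $proxy;
        // Put the full URI in the request line, as many HTTP proxies require
        $http['request_fulluri'] = true;
    }
    return ['http' => $http];
}

$context = stream_context_create(buildFetchOptions(
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64)', // browser-like User-Agent
    'tcp://127.0.0.1:8080'                       // hypothetical local proxy
));

// Passing the context to file_get_contents applies these options
// (this performs a real network request, so it is left commented out):
// $html = file_get_contents('https://www.biqudu.com/14_14778/', false, $context);
```

The resulting HTML string could then be passed to QueryList in the same way as a page fetched without a context.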
Installation
Install via Composer:
composer require jaeger/querylist
Usage tutorial:
Straight to the code:
<?php
// Load Composer's autoloader (created by `composer require jaeger/querylist`)
require __DIR__ . '/vendor/autoload.php';

use QL\QueryList;

$url = 'https://www.biqudu.com/14_14778/';

// Fetch the page manually...
$html = file_get_contents($url);
// ...then hand the HTML string to QueryList.
// QueryList::setHtml($html) is equivalent; both expect HTML, not a URL.
// To let QueryList fetch the page itself, use $ql = QueryList::get($url); instead.
$ql = QueryList::html($html);

// Collection rules replace traditional regular expressions.
// Each rule is [CSS selector, what to extract].
$rules = [
    'link' => ['a', 'href'], // the href attribute of every <a> element
    'text' => ['a', 'text'], // the text content of every <a> element
];

// rules() sets the rules, query() runs them, getData() returns a Collection,
// and all() converts that Collection into a two-dimensional array.
$data = $ql->rules($rules)->query()->getData()->all();

// Print the results
print_r($data);
That is the basic usage, and with it we can already capture a set of data.
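To show the principle behind these rules without any third-party library, here is a minimal sketch using PHP's built-in DOM extension (`DOMDocument` and `DOMXPath`). It assumes each rule is `[tagName, attribute]`, whereas QueryList itself accepts full CSS selectors, and the function name `collectByRules` is hypothetical. Each rule first yields a column of values, and the columns are then zipped by index into rows, producing the same kind of two-dimensional array as the example above.

```php
<?php
// Minimal sketch of rule-based collection with PHP's DOM extension.
// Assumption: each rule is [tagName, attribute], where 'text' means the
// element's text content; real QueryList rules accept full CSS selectors.
function collectByRules(string $html, array $rules): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid, so collect libxml warnings instead of printing them
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);

    // Step 1: build one column of values per rule
    $columns = [];
    foreach ($rules as $key => [$tag, $attr]) {
        $columns[$key] = [];
        foreach ($xpath->query('//' . $tag) as $node) {
            $columns[$key][] = $attr === 'text'
                ? trim($node->textContent)
                : $node->getAttribute($attr);
        }
    }

    // Step 2: zip the columns by index into rows (a two-dimensional array)
    $rows = [];
    foreach ($columns as $key => $values) {
        foreach ($values as $i => $value) {
            $rows[$i][$key] = $value;
        }
    }
    return $rows;
}

$html = '<ul><li><a href="/1.html">Chapter 1</a></li>'
      . '<li><a href="/2.html">Chapter 2</a></li></ul>';

print_r(collectByRules($html, [
    'link' => ['a', 'href'],
    'text' => ['a', 'text'],
]));
```

For the sample HTML the call returns two rows, each with a `link` and a `text` key. Note that `loadHTML()` assumes ISO-8859-1 unless the document declares its charset, which is one common source of the garbled text that QueryList's encoding handling is meant to address.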