C#: fixing garbled text when parsing HTML with NSoup (NSoup tutorial)
- 2020-06-03 08:09:43
- OfStack
Download address: http://nsoup.codeplex.com/
Basic usage is simple. To parse an HTML string:
NSoup.Nodes.Document doc = NSoup.NSoupClient.Parse(HtmlString);
To fetch and parse a page directly from the web:
NSoup.Nodes.Document doc = NSoup.NSoupClient.Connect("https://www.ofstack.com/").Get();
Unfortunately, NSoup decodes pages as UTF-8 by default, so Chinese text can come out garbled. Pages that are actually encoded in UTF-8 are not affected, but pages in other encodings such as GB2312 may be (thanks to forhells for pointing this out).
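The garbling itself can be reproduced without NSoup. The following is a minimal sketch, assuming a runtime where `CodePagesEncodingProvider` is available (it is needed on .NET Core and later, where GB2312 is not registered by default). The class name `GarbledDemo` and the method `RoundTrip` are made up for this illustration.

```csharp
using System;
using System.Text;

public static class GarbledDemo
{
    // Encodes text as GB2312 bytes, then decodes those bytes with the given charset.
    public static string RoundTrip(string text, string decodeAs)
    {
        // Makes legacy code pages such as GB2312 available on .NET Core / .NET 5+.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        byte[] gbBytes = Encoding.GetEncoding("gb2312").GetBytes(text);
        return Encoding.GetEncoding(decodeAs).GetString(gbBytes);
    }

    public static void Main()
    {
        // Decoding with the matching charset returns the original text.
        Console.WriteLine(RoundTrip("中文", "gb2312"));
        // Decoding GB2312 bytes as UTF-8 yields U+FFFD replacement characters.
        Console.WriteLine(RoundTrip("中文", "utf-8"));
    }
}
```

The bytes never change; only the charset used to decode them does, which is why both solutions come down to supplying the right encoding name.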
So far I have found two solutions:
1. Download the page source yourself, decode it, and then parse the string
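// Requires: using System.Net; using System.Text;
// Pass the page's actual encoding to GetEncoding (for example "gb2312" for a GB2312 page).
WebClient webClient = new WebClient();
string HtmlString = Encoding.GetEncoding("utf-8").GetString(webClient.DownloadData("https://www.ofstack.com"));
NSoup.Nodes.Document doc = NSoup.NSoupClient.Parse(HtmlString);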
2. Get the response stream and pass it to NSoup together with the encoding name
WebRequest webRequest=WebRequest.Create("https://www.ofstack.com");
NSoup.Nodes.Document doc = NSoup.NSoupClient.Parse(webRequest.GetResponse().GetResponseStream(),"utf-8");
The second approach is more convenient, but I think the first is more appropriate. After all, NSoup is an HTML parsing library, and downloading pages should not be its job.
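Both solutions hard-code the encoding name, so each page's charset has to be known in advance. One possible way to choose it at run time is to read the charset from the `Content-Type` header and fall back to a `<meta>` declaration in the first bytes of the page. This is only a rough sketch: `CharsetSniffer` and `DetectCharset` are hypothetical names rather than part of NSoup, and the regular expression covers only the common forms of the declaration.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

public static class CharsetSniffer
{
    // Matches charset=xxx, charset="xxx" or charset='xxx' (case-insensitive).
    private const string Pattern = @"charset\s*=\s*[""']?([\w-]+)";

    // Returns the charset named in the Content-Type header, else the one declared
    // near the start of the body, else the fallback.
    public static string DetectCharset(string contentType, byte[] body, string fallback = "utf-8")
    {
        Match m = Regex.Match(contentType ?? "", Pattern, RegexOptions.IgnoreCase);
        if (m.Success) return m.Groups[1].Value.ToLowerInvariant();

        // Charset declarations are plain ASCII, so an ASCII view of the first
        // kilobyte is enough to find one, whatever the real encoding is.
        string head = Encoding.ASCII.GetString(body, 0, Math.Min(body.Length, 1024));
        m = Regex.Match(head, Pattern, RegexOptions.IgnoreCase);
        return m.Success ? m.Groups[1].Value.ToLowerInvariant() : fallback;
    }
}
```

With the first solution, the returned name could be passed to `Encoding.GetEncoding` in place of the literal "utf-8", using the bytes from `webClient.DownloadData` as `body` and `webClient.ResponseHeaders["Content-Type"]` as the header value. On .NET Core and later, legacy code pages such as GB2312 also need `Encoding.RegisterProvider(CodePagesEncodingProvider.Instance)` to be called first.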