Using NSoup in C# to parse HTML: fixing garbled text (NSoup tutorial)

  • 2020-06-03 08:09:43
  • OfStack

Download address: http://nsoup.codeplex.com/

Parsing an HTML string takes a single call:


NSoup.Nodes.Document doc = NSoup.NSoupClient.Parse(HtmlString);

Fetching and parsing a page from the web:


NSoup.Nodes.Document doc = NSoup.NSoupClient.Connect("https://www.ofstack.com/").Get();

Unfortunately, NSoup decodes pages as UTF-8 by default, so Chinese text can come out garbled. Pages that really are encoded in UTF-8 are fine, but pages in other encodings, such as GB2312, may be garbled (thanks to forhells for pointing this out).
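The garbling can be reproduced without NSoup. The sketch below assumes the page bytes are GB2312: the two bytes 0xD6 0xD0 encode the character 中 in GB2312, but they are not a valid UTF-8 sequence, so a UTF-8 decoder substitutes the replacement character U+FFFD. The class and method names are illustrative and are not part of NSoup.

```csharp
using System;
using System.Text;

public static class MojibakeDemo
{
    // GB2312 bytes for the single character "中".
    public static readonly byte[] Gb2312Bytes = { 0xD6, 0xD0 };

    // Decode bytes as UTF-8, which is what happens when the wrong default is applied.
    public static string DecodeAsUtf8(byte[] bytes)
    {
        return Encoding.UTF8.GetString(bytes);
    }

    // True if the decoder substituted U+FFFD, i.e. the bytes were not valid UTF-8.
    public static bool LooksGarbled(string text)
    {
        return text.IndexOf('\uFFFD') >= 0;
    }

    public static void Main()
    {
        string wrong = DecodeAsUtf8(Gb2312Bytes);
        Console.WriteLine(LooksGarbled(wrong));

        // Decoding with the correct encoding recovers the text. On .NET Core / .NET 5+,
        // GB2312 is only available after registering CodePagesEncodingProvider
        // (System.Text.Encoding.CodePages package); on .NET Framework it is built in.
        // string right = Encoding.GetEncoding("gb2312").GetString(Gb2312Bytes);
    }
}
```

The fix in both solutions that follow comes down to the same thing: decode the bytes with the encoding the page was actually written in.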

So far I have found two solutions:

1. Download the page source first, then parse it


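// Requires: using System.Net; using System.Text;
// Decode the bytes yourself; replace "utf-8" with the page's actual encoding (e.g. "gb2312").
WebClient webClient = new WebClient();
String HtmlString = Encoding.GetEncoding("utf-8").GetString(webClient.DownloadData("https://www.ofstack.com"));
NSoup.Nodes.Document doc = NSoup.NSoupClient.Parse(HtmlString);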

2. Pass the response stream to NSoup along with the encoding name


// Replace "utf-8" with the page's actual encoding (e.g. "gb2312").
WebRequest webRequest = WebRequest.Create("https://www.ofstack.com");
NSoup.Nodes.Document doc = NSoup.NSoupClient.Parse(webRequest.GetResponse().GetResponseStream(), "utf-8");

The second is more convenient, but I think the first is more appropriate: NSoup is an HTML parsing library, and it should not be responsible for downloading pages.
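With the first approach the encoding does not have to be hard-coded. One possible refinement, sketched here under assumptions, reads the charset from the response's Content-Type header and falls back to UTF-8 when none is declared. `CharsetFromContentType` and `DownloadHtml` are hypothetical helper names, not NSoup APIs. Pages that declare their encoding only in a meta tag are not handled, and an unrecognized charset name will make `Encoding.GetEncoding` throw.

```csharp
using System;
using System.Net;
using System.Text;

public static class PageDownloader
{
    // Extracts the charset parameter from a Content-Type value such as
    // "text/html; charset=gb2312". Returns fallback when none is present.
    public static string CharsetFromContentType(string contentType, string fallback)
    {
        if (string.IsNullOrEmpty(contentType)) return fallback;
        foreach (string part in contentType.Split(';'))
        {
            string p = part.Trim();
            if (p.StartsWith("charset=", StringComparison.OrdinalIgnoreCase))
            {
                string value = p.Substring("charset=".Length).Trim().Trim('"');
                return value.Length > 0 ? value : fallback;
            }
        }
        return fallback;
    }

    // Downloads the raw bytes and decodes them with the charset the server declared.
    public static string DownloadHtml(string url)
    {
        using (WebClient webClient = new WebClient())
        {
            byte[] data = webClient.DownloadData(url);
            string contentType = webClient.ResponseHeaders["Content-Type"];
            string charset = CharsetFromContentType(contentType, "utf-8");
            // On .NET Core / .NET 5+, legacy encodings such as GB2312 need
            // CodePagesEncodingProvider registered before this call.
            return Encoding.GetEncoding(charset).GetString(data);
        }
    }

    public static void Main()
    {
        string html = DownloadHtml("https://www.ofstack.com");
        // Hand the decoded string to the parser, as in the first approach:
        // NSoup.Nodes.Document doc = NSoup.NSoupClient.Parse(html);
        Console.WriteLine(html.Length);
    }
}
```

Keeping the download and decoding in a helper like this leaves NSoup with parsing only, which is the division of responsibility argued for above.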

