Implementing a simple web crawler in C#: fetching a page's HTML source

  • 2021-09-11 21:05:46
  • OfStack

I recently finished a simple web crawler. At first I was lost and did not know where to start. I later found plenty of material, but useful code that actually met my needs was hard to come by. I am posting this article so that others who want to build the same feature can avoid some detours.

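The first step is to fetch the page's HTML source and collect the href values inside the <ul class="post_list"></ul> node. Add these directives first: using System.IO; using System.Net; using System.Text; (the last one is needed for the Encoding and StringBuilder types used in the code).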


private void Search(string url)
{
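 string rl;
 StringBuilder sb = new StringBuilder();
 WebRequest Request = WebRequest.Create(url.Trim());
 
 // Dispose the response, stream, and reader so the connection is released.
 // Encoding.Default may not match the page's charset; change it if the text comes out garbled.
 using (WebResponse Response = Request.GetResponse())
 using (Stream resStream = Response.GetResponseStream())
 using (StreamReader sr = new StreamReader(resStream, Encoding.Default))
 {
  // Read the body line by line; line breaks are dropped
  while ((rl = sr.ReadLine()) != null)
  {
   sb.Append(rl);
  }
 }
 
 // Lowercase so the tag match is case-insensitive (this also lowercases the extracted URLs)
 string str = sb.ToString().ToLower();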
 
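 // Take the content of the <ul class="post_list"> element
 string str_get = mid(str, "<ul class=\"post_list\">", "</ul>");
 
 int start = 0;
 while (str_get != null)
 {
  // Next href value; start receives the index just past its closing quote
  string strResult = mid(str_get, "href=\"", "\"", out start);
  if (strResult == null)
   break;
 
  lab[url] += strResult;   // lab is not defined in this article
  // Continue searching after the link just found
  str_get = str_get.Substring(start);
 }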
}
 
 
 
 
// Returns the text between startString and endString, or null if either is missing
private string mid(string istr, string startString, string endString)
{
 int iBodyStart = istr.IndexOf(startString, 0);    // Position of the start marker
 if (iBodyStart == -1)
  return null;
 iBodyStart += startString.Length;       // Move past the start marker
 int iBodyEnd = istr.IndexOf(endString, iBodyStart);   // Position of the end marker, searching after the start marker
 if (iBodyEnd == -1)
  return null;
 // Take only the text between the markers, excluding the end marker itself
 string strResult = istr.Substring(iBodyStart, iBodyEnd - iBodyStart);
 return strResult;
}
 
 
// Same as above, but also reports through iBodyEnd the index just past the end marker
private string mid(string istr, string startString, string endString, out int iBodyEnd)
{
 // An out parameter must be assigned before the method can return
 iBodyEnd = 0;
 
 int iBodyStart = istr.IndexOf(startString, 0);    // Position of the start marker
 if (iBodyStart == -1)
  return null;
 iBodyStart += startString.Length;       // Move past the start marker
 iBodyEnd = istr.IndexOf(endString, iBodyStart);   // Position of the end marker, searching after the start marker
 if (iBodyEnd == -1)
  return null;
 // Take only the text between the markers, excluding the end marker itself
 string strResult = istr.Substring(iBodyStart, iBodyEnd - iBodyStart);
 iBodyEnd += endString.Length;        // Index just past the end marker, so the caller can continue from there
 return strResult;
}
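
To check what this kind of marker-based extraction returns without hitting a live site, the following is a minimal, self-contained sketch that runs the same idea on a hard-coded HTML string. The class name LinkExtractorDemo, the method names Between and ExtractLinks, and the sample HTML are assumptions made for illustration only; the links are collected into a List<string> because the article does not say what lab is.

```csharp
using System;
using System.Collections.Generic;

public static class LinkExtractorDemo
{
    // Returns the text between startString and endString, or null if either is missing.
    // next receives the index just past endString (0 when nothing was found).
    public static string Between(string text, string startString, string endString, out int next)
    {
        next = 0;
        int bodyStart = text.IndexOf(startString, StringComparison.Ordinal);
        if (bodyStart == -1)
            return null;
        bodyStart += startString.Length;
        int bodyEnd = text.IndexOf(endString, bodyStart, StringComparison.Ordinal);
        if (bodyEnd == -1)
            return null;
        next = bodyEnd + endString.Length;
        return text.Substring(bodyStart, bodyEnd - bodyStart);
    }

    // Collects every href value inside the first <ul class="post_list"> element.
    public static List<string> ExtractLinks(string html)
    {
        var links = new List<string>();
        int next;
        string list = Between(html, "<ul class=\"post_list\">", "</ul>", out next);
        while (list != null)
        {
            string href = Between(list, "href=\"", "\"", out next);
            if (href == null)
                break;
            links.Add(href);
            // Continue after the link just found
            list = list.Substring(next);
        }
        return links;
    }

    public static void Main()
    {
        // Hypothetical sample page: one link outside the list, two inside it
        string html =
            "<div><a href=\"/ignored\">x</a></div>" +
            "<ul class=\"post_list\">" +
            "<li><a href=\"/post/1\">One</a></li>" +
            "<li><a href=\"/post/2\">Two</a></li>" +
            "</ul>";

        // Prints /post/1 and /post/2, one per line
        foreach (string link in ExtractLinks(html))
            Console.WriteLine(link);
    }
}
```

Plain string matching like this only works when the page writes the tag exactly as `<ul class="post_list">`; extra attributes, single quotes, or different spacing would make the match fail, so treat it as a quick approach for one known page layout.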

That is all the code. To run it, you will need to adapt some details to your own project.
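
One detail that may need adapting: href values are often relative (for example /post/1), so they have to be combined with the page address before they can be requested. The sketch below shows one possible way to do that with the standard System.Uri class. The class name LinkResolverDemo, the method name Resolve, and the example.com addresses are assumptions for illustration, not part of the original code.

```csharp
using System;

public static class LinkResolverDemo
{
    // Resolves href against the address of the page it was found on.
    // Returns null when the inputs do not form a valid URI.
    public static string Resolve(string pageUrl, string href)
    {
        Uri baseUri;
        if (!Uri.TryCreate(pageUrl, UriKind.Absolute, out baseUri))
            return null;

        Uri result;
        if (!Uri.TryCreate(baseUri, href, out result))
            return null;

        return result.AbsoluteUri;
    }

    public static void Main()
    {
        // Hypothetical page address
        string page = "http://example.com/blog/index.html";

        Console.WriteLine(Resolve(page, "/post/1"));                 // http://example.com/post/1
        Console.WriteLine(Resolve(page, "post/2"));                  // http://example.com/blog/post/2
        Console.WriteLine(Resolve(page, "http://other.example/x"));  // http://other.example/x
    }
}
```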

That is the whole of this article; I hope it helps with your studies.

