I Can Code Too: Fetch HTML source of a web page in C#.NET

Wednesday, 17 August 2011


I am working with my teammates on a web-crawling project.
The very first step of the project is to extract the HTML source of a web page.
So, let's see how we can do it.
We need two namespaces beyond the usual ones:

using System.Net;
using System.IO;

and the code is:

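// Build the request; change the URL and the user-agent string as needed.
HttpWebRequest request = (HttpWebRequest)WebRequest.Create("http://lpu.in");
request.UserAgent = "LPU Crawler";
string httptxt;
// The using blocks dispose the response, stream, and reader when we are done.
using (WebResponse response = request.GetResponse())
using (Stream stream = response.GetResponseStream())
using (StreamReader reader = new StreamReader(stream))
{
    httptxt = reader.ReadToEnd(); // the full HTML source as one string
}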

You can change the URL as you wish, and the name of the user agent too. "HttpWebRequest" belongs to the namespace "System.Net", while "Stream" and "StreamReader" belong to "System.IO".
"httptxt" now holds the HTML source of the web page, so play with it as you want to.
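
In a crawler this code will run against many pages, and some of them will fail to load, so it may help to wrap the steps above in a method. Below is a minimal sketch of one way to do that, not the only way. The names "PageFetcher", "FetchHtml" and "ReadAll", the 10-second timeout, and the choice of UTF-8 are all my own assumptions for illustration. The sketch also assumes the URL is a well-formed http or https address; a malformed one would throw before the request is sent.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

// Hypothetical helper class; the name is an assumption, not part of any library.
static class PageFetcher
{
    // Reads an entire stream as text using the given encoding.
    public static string ReadAll(Stream stream, Encoding encoding)
    {
        using (var reader = new StreamReader(stream, encoding))
        {
            return reader.ReadToEnd();
        }
    }

    // Returns the HTML source of the page, or null if the request fails.
    public static string FetchHtml(string url, string userAgent)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.UserAgent = userAgent;
        request.Timeout = 10000; // milliseconds; an arbitrary choice

        try
        {
            using (var response = request.GetResponse())
            using (var stream = response.GetResponseStream())
            {
                // Assumes the page is UTF-8; a real crawler might inspect
                // the response headers to pick the encoding instead.
                return ReadAll(stream, Encoding.UTF8);
            }
        }
        catch (WebException ex)
        {
            // Covers timeouts, DNS failures, and error status codes such as 404.
            Console.Error.WriteLine("Fetch failed: " + ex.Message);
            return null;
        }
    }
}
```

Calling PageFetcher.FetchHtml("http://lpu.in", "LPU Crawler") should then give the same string as "httptxt" above, or null when the page could not be fetched, so the crawler can skip that page and move on.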