Using the WebClient class – with cookies!

Compared with the somewhat crude HttpWebRequest class, the WebClient class makes it easy to download data and strings from web servers and to upload them.

The rough edges of HttpWebRequest make many developers jump to WebClient (or to HttpClient, available since .NET 4.5), which is far easier to use. WebClient has one drawback, though: out of the box, it doesn’t handle cookies!

Fortunately, the class isn’t sealed, so we can extend it. Overriding a single method, as follows, gives WebClient support for cookies:

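using System;
using System.IO;
using System.Net;

namespace WebClientWithCookies
{
  // A WebClient that attaches one shared CookieContainer to every HTTP(S)
  // request it creates, so cookies set by the server are sent back later.
  public class CookieAwareWebClient : WebClient
  {
    public CookieContainer CookieContainer { get; set; }

    public CookieAwareWebClient()
      : base()
    {
      CookieContainer = new CookieContainer();
    }

    // WebClient calls this internally whenever it needs a request object.
    protected override WebRequest GetWebRequest(Uri address)
    {
      WebRequest request = base.GetWebRequest(address);

      // Only HTTP(S) requests carry cookies; FtpWebRequest and
      // FileWebRequest instances are passed through untouched.
      HttpWebRequest webRequest = request as HttpWebRequest;
      if (webRequest != null)
      {
        webRequest.CookieContainer = CookieContainer;
      }

      return request;
    }
  }
}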

The protected GetWebRequest method is called internally by WebClient every time it needs a WebRequest. By overriding it, you can call the base implementation and, when the result is an HttpWebRequest, attach the cookie container to it. For other schemes an FtpWebRequest or FileWebRequest may be returned instead, in which case we simply return it unchanged. HttpWebRequest and the corresponding HttpWebResponse keep the CookieContainer up to date internally: cookies received from the server are stored in it and sent along with later requests, without any extra code on your part.
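To get a feel for that bookkeeping, here is a small sketch that drives a CookieContainer directly, with no network involved. SetCookies and GetCookieHeader are real CookieContainer methods, but using them to mimic what HttpWebRequest does with the Set-Cookie and Cookie headers is a simplification on our part; the example.com URIs and the session cookie are made up. It is written as C# top-level statements (C# 9 or later).

```csharp
using System;
using System.Net;

var container = new CookieContainer();
var site = new Uri("https://example.com/");

// Roughly what happens when a response carries "Set-Cookie: session=abc123":
// the cookie is recorded against the URI that was requested.
container.SetCookies(site, "session=abc123");

// Roughly what happens before the next request: the container builds the
// Cookie header value for the target URI.
string header = container.GetCookieHeader(new Uri("https://example.com/account"));
Console.WriteLine(header); // session=abc123

// Cookies are scoped to their domain, so another host gets none of them.
int otherCount = container.GetCookies(new Uri("https://other.example.org/")).Count;
Console.WriteLine(otherCount); // 0
```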

One can now easily instantiate this custom WebClient and use it:

var webClient = new CookieAwareWebClient();
var loginPage = webClient.DownloadString(@"https://some-page-requiring-cookies");

For subsequent calls to the same site that need those cookies, you have to reuse the same webClient instance, because each CookieAwareWebClient has its own CookieContainer. If reuse isn’t possible, for example because the code using this WebClient is stateless, you can serialize the CookieContainer to disk and restore it after instantiating a new CookieAwareWebClient. To do so, add these two methods to the class:

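// Also add at the top of the file:
//   using System.Runtime.Serialization.Formatters.Binary;
// BinaryFormatter is obsolete since .NET 5 and no longer supported in .NET 9;
// never load a cookie file that could have been tampered with.
public void SaveCookies(string filename)
{
  // "using" closes the file even if serialization throws.
  using (var stream = File.Open(filename, FileMode.Create))
  {
    var formatter = new BinaryFormatter();
    formatter.Serialize(stream, CookieContainer);
  }
}

public void LoadCookies(string filename)
{
  using (var stream = File.Open(filename, FileMode.Open))
  {
    var formatter = new BinaryFormatter();
    CookieContainer = (CookieContainer)formatter.Deserialize(stream);
  }
}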
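If BinaryFormatter is not an option on your runtime, a different technique is to write the cookies out as plain data. The sketch below is our own assumption, not part of the class above: the helper names SaveCookiesAsJson and LoadCookiesFromJson are hypothetical, it relies on CookieContainer.GetAllCookies (.NET 6 or later) and System.Text.Json, and it only keeps each cookie’s name, value, domain, path, expiry, Secure and HttpOnly flags. It is written as top-level statements and round-trips a made-up example.com cookie through a temporary file.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Net;
using System.Text.Json;

// Hypothetical helper: each cookie becomes a JSON array of strings
// [name, value, domain, path, expires, secure, httpOnly].
void SaveCookiesAsJson(CookieContainer container, string filename)
{
    var rows = new List<string[]>();
    foreach (Cookie c in container.GetAllCookies()) // .NET 6 or later
    {
        rows.Add(new[]
        {
            c.Name, c.Value, c.Domain, c.Path,
            c.Expires.ToString("o", CultureInfo.InvariantCulture),
            c.Secure.ToString(), c.HttpOnly.ToString()
        });
    }
    File.WriteAllText(filename, JsonSerializer.Serialize(rows));
}

// Hypothetical helper: rebuilds a fresh container from the file written above.
CookieContainer LoadCookiesFromJson(string filename)
{
    var container = new CookieContainer();
    var rows = JsonSerializer.Deserialize<List<string[]>>(File.ReadAllText(filename))
               ?? new List<string[]>();
    foreach (string[] r in rows)
    {
        // Cookie(name, value, path, domain)
        var cookie = new Cookie(r[0], r[1], r[3], r[2])
        {
            Expires = DateTime.Parse(r[4], CultureInfo.InvariantCulture, DateTimeStyles.RoundtripKind),
            Secure = bool.Parse(r[5]),
            HttpOnly = bool.Parse(r[6])
        };
        container.Add(cookie);
    }
    return container;
}

// Round trip with a made-up session cookie.
var original = new CookieContainer();
original.SetCookies(new Uri("https://example.com/"), "session=abc123");

string file = Path.GetTempFileName();
SaveCookiesAsJson(original, file);
CookieContainer restored = LoadCookiesFromJson(file);
File.Delete(file);

string restoredHeader = restored.GetCookieHeader(new Uri("https://example.com/"));
Console.WriteLine(restoredHeader);
```

In the CookieAwareWebClient you would pass its CookieContainer to the save helper and assign the loaded container back to that property. Attributes not listed above (such as Port or Comment) are dropped by this sketch.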

So now you’re ready to log in to sites and to upload and download data as simple strings, with the WebClient implementation taking care of the encoding magic, among other things. Those strings are then ready to be parsed using, for example, the awesome Html Agility Pack.

Happy scraping!

This entry was posted in Tech.
