How to Parse Twitter Usernames, Hashtags and URLs in C# 3.0

Lately I’ve been working on a side project called Twime. As part of that project, I wanted to turn the URLs, usernames and hashtags in a user’s Twitter timeline into clickable links. Here’s how I went about it:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Web;
using System.Text.RegularExpressions;

public static class HTMLParser
{
    public static string Link(this string s, string url)
    {
        return string.Format("<a href=\"{0}\" target=\"_blank\">{1}</a>", url, s);
    }
    public static string ParseURL(this string s)
    {
        return Regex.Replace(s, @"(http(s)?://)?([\w-]+\.)+[\w-]+(/\S\w[\w- ;,./?%&=]\S*)?", new MatchEvaluator(HTMLParser.URL));
    }
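    // Twitter usernames contain only letters, digits and underscores. The "+"
    // requires at least one character, so a lone "@" is left untouched.
    public static string ParseUsername(this string s)
    {
        return Regex.Replace(s, "(@)((?:[A-Za-z0-9_]+))", new MatchEvaluator(HTMLParser.Username));
    }
    // Hashtags are matched with the same ASCII character set as usernames.
    public static string ParseHashtag(this string s)
    {
        return Regex.Replace(s, "(#)((?:[A-Za-z0-9_]+))", new MatchEvaluator(HTMLParser.Hashtag));
    }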
    private static string Hashtag(Match m)
    {
        string x = m.ToString();
        string tag = x.Replace("#", "%23");
        return x.Link("http://search.twitter.com/search?q=" + tag);
    }
    private static string Username(Match m)
    {
        string x = m.ToString();
        string username = x.Replace("@", "");
        return x.Link("http://twitter.com/" + username);
    }
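    private static string URL(Match m)
    {
        string x = m.ToString();
        // The scheme is optional in the pattern, so add one to the href when it is
        // missing; otherwise the browser treats the link as a relative path.
        string href = m.Groups[1].Success ? x : "http://" + x;
        return x.Link(href);
    }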
}

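As you can see, I’m using the new extension methods feature in C# 3.0: the this modifier on the first parameter of each public method lets me call it as if it were an instance method of string.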
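For anyone new to extension methods, here is a minimal sketch of what the compiler does with them. The StringExtensions class and its Shout method are hypothetical names made up for this illustration; they are not part of Twime.

```csharp
using System;

public static class StringExtensions
{
    // The "this" modifier on the first parameter makes Shout an extension method.
    public static string Shout(this string s)
    {
        return s.ToUpperInvariant() + "!";
    }
}

public static class ExtensionDemo
{
    // Calls Shout both ways and reports whether the results are equal.
    // The extension-style call is only syntactic sugar for the static call.
    public static bool SameResult(string input)
    {
        string viaExtension = input.Shout();
        string viaStatic = StringExtensions.Shout(input);
        return viaExtension == viaStatic;
    }
}
```

The same applies to the parser above: tweet.ParseURL() is just another way of writing HTMLParser.ParseURL(tweet).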

Now I can simply chain the extension methods like this:

string tweet = "Just blogged about how to parse HTML from the @twitter timeline - http://jes.al/2009/05/how-to-parse-twitter-usernames-hashtags-and-urls-in-c-30/ #programming";
Response.Write(tweet.ParseURL().ParseUsername().ParseHashtag());

and the result should look something like this, with @twitter, the URL and #programming each rendered as a link:

Just blogged about how to parse HTML from the @twitter timeline - http://jes.al/2009/05/how-to-parse-twitter-usernames-hashtags-and-urls-in-c-30/ #programming

Just be sure to call ParseURL before ParseUsername and ParseHashtag. Those two methods insert anchor tags whose href attributes contain URLs, and if ParseURL ran after them it would match those generated URLs and wrap them in links a second time, breaking the markup.
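If the ordering requirement bothers you, one possible alternative is to do everything in a single pass, so that generated markup is never scanned again. This is only a sketch and not what Twime uses: the SinglePassParser class, its Linkify and Anchor methods, and the simplified URL pattern (which, unlike the pattern above, requires an explicit http:// or https:// scheme) are all assumptions made for this example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SinglePassParser
{
    // The URL alternative comes first, so a '#' or '@' inside a URL is
    // consumed as part of the URL instead of being linked separately.
    // Simplified pattern (assumption): a URL is a scheme plus any run of non-spaces.
    private static readonly Regex Token = new Regex(
        @"(?<url>https?://\S+)|@(?<user>[A-Za-z0-9_]+)|#(?<tag>[A-Za-z0-9_]+)");

    // Builds the same anchor markup as the Link method in the original class.
    private static string Anchor(string href, string text)
    {
        return string.Format("<a href=\"{0}\" target=\"_blank\">{1}</a>", href, text);
    }

    // Replaces every URL, username and hashtag in one scan of the input.
    public static string Linkify(string s)
    {
        return Token.Replace(s, m =>
        {
            if (m.Groups["url"].Success)
                return Anchor(m.Value, m.Value);
            if (m.Groups["user"].Success)
                return Anchor("http://twitter.com/" + m.Groups["user"].Value, m.Value);
            return Anchor("http://search.twitter.com/search?q=%23" + m.Groups["tag"].Value, m.Value);
        });
    }
}
```

Because each piece of the input is matched at most once, the order of calls no longer matters. The trade-off is that the simple URL pattern will also swallow trailing punctuation such as a full stop, so it would need tightening before real use.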

This was inspired by Simon Whatley’s post about doing something similar using prototyping with JavaScript.
