06 May 2008
I recently discovered an absolutely amazing HTML parsing library for .NET called HtmlAgilityPack. It completely takes away the pain of parsing complicated HTML with regular expressions.
Here’s a very simple example of what you can do with it. It extracts the inner HTML of every element in an HTML file that has the CSS class “scrape” assigned to it:
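using System;
using HtmlAgilityPack;
public partial class _Default : System.Web.UI.Page
{
    protected void Page_Load(object sender, EventArgs e)
    {
        HtmlDocument doc = new HtmlDocument();
        // (the lines that load the HTML and call Parse(doc.DocumentNode) are missing from this listing)
    }
    private void Parse(HtmlNode n)
    {
        foreach (HtmlAttribute atr in n.Attributes)
            if (atr.Name == "class" && atr.Value == "scrape") { /* the statement that handles n.InnerHtml is missing from this listing */ }
        if (n.HasChildNodes)
            foreach (HtmlNode cn in n.ChildNodes)
                Parse(cn); // recurse into each child node
    }
}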
That’s just a very small part of what it could do. I’ll expand upon this and post a few more examples in the future showing some interesting things you could do with this.
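As a hedged alternative to walking the tree by hand, the same extraction can be sketched with the library's XPath support. The class name `ScrapeSketch` and the method `ExtractScraped` are made up for this illustration; `LoadHtml`, `DocumentNode`, `SelectNodes` and `InnerHtml` are HtmlAgilityPack members. Like the attribute check above, the XPath only matches elements whose class attribute is exactly "scrape".

```csharp
using System;
using HtmlAgilityPack; // third-party library, must be referenced by the project

public static class ScrapeSketch
{
    // Returns the inner HTML of every element whose class attribute equals "scrape".
    public static string[] ExtractScraped(string html)
    {
        HtmlDocument doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//*[@class='scrape']");
        if (nodes == null)
        {
            return new string[0];
        }

        string[] result = new string[nodes.Count];
        for (int i = 0; i < nodes.Count; i++)
        {
            result[i] = nodes[i].InnerHtml;
        }
        return result;
    }
}
```

Calling `ScrapeSketch.ExtractScraped(html)` from Page_Load and writing each entry to the response would give roughly the same result as the recursive version.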
23 Apr 2008
This is a simple little random fact generator which will show a new fact every time the page loads. After the initial load it will store the XML in the cache until the file is changed again.
XML: (Facts.xml)
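<?xml version="1.0" encoding="utf-8" ?>
<!-- The element tags were lost from this listing. <fact> matches the //fact query in the code below; the root name <facts> is an assumption. -->
<facts>
  <fact>The number '172' can be found on the back of the U.S. $5
  bill in the bushes at the base of the Lincoln Memorial.</fact>
  <fact>President Kennedy was the fastest random speaker in the world
  with upwards of 350 words per minute.</fact>
  <fact>In the average lifetime, a person will walk the equivalent of 5
  times around the equator.</fact>
</facts>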
Code: (RandomFact.ascx.cs)
using System;
using System.Data;
using System.Configuration;
using System.Collections;
using System.Web;
using System.Web.Caching;
using System.Web.Security;
using System.Web.UI;
using System.Web.UI.WebControls;
using System.Web.UI.WebControls.WebParts;
using System.Web.UI.HtmlControls;
using System.Xml;
using System.IO;
using System.ComponentModel;
using System.Drawing.Design;
public partial class _controls_RandomFact : System.Web.UI.UserControl
{
    private string _xmlDataSource;
    public string XMLDataSource
    {
        get { return _xmlDataSource; }
        set { _xmlDataSource = value; }
    }
    protected void Page_Load(object sender, EventArgs e)
    {
        litFact.Text = getRandomFact();
    }
    private string getRandomFact()
    {
        Random rndIndex = new Random();
        XmlDocument xmlDocFacts = new XmlDocument();
        string strFact = string.Empty;
        try
        {
            if (Cache["xmlDocFacts"] != null)
                xmlDocFacts = (XmlDocument)Cache["xmlDocFacts"];
            else
            {
                // Load call inferred; the original line is missing from this listing.
                xmlDocFacts.Load(Server.MapPath(XMLDataSource));
                // The file dependency drops the cached document when the XML file changes.
                Cache.Insert("xmlDocFacts", xmlDocFacts, new CacheDependency(Server.MapPath(XMLDataSource)));
            }
            XmlNodeList xmlNodesMessage = xmlDocFacts.SelectNodes("//fact");
            int rnd = rndIndex.Next(0, xmlNodesMessage.Count);
            strFact = Server.HtmlEncode(xmlNodesMessage[rnd].InnerText);
        }
        catch (Exception ex)
        {
            strFact = string.Format("<b>Error:</b> {0}", Server.HtmlEncode(ex.Message));
        }
        return strFact;
    }
}
Usage: (Default.aspx)
<uc1:RandomFact ID="RandomFact1" runat="server" XMLDataSource="App_Data/Facts.xml" />
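For trying the selection logic outside of a page, here is a minimal console sketch of the same idea: load the XML, query `//fact`, and pick one node at random. The names `RandomFactSketch` and `PickFact` and the two sample facts are placeholders invented for this example, and the empty-list guard is an addition that the control above does not have.

```csharp
using System;
using System.Xml;

public static class RandomFactSketch
{
    // One shared generator for the sketch; Random is not thread-safe.
    private static readonly Random Rng = new Random();

    // Returns the text of a randomly chosen <fact> element, or an empty string if there are none.
    public static string PickFact(string xml)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml);

        XmlNodeList facts = doc.SelectNodes("//fact");
        if (facts == null || facts.Count == 0)
        {
            return string.Empty;
        }

        // Next(0, Count) returns an index from 0 to Count - 1.
        return facts[Rng.Next(0, facts.Count)].InnerText.Trim();
    }

    public static void Main()
    {
        string xml = "<facts><fact>First fact.</fact><fact>Second fact.</fact></facts>";
        Console.WriteLine(PickFact(xml));
    }
}
```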
22 Apr 2008
Based on the reader comments on my previous entry on this topic I was able to fix some of the issues that others were experiencing.
I changed how the output is read: instead of reading the entire stream at once, it’s now read line by line as the ErrorDataReceived and OutputDataReceived events are raised. I also added an extra option to the command line (-ar 44100) to set the audio sampling frequency explicitly, since the default wasn’t being applied to some video formats, which resulted in an error. And lastly, the console window is now hidden.
private void ConvertVideo(string srcURL, string destURL)
{
    string ffmpegURL = "~/project/tools/ffmpeg.exe";
    DirectoryInfo directoryInfo = new DirectoryInfo(Path.GetDirectoryName(Server.MapPath(ffmpegURL)));
    ProcessStartInfo startInfo = new ProcessStartInfo();
    startInfo.FileName = Server.MapPath(ffmpegURL);
    startInfo.Arguments = string.Format("-i \"{0}\" -aspect 1.7777 -ar 44100 -f flv \"{1}\"", srcURL, destURL);
    startInfo.WorkingDirectory = directoryInfo.FullName;
    startInfo.UseShellExecute = false;
    startInfo.RedirectStandardOutput = true;
    startInfo.RedirectStandardInput = true;
    startInfo.RedirectStandardError = true;
    startInfo.CreateNoWindow = true;
    startInfo.WindowStyle = ProcessWindowStyle.Hidden;
    using (Process process = new Process())
    {
        process.StartInfo = startInfo;
        process.EnableRaisingEvents = true;
        process.ErrorDataReceived += new DataReceivedEventHandler(process_ErrorDataReceived);
        process.OutputDataReceived += new DataReceivedEventHandler(process_OutputDataReceived);
        process.Exited += new EventHandler(process_Exited);
        try
        {
            // The statements in this try block are missing from this listing and are inferred.
            // The DataReceived events are only raised after the Begin...ReadLine calls.
            process.Start();
            process.BeginErrorReadLine();
            process.BeginOutputReadLine();
            process.WaitForExit();
        }
        catch (Exception ex)
        {
            lblError.Text = ex.ToString();
        }
        finally
        {
            process.ErrorDataReceived -= new DataReceivedEventHandler(process_ErrorDataReceived);
            process.OutputDataReceived -= new DataReceivedEventHandler(process_OutputDataReceived);
            process.Exited -= new EventHandler(process_Exited);
        }
    }
}
void process_OutputDataReceived(object sender, DataReceivedEventArgs e)
{
    if (e.Data != null)
        lblStdout.Text += e.Data + "<br />";
}
void process_ErrorDataReceived(object sender, DataReceivedEventArgs e)
{
    if (e.Data != null)
        lblStderr.Text += e.Data + "<br />";
}
void process_Exited(object sender, EventArgs e)
{
    //Post-processing code goes here
}
18 Apr 2008
Recently I was involved in a project where I had to make heavy use of AJAX. I realized there are a few simple things you could do to improve performance.
1) Combine scripts
<ajaxToolkit:ToolkitScriptManager ID="TSM1" runat="Server"
CombineScriptsHandlerUrl="~/CombineScriptsHandler.ashx" />
As the name of the property suggests, it will pretty much combine all the needed JS files into one which in turn will reduce the number of requests sent to the server. You can find a detailed discussion about this here.
It is pretty easy to implement. Instead of the regular ScriptManager, switch to the ToolkitScriptManager that comes with the AjaxControlToolkit, set its CombineScriptsHandlerUrl property as shown above, and drop CombineScriptsHandler.ashx (included in the “SampleWebSite” directory of the AjaxControlToolkit release package) into the site root.
2) Run in release mode
The debug versions of the AJAX library have their source formatting preserved, as well as some debug asserts. By running in release mode you can shave some bytes off every script download.
<ajaxToolkit:ToolkitScriptManager ID="TSM1" runat="Server"
EnablePartialRendering="true" ScriptMode="Release" />
It’s important to note, though, that some versions of Safari don’t seem to be compatible with this, which can cause many strange side effects, as this person and I have experienced in the past.
On a side note, the ASP.NET AJAX Control Toolkit officially does not support Macs with PowerPC processors. That’s good to know if a client ever demands an explanation for why AJAX-powered functionality seems to be broken or doesn’t work as expected in that environment.
3) Enable script caching and compression in web.config
<scriptResourceHandler enableCompression="true" enableCaching="true" />
This will compress and cache all the script files which are embedded as resources in an assembly, localization objects, and scripts that are served by the script resource handler.
But like the previous tip, there is an exception to this one too. Some versions of IE6 have a bug where they can’t handle GZIP’d script files correctly. The RTM version of ASP.NET AJAX works around this by explicitly not compressing files for these versions of IE. If you are still having a problem, though, the safe bet might be to explicitly set the enableCompression property to false in the web.config.
08 Apr 2008
In my previous post about exception logging, I showed how to log several different parameters related to the exception in the database. Request.Browser.Crawler is one of them, and it’s used to track search engine crawlers. It warrants its own entry since it requires some extra setup in the web.config to work correctly.
You’ll have to add the following code in the <system.web> section of your web.config file:
<browserCaps>
  <!-- This section is used by Request.Browser.Crawler property to detect search engine crawlers -->
  <!-- The body of each case was lost from this listing; crawler=true is the minimal assignment that makes Request.Browser.Crawler return true. -->
  <filter>
    <!-- check Google (Yahoo uses this as well) -->
    <case match="^Googlebot(\-Image)?/(?'version'(?'major'\d+)(?'minor'\.\d+)).*">crawler=true</case>
    <!-- check Alta Vista (Scooter) -->
    <case match="^Scooter(/|-)(?'version'(?'major'\d+)(?'minor'\.\d+)).*">crawler=true</case>
    <!-- check Alta Vista (Mercator) -->
    <case match="Mercator">crawler=true</case>
    <!-- check Slurp (Yahoo uses this as well) -->
    <case match="Slurp">crawler=true</case>
    <!-- check MSN -->
    <case match="MSNBOT">crawler=true</case>
    <!-- check Northern Light -->
    <case match="^Gulliver/(?'version'(?'major'\d+)(?'minor'\.\d+)).*">crawler=true</case>
    <!-- check Excite -->
    <case match="ArchitextSpider">crawler=true</case>
    <!-- Lycos -->
    <case match="Lycos_Spider">crawler=true</case>
    <!-- Ask Jeeves -->
    <case match="Ask Jeeves">crawler=true</case>
    <!-- check Fast -->
    <case match="^FAST-WebCrawler/(?'version'(?'major'\d+)(?'minor'\.\d+)).*">crawler=true</case>
    <!-- IBM Research Web Crawler -->
    <case match="http\:\/\/www\.almaden.ibm.com\/cs\/crawler">crawler=true</case>
  </filter>
</browserCaps>
Now what does it all mean? Well, ASP.NET uses the information in the <browserCaps> section of your config file to detect whether the client browser is a crawler or not. If you look at it closely, it’s basically a set of regular expression filters. I presume you could add more filters in the same format to detect other kinds of crawlers.
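Before adding a filter of your own, it can help to try the pattern against a few user agent strings. The sketch below does that with the Googlebot pattern from the listing above, on the assumption that the filters are applied to the User-Agent string. The names `CrawlerPatternSketch` and `TryMatch` and the sample strings are made up for this illustration, and the code only exercises the regular expression, not ASP.NET's own matching.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CrawlerPatternSketch
{
    // Pattern copied from the Googlebot case in the listing above.
    private const string GooglebotPattern =
        @"^Googlebot(\-Image)?/(?'version'(?'major'\d+)(?'minor'\.\d+)).*";

    // Returns true when the user agent matches; version receives the captured 'version' group.
    public static bool TryMatch(string userAgent, out string version)
    {
        Match m = Regex.Match(userAgent, GooglebotPattern);
        version = m.Success ? m.Groups["version"].Value : null;
        return m.Success;
    }

    public static void Main()
    {
        string[] samples =
        {
            "Googlebot/2.1 (+http://www.google.com/bot.html)",
            "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
        };

        foreach (string ua in samples)
        {
            string version;
            bool isCrawler = TryMatch(ua, out version);
            Console.WriteLine("{0} -> {1}", ua, isCrawler ? "crawler, version " + version : "no match");
        }
    }
}
```

Because the pattern is anchored with ^, a user agent that only mentions Googlebot somewhere in the middle of the string would not match it; an unanchored pattern such as the one used for Slurp would.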
Update: For the most accurate and updated version of browserCaps and other useful browser testing/detection resources you can go to one of these sites: