CSharpFeeds - All your C# feeds in one place.

Sponsors

Thursday, September 13, 2007

Resolve and shorten URLs in C#

by Mads Kristensen via .NET Slave on 9/13/2007 7:20:00 PM

Recently I’ve needed a method that would look at some text and automatically discover all URLs and turn them into hyperlinks. I’ve done that before so it was a matter of copy/paste. This time it was a little more complicated, because the resolved URLs could not be longer than 50 characters long. That was important because otherwise it would break the design. A long URL doesn’t word wrap so it would end up bleeding out of the design.

So, the challenge was to resolve the URLs and turn them into links, while keeping the anchor text at a max of 50 characters long. To shorten the URL is easy enough, but it all comes down to how you want it shortened.

The rules

1. If the URL is longer than 50 characters then remove “http://”.
2. If it still is longer than allowed it must compress the folder structure like shown below.

http://www.microsoft.com/windows/server/2003/compare.aspx -> http://www.microsoft.com/.../compare.aspx

3. If the URL is still longer, then it must look for query strings and fragments and remove them as well.

The code

private static readonly Regex regex = new Regex("((http://|www\\.)([A-Z0-9.-:]{1,})\\.[0-9A-Z?;~&#=\\-_\\./]{2,})", RegexOptions.Compiled | RegexOptions.IgnoreCase);
private static readonly string link = "<a href=\"{0}{1}\">{2}</a>";

public static string ResolveLinks(string body)
{
  if (string.IsNullOrEmpty(body))
    return body; 

  foreach (Match match in regex.Matches(body))
  {
    if (!match.Value.Contains("://"))
    {         
      body = body.Replace(match.Value, string.Format(link, "http://", match.Value, ShortenUrl(match.Value, 50)));
    }
    else
    {
      body = body.Replace(match.Value, string.Format(link, string.Empty, match.Value, ShortenUrl(match.Value, 50)));
    }
  }

  return body;
}

private static string ShortenUrl(string url, int max)
{
  if (url.Length <= max)
    return url;

  // Remove the protocal
  int startIndex = url.IndexOf("://");
  if (startIndex > -1)
    url = url.Substring(startIndex + 3);

  if (url.Length <= max)
    return url;

  // Remove the folder structure
  int firstIndex = url.IndexOf("/") + 1;
  int lastIndex = url.LastIndexOf("/");
  if (firstIndex < lastIndex)
    url = url.Replace(url.Substring(firstIndex, lastIndex - firstIndex), "...");

  if (url.Length <= max)
    return url;

  // Remove URL parameters
  int queryIndex = url.IndexOf("?");
  if (queryIndex > -1)
    url = url.Substring(0, queryIndex);

  if (url.Length <= max)
    return url;

  // Remove URL fragment
  int fragmentIndex = url.IndexOf("#");
  if (fragmentIndex > -1)
    url = url.Substring(0, fragmentIndex);

  if (url.Length <= max)
    return url;

  // Shorten page
  firstIndex = url.LastIndexOf("/") + 1;
  lastIndex = url.LastIndexOf(".");
  if (lastIndex - firstIndex > 10)
  {
    string page = url.Substring(firstIndex, lastIndex - firstIndex);
    int length = url.Length - max + 3;
    url = url.Replace(page, "..." + page.Substring(length));
  }

  return url;
}

Implementation

To use these methods, just call the ResolveLinks method like so:

string body = ResolveLinks(txtComment.Text);

It works on URLs with or without the http:// protocol prefix. In other words http://www.example.com/ and http://www.example.com/ resolves to the same URL. This technique is implemented in the comments on this blog. You can test it by writing a comment with a URL in it.

email it!bookmark it!digg it!

Original Post: Resolve and shorten URLs in C#

Subscribe

New Feed

Product Spotlight

Recently Updated Sources

Legal Note

The content of the postings is owned by the respective author. CSharpFeeds is not responsible for the contents of the postings. This site is automatically generated and cannot be reviewed for abusive content. If you find abusive content on CSharpFeeds, please contact us. Designated trademarks and brands are the property of their respective owners. All rights reserved.

Advertise with us