CSharpFeeds - All your C# feeds in one place.

Sponsors

Tuesday, October 31, 2006

Removing diacritics (accents) from strings

by fmarguerie via Fabrice's weblog on 10/31/2006 4:50:00 AM

It's often useful to remove diacritic marks (often called accent marks) from characters. You know: tilde, cédille, umlaut and friends. This means 'é' becomes 'e', 'ü' becomes 'u' or 'à' becomes 'a'. This could be used for indexing or to build simple URLs, for example.
Doing so is not so easy if you don't know the trick. You can play with String.Replace or regular expressions... But do you know .NET 2 has all that is required to make this easier?

You should use this kind of code for example:

public static String RemoveDiacritics(String s)
{
  String normalizedString = s.Normalize(NormalizationForm.FormD);
  StringBuilder stringBuilder = new StringBuilder();

  for (int i = 0; i < normalizedString.Length; i++)
  {
    Char c = normalizedString[i];
    if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
      stringBuilder.Append(c);
  }

  return stringBuilder.ToString();
}

This piece of code comes from Michael Kaplan's blog. I've been using it to clean my URL keys. It works perfectly and I suspect it's more efficient than other solutions.

email it!bookmark it!digg it!

Original Post: Removing diacritics (accents) from strings

Legal Note

The content of the postings is owned by the respective author. CSharpFeeds is not responsible for the contents of the postings. This site is automatically generated and cannot be reviewed for abusive content. If you find abusive content on CSharpFeeds, please contact us. Designated trademarks and brands are the property of their respective owners. All rights reserved.

Advertise with us