C# (.NET) Example: Converting Between Chinese Characters and Unicode Escape Codes

  • 2021-12-04 19:27:52
  • OfStack


{"Tilte": "\u535a\u5ba2\u56ed", "Href": "https://www.ofstack.com"}

JSON strings like this one are common: the Chinese characters they originally contained have been converted to Unicode escape codes.

Unicode code:

When Chinese characters are written as Unicode escapes, "王" (Wang), for example, becomes "\u738b". An escape starts with \u, followed by four hexadecimal digits (0-9, a-f); each pair of digits represents one byte, that is, a number from 0 to 255. A common Chinese character such as this one occupies two bytes in UTF-16, so "738b" splits into the two bytes "73" and "8b". Encoding.Unicode in .NET is little-endian UTF-16, which stores the low byte first, so in the byte array the two bytes appear in the order "8b", "73". When rebuilding the Chinese character from bytes, they must therefore be supplied in the order "8b", "73", the reverse of how they are written in the escape.
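The byte order described above can be checked with a short sketch. The class name ByteOrderDemo and the helper ToEscape are hypothetical names used only for illustration; the sketch assumes the input is a single character from the Basic Multilingual Plane, such as "王".

```csharp
using System;
using System.Text;

public static class ByteOrderDemo
{
    // Hypothetical helper: builds the \uXXXX escape for one UTF-16 code unit
    // by going through the little-endian byte array, as the text describes.
    public static string ToEscape(char c)
    {
        // Encoding.Unicode is UTF-16 little-endian: low byte first, high byte second.
        byte[] bytes = Encoding.Unicode.GetBytes(c.ToString());
        byte low = bytes[0];
        byte high = bytes[1];

        // Recombine high byte then low byte to get the 16-bit value.
        int codeUnit = (high << 8) | low;
        return "\\u" + codeUnit.ToString("x4");
    }

    public static void Main()
    {
        byte[] bytes = Encoding.Unicode.GetBytes("王");
        Console.WriteLine(BitConverter.ToString(bytes)); // 8B-73
        Console.WriteLine(ToEscape('王'));               // \u738b
    }
}
```

The array holds 0x8B before 0x73, while the escape is written with 73 before 8b, which is why the conversion code has to swap the two bytes.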

Implementing conversion between Unicode escapes and Chinese characters:


using System;
using System.Text;
using System.Text.RegularExpressions;

/// <summary>
/// Converts a string to Unicode escape sequences (\uXXXX).
/// </summary>
/// <param name="source">Source string</param>
/// <returns>Unicode-escaped string</returns>
public static string String2Unicode(string source)
{
 // Encoding.Unicode is UTF-16 little-endian: two bytes per code unit, low byte first.
 byte[] bytes = Encoding.Unicode.GetBytes(source);
 StringBuilder stringBuilder = new StringBuilder();
 for (int i = 0; i < bytes.Length; i += 2)
 {
  // Write the high byte (i + 1) before the low byte (i), each as two hex digits.
  stringBuilder.AppendFormat("\\u{0:x2}{1:x2}", bytes[i + 1], bytes[i]);
 }
 return stringBuilder.ToString();
}

/// <summary>
/// Converts Unicode escape sequences back to a normal string.
/// </summary>
/// <param name="source">Unicode-escaped string</param>
/// <returns>Normal string</returns>
public static string Unicode2String(string source)
{
 // Replace each \uXXXX escape with the UTF-16 code unit it denotes.
 return Regex.Replace(source, @"\\u([0-9A-F]{4})",
     x => ((char)Convert.ToUInt16(x.Groups[1].Value, 16)).ToString(),
     RegexOptions.IgnoreCase);
}
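A round trip through the two methods might look like the following sketch. The wrapper class UnicodeEscapeDemo is a hypothetical name, and the method bodies are condensed restatements of the two methods above so that the example compiles on its own.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

public static class UnicodeEscapeDemo
{
    // Condensed restatement of String2Unicode: every UTF-16 code unit
    // (including ASCII) becomes a \uXXXX escape.
    public static string String2Unicode(string source)
    {
        byte[] bytes = Encoding.Unicode.GetBytes(source);
        var sb = new StringBuilder();
        for (int i = 0; i < bytes.Length; i += 2)
        {
            // High byte first, then low byte.
            sb.AppendFormat("\\u{0:x2}{1:x2}", bytes[i + 1], bytes[i]);
        }
        return sb.ToString();
    }

    // Condensed restatement of Unicode2String: text outside escapes is left untouched.
    public static string Unicode2String(string source)
    {
        return Regex.Replace(source, @"\\u([0-9A-Fa-f]{4})",
            m => ((char)Convert.ToUInt16(m.Groups[1].Value, 16)).ToString());
    }

    public static void Main()
    {
        // The JSON sample from the top of the article, as a C# string literal.
        string json = "{\"Tilte\": \"\\u535a\\u5ba2\\u56ed\", \"Href\": \"https://www.ofstack.com\"}";
        Console.WriteLine(Unicode2String(json));

        Console.WriteLine(String2Unicode("王"));  // \u738b
        Console.WriteLine(String2Unicode("a王")); // \u0061\u738b
    }
}
```

Note that String2Unicode escapes ASCII characters as well, and that Unicode2String is a plain text substitution rather than a JSON parser: it would also rewrite an escaped backslash followed by u and four hex digits, which a real JSON parser would treat as literal text.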
