Characters and string handling in C (ANSI characters and Unicode characters)

  • 2020-04-02 03:01:17
  • OfStack

As we know, the C language uses the char data type to represent an 8-bit ANSI character. By default, when a string literal is declared in code, the C compiler stores its characters as an array of 8-bit char values:


// An 8-bit character
char c = 'A';
// An array of up to 99 8-bit characters and an 8-bit terminating zero
char szBuffer[100] = "A String";
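
To make the difference between the buffer's capacity and the length of the stored string concrete, here is a minimal sketch. The struct and function names are hypothetical and only serve the illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// Hypothetical helper type: sizes measured on a narrow (char) buffer.
typedef struct {
    size_t length;   // characters before the terminating zero
    size_t capacity; // total size of the array in bytes
} NarrowInfo;

NarrowInfo describe_narrow_buffer(void)
{
    char szBuffer[100] = "A String";
    NarrowInfo info;

    // Elements not covered by the initializer are set to zero.
    assert(szBuffer[99] == '\0');

    info.length = strlen(szBuffer);    // counts 'A', ' ', 'S', 't', 'r', 'i', 'n', 'g'
    info.capacity = sizeof(szBuffer);  // sizeof(char) is 1 by definition
    return info;
}
```

Calling describe_narrow_buffer() should report a length of 8 and a capacity of 100: the array always occupies 100 bytes, however short the string stored in it.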

Microsoft's C/C++ compiler defines a built-in data type, wchar_t, which represents a 16-bit Unicode (UTF-16) character. The compiler defines this data type only when the /Zc:wchar_t compiler switch is specified (recent versions of Visual C++ enable it by default).

Declare Unicode characters and strings as follows:


// A 16-bit character
wchar_t c = L'A';
// An array of up to 99 16-bit characters and a 16-bit terminating zero
wchar_t szBuffer[100] = L"A String";

The uppercase L before the literal tells the compiler to compile it as a Unicode string.
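
As a rough sketch of what the L prefix changes in practice, the following measures a wide buffer with the standard wcslen function. The struct and function names are hypothetical. Note that wchar_t is 16 bits with Microsoft's compiler, as described above, but other compilers may use a different width, so the byte count is expressed in terms of sizeof(wchar_t) rather than a fixed number:

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

// Hypothetical helper type: sizes measured on a wide (wchar_t) buffer.
typedef struct {
    size_t length;   // wide characters before the terminating zero
    size_t elements; // number of wchar_t elements in the array
    size_t bytes;    // total size of the array in bytes
} WideInfo;

WideInfo describe_wide_buffer(void)
{
    wchar_t szBuffer[100] = L"A String";
    WideInfo info;

    // The terminating zero is itself a wide character.
    assert(szBuffer[99] == L'\0');

    info.length = wcslen(szBuffer);                       // counts elements, not bytes
    info.elements = sizeof(szBuffer) / sizeof(szBuffer[0]);
    info.bytes = sizeof(szBuffer);                        // elements * sizeof(wchar_t)
    return info;
}
```

The string length and element count match the narrow case (8 and 100); only the size in bytes differs, because each element is now a wchar_t instead of a char.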

In addition, you can write code so that it compiles with either ANSI or Unicode characters and strings. Winnt.h defines the following types and macros:


#ifdef UNICODE
typedef WCHAR TCHAR, *PTCHAR, *PTSTR;
typedef CONST WCHAR *PCTSTR;
#define __TEXT(quote) L##quote
#else
typedef CHAR TCHAR, *PTCHAR, *PTSTR;
typedef CONST CHAR *PCTSTR;
#define __TEXT(quote) quote
#endif
#define TEXT(quote) __TEXT(quote)

Code written with these types and macros compiles with either ANSI or Unicode characters, as shown below:


// If UNICODE is defined, a 16-bit character; otherwise an 8-bit character
TCHAR c = TEXT('A');
// If UNICODE is defined, an array of 16-bit characters; otherwise of 8-bit characters
TCHAR szBuffer[100] = TEXT("A String");
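
The same pattern can be tried without the Windows headers. The sketch below is an assumption-laden stand-in, not the Winnt.h definitions: MY_UNICODE, MY_TCHAR, MY_TEXT and my_tcslen are hypothetical names that mirror UNICODE, TCHAR, TEXT and a length function selected at compile time:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

// Hypothetical stand-ins for the Winnt.h types and macros.
#ifdef MY_UNICODE
typedef wchar_t MY_TCHAR;
#define MY_TEXT_(quote) L##quote
#define my_tcslen wcslen
#else
typedef char MY_TCHAR;
#define MY_TEXT_(quote) quote
#define my_tcslen strlen
#endif
// The extra macro level lets an argument that is itself a macro
// expand before the L prefix is pasted onto it.
#define MY_TEXT(quote) MY_TEXT_(quote)

// Length of the generic string; the source is identical in both builds.
size_t generic_length(void)
{
    MY_TCHAR szBuffer[100] = MY_TEXT("A String");
    assert(szBuffer[99] == 0);
    return my_tcslen(szBuffer);
}

// Size of one generic character: sizeof(char) or sizeof(wchar_t).
size_t generic_char_size(void)
{
    return sizeof(MY_TCHAR);
}
```

Compiling with -DMY_UNICODE (or /DMY_UNICODE) switches every MY_TCHAR to wchar_t and every MY_TEXT literal to a wide literal without touching the function bodies. On Windows, the real TCHAR and TEXT play this role, and the C runtime's tchar.h offers a comparable length routine, _tcslen, which is selected by _UNICODE rather than UNICODE.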

That's all for this article and I hope you enjoy it.

