Characters and string handling in C (ANSI characters and Unicode characters)
- 2020-04-02 03:01:17
- OfStack
As we know, the C language uses the char data type to represent an 8-bit ANSI character. By default, when a string literal appears in the code, the C compiler stores its characters as an array of 8-bit char values:
// An 8-bit character
char c = 'A';
// An array of up to 99 8-bit characters and an 8-bit terminating zero
char szBuffer[100] = "A String";
Microsoft's C/C++ compiler defines a built-in data type, wchar_t, which represents a 16-bit Unicode (UTF-16) character. The compiler defines this data type only when the /Zc:wchar_t compiler switch is specified.
Declare Unicode characters and strings as follows:
// A 16-bit character
wchar_t c = L'A';
// An array of up to 99 16-bit characters and a 16-bit terminating zero
wchar_t szBuffer[100] = L"A String";
The uppercase L before the string literal tells the compiler that the string should be compiled as a Unicode string.
In addition, you can write code so that it compiles with either ANSI or Unicode characters and strings. WinNT.h defines the following types and macros:
#ifdef UNICODE
typedef WCHAR TCHAR, *PTCHAR, *PTSTR;
typedef CONST WCHAR *PCTSTR;
#define __TEXT(quote) L##quote
#else
typedef CHAR TCHAR, *PTCHAR, *PTSTR;
typedef CONST CHAR *PCTSTR;
#define __TEXT(quote) quote
#endif
#define TEXT(quote) __TEXT(quote)
Code written with these types and macros compiles with either ANSI or Unicode characters, as shown below:
// If UNICODE is defined, a 16-bit character; otherwise an 8-bit character
TCHAR c = TEXT('A');
// If UNICODE is defined, an array of 16-bit characters; otherwise an array of 8-bit characters
TCHAR szBuffer[100] = TEXT("A String");
That's all for this article and I hope you enjoy it.