Summary of C++ character types: wchar_t, char, WCHAR

  • 2020-05-17 05:58:38
  • OfStack

1. wchar_t, char, WCHAR

ANSI: char is the character type, used with the string manipulation functions whose names start with str, such as strcat(), strcpy(), and strlen().

UNICODE: wchar_t is the data type of a Unicode character. In standard C++ it is a built-in type; older Microsoft compilers (or builds where native wchar_t support is switched off) define it in the C headers as:

typedef unsigned short wchar_t;

In addition, the Windows header winnt.h contains the definition typedef wchar_t WCHAR; so WCHAR is simply wchar_t.

wchar_t works with the string manipulation functions whose names start with wcs, such as wcscat(), wcscpy(), and wcslen(). For the compiler to treat a string literal as Unicode, prefix it with "L", for example: const wchar_t *szTest = L"This is a Unicode string.";
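As a rough illustration of the wcs functions mentioned above, the following sketch joins two wide strings. The helper name JoinWide is made up for this example and is not part of any library.

```cpp
#include <cwchar>
#include <string>

// Hypothetical helper: joins two wide strings with the wcs* functions.
std::wstring JoinWide(const wchar_t* first, const wchar_t* second) {
    // wcslen counts wchar_t units (not bytes) and excludes the terminator.
    std::wstring buffer(std::wcslen(first) + std::wcslen(second) + 1, L'\0');
    std::wcscpy(&buffer[0], first);   // wide counterpart of strcpy
    std::wcscat(&buffer[0], second);  // wide counterpart of strcat
    buffer.resize(std::wcslen(buffer.c_str()));  // drop the spare terminator slot
    return buffer;
}
```

Calling JoinWide(L"Wide ", L"string") should give the wide string L"Wide string". Note that both arguments need the L prefix; a plain "string" literal would not compile here.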

2. TCHAR

The C runtime headers check the _UNICODE macro (with underscore) and the Windows headers check the UNICODE macro (without underscore). When both are defined, the program is built against the Unicode versions of the types and functions; otherwise it is built as ANSI. Defining the macros does not convert anything by itself, though: the code also has to be written with the following series of generic character definitions.

(1) TCHAR

If the UNICODE macro is defined, TCHAR is defined as wchar_t:

typedef wchar_t TCHAR;

Otherwise TCHAR is defined as char:

typedef char TCHAR;

(2) LPTSTR

If the UNICODE macro is defined, LPTSTR is defined as LPWSTR:

typedef LPWSTR LPTSTR;

Otherwise LPTSTR is defined as LPSTR:

typedef LPSTR LPTSTR;

Note: wrap string constants in _TEXT("MyStr") or _T("MyStr") so that the literal switches between narrow and wide along with the build.
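The switching mechanism described in this section can be imitated in a few lines of portable C++. This is only a sketch of the idea: MY_UNICODE, MyTchar, MY_T, and my_tcslen are invented stand-ins for UNICODE, TCHAR, _T, and _tcslen, and the real headers are more elaborate.

```cpp
#include <cstring>
#include <cwchar>

// Hypothetical stand-ins for UNICODE, TCHAR, _T and _tcslen.
// Define MY_UNICODE before this point to select the wide variants.
#ifdef MY_UNICODE
typedef wchar_t MyTchar;
#define MY_T(s) L##s
#define my_tcslen std::wcslen
#else
typedef char MyTchar;
#define MY_T(s) s
#define my_tcslen std::strlen
#endif

// The same source line compiles as char or wchar_t code depending on the macro.
const MyTchar* const kGreeting = MY_T("MyStr");

inline std::size_t GreetingLength() {
    return my_tcslen(kGreeting);
}
```

The point of the design is that kGreeting and GreetingLength never mention char or wchar_t directly, so one macro decides the character width for the whole translation unit.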

3. BSTR

BSTR is a string with a length prefix. Its memory is managed by the operating system, so it has to be created and released through API functions such as SysAllocString and SysFreeString. It is mainly used for interoperating with VB (the string type in VB is a BSTR).

In VC, the classes that wrap it include _bstr_t, and CComBSTR in ATL.

A BSTR consists of a header, which holds the length of the string, followed by the string data, which may contain embedded null characters.

A BSTR is passed as a pointer (a pointer is a variable that holds the memory address of another variable rather than the data itself), and that pointer refers to the first character of the string, not to the length prefix. A BSTR is Unicode, that is, two bytes per character, and it normally ends with a two-byte null character. A wstr is a wide-character string, in which each character occupies two bytes. BSTR exists for compatibility with the original Basic string: the 4 bytes in front of the data hold the length in bytes, and the data is followed by '\0'.
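To make the layout concrete, here is a minimal portable model of the memory block described above: a 4-byte length prefix, the character data, then a two-byte terminator. The function names are hypothetical and the sketch only mimics the layout; real code should always go through SysAllocString and SysFreeString.

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical model of the BSTR memory layout, for illustration only.
std::vector<unsigned char> MakeBstrLikeBlock(const char16_t* text, std::uint32_t charCount) {
    const std::uint32_t byteLength = charCount * sizeof(char16_t);
    // 4-byte length prefix + character data + 2-byte null terminator, all zeroed.
    std::vector<unsigned char> block(sizeof(byteLength) + byteLength + sizeof(char16_t), 0);
    std::memcpy(block.data(), &byteLength, sizeof(byteLength));        // prefix: length in bytes
    std::memcpy(block.data() + sizeof(byteLength), text, byteLength);  // data, may hold embedded nulls
    return block;  // the terminator bytes are already zero
}

// A BSTR pointer refers to the first character, just past the prefix.
inline const unsigned char* DataPointer(const std::vector<unsigned char>& block) {
    return block.data() + sizeof(std::uint32_t);
}

// Reads back the value stored in the prefix (the byte count without the terminator).
inline std::uint32_t StoredByteLength(const std::vector<unsigned char>& block) {
    std::uint32_t byteLength = 0;
    std::memcpy(&byteLength, block.data(), sizeof(byteLength));
    return byteLength;
}
```

Because the length is stored rather than computed by scanning for a terminator, a block built from three characters with a null in the middle still reports a length of 6 bytes.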

4. Further type definitions for strings and their pointers

Because the function lists in the Win32 API documentation use the generic names of functions (for example, "SetWindowText"), all strings there are declared with TCHAR. (The exceptions are the Unicode-only APIs introduced in XP.) Below is a list of commonly used typedefs, which you can also find in MSDN.

Type      Meaning in MBCS builds   Meaning in Unicode builds
WCHAR     wchar_t                  wchar_t
LPSTR     char*                    char*
LPCSTR    const char*              const char*
LPWSTR    wchar_t*                 wchar_t*
LPCWSTR   const wchar_t*           const wchar_t*
TCHAR     char                     wchar_t
LPTSTR    TCHAR*                   TCHAR*
LPCTSTR   const TCHAR*             const TCHAR*

5. Converting between types

(1) Converting char* to CString

To convert a char* to a CString, you can use CString::Format in addition to direct assignment. For example:

char chArray[] = "This is a test";
const char * p = "This is a test";

or

LPCSTR p = "This is a test";

Or, in an application built with Unicode defined:

const TCHAR * p = _T("This is a test");

or

LPCTSTR p = _T("This is a test");

Then:

CString theString = chArray;           // direct initialization
theString.Format(_T("%hs"), chArray);  // %hs means a narrow (char) string in both ANSI and Unicode builds
theString = p;                         // direct assignment

(2) Converting CString to char*

When converting the CString class to the char* (LPSTR) type, the following three methods are often used:

Method 1, using a cast.

For example:

CString theString( "This is a test" );
// The cast only exposes CString's internal buffer: do not write through lpsz.
LPTSTR lpsz =(LPTSTR)(LPCTSTR)theString;

Method 2, using strcpy.

For example:

CString theString( "This is a test" );
LPTSTR lpsz = new TCHAR[theString.GetLength()+1];
_tcscpy(lpsz, theString);
// ... use lpsz ...
delete[] lpsz; // release the copy when finished

Note that the second parameter of strcpy (or _tcscpy, its generic-text counterpart that works in both Unicode and MBCS builds) is const wchar_t* (Unicode) or const char* (ANSI); the compiler applies CString's LPCTSTR conversion operator automatically.

Method 3, using CString::GetBuffer.

For example:

CString s(_T("This is a test "));
LPTSTR p = s.GetBuffer();
// add code that uses p here
if(p != NULL) *p = _T('\0');
s.ReleaseBuffer();
// call ReleaseBuffer promptly after use so that other CString member functions can be used again

(3) Converting BSTR to char*

Method 1, using ConvertBSTRToString.

For example:

#include <comutil.h>   // _com_util::ConvertBSTRToString
#include <tchar.h>     // _tmain, _TCHAR
#pragma comment(lib, "comsupp.lib")
int _tmain(int argc, _TCHAR* argv[]){
    BSTR bstrText = ::SysAllocString(L"Test");
    char* lpszText2 = _com_util::ConvertBSTRToString(bstrText);
    SysFreeString(bstrText); // free the BSTR when done
    delete[] lpszText2;      // the converted copy was allocated with new[]
    return 0;
}

Method 2, using the overloaded conversion operator of _bstr_t.

For example:

_bstr_t b = bstrText;
char* lpszText2 = b; // points into b's internal buffer; valid only while b is alive

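(4) Converting char* to BSTR

Method 1, using API functions such as SysAllocString.

For example (three alternatives; free each result with SysFreeString):

BSTR bstrText = ::SysAllocString(L"Test");           // copies up to the null terminator
BSTR bstrText = ::SysAllocStringLen(L"Test",4);      // copies exactly 4 wide characters
BSTR bstrText = ::SysAllocStringByteLen("Test",4);   // copies 4 raw bytes without converting them to Unicode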

Method 2, using either COleVariant or _variant_t.

For example:

//COleVariant strVar("This is a test");
_variant_t strVar("This is a test");
BSTR bstrText = strVar.bstrVal; // owned by strVar; valid only while strVar is alive

Method 3, using _bstr_t, which is one of the simplest.

For example:

_bstr_t b("This is a test");
BSTR bstrText = b.copy(); // b frees its own BSTR, so take a copy and release it with SysFreeString

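Method 4, using CComBSTR.

For example:

BSTR bstrText = CComBSTR("This is a test").Detach(); // Detach hands ownership to the caller; free with SysFreeString

or

CComBSTR bstr("This is a test");
BSTR bstrText = bstr.m_str; // still owned by bstr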

Method 5, using ConvertStringToBSTR.

For example:

const char* lpszText = "Test";
BSTR bstrText = _com_util::ConvertStringToBSTR(lpszText); // free with SysFreeString when done

(5) Converting CString to BSTR

This is usually done with CStringT::AllocSysString.

For example:

CString str("This is a test");
BSTR bstrText = str.AllocSysString();
...
SysFreeString (bstrText); // free the BSTR when done

(6) Converting BSTR to CString

Generally, the following methods can be used:

BSTR bstrText = ::SysAllocString(L"Test");
CStringA str;
str.Empty();
str = bstrText;

or

CStringA str(bstrText);

(7) Conversion between ANSI, Unicode, and wide characters

Method 1: use MultiByteToWideChar to convert ANSI characters to Unicode characters, and WideCharToMultiByte to convert Unicode characters to ANSI characters.
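A common way to call MultiByteToWideChar is in two passes: first to ask for the required buffer size, then to convert. The sketch below shows that pattern under a few assumptions: NarrowToWide is a made-up helper name, CP_ACP (the system ANSI code page) is chosen for illustration, and on non-Windows platforms std::mbstowcs is swapped in so the sample still compiles.

```cpp
#include <cstdlib>
#include <cstring>
#include <string>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: converts a narrow string to a wide string.
std::wstring NarrowToWide(const char* text) {
#ifdef _WIN32
    // First call: ask for the required size in wchar_t units, terminator included.
    const int needed = MultiByteToWideChar(CP_ACP, 0, text, -1, nullptr, 0);
    if (needed <= 0) return std::wstring();
    std::wstring wide(static_cast<std::size_t>(needed), L'\0');
    // Second call: perform the conversion into the buffer.
    MultiByteToWideChar(CP_ACP, 0, text, -1, &wide[0], needed);
    wide.resize(static_cast<std::size_t>(needed) - 1);  // drop the terminator
    return wide;
#else
    // Assumption: outside Windows, std::mbstowcs stands in for the API call.
    // The wide result never needs more units than the narrow input has bytes.
    std::wstring wide(std::strlen(text) + 1, L'\0');
    const std::size_t written = std::mbstowcs(&wide[0], text, wide.size());
    if (written == static_cast<std::size_t>(-1)) return std::wstring();
    wide.resize(written);
    return wide;
#endif
}
```

The reverse direction with WideCharToMultiByte follows the same two-pass shape, with the sizes counted in bytes instead of wide characters.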

Method 2: use "_T" to turn a literal into a generic-type string, "L" to turn it into a Unicode string, and, in a managed C++ environment, S to turn it into a String* object. For example:

TCHAR tstr[] = _T("this is a test");
wchar_t wszStr[] = L"This is a test";
String* str = S"This is a test";

Method 3, using the conversion macros and classes of ATL 7.0. ATL 7.0 improves on the original 3.0 macros, adds many new string conversion macros, and provides corresponding classes. Their names share one uniform form:

CSourceType2[C]DestinationType[EX]

Here the first C stands for "class", which distinguishes these names from the ATL 3.0 macros; the second C stands for constant; 2 means "to"; and EX means the size of the internal fixed buffer is given as a template argument. SourceType and DestinationType can be A, T, W, and OLE, meaning ANSI, generic type, Unicode, and OLE string, respectively. For example, CA2CT converts ANSI to a constant string of the generic type.

Here's some sample code:


// Keep each conversion object in a named variable: the pointer is only valid while the object lives.
CA2TEX<16> tconv("this is a test");
LPTSTR tstr = tconv;
CA2CT tcconv("this is a test");
LPCTSTR tcstr = tcconv;
wchar_t wszStr[] = L"This is a test";
CW2A aconv(wszStr);
char* chstr = aconv;

