Unicode and ANSI modes

Because not all platforms supported by wxWidgets fully support Unicode yet, it is often unwise to write a program which can only work in a Unicode environment. A better solution is to write programs in such a way that they can be compiled either in ANSI (traditional) mode or in Unicode mode.

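This can be achieved quite simply by using the means provided by wxWidgets. Basically, there are only a few things to watch out for: the character type (char or wchar_t), literal strings and characters (such as "Hello, world!" or '*'), the string functions (such as strlen()), and the special preprocessor tokens (__FILE__, __DATE__ and __TIME__).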

Let's look at them in order. First of all, a character in a Unicode program generally does not fit into a single byte, so a type other than char (which usually holds only 1 byte) should be used to store characters. This type is called wchar_t, which stands for wide-character type.

Also, the string and character constants should be encoded using wide characters (wchar_t type), which typically take 2 or 4 bytes instead of char, which only takes one. This is achieved by using the standard C (and C++) way: just put the letter 'L' immediately before any string or character constant and it becomes a wide-character one, for example L"Hello, world!" or L'*'. Note that the 'L' is a prefix here; an 'L' placed after a constant is only valid as the suffix of a long integer constant and has nothing to do with wide characters.
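
The difference between narrow and wide constants can be illustrated with a minimal sketch. The names narrow_str, wide_str and the three helper functions are hypothetical and are not part of wxWidgets; only standard C++ is assumed.

```cpp
#include <cstddef>

// The same constants written once as narrow and once as wide literals.
const char    narrow_ch    = '*';
const wchar_t wide_ch      = L'*';
const char    narrow_str[] = "Hello";
const wchar_t wide_str[]   = L"Hello";

// Size in bytes of one wide character on this platform (commonly 2 or 4).
inline std::size_t wide_char_size() { return sizeof(wchar_t); }

// Storage used by each array, including the terminating null character.
inline std::size_t narrow_bytes() { return sizeof(narrow_str); }
inline std::size_t wide_bytes()   { return sizeof(wide_str); }
```

Both arrays hold six characters (five letters plus the terminator), but the wide one occupies sizeof(wchar_t) times as many bytes as the narrow one.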

Of course, the usual standard C functions don't work with wchar_t strings, so another set of functions exists which does the same thing but accepts wchar_t * instead of char *. For example, a function to get the length of a wide-character string is called wcslen() (compare with strlen() - you see that the only difference is that the "str" prefix standing for "string" has been replaced with "wcs" standing for "wide-character string").
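
As a small sketch of this naming correspondence, the following wrappers call the narrow and the wide standard library functions side by side. The wrapper names (narrow_length, wide_length, narrow_equal, wide_equal) are made up for illustration; strlen(), wcslen(), strcmp() and wcscmp() are the standard functions.

```cpp
#include <cstddef>
#include <cstring>  // strlen(), strcmp()
#include <cwchar>   // wcslen(), wcscmp()

// Length in characters (not bytes) of a narrow string.
inline std::size_t narrow_length(const char* s) { return std::strlen(s); }

// Length in characters (not bytes) of a wide string.
inline std::size_t wide_length(const wchar_t* ws) { return std::wcslen(ws); }

// Comparison follows the same pattern: strcmp() becomes wcscmp().
inline bool narrow_equal(const char* a, const char* b)
{
    return std::strcmp(a, b) == 0;
}

inline bool wide_equal(const wchar_t* a, const wchar_t* b)
{
    return std::wcscmp(a, b) == 0;
}
```

Note that wcslen() counts wide characters, so for plain ASCII text it returns the same number as strlen() does for the narrow version of the string.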

And finally, the standard preprocessor tokens __FILE__, __DATE__ and __TIME__ expand to ANSI strings, but Unicode strings are more likely to be wanted in the Unicode build. wxWidgets provides the macros __TFILE__, __TDATE__ and __TTIME__, which behave exactly like the standard ones except that they produce ANSI strings in the ANSI build and Unicode strings in the Unicode build.
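
One way such macros could be built is with preprocessor token pasting. The sketch below is only an illustration and is not the actual wxWidgets definition: MY_WIDEN, MY_WIDEN_IMPL, MY_TDATE, MY_UNICODE_BUILD and my_char are hypothetical names.

```cpp
#include <cstddef>
#include <cstring>
#include <cwchar>

// Two macro levels are needed so that __DATE__ is expanded to a string
// literal before the L prefix is pasted onto it.
#define MY_WIDEN_IMPL(x) L##x
#define MY_WIDEN(x) MY_WIDEN_IMPL(x)

// MY_UNICODE_BUILD is an assumed build switch, defined by the build system.
#ifdef MY_UNICODE_BUILD
    typedef wchar_t my_char;
    #define MY_TDATE MY_WIDEN(__DATE__)
#else
    typedef char my_char;
    #define MY_TDATE __DATE__
#endif

// Compilation date in the character type of the current build.
inline const my_char* build_date() { return MY_TDATE; }

// Compilation date as a wide string, regardless of the build mode.
inline const wchar_t* wide_build_date() { return MY_WIDEN(__DATE__); }
```

With a single-level macro the preprocessor would paste L directly onto the token __DATE__ and produce the undefined identifier L__DATE__, which is why the extra level of indirection is used.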

To summarize, here is a brief example of what a program which can be compiled in both ANSI and Unicode modes could look like:

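#ifdef __UNICODE__
    // Unicode build: wide character type, wide literals, wcs*() functions
    wchar_t wch = L'*';
    const wchar_t *ws = L"Hello, world!";
    size_t len = wcslen(ws);

    // %ls is the portable conversion for a wchar_t string argument
    wprintf(L"Compiled at %ls\n", __TDATE__);
#else // ANSI
    // ANSI build: plain char, narrow literals, str*() functions
    char ch = '*';
    const char *s = "Hello, world!";
    size_t len = strlen(s);

    printf("Compiled at %s\n", __DATE__);
#endif // Unicode/ANSI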

Of course, it would be nearly impossible to write such programs if it had to be done this way (try to imagine the number of #ifdef __UNICODE__ blocks an average program would have!). Luckily, there is another way - see the next section.

ymasuda, 19 November 2005