[olug] OT? C programming question.

Rob Townley rob.townley at gmail.com
Wed Sep 1 21:37:51 UTC 2010


Since this may be an international application, he may want to verify
that he is not outputting multibyte or wide characters to something
expecting single-byte characters.  One simple but not all-inclusive
test is to write 10 ASCII characters to a file; if the size of the
file is about 20 bytes, it is a two-byte encoding such as UTF-16.  In
UTF-8, the first 128 characters (plain ASCII) are one byte each, and
everything after that takes two to four bytes.  I found out the hard
way that a few characters in the lower 128 are not as standardized
across older national character sets, the ~tilde among them.

I would not be surprised if Ubuntu uses some form of Unicode by
default.  My Fedora 13 shell uses UTF-8, but the program below
reports the plain "C" locale's codeset (what Windows programmers
would call ANSI) because it never calls setlocale.  But this depends
on so many factors.  Or he has multiple languages installed, which is
so easy in Ubuntu.

Just because it is a terminal does not mean it is ASCII!  He should
probably call setlocale to be certain it is all 8bit byte ASCII or
whatever the equipment expects.  Manually computing the length of the
strings passed to write --- just asking for problems.  That is what
strlen and its internationalized cousins are designed for.  There is
no longer necessarily a one-to-one correspondence between the number
of characters and the number of bytes, and there hasn't been for
about a decade.

`locale` will tell you what character set your _shell_ is using.

The following code says what locale your _program_ is running with:
#include <stdio.h>
#include <langinfo.h>

int main(void)
{
  /* Without a prior setlocale(LC_ALL, ""), this reports the default
     "C" locale's codeset (typically ANSI_X3.4-1968), not the shell's. */
  printf("CODESET USED = \"%s\"\n", nl_langinfo(CODESET));
  return 0;
}

man nl_langinfo
man utf-8
man charset
man unicode

fwrite, fprintf, fopen

locale        (run from bash prompt, gives language and collating type)
locale -m     (run from bash prompt, lists all supported character maps)
locale -a     (run from bash prompt, lists all installed locales)


