From Bugzilla Helper:
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)

Description of problem:
This happens for many encodings but the test case uses ISO-10646/UCS2/. When converting from UCS4-LE to ISO-10646/UCS2/ iconv consumes certain characters without producing anything in the output buffer and also without giving any errors. This happens for example for characters 0xE0000 to 0xE007F. Please see the attached C program:

    #include <stdio.h>
    #include <wchar.h>
    #include <iconv.h>

    int main()
    {
        wchar_t inbuf[16] = { 0xE0000, 0, };
        char outbuf[16] = { 0, };
        char *in_ptr = (char *) inbuf;
        size_t in_size = sizeof(wchar_t);
        char *out_ptr = outbuf;
        size_t out_size = sizeof outbuf;
        iconv_t enc = iconv_open("ISO-10646/UCS2/", "UCS-4LE");
        int n = iconv(enc, &in_ptr, &in_size, &out_ptr, &out_size);
        printf("n = %d in_size = %d out_size = %d\n", n, in_size, out_size);
        iconv_close(enc);
        return 0;
    }

Version-Release number of selected component (if applicable):
glibc-2.3.3-27

How reproducible:
Always

Steps to Reproduce:
1. gcc -o iconv_bug2 iconv_bug2.c
2. ./iconv_bug2

Actual Results:
n = 0 in_size = 0 out_size = 16

Expected Results:
Depending on the encoding, either an error (n = -1) or some output produced (out_size < 16).

Additional info:
Created attachment 102608
test case

File name is iconv_bug2.c
This is not a bug. The UCS4 values in the range 0xE0000 to 0xE007F are tag characters. They produce no output but are always recognized.
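For a caller who wants to know ahead of time whether some of the input falls in that range, a pre-scan could look roughly like the sketch below. The range bounds are taken from the comment above; the helper names is_tag_char and count_tag_chars are hypothetical and are not part of glibc or the iconv interface. The buffer is assumed to hold UCS-4 code points already in host byte order.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: nonzero if cp lies in the tag range
   0xE0000..0xE007F (inclusive) mentioned in this report. */
static int is_tag_char(uint32_t cp)
{
    return cp >= 0xE0000u && cp <= 0xE007Fu;
}

/* Hypothetical helper: count the tag characters in a buffer of
   UCS-4 code points (assumed to be in host byte order). */
static size_t count_tag_chars(const uint32_t *buf, size_t len)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        if (is_tag_char(buf[i]))
            n++;
    }
    return n;
}
```

A nonzero count from count_tag_chars tells the caller that the conversion described here would consume those code points without writing anything for them.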
The Unicode specification recommends ignoring tag characters when processing text, but that recommendation seems aimed at code that has some understanding of the characters, not at code that merely converts between encodings. Encoding conversion should not be lossy. Otherwise converting back and forth does not round-trip safely, and anything that actually wants to use tag characters cannot use these conversion routines, because the tag characters will be gone.
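Since iconv signals no error in this case, a caller who cares about lossless conversion has to notice the loss on its own. One possible check, sketched here under assumptions, compares the remaining input and output byte counts before and after a single iconv() call. The helper name consumed_without_output is hypothetical. The check is only a heuristic: a stateful encoding may legitimately consume input without writing output right away.

```c
#include <stddef.h>

/* Hypothetical helper: given iconv()'s return value and the remaining
   input/output byte counts before and after the call, report whether
   input was consumed while nothing was written and no error was raised. */
static int consumed_without_output(size_t ret,
                                   size_t in_before, size_t in_after,
                                   size_t out_before, size_t out_after)
{
    if (ret == (size_t)-1)
        return 0;   /* iconv reported an error, so this is not a silent drop */
    return in_after < in_before && out_after == out_before;
}
```

With the numbers from the report (return value 0, input going from 4 bytes to 0 assuming a 4-byte wchar_t, output staying at 16), the helper returns nonzero.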
The code is correct. Everything you perceive as a problem is your opinion and I don't agree with it. Don't reopen this bug again, there is no bug.