Bug 129657 - iconv consumes characters without any error or output
Product: Fedora
Classification: Fedora
Component: glibc
i686 Linux
Priority: medium  Severity: medium
Assigned To: Jakub Jelinek
Reported: 2004-08-11 11:25 EDT by Boleslaw Ciesielski
Modified: 2007-11-30 17:10 EST

Doc Type: Bug Fix
Last Closed: 2004-10-05 04:50:23 EDT

Attachments
test case (496 bytes, text/plain)
2004-08-11 11:27 EDT, Boleslaw Ciesielski

Description Boleslaw Ciesielski 2004-08-11 11:25:26 EDT
From Bugzilla Helper:
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)

Description of problem:
This happens for many encodings, but the test case uses
ISO-10646/UCS2/. When converting from UCS-4LE to ISO-10646/UCS2/,
iconv consumes certain characters without producing anything in the
output buffer and without reporting an error. This happens, for
example, for characters 0xE0000 to 0xE007F.

Please see the attached C program:

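#include <stdio.h>
#include <wchar.h>
#include <iconv.h>

int main(void)
{
  /* One tag character (0xE0000); assumes a 4-byte, little-endian wchar_t. */
  wchar_t inbuf[16] = { 0xE0000, 0, };
  char outbuf[16] = { 0, };
  char *in_ptr = (char *) inbuf;
  size_t in_size = sizeof(wchar_t);
  char *out_ptr = outbuf;
  size_t out_size = sizeof outbuf;
  iconv_t enc = iconv_open("ISO-10646/UCS2/", "UCS-4LE");
  if (enc == (iconv_t) -1) {
    perror("iconv_open");
    return 1;
  }
  /* iconv returns size_t; the cast makes (size_t) -1 print as -1. */
  int n = (int) iconv(enc, &in_ptr, &in_size, &out_ptr, &out_size);
  printf("n = %d  in_size = %zu  out_size = %zu\n", n, in_size, out_size);
  iconv_close(enc);
  return 0;
}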

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. gcc -o iconv_bug2 iconv_bug2.c
2. ./iconv_bug2


Actual Results:  n = 0  in_size = 0  out_size = 16

Expected Results:  Depending on the encoding, either an error
(n = -1) or some output produced (out_size < 16).

Additional info:
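For callers who want to notice this situation at run time, the condition in the actual results (success return, input consumed, output buffer untouched) can be tested directly. The following is only a minimal sketch: the helper name silently_consumed and its parameter order are made up for illustration and are not part of glibc or the iconv API. The "before" values are the sizes passed to iconv(), the "after" values are what iconv() left in them.

```c
#include <stddef.h>

/* Hypothetical helper (not part of glibc): returns 1 when an iconv() call
   reported success, consumed input, and wrote nothing to the output. */
static int silently_consumed(size_t rc,
                             size_t in_before, size_t in_after,
                             size_t out_before, size_t out_after)
{
    if (rc == (size_t) -1)
        return 0;                     /* a reported error is not silent */
    return in_after < in_before       /* some input bytes were consumed */
        && out_after == out_before;   /* but no output bytes were written */
}
```

With the values from the test case above (return value 0, in_size going from 4 to 0, out_size staying at 16) the helper returns 1. Whether that should be treated as an error is up to the caller.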
Comment 1 Boleslaw Ciesielski 2004-08-11 11:27:02 EDT
Created attachment 102608
test case

File name is iconv_bug2.c
Comment 2 Ulrich Drepper 2004-09-30 05:38:43 EDT
This is no bug.  The UCS4 values in the range from e0000 to e007f are
tags.  They produce no output but are always recognized.
Comment 3 Boleslaw Ciesielski 2004-10-05 04:45:43 EDT
The Unicode spec. recommends ignoring the tag characters when 
processing text, but that really seems like something that should 
apply to code that has some understanding of the characters, not 
something that is converting between different encodings.
Converting encodings should not be a lossy conversion, otherwise it 
would mean that converting back and forth does not round-trip safely, 
and anything which might actually want to use tag characters can't 
use these conversion routines because the tag characters will be gone.
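One way an application that cares about round-tripping could guard against this is to scan the UCS-4LE input for tag characters before handing it to iconv. What follows is a rough sketch, not a fix for iconv itself: the function names are hypothetical, and the only fact taken from this report is the range 0xE0000 to 0xE007F.

```c
#include <stddef.h>
#include <stdint.h>

/* Tag characters are the code points 0xE0000..0xE007F. */
static int is_tag_character(uint32_t cp)
{
    return cp >= 0xE0000u && cp <= 0xE007Fu;
}

/* Hypothetical helper: counts tag characters in a UCS-4LE buffer.
   Trailing bytes that do not form a full 4-byte unit are ignored. */
static size_t count_tag_characters_ucs4le(const unsigned char *buf, size_t len)
{
    size_t count = 0;
    for (size_t i = 0; i + 4 <= len; i += 4) {
        /* Assemble the code point from little-endian bytes. */
        uint32_t cp = (uint32_t) buf[i]
                    | (uint32_t) buf[i + 1] << 8
                    | (uint32_t) buf[i + 2] << 16
                    | (uint32_t) buf[i + 3] << 24;
        if (is_tag_character(cp))
            count++;
    }
    return count;
}
```

A nonzero count tells the caller that the conversion would drop characters, so it can refuse the conversion or carry the tags some other way.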
Comment 4 Ulrich Drepper 2004-10-05 04:50:23 EDT
The code is correct.  Everything you perceive as a problem is your
opinion and I don't agree with it.  Don't reopen this bug again, there
is no bug.
