Bug 129657 - iconv consumes characters without any error or output
Summary: iconv consumes characters without any error or output
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Fedora
Classification: Fedora
Component: glibc
Version: 2
Hardware: i686
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Jakub Jelinek
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2004-08-11 15:25 UTC by Boleslaw Ciesielski
Modified: 2007-11-30 22:10 UTC

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2004-10-05 08:50:23 UTC
Type: ---
Embargoed:


Attachments
test case (496 bytes, text/plain)
2004-08-11 15:27 UTC, Boleslaw Ciesielski

Description Boleslaw Ciesielski 2004-08-11 15:25:26 UTC
From Bugzilla Helper:
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)

Description of problem:
This happens with many target encodings, but the test case uses
ISO-10646/UCS2/. When converting from UCS-4LE to ISO-10646/UCS2/, iconv
consumes certain characters without writing anything to the output
buffer and without reporting an error. This happens, for example, with
characters 0xE0000 to 0xE007F.

Please see the attached C program:

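#include <stdio.h>
#include <wchar.h>
#include <iconv.h>

int main(void)
{
  /* One UCS-4LE character, U+E0000 (the first tag character).  This relies
     on wchar_t being 32 bits wide and the host being little-endian, as on
     i686 Linux. */
  wchar_t inbuf[16] = { 0xE0000, 0, };
  char outbuf[16] = { 0, };
  char *in_ptr = (char *) inbuf;
  size_t in_size = sizeof(wchar_t);   /* convert only the first character */
  char *out_ptr = outbuf;
  size_t out_size = sizeof outbuf;
  iconv_t enc = iconv_open("ISO-10646/UCS2/", "UCS-4LE");
  if (enc == (iconv_t) -1) {
    perror("iconv_open");
    return 1;
  }

  /* iconv() returns (size_t) -1 on error; print that case as -1. */
  size_t n = iconv(enc, &in_ptr, &in_size, &out_ptr, &out_size);
  printf("n = %d  in_size = %zu  out_size = %zu\n",
         n == (size_t) -1 ? -1 : (int) n, in_size, out_size);

  iconv_close(enc);
  return 0;
}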


Version-Release number of selected component (if applicable):
glibc-2.3.3-27

How reproducible:
Always

Steps to Reproduce:
1. gcc -o iconv_bug2 iconv_bug2.c
2. ./iconv_bug2

Actual Results:  n = 0  in_size = 0  out_size = 16

Expected Results:  Depending on the encoding, either an error (n = -1)
or some output produced (out_size < 16).
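The expectation above can be phrased as a check on iconv's bookkeeping: if a call consumed input and did not fail, it should have written output. Below is a minimal sketch of that check, assuming the caller records `inbytesleft` and `outbytesleft` before and after the call. The helper name `silently_consumed` is hypothetical and not part of any library. It only flags the pattern and does not prove a bug, because some conversions legitimately consume input without output (for example, escape sequences in stateful encodings such as ISO-2022-JP).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: true when an iconv() call reported success
   (n != (size_t) -1) and consumed input bytes, yet wrote no output bytes.
   This is the combination shown under "Actual Results". */
bool silently_consumed(size_t n,
                       size_t in_before, size_t in_after,
                       size_t out_before, size_t out_after)
{
  bool failed   = (n == (size_t) -1);      /* iconv signals errors with (size_t) -1 */
  bool consumed = in_after < in_before;    /* inbytesleft went down */
  bool produced = out_after < out_before;  /* outbytesleft went down */
  return !failed && consumed && !produced;
}
```

With the numbers from this report (n = 0, in_size 4 to 0, out_size 16 to 16) the check returns true.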

Additional info:

Comment 1 Boleslaw Ciesielski 2004-08-11 15:27:02 UTC
Created attachment 102608
test case

File name is iconv_bug2.c

Comment 2 Ulrich Drepper 2004-09-30 09:38:43 UTC
This is no bug.  The UCS4 values in the range from e0000 to e007f are
tags.  They produce no output but are always recognized.
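The behaviour described in Comment 2 can be modelled in a few lines. The sketch below is a simplified stand-in for a UCS-4 to UCS-2 conversion loop, not glibc's actual code: BMP characters are copied, tag characters (U+E0000..U+E007F) are consumed without output or error, and any other character that UCS-2 cannot hold stops the conversion with an error, which is where iconv would report EILSEQ. The function names `is_unicode_tag` and `ucs4_to_ucs2_model` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* U+E0000..U+E007F are the Unicode tag characters.  All 128 of them
   share the same value once the low 7 bits are shifted out. */
static int is_unicode_tag(uint32_t c)
{
  return (c >> 7) == (0xE0000u >> 7);
}

/* Simplified model (not glibc's code).  out[] must have room for in_len
   units, since each code point yields at most one UCS-2 unit.  Returns the
   number of units written, or -1 on an unconvertible character.
   *consumed receives the number of input code points used. */
long ucs4_to_ucs2_model(const uint32_t *in, size_t in_len,
                        uint16_t *out, size_t *consumed)
{
  size_t o = 0;
  for (size_t i = 0; i < in_len; ++i) {
    uint32_t c = in[i];
    if (is_unicode_tag(c))
      continue;                 /* tag: consumed, nothing written, no error */
    if (c > 0xFFFF || (c >= 0xD800 && c <= 0xDFFF)) {
      *consumed = i;            /* stop at the offending character */
      return -1;                /* iconv would report EILSEQ here */
    }
    out[o++] = (uint16_t) c;
  }
  *consumed = in_len;
  return (long) o;
}
```

Under this model, the input { 0x41, 0xE0041, 0x42 } yields two UCS-2 units (0x41, 0x42) and no error, while { 0x41, 0x10000 } fails after consuming one character.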

Comment 3 Boleslaw Ciesielski 2004-10-05 08:45:43 UTC
The Unicode spec. recommends ignoring the tag characters when 
processing text, but that really seems like something that should 
apply to code that has some understanding of the characters, not 
something that is converting between different encodings.
Converting encodings should not be a lossy conversion, otherwise it 
would mean that converting back and forth does not round-trip safely, 
and anything which might actually want to use tag characters can't 
use these conversion routines because the tag characters will be gone.
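For code that needs the tags to survive, one workaround that does not depend on changing glibc is to scan the UCS-4 input for tag characters before converting and decide explicitly whether dropping them is acceptable. Below is a minimal sketch, assuming the input is already available as an array of host-order code points. The name `find_tag_char` is hypothetical. The scan only matters for target encodings that cannot represent U+E0000..U+E007F. Targets such as UTF-8 or UTF-16 can encode them.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns the index of the first Unicode tag
   character (U+E0000..U+E007F) in s[0..n), or n if there is none.
   A caller can use this to refuse or special-case a conversion whose
   target encoding would silently drop the tags. */
size_t find_tag_char(const uint32_t *s, size_t n)
{
  for (size_t i = 0; i < n; ++i)
    if (s[i] >= 0xE0000 && s[i] <= 0xE007F)
      return i;
  return n;
}
```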

Comment 4 Ulrich Drepper 2004-10-05 08:50:23 UTC
The code is correct.  Everything you perceive as a problem is your
opinion and I don't agree with it.  Don't reopen this bug again, there
is no bug.

