Bug 457506 - Can't read multibyte characters from wide-oriented unbuffered stream
Summary: Can't read multibyte characters from wide-oriented unbuffered stream
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Fedora
Classification: Fedora
Component: glibc
Version: 9
Hardware: i686
OS: Linux
Priority: low
Severity: medium
Target Milestone: ---
Assignee: Jakub Jelinek
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2008-08-01 07:54 UTC by Watanabe, Yuki
Modified: 2009-03-02 03:50 UTC
CC List: 3 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2009-03-02 03:50:19 UTC
Type: ---
Embargoed:


Attachments
Proposed patch (2.61 KB, patch)
2008-08-27 09:52 UTC, Denys Vlasenko
no flags
Updated patch (2.63 KB, patch)
2008-08-28 13:07 UTC, Denys Vlasenko
no flags

Description Watanabe, Yuki 2008-08-01 07:54:16 UTC
Description of problem:
If a stream is made unbuffered with the "setvbuf" function, wide-oriented input
functions such as "fgetwc" fail to read multibyte characters.

Version-Release number of selected component (if applicable):
glibc-2.8-8 (i686)

Here is a test program:

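#include <stdio.h>
#include <locale.h>
#include <wchar.h>
int main(void)
{
	wint_t c;
	/* Use a UTF-8 locale so that multibyte input is meaningful. */
	setlocale(LC_ALL, "en_US.utf8");
	/* Make stdin unbuffered: this is what triggers the bug. */
	setvbuf(stdin, NULL, _IONBF, 0);
	/* Echo each wide character as soon as it has been read. */
	while ((c = fgetwc(stdin)) != WEOF) {
		fputwc(c, stdout);
		fflush(stdout);
	}
	/* Report EILSEQ (or any other read error). */
	if (ferror(stdin))
		perror(NULL);
	return 0;
}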

This program simply reads wide characters from stdin and echoes them to stdout.
If you input single-byte (ASCII) characters, they are read and echoed properly.
But if you input multibyte characters, fgetwc returns WEOF with errno set to
EILSEQ, and the characters are not echoed.
It seems that fgetwc always reads only one byte, even when that byte starts a
multibyte character. fgetwc is expected to read as many bytes as are needed to
form one wide character.
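
To illustrate the expected behaviour, here is a minimal sketch (not glibc code) of a decoder that accepts one byte at a time and keeps its progress in a state object, so that receiving a single byte is not by itself a reason to fail. The names `utf8_state`, `utf8_feed` and the `UTF8_*` results are hypothetical, and the sketch assumes UTF-8 input, as in the en_US.utf8 locale used above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoder state for the character being assembled. */
struct utf8_state {
    uint32_t acc;  /* code point bits accumulated so far */
    int need;      /* continuation bytes still expected */
};

enum utf8_result { UTF8_DONE, UTF8_MORE, UTF8_ERROR };

/* Feed a single byte. Returns UTF8_MORE while a multibyte sequence is
   still incomplete, UTF8_DONE with *out set once a whole character has
   been assembled, or UTF8_ERROR on an invalid byte. Simplified sketch:
   overlong forms and surrogates are not rejected. */
static enum utf8_result utf8_feed(struct utf8_state *st, unsigned char b,
                                  uint32_t *out)
{
    assert(st != NULL && out != NULL);
    if (st->need == 0) {
        /* Expecting a lead byte. */
        if (b < 0x80) { *out = b; return UTF8_DONE; }
        if ((b & 0xE0) == 0xC0) { st->acc = b & 0x1F; st->need = 1; }
        else if ((b & 0xF0) == 0xE0) { st->acc = b & 0x0F; st->need = 2; }
        else if ((b & 0xF8) == 0xF0) { st->acc = b & 0x07; st->need = 3; }
        else return UTF8_ERROR;
        return UTF8_MORE;  /* one byte is not enough: wait for more */
    }
    /* Expecting a continuation byte of the form 10xxxxxx. */
    if ((b & 0xC0) != 0x80) { st->need = 0; return UTF8_ERROR; }
    st->acc = (st->acc << 6) | (b & 0x3F);
    if (--st->need > 0) return UTF8_MORE;
    *out = st->acc;
    return UTF8_DONE;
}
```

For example, feeding 0xD0 (the first byte of "й") yields UTF8_MORE rather than an error, and feeding the following 0xB9 yields UTF8_DONE with U+0439.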

Comment 1 Denys Vlasenko 2008-08-26 14:51:11 UTC
glibc-2.8/libio/wfileops.c:

  status = (*cd->__codecvt_do_in) (cd, &fp->_wide_data->_IO_state,
                                   fp->_IO_read_ptr, fp->_IO_read_end,
                                   &read_ptr_copy,
                                   fp->_wide_data->_IO_read_end,
                                   fp->_wide_data->_IO_buf_end,
                                   &fp->_wide_data->_IO_read_end);

  fp->_IO_read_ptr = (char *) read_ptr_copy;
  if (fp->_wide_data->_IO_read_end == fp->_wide_data->_IO_buf_base)
    {
      if (status == __codecvt_error || fp->_IO_read_end == fp->_IO_buf_end)
        {
          __set_errno (EILSEQ);
          fp->_flags |= _IO_ERR_SEEN;
          return WEOF;
        }

An unbuffered stream uses fp->_shortbuf (declared as char _shortbuf[1]) as a one-byte buffer. But in the code fragment above, this buffer is what we try to decode a wchar_t from. A multibyte character doesn't fit in a 1-byte buffer, so we hit the "fp->_IO_read_end == fp->_IO_buf_end" case. In other words: "the buffer is full but we still can't decode a single wchar_t, abort!".
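
The failure mode can be modeled outside glibc. The sketch below is a simplified, hypothetical model (`try_decode`, `utf8_seq_len` and `fill_result` are invented for illustration and assume UTF-8 input): it copies as many input bytes as fit into a byte buffer of capacity `cap` and reports an encoding error when the buffer is full but the character is still incomplete, which is the shape of the `fp->_IO_read_end == fp->_IO_buf_end` test quoted above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: length of the UTF-8 sequence that starts with
   lead byte b, or 0 if b cannot start a sequence. */
static size_t utf8_seq_len(unsigned char b)
{
    if (b < 0x80) return 1;
    if ((b & 0xE0) == 0xC0) return 2;
    if ((b & 0xF0) == 0xE0) return 3;
    if ((b & 0xF8) == 0xF0) return 4;
    return 0;
}

enum fill_result { FILL_OK, FILL_NEED_MORE, FILL_EILSEQ };

/* Model of the check: `avail` input bytes are available, but only
   min(avail, cap) of them fit into the byte buffer. */
static enum fill_result try_decode(const unsigned char *in, size_t avail,
                                   size_t cap)
{
    size_t filled = avail < cap ? avail : cap;
    size_t need;

    assert(in != NULL && avail > 0 && cap > 0);
    need = utf8_seq_len(in[0]);
    if (need == 0) return FILL_EILSEQ;   /* genuinely invalid byte */
    if (filled >= need) return FILL_OK;  /* whole character fits */
    /* Incomplete character. If the buffer is full, no more bytes can
       be added, so it is reported as EILSEQ; with a one-byte buffer
       this happens for every multibyte character. */
    return filled == cap ? FILL_EILSEQ : FILL_NEED_MORE;
}
```

With `cap` set to 1, the two-byte sequence for "й" is rejected even though both bytes are available; with a larger buffer the same input decodes.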

Growing _shortbuf might break things elsewhere. I am leaning towards using a small local buffer instead.
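
A user-level sketch of the local-buffer idea, assuming UTF-8 input. This is neither the attached patch nor the upstream fix, and `read_utf8_char` is a hypothetical name. It pulls bytes one at a time with getc into a 4-byte local array (the longest UTF-8 sequence) and decodes only once the whole sequence is there, so the stream's own one-byte buffer never has to hold a complete character.

```c
/* Feature macro so that fmemopen() is declared for the usage example. */
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: read one UTF-8 character from fp. Returns the
   code point, -1 at end of input, or -2 on an invalid or truncated
   sequence. Overlong forms and surrogates are not rejected. */
static long read_utf8_char(FILE *fp)
{
    unsigned char local[4];  /* longest UTF-8 sequence is 4 bytes */
    size_t need, i;
    uint32_t cp;
    int c = getc(fp);

    if (c == EOF) return -1;
    local[0] = (unsigned char)c;
    if (local[0] < 0x80) return local[0];
    else if ((local[0] & 0xE0) == 0xC0) { need = 2; cp = local[0] & 0x1F; }
    else if ((local[0] & 0xF0) == 0xE0) { need = 3; cp = local[0] & 0x0F; }
    else if ((local[0] & 0xF8) == 0xF0) { need = 4; cp = local[0] & 0x07; }
    else return -2;

    /* Collect the continuation bytes into the local buffer. */
    for (i = 1; i < need; i++) {
        c = getc(fp);
        if (c == EOF || (c & 0xC0) != 0x80) return -2;
        local[i] = (unsigned char)c;
    }
    /* The whole sequence is now in the local buffer: decode it. */
    for (i = 1; i < need; i++)
        cp = (cp << 6) | (local[i] & 0x3F);
    return (long)cp;
}
```

For example, opening the bytes of "qй" with fmemopen, calling setvbuf(fp, NULL, _IONBF, 0) before any read, and then calling read_utf8_char repeatedly yields 'q', then U+0439, then -1.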

Comment 2 Denys Vlasenko 2008-08-27 09:52:34 UTC
Created attachment 315076 [details]
Proposed patch

This is what works for me with this patch:

#include <stdio.h>
#include <locale.h>
#include <wchar.h>
int main(void)
{
    wint_t c;
    setlocale(LC_ALL, "en_US.utf8");
    setvbuf(stdin, NULL, _IONBF, 0);
    while ((c = fgetwc(stdin)) != WEOF) {
        fputwc(c, stdout);
        fflush(stdout);
    }
    if (ferror(stdin))
        perror(NULL);
}

# cat zz.txt
qwerty
йцукен

# LD_LIBRARY_PATH=. ./a.out <zz.txt
qwerty
йцукен

Comment 3 Denys Vlasenko 2008-08-27 10:13:36 UTC
Yuki, can you confirm attached patch works for you?

Comment 4 Watanabe, Yuki 2008-08-28 08:23:52 UTC
Sorry, I don't know how to build glibc from source and test it.

Comment 5 Denys Vlasenko 2008-08-28 13:07:45 UTC
Created attachment 315214 [details]
Updated patch

Comment 6 Watanabe, Yuki 2008-12-20 13:05:49 UTC
Sorry for the long absence.

I built the library with the patch #315214 applied and
verified it works for me.

Comment 7 Ulrich Drepper 2009-02-04 21:28:31 UTC
I fixed this upstream now.  With a different patch.  Should be in the next rawhide build.

Comment 8 lexual 2009-03-01 09:00:01 UTC
Can anyone confirm this has hit rawhide and can be closed?

Comment 9 Watanabe, Yuki 2009-03-02 03:50:19 UTC
I tried rawhide 2.9.90-7 and verified it works.

