Bug 457506

Summary: Can't read multibyte characters from wide-oriented unbuffered stream
Product: [Fedora] Fedora
Reporter: Watanabe, Yuki <magicant.starmen>
Component: glibc
Assignee: Jakub Jelinek <jakub>
Status: CLOSED RAWHIDE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium
Priority: low
Version: 9
CC: drepper, dvlasenk, lex.lists
Hardware: i686   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2009-03-02 03:50:19 UTC
Attachments:
Proposed patch (flags: none)
Updated patch (flags: none)

Description Watanabe, Yuki 2008-08-01 07:54:16 UTC
Description of problem:
If a stream is set unbuffered by the "setvbuf" function, wide-oriented reading
functions like "fgetwc" fail to read multibyte characters.

Version-Release number of selected component (if applicable):
glibc-2.8-8 (i686)

Here is a test program:

#include <stdio.h>
#include <locale.h>
#include <wchar.h>
int main(void)
{
	wint_t c;
	setlocale(LC_ALL, "en_US.utf8");
	setvbuf(stdin, NULL, _IONBF, 0);
	while ((c = fgetwc(stdin)) != WEOF) {
		fputwc(c, stdout);
		fflush(stdout);
	}
	if (ferror(stdin))
		perror(NULL);
}

This program simply reads wide characters from stdin and echoes them back to stdout.
If you input single-byte (ASCII) characters, they are read and echoed properly.
But if you input multibyte characters, fgetwc returns WEOF with errno set to
EILSEQ; the characters are not echoed.
It seems that fgetwc always reads only one byte, even when the input contains
multibyte characters. fgetwc is expected to read more than one byte when that
is needed to assemble one wide character.
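For comparison, the expected byte-accumulating behavior can be sketched with getc() and mbrtowc(). This is only an illustration under assumptions, not glibc's implementation: read_wide_char and count_wide_chars are hypothetical helper names, and the stream is assumed to be byte-oriented (fgetwc must not have been called on it).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper (not part of glibc): read one wide character from a
   byte-oriented stream by feeding bytes to mbrtowc() one at a time.
   Returns WEOF at end of input, on a read error, or on an invalid
   multibyte sequence (mbrtowc() then sets errno to EILSEQ). */
static wint_t read_wide_char(FILE *fp, mbstate_t *state)
{
    int byte;

    assert(fp != NULL && state != NULL);
    while ((byte = getc(fp)) != EOF) {
        char ch = (char) byte;
        wchar_t wc;
        size_t n = mbrtowc(&wc, &ch, 1, state);

        if (n == (size_t) -2)
            continue;            /* incomplete sequence: fetch another byte */
        if (n == (size_t) -1)
            return WEOF;         /* invalid sequence */
        return (wint_t) wc;      /* complete character (n is 0 or 1) */
    }
    return WEOF;
}

/* Hypothetical helper: count the characters remaining in a stream,
   starting from the initial conversion state. */
static size_t count_wide_chars(FILE *fp)
{
    mbstate_t state;
    size_t count = 0;

    memset(&state, 0, sizeof state);
    while (read_wide_char(fp, &state) != WEOF)
        count++;
    return count;
}
```

Multibyte input is only decoded as such after setlocale() has selected a suitable locale, such as the "en_US.utf8" locale used in the test program. Because the partial character is kept in the mbstate_t object rather than in the stream buffer, this approach does not depend on the buffer size and may serve as a workaround on affected versions.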

Comment 1 Denys Vlasenko 2008-08-26 14:51:11 UTC
glibc-2.8/libio/wfileops.c:

  status = (*cd->__codecvt_do_in) (cd, &fp->_wide_data->_IO_state,
                                   fp->_IO_read_ptr, fp->_IO_read_end,
                                   &read_ptr_copy,
                                   fp->_wide_data->_IO_read_end,
                                   fp->_wide_data->_IO_buf_end,
                                   &fp->_wide_data->_IO_read_end);

  fp->_IO_read_ptr = (char *) read_ptr_copy;
  if (fp->_wide_data->_IO_read_end == fp->_wide_data->_IO_buf_base)
    {
      if (status == __codecvt_error || fp->_IO_read_end == fp->_IO_buf_end)
        {
          __set_errno (EILSEQ);
          fp->_flags |= _IO_ERR_SEEN;
          return WEOF;
        }

An unbuffered file uses fp->_shortbuf[1] as a one-byte buffer. But in the above code fragment, this buffer is used to try to decode a wchar_t. A multibyte character doesn't fit in a 1-byte buffer, so we hit the "fp->_IO_read_end == fp->_IO_buf_end" case. In other words, "the buffer is full but we still can't decode a single wchar_t, abort!".

Growing _shortbuf might break stuff elsewhere. I am leaning towards using a local small buffer instead.
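The effect of the buffer size can be shown with a simplified, locale-independent model. This is a sketch under assumptions, not the glibc code or the proposed patch: utf8_seq_len and decode_with_buffer are hypothetical names, and the input is assumed to be UTF-8.

```c
#include <assert.h>
#include <stddef.h>

/* Length of a UTF-8 sequence judged by its first byte, or 0 if the byte
   cannot start a sequence. */
static size_t utf8_seq_len(unsigned char lead)
{
    if (lead < 0x80)
        return 1;
    if ((lead & 0xE0) == 0xC0)
        return 2;
    if ((lead & 0xF0) == 0xE0)
        return 3;
    if ((lead & 0xF8) == 0xF0)
        return 4;
    return 0;
}

/* Simplified model: try to decode one character while holding at most
   buf_size undecoded bytes at a time. Returns the number of bytes
   consumed, or 0 if the input is invalid or the buffer fills up before
   a complete character is available (the EILSEQ case above). */
static size_t decode_with_buffer(const unsigned char *input, size_t input_len,
                                 size_t buf_size)
{
    size_t need, filled = 0;

    assert(input != NULL && buf_size > 0);
    if (input_len == 0)
        return 0;
    need = utf8_seq_len(input[0]);
    if (need == 0 || need > input_len)
        return 0;

    while (filled < need) {
        if (filled == buf_size)
            return 0;            /* buffer full, character still incomplete */
        if (filled > 0 && (input[filled] & 0xC0) != 0x80)
            return 0;            /* not a valid continuation byte */
        filled++;                /* one more byte now sits in the buffer */
    }
    return need;
}
```

With buf_size set to 1, as with _shortbuf, an ASCII byte decodes but the two-byte sequence 0xD0 0xB9 (the letter й) fails. With a small local buffer of 4 bytes the same sequence decodes.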

Comment 2 Denys Vlasenko 2008-08-27 09:52:34 UTC
Created attachment 315076
Proposed patch

This is what works for me with this patch:

#include <stdio.h>
#include <locale.h>
#include <wchar.h>
int main(void)
{
    wint_t c;
    setlocale(LC_ALL, "en_US.utf8");
    setvbuf(stdin, NULL, _IONBF, 0);
    while ((c = fgetwc(stdin)) != WEOF) {
        fputwc(c, stdout);
        fflush(stdout);
    }
    if (ferror(stdin))
        perror(NULL);
}

# cat zz.txt
qwerty
йцукен

# LD_LIBRARY_PATH=. ./a.out <zz.txt
qwerty
йцукен

Comment 3 Denys Vlasenko 2008-08-27 10:13:36 UTC
Yuki, can you confirm attached patch works for you?

Comment 4 Watanabe, Yuki 2008-08-28 08:23:52 UTC
Sorry, I don't know how to build glibc from source and test it.

Comment 5 Denys Vlasenko 2008-08-28 13:07:45 UTC
Created attachment 315214
Updated patch

Comment 6 Watanabe, Yuki 2008-12-20 13:05:49 UTC
Sorry for a long absence.

I built the library with the patch #315214 applied and
verified it works for me.

Comment 7 Ulrich Drepper 2009-02-04 21:28:31 UTC
I fixed this upstream now.  With a different patch.  Should be in the next rawhide build.

Comment 8 lexual 2009-03-01 09:00:01 UTC
Can anyone confirm this has hit rawhide and can be closed?

Comment 9 Watanabe, Yuki 2009-03-02 03:50:19 UTC
I tried rawhide 2.9.90-7 and verified it works.