Bug 457506 - Can't read multibyte characters from wide-oriented unbuffered stream
Status: CLOSED RAWHIDE
Product: Fedora
Classification: Fedora
Component: glibc
Version: 9
Hardware: i686 Linux
Priority: low
Severity: medium
Assigned To: Jakub Jelinek
QA Contact: Fedora Extras Quality Assurance
Reported: 2008-08-01 03:54 EDT by Watanabe, Yuki
Modified: 2009-03-01 22:50 EST
CC List: 3 users

Doc Type: Bug Fix
Last Closed: 2009-03-01 22:50:19 EST


Attachments
Proposed patch (2.61 KB, patch)
2008-08-27 05:52 EDT, Denys Vlasenko
Updated patch (2.63 KB, patch)
2008-08-28 09:07 EDT, Denys Vlasenko

Description Watanabe, Yuki 2008-08-01 03:54:16 EDT
Description of problem:
If a stream is made unbuffered with the "setvbuf" function, wide-oriented reading
functions such as "fgetwc" fail to read multibyte characters.

Version-Release number of selected component (if applicable):
glibc-2.8-8 (i686)

Here is a test program:

#include <stdio.h>
#include <locale.h>
#include <wchar.h>
int main(void)
{
	wint_t c;
	setlocale(LC_ALL, "en_US.utf8");
	setvbuf(stdin, NULL, _IONBF, 0);
	while ((c = fgetwc(stdin)) != WEOF) {
		fputwc(c, stdout);
		fflush(stdout);
	}
	if (ferror(stdin))
		perror(NULL);
}

This program simply reads wide characters from stdin and echoes them back to stdout.
If you input single-byte (ASCII) characters, they are read and echoed properly.
But if you input multibyte characters, fgetwc returns WEOF with errno set to
EILSEQ, and the characters are not echoed.
It seems that fgetwc always reads only one byte, even when a character is
multibyte. fgetwc is expected to read more than one byte when several bytes are
needed to make one wide character.
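One way to get the behaviour the report expects (reading as many bytes as one character needs) is to keep the stream byte-oriented and decode incrementally with mbrtowc(). The sketch below is a user-level workaround under stated assumptions, not the glibc fix: read_wc_bytewise is a hypothetical helper name, and a UTF-8 locale is assumed to have been selected with setlocale() already. When a sequence is incomplete, mbrtowc() returns (size_t)-2 and keeps the partial bytes in the mbstate_t, which is the signal to fetch another byte rather than to report EILSEQ.

```c
#include <assert.h>
#include <locale.h>   /* callers select a UTF-8 locale with setlocale() */
#include <stdio.h>
#include <string.h>   /* memset(), for zeroing the mbstate_t */
#include <wchar.h>

/* Hypothetical helper, not part of glibc: read one wide character from fp
 * by fetching single bytes with getc() and feeding each one to mbrtowc()
 * until a complete character has been decoded. *st carries the partial
 * character between calls and must start out zeroed. */
static wint_t read_wc_bytewise(FILE *fp, mbstate_t *st)
{
    assert(fp != NULL && st != NULL);
    for (;;) {
        int b = getc(fp);
        if (b == EOF)
            return WEOF;           /* end of file or read error */

        char byte = (char) b;
        wchar_t wc;
        size_t r = mbrtowc(&wc, &byte, 1, st);
        if (r == (size_t) -2)
            continue;              /* incomplete character: read another byte */
        if (r == (size_t) -1)
            return WEOF;           /* invalid sequence; errno is EILSEQ */
        return (wint_t) wc;        /* r == 1, or 0 for a null byte */
    }
}
```

In the reporter's test program, the loop would become `while ((c = read_wc_bytewise(stdin, &st)) != WEOF)`, with `st` zeroed beforehand. The helper returns WEOF for end of file, a read error and an invalid sequence alike, so a caller that needs to distinguish them must inspect feof(), ferror() and errno.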
Comment 1 Denys Vlasenko 2008-08-26 10:51:11 EDT
glibc-2.8/libio/wfileops.c:

  status = (*cd->__codecvt_do_in) (cd, &fp->_wide_data->_IO_state,
                                   fp->_IO_read_ptr, fp->_IO_read_end,
                                   &read_ptr_copy,
                                   fp->_wide_data->_IO_read_end,
                                   fp->_wide_data->_IO_buf_end,
                                   &fp->_wide_data->_IO_read_end);

  fp->_IO_read_ptr = (char *) read_ptr_copy;
  if (fp->_wide_data->_IO_read_end == fp->_wide_data->_IO_buf_base)
    {
      if (status == __codecvt_error || fp->_IO_read_end == fp->_IO_buf_end)
        {
          __set_errno (EILSEQ);
          fp->_flags |= _IO_ERR_SEEN;
          return WEOF;
        }

An unbuffered file uses the one-byte array fp->_shortbuf as its buffer. But in the code fragment above, this buffer is what we try to decode a wchar_t from. A multibyte character doesn't fit in a one-byte buffer, so we hit the "fp->_IO_read_end == fp->_IO_buf_end" case. In other words: "the buffer is full but we still can't decode a single wchar_t, abort!".

Growing _shortbuf might break things elsewhere, so I am leaning towards using a small local buffer instead.
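The failure mode described above, and the local-buffer alternative, can be modelled outside libio. This is a hypothetical model, not glibc's code and not either attached patch: decode_with_capacity and its cap parameter are made up for illustration. A cap of 1 stands in for the one-byte _shortbuf, a cap of MB_LEN_MAX stands in for a small local buffer, mbrtowc() stands in for __codecvt_do_in, and a UTF-8 locale is assumed.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>   /* MB_LEN_MAX */
#include <locale.h>   /* callers select a UTF-8 locale with setlocale() */
#include <string.h>
#include <wchar.h>

/* Hypothetical model, not glibc code: decode one wide character from the
 * first `len` bytes of `in`, but only ever look at a buffer of `cap` bytes.
 * Returns 0 on success (character stored in *out), -1 on failure. */
static int decode_with_capacity(const char *in, size_t len, size_t cap,
                                wchar_t *out)
{
    char buf[MB_LEN_MAX];
    assert(cap >= 1 && cap <= sizeof buf);

    /* "Refill" the buffer with as many input bytes as fit. */
    size_t n = len < cap ? len : cap;
    memcpy(buf, in, n);

    mbstate_t st;
    memset(&st, 0, sizeof st);
    size_t r = mbrtowc(out, buf, n, &st);

    /* Mirrors the quoted check: no wide character was produced and the
     * buffer is full, so give up with EILSEQ even though the bytes seen
     * so far are only an incomplete (size_t)-2 prefix, not invalid. */
    if (r == (size_t) -2 && n == cap) {
        errno = EILSEQ;
        return -1;
    }
    /* (size_t)-1: invalid sequence (mbrtowc sets errno to EILSEQ).
     * (size_t)-2 with n < cap: input ended in the middle of a character. */
    if (r == (size_t) -1 || r == (size_t) -2)
        return -1;
    return 0;
}
```

With a cap of 1, the valid two-byte UTF-8 sequence for й (U+0439, bytes 0xD0 0xB9) is rejected with EILSEQ, mirroring the bug; with a cap of MB_LEN_MAX the same bytes decode to U+0439. A single-byte character such as "q" decodes with either capacity, which matches the report that ASCII input works.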
Comment 2 Denys Vlasenko 2008-08-27 05:52:34 EDT
Created attachment 315076 [details]
Proposed patch

This is what works for me with this patch:

#include <stdio.h>
#include <locale.h>
#include <wchar.h>
int main(void)
{
    wint_t c;
    setlocale(LC_ALL, "en_US.utf8");
    setvbuf(stdin, NULL, _IONBF, 0);
    while ((c = fgetwc(stdin)) != WEOF) {
        fputwc(c, stdout);
        fflush(stdout);
    }
    if (ferror(stdin))
        perror(NULL);
}

# cat zz.txt
qwerty
йцукен

# LD_LIBRARY_PATH=. ./a.out <zz.txt
qwerty
йцукен
Comment 3 Denys Vlasenko 2008-08-27 06:13:36 EDT
Yuki, can you confirm that the attached patch works for you?
Comment 4 Watanabe, Yuki 2008-08-28 04:23:52 EDT
Sorry, I don't know how to build glibc from source and test it.
Comment 5 Denys Vlasenko 2008-08-28 09:07:45 EDT
Created attachment 315214 [details]
Updated patch
Comment 6 Watanabe, Yuki 2008-12-20 08:05:49 EST
Sorry for the long absence.

I built the library with the updated patch (attachment 315214) applied and
verified that it works for me.
Comment 7 Ulrich Drepper 2009-02-04 16:28:31 EST
I have now fixed this upstream, with a different patch. It should be in the next Rawhide build.
Comment 8 lexual 2009-03-01 04:00:01 EST
Can anyone confirm that this has hit Rawhide and that the bug can be closed?
Comment 9 Watanabe, Yuki 2009-03-01 22:50:19 EST
I tried Rawhide glibc 2.9.90-7 and verified that it works.
