Description of problem:
If a stream is set to unbuffered with the "setvbuf" function, wide-oriented reading functions such as "fgetwc" fail to read multibyte characters.

Version-Release number of selected component (if applicable):
glibc-2.8-8 (i686)

Here is a test program:

    #include <stdio.h>
    #include <locale.h>
    #include <wchar.h>

    int main(void)
    {
        wint_t c;

        setlocale(LC_ALL, "en_US.utf8");
        setvbuf(stdin, NULL, _IONBF, 0);
        while ((c = fgetwc(stdin)) != WEOF) {
            fputwc(c, stdout);
            fflush(stdout);
        }
        if (ferror(stdin))
            perror(NULL);
    }

This program simply reads wide characters from stdin and echoes them back to stdout. If you input single-byte (ASCII) characters, they are read and echoed properly. But if you input multibyte characters, fgetwc returns WEOF with errno set to EILSEQ, and the characters are not echoed.

It seems that fgetwc always reads only one byte, even when the input contains multibyte characters. The expected behavior is that fgetwc reads more than one byte when that is needed to assemble one wide character.
glibc-2.8/libio/wfileops.c:

      status = (*cd->__codecvt_do_in) (cd, &fp->_wide_data->_IO_state,
                                       fp->_IO_read_ptr, fp->_IO_read_end,
                                       &read_ptr_copy,
                                       fp->_wide_data->_IO_read_end,
                                       fp->_wide_data->_IO_buf_end,
                                       &fp->_wide_data->_IO_read_end);

      fp->_IO_read_ptr = (char *) read_ptr_copy;
      if (fp->_wide_data->_IO_read_end == fp->_wide_data->_IO_buf_base)
        {
          if (status == __codecvt_error || fp->_IO_read_end == fp->_IO_buf_end)
            {
              __set_errno (EILSEQ);
              fp->_flags |= _IO_ERR_SEEN;
              return WEOF;
            }

An unbuffered file uses fp->_shortbuf[1] as a one-byte buffer. But in the code fragment above, this buffer is also the input from which a wchar_t is decoded. A multibyte character doesn't fit in a 1-byte buffer, so we hit the "fp->_IO_read_end == fp->_IO_buf_end" case. In other words, "the buffer is full but we still can't decode a single wchar_t, abort!".

Growing _shortbuf might break stuff elsewhere. I am leaning towards using a small local buffer instead.
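To make the "small local buffer" idea concrete, here is a rough standalone sketch. It is not glibc code and not the proposed patch: struct byte_source, next_byte, utf8_seq_len and read_code_point are all hypothetical names invented for illustration, and the sketch assumes UTF-8 input instead of going through the locale's codecvt machinery. The source hands out one byte per call, like the one-byte _shortbuf does, and the reader collects bytes in a local array until a whole character is available.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an unbuffered stream: one byte per call. */
struct byte_source {
    const unsigned char *data;
    size_t len;
    size_t pos;
};

/* Returns the next byte (0..255), or -1 at end of input. */
int next_byte(struct byte_source *src)
{
    assert(src != NULL);
    if (src->pos >= src->len)
        return -1;
    return src->data[src->pos++];
}

/* Total length of a UTF-8 sequence given its lead byte, or 0 if the
   byte cannot start a sequence. */
size_t utf8_seq_len(unsigned char lead)
{
    if (lead < 0x80) return 1;
    if (lead >= 0xC2 && lead <= 0xDF) return 2;
    if (lead >= 0xE0 && lead <= 0xEF) return 3;
    if (lead >= 0xF0 && lead <= 0xF4) return 4;
    return 0;
}

/* Reads one code point, pulling as many single bytes as the sequence
   needs into a local buffer. Returns the code point, or -1 on end of
   input or on an invalid or truncated sequence. Simplified: overlong
   3- and 4-byte forms and surrogates are not rejected. */
long read_code_point(struct byte_source *src)
{
    unsigned char acc[4];   /* local accumulation buffer */
    size_t have = 0, need, i;
    long cp;
    int b = next_byte(src);

    if (b < 0)
        return -1;
    acc[have++] = (unsigned char) b;
    need = utf8_seq_len(acc[0]);
    if (need == 0)
        return -1;

    /* One byte was not enough: keep reading continuation bytes. */
    while (have < need) {
        b = next_byte(src);
        if (b < 0 || (b & 0xC0) != 0x80)
            return -1;
        acc[have++] = (unsigned char) b;
    }

    if (need == 1)
        return acc[0];
    cp = acc[0] & (0xFF >> (need + 1));   /* payload bits of the lead byte */
    for (i = 1; i < need; i++)
        cp = (cp << 6) | (acc[i] & 0x3F);
    return cp;
}
```

The point is only that the decision "cannot decode, give up" is made after the local buffer has had a chance to fill, not after the first byte. A real fix inside _IO_wfile_underflow would also have to keep the conversion state and the stream's read pointers consistent, which this sketch ignores.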
Created attachment 315076
Proposed patch

This is what works for me with this patch:

    #include <stdio.h>
    #include <locale.h>
    #include <wchar.h>

    int main(void)
    {
        wint_t c;

        setlocale(LC_ALL, "en_US.utf8");
        setvbuf(stdin, NULL, _IONBF, 0);
        while ((c = fgetwc(stdin)) != WEOF) {
            fputwc(c, stdout);
            fflush(stdout);
        }
        if (ferror(stdin))
            perror(NULL);
    }

    # cat zz.txt
    qwerty йцукен
    # LD_LIBRARY_PATH=. ./a.out <zz.txt
    qwerty йцукен
Yuki, can you confirm that the attached patch works for you?
Sorry, I don't know how to build glibc from source and test it.
Created attachment 315214
Updated patch
Sorry for the long absence. I built the library with patch #315214 applied and verified that it works for me.
I have now fixed this upstream, with a different patch. It should be in the next rawhide build.
Can anyone confirm that this has hit rawhide and can be closed?
I tried rawhide 2.9.90-7 and verified it works.