During recent OpenVZ/Virtuozzo kernel testing, our kernel team noticed a bug that also occurs in the RHEL4 x86_64 kernel 2.6.9-67.0.20.ELsmp: if read() is asked to read fewer than 8 bytes and fails, it returns success and a large bogus value as the number of bytes read.

A testcase:

[root@aaa finist]# cat read_test.c
#define _GNU_SOURCE /* for O_DIRECTORY */
#include <stdio.h>
#include <errno.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/mman.h>

int main()
{
        void *buf = (void *) -1;
        char a;
        int ret, fd3;
        char fname[] = "read02.tmp";

        // Note: PROT_NONE used => pages may not be accessed
        buf = mmap(0, 1, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS, 0, 0);
        if (buf == MAP_FAILED) {
                perror("mmap failed");
                return 1;
        }

        // prepare a file with 1 byte in it
        fd3 = open(fname, O_RDWR | O_CREAT, 0666);
        if (fd3 < 0) {
                perror("open");
                return 1;
        }
        if (write(fd3, "A", 1) != 1) {
                perror("write");
                return 1;
        }
        close(fd3);

        fd3 = open(fname, O_RDWR, 0666);
        if (fd3 < 0) {
                perror("open");
                return 1;
        }

        // try to read from the file to the buffer which cannot be accessed
        ret = read(fd3, buf, 1);
        printf("%d %d %m\n", ret, errno);
        return 0;
}

[root@aaa finist]# gcc read_test.c
[root@aaa finist]# ./a.out
-2024112128 0 Success
[root@aaa finist]# uname -a
Linux aaa 2.6.9-67.0.20.ELsmp #1 SMP Wed Jun 18 12:35:02 EDT 2008 x86_64 x86_64 x86_64 GNU/Linux

An older kernel works as expected:

[root@aaa finist]# uname -a
Linux aaa 2.6.9-42.ELsmp #1 SMP Wed Jul 12 23:32:02 EDT 2006 x86_64 x86_64 x86_64 GNU/Linux
[root@aaa finist]# ./a.out
-1 14 Bad address

############################################################################

This bug is caused by the patch linux-2.6.9-x86_64-copy_user-zero-tail.patch, which first appeared in the 2.6.9-67.0.20 kernel.

The problem is the following:

        left = __copy_to_user(desc->arg.buf, kaddr + offset, size);

__copy_to_user() returns the wrong value.
The relevant piece of code, from arch/x86_64/lib/copy_user.S:

        /* rdi  destination
         * rsi  source
         * rdx  count
         *
         * Output:
         * eax uncopied bytes or 0 if successfull.
         */
copy_user_generic_c:
        xorq %rax,%rax
        movl %edx,%ecx
        shrl $3,%ecx
        andl $7,%edx
.Lc1:   rep movsq
        movl %edx,%ecx
.Lc2:   rep movsb
        ret

.Lc1e:  movq %rcx,%rsi
.Lc3:   rep stosq
.Lc2e:  movl %edx,%ecx
.Lc4:   rep stosb
.Lc3e:  leaq (%rdx,%rsi,8),%rax
        ret

When size == 1, the fault path executes only the following instructions:

.Lc2e:  movl %edx,%ecx
.Lc4:   rep stosb
.Lc3e:  leaq (%rdx,%rsi,8),%rax
        ret

The movq at .Lc1e is skipped, so %rsi still contains the source address rather than the number of uncopied 8-byte words, and the leaq computes a meaningless return value.

Marat Stanichenko (mstanichenko), a Virtuozzo/OpenVZ developer, prepared a patch to fix this issue, attached.
Created attachment 311467 [details] patch fixes copy_user_generic return value on error path
The patch was mistakenly done with -p0. Sorry for that.
Comment on attachment 311467 [details]
patch fixes copy_user_generic return value on error path

The patch is incorrect: %al is used by stosb.
Created attachment 311573 [details]
v2: patch fixes copy_user_generic return value on error path
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
Updating PM score.
*** This bug has been marked as a duplicate of bug 453053 ***