From Bugzilla Helper: User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.0.1) Gecko/20060224 Ubuntu/dapper Firefox/1.5.0.1 Description of problem: While using the POSIX asynchronous I/O routines provided by glibc (specifically, provided by librtkaio.so), aio_return sometimes returns 0 for a given aiocb despite an earlier call to aio_error for that aiocb having already returned 0. I'll attach a testcase to reproduce this bug shortly. The length of time to reproduce the problem is variable--the earliest I've seen it happen is after around 6 million aio requests have been proceesed... sometimes I've had to left the test running for a few hours. The incorrect behavior appears to be the result of a race while updating the aiocb, as inspection of the internals of the aiocb after aio_return has returned 0 has shown that sometimes __return_value is 0, and a second inspection of the aiocb internals reveals the expected value stored in __return_value. I've also observed cases where aio_return has returned 0, but __return_value already holds the expected value by the time the testcase inspects it. The real application this behaviour was originally observed in had previously called aio_suspend to wait for the aio to complete--it was not blindly polling aio_error like the testcase. Version-Release number of selected component (if applicable): glibc-2.3.2-95.37 How reproducible: Always Steps to Reproduce: 1. Compile the attached testcase 2. Run the attached testcase 3. Be patient Actual Results: Testcase will eventually report a situation where aio_return has returned 0, and then aio_return has also returned 0. Expected Results: In this situation, aio_return should have returned the number of bytes written. Additional info: At first glance this appears to be the same problem as reported in bug #107015. I was unsure if the more correct thing to do was to reopen that bug report (especially since I'm reporting against RHEL 3 and that bug was opened against RHEL 2.1). Current running kernel-smp-2.4.21-37.0.1.EL (Linux hostA.example.com 2.4.21-37.0.1.ELsmp #1 SMP Wed Jan 11 18:44:17 EST 2006 i686 i686 i386 GNU/Linux). The hardware is a 4-way Pentium 3 700MHz with 4GB memory. The testcase reproducing this problem was run on an ext3 filesystem mounted on top of a ~140GB RAID0 volume provided by the disk controller. The volume consists of 2x 72GB SCSI disks attached via an HP Smart Array 5300 (firmware 1.76) using the in-kernel cciss driver. The problem was originally discovered in a real application running on RHEL 3AS Update 4. I then reproduced the problem on a similar system running the same versions, then upgraded that system to RHEL 3AS Update 6 and was still able to reproduce the problem. Unable to reproduce this problem on an up-to-date Fedora rawhide installation, nor on an up-to-date SLES9 installation with recent service packs applied.
Created attachment 125693 [details] testcase
I've been able to reproduce this bug fairly consistently on one of my systems (i.e. within 4 iterations) using redhatready's STORAGE2 test, and occasionally on another box. STORAGE2 failed because dt was failing with error status 254 (i.e. spurious end of file). My investigation lead to the same conclusion as kinetik. There's a basically a race in /lib/tls/librt.so.1's kernel_callback(): int aio_error (aiocbp) const struct aiocb *aiocbp; { int ret = aiocbp->__error_code; if (ret == EINPROGRESS) { __aio_read_one_event (); ret = aiocbp->__error_code; } return ret; } ssize_t aio_return (aiocbp) struct aiocb *aiocbp; { if (aiocbp->__error_code == EINPROGRESS) __aio_read_one_event (); return aiocbp->__return_value; } static void kernel_callback (kctx_t ctx, struct kiocb *kiocb, long res, long res2) { struct requestlist *req = (struct requestlist *)kiocb; => req->aiocbp->aiocb.__error_code = 0; req->aiocbp->aiocb.__return_value = res; if (res < 0 && res > -1000) { req->aiocbp->aiocb.__error_code = -res; req->aiocbp->aiocb.__return_value = -1; } __aio_notify (req); assert (req->running == allocated); req->running = done; __aio_remove_krequest (req); __aio_free_request (req); } Explaination: kernel_callback() updates __error_code before __return_value. So, if a user calls aio_error() between these two instructions, he gets a return status 0. That's != EINPROGRESS, so he calls aoi_return() to get the number of bytes transferred / error code. aio_return() returns a nul __return_value because it was zeroed out in the issuing path. And in posix, 0 bytes transfered means end of file. dt (in this case) spaces out because the IO is nowhere near the end of the file. The proper code should be something like this: req->aiocbp->aiocb.__return_value = res; write_memory_barrier(); req->aiocbp->aiocb.__error_code = 0; I've done a binary patch of /lib/tls/librt.so.1 to reverse the two instructions: 000036d0 <kernel_callback>: 36d0: 55 push %ebp 36d1: 89 e5 mov %esp,%ebp 36d3: 56 push %esi 36d4: 8b 75 0c mov 0xc(%ebp),%esi 36d7: 8b 55 10 mov 0x10(%ebp),%edx 36da: 8b 4e 58 mov 0x58(%esi),%ecx 36dd: 85 d2 test %edx,%edx 36df: c7 41 60 00 00 00 00 movl $0x0,0x60(%ecx) 36e6: 8b 4e 58 mov 0x58(%esi),%ecx 36e9: 89 51 64 mov %edx,0x64(%ecx) Patched: => 36df: 89 51 64 mov %edx,0x64(%ecx) => 36e2: 8b 4e 58 mov 0x58(%esi),%ecx => 36e5: c7 41 60 00 00 00 00 movl $0x0,0x60(%ecx) That has passed over 600 iterations of STORAGE2 with no spurious end of file or any other error vs one failure within 4 iterations previously. There are a couple of other places in the code where __error_code is updated before __return_status. These would need to be fixed as well. But the most damaging one is the one in kernel_callback(). BTW, looking at the code, the same defect is also present in RHEL4.
Created attachment 132724 [details] glibc-rtkaio-errretval-order.patch Untested patch that could fix this.
I made the patch public.
I rebuilt glibc 2.3.2-95.39 from the SPRM with the patch provided by Jakub included. I've run my simple reproducer against a local install of the patched glibc (using "LD_LIBRARY_PATH=$(pwd) ./ld-linux.so.2 ~/aio") for around four hours without seeing a recurrence of the race condition. It seems that this patch resolves the problem I reported.
Yep, the patch fixes it for me too. Thanks.
This should be fixed before we go into maintenance and work is already done. PM ACK for 3.9
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2007-0471.html