Red Hat Bugzilla – Bug 184086
aio_return incorrectly returns 0 sometimes
Last modified: 2007-11-30 17:07:09 EST
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:184.108.40.206) Gecko/20060224 Ubuntu/dapper Firefox/220.127.116.11
Description of problem:
While using the POSIX asynchronous I/O routines provided by glibc (specifically, provided by librtkaio.so), aio_return sometimes returns 0 for a given aiocb despite an earlier call to aio_error for that aiocb having already returned 0.
I'll attach a testcase to reproduce this bug shortly. The length of time to reproduce the problem is variable--the earliest I've seen it happen is after around 6 million aio requests have been proceesed... sometimes I've had to left the test running for a few hours.
The incorrect behavior appears to be the result of a race while updating the aiocb, as inspection of the internals of the aiocb after aio_return has returned 0 has shown that sometimes __return_value is 0, and a second inspection of the aiocb internals reveals the expected value stored in __return_value. I've also observed cases where aio_return has returned 0, but __return_value already holds the expected value by the time the testcase inspects it.
The real application this behaviour was originally observed in had previously called aio_suspend to wait for the aio to complete--it was not blindly polling aio_error like the testcase.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Compile the attached testcase
2. Run the attached testcase
3. Be patient
Actual Results: Testcase will eventually report a situation where aio_return has returned 0, and then aio_return has also returned 0.
Expected Results: In this situation, aio_return should have returned the number of bytes written.
At first glance this appears to be the same problem as reported in bug #107015. I was unsure if the more correct thing to do was to reopen that bug report (especially since I'm reporting against RHEL 3 and that bug was opened against RHEL 2.1).
Current running kernel-smp-2.4.21-37.0.1.EL (Linux hostA.example.com 2.4.21-37.0.1.ELsmp #1 SMP Wed Jan 11 18:44:17 EST 2006 i686 i686 i386 GNU/Linux).
The hardware is a 4-way Pentium 3 700MHz with 4GB memory.
The testcase reproducing this problem was run on an ext3 filesystem mounted on top of a ~140GB RAID0 volume provided by the disk controller. The volume consists of 2x 72GB SCSI disks attached via an HP Smart Array 5300 (firmware 1.76) using the in-kernel cciss driver.
The problem was originally discovered in a real application running on RHEL 3AS Update 4. I then reproduced the problem on a similar system running the same versions, then upgraded that system to RHEL 3AS Update 6 and was still able to reproduce the problem.
Unable to reproduce this problem on an up-to-date Fedora rawhide installation, nor on an up-to-date SLES9 installation with recent service packs applied.
Created attachment 125693 [details]
I've been able to reproduce this bug fairly consistently on one of my systems
(i.e. within 4 iterations) using redhatready's STORAGE2 test, and occasionally
on another box. STORAGE2 failed because dt was failing with error status 254
(i.e. spurious end of file). My investigation lead to the same conclusion as
kinetik. There's a basically a race in /lib/tls/librt.so.1's kernel_callback():
const struct aiocb *aiocbp;
int ret = aiocbp->__error_code;
if (ret == EINPROGRESS)
ret = aiocbp->__error_code;
struct aiocb *aiocbp;
if (aiocbp->__error_code == EINPROGRESS)
kernel_callback (kctx_t ctx, struct kiocb *kiocb, long res, long res2)
struct requestlist *req = (struct requestlist *)kiocb;
=> req->aiocbp->aiocb.__error_code = 0;
req->aiocbp->aiocb.__return_value = res;
if (res < 0 && res > -1000)
req->aiocbp->aiocb.__error_code = -res;
req->aiocbp->aiocb.__return_value = -1;
assert (req->running == allocated);
req->running = done;
Explaination: kernel_callback() updates __error_code before __return_value. So,
if a user calls aio_error() between these two instructions, he gets a return
status 0. That's != EINPROGRESS, so he calls aoi_return() to get the number of
bytes transferred / error code. aio_return() returns a nul __return_value
because it was zeroed out in the issuing path. And in posix, 0 bytes transfered
means end of file. dt (in this case) spaces out because the IO is nowhere near
the end of the file.
The proper code should be something like this:
req->aiocbp->aiocb.__return_value = res;
req->aiocbp->aiocb.__error_code = 0;
I've done a binary patch of /lib/tls/librt.so.1 to reverse the two instructions:
36d0: 55 push %ebp
36d1: 89 e5 mov %esp,%ebp
36d3: 56 push %esi
36d4: 8b 75 0c mov 0xc(%ebp),%esi
36d7: 8b 55 10 mov 0x10(%ebp),%edx
36da: 8b 4e 58 mov 0x58(%esi),%ecx
36dd: 85 d2 test %edx,%edx
36df: c7 41 60 00 00 00 00 movl $0x0,0x60(%ecx)
36e6: 8b 4e 58 mov 0x58(%esi),%ecx
36e9: 89 51 64 mov %edx,0x64(%ecx)
=> 36df: 89 51 64 mov %edx,0x64(%ecx)
=> 36e2: 8b 4e 58 mov 0x58(%esi),%ecx
=> 36e5: c7 41 60 00 00 00 00 movl $0x0,0x60(%ecx)
That has passed over 600 iterations of STORAGE2 with no spurious end of file or
any other error vs one failure within 4 iterations previously.
There are a couple of other places in the code where __error_code is updated
before __return_status. These would need to be fixed as well. But the most
damaging one is the one in kernel_callback().
BTW, looking at the code, the same defect is also present in RHEL4.
Created attachment 132724 [details]
Untested patch that could fix this.
I made the patch public.
I rebuilt glibc 2.3.2-95.39 from the SPRM with the patch provided by Jakub
included. I've run my simple reproducer against a local install of the patched
glibc (using "LD_LIBRARY_PATH=$(pwd) ./ld-linux.so.2 ~/aio") for around four
hours without seeing a recurrence of the race condition. It seems that this
patch resolves the problem I reported.
Yep, the patch fixes it for me too. Thanks.
This should be fixed before we go into maintenance and work is already done.
PM ACK for 3.9
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.