Bug 184086

Summary: aio_return incorrectly returns 0 sometimes
Product: Red Hat Enterprise Linux 3 Reporter: Matthew Gregan [:kinetik] <kinetik>
Component: glibcAssignee: Jakub Jelinek <jakub>
Status: CLOSED ERRATA QA Contact: Brian Brock <bbrock>
Severity: medium Docs Contact:
Priority: high    
Version: 3.0CC: drepper, olivier, rhentosh, tao
Target Milestone: ---   
Target Release: ---   
Hardware: i386   
OS: Linux   
Whiteboard:
Fixed In Version: RHBA-2007-0471 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2007-06-11 18:49:07 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 179629    
Attachments:
Description Flags
testcase
none
glibc-rtkaio-errretval-order.patch none

Description Matthew Gregan [:kinetik] 2006-03-06 05:34:45 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.0.1) Gecko/20060224 Ubuntu/dapper Firefox/1.5.0.1

Description of problem:
While using the POSIX asynchronous I/O routines provided by glibc (specifically, provided by librtkaio.so), aio_return sometimes returns 0 for a given aiocb despite an earlier call to aio_error for that aiocb having already returned 0.

I'll attach a testcase to reproduce this bug shortly.  The length of time to reproduce the problem is variable--the earliest I've seen it happen is after around 6 million aio requests have been proceesed... sometimes I've had to left the test running for a few hours.

The incorrect behavior appears to be the result of a race while updating the aiocb, as inspection of the internals of the aiocb after aio_return has returned 0 has shown that sometimes __return_value is 0, and a second inspection of the aiocb internals reveals the expected value stored in __return_value.  I've also observed cases where aio_return has returned 0, but __return_value already holds the expected value by the time the testcase inspects it.

The real application this behaviour was originally observed in had previously called aio_suspend to wait for the aio to complete--it was not blindly polling aio_error like the testcase.

Version-Release number of selected component (if applicable):
glibc-2.3.2-95.37

How reproducible:
Always

Steps to Reproduce:
1. Compile the attached testcase
2. Run the attached testcase
3. Be patient


Actual Results:  Testcase will eventually report a situation where aio_return has returned 0, and then aio_return has also returned 0.

Expected Results:  In this situation, aio_return should have returned the number of bytes written.

Additional info:

At first glance this appears to be the same problem as reported in bug #107015.  I was unsure if the more correct thing to do was to reopen that bug report (especially since I'm reporting against RHEL 3 and that bug was opened against RHEL 2.1).

Current running kernel-smp-2.4.21-37.0.1.EL (Linux hostA.example.com 2.4.21-37.0.1.ELsmp #1 SMP Wed Jan 11 18:44:17 EST 2006 i686 i686 i386 GNU/Linux).

The hardware is a 4-way Pentium 3 700MHz with 4GB memory.

The testcase reproducing this problem was run on an ext3 filesystem mounted on top of a ~140GB RAID0 volume provided by the disk controller.  The volume consists of 2x 72GB SCSI disks attached via an HP Smart Array 5300 (firmware 1.76) using the in-kernel cciss driver.

The problem was originally discovered in a real application running on RHEL 3AS Update 4.  I then reproduced the problem on a similar system running the same versions, then upgraded that system to RHEL 3AS Update 6 and was still able to reproduce the problem.

Unable to reproduce this problem on an up-to-date Fedora rawhide installation, nor on an up-to-date SLES9 installation with recent service packs applied.

Comment 1 Matthew Gregan [:kinetik] 2006-03-06 05:36:33 UTC
Created attachment 125693 [details]
testcase

Comment 2 Olivier Lecomte 2006-06-22 01:42:45 UTC
I've been able to reproduce this bug fairly consistently on one of my systems
(i.e. within 4 iterations) using redhatready's STORAGE2 test, and occasionally
on another box. STORAGE2 failed because dt was failing with error status 254
(i.e. spurious end of file). My investigation lead to the same conclusion as
kinetik. There's a basically a race in /lib/tls/librt.so.1's kernel_callback():

    int
    aio_error (aiocbp)
	 const struct aiocb *aiocbp;
    {
      int ret = aiocbp->__error_code;

      if (ret == EINPROGRESS)
	{
	  __aio_read_one_event ();
	  ret = aiocbp->__error_code;
	}
      return ret;
    }

    ssize_t
    aio_return (aiocbp)
    struct aiocb *aiocbp;
    {
      if (aiocbp->__error_code == EINPROGRESS)
	__aio_read_one_event ();
      return aiocbp->__return_value;
    }

    static void
    kernel_callback (kctx_t ctx, struct kiocb *kiocb, long res, long res2)
    {
      struct requestlist *req = (struct requestlist *)kiocb;

=>    req->aiocbp->aiocb.__error_code = 0;
      req->aiocbp->aiocb.__return_value = res;
      if (res < 0 && res > -1000)
	{
	  req->aiocbp->aiocb.__error_code = -res;
	  req->aiocbp->aiocb.__return_value = -1;
	}
      __aio_notify (req);
      assert (req->running == allocated);
      req->running = done;
      __aio_remove_krequest (req);
      __aio_free_request (req);
    }

Explaination: kernel_callback() updates __error_code before __return_value. So,
if a user calls aio_error() between these two instructions, he gets a return
status 0. That's != EINPROGRESS, so he calls aoi_return() to get the number of
bytes transferred / error code. aio_return() returns a nul __return_value
because it was zeroed out in the issuing path. And in posix, 0 bytes transfered
means end of file. dt (in this case) spaces out because the IO is nowhere near
the end of the file.

The proper code should be something like this:

      req->aiocbp->aiocb.__return_value = res;
      write_memory_barrier();
      req->aiocbp->aiocb.__error_code = 0;

I've done a binary patch of /lib/tls/librt.so.1 to reverse the two instructions:

000036d0 <kernel_callback>:
    36d0:	55			push   %ebp
    36d1:	89 e5			mov    %esp,%ebp
    36d3:	56			push   %esi
    36d4:	8b 75 0c		mov    0xc(%ebp),%esi
    36d7:	8b 55 10		mov    0x10(%ebp),%edx
    36da:	8b 4e 58		mov    0x58(%esi),%ecx
    36dd:	85 d2			test   %edx,%edx
    36df:	c7 41 60 00 00 00 00	movl   $0x0,0x60(%ecx)
    36e6:	8b 4e 58		mov    0x58(%esi),%ecx
    36e9:	89 51 64		mov    %edx,0x64(%ecx)

Patched:

=>  36df:	89 51 64		mov    %edx,0x64(%ecx)
=>  36e2:	8b 4e 58		mov    0x58(%esi),%ecx
=>  36e5:	c7 41 60 00 00 00 00	movl   $0x0,0x60(%ecx)

That has passed over 600 iterations of STORAGE2 with no spurious end of file or
any other error vs one failure within 4 iterations previously.

There are a couple of other places in the code where __error_code is updated
before __return_status. These would need to be fixed as well. But the most
damaging one is the one in kernel_callback().

BTW, looking at the code, the same defect is also present in RHEL4.

Comment 4 Jakub Jelinek 2006-07-20 06:24:57 UTC
Created attachment 132724 [details]
glibc-rtkaio-errretval-order.patch

Untested patch that could fix this.

Comment 7 Ulrich Drepper 2006-07-31 23:38:18 UTC
I made the patch public.

Comment 8 Matthew Gregan [:kinetik] 2006-08-07 04:06:34 UTC
I rebuilt glibc 2.3.2-95.39 from the SPRM with the patch provided by Jakub
included.  I've run my simple reproducer against a local install of the patched
glibc (using "LD_LIBRARY_PATH=$(pwd) ./ld-linux.so.2 ~/aio") for around four
hours without seeing a recurrence of the race condition.  It seems that this
patch resolves the problem I reported.

Comment 10 Olivier Lecomte 2006-08-28 17:00:16 UTC
Yep, the patch fixes it for me too. Thanks.

Comment 11 Daniel Riek 2006-12-17 03:56:19 UTC
This should be fixed before we go into maintenance and work is already done.

PM ACK for 3.9

Comment 16 Red Hat Bugzilla 2007-06-11 18:49:07 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2007-0471.html