184086 – aio_return incorrectly returns 0 sometimes

Bug 184086 - aio_return incorrectly returns 0 sometimes

Summary: aio_return incorrectly returns 0 sometimes

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 3
Classification:	Red Hat
Component:	glibc
Sub Component:
Version:	3.0
Hardware:	i386
OS:	Linux
Priority:	high
Severity:	medium
Target Milestone:	---
Assignee:	Jakub Jelinek
QA Contact:	Brian Brock
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	179629
TreeView+	depends on / blocked

Reported:	2006-03-06 05:34 UTC by Matthew Gregan [:kinetik]
Modified:	2007-11-30 22:07 UTC (History)
CC List:	4 users (show)
Fixed In Version:	RHBA-2007-0471
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2007-06-11 18:49:07 UTC
Target Upstream Version:
Embargoed:

Attachments	(Terms of Use)
testcase (1.23 KB, text/plain) 2006-03-06 05:36 UTC, Matthew Gregan [:kinetik]	no flags	Details
glibc-rtkaio-errretval-order.patch (1.86 KB, patch) 2006-07-20 06:24 UTC, Jakub Jelinek	no flags	Details \| Diff
View All

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2007:0471	0	normal	SHIPPED_LIVE	glibc bug fix update	2007-06-07 22:12:42 UTC

Description Matthew Gregan [:kinetik] 2006-03-06 05:34:45 UTC

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.0.1) Gecko/20060224 Ubuntu/dapper Firefox/1.5.0.1

Description of problem:
While using the POSIX asynchronous I/O routines provided by glibc (specifically, provided by librtkaio.so), aio_return sometimes returns 0 for a given aiocb despite an earlier call to aio_error for that aiocb having already returned 0.

I'll attach a testcase to reproduce this bug shortly. The length of time to reproduce the problem is variable--the earliest I've seen it happen is after around 6 million aio requests have been proceesed... sometimes I've had to left the test running for a few hours.

The incorrect behavior appears to be the result of a race while updating the aiocb, as inspection of the internals of the aiocb after aio_return has returned 0 has shown that sometimes __return_value is 0, and a second inspection of the aiocb internals reveals the expected value stored in __return_value. I've also observed cases where aio_return has returned 0, but __return_value already holds the expected value by the time the testcase inspects it.

The real application this behaviour was originally observed in had previously called aio_suspend to wait for the aio to complete--it was not blindly polling aio_error like the testcase.

Version-Release number of selected component (if applicable):
glibc-2.3.2-95.37

How reproducible:
Always

Steps to Reproduce:
1. Compile the attached testcase
2. Run the attached testcase
3. Be patient

Actual Results: Testcase will eventually report a situation where aio_return has returned 0, and then aio_return has also returned 0.

Expected Results: In this situation, aio_return should have returned the number of bytes written.

Additional info:

At first glance this appears to be the same problem as reported in bug #107015. I was unsure if the more correct thing to do was to reopen that bug report (especially since I'm reporting against RHEL 3 and that bug was opened against RHEL 2.1).

Current running kernel-smp-2.4.21-37.0.1.EL (Linux hostA.example.com 2.4.21-37.0.1.ELsmp #1 SMP Wed Jan 11 18:44:17 EST 2006 i686 i686 i386 GNU/Linux).

The hardware is a 4-way Pentium 3 700MHz with 4GB memory.

The testcase reproducing this problem was run on an ext3 filesystem mounted on top of a ~140GB RAID0 volume provided by the disk controller. The volume consists of 2x 72GB SCSI disks attached via an HP Smart Array 5300 (firmware 1.76) using the in-kernel cciss driver.

The problem was originally discovered in a real application running on RHEL 3AS Update 4. I then reproduced the problem on a similar system running the same versions, then upgraded that system to RHEL 3AS Update 6 and was still able to reproduce the problem.

Unable to reproduce this problem on an up-to-date Fedora rawhide installation, nor on an up-to-date SLES9 installation with recent service packs applied.

Comment 1 Matthew Gregan [:kinetik] 2006-03-06 05:36:33 UTC

Created attachment 125693 [details]
testcase

Comment 2 Olivier Lecomte 2006-06-22 01:42:45 UTC

I've been able to reproduce this bug fairly consistently on one of my systems
(i.e. within 4 iterations) using redhatready's STORAGE2 test, and occasionally
on another box. STORAGE2 failed because dt was failing with error status 254
(i.e. spurious end of file). My investigation lead to the same conclusion as
kinetik. There's a basically a race in /lib/tls/librt.so.1's kernel_callback():

    int
    aio_error (aiocbp)
	 const struct aiocb *aiocbp;
    {
      int ret = aiocbp->__error_code;

      if (ret == EINPROGRESS)
	{
	  __aio_read_one_event ();
	  ret = aiocbp->__error_code;
	}
      return ret;
    }

    ssize_t
    aio_return (aiocbp)
    struct aiocb *aiocbp;
    {
      if (aiocbp->__error_code == EINPROGRESS)
	__aio_read_one_event ();
      return aiocbp->__return_value;
    }

    static void
    kernel_callback (kctx_t ctx, struct kiocb *kiocb, long res, long res2)
    {
      struct requestlist *req = (struct requestlist *)kiocb;

=>    req->aiocbp->aiocb.__error_code = 0;
      req->aiocbp->aiocb.__return_value = res;
      if (res < 0 && res > -1000)
	{
	  req->aiocbp->aiocb.__error_code = -res;
	  req->aiocbp->aiocb.__return_value = -1;
	}
      __aio_notify (req);
      assert (req->running == allocated);
      req->running = done;
      __aio_remove_krequest (req);
      __aio_free_request (req);
    }

Explaination: kernel_callback() updates __error_code before __return_value. So,
if a user calls aio_error() between these two instructions, he gets a return
status 0. That's != EINPROGRESS, so he calls aoi_return() to get the number of
bytes transferred / error code. aio_return() returns a nul __return_value
because it was zeroed out in the issuing path. And in posix, 0 bytes transfered
means end of file. dt (in this case) spaces out because the IO is nowhere near
the end of the file.

The proper code should be something like this:

      req->aiocbp->aiocb.__return_value = res;
      write_memory_barrier();
      req->aiocbp->aiocb.__error_code = 0;

I've done a binary patch of /lib/tls/librt.so.1 to reverse the two instructions:

000036d0 <kernel_callback>:
    36d0:	55			push   %ebp
    36d1:	89 e5			mov    %esp,%ebp
    36d3:	56			push   %esi
    36d4:	8b 75 0c		mov    0xc(%ebp),%esi
    36d7:	8b 55 10		mov    0x10(%ebp),%edx
    36da:	8b 4e 58		mov    0x58(%esi),%ecx
    36dd:	85 d2			test   %edx,%edx
    36df:	c7 41 60 00 00 00 00	movl   $0x0,0x60(%ecx)
    36e6:	8b 4e 58		mov    0x58(%esi),%ecx
    36e9:	89 51 64		mov    %edx,0x64(%ecx)

Patched:

=>  36df:	89 51 64		mov    %edx,0x64(%ecx)
=>  36e2:	8b 4e 58		mov    0x58(%esi),%ecx
=>  36e5:	c7 41 60 00 00 00 00	movl   $0x0,0x60(%ecx)

That has passed over 600 iterations of STORAGE2 with no spurious end of file or
any other error vs one failure within 4 iterations previously.

There are a couple of other places in the code where __error_code is updated
before __return_status. These would need to be fixed as well. But the most
damaging one is the one in kernel_callback().

BTW, looking at the code, the same defect is also present in RHEL4.

Comment 4 Jakub Jelinek 2006-07-20 06:24:57 UTC

Created attachment 132724 [details]
glibc-rtkaio-errretval-order.patch

Untested patch that could fix this.

Comment 7 Ulrich Drepper 2006-07-31 23:38:18 UTC

I made the patch public.

Comment 8 Matthew Gregan [:kinetik] 2006-08-07 04:06:34 UTC

I rebuilt glibc 2.3.2-95.39 from the SPRM with the patch provided by Jakub
included.  I've run my simple reproducer against a local install of the patched
glibc (using "LD_LIBRARY_PATH=$(pwd) ./ld-linux.so.2 ~/aio") for around four
hours without seeing a recurrence of the race condition.  It seems that this
patch resolves the problem I reported.

Comment 10 Olivier Lecomte 2006-08-28 17:00:16 UTC

Yep, the patch fixes it for me too. Thanks.

Comment 11 Daniel Riek 2006-12-17 03:56:19 UTC

This should be fixed before we go into maintenance and work is already done.

PM ACK for 3.9

Comment 16 Red Hat Bugzilla 2007-06-11 18:49:07 UTC

An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2007-0471.html

Note You need to log in before you can comment on or make changes to this bug.