Bug 446599 - jbd races lead to EIO for O_DIRECT
Summary: jbd races lead to EIO for O_DIRECT
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.1
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Bryn M. Reeves
QA Contact: Martin Jenner
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2008-05-15 09:55 UTC by Bryn M. Reeves
Modified: 2018-10-19 18:16 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2009-01-20 19:46:42 UTC
Target Upstream Version:
Embargoed:


Attachments
testcase to trigger O_DIRECT EIO problem (580 bytes, text/x-csrc)
2008-05-15 09:55 UTC, Bryn M. Reeves
Patch correcting jbd races (5.67 KB, patch)
2008-06-10 18:21 UTC, Bryn M. Reeves


Links
System: Red Hat Product Errata
ID: RHSA-2009:0225
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Important: Red Hat Enterprise Linux 5.3 kernel security and bug fix update
Last Updated: 2009-01-20 16:06:24 UTC

Description Bryn M. Reeves 2008-05-15 09:55:38 UTC
Description of problem:
When running the attached test case on an ext3 file system eventually one of the
processes using direct I/O (O_DIRECT) will fail with EIO.

This has been reported to occur during, for example, database load operations.

This only occurs on kernels that include the patch:

linux-2.6-fs-jbd-wait-for-t_sync_datalist-buf-to-complete.patch

Version-Release number of selected component (if applicable):
2.6.18-53.1.13 onwards

How reproducible:
100%

Steps to Reproduce:
1. Compile the attached testcase with:
$ gcc -Wall -D_GNU_SOURCE -o testcase testcase.c

2. Create a test file:
dd if=/dev/zero of=testfile bs=64k count=1000

3. Run multiple copies of the test in parallel with half using direct I/O and
half using buffered I/O, e.g.:
# ./testcase & ./testcase -d & ./testcase & ./testcase -d & ./testcase &
./testcase -d & ./testcase & ./testcase -d
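The attached testcase.c is not reproduced in this report. Purely as an illustration of the kind of workload the steps above describe, a minimal sketch might look like the following. Everything in it is an assumption rather than the attachment's actual contents: the function name rewrite_file, its parameters, the 4096-byte buffer alignment, the use of O_CREAT, and the access pattern (rewriting the file in 64k blocks).

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* exposes O_DIRECT in <fcntl.h> */
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define BLOCK_SIZE (64 * 1024) /* matches bs=64k from the dd step */
#define ALIGNMENT  4096        /* assumed to satisfy O_DIRECT alignment rules */

/*
 * Hypothetical helper: overwrite the first nblocks blocks of path,
 * repeating for the given number of passes (passes < 0 means loop
 * until a write fails). Returns 0 on success, -1 on the first error.
 */
int rewrite_file(const char *path, int use_direct, long nblocks, long passes)
{
    int flags = O_WRONLY | O_CREAT; /* O_CREAT is an assumption of this sketch */
    void *buf = NULL;
    int rc = 0;
    int fd;
    long pass, i;

    assert(path != NULL && nblocks > 0);

    if (use_direct)
        flags |= O_DIRECT;

    /* O_DIRECT requires an aligned user buffer. */
    if (posix_memalign(&buf, ALIGNMENT, BLOCK_SIZE) != 0) {
        fprintf(stderr, "posix_memalign failed\n");
        return -1;
    }
    memset(buf, 0xaa, BLOCK_SIZE);

    fd = open(path, flags, 0644);
    if (fd < 0) {
        fprintf(stderr, "open failed: %s\n", strerror(errno));
        free(buf);
        return -1;
    }

    for (pass = 0; rc == 0 && (passes < 0 || pass < passes); pass++) {
        for (i = 0; i < nblocks; i++) {
            ssize_t n = pwrite(fd, buf, BLOCK_SIZE, (off_t)i * BLOCK_SIZE);
            if (n != BLOCK_SIZE) {
                /* On an affected kernel this is where EIO would surface. */
                fprintf(stderr, "write failed: %s\n",
                        n < 0 ? strerror(errno) : "short write");
                rc = -1;
                break;
            }
        }
    }

    close(fd);
    free(buf);
    return rc;
}
```

A driver for this sketch would map a -d argument to use_direct and call rewrite_file("testfile", use_direct, 1000, -1), so that buffered and direct writers started in parallel keep overwriting the same 1000 blocks until one of them reports an error.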

  
Actual results:
[1] 18481
[2] 18482
[3] 18483
[4] 18484
[5] 18485
[6] 18486
[7] 18487
write failed: Input/output error


Expected results:
Test runs indefinitely without error

Additional info:
Several upstream threads discussing this:

http://lkml.org/lkml/2008/5/1/160
http://lkml.org/lkml/2008/5/12/193

Comment 1 Bryn M. Reeves 2008-05-15 09:55:39 UTC
Created attachment 305460
testcase to trigger O_DIRECT EIO problem

Comment 3 Issue Tracker 2008-05-20 19:06:40 UTC
Mirroring events from IT


This event sent from IssueTracker by balkov 
 issue 172641

Comment 4 Ben 2008-05-27 18:00:26 UTC
IT is refusing to mirror even when done manually...

----- Additional Comments From mranweil.com (prefers email at
mjr.com)  2008-05-27 13:30 EDT -------
The testcase ran fine over the long weekend with version 7 of the patch.  Elmar,
did it fix the problem for you, too?

Comment 5 Bryn M. Reeves 2008-06-10 18:21:26 UTC
Created attachment 308846
Patch correcting jbd races

This is the final version of the patch pushed upstream by IBM. It is now in -mm
and is expected to be merged in 2.6.26.

Comment 7 RHEL Program Management 2008-07-25 17:03:51 UTC
This request was evaluated by Red Hat Product Management for
inclusion, but this component is not scheduled to be updated in
the current Red Hat Enterprise Linux release. If you would like
this request to be reviewed for the next minor release, ask your
support representative to set the next rhel-x.y flag to "?".

Comment 8 Brad Peters 2008-08-05 00:03:00 UTC
Posted for review - pending PM ack based on Joe K.'s request

http://post-office.corp.redhat.com/archives/rhkernel-list/2008-August/msg00097.html

Comment 10 RHEL Program Management 2008-08-07 22:14:53 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 11 Don Zickus 2008-08-13 16:07:17 UTC
in kernel-2.6.18-104.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5

Comment 12 Don Zickus 2008-08-13 17:25:46 UTC
in kernel-2.6.18-104.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5

Comment 19 errata-xmlrpc 2009-01-20 19:46:42 UTC
An advisory has been issued which should address the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-0225.html

