Bug 147485 - OOPS in kjournald
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Hardware: ia64 Linux
Priority: medium
Severity: high
Assigned To: Stephen Tweedie
QA Contact: Brian Brock
Duplicates: 145790 150590
Reported: 2005-02-08 09:20 EST by Daniel W. Ottey
Modified: 2007-11-30 17:07 EST (History)
Doc Type: Bug Fix
Last Closed: 2006-03-16 13:21:59 EST

Attachments
"messages" file containing the OOPS in kjournald (line 582) (100.77 KB, text/plain)
2005-02-08 09:20 EST, Daniel W. Ottey
Another log (6.63 KB, text/plain)
2005-03-17 15:11 EST, Daniel W. Ottey
Patch to fix race in journal_unmap_buffer() (1.36 KB, patch)
2005-03-18 09:48 EST, Stephen Tweedie

Description Daniel W. Ottey 2005-02-08 09:20:02 EST
Description of problem:
While performing an I/O stress test against 16 partitions over an Emulex 1050
(lpfc) adapter, we received an OOPS in kjournald.

This OOPS was prefaced with an error "Unable to handle kernel NULL pointer
dereference (address 0000000000000018)."

After the OOPS, the lpfc driver reported many errors: "SCSI layer issued abort

When attempting to access the system in the morning, it was unresponsive.  We
could not log on the console, nor could we log in remotely via SSH.

Version-Release number of selected component (if applicable):

How reproducible:
At least once

Steps to Reproduce:
1.  Perform I/O stress test against 16 partitions over an Emulex 1050 adapter.
2.  Each of eight disks contained two partitions - one formatted as ext2, the
other ext3.
3.  The root partition was in an LVM on a SCSI disk attached to an LSI320 raid
Actual results:
OOPS in kjournald

Expected results:

Additional info:
Comment 1 Daniel W. Ottey 2005-02-08 09:20:03 EST
Created attachment 110798
"messages" file containing the OOPS in kjournald (line 582)
Comment 3 Stephen Tweedie 2005-02-28 10:49:09 EST
What IO stress was being used in this case?
Comment 4 Daniel W. Ottey 2005-03-08 12:15:23 EST
The stress test performs file reads and writes to the filesystems described. 
Various block sizes (64K, 32K, and 4K) are used.
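
For anyone trying to approximate this kind of load, here is a minimal sketch
of one write-then-verify pass at a given block size.  It is not the test tool
used in this report (that tool is not named here); the function name
write_then_verify() and the file path are hypothetical, and a real stress run
would loop for hours across many files and filesystems in parallel.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: write `count` blocks of `block_size` bytes to `path`,
 * read them back, and return the number of bytes that verified correctly.
 * Returns -1 on any I/O error or data mismatch.  The file is removed after.
 */
long write_then_verify(const char *path, size_t block_size, int count)
{
    unsigned char *out = malloc(block_size);
    unsigned char *in = malloc(block_size);
    long verified = -1;
    FILE *f = NULL;

    if (!out || !in)
        goto done;

    f = fopen(path, "wb+");
    if (!f)
        goto done;

    /* Write phase: each block is filled with a byte derived from its index. */
    for (int i = 0; i < count; i++) {
        memset(out, 'A' + (i % 26), block_size);
        if (fwrite(out, 1, block_size, f) != block_size)
            goto done;
    }
    if (fflush(f) != 0)
        goto done;
    rewind(f);

    /* Read phase: regenerate the expected pattern and compare. */
    verified = 0;
    for (int i = 0; i < count; i++) {
        memset(out, 'A' + (i % 26), block_size);
        if (fread(in, 1, block_size, f) != block_size ||
            memcmp(in, out, block_size) != 0) {
            verified = -1;
            goto done;
        }
        verified += (long)block_size;
    }

done:
    if (f) {
        fclose(f);
        remove(path);
    }
    free(out);
    free(in);
    return verified;
}
```

Calling it once each with 65536, 32768 and 4096 as block_size covers the three
sizes mentioned above.  Note that buffered stdio mostly exercises the page
cache; reaching the disks and the journal under sustained load would need
fsync() or direct I/O plus much larger data volumes.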
Comment 6 Daniel W. Ottey 2005-03-17 15:11:58 EST
Created attachment 112107
Another log

We are seeing these errors when performing I/O stress via any of our 3 Emulex
adapters (9802, 1050 and 10000).
Comment 7 Stephen Tweedie 2005-03-18 09:48:47 EST
Created attachment 112125
Patch to fix race in journal_unmap_buffer()

This patch fixes a race condition between journal_unmap_buffer() and
journal_commit_transaction().  It involves journal_put_journal_head() being
called without any locking, and thus hitting a small window in kjournald where
the buffer's b_transaction can be temporarily NULL.  If that window is hit,
journal_unmap_buffer() ends up throwing away the journal_head that is still in
use by journal_commit_transaction().
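
The interleaving described above can be illustrated with a small,
deterministic userspace model.  This is not the attached patch and not kernel
code: the structures are simplified stand-ins, the helper names
(commit_begin_refile, unmap_unlocked, and so on) are hypothetical, and the
"lock" is just a flag that makes the second path back off instead of
sleeping.  It only shows why checking b_transaction without holding a lock
can discard a journal_head that the commit path is still using.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel structures (assumptions). */
struct transaction { int id; };

struct journal_head {
    struct transaction *b_transaction; /* owning transaction, or NULL */
    bool in_use_by_commit;             /* commit path still holds it  */
    bool freed;                        /* set when the head is dropped */
};

/* Commit path, first half: detach the head before refiling it. */
static void commit_begin_refile(struct journal_head *jh)
{
    jh->in_use_by_commit = true;
    jh->b_transaction = NULL;          /* start of the small window */
}

/* Commit path, second half: attach the head to its next transaction. */
static void commit_end_refile(struct journal_head *jh, struct transaction *t)
{
    jh->b_transaction = t;             /* end of the window */
    jh->in_use_by_commit = false;
}

/* Unlocked unmap: treats a NULL b_transaction as "nobody owns this". */
static void unmap_unlocked(struct journal_head *jh)
{
    if (jh->b_transaction == NULL)
        jh->freed = true;
}

/* Locked unmap: refuses to look at the head while the lock is held. */
static bool unmap_locked(struct journal_head *jh, const bool *lock_held)
{
    if (*lock_held)
        return false;                  /* a real lock would wait here */
    if (jh->b_transaction == NULL)
        jh->freed = true;
    return true;
}

/* Returns true if the head was freed while commit was still using it. */
bool race_without_lock(void)
{
    struct transaction t = { 1 };
    struct journal_head jh = { &t, false, false };

    commit_begin_refile(&jh);
    unmap_unlocked(&jh);               /* lands inside the window */
    bool bug = jh.freed && jh.in_use_by_commit;
    commit_end_refile(&jh, &t);
    return bug;
}

/* Same interleaving, but both paths respect one lock. */
bool race_with_lock(void)
{
    struct transaction t = { 1 };
    struct journal_head jh = { &t, false, false };
    bool lock_held = true;             /* commit takes the lock first */

    commit_begin_refile(&jh);
    bool proceeded = unmap_locked(&jh, &lock_held);
    bool bug = proceeded && jh.freed && jh.in_use_by_commit;
    commit_end_refile(&jh, &t);
    lock_held = false;                 /* commit drops the lock */

    /* The retry now sees a non-NULL b_transaction and frees nothing. */
    unmap_locked(&jh, &lock_held);
    return bug || jh.freed;
}
```

In this model race_without_lock() reports the premature free and
race_with_lock() does not, since the unmap path never observes the transient
NULL.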
Comment 8 Stephen Tweedie 2005-03-18 09:50:02 EST
This patch has fixed several "oops in kjournald" footprints in testing, and has
been committed for RHEL4 U1.
Comment 9 Stephen Tweedie 2005-03-18 09:54:59 EST
*** Bug 145790 has been marked as a duplicate of this bug. ***
Comment 10 Stephen Tweedie 2005-03-18 09:58:39 EST
*** Bug 147443 has been marked as a duplicate of this bug. ***
Comment 11 Stephen Tweedie 2005-03-18 10:01:13 EST
*** Bug 150590 has been marked as a duplicate of this bug. ***
Comment 12 Daniel W. Ottey 2005-03-19 09:43:39 EST
Thank you for the patch.  We will test it this weekend.  I'm told our error is
usually seen within 8-14 hours of running.
Comment 13 Daniel W. Ottey 2005-03-29 16:26:10 EST
The patch appears to work great.  We have not seen the kjournald errors while
running with the patched kernel.  Unisys will open an Issue Tracker item in
hopes of getting a "hotfix" kernel to release to customers.

I am personally not sure what steps need to be taken to do that, but
nonetheless the Issue Tracker item will be our first step.

Thank you.
Comment 15 Jason Baron 2006-03-16 13:21:59 EST
This patch was committed on March 5th 2005...closing...
