Red Hat Bugzilla – Bug 147485
OOPS in kjournald
Last modified: 2007-11-30 17:07:16 EST
Description of problem:
While performing an I/O stress test against 16 partitions over an Emulex 1050
(lpfc) adapter, we received an OOPS in kjournald.
The OOPS was preceded by the error "Unable to handle kernel NULL pointer
dereference (address 0000000000000018)."
After the OOPS, the lpfc driver reported many errors: "SCSI layer issued abort
When attempting to access the system in the morning, it was unresponsive. We
could not log on the console, nor could we log in remotely via SSH.
Version-Release number of selected component (if applicable):
How reproducible:
At least once
Steps to Reproduce:
1. Perform I/O stress test against 16 partitions over an Emulex 1050 adapter.
2. Each of eight disks contained two partitions - one formatted as ext2, the
3. The root partition was in an LVM on a SCSI disk attached to an LSI320 raid
OOPS in kjournald
Created attachment 110798 [details]
"messages" file containing the OOPS in kjournald (line 582)
What IO stress was being used in this case?
The stress test performs file reads and writes to the filesystems described.
Various block sizes (64K, 32K, and 4K) are used.
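The stress tool itself is not identified in this thread. As a rough illustration only, a loop of the following shape writes and reads back files using the three block sizes mentioned above. The function name stress_pass, the file naming, and the number of blocks per file are assumptions made for this sketch, not details of the actual test.

```python
import os
import tempfile

# Block sizes named in the comment above: 64K, 32K and 4K.
BLOCK_SIZES = (64 * 1024, 32 * 1024, 4 * 1024)


def stress_pass(directory, blocks_per_file=16):
    """Write one file per block size, fsync it, read it back and verify it.

    Returns the total number of bytes verified. Hypothetical helper: the
    real test's file sizes and access pattern are not given in this thread.
    """
    verified = 0
    for size in BLOCK_SIZES:
        path = os.path.join(directory, f"stress_{size}.dat")
        # Fill each block with a byte value derived from the block size.
        pattern = bytes([size // 1024 % 256]) * size
        with open(path, "wb") as f:
            for _ in range(blocks_per_file):
                f.write(pattern)
            f.flush()
            os.fsync(f.fileno())  # force the written data out to the device
        with open(path, "rb") as f:
            for _ in range(blocks_per_file):
                chunk = f.read(size)
                if chunk != pattern:
                    raise OSError(f"data mismatch in {path}")
                verified += len(chunk)
        os.remove(path)
    return verified


if __name__ == "__main__":
    # Demo run against a scratch directory instead of a real partition.
    with tempfile.TemporaryDirectory() as scratch:
        print(stress_pass(scratch), "bytes verified")
```

To approximate the reported setup, one would presumably run many such passes concurrently, one per mounted partition, for several hours.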
Created attachment 112107 [details]
We are seeing these errors when performing I/O stress via any of our 3 Emulex
adapters (9802, 1050 and 10000).
Created attachment 112125 [details]
Patch to fix race in journal_unmap_buffer()
This patch fixes a race condition between journal_unmap_buffer() and
journal_commit_transaction(). journal_put_journal_head() is called without
any locking, so it can hit a small window in kjournald in which the buffer's
b_transaction is temporarily NULL. When that happens, journal_unmap_buffer()
throws away a journal_head that is still in use by
journal_commit_transaction().
This patch has fixed several "oops in kjournald" footprints in testing, and has
been committed for RHEL4 U1.
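For readers unfamiliar with this class of bug, the interleaving described above can be replayed in a deterministic toy model. This is not the JBD code and not the contents of the attached patch. ToyBuffer, its fields, and the helper functions are hypothetical names, and the boolean "lock" only stands in for whatever locking the real fix uses.

```python
class ToyBuffer:
    """Toy stand-in for a buffer with an attached journal head."""

    def __init__(self):
        self.transaction = "T1"  # plays the role of b_transaction
        self.head_alive = True   # False once the journal head is discarded
        self.locked = False      # toy lock flag, not a real kernel lock


def commit_begin_window(buf, take_lock):
    """Commit side: transaction is briefly None while the head is in use."""
    if take_lock:
        buf.locked = True
    buf.transaction = None


def commit_end_window(buf):
    """Commit side: report whether the head survived, then restore state."""
    survived = buf.head_alive
    buf.transaction = "T2"
    buf.locked = False
    return survived


def unmap(buf, respect_lock):
    """Unmap side: discard the head if the buffer looks unowned."""
    if respect_lock and buf.locked:
        return False  # a real implementation would wait or retry here
    if buf.transaction is None:
        buf.head_alive = False  # head thrown away while commit still uses it
        return True
    return False


def run(locking):
    """Replay the bad interleaving; return True if the head survived."""
    buf = ToyBuffer()
    commit_begin_window(buf, take_lock=locking)
    unmap(buf, respect_lock=locking)  # the "other CPU" runs inside the window
    return commit_end_window(buf)


if __name__ == "__main__":
    print("without locking, head survives:", run(False))
    print("with locking, head survives:", run(True))
```

In the unlocked run the unmap side sees the temporarily empty transaction and discards the head that the commit side is still using. In the locked run the unmap side cannot observe the window at all.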
*** Bug 145790 has been marked as a duplicate of this bug. ***
*** Bug 147443 has been marked as a duplicate of this bug. ***
*** Bug 150590 has been marked as a duplicate of this bug. ***
Thank you for the patch. We will test it this weekend. I'm told our error is
usually seen within 8-14 hours of running.
The patch appears to work great. We have not seen the kjournald errors while
running with the patched kernel. Unisys will open an Issue Tracker item in
hopes of getting a "hotfix" kernel to release to customers.
I am personally not sure what steps need to be taken to do that, but
nonetheless the Issue Tracker item will be our first step.
This patch was committed on March 5th, 2005. Closing.