Bug 162814 - Assertion failure in log_do_checkpoint
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.0
Hardware: i686 Linux
Priority: medium
Severity: high
Assigned To: Stephen Tweedie
QA Contact: Brian Brock
Duplicates: 167343 200434
Depends On: 123137
Blocks: 168429
Reported: 2005-07-08 17:23 EDT by Stephen Tweedie
Modified: 2010-10-21 23:08 EDT
Fixed In Version: RHSA-2006-0132
Doc Type: Bug Fix
Last Closed: 2006-03-07 14:17:12 EST
Description Stephen Tweedie 2005-07-08 17:23:13 EDT
+++ This bug was initially created as a clone of Bug #123137 +++

Description of problem:
Assertion failure in log_do_checkpoint() at fs/jbd/checkpoint.c:361:
"drop_count != 0 || cleanup_ret != 0"
------------[ cut here ]------------
kernel BUG at fs/jbd/checkpoint.c:361!

Version-Release number of selected component (if applicable):
2.6.5-1.327

How reproducible:
Rare

System was a dual Xeon with AMI Megaraid RAID controller.  File
systems are Ext3.

I'll attach the oops output in a second.
Comment 3 Dave Jones 2005-09-04 23:53:35 EDT
*** Bug 167343 has been marked as a duplicate of this bug. ***
Comment 4 Jeff Welden 2005-09-12 19:29:40 EDT
There is a one-line fix for this by Jan Kara in the vanilla Linux kernel as of
2.6.11.12:
    http://www.kernel.org/pub/linux/kernel/v2.6/ChangeLog-2.6.11.12

Additional discussion:
    http://lkml.org/lkml/2005/6/1/34
    http://marc.theaimsgroup.com/?l=linux-kernel&m=111761151011571&w=2

Is it possible for you to create a patch for this for the 2.6.9-11 EL smp kernel?
Comment 5 Need Real Name 2005-09-14 18:52:55 EDT
I've tried this patch, and it DOES seem to fix this problem! Well done!
Hopefully Red Hat will create a kernel update ASAP.
Comment 6 Need Real Name 2005-10-03 16:07:40 EDT
This patch has been in production for 3 weeks now without a single problem. 
These machines would PANIC almost daily before, mostly at night when we were
running backups.  

Maybe this problem is mostly associated with high-end hardware, like DL380s, but
I would think that Red Hat would be interested in fixing such a serious problem,
especially one that affects their target hardware.

So far, I've heard nothing to show that Red Hat is interested in fixing this.

Will this patch be included in a future kernel?
Comment 7 Stephen Tweedie 2005-10-05 16:24:57 EDT
Yes, this fix looks good, and it matches the upstream fix.  It will be queued
subject to the usual internal review for the U3 kernel.

I have a kernel built based on U2 plus 3 filesystem fixes:
* readahead fixes for random >4k read performance
* ext3 fix for very slow performance when writing large files on huge
filesystems
* this log_do_checkpoint fix.

i686 and x86_64 kernels are available from:

http://people.redhat.com/sct/.private/test-kernels/kernel-2.6.9-22.EL.sct.4/
Comment 9 Stephen Tweedie 2005-11-07 14:12:33 EST
Fix committed for inclusion in U3.
Comment 12 Red Hat Bugzilla 2006-03-07 14:17:12 EST
An advisory has been issued which should resolve the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2006-0132.html
Comment 15 Jason Baron 2006-07-27 15:37:41 EDT
*** Bug 200434 has been marked as a duplicate of this bug. ***
