Bug 162814 - Assertion failure in log_do_checkpoint
Summary: Assertion failure in log_do_checkpoint
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.0
Hardware: i686
OS: Linux
Priority: medium
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Stephen Tweedie
QA Contact: Brian Brock
URL:
Whiteboard:
Duplicates: 167343 200434
Depends On: 123137
Blocks: 168429
Reported: 2005-07-08 21:23 UTC by Stephen Tweedie
Modified: 2018-10-19 19:17 UTC

Fixed In Version: RHSA-2006-0132
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2006-03-07 19:17:12 UTC
Target Upstream Version:
Embargoed:



Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHSA-2005:808 | 0 | normal | SHIPPED_LIVE | Important: kernel security update | 2005-10-27 04:00:00 UTC
Red Hat Product Errata | RHSA-2006:0132 | 0 | qe-ready | SHIPPED_LIVE | Moderate: Updated kernel packages available for Red Hat Enterprise Linux 4 Update 3 | 2006-03-09 16:31:00 UTC

Description Stephen Tweedie 2005-07-08 21:23:13 UTC
+++ This bug was initially created as a clone of Bug #123137 +++

Description of problem:
Assertion failure in log_do_checkpoint() at fs/jbd/checkpoint.c:361:
"drop_count != 0 || cleanup_ret != 0"
------------[ cut here ]------------
kernel BUG at fs/jbd/checkpoint.c:361!

Version-Release number of selected component (if applicable):
2.6.5-1.327

How reproducible:
Rare

The system was a dual Xeon with an AMI MegaRAID controller. The file
systems are ext3.

I'll attach the oops output in a second.
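
To check whether a saved console or syslog capture shows the same failure, the two signature strings quoted above can be searched for directly. The following is a minimal sketch, not part of the original report: the function name find_checkpoint_oops and the sample text are hypothetical, and the line number 361 is specific to the reported kernel build, so the pattern accepts any line number.

```python
import re

# Signature strings taken from the description above. The source line number
# (361) belongs to the reported build, so any number is accepted here.
BUG_LINE = re.compile(r"kernel BUG at fs/jbd/checkpoint\.c:(\d+)!")
ASSERT_TEXT = "drop_count != 0 || cleanup_ret != 0"


def find_checkpoint_oops(log_text):
    """Return (line_number, text) pairs for lines matching either signature.

    Hypothetical helper for illustration; line numbers are 1-based.
    """
    hits = []
    for lineno, line in enumerate(log_text.splitlines(), start=1):
        if BUG_LINE.search(line) or ASSERT_TEXT in line:
            hits.append((lineno, line.strip()))
    return hits


if __name__ == "__main__":
    # Sample text assembled from the excerpt in this report.
    sample = (
        "Assertion failure in log_do_checkpoint() at fs/jbd/checkpoint.c:361:\n"
        '"drop_count != 0 || cleanup_ret != 0"\n'
        "------------[ cut here ]------------\n"
        "kernel BUG at fs/jbd/checkpoint.c:361!\n"
    )
    for lineno, text in find_checkpoint_oops(sample):
        print(f"{lineno}: {text}")
```

A match only indicates that the log contains the same assertion text; it does not by itself confirm the same root cause.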

Comment 3 Dave Jones 2005-09-05 03:53:35 UTC
*** Bug 167343 has been marked as a duplicate of this bug. ***

Comment 4 Jeff Welden 2005-09-12 23:29:40 UTC
There is a one-line fix for this by Jan Kara in the vanilla Linux kernel as of
2.6.11.12:
    http://www.kernel.org/pub/linux/kernel/v2.6/ChangeLog-2.6.11.12

Additional discussion:
    http://lkml.org/lkml/2005/6/1/34
    http://marc.theaimsgroup.com/?l=linux-kernel&m=111761151011571&w=2

Is it possible for you to create a patch for this for the 2.6.9-11 EL smp kernel?

Comment 5 Need Real Name 2005-09-14 22:52:55 UTC
I've tried this patch, and it DOES seem to fix this problem! Well done!
Hopefully Red Hat will create a kernel update ASAP.


Comment 6 Need Real Name 2005-10-03 20:07:40 UTC
This patch has been in production for 3 weeks now without a single problem. 
These machines would PANIC almost daily before, mostly at night when we were
running backups.  

Maybe this problem is mostly associated with high-end hardware, like DL380s, but
I would think that Red Hat would be interested in fixing such a serious problem,
especially one that affects its target hardware.

So far, I've heard nothing to show that Red Hat is interested in fixing this.

Will this patch be included in a future kernel?

Comment 7 Stephen Tweedie 2005-10-05 20:24:57 UTC
Yes, this fix looks good, and it matches the upstream fix.  It will be queued
subject to the usual internal review for the U3 kernel.

I have built a kernel based on U2 plus three filesystem fixes:
* readahead fixes for random >4k read performance
* an ext3 fix for very slow writes of large files on huge filesystems
* this log_do_checkpoint fix.

i686 and x86_64 kernels are available from:

http://people.redhat.com/sct/.private/test-kernels/kernel-2.6.9-22.EL.sct.4/

Comment 9 Stephen Tweedie 2005-11-07 19:12:33 UTC
Fix committed for inclusion in U3.

Comment 12 Red Hat Bugzilla 2006-03-07 19:17:12 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2006-0132.html


Comment 15 Jason Baron 2006-07-27 19:37:41 UTC
*** Bug 200434 has been marked as a duplicate of this bug. ***

