162814 – Assertion failure in log_do_checkpoint

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 4
Classification:	Red Hat
Component:	kernel
Sub Component:
Version:	4.0
Hardware:	i686
OS:	Linux
Priority:	medium
Severity:	high
Target Milestone:	---
Target Release:	---
Assignee:	Stephen Tweedie
QA Contact:	Brian Brock
Docs Contact:
URL:
Whiteboard:
Duplicates (2):	167343 200434
Depends On:	123137
Blocks:	168429

Reported:	2005-07-08 21:23 UTC by Stephen Tweedie
Modified:	2018-10-19 19:17 UTC
CC List:	5 users
Fixed In Version:	RHSA-2006-0132
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2006-03-07 19:17:12 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHSA-2005:808	0	normal	SHIPPED_LIVE	Important: kernel security update	2005-10-27 04:00:00 UTC
Red Hat Product Errata	RHSA-2006:0132	0	qe-ready	SHIPPED_LIVE	Moderate: Updated kernel packages available for Red Hat Enterprise Linux 4 Update 3	2006-03-09 16:31:00 UTC

Description Stephen Tweedie 2005-07-08 21:23:13 UTC

+++ This bug was initially created as a clone of Bug #123137 +++

Description of problem:
Assertion failure in log_do_checkpoint() at fs/jbd/checkpoint.c:361:
"drop_count != 0 || cleanup_ret != 0"
------------[ cut here ]------------
kernel BUG at fs/jbd/checkpoint.c:361!

Version-Release number of selected component (if applicable):
2.6.5-1.327

How reproducible:
Rare

The system was a dual Xeon with an AMI MegaRAID RAID controller.  The file
systems are ext3.

I'll attach the oops output in a second.

Comment 3 Dave Jones 2005-09-05 03:53:35 UTC

*** Bug 167343 has been marked as a duplicate of this bug. ***

Comment 4 Jeff Welden 2005-09-12 23:29:40 UTC

Jan Kara has a one-line fix for this in the vanilla Linux kernel, as of
2.6.11.12:
    http://www.kernel.org/pub/linux/kernel/v2.6/ChangeLog-2.6.11.12

Additional discussion:
    http://lkml.org/lkml/2005/6/1/34
    http://marc.theaimsgroup.com/?l=linux-kernel&m=111761151011571&w=2

Is it possible for you to create a patch for this for the 2.6.9-11 EL smp kernel?

Comment 5 Need Real Name 2005-09-14 22:52:55 UTC

I've tried this patch, and it DOES seem to fix this problem! Well done!
Hopefully Red Hat will create a kernel update ASAP.

Comment 6 Need Real Name 2005-10-03 20:07:40 UTC

This patch has been in production for 3 weeks now without a single problem. 
These machines would PANIC almost daily before, mostly at night when we were
running backups.  

Maybe this problem is mostly associated with high-end hardware, like DL380s, but
I would think that Red Hat would be interested in fixing such a serious problem,
especially one that affects their target hardware.

So far, I've heard nothing to show that Red Hat is interested in fixing this.

Will this patch be included in a future kernel?

Comment 7 Stephen Tweedie 2005-10-05 20:24:57 UTC

Yes, this fix looks good, and it matches the upstream fix.  It will be queued
subject to the usual internal review for the U3 kernel.

I have a kernel built based on U2 plus 3 filesystem fixes:
* readahead fixes for random >4k read performance
* ext3 fix for very slow writes of large files on huge filesystems
* this log_do_checkpoint fix.

i686 and x86_64 kernels are available from:

http://people.redhat.com/sct/.private/test-kernels/kernel-2.6.9-22.EL.sct.4/

Comment 9 Stephen Tweedie 2005-11-07 19:12:33 UTC

Fix committed for inclusion in U3.

Comment 12 Red Hat Bugzilla 2006-03-07 19:17:12 UTC

An advisory has been issued which should resolve the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2006-0132.html

Comment 15 Jason Baron 2006-07-27 19:37:41 UTC

*** Bug 200434 has been marked as a duplicate of this bug. ***
