Bug 238001

Summary: kernel panic Assertion failure in log_do_checkpoint()
Product: Red Hat Enterprise Linux 4
Reporter: Mark Tinberg <mtinberg>
Component: kernel
Assignee: Kernel Maintainer List <kernel-maint>
Status: CLOSED DUPLICATE
QA Contact: Martin Jenner <mjenner>
Severity: medium
Docs Contact:
Priority: medium
Version: 4.4
CC: esandeen, esm
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2007-05-31 18:20:30 UTC
Type: ---

Description Mark Tinberg 2007-04-26 14:39:51 UTC
Description of problem:

Apr 25 22:58:26 hostname kernel: Assertion failure in log_do_checkpoint() at fs/jbd/checkpoint.c:363: "drop_count != 0 || cleanup_ret != 0"

Version-Release number of selected component (if applicable):

Linux version 2.6.9-42.0.10.ELsmp (brewbuilder@ls20-bc1-14.build.redhat.com) (gcc version 3.4.6 20060404 (Red Hat 3.4.6-3)) #1 SMP Fri Feb 16 17:13:42 EST 2007

How reproducible:

I've had two kernel panics this morning on this system, both inside the ext3 filesystem according to the
traceback on the screen (which I unfortunately did not transcribe).

Steps to Reproduce:

No idea.
  
Actual results:

Kernel panic

Expected results:

Kernel not panicking; everything is OK.

Additional info:


Host is a Dell PowerEdge 2850 with PERC 5/i 5.0.1-0030 with one 36G RAID 1 for the OS and a 270G
RAID 10 for the data.

Comment 1 Ed Marshall 2007-04-26 16:14:43 UTC
We just got this exact same error:

Apr 26 10:01:09 hostname kernel: Assertion failure in log_do_checkpoint() at fs/jbd/checkpoint.c:363: "drop_count != 0 || cleanup_ret != 0"

uname -a reports:

Linux biscuit.subscribermail.com 2.6.9-42.0.10.ELsmp #1 SMP Fri Feb 16 17:13:42 EST 2007 x86_64 x86_64 x86_64 GNU/Linux

No traceback logged, and console is in another facility, so I'm afraid I can't
give you any other information right now. System is a Penguin Computing Altus
1600 SAS:
http://www.penguincomputing.com/index.php?option=com_content&task=view&id=341&Itemid=489

Comment 2 graeme 2007-05-08 16:43:05 UTC
Hi,

I got this error on a mail store acting as an NFS head running the same kernel.
Any ideas how to solve this one?
I have a diskdump if required.

syslog entries below: 

May  8 12:23:52 ruchba kernel: Assertion failure in log_do_checkpoint() at fs/jbd/checkpoint.c:363: "drop_count != 0 || cleanup_ret != 0"
May  8 12:23:52 ruchba kernel: ------------[ cut here ]------------
May  8 12:23:52 ruchba kernel: ------------[ cut here ]------------
May  8 12:23:52 ruchba kernel: kernel BUG at include/asm/spinlock.h:109!
May  8 12:23:52 ruchba kernel: invalid operand: 0000 [#1]
May  8 12:23:52 ruchba kernel: SMP 
May  8 12:23:52 ruchba kernel: Modules linked in: nfsd exportfs lockd nfs_acl
sunrpc emcpdm(U) emcpgpx(U) emcpmpx(U) emcp(U) emcplib(U) ide_dump cciss_dump
scsi_dump diskdump zlib_deflate i2c_dev i2c_core sg md5 ipv6 iptable_filter
ip_tables dm_mirror button battery ac uhci_hcd ehci_hcd hw_random tg3 8021q
bonding(U) floppy ext3 jbd dm_mod qla2300 qla2xxx scsi_transport_fc cciss sd_mod
scsi_mod
May  8 12:23:52 ruchba kernel: CPU:    0
May  8 12:23:52 ruchba kernel: EIP:    0060:[<c02d36dc>]    Tainted: P      VLI
May  8 12:23:52 ruchba kernel: EFLAGS: 00010002   (2.6.9-42.0.10.ELsmp) 
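
For anyone checking whether other hosts hit the same signature, the following is a minimal sketch that scans syslog lines for the JBD assertion shown above. It assumes the classic syslog layout seen in this report (timestamp, hostname, "kernel:") and that each message is on a single unwrapped line in the log file; the names ASSERT_RE and find_assertions are made up for this illustration.

```python
import re

# Hypothetical pattern, modelled only on the log lines quoted in this report.
ASSERT_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+ \d\d:\d\d:\d\d) (?P<host>\S+) kernel: "
    r"Assertion failure in (?P<func>\w+)\(\) at (?P<file>[^:\s]+):(?P<line>\d+)"
)


def find_assertions(lines):
    """Return (timestamp, host, function, file, line) for each assertion line."""
    hits = []
    for text in lines:
        m = ASSERT_RE.match(text)
        if m:
            hits.append(
                (m["stamp"], m["host"], m["func"], m["file"], int(m["line"]))
            )
    return hits


if __name__ == "__main__":
    # /var/log/messages is an assumption; adjust to wherever kernel logs go.
    with open("/var/log/messages", errors="replace") as log:
        for hit in find_assertions(log):
            print(hit)
```

Matching on the function name and file:line rather than the full message text should make it easier to compare reports from different hosts, since those fields are identical in all three reports here.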


Comment 3 graeme 2007-05-09 08:49:58 UTC
Hi all, the spinlock entry resembles, to an extent, the error
reported in Bug 238001.

That bug claims that upgrading to the erratum at
http://rhn.redhat.com/errata/RHBA-2007-0304.html fixes the problem.

If I don't hear anything from support, I'm going to update and will let you know how
it goes.

Comment 4 graeme 2007-05-11 08:40:41 UTC
Sorry, that's Bug 191831, not 238001.

Comment 5 Eric Sandeen 2007-05-29 16:46:58 UTC
These look very similar to bug 205610, which was dup'd to bug 224638, and has a
fix committed in 2.6.9-55.2 for RHEL4U6.

If anyone has a reproducible testcase it'd be good to test a kernel with that
change in place.  Barring evidence to the contrary, I think that this is a dup
of 205610/224638.

Thanks,
-Eric

Comment 6 Eric Sandeen 2007-05-31 18:20:30 UTC
Duping to bug 224638.  If this problem persists in 2.6.9-55.2 and beyond, please
re-open.

Thanks,
-Eric

*** This bug has been marked as a duplicate of 224638 ***
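
Since the comments above ask for a re-open only if the problem persists in 2.6.9-55.2 and beyond, a quick way to tell whether a host is already on such a kernel may help. The sketch below compares the numeric part of a kernel release string against 2.6.9-55.2. It is an approximation under stated assumptions: it is not RPM's version comparison algorithm, it ignores suffixes such as ".ELsmp", and the names release_key and has_fix are invented for this example.

```python
import platform
import re

# Kernel release named in Comment 5 as carrying the fix.
FIXED_RELEASE = "2.6.9-55.2"


def release_key(release):
    """Turn '2.6.9-42.0.10.ELsmp' into the tuple (2, 6, 9, 42, 0, 10)."""
    m = re.match(r"\d+(?:[.-]\d+)*", release)
    if m is None:
        raise ValueError("no leading numeric version in %r" % release)
    return tuple(int(part) for part in re.split(r"[.-]", m.group(0)))


def has_fix(release, fixed=FIXED_RELEASE):
    """True if the numeric part of `release` is at or beyond `fixed`."""
    return release_key(release) >= release_key(fixed)


if __name__ == "__main__":
    running = platform.release()  # same string as `uname -r`
    print(running, "has fix" if has_fix(running) else "predates fix")
```

Both reporters' kernels (2.6.9-42.0.10.ELsmp) compare as older than the fixed release under this scheme, which is consistent with the duplicate resolution.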