+++ This bug was initially created as a clone of Bug #123137 +++
Description of problem:
Assertion failure in log_do_checkpoint() at fs/jbd/checkpoint.c:361: "drop_count != 0 || cleanup_ret != 0"
------------[ cut here ]------------
kernel BUG at fs/jbd/checkpoint.c:361!
Version-Release number of selected component (if applicable):
2.6.5-1.327
How reproducible:
Rare
System was a dual Xeon with AMI Megaraid RAID controller. File systems are Ext3. I'll attach the oops output in a second.
*** Bug 167343 has been marked as a duplicate of this bug. ***
There is a one-line fix for this by Jan Kara in the vanilla Linux kernel as of 2.6.11.12:
http://www.kernel.org/pub/linux/kernel/v2.6/ChangeLog-2.6.11.12
Additional discussion:
http://lkml.org/lkml/2005/6/1/34
http://marc.theaimsgroup.com/?l=linux-kernel&m=111761151011571&w=2
Is it possible for you to create a patch for this for the 2.6.9-11 EL smp kernel?
I've tried this patch, and it DOES seem to fix this problem! Well done! Hopefully RedHat will create a kernel update ASAP.
This patch has been in production for 3 weeks now without a single problem. These machines would PANIC almost daily before, mostly at night when we were running backups. Maybe this problem is mostly associated with high-end hardware, like DL380s, but I would think that RedHat would be interested in fixing such a serious problem, especially one that affects their target hardware. So far, I've heard nothing to show that RedHat is interested in fixing this. Will this patch be included in a future kernel?
Yes, this fix looks good, and it matches the upstream fix. It will be queued subject to the usual internal review for the U3 kernel. I have a kernel built based on U2 plus 3 filesystem fixes:
* readahead fixes for random >4k read performance
* ext3 performance fix for very slow performance when writing large files on huge filesystems
* this log_do_checkpoint fix.
i686 and x86_64 kernels are available from:
http://people.redhat.com/sct/.private/test-kernels/kernel-2.6.9-22.EL.sct.4/
Fix committed for inclusion in U3.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2006-0132.html
*** Bug 200434 has been marked as a duplicate of this bug. ***