Description of problem:
Assertion failure in log_do_checkpoint() at fs/jbd/checkpoint.c:361: "drop_count != 0 || cleanup_ret != 0"
------------[ cut here ]------------
kernel BUG at fs/jbd/checkpoint.c:361!

Version-Release number of selected component (if applicable):
2.6.5-1.327

How reproducible:
Rare

System was a dual Xeon with AMI Megaraid RAID controller. File systems are Ext3. I'll attach the oops output in a second.
Created attachment 100200
oops output

Oops output when this happened. The system load was probably around 3. Uptime was less than a day (due to an unrelated reboot).
There were quite a few ext3-related changes in later kernels. I can't guarantee they fix this problem, but it makes more sense to test the -358 kernel if you can.
No information given about later kernels, so closing: please reopen if you can still reproduce this problem.
I've been bit by this problem under both 2.6.8.1 and 2.6.9 now. I don't have an oops from 2.6.9 yet (unfortunately, I'll check once I get home and see if it got logged over the serial console) but here is one from 2.6.8.1:

Assertion failure in log_do_checkpoint() at fs/jbd/checkpoint.c:361: "drop_count != 0 || cleanup_ret != 0"
kernel BUG at fs/jbd/checkpoint.c:361!
invalid operand: 0000 [#1]
Oops: ------------[ cut here ]------------
SMP
Modules linked in: ipt_REDIRECT ipt_REJECT iptable_nat iptable_mangle iptable_filter ipt_state ipt_pkttype ipt_physdev ipt_multiport ipt_conntrack ipt_MARK ipt_LOG ip_conntrack ip_tables 8250 serial_core snd_intel8x0 snd_ac97_codec snd_pcm snd_timer snd_page_alloc snd_mpu401_uart snd_rawmidi snd_seq_device snd soundcore ehci_hcd uhci_hcd usbcore intel_agp agpgart eeprom lm85 i2c_sensor i2c_i801 i2c_dev i2c_core pcspkr
CPU: 1
EIP: 0060:[log_do_checkpoint+364/459] Not tainted
EFLAGS: 00010286 (2.6.8.1-vs1.9.2kenobi.3)
EIP is at log_do_checkpoint+0x16c/0x1cb
eax: 0000006e ebx: 00000000 ecx: c036ad04 edx: c036ad04
esi: 00000000 edi: 00000001 ebp: c932d83c esp: e3a9fd0c
ds: 007b es: 007b ss: 0068
Process sendmail (pid: 9628, threadinfo=e3a9e000 task=e10c3770)
Stack: c03323c0 c031be9d c03301f7 00000169 c0335200 00294867 c1a87180 00000000
       00000000 e498574c c0476120 00000000 00000003 c180c0a0 c180cd60 c015a341
       dedcbf5c dedcbf5c dedcbf5c f314ae3c dedcbf5c c01a60ac f700f4e0 f314ae3c
Call Trace:
 [wake_up_buffer+23/83] wake_up_buffer+0x17/0x53
 [do_get_write_access+645/1583] do_get_write_access+0x285/0x62f
 [wake_up_buffer+23/83] wake_up_buffer+0x17/0x53
 [find_busiest_group+234/806] find_busiest_group+0xea/0x326
 [ext3_do_update_inode+517/1094] ext3_do_update_inode+0x205/0x446
 [radix_tree_delete+325/398] radix_tree_delete+0x145/0x18e
 [__log_wait_for_space+199/218] __log_wait_for_space+0xc7/0xda
 [start_this_handle+290/954] start_this_handle+0x122/0x3ba
 [find_get_pages+55/90] find_get_pages+0x37/0x5a
 [pagevec_lookup+46/56] pagevec_lookup+0x2e/0x38
 [truncate_inode_pages+289/696] truncate_inode_pages+0x121/0x2b8
 [journal_start+171/210] journal_start+0xab/0xd2
 [locks_delete_lock+139/221] locks_delete_lock+0x8b/0xdd
 [start_transaction+35/88] start_transaction+0x23/0x58
 [locks_remove_posix+239/268] locks_remove_posix+0xef/0x10c
 [ext3_delete_inode+0/230] ext3_delete_inode+0x0/0xe6
 [ext3_delete_inode+39/230] ext3_delete_inode+0x27/0xe6
 [ext3_delete_inode+0/230] ext3_delete_inode+0x0/0xe6
 [generic_delete_inode+147/316] generic_delete_inode+0x93/0x13c
 [iput+98/124] iput+0x62/0x7c
 [dput+231/403] dput+0xe7/0x193
 [__fput+179/260] __fput+0xb3/0x104
 [filp_close+89/134] filp_close+0x59/0x86
 [sys_close+94/113] sys_close+0x5e/0x71
 [syscall_call+7/11] syscall_call+0x7/0xb
Code: 0f 0b 69 01 f7 01 33 c0 eb b8 8d 44 24 1c 8d 54 24 24 89 44
Alright, just happened again, that's twice in one day...