System: Linux 2.6.9-11.ELsmp on a dual-Xeon machine with ext3 (formatted with /sbin/mkfs.ext3 -b 2048 -i 2048 -j -O dir_index,sparse_super) and a Compaq Smart Array 64xx (rev 01) RAID controller. With the load average holding steady at around 2.0, the kernel panicked after roughly 12 hours of load testing:

---BEGIN OOPS---
Aug 28 05:03:24 im01 kernel BUG at fs/jbd/checkpoint.c:361!
Aug 28 05:03:24 im01 invalid operand: 0000 [#1]
Aug 28 05:03:24 im01 SMP
Aug 28 05:03:24 im01 Modules linked in: aoe(U) drbd(U) md5 ipv6 autofs4 i2c_dev i2c_core dm_mod button battery ac uhci_hcd ehci_hcd hw_random tg3 floppy ext3 jbd cciss sd_mod scsi_mod
Aug 28 05:03:24 im01 CPU: 1
Aug 28 05:03:24 im01 EIP: 0060:[<f8868f87>] Tainted: GF VLI
Aug 28 05:03:24 im01 EFLAGS: 00010216 (2.6.9-11.ELsmp)
Aug 28 05:03:24 im01 EIP is at log_do_checkpoint+0x111/0x14b [jbd]
Aug 28 05:03:24 im01 eax: 0000006e ebx: cc7fa4dc ecx: d3f46d30 edx: f886cd4c
Aug 28 05:03:24 im01 esi: c3f52a00 edi: f7d4f980 ebp: 00000000 esp: d3f46d2c
Aug 28 05:03:24 im01 ds: 007b es: 007b ss: 0068
Aug 28 05:03:24 im01 Process imapd (pid: 21571, threadinfo=d3f46000 task=cc444230)
Aug 28 05:03:24 im01 Stack: f886cd4c f886befd f886cd38 00000169 f886ce07 008f3cc4 ef9fc8cc cc7fa4dc
Aug 28 05:03:24 im01 00000000 00000000 ecd29de4 cb0d4424 cd1701e8 cb0d489c 00000001 00000000
Aug 28 05:03:24 im01 000000fe 0000017e 00000001 c0401220 c3bc1d60 00000001 c3bc9d60 c3bc1d60
Aug 28 05:03:24 im01 Call Trace:
Aug 28 05:03:24 im01 [<c011d681>] load_balance_newidle+0x5c/0x74
Aug 28 05:03:24 im01 [<c011caf1>] finish_task_switch+0x30/0x66
Aug 28 05:03:24 im01 [<c02c5604>] schedule+0x844/0x87a
Aug 28 05:03:24 im01 [<c0270a65>] memcpy_toiovec+0x5f/0x88
Aug 28 05:03:24 im01 [<c011dd19>] __wake_up_locked+0x11/0x13
Aug 28 05:03:24 im01 [<c02c4c38>] __down+0xcc/0xdb
Aug 28 05:03:24 im01 [<c011dc6f>] default_wake_function+0x0/0xc
Aug 28 05:03:24 im01 [<f8868b40>] __log_wait_for_space+0xbb/0xe5 [jbd]
Aug 28 05:03:24 im01 [<f886537e>] start_this_handle+0x2e4/0x32a [jbd]
Aug 28 05:03:24 im01 [<c016164a>] do_lookup+0x1f/0x8f
Aug 28 05:03:24 im01 [<c011f6ee>] autoremove_wake_function+0x0/0x2d
Aug 28 05:03:24 im01 [<c011f6ee>] autoremove_wake_function+0x0/0x2d
Aug 28 05:03:24 im01 [<f886547c>] journal_start+0x78/0x9e [jbd]
Aug 28 05:03:24 im01 [<f889ef9a>] ext3_dirty_inode+0x24/0x66 [ext3]
Aug 28 05:03:24 im01 [<c0171b04>] __mark_inode_dirty+0x28/0x176
Aug 28 05:03:24 im01 [<c016c68e>] update_atime+0x6a/0x90
Aug 28 05:03:24 im01 [<c013d396>] generic_file_mmap+0x2a/0x37
Aug 28 05:03:24 im01 [<c014b22d>] do_mmap_pgoff+0x481/0x666
Aug 28 05:03:24 im01 [<c010b557>] sys_mmap2+0x7e/0xaf
Aug 28 05:03:24 im01 [<c02c7377>] syscall_call+0x7/0xb
Aug 28 05:03:24 im01 [<c02c007b>] unix_release_sock+0x15a/0x201
Aug 28 05:03:24 im01 Code: 89 f0 e8 4c fc ff ff 0b 44 24 10 75 29 68 07 ce 86 f8 68 69 01 00 00 68 38 cd 86 f8 68 fd be 86 f8 68 4c cd 86 f8 e8 4f 8a 8b c7 <0f> 0b 69 01 38 cd 86 f8 83 c4 14 39 7e 40 0f 84 09 ff ff ff 8d
Aug 28 05:03:24 im01 <0>Fatal exception: panic in 5 seconds
---END OOPS---

+++ This bug was initially created as a clone of Bug #123137 +++

Description of problem:
Assertion failure in log_do_checkpoint() at fs/jbd/checkpoint.c:361: "drop_count != 0 || cleanup_ret != 0"
------------[ cut here ]------------
kernel BUG at fs/jbd/checkpoint.c:361!

Version-Release number of selected component (if applicable):
2.6.5-1.327

How reproducible:
Rare

The system was a dual Xeon with an AMI Megaraid RAID controller. The file systems are ext3. I'll attach the oops output in a second.

-- Additional comment from dac on 2004-05-12 17:01 EST --
Created an attachment (id=100200)
oops output

Oops output from when this happened. The system load was probably around 3. Uptime was less than a day (due to an unrelated reboot).

-- Additional comment from davej on 2004-05-13 10:35 EST --
There were quite a few ext3-related changes in later kernels. I'm not guaranteeing they fix this problem, but it makes more sense to test -358 if you can.
-- Additional comment from sct on 2004-09-10 11:08 EST --
No information was given about later kernels, so closing. Please reopen if you can still reproduce this problem.

-- Additional comment from sfrost on 2004-10-19 15:58 EST --
I've been bitten by this problem under both 2.6.8.1 and 2.6.9 now. Unfortunately I don't have an oops from 2.6.9 yet (I'll check once I get home and see if it got logged over the serial console), but here is one from 2.6.8.1:

Assertion failure in log_do_checkpoint() at fs/jbd/checkpoint.c:361: "drop_count != 0 || cleanup_ret != 0"
kernel BUG at fs/jbd/checkpoint.c:361!
invalid operand: 0000 [#1]
Oops: ------------[ cut here ]------------
SMP
Modules linked in: ipt_REDIRECT ipt_REJECT iptable_nat iptable_mangle iptable_filter ipt_state ipt_pkttype ipt_physdev ipt_multiport ipt_conntrack ipt_MARK ipt_LOG ip_conntrack ip_tables 8250 serial_core snd_intel8x0 snd_ac97_codec snd_pcm snd_timer snd_page_alloc snd_mpu401_uart snd_rawmidi snd_seq_device snd soundcore ehci_hcd uhci_hcd usbcore intel_agp agpgart eeprom lm85 i2c_sensor i2c_i801 i2c_dev i2c_core pcspkr
CPU: 1
EIP: 0060:[log_do_checkpoint+364/459] Not tainted
EFLAGS: 00010286 (2.6.8.1-vs1.9.2kenobi.3)
EIP is at log_do_checkpoint+0x16c/0x1cb
eax: 0000006e ebx: 00000000 ecx: c036ad04 edx: c036ad04
esi: 00000000 edi: 00000001 ebp: c932d83c esp: e3a9fd0c
ds: 007b es: 007b ss: 0068
Process sendmail (pid: 9628, threadinfo=e3a9e000 task=e10c3770)
Stack: c03323c0 c031be9d c03301f7 00000169 c0335200 00294867 c1a87180 00000000
00000000 e498574c c0476120 00000000 00000003 c180c0a0 c180cd60 c015a341
dedcbf5c dedcbf5c dedcbf5c f314ae3c dedcbf5c c01a60ac f700f4e0 f314ae3c
Call Trace:
[wake_up_buffer+23/83] wake_up_buffer+0x17/0x53
[do_get_write_access+645/1583] do_get_write_access+0x285/0x62f
[wake_up_buffer+23/83] wake_up_buffer+0x17/0x53
[find_busiest_group+234/806] find_busiest_group+0xea/0x326
[ext3_do_update_inode+517/1094] ext3_do_update_inode+0x205/0x446
[radix_tree_delete+325/398] radix_tree_delete+0x145/0x18e
[__log_wait_for_space+199/218] __log_wait_for_space+0xc7/0xda
[start_this_handle+290/954] start_this_handle+0x122/0x3ba
[find_get_pages+55/90] find_get_pages+0x37/0x5a
[pagevec_lookup+46/56] pagevec_lookup+0x2e/0x38
[truncate_inode_pages+289/696] truncate_inode_pages+0x121/0x2b8
[journal_start+171/210] journal_start+0xab/0xd2
[locks_delete_lock+139/221] locks_delete_lock+0x8b/0xdd
[start_transaction+35/88] start_transaction+0x23/0x58
[locks_remove_posix+239/268] locks_remove_posix+0xef/0x10c
[ext3_delete_inode+0/230] ext3_delete_inode+0x0/0xe6
[ext3_delete_inode+39/230] ext3_delete_inode+0x27/0xe6
[ext3_delete_inode+0/230] ext3_delete_inode+0x0/0xe6
[generic_delete_inode+147/316] generic_delete_inode+0x93/0x13c
[iput+98/124] iput+0x62/0x7c
[dput+231/403] dput+0xe7/0x193
[__fput+179/260] __fput+0xb3/0x104
[filp_close+89/134] filp_close+0x59/0x86
[sys_close+94/113] sys_close+0x5e/0x71
[syscall_call+7/11] syscall_call+0x7/0xb
Code: 0f 0b 69 01 f7 01 33 c0 eb b8 8d 44 24 1c 8d 54 24 24 89 44

-- Additional comment from sfrost on 2004-10-19 20:11 EST --
Alright, it just happened again; that's twice in one day...
*** This bug has been marked as a duplicate of 162814 ***