Description of problem:

Using iSCSI to mount a LeftHand Networks SAN unit. Running NFS to share the mounted SAN volume to other machines. Using the ext3 file system format. The system kernel panics infrequently.

Jul 12 04:20:21 fs1 kernel: Assertion failure in log_do_checkpoint() at fs/jbd/checkpoint.c:361: "drop_count != 0 || cleanup_ret != 0"
Jul 12 04:20:21 fs1 kernel: ------------[ cut here ]------------
Jul 12 04:20:21 fs1 kernel: kernel BUG at fs/jbd/checkpoint.c:361!
Jul 12 04:20:21 fs1 kernel: invalid operand: 0000 [#1]
Jul 12 04:20:21 fs1 kernel: SMP
Jul 12 04:20:21 fs1 kernel: Modules linked in: nfs nfsd exportfs lockd md5 ipv6 sunrpc crc32c libcrc32c iscsi_sfnet scsi_transport_iscsi microcode dm_mirror dm_mod button battery ac uhci_hcd ehci_hcd hw_random shpchp e1000 bonding(U) floppy sg ext3 jbd megaraid_mbox megaraid_mm sd_mod scsi_mod
Jul 12 04:20:21 fs1 kernel: CPU: 2
Jul 12 04:20:21 fs1 kernel: EIP: 0060:[<f8832fa7>] Not tainted VLI
Jul 12 04:20:21 fs1 kernel: EFLAGS: 00010216 (2.6.9-22.ELsmp)
Jul 12 04:20:21 fs1 kernel: EIP is at log_do_checkpoint+0x111/0x14b [jbd]
Jul 12 04:20:21 fs1 kernel: eax: 0000006e ebx: f6dd7c8c ecx: f7764ac4 edx: f8836d4c
Jul 12 04:20:21 fs1 kernel: esi: f7e20600 edi: f64eab00 ebp: 00000000 esp: f7764ac0
Jul 12 04:20:21 fs1 kernel: ds: 007b es: 007b ss: 0068
Jul 12 04:20:21 fs1 kernel: Process nfsd (pid: 2600, threadinfo=f7764000 task=f701a7b0)
Jul 12 04:20:21 fs1 kernel: Stack: f8836d4c f8835efd f8836d38 00000169 f8836e07 13d7fc8d f6dd7c8c f6dd7c8c
Jul 12 04:20:21 fs1 kernel: 00000000 00000000 f4923c10 c015bff2 00001000 f888f157 00001000 00000246
Jul 12 04:20:21 fs1 kernel: 00000012 f88fde1a 00000000 ef0ca250 02000000 f7716344 00000008 00000000
Jul 12 04:20:21 fs1 kernel: Call Trace:
Jul 12 04:20:21 fs1 kernel: [<c015bff2>] __bread+0x9/0x1e
Jul 12 04:20:21 fs1 kernel: [<f888f157>] ext3_get_branch+0x63/0xc6 [ext3]
Jul 12 04:20:21 fs1 kernel: [<f88fde1a>] e1000_xmit_frame+0x947/0x951 [e1000]
Jul 12 04:20:21 fs1 kernel: [<c028b041>] qdisc_restart+0x12/0x1ad
Jul 12 04:20:21 fs1 kernel: [<c027dccb>] dev_queue_xmit+0x1ff/0x207
Jul 12 04:20:21 fs1 kernel: [<f88ce357>] bond_dev_queue_xmit+0x1d4/0x1db [bonding]
Jul 12 04:20:21 fs1 kernel: [<f88d12e5>] bond_xmit_roundrobin+0xd3/0xdb [bonding]
Jul 12 04:20:21 fs1 kernel: [<f8832b60>] __log_wait_for_space+0xbb/0xe5 [jbd]
Jul 12 04:20:21 fs1 kernel: [<f882f37e>] start_this_handle+0x2e4/0x32a [jbd]
Jul 12 04:20:21 fs1 kernel: [<c012f294>] in_group_p+0x31/0x58
Jul 12 04:20:21 fs1 kernel: [<f889bb7c>] ext3_permission+0x0/0x163 [ext3]
Jul 12 04:20:21 fs1 kernel: [<f889bc66>] ext3_permission+0xea/0x163 [ext3]
Jul 12 04:20:21 fs1 kernel: [<c02cf6e3>] __cond_resched+0x14/0x39
Jul 12 04:20:21 fs1 kernel: [<f882f47c>] journal_start+0x78/0x9e [jbd]
Jul 12 04:20:21 fs1 kernel: [<f888fd67>] ext3_prepare_write+0x32/0xf5 [ext3]
Jul 12 04:20:21 fs1 kernel: [<c0141205>] generic_file_buffered_write+0x186/0x47c
Jul 12 04:20:21 fs1 kernel: [<c0126344>] current_fs_time+0x44/0x4c
Jul 12 04:20:21 fs1 kernel: [<c0141884>] __generic_file_aio_write_nolock+0x389/0x3b7
Jul 12 04:20:21 fs1 kernel: [<c01419b5>] __generic_file_write_nolock+0x84/0x99
Jul 12 04:20:21 fs1 kernel: [<c011ffb1>] autoremove_wake_function+0x0/0x2d
Jul 12 04:20:21 fs1 kernel: [<c0141cc5>] generic_file_writev+0x4f/0xac
Jul 12 04:20:21 fs1 kernel: [<c0141c76>] generic_file_writev+0x0/0xac
Jul 12 04:20:21 fs1 kernel: [<c0159c8d>] do_sync_write+0x0/0xc9
Jul 12 04:20:21 fs1 kernel: [<c015a177>] do_readv_writev+0x19c/0x21d
Jul 12 04:20:21 fs1 kernel: [<c0159281>] dentry_open+0xf0/0x1a5
Jul 12 04:20:21 fs1 kernel: [<c015a276>] vfs_writev+0x3e/0x43
Jul 12 04:20:21 fs1 kernel: [<f8c87e99>] nfsd_write+0xeb/0x289 [nfsd]
Jul 12 04:20:21 fs1 kernel: [<f8c8e682>] nfsd3_proc_write+0xbf/0xd5 [nfsd]
Jul 12 04:20:21 fs1 kernel: [<f8c906ab>] nfs3svc_decode_writeargs+0x0/0x243 [nfsd]
Jul 12 04:20:21 fs1 kernel: [<f8c8467a>] nfsd_dispatch+0xba/0x170 [nfsd]
Jul 12 04:20:21 fs1 kernel: [<f8be4429>] svc_process+0x41b/0x6ce [sunrpc]
Jul 12 04:20:21 fs1 kernel: [<f8c8445a>] nfsd+0x1cc/0x332 [nfsd]
Jul 12 04:20:21 fs1 kernel: [<f8c8428e>] nfsd+0x0/0x332 [nfsd]
Jul 12 04:20:21 fs1 kernel: [<c01041f1>] kernel_thread_helper+0x5/0xb
Jul 12 04:20:21 fs1 kernel: Code: 89 f0 e8 4c fc ff ff 0b 44 24 10 75 29 68 07 6e 83 f8 68 69 01 00 00 68 38 6d 83 f8 68 fd 5e 83 f8 68 4c 6d 83 f8 e8 6b f3 8e c7 <0f> 0b 69 01 38 6d 83 f8 83 c4 14 39 7e 40 0f 84 09 ff ff ff 8d
Jul 12 04:20:21 fs1 kernel: <0>Fatal exception: panic in 5 seconds
[root@fs1 log]#

According to Stephen Tweedie, this should be fixed by the following patch, which I traced back to Linux kernel 2.6.12.

---------------
commit 00ea81459c279f14a7b344320a71c94f60f88929
Author: Jan Kara <jack>
Date: Thu Jun 2 14:02:00 2005 -0700

[PATCH] ext3: fix log_do_checkpoint() assertion failure

Fix possible false assertion failure in log_do_checkpoint(). We might fail to detect that we actually made a progress when cleaning up the checkpoint lists if we don't retry after writing something to disk. The patch was confirmed to fix observed assertion failures for several users.

When we flushed some buffers we need to retry scanning the list. Otherwise we can fail to detect our progress.

Signed-off-by: Jan Kara <jack>
Signed-off-by: Andrew Morton <akpm>
Signed-off-by: Linus Torvalds <torvalds>
------------------

The issue is that the current Red Hat EL kernel revision is only at 2.6.9-34.EL! Has this patch been back-ported to this kernel?

Version-Release number of selected component (if applicable):

Dell Power Edge 2850 Perc RAID controller
iscsi-initiator-utils-4.0.3.0-2
Linux fs1 2.6.9-22.ELsmp #1 SMP Mon Sep 19 18:32:14 EDT 2005 i686 i686 i386 GNU/Linux

Difficult to reproduce! The system kernel panics once every 2 weeks.
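Since the panic is hard to reproduce, it may help to scan saved syslog files for the assertion signature and record when each hit occurred. The following is only a rough sketch, not part of the original report: the function name `find_assertions` and the regular expression are assumptions based on the line format shown in the log excerpt above, and other syslog layouts may need a different pattern.

```python
import re

# Pattern modelled on the assertion line in the log excerpt of this report.
# Assumption: syslog lines look like "Mon DD HH:MM:SS host kernel: ...".
ASSERT_RE = re.compile(
    r'^(?P<stamp>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+(?P<host>\S+)\s+kernel:\s+'
    r'Assertion failure in (?P<func>\w+)\(\) at (?P<file>[^:]+):(?P<line>\d+):\s+'
    r'"(?P<expr>[^"]*)"'
)


def find_assertions(lines):
    """Return one dict per kernel assertion-failure line found in `lines`."""
    hits = []
    for text in lines:
        match = ASSERT_RE.match(text.strip())
        if match:
            hit = match.groupdict()
            hit["line"] = int(hit["line"])  # source line number as an integer
            hits.append(hit)
    return hits


# Two lines copied from the log in this report; only the first should match.
sample = [
    'Jul 12 04:20:21 fs1 kernel: Assertion failure in log_do_checkpoint() '
    'at fs/jbd/checkpoint.c:361: "drop_count != 0 || cleanup_ret != 0"',
    'Jul 12 04:20:21 fs1 kernel: kernel BUG at fs/jbd/checkpoint.c:361!',
]

for hit in find_assertions(sample):
    print(hit["stamp"], hit["func"], hit["file"], hit["line"])
```

To run it against a real log, pass an open file object (for example `open("/var/log/messages")`, assuming that is where kernel messages are written on the affected machine) instead of `sample`.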
*** This bug has been marked as a duplicate of 162814 ***