Description of problem:
I hit this issue on link-01 while running revolver (with the LITE I/O load) on an eight-node cluster (link-01 - link-08). Link-06, link-07, and link-08 were shot by revolver, and when replaying the journals on link-01 it hit this corruption.

<Nov/22 03:45 pm>CMAN: removing node link-08 from the cluster : No response to messages
<Nov/22 03:46 pm>CMAN: removing node link-07 from the cluster : No response to messages
<Nov/22 03:47 pm>GFS: fsid=LINK_128:gfs2.2: jid=4: Trying to acquire journal lock...
<Nov/22 03:47 pm>GFS: fsid=LINK_128:gfs1.2: jid=4: Trying to acquire journal lock...
<Nov/22 03:47 pm>GFS: fsid=LINK_128:gfs1.2: jid=4: Busy
<Nov/22 03:47 pm>GFS: fsid=LINK_128:gfs2.2: jid=4: Busy
<Nov/22 03:47 pm>GFS: fsid=LINK_128:gfs2.2: jid=3: Trying to acquire journal lock...
<Nov/22 03:47 pm>GFS: fsid=LINK_128:gfs2.2: jid=3: Busy
<Nov/22 03:47 pm>GFS: fsid=LINK_128:gfs2.2: jid=0: Trying to acquire journal lock...
<Nov/22 03:47 pm>GFS: fsid=LINK_128:gfs2.2: jid=0: Busy
<Nov/22 03:47 pm>GFS: fsid=LINK_128:gfs1.2: jid=3: Trying to acquire journal lock...
<Nov/22 03:47 pm>GFS: fsid=LINK_128:gfs1.2: jid=3: Busy
<Nov/22 03:47 pm>GFS: fsid=LINK_128:gfs1.2: jid=1: Trying to acquire journal lock...
<Nov/22 03:47 pm>GFS: fsid=LINK_128:gfs1.2: jid=1: Busy
<Nov/22 03:47 pm>CMAN: quorum lost, blocking activity
<Nov/22 03:50 pm>CMAN: quorum regained, resuming activity
<Nov/22 03:51 pm>GFS: fsid=LINK_128:gfs2.2: jid=1: Trying to acquire journal lock...
<Nov/22 03:51 pm>GFS: fsid=LINK_128:gfs1.2: jid=0: Trying to acquire journal lock...
<Nov/22 03:51 pm>GFS: fsid=LINK_128:gfs2.2: jid=1: Busy
<Nov/22 03:51 pm>GFS: fsid=LINK_128:gfs1.2: jid=0: Busy
<Nov/22 03:55 pm>GFS: fsid=LINK_128:gfs1.2: fatal: invalid metadata block
<Nov/22 03:55 pm>GFS: fsid=LINK_128:gfs1.2: bh = 31109846 (type: exp=4, found=0)
<Nov/22 03:55 pm>GFS: fsid=LINK_128:gfs1.2: function = gfs_get_meta_buffer
<Nov/22 03:55 pm>GFS: fsid=LINK_128:gfs1.2: file = /usr/src/build/643480-x86_64/BUILD/gfs-kernel-2.6.9-44/smp/src/gfs/dio.c,line = 1223
<Nov/22 03:55 pm>GFS: fsid=LINK_128:gfs1.2: time = 1132676573
<Nov/22 03:55 pm>GFS: fsid=LINK_128:gfs1.2: about to withdraw from the cluster
<Nov/22 03:55 pm>GFS: fsid=LINK_128:gfs1.2: waiting for outstanding I/O
<Nov/22 03:55 pm>----------- [cut here ] --------- [please bite here ] ---------
<Nov/22 03:55 pm>Kernel BUG at lm:190
<Nov/22 03:55 pm>invalid operand: 0000 [1] SMP
<Nov/22 03:55 pm>CPU 1
<Nov/22 03:55 pm>Modules linked in: lock_dlm(U) gnbd(U) lock_nolock(U) gfs(U) lock_harness(U) dlm(U) cman(U) md5 ipv6 parport_pc lp parport autofs4 i2c_dev i2c_core sunrpc ds yenta_socket pcmcia_core dm_mod ohci_hcd hw_random tg3 floppy ext3 jbd qla2300 qla2xxx scsi_transport_fc sd_mod scsi_mod
<Nov/22 03:55 pm>Pid: 6206, comm: growfiles Tainted: G M 2.6.9-22.0.1.ELsmp
<Nov/22 03:55 pm>RIP: 0010:[<ffffffffa021e807>] <ffffffffa021e807>{:gfs:gfs_lm_withdraw+215}
<Nov/22 03:55 pm>RSP: 0018:0000010037af5af8 EFLAGS: 00010202
<Nov/22 03:55 pm>RAX: 0000000000000037 RBX: ffffff00001828c0 RCX: 0000000100000000
<Nov/22 03:55 pm>RDX: ffffffff803d78c8 RSI: 0000000000000246 RDI: ffffffff803d78c0
<Nov/22 03:55 pm>RBP: ffffff000014a000 R08: ffffffff803d78c8 R09: ffffff00001828c0
<Nov/22 03:55 pm>R10: ffffffff8011de14 R11: ffffffff8011de14 R12: 000001002a860528
<Nov/22 03:55 pm>R13: 000001002a8606b8 R14: 0000000000000000 R15: 0000000001dab2d6
<Nov/22 03:55 pm>FS: 0000002a95575f00(0000) GS:ffffffff804d3100(005b) knlGS:00000000f7fdf6c0
<Nov/22 03:55 pm>CS: 0010 DS: 002b ES: 002b CR0: 000000008005003b
<Nov/22 03:55 pm>CR2: 00000000f7ffc000 CR3: 000000003ff38000 CR4: 00000000000006e0
<Nov/22 03:55 pm>Process growfiles (pid: 6206, threadinfo 0000010037af4000, task 000001003e749030)
<Nov/22 03:55 pm>Stack: 0000003000000030 0000010037af5c28 0000010037af5b18 ffffffffa0007ac4
<Nov/22 03:55 pm> 00000100022802e8 000001003fb54ab0 ffffff00001828c0 ffffff00001828c0
<Nov/22 03:55 pm> 0000000001dab2d6 0000000000000004
<Nov/22 03:55 pm>Call Trace:<ffffffffa0007ac4>{:scsi_mod:scsi_request_fn+1100}
<Nov/22 03:55 pm> <ffffffff80303814>{io_schedule+37} <ffffffff80178062>{__wait_on_buffer+143}
<Nov/22 03:55 pm> <ffffffff80177ed6>{bh_wake_function+0} <ffffffffa0236c83>{:gfs:gfs_metatype_check_ii+54}
<Nov/22 03:55 pm> <ffffffffa020b52c>{:gfs:gfs_get_meta_buffer+580} <ffffffffa0217985>{:gfs:gfs_copyin_dinode+23}
<Nov/22 03:55 pm> <ffffffff8011de14>{flat_send_IPI_mask+0} <ffffffffa021744d>{:gfs:inode_go_lock+38}
<Nov/22 03:55 pm> <ffffffffa021457a>{:gfs:glock_wait_internal+563} <ffffffffa0214cd2>{:gfs:gfs_glock_nq+961}
<Nov/22 03:55 pm> <ffffffffa0214efb>{:gfs:gfs_glock_nq_init+20} <ffffffffa022c8a7>{:gfs:gfs_permission+64}
<Nov/22 03:55 pm> <ffffffffa02272e1>{:gfs:gfs_drevalidate+409} <ffffffff80183086>{permission+51}
<Nov/22 03:55 pm> <ffffffff80184dba>{may_open+88} <ffffffff801852ab>{open_namei+788}
<Nov/22 03:55 pm> <ffffffff80131c39>{finish_task_switch+55} <ffffffff80176524>{filp_open+39}
<Nov/22 03:55 pm> <ffffffff801e9fd5>{strncpy_from_user+74} <ffffffff8017662d>{get_unused_fd+230}
<Nov/22 03:55 pm> <ffffffff8012762f>{sys32_open+54} <ffffffff8012500f>{cstar_do_call+27}
<Nov/22 03:55 pm>
<Nov/22 03:55 pm>
<Nov/22 03:55 pm>Code: 0f 0b 3b a8 23 a0 ff ff ff ff be 00 8b 85 a0 88 03 00 85 c0
<Nov/22 03:55 pm>RIP <ffffffffa021e807>{:gfs:gfs_lm_withdraw+215} RSP <0000010037af5af8>
<Nov/22 03:55 pm> <0>Kernel panic - not syncing: Oops

Version-Release number of selected component (if applicable):
Kernel 2.6.9-22.0.1.ELsmp on an x86_64
CMAN 2.6.9-40.0 (built Nov 7 2005 15:30:36) installed
DLM 2.6.9-39.0 (built Nov 14 2005 17:38:14) installed
Lock_Harness 2.6.9-44.0 (built Nov 17 2005 15:43:18) installed
GFS 2.6.9-44.0 (built Nov 17 2005 15:43:35) installed
Lock_Nolock 2.6.9-44.0 (built Nov 17 2005 15:43:19) installed
Just a note: this issue appears to have also been seen outside of Red Hat, in bz 175589.
*** This bug has been marked as a duplicate of 175589 ***