Description of problem:
I had GFS I/O running to 3 cmirrors, each created in its own VG. I then killed the secondary leg of all 3 cmirrors and, at the same time, link-07 (one of the four nodes in the cluster).

Link-04, the node that panicked, had a different view of the storage than the rest of the nodes in the cluster. The disk being killed on link-0[278] was /dev/sda, but on link-04 it was /dev/sdh.

LINK-02:
  cmirror1            vg1  Mwi-ao 10.00G cmirror1_mlog 100.00 cmirror1_mimage_0(0),cmirror1_mimage_1(0)
  [cmirror1_mimage_0] vg1  iwi-ao 10.00G                      /dev/sdb1(0)
  [cmirror1_mimage_1] vg1  iwi-ao 10.00G                      /dev/sda1(0)
  [cmirror1_mlog]     vg1  lwi-ao  4.00M                      /dev/sdc1(0)
  cmirror2            vg2  Mwi-ao 10.00G cmirror2_mlog 100.00 cmirror2_mimage_0(0),cmirror2_mimage_1(0)
  [cmirror2_mimage_0] vg2  iwi-ao 10.00G                      /dev/sdd1(0)
  [cmirror2_mimage_1] vg2  iwi-ao 10.00G                      /dev/sda2(0)
  [cmirror2_mlog]     vg2  lwi-ao  4.00M                      /dev/sde1(0)
  cmirror3            vg3  Mwi-ao 10.00G cmirror3_mlog 100.00 cmirror3_mimage_0(0),cmirror3_mimage_1(0)
  [cmirror3_mimage_0] vg3  iwi-ao 10.00G                      /dev/sdg1(0)
  [cmirror3_mimage_1] vg3  iwi-ao 10.00G                      /dev/sda3(0)
  [cmirror3_mlog]     vg3  lwi-ao  4.00M                      /dev/sdf1(0)

LINK-04:
  cmirror1            vg1  wi-ao  10.00G cmirror1_mlog 100.00 cmirror1_mimage_0(0),cmirror1_mimage_1(0)
  [cmirror1_mimage_0] vg1  iwi-ao 10.00G                      /dev/sda1(0)
  [cmirror1_mimage_1] vg1  iwi-ao 10.00G                      /dev/sdh1(0)
  [cmirror1_mlog]     vg1  lwi-ao  4.00M                      /dev/sdb1(0)
  cmirror2            vg2  wi-ao  10.00G cmirror2_mlog 100.00 cmirror2_mimage_0(0),cmirror2_mimage_1(0)
  [cmirror2_mimage_0] vg2  iwi-ao 10.00G                      /dev/sdc1(0)
  [cmirror2_mimage_1] vg2  iwi-ao 10.00G                      /dev/sdh2(0)
  [cmirror2_mlog]     vg2  lwi-ao  4.00M                      /dev/sdd1(0)
  cmirror3            vg3  wi-ao  10.00G cmirror3_mlog 100.00 cmirror3_mimage_0(0),cmirror3_mimage_1(0)
  [cmirror3_mimage_0] vg3  iwi-ao 10.00G                      /dev/sdf1(0)
  [cmirror3_mimage_1] vg3  iwi-ao 10.00G                      /dev/sdh3(0)
  [cmirror3_mlog]     vg3  lwi-ao  4.00M                      /dev/sde1(0)

LINK-07:
  cmirror1            vg1  Mwi-ao 10.00G cmirror1_mlog 100.00 cmirror1_mimage_0(0),cmirror1_mimage_1(0)
  [cmirror1_mimage_0] vg1  iwi-ao 10.00G                      /dev/sdb1(0)
  [cmirror1_mimage_1] vg1  iwi-ao 10.00G                      /dev/sda1(0)
  [cmirror1_mlog]     vg1  lwi-ao  4.00M                      /dev/sdc1(0)
  cmirror2            vg2  Mwi-ao 10.00G cmirror2_mlog 100.00 cmirror2_mimage_0(0),cmirror2_mimage_1(0)
  [cmirror2_mimage_0] vg2  iwi-ao 10.00G                      /dev/sdd1(0)
  [cmirror2_mimage_1] vg2  iwi-ao 10.00G                      /dev/sda2(0)
  [cmirror2_mlog]     vg2  lwi-ao  4.00M                      /dev/sde1(0)
  cmirror3            vg3  Mwi-ao 10.00G cmirror3_mlog 100.00 cmirror3_mimage_0(0),cmirror3_mimage_1(0)
  [cmirror3_mimage_0] vg3  iwi-ao 10.00G                      /dev/sdg1(0)
  [cmirror3_mimage_1] vg3  iwi-ao 10.00G                      /dev/sda3(0)
  [cmirror3_mlog]     vg3  lwi-ao  4.00M                      /dev/sdf1(0)

LINK-08:
  cmirror1            vg1  Mwi-ao 10.00G cmirror1_mlog 100.00 cmirror1_mimage_0(0),cmirror1_mimage_1(0)
  [cmirror1_mimage_0] vg1  iwi-ao 10.00G                      /dev/sdb1(0)
  [cmirror1_mimage_1] vg1  iwi-ao 10.00G                      /dev/sda1(0)
  [cmirror1_mlog]     vg1  lwi-ao  4.00M                      /dev/sdc1(0)
  cmirror2            vg2  Mwi-ao 10.00G cmirror2_mlog 100.00 cmirror2_mimage_0(0),cmirror2_mimage_1(0)
  [cmirror2_mimage_0] vg2  iwi-ao 10.00G                      /dev/sdd1(0)
  [cmirror2_mimage_1] vg2  iwi-ao 10.00G                      /dev/sda2(0)
  [cmirror2_mlog]     vg2  lwi-ao  4.00M                      /dev/sde1(0)
  cmirror3            vg3  Mwi-ao 10.00G cmirror3_mlog 100.00 cmirror3_mimage_0(0),cmirror3_mimage_1(0)
  [cmirror3_mimage_0] vg3  iwi-ao 10.00G                      /dev/sdg1(0)
  [cmirror3_mimage_1] vg3  iwi-ao 10.00G                      /dev/sda3(0)
  [cmirror3_mlog]     vg3  lwi-ao  4.00M                      /dev/sdf1(0)

end_request: I/O error, dev sdh, sector 855178879
SCSI error : <1 0 1 1> return code = 0x10000
end_request: I/O error, dev sdh, sector 855178887
device-mapper: Write error during recovery (error = 0x1)
device-mapper: recovery failed on region 10564
dm-cmirror: Unable to locate record of recovery
----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at dm_cmirror_server:764
invalid operand: 0000 [1] SMP
CPU 0
Modules linked in: lock_dlm(U) gfs(U) lock_harness(U) dm_cmirror(U) dlm(U) cman(U) mptfc qla2300 qla2xxx scsi_transport_fc md5 ipv6 parport_pc lp parport autofs4 i2c_dev i2c_core sunrpc ds yenta_socket pcmcia_core button battery ac ohci_hcd hw_random k8_edac edac_mc tg3 dm_snapshot dm_zero dm_mirror ext3 jbd dm_mod mptscsih mptsas mptspi mptscsi mptbase sd_mod scsi_mod
Pid: 5678, comm: cluster_log_ser Not tainted 2.6.9-55.ELlargesmp
RIP: 0010:[<ffffffffa02a6465>] <ffffffffa02a6465>{:dm_cmirror:cluster_log_serverd+4152}
RSP: 0000:0000010031ab9e38  EFLAGS: 00010216
RAX: 0000000000000033 RBX: 00000000fffffffa RCX: 0000000000000246
RDX: 0000000000d6692d RSI: 0000000000000246 RDI: ffffffff803e6580
RBP: 0000000000000000 R08: 00000000000927bf R09: 00000000fffffffa
R10: ffffffff80317aa0 R11: 0000ffff804015a0 R12: 0000010039bad400
R13: 0000000000000003 R14: 000001003880b9c0 R15: 000001003880b9e0
FS:  0000002a95563f00(0000) GS:ffffffff80500380(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000002a95610000 CR3: 0000000000101000 CR4: 00000000000006e0
Process cluster_log_ser (pid: 5678, threadinfo 0000010031ab8000, task 000001001d126030)
Stack: 0000010001663100 0000010000000073 0000000000000012 0000000080142ce7
       0000010001013a80 0000000000040001 000001001d126030 0000000000045d0f
       00003aacd760bf6c 00000100397c37f0
Call Trace:<ffffffff8013aa77>{do_exit+3151} <ffffffff80179335>{vfs_read+248}
       <ffffffff80110f47>{child_rip+8} <ffffffffa02a542d>{:dm_cmirror:cluster_log_serverd+0}
       <ffffffff80110f3f>{child_rip+0}

Code: 0f 0b e6 8d 2a a0 ff ff ff ff fc 02 48 85 ed 75 34 49 8d bc
RIP <ffffffffa02a6465>{:dm_cmirror:cluster_log_serverd+4152} RSP <0000010031ab9e38>
 <0>Kernel panic - not syncing: Oops

Version-Release number of selected component (if applicable):
2.6.9-55.ELlargesmp
lvm2-cluster-2.02.21-7.el4
cmirror-kernel-2.6.9-32.0
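
For anyone trying to reproduce this, it can help to confirm mechanically which device name backs each mirror leg on each node, since the per-node listings in this report show the same leg under different names (/dev/sda on link-0[278], /dev/sdh on link-04). The following is only a rough sketch: the helper names `mirror_legs` and `differing_legs` are made up for illustration, and it assumes the text fed in has the same column layout as the listings in this report (LV in brackets, VG, attributes, size, device).

```python
import re

# Hypothetical helper, not part of LVM or cmirror.
# Matches mirror image lines such as:
#   [cmirror1_mimage_1] vg1 iwi-ao 10.00G /dev/sda1(0)
LEG_RE = re.compile(
    r"\[(\w+_mimage_\d+)\]\s+(\S+)\s+\S+\s+\S+\s+(/dev/\S+?)\(\d+\)"
)

def mirror_legs(lvs_text):
    """Return {(vg, image_lv): device} for every mirror image line."""
    return {(vg, lv): dev for lv, vg, dev in LEG_RE.findall(lvs_text)}

def differing_legs(views):
    """views maps node name -> listing text for that node.

    Returns {(vg, image_lv): {node: device}} for every leg whose
    device name is not identical on all nodes.
    """
    per_node = {node: mirror_legs(text) for node, text in views.items()}
    all_legs = set().union(*per_node.values())
    diffs = {}
    for leg in sorted(all_legs):
        devs = {node: legs.get(leg) for node, legs in per_node.items()}
        if len(set(devs.values())) > 1:
            diffs[leg] = devs
    return diffs
```

Feeding it the listing from each node shows at a glance which name to fail on which node. A difference in device naming between nodes is not in itself an error, so this only helps line up the test setup; it says nothing about the cause of the panic.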
I'm going to close this bug because it hasn't been seen in a while and because I believe it may be related to bz 450939.