Here is the test case that was run on the 2-node cluster (link-02 and link-08): Senario: Kill secondary leg of non synced core log 2 leg mirror ****** Mirror hash info for this scenario ****** * name: fail_secondary_non_synced_core_log_2_legs_3_mirrors * sync: 0 * mirrors: 3 * disklog: 0 * failpv: /dev/sdd1 * legs: 2 * pvs: /dev/sdc1 /dev/sdd1 ************************************************ Creating mirror(s) on link-08... qarsh root@link-08 lvcreate --corelog -m 1 -n fail_secondary_non_synced_core_log_2_legs_3_mirrors_1 -L 800M helterdc1:0-1000 /dev/sdd1:0-1000 Creating mirror(s) on link-08... qarsh root@link-08 lvcreate --corelog -m 1 -n fail_secondary_non_synced_core_log_2_legs_3_mirrors_2 -L 800M helterdc1:0-1000 /dev/sdd1:0-1000 Creating mirror(s) on link-08... qarsh root@link-08 lvcreate --corelog -m 1 -n fail_secondary_non_synced_core_log_2_legs_3_mirrors_3 -L 800M helterdc1:0-1000 /dev/sdd1:0-1000 mirror(s) currently at 2.00% synced right now Creating gfs on top of mirror(s) on link-08... Creating gfs on top of mirror(s) on link-08... Creating gfs on top of mirror(s) on link-08... Creating mnt point /mnt/fail_secondary_non_synced_core_log_2_legs_3_mirrors_1 on link-02... Mounting gfs /dev/helter_skelter/fail_secondary_non_synced_core_log_2_legs_3_mirrors_1 on link-02... Creating mnt point /mnt/fail_secondary_non_synced_core_log_2_legs_3_mirrors_1 on link-08... Mounting gfs /dev/helter_skelter/fail_secondary_non_synced_core_log_2_legs_3_mirrors_1 on link-08... Creating mnt point /mnt/fail_secondary_non_synced_core_log_2_legs_3_mirrors_2 on link-02... Mounting gfs /dev/helter_skelter/fail_secondary_non_synced_core_log_2_legs_3_mirrors_2 on link-02... Creating mnt point /mnt/fail_secondary_non_synced_core_log_2_legs_3_mirrors_2 on link-08... Mounting gfs /dev/helter_skelter/fail_secondary_non_synced_core_log_2_legs_3_mirrors_2 on link-08... Creating mnt point /mnt/fail_secondary_non_synced_core_log_2_legs_3_mirrors_3 on link-02... Mounting gfs /dev/helter_skelter/fail_secondary_non_synced_core_log_2_legs_3_mirrors_3 on link-02... Creating mnt point /mnt/fail_secondary_non_synced_core_log_2_legs_3_mirrors_3 on link-08... Mounting gfs /dev/helter_skelter/fail_secondary_non_synced_core_log_2_legs_3_mirrors_3 on link-08... Writing files to gfs on... link-02: qarsh root@link-02 "/usr/tests/sts/bin/checkit -w /mnt/fail_secondary_non_synced_core_log_2_legs_3_mirrors_1 -f /t_secondary_non_synced_core_log_2_legs_3_mirrors_1 -n 100" checkit starting with: CREATE Num files: 100 Random Seed: 27873 Verify XIOR Stream: /tmp/checkit_fail_secondary_non_synced_core_log_2_legs_3_mirrors_1 Working dir: /mnt/fail_secondary_non_synced_core_log_2_legs_3_mirrors_1 link-08: qarsh root@link-08 "/usr/tests/sts/bin/checkit -w /mnt/fail_secondary_non_synced_core_log_2_legs_3_mirrors_1 -f /t_secondary_non_synced_core_log_2_legs_3_mirrors_1 -n 100" checkit starting with: CREATE Num files: 100 Random Seed: 31297 Verify XIOR Stream: /tmp/checkit_fail_secondary_non_synced_core_log_2_legs_3_mirrors_1 Working dir: /mnt/fail_secondary_non_synced_core_log_2_legs_3_mirrors_1 Writing files to gfs on... link-02: qarsh root@link-02 "/usr/tests/sts/bin/checkit -w /mnt/fail_secondary_non_synced_core_log_2_legs_3_mirrors_2 -f /t_secondary_non_synced_core_log_2_legs_3_mirrors_2 -n 100" checkit starting with: CREATE Num files: 100 Random Seed: 27880 Verify XIOR Stream: /tmp/checkit_fail_secondary_non_synced_core_log_2_legs_3_mirrors_2 Working dir: /mnt/fail_secondary_non_synced_core_log_2_legs_3_mirrors_2 link-08: qarsh root@link-08 "/usr/tests/sts/bin/checkit -w /mnt/fail_secondary_non_synced_core_log_2_legs_3_mirrors_2 -f /t_secondary_non_synced_core_log_2_legs_3_mirrors_2 -n 100" checkit starting with: CREATE Num files: 100 Random Seed: 31313 Verify XIOR Stream: /tmp/checkit_fail_secondary_non_synced_core_log_2_legs_3_mirrors_2 Working dir: /mnt/fail_secondary_non_synced_core_log_2_legs_3_mirrors_2 Writing files to gfs on... link-02: qarsh root@link-02 "/usr/tests/sts/bin/checkit -w /mnt/fail_secondary_non_synced_core_log_2_legs_3_mirrors_3 -f /t_secondary_non_synced_core_log_2_legs_3_mirrors_3 -n 100" checkit starting with: CREATE Num files: 100 Random Seed: 27882 Verify XIOR Stream: /tmp/checkit_fail_secondary_non_synced_core_log_2_legs_3_mirrors_3 Working dir: /mnt/fail_secondary_non_synced_core_log_2_legs_3_mirrors_3 link-08: qarsh root@link-08 "/usr/tests/sts/bin/checkit -w /mnt/fail_secondary_non_synced_core_log_2_legs_3_mirrors_3 -f /t_secondary_non_synced_core_log_2_legs_3_mirrors_3 -n 100" checkit starting with: CREATE Num files: 100 Random Seed: 31315 Verify XIOR Stream: /tmp/checkit_fail_secondary_non_synced_core_log_2_legs_3_mirrors_3 Working dir: /mnt/fail_secondary_non_synced_core_log_2_legs_3_mirrors_3 Disabling device sdd on link-02 Disabling device sdd on link-08 Attempting I/O to cause mirror down conversion(s) on link-08 10+0 records in 10+0 records out Verifying the down conversion from mirror(s) to linear(s) /dev/sdd1: read failed after 0 of 2048 at 0: Input/output error [DEADLOCK] LINK-02: Jul 12 13:21:21 link-02 lvm[10694]: Processing: vgreduce --config devices{ignore_suspended_devices=1} --removemissing helter_skelter Jul 12 13:21:21 link-02 lvm[10694]: O_DIRECT will be used Jul 12 13:21:21 link-02 lvm[10694]: Setting global/locking_type to 3 Jul 12 13:21:21 link-02 lvm[10694]: Cluster locking selected. Jul 12 13:21:21 link-02 lvm[10694]: Finding volume group "helter_skelter" Jul 12 13:21:21 link-02 lvm[10694]: Locking V_helter_skelter at 0x4 Jul 12 13:21:22 link-02 lvm[10694]: device-mapper: waitevent ioctl failed: Interrupted system call Jul 12 13:21:22 link-02 lvm[10694]: No longer monitoring mirror device helter_skelter-fail_secondary_non_synced_core_log_2_legs_3_mirrors_1 for events LINK-08: [...] Jul 12 13:27:56 link-08 kernel: dm-cmirror: Setting recovering region out-of-sync: 1597/m6d87J5P/1 Jul 12 13:27:56 link-08 kernel: dm-cmirror: Assigning recovery work to 1/m6d87J5P: 1598 Jul 12 13:27:56 link-08 kernel: dm-cmirror: Client received resync work: 1598/m6d87J5P Jul 12 13:27:56 link-08 kernel: device-mapper: Write error during recovery (error = 0x1) Jul 12 13:27:56 link-08 kernel: device-mapper: recovery failed on region 1598 Jul 12 13:27:56 link-08 kernel: dm-cmirror: Setting recovering region out-of-sync: 1598/m6d87J5P/1 Jul 12 13:27:56 link-08 kernel: dm-cmirror: Assigning recovery work to 1/m6d87J5P: 1599 Jul 12 13:27:56 link-08 kernel: dm-cmirror: Client received resync work: 1599/m6d87J5P Jul 12 13:27:56 link-08 kernel: device-mapper: Write error during recovery (error = 0x1) Jul 12 13:27:56 link-08 kernel: device-mapper: recovery failed on region 1599 Jul 12 13:27:56 link-08 kernel: dm-cmirror: Setting recovering region out-of-sync: 1599/m6d87J5P/1 Jul 12 13:29:31 link-08 lvm[6742]: Error locking on node link-08: Command timed out Jul 12 13:29:31 link-08 lvm[6742]: Failed to lock fail_secondary_non_synced_core_log_2_legs_3_mirrors_1 Jul 12 13:29:31 link-08 lvm[6742]: Wiping pre-committed helter_skelter metadata from /dev/sda1 header at 4096 Jul 12 13:29:31 link-08 lvm[6742]: Closed /dev/sda1 Jul 12 13:29:31 link-08 lvm[6742]: Wiping pre-committed helter_skelter metadata from /dev/sdb1 header at 4096 Jul 12 13:29:31 link-08 lvm[6742]: Closed /dev/sdb1 Jul 12 13:29:31 link-08 lvm[6742]: Wiping pre-committed helter_skelter metadata from /dev/sdc1 header at 4096 Jul 12 13:29:31 link-08 lvm[6742]: Closed /dev/sdc1 Jul 12 13:29:31 link-08 lvm[6742]: Wiping pre-committed helter_skelter metadata from /dev/sde1 header at 4096 Jul 12 13:29:31 link-08 lvm[6742]: Closed /dev/sde1 Jul 12 13:29:31 link-08 lvm[6742]: Wiping pre-committed helter_skelter metadata from /dev/sdf1 header at 4096 Jul 12 13:29:31 link-08 lvm[6742]: Closed /dev/sdf1 Jul 12 13:29:31 link-08 lvm[6742]: Wiping pre-committed helter_skelter metadata from /dev/sdg1 header at 4096 Jul 12 13:29:31 link-08 lvm[6742]: Closed /dev/sdg1 Jul 12 13:29:31 link-08 lvm[6742]: Wiping pre-committed helter_skelter metadata from /dev/sdh1 header at 4096 Jul 12 13:29:31 link-08 lvm[6742]: Closed /dev/sdh1 Jul 12 13:29:31 link-08 lvm[6742]: <backtrace> Jul 12 13:29:31 link-08 lvm[6742]: Locking V_helter_skelter at 0x6 [root@link-02 tmp]# dmsetup ls --tree helter_skelter-fail_secondary_non_synced_core_log_2_legs_3_mirrors_3 (253:10) ├─helter_skelter-fail_secondary_non_synced_core_log_2_legs_3_mirrors_3_mimage_1 (253:9) │ └─ (8:49) └─helter_skelter-fail_secondary_non_synced_core_log_2_legs_3_mirrors_3_mimage_0 (253:8) └─ (8:33) helter_skelter-fail_secondary_non_synced_core_log_2_legs_3_mirrors_2 (253:7) ├─helter_skelter-fail_secondary_non_synced_core_log_2_legs_3_mirrors_2_mimage_1 (253:6) │ └─ (8:49) └─helter_skelter-fail_secondary_non_synced_core_log_2_legs_3_mirrors_2_mimage_0 (253:5) └─ (8:33) helter_skelter-fail_secondary_non_synced_core_log_2_legs_3_mirrors_1 (253:4) ├─helter_skelter-fail_secondary_non_synced_core_log_2_legs_3_mirrors_1_mimage_1 (253:3) │ └─ (8:49) └─helter_skelter-fail_secondary_non_synced_core_log_2_legs_3_mirrors_1_mimage_0 (253:2) └─ (8:33) [root@link-08 tmp]# dmsetup ls --tree helter_skelter-fail_secondary_non_synced_core_log_2_legs_3_mirrors_3 (253:10) ├─helter_skelter-fail_secondary_non_synced_core_log_2_legs_3_mirrors_3_mimage_1 (253:9) └─helter_skelter-fail_secondary_non_synced_core_log_2_legs_3_mirrors_3_mimage_0 (253:8) └─ (8:33) helter_skelter-fail_secondary_non_synced_core_log_2_legs_3_mirrors_2 (253:7) ├─helter_skelter-fail_secondary_non_synced_core_log_2_legs_3_mirrors_2_mimage_1 (253:6) └─helter_skelter-fail_secondary_non_synced_core_log_2_legs_3_mirrors_2_mimage_0 (253:5) └─ (8:33) helter_skelter-fail_secondary_non_synced_core_log_2_legs_3_mirrors_1 (253:4) ├─helter_skelter-fail_secondary_non_synced_core_log_2_legs_3_mirrors_1_mimage_1 (253:3) └─helter_skelter-fail_secondary_non_synced_core_log_2_legs_3_mirrors_1_mimage_0 (253:2) └─ (8:33) [root@link-02 tmp]# dmsetup ls helter_skelter-fail_secondary_non_synced_core_log_2_legs_3_mirrors_3_mimage_1 (253, 9) helter_skelter-fail_secondary_non_synced_core_log_2_legs_3_mirrors_3_mimage_0 (253, 8) helter_skelter-fail_secondary_non_synced_core_log_2_legs_3_mirrors_2_mimage_1 (253, 6) helter_skelter-fail_secondary_non_synced_core_log_2_legs_3_mirrors_3 (253, 10) helter_skelter-fail_secondary_non_synced_core_log_2_legs_3_mirrors_2_mimage_0 (253, 5) helter_skelter-fail_secondary_non_synced_core_log_2_legs_3_mirrors_2 (253, 7) helter_skelter-fail_secondary_non_synced_core_log_2_legs_3_mirrors_1_mimage_1 (253, 3) helter_skelter-fail_secondary_non_synced_core_log_2_legs_3_mirrors_1 (253, 4) helter_skelter-fail_secondary_non_synced_core_log_2_legs_3_mirrors_1_mimage_0 (253, 2) [root@link-08 tmp]# dmsetup ls helter_skelter-fail_secondary_non_synced_core_log_2_legs_3_mirrors_3_mimage_1 (253, 9) helter_skelter-fail_secondary_non_synced_core_log_2_legs_3_mirrors_3_mimage_0 (253, 8) helter_skelter-fail_secondary_non_synced_core_log_2_legs_3_mirrors_2_mimage_1 (253, 6) helter_skelter-fail_secondary_non_synced_core_log_2_legs_3_mirrors_3 (253, 10) helter_skelter-fail_secondary_non_synced_core_log_2_legs_3_mirrors_2_mimage_0 (253, 5) helter_skelter-fail_secondary_non_synced_core_log_2_legs_3_mirrors_2 (253, 7) helter_skelter-fail_secondary_non_synced_core_log_2_legs_3_mirrors_1_mimage_1 (253, 3) helter_skelter-fail_secondary_non_synced_core_log_2_legs_3_mirrors_1 (253, 4) helter_skelter-fail_secondary_non_synced_core_log_2_legs_3_mirrors_1_mimage_0 (253, 2) [root@link-02 tmp]# dmsetup status helter_skelter-fail_secondary_non_synced_core_log_2_legs_3_mirrors_3_mimage_1: 0 1638400 linear helter_skelter-fail_secondary_non_synced_core_log_2_legs_3_mirrors_3_mimage_0: 0 1638400 linear helter_skelter-fail_secondary_non_synced_core_log_2_legs_3_mirrors_2_mimage_1: 0 1638400 linear helter_skelter-fail_secondary_non_synced_core_log_2_legs_3_mirrors_3: 0 1638400 mirror 2 253:8 253:9 791/1600 1 AD 1 clustered_core helter_skelter-fail_secondary_non_synced_core_log_2_legs_3_mirrors_2_mimage_0: 0 1638400 linear helter_skelter-fail_secondary_non_synced_core_log_2_legs_3_mirrors_2: 0 1638400 mirror 2 253:5 253:6 758/1600 1 AD 1 clustered_core helter_skelter-fail_secondary_non_synced_core_log_2_legs_3_mirrors_1_mimage_1: 0 1638400 linear helter_skelter-fail_secondary_non_synced_core_log_2_legs_3_mirrors_1: 0 1638400 mirror 2 253:2 253:3 906/1600 1 AD 1 clustered_core helter_skelter-fail_secondary_non_synced_core_log_2_legs_3_mirrors_1_mimage_0: 0 1638400 linear [root@link-08 tmp]# dmsetup status helter_skelter-fail_secondary_non_synced_core_log_2_legs_3_mirrors_3_mimage_1: 0 1638400 error helter_skelter-fail_secondary_non_synced_core_log_2_legs_3_mirrors_3_mimage_0: 0 1638400 linear helter_skelter-fail_secondary_non_synced_core_log_2_legs_3_mirrors_2_mimage_1: 0 1638400 error helter_skelter-fail_secondary_non_synced_core_log_2_legs_3_mirrors_3: 0 1638400 mirror 2 253:8 253:9 791/1600 1 AD 1 clustered_core helter_skelter-fail_secondary_non_synced_core_log_2_legs_3_mirrors_2_mimage_0: 0 1638400 linear helter_skelter-fail_secondary_non_synced_core_log_2_legs_3_mirrors_2: 0 1638400 mirror 2 253:5 253:6 758/1600 1 AD 1 clustered_core helter_skelter-fail_secondary_non_synced_core_log_2_legs_3_mirrors_1_mimage_1: 0 1638400 error helter_skelter-fail_secondary_non_synced_core_log_2_legs_3_mirrors_1: 0 1638400 mirror 2 253:2 253:3 906/1600 1 AD 1 clustered_core helter_skelter-fail_secondary_non_synced_core_log_2_legs_3_mirrors_1_mimage_0: 0 1638400 linear Version-Release number of selected component (if applicable): 2.6.9-55.8.ELsmp lvm2-cluster-2.02.21-7.el4 cmirror-kernel-2.6.9-32.0
More info: [root@link-02 tmp]# ./lvm_backtraces.pl Backtrace for lvs (30054): helter_skelter-fail_secondary_non_synced_core_log_2_legs_3_mirrors_1 is SUSPENDED helter_skelter-fail_secondary_non_synced_core_log_2_legs_3_mirrors_1_mimage_0 is SUSPENDED helter_skelter-fail_secondary_non_synced_core_log_2_legs_3_mirrors_1_mimage_1 is SUSPENDED Jul 12 13:21:21 link-02 lvm[10694]: Setting global/locking_type to 3 Jul 12 13:21:21 link-02 lvm[10694]: Cluster locking selected. Jul 12 13:21:21 link-02 lvm[10694]: Finding volume group "helter_skelter" Jul 12 13:21:21 link-02 lvm[10694]: Locking V_helter_skelter at 0x4 Jul 12 13:21:22 link-02 lvm[10694]: device-mapper: waitevent ioctl failed: Interrupted system call Jul 12 13:21:22 link-02 lvm[10694]: No longer monitoring mirror device helter_skelter-fail_secondary_non_synced_core_log_2_legs_3_mirrors_1 for events Jul 12 13:40:41 link-02 dhclient: DHCPREQUEST on eth0 to 10.15.89.100 port 67 Jul 12 13:40:41 link-02 dhclient: DHCPACK from 10.15.89.100 Jul 12 13:40:41 link-02 dhclient: bound to 10.15.89.152 -- renewal in 8638 seconds. Jul 12 15:12:55 link-02 sshd(pam_unix)[2310]: session opened for user root by root(uid=0) Jul 12 15:50:38 link-02 kernel: dm-cmirror: LOG INFO: Jul 12 15:50:38 link-02 kernel: dm-cmirror: uuid: LVM-Upkxa2WuLZzzUE1Z6fIR0YKB9iYpxnPnvDDqCBqO9flsbnVgNBs19x6sWgwnTZq6 Jul 12 15:50:38 link-02 kernel: dm-cmirror: uuid_ref : 1 Jul 12 15:50:38 link-02 kernel: dm-cmirror: ?region_count: 1600 Jul 12 15:50:38 link-02 kernel: dm-cmirror: ?sync_count : 0 Jul 12 15:50:38 link-02 kernel: dm-cmirror: ?sync_search : 0 Jul 12 15:50:38 link-02 kernel: dm-cmirror: in_sync : NO Jul 12 15:50:38 link-02 kernel: dm-cmirror: suspended : NO Jul 12 15:50:38 link-02 kernel: dm-cmirror: server_id : [[link-08]] Jul 12 15:50:38 link-02 kernel: dm-cmirror: server_valid: YES Jul 12 15:50:38 link-02 kernel: dm-cmirror: LOG INFO: Jul 12 15:50:38 link-02 kernel: dm-cmirror: uuid: LVM-Upkxa2WuLZzzUE1Z6fIR0YKB9iYpxnPnFlKnH5BHNM2WiJiwh1BWczHqm6d87J5P Jul 12 15:50:38 link-02 kernel: dm-cmirror: uuid_ref : 1 Jul 12 15:50:38 link-02 kernel: dm-cmirror: ?region_count: 1600 Jul 12 15:50:38 link-02 kernel: dm-cmirror: ?sync_count : 0 Jul 12 15:50:38 link-02 kernel: dm-cmirror: ?sync_search : 0 Jul 12 15:50:38 link-02 kernel: dm-cmirror: in_sync : NO Jul 12 15:50:38 link-02 kernel: dm-cmirror: suspended : NO Jul 12 15:50:38 link-02 kernel: dm-cmirror: server_id : [[link-08]] Jul 12 15:50:38 link-02 kernel: dm-cmirror: server_valid: YES Jul 12 15:50:38 link-02 kernel: dm-cmirror: LOG INFO: Jul 12 15:50:38 link-02 kernel: dm-cmirror: uuid: LVM-Upkxa2WuLZzzUE1Z6fIR0YKB9iYpxnPncdAeKShZb89OS5yV5khhicenj0QIeFim Jul 12 15:50:38 link-02 kernel: dm-cmirror: uuid_ref : 1 Jul 12 15:50:38 link-02 kernel: dm-cmirror: ?region_count: 1600 Jul 12 15:50:38 link-02 kernel: dm-cmirror: ?sync_count : 0 Jul 12 15:50:38 link-02 kernel: dm-cmirror: ?sync_search : 0 Jul 12 15:50:38 link-02 kernel: dm-cmirror: in_sync : NO Jul 12 15:50:38 link-02 kernel: dm-cmirror: suspended : YES Jul 12 15:50:38 link-02 kernel: dm-cmirror: server_id : [[link-08]] Jul 12 15:50:38 link-02 kernel: dm-cmirror: server_valid: YES DLM lockspace 'clvmd' Resource 0000010022914378 (parent 0000000000000000). Name (len=16) "V_helter_skelter" Local Copy, Master is node [[link-08]] Granted Queue Conversion Queue Waiting Queue 00030068 -- (PW) Master: 00030070 LQ: 0,0x0 Resource 00000100373d3048 (parent 0000000000000000). Name (len=64) "Upkxa2WuLZzzUE1Z6fIR0YKB9iYpxnPnFlKnH5BHNM2WiJiwh1BWczHqm6d87J5P" Local Copy, Master is node [[link-08]] Granted Queue 000201d2 CR Master: 00030136 Conversion Queue Waiting Queue Resource 00000100373d3178 (parent 0000000000000000). Name (len=64) "Upkxa2WuLZzzUE1Z6fIR0YKB9iYpxnPnvDDqCBqO9flsbnVgNBs19x6sWgwnTZq6" Master Copy Granted Queue 000101c3 CR Remote: [[link-08]] 000303f6 00020200 CR Conversion Queue Waiting Queue Resource 00000100373d3e88 (parent 0000000000000000). Name (len=64) "Upkxa2WuLZzzUE1Z6fIR0YKB9iYpxnPncdAeKShZb89OS5yV5khhicenj0QIeFim" Local Copy, Master is node [[link-08]] Granted Queue 00030393 CR Master: 00020193 Conversion Queue Waiting Queue checkit_fail_secondary_non_synced_core_log_2_legs_3_mirrors_3 logwatch.Fen22081 [root@link-08 tmp]# ./lvm_backtraces.pl Backtrace for lvs (1032): #1 0x000000000044efd3 in _lock_for_cluster (cmd=51 '3', flags=Variable "flags" is not available.) at locking/cluster_locking.c:115 #2 0x000000000044f2c6 in _lock_resource (cmd=Variable "cmd" is not available.) at locking/cluster_locking.c:410 #3 0x000000000043b646 in _lock_vol (cmd=0x684240, resource=0x7fbfffb550 "VolGroup00", flags=33) at locking/locking.c:237 ####################################### # _lock_vol flags = LCK_VG | LCK_HOLD | LCK_READ ####################################### #4 0x000000000043b859 in lock_vol (cmd=0x684240, vol=0x69df48 "VolGroup00", flags=33) at locking/locking.c:270 #5 0x000000000041e87f in process_each_lv (cmd=0x684240, argc=Variable "argc" is not available.) at toollib.c:324 #6 0x000000000041cd7a in _report (cmd=0x684240, argc=0, argv=0x7fbffffa70, report_type=Variable "report_type" is not available.) at reporter.c:329 #7 0x0000000000414710 in lvm_run_command (cmd=0x684240, argc=0, argv=0x7fbffffa70) at lvmcmdline.c:935 #8 0x0000000000415492 in lvm2_main (argc=1, argv=0x7fbffffa68, is_static=Variable "is_static" is not available.) at lvmcmdline.c:1423 #9 0x000000344d61c3fb in __libc_start_main () from /lib64/tls/libc.so.6 #10 0x000000000040c42a in _start () #11 0x0000007fbffffa58 in ?? () #12 0x000000000000001c in ?? () #13 0x0000000000000001 in ?? () #14 0x0000007fbffffc35 in ?? () Backtrace for lvs (31359): #1 0x000000000044efd3 in _lock_for_cluster (cmd=51 '3', flags=Variable "flags" is not available.) at locking/cluster_locking.c:115 #2 0x000000000044f2c6 in _lock_resource (cmd=Variable "cmd" is not available.) at locking/cluster_locking.c:410 #3 0x000000000043b646 in _lock_vol (cmd=0x684240, resource=0x7fbfffb7c0 "helter_skelter", flags=33) at locking/locking.c:237 ####################################### # _lock_vol flags = LCK_VG | LCK_HOLD | LCK_READ ####################################### #4 0x000000000043b859 in lock_vol (cmd=0x684240, vol=0x69dd98 "helter_skelter", flags=33) at locking/locking.c:270 #5 0x000000000041e87f in process_each_lv (cmd=0x684240, argc=Variable "argc" is not available.) at toollib.c:324 #6 0x000000000041cd7a in _report (cmd=0x684240, argc=0, argv=0x7fbffffce0, report_type=Variable "report_type" is not available.) at reporter.c:329 #7 0x0000000000414710 in lvm_run_command (cmd=0x684240, argc=0, argv=0x7fbffffce0) at lvmcmdline.c:935 #8 0x0000000000415492 in lvm2_main (argc=1, argv=0x7fbffffcd8, is_static=Variable "is_static" is not available.) at lvmcmdline.c:1423 #9 0x000000344d61c3fb in __libc_start_main () from /lib64/tls/libc.so.6 #10 0x000000000040c42a in _start () #11 0x0000007fbffffcc8 in ?? () #12 0x000000000000001c in ?? () #13 0x0000000000000001 in ?? () #14 0x0000007fbffffe99 in ?? () Jul 12 15:48:40 link-08 last message repeated 21 times Jul 12 15:49:43 link-08 last message repeated 21 times Jul 12 15:50:46 link-08 last message repeated 21 times Jul 12 15:51:49 link-08 last message repeated 21 times Jul 12 15:52:52 link-08 last message repeated 21 times Jul 12 15:53:55 link-08 last message repeated 21 times Jul 12 15:54:58 link-08 last message repeated 21 times Jul 12 15:56:01 link-08 last message repeated 21 times Jul 12 15:57:04 link-08 last message repeated 21 times Jul 12 15:58:07 link-08 last message repeated 21 times Jul 12 15:58:52 link-08 last message repeated 15 times Jul 12 15:58:53 link-08 kernel: dm-cmirror: LOG INFO: Jul 12 15:58:53 link-08 kernel: dm-cmirror: uuid: LVM-Upkxa2WuLZzzUE1Z6fIR0YKB9iYpxnPnvDDqCBqO9flsbnVgNBs19x6sWgwnTZq6 Jul 12 15:58:53 link-08 kernel: dm-cmirror: uuid_ref : 1 Jul 12 15:58:53 link-08 kernel: dm-cmirror: ?region_count: 1600 Jul 12 15:58:53 link-08 kernel: dm-cmirror: ?sync_count : 791 Jul 12 15:58:53 link-08 kernel: dm-cmirror: ?sync_search : 904 Jul 12 15:58:53 link-08 kernel: dm-cmirror: in_sync : NO Jul 12 15:58:53 link-08 kernel: dm-cmirror: suspended : NO Jul 12 15:58:53 link-08 kernel: dm-cmirror: server_id : [[link-08]] Jul 12 15:58:53 link-08 kernel: dm-cmirror: server_valid: YES Jul 12 15:58:53 link-08 kernel: dm-cmirror: LOG INFO: Jul 12 15:58:53 link-08 kernel: dm-cmirror: uuid: LVM-Upkxa2WuLZzzUE1Z6fIR0YKB9iYpxnPnFlKnH5BHNM2WiJiwh1BWczHqm6d87J5P Jul 12 15:58:53 link-08 kernel: dm-cmirror: uuid_ref : 1 Jul 12 15:58:53 link-08 kernel: dm-cmirror: ?region_count: 1600 Jul 12 15:58:53 link-08 kernel: dm-cmirror: ?sync_count : 758 Jul 12 15:58:53 link-08 kernel: dm-cmirror: ?sync_search : 1600 Jul 12 15:58:53 link-08 kernel: dm-cmirror: in_sync : NO Jul 12 15:58:53 link-08 kernel: dm-cmirror: suspended : NO Jul 12 15:58:53 link-08 kernel: dm-cmirror: server_id : [[link-08]] Jul 12 15:58:53 link-08 kernel: dm-cmirror: server_valid: YES Jul 12 15:58:53 link-08 kernel: dm-cmirror: LOG INFO: Jul 12 15:58:53 link-08 kernel: dm-cmirror: uuid: LVM-Upkxa2WuLZzzUE1Z6fIR0YKB9iYpxnPncdAeKShZb89OS5yV5khhicenj0QIeFim Jul 12 15:58:53 link-08 kernel: dm-cmirror: uuid_ref : 1 Jul 12 15:58:53 link-08 kernel: dm-cmirror: ?region_count: 1600 Jul 12 15:58:53 link-08 kernel: dm-cmirror: ?sync_count : 906 Jul 12 15:58:53 link-08 kernel: dm-cmirror: ?sync_search : 921 Jul 12 15:58:53 link-08 kernel: dm-cmirror: in_sync : NO Jul 12 15:58:53 link-08 kernel: dm-cmirror: suspended : NO Jul 12 15:58:53 link-08 kernel: dm-cmirror: server_id : [[link-08]] Jul 12 15:58:53 link-08 kernel: dm-cmirror: server_valid: YES Jul 12 15:58:55 link-08 qarshd[31358]: Nothing to do DLM lockspace 'clvmd' Resource 000001000f9513b8 (parent 0000000000000000). Name (len=16) "V_helter_skelter" Master Copy LVB: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 Granted Queue 000203fb PR Conversion Queue Waiting Queue 00030070 -- (PW) Remote: [[link-02]] 00030068 LQ: 0,0x0 Resource 000001000fb25af8 (parent 0000000000000000). Name (len=64) "Upkxa2WuLZzzUE1Z6fIR0YKB9iYpxnPnLlY2RGXzkuwIzsXr8PKM69IDWqaRC0RQ" Master Copy Granted Queue 000300ad CR Conversion Queue Waiting Queue Resource 000001000fb25048 (parent 0000000000000000). Name (len=64) "Upkxa2WuLZzzUE1Z6fIR0YKB9iYpxnPnFlKnH5BHNM2WiJiwh1BWczHqm6d87J5P" Master Copy Granted Queue 00030136 CR Remote: [[link-02]] 000201d2 00050365 CR Conversion Queue Waiting Queue Resource 000001000fb25178 (parent 0000000000000000). Name (len=64) "Upkxa2WuLZzzUE1Z6fIR0YKB9iYpxnPnvDDqCBqO9flsbnVgNBs19x6sWgwnTZq6" Local Copy, Master is node [[link-02]] Granted Queue 000303f6 CR Master: 000101c3 Conversion Queue Waiting Queue Resource 000001000fb25c28 (parent 0000000000000000). Name (len=64) "Upkxa2WuLZzzUE1Z6fIR0YKB9iYpxnPnxeEKade0UDQpyylNBaDuRGvTAokv4IaT" Master Copy Granted Queue 0003020d CR Conversion Queue Waiting Queue Resource 000001000fb25d58 (parent 0000000000000000). Name (len=64) "Upkxa2WuLZzzUE1Z6fIR0YKB9iYpxnPn670Urzezd4GsF71VERA5K05vcb19Dbbm" Master Copy Granted Queue 000500de CR Conversion Queue Waiting Queue Resource 000001000fb25e88 (parent 0000000000000000). Name (len=64) "Upkxa2WuLZzzUE1Z6fIR0YKB9iYpxnPncdAeKShZb89OS5yV5khhicenj0QIeFim" Master Copy Granted Queue 0003021a PW 00020193 CR Remote: [[link-02]] 00030393 Conversion Queue Waiting Queue Resource 00000100333a1b28 (parent 0000000000000000). Name (len=12) "V_VolGroup00" Master Copy Granted Queue 00040228 PR Conversion Queue Waiting Queue
Was able to reproduce this today. After the down conversion: [root@link-02 ~]# dmsetup ls --tree helter_skelter-fail_secondary_non_synced_2_legs_2_mirrors_2 (253:9) ├─helter_skelter-fail_secondary_non_synced_2_legs_2_mirrors_2_mimage_1 (253:8) │ └─ (8:65) ├─helter_skelter-fail_secondary_non_synced_2_legs_2_mirrors_2_mimage_0 (253:7) │ └─ (8:17) └─helter_skelter-fail_secondary_non_synced_2_legs_2_mirrors_2_mlog (253:6) └─ (8:81) helter_skelter-fail_secondary_non_synced_2_legs_2_mirrors_1 (253:5) ├─helter_skelter-fail_secondary_non_synced_2_legs_2_mirrors_1_mimage_1 (253:4) │ └─ (8:65) ├─helter_skelter-fail_secondary_non_synced_2_legs_2_mirrors_1_mimage_0 (253:3) │ └─ (8:17) └─helter_skelter-fail_secondary_non_synced_2_legs_2_mirrors_1_mlog (253:2) └─ (8:81) [root@link-08 ~]# dmsetup ls --tree helter_skelter-fail_secondary_non_synced_2_legs_2_mirrors_2 (253:9) ├─helter_skelter-fail_secondary_non_synced_2_legs_2_mirrors_2_mimage_1 (253:8) ├─helter_skelter-fail_secondary_non_synced_2_legs_2_mirrors_2_mimage_0 (253:7) │ └─ (8:17) └─helter_skelter-fail_secondary_non_synced_2_legs_2_mirrors_2_mlog (253:6) └─ (8:81) helter_skelter-fail_secondary_non_synced_2_legs_2_mirrors_1 (253:5) ├─helter_skelter-fail_secondary_non_synced_2_legs_2_mirrors_1_mimage_1 (253:4) ├─helter_skelter-fail_secondary_non_synced_2_legs_2_mirrors_1_mimage_0 (253:3) │ └─ (8:17) └─helter_skelter-fail_secondary_non_synced_2_legs_2_mirrors_1_mlog (253:2) └─ (8:81)
Reproduced again, seems to only happen during the failure of a secondary legs of non synced mirrors.
I'll try to reproduce. BTW, it's very hard to read the comments... especially due to the retardedly long name of the LVs. If this happens in the future, could you do: $> dmsetup ls --tree | sed s/helter_skelter-fail_secondary_non_synced_2_legs_2_mirrors/short/ That would make it a hell of alot easier to read. Or, put it in a file and I can run it through sed/grep.
I'm not sure if "retardedly" is an adjective or an adverb or either?
seems to work fine with only one mirror... The key must be having multiple mirrors that fail at once (3 mirrors above). You seem to have only "triggered" one mirror though... The write to trigger down-convert happens on only one node and to only one of the mirrors, right?
having trouble reproducing, even with 3 mirrors. Can we try to distill the test down to some simple operations? Here's what I'm trying: 1) create 3 corelog cluster mirrors on same devices 2) fail secondary device (on all three mirrors) while mirrors are syncing 3) do I/O to one of the mirrors to trigger down convert **) no FS's, no nominal I/O until step 3 Do you have an idea as to how often this bug hits? 100%, 50%, ?
I've address issues related to this in other bugs. This should be fixed. Marking modified. assigned -> modified
Marking this bz verified as it hasn't been seen in almost 4 months, will reopen if seen again.
The Red Hat Cluster Suite product is past end-of-life; closing.