Bug 248056 - down conversion appeared to fail after leg failure resulting in deadlocked clvmd
Summary: down conversion appeared to fail after leg failure resulting in deadlocked clvmd
Keywords:
Status: CLOSED DEFERRED
Alias: None
Product: Red Hat Cluster Suite
Classification: Retired
Component: cmirror-kernel
Version: 4
Hardware: All
OS: Linux
low
low
Target Milestone: ---
Assignee: Jonathan Earl Brassow
QA Contact: Cluster QE
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2007-07-12 20:51 UTC by Corey Marthaler
Modified: 2013-09-23 15:31 UTC (History)
0 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-09-23 15:31:56 UTC
Embargoed:


Attachments (Terms of Use)

Description Corey Marthaler 2007-07-12 20:51:06 UTC
Here is the test case that was run on the 2-node cluster (link-02 and link-08):

Senario: Kill secondary leg of non synced core log 2 leg mirror

****** Mirror hash info for this scenario ******
* name:      fail_secondary_non_synced_core_log_2_legs_3_mirrors
* sync:      0
* mirrors:   3
* disklog:   0
* failpv:    /dev/sdd1
* legs:      2
* pvs:       /dev/sdc1 /dev/sdd1
************************************************

Creating mirror(s) on link-08...
qarsh root@link-08 lvcreate --corelog -m 1 -n
fail_secondary_non_synced_core_log_2_legs_3_mirrors_1 -L 800M helterdc1:0-1000
/dev/sdd1:0-1000
Creating mirror(s) on link-08...
qarsh root@link-08 lvcreate --corelog -m 1 -n
fail_secondary_non_synced_core_log_2_legs_3_mirrors_2 -L 800M helterdc1:0-1000
/dev/sdd1:0-1000
Creating mirror(s) on link-08...
qarsh root@link-08 lvcreate --corelog -m 1 -n
fail_secondary_non_synced_core_log_2_legs_3_mirrors_3 -L 800M helterdc1:0-1000
/dev/sdd1:0-1000

mirror(s) currently at 2.00% synced right now

Creating gfs on top of mirror(s) on link-08...
Creating gfs on top of mirror(s) on link-08...
Creating gfs on top of mirror(s) on link-08...
Creating mnt point /mnt/fail_secondary_non_synced_core_log_2_legs_3_mirrors_1 on
link-02...
Mounting gfs
/dev/helter_skelter/fail_secondary_non_synced_core_log_2_legs_3_mirrors_1 on
link-02...
Creating mnt point /mnt/fail_secondary_non_synced_core_log_2_legs_3_mirrors_1 on
link-08...
Mounting gfs
/dev/helter_skelter/fail_secondary_non_synced_core_log_2_legs_3_mirrors_1 on
link-08...
Creating mnt point /mnt/fail_secondary_non_synced_core_log_2_legs_3_mirrors_2 on
link-02...
Mounting gfs
/dev/helter_skelter/fail_secondary_non_synced_core_log_2_legs_3_mirrors_2 on
link-02...
Creating mnt point /mnt/fail_secondary_non_synced_core_log_2_legs_3_mirrors_2 on
link-08...
Mounting gfs
/dev/helter_skelter/fail_secondary_non_synced_core_log_2_legs_3_mirrors_2 on
link-08...
Creating mnt point /mnt/fail_secondary_non_synced_core_log_2_legs_3_mirrors_3 on
link-02...
Mounting gfs
/dev/helter_skelter/fail_secondary_non_synced_core_log_2_legs_3_mirrors_3 on
link-02...
Creating mnt point /mnt/fail_secondary_non_synced_core_log_2_legs_3_mirrors_3 on
link-08...
Mounting gfs
/dev/helter_skelter/fail_secondary_non_synced_core_log_2_legs_3_mirrors_3 on
link-08...
Writing files to gfs on...
        link-02:
qarsh root@link-02 "/usr/tests/sts/bin/checkit -w
/mnt/fail_secondary_non_synced_core_log_2_legs_3_mirrors_1 -f
/t_secondary_non_synced_core_log_2_legs_3_mirrors_1 -n 100"
checkit starting with:
CREATE
Num files:          100
Random Seed:        27873
Verify XIOR Stream:
/tmp/checkit_fail_secondary_non_synced_core_log_2_legs_3_mirrors_1
Working dir:        /mnt/fail_secondary_non_synced_core_log_2_legs_3_mirrors_1
        link-08:
qarsh root@link-08 "/usr/tests/sts/bin/checkit -w
/mnt/fail_secondary_non_synced_core_log_2_legs_3_mirrors_1 -f
/t_secondary_non_synced_core_log_2_legs_3_mirrors_1 -n 100"
checkit starting with:
CREATE
Num files:          100
Random Seed:        31297
Verify XIOR Stream:
/tmp/checkit_fail_secondary_non_synced_core_log_2_legs_3_mirrors_1
Working dir:        /mnt/fail_secondary_non_synced_core_log_2_legs_3_mirrors_1
Writing files to gfs on...
        link-02:
qarsh root@link-02 "/usr/tests/sts/bin/checkit -w
/mnt/fail_secondary_non_synced_core_log_2_legs_3_mirrors_2 -f
/t_secondary_non_synced_core_log_2_legs_3_mirrors_2 -n 100"
checkit starting with:
CREATE
Num files:          100
Random Seed:        27880
Verify XIOR Stream:
/tmp/checkit_fail_secondary_non_synced_core_log_2_legs_3_mirrors_2
Working dir:        /mnt/fail_secondary_non_synced_core_log_2_legs_3_mirrors_2
        link-08:
qarsh root@link-08 "/usr/tests/sts/bin/checkit -w
/mnt/fail_secondary_non_synced_core_log_2_legs_3_mirrors_2 -f
/t_secondary_non_synced_core_log_2_legs_3_mirrors_2 -n 100"
checkit starting with:
CREATE
Num files:          100
Random Seed:        31313
Verify XIOR Stream:
/tmp/checkit_fail_secondary_non_synced_core_log_2_legs_3_mirrors_2
Working dir:        /mnt/fail_secondary_non_synced_core_log_2_legs_3_mirrors_2
Writing files to gfs on...
        link-02:
qarsh root@link-02 "/usr/tests/sts/bin/checkit -w
/mnt/fail_secondary_non_synced_core_log_2_legs_3_mirrors_3 -f
/t_secondary_non_synced_core_log_2_legs_3_mirrors_3 -n 100"
checkit starting with:
CREATE
Num files:          100
Random Seed:        27882
Verify XIOR Stream:
/tmp/checkit_fail_secondary_non_synced_core_log_2_legs_3_mirrors_3
Working dir:        /mnt/fail_secondary_non_synced_core_log_2_legs_3_mirrors_3
        link-08:
qarsh root@link-08 "/usr/tests/sts/bin/checkit -w
/mnt/fail_secondary_non_synced_core_log_2_legs_3_mirrors_3 -f
/t_secondary_non_synced_core_log_2_legs_3_mirrors_3 -n 100"
checkit starting with:
CREATE
Num files:          100
Random Seed:        31315
Verify XIOR Stream:
/tmp/checkit_fail_secondary_non_synced_core_log_2_legs_3_mirrors_3
Working dir:        /mnt/fail_secondary_non_synced_core_log_2_legs_3_mirrors_3
Disabling device sdd on link-02
Disabling device sdd on link-08

Attempting I/O to cause mirror down conversion(s) on link-08
10+0 records in
10+0 records out
Verifying the down conversion from mirror(s) to linear(s)
  /dev/sdd1: read failed after 0 of 2048 at 0: Input/output error
[DEADLOCK]

LINK-02:
Jul 12 13:21:21 link-02 lvm[10694]: Processing: vgreduce --config
devices{ignore_suspended_devices=1} --removemissing helter_skelter
Jul 12 13:21:21 link-02 lvm[10694]: O_DIRECT will be used
Jul 12 13:21:21 link-02 lvm[10694]: Setting global/locking_type to 3
Jul 12 13:21:21 link-02 lvm[10694]: Cluster locking selected.
Jul 12 13:21:21 link-02 lvm[10694]: Finding volume group "helter_skelter"
Jul 12 13:21:21 link-02 lvm[10694]: Locking V_helter_skelter at 0x4
Jul 12 13:21:22 link-02 lvm[10694]: device-mapper: waitevent ioctl failed:
Interrupted system call
Jul 12 13:21:22 link-02 lvm[10694]: No longer monitoring mirror device
helter_skelter-fail_secondary_non_synced_core_log_2_legs_3_mirrors_1 for events



LINK-08:
[...]
Jul 12 13:27:56 link-08 kernel: dm-cmirror: Setting recovering region
out-of-sync: 1597/m6d87J5P/1
Jul 12 13:27:56 link-08 kernel: dm-cmirror: Assigning recovery work to
1/m6d87J5P: 1598
Jul 12 13:27:56 link-08 kernel: dm-cmirror: Client received resync work:
1598/m6d87J5P
Jul 12 13:27:56 link-08 kernel: device-mapper: Write error during recovery
(error = 0x1)
Jul 12 13:27:56 link-08 kernel: device-mapper: recovery failed on region 1598
Jul 12 13:27:56 link-08 kernel: dm-cmirror: Setting recovering region
out-of-sync: 1598/m6d87J5P/1
Jul 12 13:27:56 link-08 kernel: dm-cmirror: Assigning recovery work to
1/m6d87J5P: 1599
Jul 12 13:27:56 link-08 kernel: dm-cmirror: Client received resync work:
1599/m6d87J5P
Jul 12 13:27:56 link-08 kernel: device-mapper: Write error during recovery
(error = 0x1)
Jul 12 13:27:56 link-08 kernel: device-mapper: recovery failed on region 1599
Jul 12 13:27:56 link-08 kernel: dm-cmirror: Setting recovering region
out-of-sync: 1599/m6d87J5P/1
Jul 12 13:29:31 link-08 lvm[6742]: Error locking on node link-08: Command timed out
Jul 12 13:29:31 link-08 lvm[6742]: Failed to lock
fail_secondary_non_synced_core_log_2_legs_3_mirrors_1
Jul 12 13:29:31 link-08 lvm[6742]: Wiping pre-committed helter_skelter metadata
from /dev/sda1 header at 4096
Jul 12 13:29:31 link-08 lvm[6742]: Closed /dev/sda1
Jul 12 13:29:31 link-08 lvm[6742]: Wiping pre-committed helter_skelter metadata
from /dev/sdb1 header at 4096
Jul 12 13:29:31 link-08 lvm[6742]: Closed /dev/sdb1
Jul 12 13:29:31 link-08 lvm[6742]: Wiping pre-committed helter_skelter metadata
from /dev/sdc1 header at 4096
Jul 12 13:29:31 link-08 lvm[6742]: Closed /dev/sdc1
Jul 12 13:29:31 link-08 lvm[6742]: Wiping pre-committed helter_skelter metadata
from /dev/sde1 header at 4096
Jul 12 13:29:31 link-08 lvm[6742]: Closed /dev/sde1
Jul 12 13:29:31 link-08 lvm[6742]: Wiping pre-committed helter_skelter metadata
from /dev/sdf1 header at 4096
Jul 12 13:29:31 link-08 lvm[6742]: Closed /dev/sdf1
Jul 12 13:29:31 link-08 lvm[6742]: Wiping pre-committed helter_skelter metadata
from /dev/sdg1 header at 4096
Jul 12 13:29:31 link-08 lvm[6742]: Closed /dev/sdg1
Jul 12 13:29:31 link-08 lvm[6742]: Wiping pre-committed helter_skelter metadata
from /dev/sdh1 header at 4096
Jul 12 13:29:31 link-08 lvm[6742]: Closed /dev/sdh1
Jul 12 13:29:31 link-08 lvm[6742]: <backtrace>
Jul 12 13:29:31 link-08 lvm[6742]: Locking V_helter_skelter at 0x6



[root@link-02 tmp]# dmsetup ls --tree
helter_skelter-fail_secondary_non_synced_core_log_2_legs_3_mirrors_3 (253:10)
 ├─helter_skelter-fail_secondary_non_synced_core_log_2_legs_3_mirrors_3_mimage_1
(253:9)
 │  └─ (8:49)
 └─helter_skelter-fail_secondary_non_synced_core_log_2_legs_3_mirrors_3_mimage_0
(253:8)
    └─ (8:33)
helter_skelter-fail_secondary_non_synced_core_log_2_legs_3_mirrors_2 (253:7)
 ├─helter_skelter-fail_secondary_non_synced_core_log_2_legs_3_mirrors_2_mimage_1
(253:6)
 │  └─ (8:49)
 └─helter_skelter-fail_secondary_non_synced_core_log_2_legs_3_mirrors_2_mimage_0
(253:5)
    └─ (8:33)
helter_skelter-fail_secondary_non_synced_core_log_2_legs_3_mirrors_1 (253:4)
 ├─helter_skelter-fail_secondary_non_synced_core_log_2_legs_3_mirrors_1_mimage_1
(253:3)
 │  └─ (8:49)
 └─helter_skelter-fail_secondary_non_synced_core_log_2_legs_3_mirrors_1_mimage_0
(253:2)
    └─ (8:33)


[root@link-08 tmp]# dmsetup ls --tree
helter_skelter-fail_secondary_non_synced_core_log_2_legs_3_mirrors_3 (253:10)
 ├─helter_skelter-fail_secondary_non_synced_core_log_2_legs_3_mirrors_3_mimage_1
(253:9)
 └─helter_skelter-fail_secondary_non_synced_core_log_2_legs_3_mirrors_3_mimage_0
(253:8)
    └─ (8:33)
helter_skelter-fail_secondary_non_synced_core_log_2_legs_3_mirrors_2 (253:7)
 ├─helter_skelter-fail_secondary_non_synced_core_log_2_legs_3_mirrors_2_mimage_1
(253:6)
 └─helter_skelter-fail_secondary_non_synced_core_log_2_legs_3_mirrors_2_mimage_0
(253:5)
    └─ (8:33)
helter_skelter-fail_secondary_non_synced_core_log_2_legs_3_mirrors_1 (253:4)
 ├─helter_skelter-fail_secondary_non_synced_core_log_2_legs_3_mirrors_1_mimage_1
(253:3)
 └─helter_skelter-fail_secondary_non_synced_core_log_2_legs_3_mirrors_1_mimage_0
(253:2)
    └─ (8:33)

[root@link-02 tmp]# dmsetup ls
helter_skelter-fail_secondary_non_synced_core_log_2_legs_3_mirrors_3_mimage_1  
(253, 9)
helter_skelter-fail_secondary_non_synced_core_log_2_legs_3_mirrors_3_mimage_0  
(253, 8)
helter_skelter-fail_secondary_non_synced_core_log_2_legs_3_mirrors_2_mimage_1  
(253, 6)
helter_skelter-fail_secondary_non_synced_core_log_2_legs_3_mirrors_3    (253, 10)
helter_skelter-fail_secondary_non_synced_core_log_2_legs_3_mirrors_2_mimage_0  
(253, 5)
helter_skelter-fail_secondary_non_synced_core_log_2_legs_3_mirrors_2    (253, 7)
helter_skelter-fail_secondary_non_synced_core_log_2_legs_3_mirrors_1_mimage_1  
(253, 3)
helter_skelter-fail_secondary_non_synced_core_log_2_legs_3_mirrors_1    (253, 4)
helter_skelter-fail_secondary_non_synced_core_log_2_legs_3_mirrors_1_mimage_0  
(253, 2)


[root@link-08 tmp]# dmsetup ls
helter_skelter-fail_secondary_non_synced_core_log_2_legs_3_mirrors_3_mimage_1  
(253, 9)
helter_skelter-fail_secondary_non_synced_core_log_2_legs_3_mirrors_3_mimage_0  
(253, 8)
helter_skelter-fail_secondary_non_synced_core_log_2_legs_3_mirrors_2_mimage_1  
(253, 6)
helter_skelter-fail_secondary_non_synced_core_log_2_legs_3_mirrors_3    (253, 10)
helter_skelter-fail_secondary_non_synced_core_log_2_legs_3_mirrors_2_mimage_0  
(253, 5)
helter_skelter-fail_secondary_non_synced_core_log_2_legs_3_mirrors_2    (253, 7)
helter_skelter-fail_secondary_non_synced_core_log_2_legs_3_mirrors_1_mimage_1  
(253, 3)
helter_skelter-fail_secondary_non_synced_core_log_2_legs_3_mirrors_1    (253, 4)
helter_skelter-fail_secondary_non_synced_core_log_2_legs_3_mirrors_1_mimage_0  
(253, 2)


[root@link-02 tmp]# dmsetup status
helter_skelter-fail_secondary_non_synced_core_log_2_legs_3_mirrors_3_mimage_1: 0
1638400 linear
helter_skelter-fail_secondary_non_synced_core_log_2_legs_3_mirrors_3_mimage_0: 0
1638400 linear
helter_skelter-fail_secondary_non_synced_core_log_2_legs_3_mirrors_2_mimage_1: 0
1638400 linear
helter_skelter-fail_secondary_non_synced_core_log_2_legs_3_mirrors_3: 0 1638400
mirror 2 253:8 253:9 791/1600 1 AD 1 clustered_core
helter_skelter-fail_secondary_non_synced_core_log_2_legs_3_mirrors_2_mimage_0: 0
1638400 linear
helter_skelter-fail_secondary_non_synced_core_log_2_legs_3_mirrors_2: 0 1638400
mirror 2 253:5 253:6 758/1600 1 AD 1 clustered_core
helter_skelter-fail_secondary_non_synced_core_log_2_legs_3_mirrors_1_mimage_1: 0
1638400 linear
helter_skelter-fail_secondary_non_synced_core_log_2_legs_3_mirrors_1: 0 1638400
mirror 2 253:2 253:3 906/1600 1 AD 1 clustered_core
helter_skelter-fail_secondary_non_synced_core_log_2_legs_3_mirrors_1_mimage_0: 0
1638400 linear


[root@link-08 tmp]# dmsetup status
helter_skelter-fail_secondary_non_synced_core_log_2_legs_3_mirrors_3_mimage_1: 0
1638400 error
helter_skelter-fail_secondary_non_synced_core_log_2_legs_3_mirrors_3_mimage_0: 0
1638400 linear
helter_skelter-fail_secondary_non_synced_core_log_2_legs_3_mirrors_2_mimage_1: 0
1638400 error
helter_skelter-fail_secondary_non_synced_core_log_2_legs_3_mirrors_3: 0 1638400
mirror 2 253:8 253:9 791/1600 1 AD 1 clustered_core
helter_skelter-fail_secondary_non_synced_core_log_2_legs_3_mirrors_2_mimage_0: 0
1638400 linear
helter_skelter-fail_secondary_non_synced_core_log_2_legs_3_mirrors_2: 0 1638400
mirror 2 253:5 253:6 758/1600 1 AD 1 clustered_core
helter_skelter-fail_secondary_non_synced_core_log_2_legs_3_mirrors_1_mimage_1: 0
1638400 error
helter_skelter-fail_secondary_non_synced_core_log_2_legs_3_mirrors_1: 0 1638400
mirror 2 253:2 253:3 906/1600 1 AD 1 clustered_core
helter_skelter-fail_secondary_non_synced_core_log_2_legs_3_mirrors_1_mimage_0: 0
1638400 linear



Version-Release number of selected component (if applicable):
2.6.9-55.8.ELsmp
lvm2-cluster-2.02.21-7.el4
cmirror-kernel-2.6.9-32.0

Comment 1 Corey Marthaler 2007-07-12 20:53:31 UTC
More info:

[root@link-02 tmp]# ./lvm_backtraces.pl
Backtrace for lvs (30054):
helter_skelter-fail_secondary_non_synced_core_log_2_legs_3_mirrors_1 is SUSPENDED
helter_skelter-fail_secondary_non_synced_core_log_2_legs_3_mirrors_1_mimage_0 is
SUSPENDED
helter_skelter-fail_secondary_non_synced_core_log_2_legs_3_mirrors_1_mimage_1 is
SUSPENDED
Jul 12 13:21:21 link-02 lvm[10694]: Setting global/locking_type to 3
Jul 12 13:21:21 link-02 lvm[10694]: Cluster locking selected.
Jul 12 13:21:21 link-02 lvm[10694]: Finding volume group "helter_skelter"
Jul 12 13:21:21 link-02 lvm[10694]: Locking V_helter_skelter at 0x4
Jul 12 13:21:22 link-02 lvm[10694]: device-mapper: waitevent ioctl failed:
Interrupted system call
Jul 12 13:21:22 link-02 lvm[10694]: No longer monitoring mirror device
helter_skelter-fail_secondary_non_synced_core_log_2_legs_3_mirrors_1 for events
Jul 12 13:40:41 link-02 dhclient: DHCPREQUEST on eth0 to 10.15.89.100 port 67
Jul 12 13:40:41 link-02 dhclient: DHCPACK from 10.15.89.100
Jul 12 13:40:41 link-02 dhclient: bound to 10.15.89.152 -- renewal in 8638 seconds.
Jul 12 15:12:55 link-02 sshd(pam_unix)[2310]: session opened for user root by
root(uid=0)
Jul 12 15:50:38 link-02 kernel: dm-cmirror: LOG INFO:
Jul 12 15:50:38 link-02 kernel: dm-cmirror:   uuid:
LVM-Upkxa2WuLZzzUE1Z6fIR0YKB9iYpxnPnvDDqCBqO9flsbnVgNBs19x6sWgwnTZq6
Jul 12 15:50:38 link-02 kernel: dm-cmirror:   uuid_ref    : 1
Jul 12 15:50:38 link-02 kernel: dm-cmirror:  ?region_count: 1600
Jul 12 15:50:38 link-02 kernel: dm-cmirror:  ?sync_count  : 0
Jul 12 15:50:38 link-02 kernel: dm-cmirror:  ?sync_search : 0
Jul 12 15:50:38 link-02 kernel: dm-cmirror:   in_sync     : NO
Jul 12 15:50:38 link-02 kernel: dm-cmirror:   suspended   : NO
Jul 12 15:50:38 link-02 kernel: dm-cmirror:   server_id   : [[link-08]]
Jul 12 15:50:38 link-02 kernel: dm-cmirror:   server_valid: YES
Jul 12 15:50:38 link-02 kernel: dm-cmirror: LOG INFO:
Jul 12 15:50:38 link-02 kernel: dm-cmirror:   uuid:
LVM-Upkxa2WuLZzzUE1Z6fIR0YKB9iYpxnPnFlKnH5BHNM2WiJiwh1BWczHqm6d87J5P
Jul 12 15:50:38 link-02 kernel: dm-cmirror:   uuid_ref    : 1
Jul 12 15:50:38 link-02 kernel: dm-cmirror:  ?region_count: 1600
Jul 12 15:50:38 link-02 kernel: dm-cmirror:  ?sync_count  : 0
Jul 12 15:50:38 link-02 kernel: dm-cmirror:  ?sync_search : 0
Jul 12 15:50:38 link-02 kernel: dm-cmirror:   in_sync     : NO
Jul 12 15:50:38 link-02 kernel: dm-cmirror:   suspended   : NO
Jul 12 15:50:38 link-02 kernel: dm-cmirror:   server_id   : [[link-08]]
Jul 12 15:50:38 link-02 kernel: dm-cmirror:   server_valid: YES
Jul 12 15:50:38 link-02 kernel: dm-cmirror: LOG INFO:
Jul 12 15:50:38 link-02 kernel: dm-cmirror:   uuid:
LVM-Upkxa2WuLZzzUE1Z6fIR0YKB9iYpxnPncdAeKShZb89OS5yV5khhicenj0QIeFim
Jul 12 15:50:38 link-02 kernel: dm-cmirror:   uuid_ref    : 1
Jul 12 15:50:38 link-02 kernel: dm-cmirror:  ?region_count: 1600
Jul 12 15:50:38 link-02 kernel: dm-cmirror:  ?sync_count  : 0
Jul 12 15:50:38 link-02 kernel: dm-cmirror:  ?sync_search : 0
Jul 12 15:50:38 link-02 kernel: dm-cmirror:   in_sync     : NO
Jul 12 15:50:38 link-02 kernel: dm-cmirror:   suspended   : YES
Jul 12 15:50:38 link-02 kernel: dm-cmirror:   server_id   : [[link-08]]
Jul 12 15:50:38 link-02 kernel: dm-cmirror:   server_valid: YES
DLM lockspace 'clvmd'

Resource 0000010022914378 (parent 0000000000000000). Name (len=16)
"V_helter_skelter"
Local Copy, Master is node [[link-08]]
Granted Queue
Conversion Queue
Waiting Queue
00030068 -- (PW) Master:     00030070  LQ: 0,0x0

Resource 00000100373d3048 (parent 0000000000000000). Name (len=64)
"Upkxa2WuLZzzUE1Z6fIR0YKB9iYpxnPnFlKnH5BHNM2WiJiwh1BWczHqm6d87J5P"
Local Copy, Master is node [[link-08]]
Granted Queue
000201d2 CR Master:     00030136
Conversion Queue
Waiting Queue

Resource 00000100373d3178 (parent 0000000000000000). Name (len=64)
"Upkxa2WuLZzzUE1Z6fIR0YKB9iYpxnPnvDDqCBqO9flsbnVgNBs19x6sWgwnTZq6"
Master Copy
Granted Queue
000101c3 CR Remote:   [[link-08]] 000303f6
00020200 CR
Conversion Queue
Waiting Queue

Resource 00000100373d3e88 (parent 0000000000000000). Name (len=64)
"Upkxa2WuLZzzUE1Z6fIR0YKB9iYpxnPncdAeKShZb89OS5yV5khhicenj0QIeFim"
Local Copy, Master is node [[link-08]]
Granted Queue
00030393 CR Master:     00020193
Conversion Queue
Waiting Queue



checkit_fail_secondary_non_synced_core_log_2_legs_3_mirrors_3  logwatch.Fen22081
[root@link-08 tmp]# ./lvm_backtraces.pl
Backtrace for lvs (1032):
#1  0x000000000044efd3 in _lock_for_cluster (cmd=51 '3', flags=Variable "flags"
is not available.)    at locking/cluster_locking.c:115
#2  0x000000000044f2c6 in _lock_resource (cmd=Variable "cmd" is not available.)
   at locking/cluster_locking.c:410
#3  0x000000000043b646 in _lock_vol (cmd=0x684240,     resource=0x7fbfffb550
"VolGroup00", flags=33) at locking/locking.c:237
        #######################################
        # _lock_vol flags = LCK_VG | LCK_HOLD | LCK_READ
        #######################################
#4  0x000000000043b859 in lock_vol (cmd=0x684240, vol=0x69df48 "VolGroup00",   
 flags=33) at locking/locking.c:270
#5  0x000000000041e87f in process_each_lv (cmd=0x684240, argc=Variable "argc" is
not available.)    at toollib.c:324
#6  0x000000000041cd7a in _report (cmd=0x684240, argc=0, argv=0x7fbffffa70,    
report_type=Variable "report_type" is not available.) at reporter.c:329
#7  0x0000000000414710 in lvm_run_command (cmd=0x684240, argc=0,    
argv=0x7fbffffa70) at lvmcmdline.c:935
#8  0x0000000000415492 in lvm2_main (argc=1, argv=0x7fbffffa68,
is_static=Variable "is_static" is not available.)    at lvmcmdline.c:1423
#9  0x000000344d61c3fb in __libc_start_main () from /lib64/tls/libc.so.6
#10 0x000000000040c42a in _start ()
#11 0x0000007fbffffa58 in ?? ()
#12 0x000000000000001c in ?? ()
#13 0x0000000000000001 in ?? ()
#14 0x0000007fbffffc35 in ?? ()

Backtrace for lvs (31359):
#1  0x000000000044efd3 in _lock_for_cluster (cmd=51 '3', flags=Variable "flags"
is not available.)    at locking/cluster_locking.c:115
#2  0x000000000044f2c6 in _lock_resource (cmd=Variable "cmd" is not available.)
   at locking/cluster_locking.c:410
#3  0x000000000043b646 in _lock_vol (cmd=0x684240,     resource=0x7fbfffb7c0
"helter_skelter", flags=33) at locking/locking.c:237
        #######################################
        # _lock_vol flags = LCK_VG | LCK_HOLD | LCK_READ
        #######################################
#4  0x000000000043b859 in lock_vol (cmd=0x684240,     vol=0x69dd98
"helter_skelter", flags=33) at locking/locking.c:270
#5  0x000000000041e87f in process_each_lv (cmd=0x684240, argc=Variable "argc" is
not available.)    at toollib.c:324
#6  0x000000000041cd7a in _report (cmd=0x684240, argc=0, argv=0x7fbffffce0,    
report_type=Variable "report_type" is not available.) at reporter.c:329
#7  0x0000000000414710 in lvm_run_command (cmd=0x684240, argc=0,    
argv=0x7fbffffce0) at lvmcmdline.c:935
#8  0x0000000000415492 in lvm2_main (argc=1, argv=0x7fbffffcd8,
is_static=Variable "is_static" is not available.)    at lvmcmdline.c:1423
#9  0x000000344d61c3fb in __libc_start_main () from /lib64/tls/libc.so.6
#10 0x000000000040c42a in _start ()
#11 0x0000007fbffffcc8 in ?? ()
#12 0x000000000000001c in ?? ()
#13 0x0000000000000001 in ?? ()
#14 0x0000007fbffffe99 in ?? ()

Jul 12 15:48:40 link-08 last message repeated 21 times
Jul 12 15:49:43 link-08 last message repeated 21 times
Jul 12 15:50:46 link-08 last message repeated 21 times
Jul 12 15:51:49 link-08 last message repeated 21 times
Jul 12 15:52:52 link-08 last message repeated 21 times
Jul 12 15:53:55 link-08 last message repeated 21 times
Jul 12 15:54:58 link-08 last message repeated 21 times
Jul 12 15:56:01 link-08 last message repeated 21 times
Jul 12 15:57:04 link-08 last message repeated 21 times
Jul 12 15:58:07 link-08 last message repeated 21 times
Jul 12 15:58:52 link-08 last message repeated 15 times
Jul 12 15:58:53 link-08 kernel: dm-cmirror: LOG INFO:
Jul 12 15:58:53 link-08 kernel: dm-cmirror:   uuid:
LVM-Upkxa2WuLZzzUE1Z6fIR0YKB9iYpxnPnvDDqCBqO9flsbnVgNBs19x6sWgwnTZq6
Jul 12 15:58:53 link-08 kernel: dm-cmirror:   uuid_ref    : 1
Jul 12 15:58:53 link-08 kernel: dm-cmirror:  ?region_count: 1600
Jul 12 15:58:53 link-08 kernel: dm-cmirror:  ?sync_count  : 791
Jul 12 15:58:53 link-08 kernel: dm-cmirror:  ?sync_search : 904
Jul 12 15:58:53 link-08 kernel: dm-cmirror:   in_sync     : NO
Jul 12 15:58:53 link-08 kernel: dm-cmirror:   suspended   : NO
Jul 12 15:58:53 link-08 kernel: dm-cmirror:   server_id   : [[link-08]]
Jul 12 15:58:53 link-08 kernel: dm-cmirror:   server_valid: YES
Jul 12 15:58:53 link-08 kernel: dm-cmirror: LOG INFO:
Jul 12 15:58:53 link-08 kernel: dm-cmirror:   uuid:
LVM-Upkxa2WuLZzzUE1Z6fIR0YKB9iYpxnPnFlKnH5BHNM2WiJiwh1BWczHqm6d87J5P
Jul 12 15:58:53 link-08 kernel: dm-cmirror:   uuid_ref    : 1
Jul 12 15:58:53 link-08 kernel: dm-cmirror:  ?region_count: 1600
Jul 12 15:58:53 link-08 kernel: dm-cmirror:  ?sync_count  : 758
Jul 12 15:58:53 link-08 kernel: dm-cmirror:  ?sync_search : 1600
Jul 12 15:58:53 link-08 kernel: dm-cmirror:   in_sync     : NO
Jul 12 15:58:53 link-08 kernel: dm-cmirror:   suspended   : NO
Jul 12 15:58:53 link-08 kernel: dm-cmirror:   server_id   : [[link-08]]
Jul 12 15:58:53 link-08 kernel: dm-cmirror:   server_valid: YES
Jul 12 15:58:53 link-08 kernel: dm-cmirror: LOG INFO:
Jul 12 15:58:53 link-08 kernel: dm-cmirror:   uuid:
LVM-Upkxa2WuLZzzUE1Z6fIR0YKB9iYpxnPncdAeKShZb89OS5yV5khhicenj0QIeFim
Jul 12 15:58:53 link-08 kernel: dm-cmirror:   uuid_ref    : 1
Jul 12 15:58:53 link-08 kernel: dm-cmirror:  ?region_count: 1600
Jul 12 15:58:53 link-08 kernel: dm-cmirror:  ?sync_count  : 906
Jul 12 15:58:53 link-08 kernel: dm-cmirror:  ?sync_search : 921
Jul 12 15:58:53 link-08 kernel: dm-cmirror:   in_sync     : NO
Jul 12 15:58:53 link-08 kernel: dm-cmirror:   suspended   : NO
Jul 12 15:58:53 link-08 kernel: dm-cmirror:   server_id   : [[link-08]]
Jul 12 15:58:53 link-08 kernel: dm-cmirror:   server_valid: YES
Jul 12 15:58:55 link-08 qarshd[31358]: Nothing to do
DLM lockspace 'clvmd'

Resource 000001000f9513b8 (parent 0000000000000000). Name (len=16)
"V_helter_skelter"
Master Copy
LVB: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Granted Queue
000203fb PR
Conversion Queue
Waiting Queue
00030070 -- (PW) Remote:   [[link-02]] 00030068  LQ: 0,0x0

Resource 000001000fb25af8 (parent 0000000000000000). Name (len=64)
"Upkxa2WuLZzzUE1Z6fIR0YKB9iYpxnPnLlY2RGXzkuwIzsXr8PKM69IDWqaRC0RQ"
Master Copy
Granted Queue
000300ad CR
Conversion Queue
Waiting Queue

Resource 000001000fb25048 (parent 0000000000000000). Name (len=64)
"Upkxa2WuLZzzUE1Z6fIR0YKB9iYpxnPnFlKnH5BHNM2WiJiwh1BWczHqm6d87J5P"
Master Copy
Granted Queue
00030136 CR Remote:   [[link-02]] 000201d2
00050365 CR
Conversion Queue
Waiting Queue

Resource 000001000fb25178 (parent 0000000000000000). Name (len=64)
"Upkxa2WuLZzzUE1Z6fIR0YKB9iYpxnPnvDDqCBqO9flsbnVgNBs19x6sWgwnTZq6"
Local Copy, Master is node [[link-02]]
Granted Queue
000303f6 CR Master:     000101c3
Conversion Queue
Waiting Queue

Resource 000001000fb25c28 (parent 0000000000000000). Name (len=64)
"Upkxa2WuLZzzUE1Z6fIR0YKB9iYpxnPnxeEKade0UDQpyylNBaDuRGvTAokv4IaT"
Master Copy
Granted Queue
0003020d CR
Conversion Queue
Waiting Queue

Resource 000001000fb25d58 (parent 0000000000000000). Name (len=64)
"Upkxa2WuLZzzUE1Z6fIR0YKB9iYpxnPn670Urzezd4GsF71VERA5K05vcb19Dbbm"
Master Copy
Granted Queue
000500de CR
Conversion Queue
Waiting Queue

Resource 000001000fb25e88 (parent 0000000000000000). Name (len=64)
"Upkxa2WuLZzzUE1Z6fIR0YKB9iYpxnPncdAeKShZb89OS5yV5khhicenj0QIeFim"
Master Copy
Granted Queue
0003021a PW
00020193 CR Remote:   [[link-02]] 00030393
Conversion Queue
Waiting Queue

Resource 00000100333a1b28 (parent 0000000000000000). Name (len=12) "V_VolGroup00"
Master Copy
Granted Queue
00040228 PR
Conversion Queue
Waiting Queue



Comment 2 Corey Marthaler 2007-07-13 18:45:45 UTC
Was able to reproduce this today.

After the down conversion:

[root@link-02 ~]# dmsetup ls --tree
helter_skelter-fail_secondary_non_synced_2_legs_2_mirrors_2 (253:9)
 ├─helter_skelter-fail_secondary_non_synced_2_legs_2_mirrors_2_mimage_1 (253:8)
 │  └─ (8:65)
 ├─helter_skelter-fail_secondary_non_synced_2_legs_2_mirrors_2_mimage_0 (253:7)
 │  └─ (8:17)
 └─helter_skelter-fail_secondary_non_synced_2_legs_2_mirrors_2_mlog (253:6)
    └─ (8:81)
helter_skelter-fail_secondary_non_synced_2_legs_2_mirrors_1 (253:5)
 ├─helter_skelter-fail_secondary_non_synced_2_legs_2_mirrors_1_mimage_1 (253:4)
 │  └─ (8:65)
 ├─helter_skelter-fail_secondary_non_synced_2_legs_2_mirrors_1_mimage_0 (253:3)
 │  └─ (8:17)
 └─helter_skelter-fail_secondary_non_synced_2_legs_2_mirrors_1_mlog (253:2)
    └─ (8:81)


[root@link-08 ~]# dmsetup ls --tree
helter_skelter-fail_secondary_non_synced_2_legs_2_mirrors_2 (253:9)
 ├─helter_skelter-fail_secondary_non_synced_2_legs_2_mirrors_2_mimage_1 (253:8)
 ├─helter_skelter-fail_secondary_non_synced_2_legs_2_mirrors_2_mimage_0 (253:7)
 │  └─ (8:17)
 └─helter_skelter-fail_secondary_non_synced_2_legs_2_mirrors_2_mlog (253:6)
    └─ (8:81)
helter_skelter-fail_secondary_non_synced_2_legs_2_mirrors_1 (253:5)
 ├─helter_skelter-fail_secondary_non_synced_2_legs_2_mirrors_1_mimage_1 (253:4)
 ├─helter_skelter-fail_secondary_non_synced_2_legs_2_mirrors_1_mimage_0 (253:3)
 │  └─ (8:17)
 └─helter_skelter-fail_secondary_non_synced_2_legs_2_mirrors_1_mlog (253:2)
    └─ (8:81)



Comment 3 Corey Marthaler 2007-07-13 21:16:57 UTC
Reproduced again, seems to only happen during the failure of a secondary legs of
non synced mirrors.

Comment 4 Jonathan Earl Brassow 2007-07-27 19:59:51 UTC
I'll try to reproduce.

BTW, it's very hard to read the comments... especially due to the retardedly
long name of the LVs.  If this happens in the future, could you do:
$> dmsetup ls --tree | sed
s/helter_skelter-fail_secondary_non_synced_2_legs_2_mirrors/short/

That would make it a hell of alot easier to read.  Or, put it in a file and I
can run it through sed/grep.


Comment 5 Corey Marthaler 2007-07-27 22:15:55 UTC
I'm not sure if "retardedly" is an adjective or an adverb or either?

Comment 6 Jonathan Earl Brassow 2007-07-31 15:40:39 UTC
seems to work fine with only one mirror...  The key must be having multiple mirrors that fail at once (3 
mirrors above).

You seem to have only "triggered" one mirror though...  The write to trigger down-convert happens on 
only one node and to only one of the mirrors, right?


Comment 7 Jonathan Earl Brassow 2007-07-31 15:54:08 UTC
having trouble reproducing, even with 3 mirrors.  Can we try to distill the test down to some simple 
operations?  Here's what I'm trying:

1) create 3 corelog cluster mirrors on same devices
2) fail secondary device (on all three mirrors) while mirrors are syncing
3) do I/O to one of the mirrors to trigger down convert 
**) no FS's, no nominal I/O until step 3

Do you have an idea as to how often this bug hits?  100%, 50%, ?

Comment 8 Jonathan Earl Brassow 2007-09-28 15:29:26 UTC
I've address issues related to this in other bugs.  This should be fixed. 
Marking modified.

assigned -> modified


Comment 9 Corey Marthaler 2007-11-08 17:16:46 UTC
Marking this bz verified as it hasn't been seen in almost 4 months, will reopen
if seen again.

Comment 11 Lon Hohberger 2013-09-23 15:31:56 UTC
The Red Hat Cluster Suite product is past end-of-life; closing.


Note You need to log in before you can comment on or make changes to this bug.