Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 684343

Summary: deadlock during multiple leg device failures of an exclusively activated cmirror containing a snapshot
Product: Red Hat Enterprise Linux 6
Reporter: Corey Marthaler <cmarthal>
Component: lvm2
Assignee: Jonathan Earl Brassow <jbrassow>
Status: CLOSED WONTFIX
QA Contact: Corey Marthaler <cmarthal>
Severity: high
Priority: high
Version: 6.1
CC: agk, coughlan, dwysocha, heinzm, jbrassow, mbroz, prajnoha, prockai, thornber, zkabelac
Target Milestone: rc
Hardware: x86_64
OS: Linux
Doc Type: Bug Fix
Last Closed: 2012-04-27 18:51:27 UTC
Bug Blocks: 756082

Description Corey Marthaler 2011-03-11 20:51:49 UTC
Description of problem:
Scenario: Kill multiple legs of synced 4 leg mirror(s)

********* Mirror hash info for this scenario *********
* names:              syncd_multiple_legs_4legs_1
* sync:               1
* leg devices:        /dev/sdf1 /dev/sde1 /dev/sdg1 /dev/sdb1
* log devices:        /dev/sdc1
* no MDA devices:     
* failpv(s):          /dev/sdf1 /dev/sdb1
* failnode(s):        taft-02
* additional snap:    /dev/sde1
* leg fault policy:   allocate
* log fault policy:   allocate
******************************************************
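The hash above determines the lvcreate call the harness issues next. As a sketch, a hypothetical helper `build_lvcreate` (not part of the STS harness) can assemble that command. `-m` counts the copies kept in addition to the original, so four legs means `-m 3`, and each `PV:start-end` argument limits allocation to that range of physical extents. The extent ranges 0-1000 and 0-150 are copied from the logged command, not derived from anything in the hash.

```python
import shlex


def build_lvcreate(name, vg, size, legs, log_dev,
                   leg_range="0-1000", log_range="0-150"):
    """Assemble an lvcreate argv for an N-leg mirror with a disk log.

    Hypothetical helper: -m is the number of extra copies, so it is
    len(legs) - 1; the log PV is listed last, after the leg PVs.
    """
    argv = ["lvcreate", "-m", str(len(legs) - 1), "-n", name, "-L", size, vg]
    argv += [f"{pv}:{leg_range}" for pv in legs]  # restrict each leg's extents
    argv.append(f"{log_dev}:{log_range}")         # restrict the log's extents
    return argv


argv = build_lvcreate(
    "syncd_multiple_legs_4legs_1", "helter_skelter", "600M",
    ["/dev/sdf1", "/dev/sde1", "/dev/sdg1", "/dev/sdb1"], "/dev/sdc1",
)
print(shlex.join(argv))  # shell-quoted form, comparable to the logged command
```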

Creating mirror(s) on taft-02...
taft-02: lvcreate -m 3 -n syncd_multiple_legs_4legs_1 -L 600M helter_skelter /dev/sdf1:0-1000 /dev/sde1:0-1000 /dev/sdg1:0-1000 /dev/sdb1:0-1000 /dev/sdc1:0-150

EXCLUSIVELY ACTIVATING CMIRROR on taft-02
Creating a snapshot volume of each of the mirrors

PV=/dev/sdb1
        syncd_multiple_legs_4legs_1_mimage_3: 3.1
PV=/dev/sdf1
        syncd_multiple_legs_4legs_1_mimage_0: 4
PV=/dev/sdb1
        syncd_multiple_legs_4legs_1_mimage_3: 3.1
PV=/dev/sdf1
        syncd_multiple_legs_4legs_1_mimage_0: 4

Waiting until all mirrors become fully syncd...
   0/1 mirror(s) are fully synced: ( 85.83% )
   1/1 mirror(s) are fully synced: ( 100.00% )

Creating gfs2 on top of mirror(s) on taft-02...
Mounting mirrored gfs2 filesystems on taft-02...

Writing verification files (checkit) to mirror(s) on...
        ---- taft-02 ----

Sleeping 10 seconds to get some outstanding GFS I/O locks before the failure
Verifying files (checkit) on mirror(s) on...
        ---- taft-02 ----

Disabling device sdf on taft-02
Disabling device sdb on taft-02

Attempting I/O to cause mirror down conversion(s) on taft-02
10+0 records in
10+0 records out
41943040 bytes (42 MB) copied, 0.310101 s, 135 MB/s
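The dd summary can be checked by hand. The syslog later in this report records the command as `dd if=/dev/zero of=/mnt/syncd_multiple_legs_4legs_1/ddfile count=10 bs=4M`. GNU dd reads the `M` suffix as 1024 × 1024 bytes but reports "MB" as 10^6 bytes, which is why a 41943040-byte total prints as "42 MB". A short check of both figures:

```python
# Values taken from the dd invocation (bs=4M count=10) and its summary line.
BLOCK_SIZE = 4 * 1024 * 1024   # dd's "4M" suffix means 4 MiB
COUNT = 10
ELAPSED_S = 0.310101           # elapsed time reported by dd

total_bytes = BLOCK_SIZE * COUNT
# dd's "MB" and "MB/s" use decimal megabytes (10**6 bytes).
size_mb = total_bytes / 1_000_000
rate_mb_s = total_bytes / ELAPSED_S / 1_000_000

print(total_bytes, round(size_mb), round(rate_mb_s))
```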


# before the failure
[root@taft-02 ~]# lvs -a -o +devices
 LV         VG             Attr   LSize   Origin                      Snap%  Devices
 hs_snap1   helter_skelter swi-a- 252.00m syncd_multiple_legs_4legs_1  13.24 /dev/sde1(150)

 LV                                     VG             Attr   LSize   Log                              Copy%  Devices
 syncd_multiple_legs_4legs_1            helter_skelter owi-ao 600.00m syncd_multiple_legs_4legs_1_mlog 100.00 syncd_multiple_legs_4legs_1_mimage_0(0),syncd_multiple_legs_4legs_1_mimage_1(0),syncd_multiple_legs_4legs_1_mimage_2(0),syncd_multiple_legs_4legs_1_mimage_3(0)
 [syncd_multiple_legs_4legs_1_mimage_0] helter_skelter iwi-ao 600.00m                                         /dev/sdf1(0)
 [syncd_multiple_legs_4legs_1_mimage_1] helter_skelter iwi-ao 600.00m                                         /dev/sde1(0)
 [syncd_multiple_legs_4legs_1_mimage_2] helter_skelter iwi-ao 600.00m                                         /dev/sdg1(0)
 [syncd_multiple_legs_4legs_1_mimage_3] helter_skelter iwi-ao 600.00m                                         /dev/sdb1(0)
 [syncd_multiple_legs_4legs_1_mlog]     helter_skelter lwi-ao   4.00m                                         /dev/sdc1(0)
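The Devices column above ties each hidden mirror image to a PV. Cross-referencing it with the scenario's failpv(s) (/dev/sdf1 and /dev/sdb1) predicts which images the failure takes out, and the count agrees with dmeventd's later "Mirror status: 2 of 4 images failed." A sketch over a trimmed copy of the rows (LV name and Devices column only); the helper names are illustrative, not from any LVM tool.

```python
import re

# Sub-LV rows from the "before the failure" lvs output, trimmed to the
# bracketed LV name and the Devices column.
LVS_ROWS = """\
[syncd_multiple_legs_4legs_1_mimage_0] /dev/sdf1(0)
[syncd_multiple_legs_4legs_1_mimage_1] /dev/sde1(0)
[syncd_multiple_legs_4legs_1_mimage_2] /dev/sdg1(0)
[syncd_multiple_legs_4legs_1_mimage_3] /dev/sdb1(0)
"""

# PVs the test disabled (the "failpv(s)" line of the scenario).
FAIL_PVS = {"/dev/sdf1", "/dev/sdb1"}

# "[lv_name] /dev/pvN(start_extent)"
ROW_RE = re.compile(r"^\[(?P<lv>[^\]]+)\]\s+(?P<pv>/dev/\S+?)\(\d+\)$")


def image_to_pv(text):
    """Map each hidden mirror image LV to the PV holding its extents."""
    mapping = {}
    for line in text.splitlines():
        m = ROW_RE.match(line.strip())
        if m:
            mapping[m.group("lv")] = m.group("pv")
    return mapping


def failed_images(mapping, fail_pvs):
    """Return the image LVs whose backing PV is in fail_pvs, sorted."""
    return sorted(lv for lv, pv in mapping.items() if pv in fail_pvs)


mapping = image_to_pv(LVS_ROWS)
failed = failed_images(mapping, FAIL_PVS)
print(f"{len(failed)} of {len(mapping)} images on failed PVs: {failed}")
```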

# after the failure
[root@taft-02 ~]# lvs -a -o +devices
[STUCK]



Version-Release number of selected component (if applicable):
2.6.32-94.el6.x86_64

lvm2-2.02.83-2.el6    BUILT: Tue Feb  8 10:10:57 CST 2011
lvm2-libs-2.02.83-2.el6    BUILT: Tue Feb  8 10:10:57 CST 2011
lvm2-cluster-2.02.83-2.el6    BUILT: Tue Feb  8 10:10:57 CST 2011
udev-147-2.31.el6    BUILT: Wed Jan 26 05:39:15 CST 2011
device-mapper-1.02.62-2.el6    BUILT: Tue Feb  8 10:10:57 CST 2011
device-mapper-libs-1.02.62-2.el6    BUILT: Tue Feb  8 10:10:57 CST 2011
device-mapper-event-1.02.62-2.el6    BUILT: Tue Feb  8 10:10:57 CST 2011
device-mapper-event-libs-1.02.62-2.el6    BUILT: Tue Feb  8 10:10:57 CST 2011
cmirror-2.02.83-2.el6    BUILT: Tue Feb  8 10:10:57 CST 2011

Comment 1 Corey Marthaler 2011-03-11 21:00:06 UTC
Doesn't look like anything died on any of these nodes.

[root@taft-02 ~]# ps -ef | grep dmeventd
root      1738  1722  0 14:52 pts/1    00:00:00 grep dmeventd
root      3085     1  0 Mar10 ?        00:03:34 /sbin/dmeventd
[root@taft-02 ~]# ps -ef | grep cmirrord
root      1740  1722  0 14:52 pts/1    00:00:00 grep cmirrord
root     30959     1  0 13:36 ?        00:00:05 cmirrord
[root@taft-02 ~]# ps -ef | grep udev
root       576     1  0 Mar10 ?        00:00:30 /sbin/udevd -d
root      1586   576  0 14:05 ?        00:00:00 /sbin/udevd -d
root      1587   576  0 14:05 ?        00:00:00 /sbin/udevd -d
root      1742  1722  0 14:52 pts/1    00:00:00 grep udev


The main mirror volume is marked SUSPENDED, but none of its sub-devices are.
Name:              helter_skelter-syncd_multiple_legs_4legs_1
State:             SUSPENDED
Read Ahead:        256
Tables present:    LIVE
Open count:        1
Event number:      0
Major, minor:      253, 8
Number of targets: 1
UUID: LVM-0Yyo33rKjwXbNuZEDXnFij2HauvM3B601s2HPfMxQUnup8I3JOJVMHPPeFBK8tr8
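The block above has the layout of `dmsetup info` output. A sketch that parses a captured copy into fields and flags the condition shown here, a device that is suspended while still open; `parse_dmsetup_info` is an illustrative helper that reads saved text rather than invoking dmsetup.

```python
# `dmsetup info` output for the top-level mirror, copied from above.
INFO = """\
Name:              helter_skelter-syncd_multiple_legs_4legs_1
State:             SUSPENDED
Read Ahead:        256
Tables present:    LIVE
Open count:        1
Event number:      0
Major, minor:      253, 8
Number of targets: 1
UUID: LVM-0Yyo33rKjwXbNuZEDXnFij2HauvM3B601s2HPfMxQUnup8I3JOJVMHPPeFBK8tr8
"""


def parse_dmsetup_info(text):
    """Split `dmsetup info` output into a {field: value} dict.

    Only the first colon on each line separates key from value, so
    "Major, minor: 253, 8" and the UUID line parse correctly.
    """
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


info = parse_dmsetup_info(INFO)
if info["State"] == "SUSPENDED" and int(info["Open count"]) > 0:
    print(f"{info['Name']} is suspended with {info['Open count']} opener(s)")
```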

[root@taft-02 ~]# dmsetup ls
helter_skelter-syncd_multiple_legs_4legs_1_mimage_2     (253, 6)
helter_skelter-syncd_multiple_legs_4legs_1      (253, 8)
helter_skelter-syncd_multiple_legs_4legs_1_mimage_1     (253, 5)
helter_skelter-syncd_multiple_legs_4legs_1_mlog (253, 2)
helter_skelter-syncd_multiple_legs_4legs_1-real (253, 10)
helter_skelter-hs_snap1-cow     (253, 11)
helter_skelter-hs_snap1 (253, 9)
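Comparing this listing with the devices the 4-leg mirror and its snapshot should have shows exactly what is gone: mimage_0 and mimage_3, the images that the earlier lvs output places on the failed PVs /dev/sdf1 and /dev/sdb1. So the down-conversion appears to have removed both failed images while the top-level mirror was left suspended. A sketch of that comparison; the expected names follow the naming visible in the listing itself, and the helper is illustrative.

```python
# `dmsetup ls` output captured above, after the hang.
DMSETUP_LS = """\
helter_skelter-syncd_multiple_legs_4legs_1_mimage_2     (253, 6)
helter_skelter-syncd_multiple_legs_4legs_1      (253, 8)
helter_skelter-syncd_multiple_legs_4legs_1_mimage_1     (253, 5)
helter_skelter-syncd_multiple_legs_4legs_1_mlog (253, 2)
helter_skelter-syncd_multiple_legs_4legs_1-real (253, 10)
helter_skelter-hs_snap1-cow     (253, 11)
helter_skelter-hs_snap1 (253, 9)
"""


def parse_dmsetup_ls(text):
    """Return {device name: (major, minor)} for each `dmsetup ls` line."""
    devices = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        name, _, numbers = line.partition("(")
        major, minor = (int(n) for n in numbers.rstrip(")").split(","))
        devices[name.strip()] = (major, minor)
    return devices


# Names expected for this 4-leg mirror plus its snapshot: VG-LV for the
# top level, _mimage_N/_mlog for hidden images and log, -real and -cow for
# the snapshot layers.
VG, LV = "helter_skelter", "syncd_multiple_legs_4legs_1"
expected = {f"{VG}-{LV}", f"{VG}-{LV}-real", f"{VG}-{LV}_mlog",
            f"{VG}-hs_snap1", f"{VG}-hs_snap1-cow"}
expected |= {f"{VG}-{LV}_mimage_{i}" for i in range(4)}

present = parse_dmsetup_ls(DMSETUP_LS)
missing = sorted(expected - set(present))
print(missing)
```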


Mar 11 14:05:00 taft-02 qarshd[1225]: Running cmdline: echo offline > /sys/block/sdf/device/state &
Mar 11 14:05:00 taft-02 qarshd[1228]: Running cmdline: echo offline > /sys/block/sdb/device/state &
[...]
Mar 11 14:05:05 taft-02 kernel: sd 3:0:0:5: rejecting I/O to offline device
Mar 11 14:05:05 taft-02 kernel: __ratelimit: 264 callbacks suppressed
Mar 11 14:05:05 taft-02 kernel: device-mapper: raid1: Read failure on mirror device 253:4.  Trying alternative device.
Mar 11 14:05:05 taft-02 lvm[3085]: Primary mirror device 253:4 read failed.
Mar 11 14:05:05 taft-02 lvm[3085]: helter_skelter-syncd_multiple_legs_4legs_1-real is now in-sync.
[...]
Mar 11 14:05:06 taft-02 qarshd[1234]: Running cmdline: dd if=/dev/zero of=/mnt/syncd_multiple_legs_4legs_1/ddfile count=10 bs=4M
[...]
Mar 11 14:05:07 taft-02 lvm[3085]: Primary mirror device 253:4 has failed (D).
Mar 11 14:05:07 taft-02 lvm[3085]: Secondary mirror device 253:7 has failed (D).
Mar 11 14:05:07 taft-02 lvm[3085]: Device failure in helter_skelter-syncd_multiple_legs_4legs_1-real.
Mar 11 14:05:08 taft-02 lvm[3085]: Couldn't find device with uuid f8oLNc-7pQZ-16h3-qjrt-pFJA-VRNp-xB2stB.
Mar 11 14:05:08 taft-02 lvm[3085]: Couldn't find device with uuid sUUk1o-5939-F18h-D8Jx-tNdY-f4NO-dHvYsd.
Mar 11 14:05:12 taft-02 lvm[3085]: Mirror status: 2 of 4 images failed.
Mar 11 14:05:14 taft-02 lvm[3085]: device-mapper: waitevent ioctl failed: Interrupted system call
Mar 11 14:05:14 taft-02 lvm[3085]: Another thread is handling an event. Waiting...
Mar 11 14:05:20 taft-02 lvm[3085]: Monitoring mirror device helter_skelter-syncd_multiple_legs_4legs_1-real for events.
Mar 11 14:05:20 taft-02 lvm[3085]: Another thread is handling an event. Waiting...
Mar 11 14:07:25 taft-02 lvm[3085]: Error locking on node taft-02: Command timed out
Mar 11 14:07:25 taft-02 lvm[3085]: Problem reactivating syncd_multiple_legs_4legs_1
Mar 11 14:09:18 taft-02 kernel: INFO: task clvmd:31012 blocked for more than 120 seconds.
Mar 11 14:09:18 taft-02 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 11 14:09:18 taft-02 kernel: clvmd         D ffff880200c05880     0 31012      1 0x00000080
Mar 11 14:09:18 taft-02 kernel: ffff880215065c28 0000000000000086 ffff880200b96a70 ffff880028216a00
Mar 11 14:09:18 taft-02 kernel: ffff880200000000 0000000000000003 ffff880215065bc8 ffff880200b96a70
Mar 11 14:09:18 taft-02 kernel: ffff8802134df0e8 ffff880215065fd8 0000000000010558 ffff8802134df0e8
Mar 11 14:09:18 taft-02 kernel: Call Trace:
Mar 11 14:09:18 taft-02 kernel: [<ffffffff814c66e5>] rwsem_down_failed_common+0x95/0x1d0
Mar 11 14:09:18 taft-02 kernel: [<ffffffff8108a106>] ? autoremove_wake_function+0x16/0x40
Mar 11 14:09:18 taft-02 kernel: [<ffffffffa0007e90>] ? dev_suspend+0x0/0x240 [dm_mod]
Mar 11 14:09:18 taft-02 kernel: [<ffffffff814c6843>] rwsem_down_write_failed+0x23/0x30
Mar 11 14:09:18 taft-02 kernel: [<ffffffff8125f3f3>] call_rwsem_down_write_failed+0x13/0x20
Mar 11 14:09:18 taft-02 kernel: [<ffffffff814c5d42>] ? down_write+0x32/0x40
Mar 11 14:09:18 taft-02 kernel: [<ffffffff8119d7c9>] thaw_bdev+0x99/0x130
Mar 11 14:09:18 taft-02 kernel: [<ffffffffa0000b5a>] unlock_fs+0x2a/0x50 [dm_mod]
Mar 11 14:09:18 taft-02 kernel: [<ffffffffa0002596>] dm_resume+0xd6/0xf0 [dm_mod]
Mar 11 14:09:18 taft-02 kernel: [<ffffffffa000804c>] dev_suspend+0x1bc/0x240 [dm_mod]
Mar 11 14:09:18 taft-02 kernel: [<ffffffffa0007e90>] ? dev_suspend+0x0/0x240 [dm_mod]
Mar 11 14:09:18 taft-02 kernel: [<ffffffffa0008923>] ctl_ioctl+0x1a3/0x240 [dm_mod]
Mar 11 14:09:18 taft-02 kernel: [<ffffffffa00089d3>] dm_ctl_ioctl+0x13/0x20 [dm_mod]
Mar 11 14:09:18 taft-02 kernel: [<ffffffff81178922>] vfs_ioctl+0x22/0xa0
Mar 11 14:09:18 taft-02 kernel: [<ffffffff81178ac4>] do_vfs_ioctl+0x84/0x580
Mar 11 14:09:18 taft-02 kernel: [<ffffffff81179041>] sys_ioctl+0x81/0xa0
Mar 11 14:09:18 taft-02 kernel: [<ffffffff8100b172>] system_call_fastpath+0x16/0x1b
Mar 11 14:09:18 taft-02 kernel: INFO: task xdoio:1221 blocked for more than 120 seconds.
[...]
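Two facts in this excerpt can be pulled out mechanically: which mirror legs dmeventd declared failed, and the confirmed frames of the blocked clvmd thread. Neither 253:4 nor 253:7 appears in the later `dmsetup ls` listing. In x86 kernel stack dumps, a "?" usually marks an address found on the stack that the unwinder could not confirm as part of the call chain, so the sketch drops those entries. Read from the syscall inward, the remaining frames show a resume ioctl (dev_suspend -> dm_resume -> unlock_fs -> thaw_bdev) waiting to take an rwsem for writing. The helpers are illustrative, not part of any LVM or kernel tool, and the regexes assume the exact message formats quoted above.

```python
import re

# dmeventd messages copied from the syslog excerpt above.
LOG = """\
Mar 11 14:05:07 taft-02 lvm[3085]: Primary mirror device 253:4 has failed (D).
Mar 11 14:05:07 taft-02 lvm[3085]: Secondary mirror device 253:7 has failed (D).
Mar 11 14:05:12 taft-02 lvm[3085]: Mirror status: 2 of 4 images failed.
"""

# The clvmd call trace from the hung-task report (syslog prefixes trimmed).
TRACE = """\
[<ffffffff814c66e5>] rwsem_down_failed_common+0x95/0x1d0
[<ffffffff8108a106>] ? autoremove_wake_function+0x16/0x40
[<ffffffffa0007e90>] ? dev_suspend+0x0/0x240 [dm_mod]
[<ffffffff814c6843>] rwsem_down_write_failed+0x23/0x30
[<ffffffff8125f3f3>] call_rwsem_down_write_failed+0x13/0x20
[<ffffffff814c5d42>] ? down_write+0x32/0x40
[<ffffffff8119d7c9>] thaw_bdev+0x99/0x130
[<ffffffffa0000b5a>] unlock_fs+0x2a/0x50 [dm_mod]
[<ffffffffa0002596>] dm_resume+0xd6/0xf0 [dm_mod]
[<ffffffffa000804c>] dev_suspend+0x1bc/0x240 [dm_mod]
[<ffffffffa0007e90>] ? dev_suspend+0x0/0x240 [dm_mod]
[<ffffffffa0008923>] ctl_ioctl+0x1a3/0x240 [dm_mod]
[<ffffffffa00089d3>] dm_ctl_ioctl+0x13/0x20 [dm_mod]
[<ffffffff81178922>] vfs_ioctl+0x22/0xa0
[<ffffffff81178ac4>] do_vfs_ioctl+0x84/0x580
[<ffffffff81179041>] sys_ioctl+0x81/0xa0
[<ffffffff8100b172>] system_call_fastpath+0x16/0x1b
"""

FAILED_RE = re.compile(r"(Primary|Secondary) mirror device (\d+):(\d+) has failed")
STATUS_RE = re.compile(r"Mirror status: (\d+) of (\d+) images failed")
# "[<address>] optional '? ' function+0xoffset/0xsize"
FRAME_RE = re.compile(r"\[<[0-9a-f]+>\] (\? )?(\w+)\+0x[0-9a-f]+/0x[0-9a-f]+")


def failed_mirror_devices(log):
    """Return [(role, "major:minor"), ...] for each failed-leg message."""
    return [(role, f"{major}:{minor}")
            for role, major, minor in FAILED_RE.findall(log)]


def reliable_frames(trace):
    """Return function names innermost first, skipping '?' entries."""
    return [m.group(2) for m in FRAME_RE.finditer(trace) if m.group(1) is None]


devices = failed_mirror_devices(LOG)
failed_count, total = map(int, STATUS_RE.search(LOG).groups())
frames = reliable_frames(TRACE)

print(devices, f"{failed_count} of {total} failed")
# Reverse to print in call order: syscall entry down to the blocking point.
print(" -> ".join(reversed(frames)))
```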

Comment 2 Corey Marthaler 2011-03-11 21:46:26 UTC
This issue is easily reproducible.

./helter_skelter -l /home/msp/cmarthal/work/rhel6/sts-root -r /usr/tests/sts-rhel6.1 -R ../../var/share/resource_files/taft.xml -e kill_multiple_legs_synced_4_legs -E taft-02

Comment 6 Jonathan Earl Brassow 2012-01-04 16:22:28 UTC
Snapshots of mirrors (or exclusively activated cmirrors) will not be supported.  Snapshots of "raid1" segment type will be.  I will most likely close this bug WONTFIX, but I'll try to take a look and see what's going on first - capacity permitting.

Comment 8 Jonathan Earl Brassow 2012-04-27 18:51:27 UTC
We do not support snapshots of 'mirror' segment type.