| Summary: | deadlock during multiple leg device failures of an exclusively activated cmirror containing a snapshot | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | Corey Marthaler <cmarthal> |
| Component: | lvm2 | Assignee: | Jonathan Earl Brassow <jbrassow> |
| Status: | CLOSED WONTFIX | QA Contact: | Corey Marthaler <cmarthal> |
| Severity: | high | Docs Contact: | |
| Priority: | high | ||
| Version: | 6.1 | CC: | agk, coughlan, dwysocha, heinzm, jbrassow, mbroz, prajnoha, prockai, thornber, zkabelac |
| Target Milestone: | rc | ||
| Target Release: | --- | ||
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2012-04-27 18:51:27 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Bug Depends On: | |||
| Bug Blocks: | 756082 | ||
Doesn't look like anything died on any of these nodes.

[root@taft-02 ~]# ps -ef | grep dmeventd
root 1738 1722 0 14:52 pts/1 00:00:00 grep dmeventd
root 3085 1 0 Mar10 ? 00:03:34 /sbin/dmeventd
[root@taft-02 ~]# ps -ef | grep cmirrord
root 1740 1722 0 14:52 pts/1 00:00:00 grep cmirrord
root 30959 1 0 13:36 ? 00:00:05 cmirrord
[root@taft-02 ~]# ps -ef | grep udev
root 576 1 0 Mar10 ? 00:00:30 /sbin/udevd -d
root 1586 576 0 14:05 ? 00:00:00 /sbin/udevd -d
root 1587 576 0 14:05 ? 00:00:00 /sbin/udevd -d
root 1742 1722 0 14:52 pts/1 00:00:00 grep udev

The main mirror vol is marked as SUSPENDED, but none of the sub devices are.

Name: helter_skelter-syncd_multiple_legs_4legs_1
State: SUSPENDED
Read Ahead: 256
Tables present: LIVE
Open count: 1
Event number: 0
Major, minor: 253, 8
Number of targets: 1
UUID: LVM-0Yyo33rKjwXbNuZEDXnFij2HauvM3B601s2HPfMxQUnup8I3JOJVMHPPeFBK8tr8

[root@taft-02 ~]# dmsetup ls
helter_skelter-syncd_multiple_legs_4legs_1_mimage_2 (253, 6)
helter_skelter-syncd_multiple_legs_4legs_1 (253, 8)
helter_skelter-syncd_multiple_legs_4legs_1_mimage_1 (253, 5)
helter_skelter-syncd_multiple_legs_4legs_1_mlog (253, 2)
helter_skelter-syncd_multiple_legs_4legs_1-real (253, 10)
helter_skelter-hs_snap1-cow (253, 11)
helter_skelter-hs_snap1 (253, 9)

Mar 11 14:05:00 taft-02 qarshd[1225]: Running cmdline: echo offline > /sys/block/sdf/device/state &
Mar 11 14:05:00 taft-02 qarshd[1228]: Running cmdline: echo offline > /sys/block/sdb/device/state &
[...]
Mar 11 14:05:05 taft-02 kernel: sd 3:0:0:5: rejecting I/O to offline device
Mar 11 14:05:05 taft-02 kernel: __ratelimit: 264 callbacks suppressed
Mar 11 14:05:05 taft-02 kernel: device-mapper: raid1: Read failure on mirror device 253:4. Trying alternative device.
Mar 11 14:05:05 taft-02 lvm[3085]: Primary mirror device 253:4 read failed.
Mar 11 14:05:05 taft-02 lvm[3085]: helter_skelter-syncd_multiple_legs_4legs_1-real is now in-sync.
[...]
Mar 11 14:05:06 taft-02 qarshd[1234]: Running cmdline: dd if=/dev/zero of=/mnt/syncd_multiple_legs_4legs_1/ddfile count=10 bs=4M
[...]
Mar 11 14:05:07 taft-02 lvm[3085]: Primary mirror device 253:4 has failed (D).
Mar 11 14:05:07 taft-02 lvm[3085]: Secondary mirror device 253:7 has failed (D).
Mar 11 14:05:07 taft-02 lvm[3085]: Device failure in helter_skelter-syncd_multiple_legs_4legs_1-real.
Mar 11 14:05:08 taft-02 lvm[3085]: Couldn't find device with uuid f8oLNc-7pQZ-16h3-qjrt-pFJA-VRNp-xB2stB.
Mar 11 14:05:08 taft-02 lvm[3085]: Couldn't find device with uuid sUUk1o-5939-F18h-D8Jx-tNdY-f4NO-dHvYsd.
Mar 11 14:05:12 taft-02 lvm[3085]: Mirror status: 2 of 4 images failed.
Mar 11 14:05:14 taft-02 lvm[3085]: device-mapper: waitevent ioctl failed: Interrupted system call
Mar 11 14:05:14 taft-02 lvm[3085]: Another thread is handling an event. Waiting...
Mar 11 14:05:20 taft-02 lvm[3085]: Monitoring mirror device helter_skelter-syncd_multiple_legs_4legs_1-real for events.
Mar 11 14:05:20 taft-02 lvm[3085]: Another thread is handling an event. Waiting...
Mar 11 14:07:25 taft-02 lvm[3085]: Error locking on node taft-02: Command timed out
Mar 11 14:07:25 taft-02 lvm[3085]: Problem reactivating syncd_multiple_legs_4legs_1
Mar 11 14:09:18 taft-02 kernel: INFO: task clvmd:31012 blocked for more than 120 seconds.
Mar 11 14:09:18 taft-02 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 11 14:09:18 taft-02 kernel: clvmd D ffff880200c05880 0 31012 1 0x00000080
Mar 11 14:09:18 taft-02 kernel: ffff880215065c28 0000000000000086 ffff880200b96a70 ffff880028216a00
Mar 11 14:09:18 taft-02 kernel: ffff880200000000 0000000000000003 ffff880215065bc8 ffff880200b96a70
Mar 11 14:09:18 taft-02 kernel: ffff8802134df0e8 ffff880215065fd8 0000000000010558 ffff8802134df0e8
Mar 11 14:09:18 taft-02 kernel: Call Trace:
Mar 11 14:09:18 taft-02 kernel: [<ffffffff814c66e5>] rwsem_down_failed_common+0x95/0x1d0
Mar 11 14:09:18 taft-02 kernel: [<ffffffff8108a106>] ? autoremove_wake_function+0x16/0x40
Mar 11 14:09:18 taft-02 kernel: [<ffffffffa0007e90>] ? dev_suspend+0x0/0x240 [dm_mod]
Mar 11 14:09:18 taft-02 kernel: [<ffffffff814c6843>] rwsem_down_write_failed+0x23/0x30
Mar 11 14:09:18 taft-02 kernel: [<ffffffff8125f3f3>] call_rwsem_down_write_failed+0x13/0x20
Mar 11 14:09:18 taft-02 kernel: [<ffffffff814c5d42>] ? down_write+0x32/0x40
Mar 11 14:09:18 taft-02 kernel: [<ffffffff8119d7c9>] thaw_bdev+0x99/0x130
Mar 11 14:09:18 taft-02 kernel: [<ffffffffa0000b5a>] unlock_fs+0x2a/0x50 [dm_mod]
Mar 11 14:09:18 taft-02 kernel: [<ffffffffa0002596>] dm_resume+0xd6/0xf0 [dm_mod]
Mar 11 14:09:18 taft-02 kernel: [<ffffffffa000804c>] dev_suspend+0x1bc/0x240 [dm_mod]
Mar 11 14:09:18 taft-02 kernel: [<ffffffffa0007e90>] ? dev_suspend+0x0/0x240 [dm_mod]
Mar 11 14:09:18 taft-02 kernel: [<ffffffffa0008923>] ctl_ioctl+0x1a3/0x240 [dm_mod]
Mar 11 14:09:18 taft-02 kernel: [<ffffffffa00089d3>] dm_ctl_ioctl+0x13/0x20 [dm_mod]
Mar 11 14:09:18 taft-02 kernel: [<ffffffff81178922>] vfs_ioctl+0x22/0xa0
Mar 11 14:09:18 taft-02 kernel: [<ffffffff81178ac4>] do_vfs_ioctl+0x84/0x580
Mar 11 14:09:18 taft-02 kernel: [<ffffffff81179041>] sys_ioctl+0x81/0xa0
Mar 11 14:09:18 taft-02 kernel: [<ffffffff8100b172>] system_call_fastpath+0x16/0x1b
Mar 11 14:09:18 taft-02 kernel: INFO: task xdoio:1221 blocked for more than 120 seconds.
[...]

This issue is easily reproducible.

./helter_skelter -l /home/msp/cmarthal/work/rhel6/sts-root -r /usr/tests/sts-rhel6.1 -R ../../var/share/resource_files/taft.xml -e kill_multiple_legs_synced_4_legs -E taft-02

Snapshots of mirrors (or exclusively activated cmirrors) will not be supported. Snapshots of "raid1" segment type will be. I will most likely close this bug WONTFIX, but I'll try to take a look and see what's going on first - capacity permitting.

We do not support snapshots of 'mirror' segment type.
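The SUSPENDED observation in the comments above was read from `dmsetup info` output. When many devices are involved, the same check can be done on saved output with a small script. The following is only a sketch: the function name `suspended_devices` is hypothetical, the second stanza in the sample (`example-active-device`) is invented for illustration and does not come from this report, and the parsing assumes the `Key: value` stanza layout quoted above.

```python
def suspended_devices(dmsetup_info_text):
    """Return the names of devices whose State field starts with SUSPENDED.

    Expects text in the `Key: value` stanza layout of `dmsetup info`,
    one stanza per device, each beginning with a Name line.
    """
    suspended = []
    name = None
    for line in dmsetup_info_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank or unrecognised line
        key, value = key.strip(), value.strip()
        if key == "Name":
            name = value
        elif key == "State" and name is not None and value.startswith("SUSPENDED"):
            suspended.append(name)
    return suspended


# First stanza is copied from this report; the second is a made-up
# placeholder so the filter has something to skip.
sample = """\
Name: helter_skelter-syncd_multiple_legs_4legs_1
State: SUSPENDED
Read Ahead: 256
Tables present: LIVE
Open count: 1
Event number: 0
Major, minor: 253, 8
Number of targets: 1

Name: example-active-device
State: ACTIVE
"""

print(suspended_devices(sample))
```

The script works on captured text rather than calling `dmsetup` itself, so it can be run against output saved from a node that is already hung.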
Description of problem:

Scenario: Kill multiple legs of synced 4 leg mirror(s)

********* Mirror hash info for this scenario *********
* names: syncd_multiple_legs_4legs_1
* sync: 1
* leg devices: /dev/sdf1 /dev/sde1 /dev/sdg1 /dev/sdb1
* log devices: /dev/sdc1
* no MDA devices:
* failpv(s): /dev/sdf1 /dev/sdb1
* failnode(s): taft-02
* additional snap: /dev/sde1
* leg fault policy: allocate
* log fault policy: allocate
******************************************************

Creating mirror(s) on taft-02...
taft-02: lvcreate -m 3 -n syncd_multiple_legs_4legs_1 -L 600M helter_skelter /dev/sdf1:0-1000 /dev/sde1:0-1000 /dev/sdg1:0-1000 /dev/sdb1:0-1000 /dev/sdc1:0-150

EXCLUSIVELY ACTIVATING CMIRROR on taft-02

Creating a snapshot volume of each of the mirrors

PV=/dev/sdb1
syncd_multiple_legs_4legs_1_mimage_3: 3.1
PV=/dev/sdf1
syncd_multiple_legs_4legs_1_mimage_0: 4
PV=/dev/sdb1
syncd_multiple_legs_4legs_1_mimage_3: 3.1
PV=/dev/sdf1
syncd_multiple_legs_4legs_1_mimage_0: 4

Waiting until all mirrors become fully syncd...
0/1 mirror(s) are fully synced: ( 85.83% )
1/1 mirror(s) are fully synced: ( 100.00% )

Creating gfs2 on top of mirror(s) on taft-02...
Mounting mirrored gfs2 filesystems on taft-02...

Writing verification files (checkit) to mirror(s) on...
---- taft-02 ----

Sleeping 10 seconds to get some outsanding GFS I/O locks before the failure
Verifying files (checkit) on mirror(s) on...
---- taft-02 ----

Disabling device sdf on taft-02
Disabling device sdb on taft-02

Attempting I/O to cause mirror down conversion(s) on taft-02
10+0 records in
10+0 records out
41943040 bytes (42 MB) copied, 0.310101 s, 135 MB/s

# before the failure
[root@taft-02 ~]# lvs -a -o +devices
LV VG Attr LSize Origin Snap% Devices
hs_snap1 helter_skelter swi-a- 252.00m syncd_multiple_legs_4legs_1 13.24 /dev/sde1(150)
LV VG Attr LSize Log Copy% Devices
syncd_multiple_legs_4legs_1 helter_skelter owi-ao 600.00m syncd_multiple_legs_4legs_1_mlog 100.00 syncd_multiple_legs_4legs_1_mimage_0(0),syncd_multiple_legs_4legs_1_mimage_1(0),syncd_multiple_legs_4legs_1_mimage_2(0),syncd_multiple_legs_4legs_1_mimage_3(0)
[syncd_multiple_legs_4legs_1_mimage_0] helter_skelter iwi-ao 600.00m /dev/sdf1(0)
[syncd_multiple_legs_4legs_1_mimage_1] helter_skelter iwi-ao 600.00m /dev/sde1(0)
[syncd_multiple_legs_4legs_1_mimage_2] helter_skelter iwi-ao 600.00m /dev/sdg1(0)
[syncd_multiple_legs_4legs_1_mimage_3] helter_skelter iwi-ao 600.00m /dev/sdb1(0)
[syncd_multiple_legs_4legs_1_mlog] helter_skelter lwi-ao 4.00m /dev/sdc1(0)

# after the failure
[root@taft-02 ~]# lvs -a -o +devices
[STUCK]

Version-Release number of selected component (if applicable):
2.6.32-94.el6.x86_64
lvm2-2.02.83-2.el6 BUILT: Tue Feb 8 10:10:57 CST 2011
lvm2-libs-2.02.83-2.el6 BUILT: Tue Feb 8 10:10:57 CST 2011
lvm2-cluster-2.02.83-2.el6 BUILT: Tue Feb 8 10:10:57 CST 2011
udev-147-2.31.el6 BUILT: Wed Jan 26 05:39:15 CST 2011
device-mapper-1.02.62-2.el6 BUILT: Tue Feb 8 10:10:57 CST 2011
device-mapper-libs-1.02.62-2.el6 BUILT: Tue Feb 8 10:10:57 CST 2011
device-mapper-event-1.02.62-2.el6 BUILT: Tue Feb 8 10:10:57 CST 2011
device-mapper-event-libs-1.02.62-2.el6 BUILT: Tue Feb 8 10:10:57 CST 2011
cmirror-2.02.83-2.el6 BUILT: Tue Feb 8 10:10:57 CST 2011
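
When a command such as `lvs` gets stuck like this, the kernel's hung-task messages in syslog (the `INFO: task ... blocked for more than 120 seconds` lines quoted in this report) show which tasks are blocked. A rough sketch for pulling them out of a saved log follows. The function name `hung_tasks` and the regular expression are assumptions based only on the message format seen here, not on any LVM or kernel tooling.

```python
import re

# Matches the hung-task message format seen in this report, e.g.
# "INFO: task clvmd:31012 blocked for more than 120 seconds."
HUNG_TASK = re.compile(
    r"INFO: task (?P<comm>\S+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds"
)


def hung_tasks(log_text):
    """Return (command, pid, seconds) for each hung-task message found."""
    return [
        (m.group("comm"), int(m.group("pid")), int(m.group("secs")))
        for m in HUNG_TASK.finditer(log_text)
    ]


# Both lines are taken from the syslog excerpt in this report.
sample = """\
Mar 11 14:09:18 taft-02 kernel: INFO: task clvmd:31012 blocked for more than 120 seconds.
Mar 11 14:09:18 taft-02 kernel: INFO: task xdoio:1221 blocked for more than 120 seconds.
"""

for comm, pid, secs in hung_tasks(sample):
    print(f"{comm} (pid {pid}) blocked for more than {secs}s")
```

Task names containing whitespace would not match this pattern; for the messages in this report that is not an issue.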