Bug 743112
| Summary: | multiple failed cmirror devices are unable to be replaced due to locking errors | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | Corey Marthaler <cmarthal> |
| Component: | lvm2 | Assignee: | Jonathan Earl Brassow <jbrassow> |
| Status: | CLOSED ERRATA | QA Contact: | Cluster QE <mspqa-list> |
| Severity: | urgent | Docs Contact: | |
| Priority: | urgent | ||
| Version: | 6.2 | CC: | agk, dwysocha, heinzm, jbrassow, mbroz, prajnoha, prockai, syeghiay, thornber, zkabelac |
| Target Milestone: | rc | Keywords: | Regression, TestBlocker |
| Target Release: | --- | ||
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | lvm2-2.02.87-6.el6 | Doc Type: | Bug Fix |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2011-12-06 17:03:22 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | 746254 | ||
| Bug Blocks: | 748554 | ||
This test case works in single machine mode. It appears to be the cmirror locking failures that cause this issue. This test case also appears to work when only one device is failed. Requires a kernel change too... The fix will have to wait for more than just userspace changes (which are already committed upstream. git commit id for the LVM changes are: 681ceb16d80e9cfbb768a05b5296e23da427b9cd c954b73149f992c1a29c6ce5abc8ac4b7ecaf4aa If this bug is reproducible in a single machine test case, then the proposed patches - while still necessary - will not yield a solution to this bug. Fix verified in the lvm latest rpms with latest kernel. 2.6.32-216.el6.x86_64 lvm2-2.02.87-6.el6 BUILT: Wed Oct 19 06:46:31 CDT 2011 lvm2-libs-2.02.87-6.el6 BUILT: Wed Oct 19 06:46:31 CDT 2011 lvm2-cluster-2.02.87-6.el6 BUILT: Wed Oct 19 06:46:31 CDT 2011 udev-147-2.40.el6 BUILT: Fri Sep 23 07:51:13 CDT 2011 device-mapper-1.02.66-6.el6 BUILT: Wed Oct 19 06:46:31 CDT 2011 device-mapper-libs-1.02.66-6.el6 BUILT: Wed Oct 19 06:46:31 CDT 2011 device-mapper-event-1.02.66-6.el6 BUILT: Wed Oct 19 06:46:31 CDT 2011 device-mapper-event-libs-1.02.66-6.el6 BUILT: Wed Oct 19 06:46:31 CDT 2011 cmirror-2.02.87-6.el6 BUILT: Wed Oct 19 06:46:31 CDT 2011 Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2011-1522.html |
Description of problem: Scenario: Kill multiple legs of synced 3 leg mirror(s) ********* Mirror hash info for this scenario ********* * names: syncd_multiple_legs_3legs_1 * sync: 1 * striped: 0 * leg devices: /dev/sdf1 /dev/sdd1 /dev/sdg1 * log devices: /dev/sdb1 * no MDA devices: * failpv(s): /dev/sdf1 /dev/sdg1 * failnode(s): taft-01 taft-02 taft-03 taft-04 * leg fault policy: allocate * log fault policy: allocate ****************************************************** Creating mirror(s) on taft-04... taft-04: lvcreate -m 2 -n syncd_multiple_legs_3legs_1 -L 300M helter_skelter /dev/sdf1:0-1000 /dev/sdd1:0-1000 /dev/sdg1:0-1000 /dev/sdb1:0-150 PV=/dev/sdg1 syncd_multiple_legs_3legs_1_mimage_2: 5.1 PV=/dev/sdf1 syncd_multiple_legs_3legs_1_mimage_0: 5.1 PV=/dev/sdg1 syncd_multiple_legs_3legs_1_mimage_2: 5.1 PV=/dev/sdf1 syncd_multiple_legs_3legs_1_mimage_0: 5.1 Waiting until all mirrors become fully syncd... 1/1 mirror(s) are fully synced: ( 100.00% ) Creating gfs2 on top of mirror(s) on taft-01... Mounting mirrored gfs2 filesystems on taft-01... Mounting mirrored gfs2 filesystems on taft-02... Mounting mirrored gfs2 filesystems on taft-03... Mounting mirrored gfs2 filesystems on taft-04... Writing verification files (checkit) to mirror(s) on... ---- taft-01 ---- ---- taft-02 ---- ---- taft-03 ---- ---- taft-04 ---- Sleeping 10 seconds to get some outsanding GFS I/O locks before the failure Verifying files (checkit) on mirror(s) on... ---- taft-01 ---- ---- taft-02 ---- ---- taft-03 ---- ---- taft-04 ---- Disabling device sdf on taft-01 Disabling device sdg on taft-01 Disabling device sdf on taft-02 Disabling device sdg on taft-02 Disabling device sdf on taft-03 Disabling device sdg on taft-03 Disabling device sdf on taft-04 Disabling device sdg on taft-04 Attempting I/O to cause mirror down conversion(s) on taft-01 10+0 records in 10+0 records out 41943040 bytes (42 MB) copied, 0.34491 s, 122 MB/s Verifying current sanity of lvm after the failure Couldn't find device with uuid GF8iiQ-zZQw-aLHL-GXEx-OZSx-tM1m-663Iu3. Couldn't find device with uuid XNUu12-W3jG-5ID1-YBhQ-2cC7-9pDx-6eEcWF. Verifying FAILED device /dev/sdf1 is *NOT* in the volume(s) Couldn't find device with uuid GF8iiQ-zZQw-aLHL-GXEx-OZSx-tM1m-663Iu3. Couldn't find device with uuid XNUu12-W3jG-5ID1-YBhQ-2cC7-9pDx-6eEcWF. Verifying FAILED device /dev/sdg1 is *NOT* in the volume(s) Couldn't find device with uuid GF8iiQ-zZQw-aLHL-GXEx-OZSx-tM1m-663Iu3. Couldn't find device with uuid XNUu12-W3jG-5ID1-YBhQ-2cC7-9pDx-6eEcWF. olog: 1 Verifying LOG device(s) /dev/sdb1 *ARE* in the mirror(s) Couldn't find device with uuid GF8iiQ-zZQw-aLHL-GXEx-OZSx-tM1m-663Iu3. Couldn't find device with uuid XNUu12-W3jG-5ID1-YBhQ-2cC7-9pDx-6eEcWF. Verifying LEG device /dev/sdd1 *IS* in the volume(s) Couldn't find device with uuid GF8iiQ-zZQw-aLHL-GXEx-OZSx-tM1m-663Iu3. Couldn't find device with uuid XNUu12-W3jG-5ID1-YBhQ-2cC7-9pDx-6eEcWF. verify the newly allocated dm devices were added as a result of the failures Checking EXISTENCE of syncd_multiple_legs_3legs_1_mimage_0 on: taft-01 taft-02 syncd_multiple_legs_3legs_1_mimage_0 on taft-02 should now exist ** ARE THERE ENOUGH SPARE PVS TO ALLOCATE? ** # BEFORE THE FAILURE: [root@taft-02 ~]# lvs -a -o +devices LV Attr LSize Log Copy% Devices syncd_multiple_legs_3legs_1 mwi-ao 300.00m syncd_multiple_legs_3legs_1_mlog 100.00 syncd_multiple_legs_3legs_1_mimage_0(0),syncd_multiple_legs_3legs_1_mimage_1(0),syncd_multiple_legs_3legs_1_mimage_2(0) [syncd_multiple_legs_3legs_1_mimage_0] iwi-ao 300.00m /dev/sdf1(0) [syncd_multiple_legs_3legs_1_mimage_1] iwi-ao 300.00m /dev/sdd1(0) [syncd_multiple_legs_3legs_1_mimage_2] iwi-ao 300.00m /dev/sdg1(0) [syncd_multiple_legs_3legs_1_mlog] lwi-ao 4.00m /dev/sdb1(0) # AFTER THE FAILURE: [root@taft-02 ~]# lvs -a -o +devices Couldn't find device with uuid GF8iiQ-zZQw-aLHL-GXEx-OZSx-tM1m-663Iu3. Couldn't find device with uuid XNUu12-W3jG-5ID1-YBhQ-2cC7-9pDx-6eEcWF. LV Attr LSize Log Copy% Devices syncd_multiple_legs_3legs_1 mwi-ao 300.00m 100.00 syncd_multiple_legs_3legs_1_mimage_2(0),syncd_multiple_legs_3legs_1_mimage_3(0) syncd_multiple_legs_3legs_1_mimage_0 -wi--- 300.00m unknown device(0) syncd_multiple_legs_3legs_1_mimage_1 vwi-a- 300.00m [syncd_multiple_legs_3legs_1_mimage_2] iwi-ao 300.00m /dev/sdd1(0) [syncd_multiple_legs_3legs_1_mimage_3] iwi-ao 300.00m /dev/sdh1(0) syncd_multiple_legs_3legs_1_mlog -wi-s- 4.00m /dev/sdb1(0) [root@taft-02 ~]# grep lvm\\[ /var/log/messages Oct 3 15:46:10 taft-02 lvm[1257]: helter_skelter-syncd_multiple_legs_3legs_1 is now in-sync. Oct 3 15:51:09 taft-02 lvm[2433]: Monitoring mirror device helter_skelter-syncd_multiple_legs_3legs_1 for events. Oct 3 15:51:20 taft-02 lvm[2433]: No longer monitoring mirror device helter_skelter-syncd_multiple_legs_3legs_1 for events. Oct 3 15:53:15 taft-02 lvm[2433]: Monitoring mirror device helter_skelter-syncd_multiple_legs_3legs_1 for events. Oct 3 15:53:40 taft-02 lvm[2433]: helter_skelter-syncd_multiple_legs_3legs_1 is now in-sync. Oct 3 15:54:16 taft-02 lvm[2433]: Primary mirror device 253:4 has failed (D). Oct 3 15:54:16 taft-02 lvm[2433]: Secondary mirror device 253:6 has failed (D). Oct 3 15:54:16 taft-02 lvm[2433]: Device failure in helter_skelter-syncd_multiple_legs_3legs_1. Oct 3 15:54:17 taft-02 lvm[2433]: /dev/sdf1: read failed after 0 of 512 at 145669554176: Input/output error Oct 3 15:54:17 taft-02 lvm[2433]: /dev/sdf1: read failed after 0 of 512 at 145669664768: Input/output error Oct 3 15:54:17 taft-02 lvm[2433]: /dev/sdf1: read failed after 0 of 512 at 0: Input/output error Oct 3 15:54:17 taft-02 lvm[2433]: /dev/sdf1: read failed after 0 of 512 at 4096: Input/output error Oct 3 15:54:17 taft-02 lvm[2433]: /dev/sdf1: read failed after 0 of 2048 at 0: Input/output error Oct 3 15:54:17 taft-02 lvm[2433]: /dev/sdg1: read failed after 0 of 512 at 145669554176: Input/output error Oct 3 15:54:17 taft-02 lvm[2433]: /dev/sdg1: read failed after 0 of 512 at 145669664768: Input/output error Oct 3 15:54:17 taft-02 lvm[2433]: /dev/sdg1: read failed after 0 of 512 at 0: Input/output error Oct 3 15:54:17 taft-02 lvm[2433]: /dev/sdg1: read failed after 0 of 512 at 4096: Input/output error Oct 3 15:54:17 taft-02 lvm[2433]: /dev/sdg1: read failed after 0 of 2048 at 0: Input/output error Oct 3 15:54:17 taft-02 lvm[2433]: Couldn't find device with uuid GF8iiQ-zZQw-aLHL-GXEx-OZSx-tM1m-663Iu3. Oct 3 15:54:17 taft-02 lvm[2433]: Couldn't find device with uuid XNUu12-W3jG-5ID1-YBhQ-2cC7-9pDx-6eEcWF. Oct 3 15:54:20 taft-02 lvm[2433]: Error locking on node taft-01: LV helter_skelter/syncd_multiple_legs_3legs_1_mimage_0 in use: not deactivating Oct 3 15:54:20 taft-02 lvm[2433]: Mirror status: 2 of 3 images failed. Oct 3 15:54:20 taft-02 lvm[2433]: Trying to up-convert to 3 images, 1 logs. Oct 3 15:54:20 taft-02 lvm[2433]: Unable to create LV syncd_multiple_legs_3legs_1_mlog in Volume Group helter_skelter: name already in use. Oct 3 15:54:20 taft-02 lvm[2433]: Failed to create mirror log. Oct 3 15:54:20 taft-02 lvm[2433]: Trying to up-convert to 2 images, 1 logs. Oct 3 15:54:20 taft-02 lvm[2433]: Unable to create LV syncd_multiple_legs_3legs_1_mlog in Volume Group helter_skelter: name already in use. Oct 3 15:54:20 taft-02 lvm[2433]: Failed to create mirror log. Oct 3 15:54:20 taft-02 lvm[2433]: Trying to up-convert to 2 images, 0 logs. Oct 3 15:54:20 taft-02 lvm[2433]: Monitoring mirror device helter_skelter-syncd_multiple_legs_3legs_1 for events. Oct 3 15:54:20 taft-02 lvm[2433]: Another thread is handling an event. Waiting... Oct 3 15:54:21 taft-02 lvm[2433]: WARNING: Failed to replace 1 of 3 images in volume syncd_multiple_legs_3legs_1 Oct 3 15:54:21 taft-02 lvm[2433]: WARNING: Failed to replace 1 of 1 logs in volume syncd_multiple_legs_3legs_1 Oct 3 15:54:21 taft-02 lvm[2433]: helter_skelter/syncd_multiple_legs_3legs_1: Converted: 1.3% Oct 3 15:54:36 taft-02 lvm[2433]: helter_skelter/syncd_multiple_legs_3legs_1: Converted: 100.0% Oct 3 15:54:51 taft-02 lvm[2433]: Repair of mirrored LV helter_skelter/syncd_multiple_legs_3legs_1 finished successfully. Oct 3 15:54:51 taft-02 lvm[2433]: helter_skelter-syncd_multiple_legs_3legs_1 is now in-sync. Oct 3 15:54:53 taft-02 lvm[2433]: No longer monitoring mirror device helter_skelter-syncd_multiple_legs_3legs_1 for events. Version-Release number of selected component (if applicable): 2.6.32-203.el6.x86_64 lvm2-2.02.87-3.el6 BUILT: Wed Sep 21 09:54:55 CDT 2011 lvm2-libs-2.02.87-3.el6 BUILT: Wed Sep 21 09:54:55 CDT 2011 lvm2-cluster-2.02.87-3.el6 BUILT: Wed Sep 21 09:54:55 CDT 2011 udev-147-2.40.el6 BUILT: Fri Sep 23 07:51:13 CDT 2011 device-mapper-1.02.66-3.el6 BUILT: Wed Sep 21 09:54:55 CDT 2011 device-mapper-libs-1.02.66-3.el6 BUILT: Wed Sep 21 09:54:55 CDT 2011 device-mapper-event-1.02.66-3.el6 BUILT: Wed Sep 21 09:54:55 CDT 2011 device-mapper-event-libs-1.02.66-3.el6 BUILT: Wed Sep 21 09:54:55 CDT 2011 cmirror-2.02.87-3.el6 BUILT: Wed Sep 21 09:54:55 CDT 2011 How reproducible: Everytime