RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets here. Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED". If you cannot log in to RH Jira, please consult article #7032570. That failing, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "RHEL project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that that bug has been migrated.
Bug 743112 - multiple failed cmirror devices are unable to be replaced due to locking errors
Summary: multiple failed cmirror devices are unable to be replaced due to locking errors
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: lvm2
Version: 6.2
Hardware: x86_64
OS: Linux
urgent
urgent
Target Milestone: rc
: ---
Assignee: Jonathan Earl Brassow
QA Contact: Cluster QE
URL:
Whiteboard:
Depends On: 746254
Blocks: 748554
TreeView+ depends on / blocked
 
Reported: 2011-10-03 21:05 UTC by Corey Marthaler
Modified: 2011-12-06 17:03 UTC (History)
10 users (show)

Fixed In Version: lvm2-2.02.87-6.el6
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-12-06 17:03:22 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2011:1522 0 normal SHIPPED_LIVE lvm2 bug fix and enhancement update 2011-12-06 00:50:10 UTC

Description Corey Marthaler 2011-10-03 21:05:22 UTC
Description of problem:
Scenario: Kill multiple legs of synced 3 leg mirror(s)

********* Mirror hash info for this scenario *********
* names:              syncd_multiple_legs_3legs_1
* sync:               1
* striped:            0
* leg devices:        /dev/sdf1 /dev/sdd1 /dev/sdg1
* log devices:        /dev/sdb1
* no MDA devices:     
* failpv(s):          /dev/sdf1 /dev/sdg1
* failnode(s):        taft-01 taft-02 taft-03 taft-04
* leg fault policy:   allocate
* log fault policy:   allocate
******************************************************

Creating mirror(s) on taft-04...
taft-04: lvcreate -m 2 -n syncd_multiple_legs_3legs_1 -L 300M helter_skelter /dev/sdf1:0-1000 /dev/sdd1:0-1000 /dev/sdg1:0-1000 /dev/sdb1:0-150

PV=/dev/sdg1
        syncd_multiple_legs_3legs_1_mimage_2: 5.1
PV=/dev/sdf1
        syncd_multiple_legs_3legs_1_mimage_0: 5.1
PV=/dev/sdg1
        syncd_multiple_legs_3legs_1_mimage_2: 5.1
PV=/dev/sdf1
        syncd_multiple_legs_3legs_1_mimage_0: 5.1

Waiting until all mirrors become fully syncd...
   1/1 mirror(s) are fully synced: ( 100.00% )

Creating gfs2 on top of mirror(s) on taft-01...
Mounting mirrored gfs2 filesystems on taft-01...
Mounting mirrored gfs2 filesystems on taft-02...
Mounting mirrored gfs2 filesystems on taft-03...
Mounting mirrored gfs2 filesystems on taft-04...

Writing verification files (checkit) to mirror(s) on...
        ---- taft-01 ----
        ---- taft-02 ----
        ---- taft-03 ----
        ---- taft-04 ----

Sleeping 10 seconds to get some outsanding GFS I/O locks before the failure 
Verifying files (checkit) on mirror(s) on...
        ---- taft-01 ----
        ---- taft-02 ----
        ---- taft-03 ----
        ---- taft-04 ----

Disabling device sdf on taft-01
Disabling device sdg on taft-01
Disabling device sdf on taft-02
Disabling device sdg on taft-02
Disabling device sdf on taft-03
Disabling device sdg on taft-03
Disabling device sdf on taft-04
Disabling device sdg on taft-04

Attempting I/O to cause mirror down conversion(s) on taft-01
10+0 records in
10+0 records out
41943040 bytes (42 MB) copied, 0.34491 s, 122 MB/s
Verifying current sanity of lvm after the failure
  Couldn't find device with uuid GF8iiQ-zZQw-aLHL-GXEx-OZSx-tM1m-663Iu3.
  Couldn't find device with uuid XNUu12-W3jG-5ID1-YBhQ-2cC7-9pDx-6eEcWF.
Verifying FAILED device /dev/sdf1 is *NOT* in the volume(s)
  Couldn't find device with uuid GF8iiQ-zZQw-aLHL-GXEx-OZSx-tM1m-663Iu3.
  Couldn't find device with uuid XNUu12-W3jG-5ID1-YBhQ-2cC7-9pDx-6eEcWF.
Verifying FAILED device /dev/sdg1 is *NOT* in the volume(s)
  Couldn't find device with uuid GF8iiQ-zZQw-aLHL-GXEx-OZSx-tM1m-663Iu3.
  Couldn't find device with uuid XNUu12-W3jG-5ID1-YBhQ-2cC7-9pDx-6eEcWF.
olog: 1
Verifying LOG device(s) /dev/sdb1 *ARE* in the mirror(s)
  Couldn't find device with uuid GF8iiQ-zZQw-aLHL-GXEx-OZSx-tM1m-663Iu3.
  Couldn't find device with uuid XNUu12-W3jG-5ID1-YBhQ-2cC7-9pDx-6eEcWF.
Verifying LEG device /dev/sdd1 *IS* in the volume(s)
  Couldn't find device with uuid GF8iiQ-zZQw-aLHL-GXEx-OZSx-tM1m-663Iu3.
  Couldn't find device with uuid XNUu12-W3jG-5ID1-YBhQ-2cC7-9pDx-6eEcWF.
verify the newly allocated dm devices were added as a result of the failures
Checking EXISTENCE of syncd_multiple_legs_3legs_1_mimage_0 on:  taft-01 taft-02
syncd_multiple_legs_3legs_1_mimage_0 on taft-02 should now exist
** ARE THERE ENOUGH SPARE PVS TO ALLOCATE? **


# BEFORE THE FAILURE:
[root@taft-02 ~]# lvs -a -o +devices
  LV                                     Attr   LSize   Log                              Copy%  Devices
  syncd_multiple_legs_3legs_1            mwi-ao 300.00m syncd_multiple_legs_3legs_1_mlog 100.00 syncd_multiple_legs_3legs_1_mimage_0(0),syncd_multiple_legs_3legs_1_mimage_1(0),syncd_multiple_legs_3legs_1_mimage_2(0)
  [syncd_multiple_legs_3legs_1_mimage_0] iwi-ao 300.00m                                         /dev/sdf1(0)
  [syncd_multiple_legs_3legs_1_mimage_1] iwi-ao 300.00m                                         /dev/sdd1(0)
  [syncd_multiple_legs_3legs_1_mimage_2] iwi-ao 300.00m                                         /dev/sdg1(0)
  [syncd_multiple_legs_3legs_1_mlog]     lwi-ao   4.00m                                         /dev/sdb1(0)

# AFTER THE FAILURE:
[root@taft-02 ~]# lvs -a -o +devices
  Couldn't find device with uuid GF8iiQ-zZQw-aLHL-GXEx-OZSx-tM1m-663Iu3.
  Couldn't find device with uuid XNUu12-W3jG-5ID1-YBhQ-2cC7-9pDx-6eEcWF.
  LV                                     Attr   LSize   Log Copy%  Devices
  syncd_multiple_legs_3legs_1            mwi-ao 300.00m     100.00 syncd_multiple_legs_3legs_1_mimage_2(0),syncd_multiple_legs_3legs_1_mimage_3(0)
  syncd_multiple_legs_3legs_1_mimage_0   -wi--- 300.00m            unknown device(0)
  syncd_multiple_legs_3legs_1_mimage_1   vwi-a- 300.00m 
  [syncd_multiple_legs_3legs_1_mimage_2] iwi-ao 300.00m            /dev/sdd1(0)
  [syncd_multiple_legs_3legs_1_mimage_3] iwi-ao 300.00m            /dev/sdh1(0)
  syncd_multiple_legs_3legs_1_mlog       -wi-s-   4.00m            /dev/sdb1(0)


[root@taft-02 ~]# grep lvm\\[ /var/log/messages
Oct  3 15:46:10 taft-02 lvm[1257]: helter_skelter-syncd_multiple_legs_3legs_1 is now in-sync.
Oct  3 15:51:09 taft-02 lvm[2433]: Monitoring mirror device helter_skelter-syncd_multiple_legs_3legs_1 for events.
Oct  3 15:51:20 taft-02 lvm[2433]: No longer monitoring mirror device helter_skelter-syncd_multiple_legs_3legs_1 for events.
Oct  3 15:53:15 taft-02 lvm[2433]: Monitoring mirror device helter_skelter-syncd_multiple_legs_3legs_1 for events.
Oct  3 15:53:40 taft-02 lvm[2433]: helter_skelter-syncd_multiple_legs_3legs_1 is now in-sync.
Oct  3 15:54:16 taft-02 lvm[2433]: Primary mirror device 253:4 has failed (D).
Oct  3 15:54:16 taft-02 lvm[2433]: Secondary mirror device 253:6 has failed (D).
Oct  3 15:54:16 taft-02 lvm[2433]: Device failure in helter_skelter-syncd_multiple_legs_3legs_1.
Oct  3 15:54:17 taft-02 lvm[2433]: /dev/sdf1: read failed after 0 of 512 at 145669554176: Input/output error
Oct  3 15:54:17 taft-02 lvm[2433]: /dev/sdf1: read failed after 0 of 512 at 145669664768: Input/output error
Oct  3 15:54:17 taft-02 lvm[2433]: /dev/sdf1: read failed after 0 of 512 at 0: Input/output error
Oct  3 15:54:17 taft-02 lvm[2433]: /dev/sdf1: read failed after 0 of 512 at 4096: Input/output error
Oct  3 15:54:17 taft-02 lvm[2433]: /dev/sdf1: read failed after 0 of 2048 at 0: Input/output error
Oct  3 15:54:17 taft-02 lvm[2433]: /dev/sdg1: read failed after 0 of 512 at 145669554176: Input/output error
Oct  3 15:54:17 taft-02 lvm[2433]: /dev/sdg1: read failed after 0 of 512 at 145669664768: Input/output error
Oct  3 15:54:17 taft-02 lvm[2433]: /dev/sdg1: read failed after 0 of 512 at 0: Input/output error
Oct  3 15:54:17 taft-02 lvm[2433]: /dev/sdg1: read failed after 0 of 512 at 4096: Input/output error
Oct  3 15:54:17 taft-02 lvm[2433]: /dev/sdg1: read failed after 0 of 2048 at 0: Input/output error
Oct  3 15:54:17 taft-02 lvm[2433]: Couldn't find device with uuid GF8iiQ-zZQw-aLHL-GXEx-OZSx-tM1m-663Iu3.
Oct  3 15:54:17 taft-02 lvm[2433]: Couldn't find device with uuid XNUu12-W3jG-5ID1-YBhQ-2cC7-9pDx-6eEcWF.
Oct  3 15:54:20 taft-02 lvm[2433]: Error locking on node taft-01: LV helter_skelter/syncd_multiple_legs_3legs_1_mimage_0 in use: not deactivating
Oct  3 15:54:20 taft-02 lvm[2433]: Mirror status: 2 of 3 images failed.
Oct  3 15:54:20 taft-02 lvm[2433]: Trying to up-convert to 3 images, 1 logs.
Oct  3 15:54:20 taft-02 lvm[2433]: Unable to create LV syncd_multiple_legs_3legs_1_mlog in Volume Group helter_skelter: name already in use.
Oct  3 15:54:20 taft-02 lvm[2433]: Failed to create mirror log.
Oct  3 15:54:20 taft-02 lvm[2433]: Trying to up-convert to 2 images, 1 logs.
Oct  3 15:54:20 taft-02 lvm[2433]: Unable to create LV syncd_multiple_legs_3legs_1_mlog in Volume Group helter_skelter: name already in use.
Oct  3 15:54:20 taft-02 lvm[2433]: Failed to create mirror log.
Oct  3 15:54:20 taft-02 lvm[2433]: Trying to up-convert to 2 images, 0 logs.
Oct  3 15:54:20 taft-02 lvm[2433]: Monitoring mirror device helter_skelter-syncd_multiple_legs_3legs_1 for events.
Oct  3 15:54:20 taft-02 lvm[2433]: Another thread is handling an event. Waiting...
Oct  3 15:54:21 taft-02 lvm[2433]: WARNING: Failed to replace 1 of 3 images in volume syncd_multiple_legs_3legs_1
Oct  3 15:54:21 taft-02 lvm[2433]: WARNING: Failed to replace 1 of 1 logs in volume syncd_multiple_legs_3legs_1
Oct  3 15:54:21 taft-02 lvm[2433]: helter_skelter/syncd_multiple_legs_3legs_1: Converted: 1.3%
Oct  3 15:54:36 taft-02 lvm[2433]: helter_skelter/syncd_multiple_legs_3legs_1: Converted: 100.0%
Oct  3 15:54:51 taft-02 lvm[2433]: Repair of mirrored LV helter_skelter/syncd_multiple_legs_3legs_1 finished successfully.
Oct  3 15:54:51 taft-02 lvm[2433]: helter_skelter-syncd_multiple_legs_3legs_1 is now in-sync.
Oct  3 15:54:53 taft-02 lvm[2433]: No longer monitoring mirror device helter_skelter-syncd_multiple_legs_3legs_1 for events.


Version-Release number of selected component (if applicable):
2.6.32-203.el6.x86_64

lvm2-2.02.87-3.el6    BUILT: Wed Sep 21 09:54:55 CDT 2011
lvm2-libs-2.02.87-3.el6    BUILT: Wed Sep 21 09:54:55 CDT 2011
lvm2-cluster-2.02.87-3.el6    BUILT: Wed Sep 21 09:54:55 CDT 2011
udev-147-2.40.el6    BUILT: Fri Sep 23 07:51:13 CDT 2011
device-mapper-1.02.66-3.el6    BUILT: Wed Sep 21 09:54:55 CDT 2011
device-mapper-libs-1.02.66-3.el6    BUILT: Wed Sep 21 09:54:55 CDT 2011
device-mapper-event-1.02.66-3.el6    BUILT: Wed Sep 21 09:54:55 CDT 2011
device-mapper-event-libs-1.02.66-3.el6    BUILT: Wed Sep 21 09:54:55 CDT 2011
cmirror-2.02.87-3.el6    BUILT: Wed Sep 21 09:54:55 CDT 2011


How reproducible:
Everytime

Comment 1 Corey Marthaler 2011-10-03 21:27:29 UTC
This test case works in single machine mode. It appears to be the cmirror locking failures that cause this issue.

Comment 2 Corey Marthaler 2011-10-03 22:10:57 UTC
This test case also appears to work when only one device is failed.

Comment 5 Jonathan Earl Brassow 2011-10-14 14:25:22 UTC
Requires a kernel change too...  The fix will have to wait for more than just userspace changes (which are already committed upstream.

Comment 7 Jonathan Earl Brassow 2011-10-14 20:17:33 UTC
git commit id for the LVM changes are:
681ceb16d80e9cfbb768a05b5296e23da427b9cd
c954b73149f992c1a29c6ce5abc8ac4b7ecaf4aa

Comment 9 Jonathan Earl Brassow 2011-10-17 16:38:46 UTC
If this bug is reproducible in a single machine test case, then the proposed patches - while still necessary - will not yield a solution to this bug.

Comment 13 Corey Marthaler 2011-11-03 21:59:52 UTC
Fix verified in the lvm latest rpms with latest kernel.

2.6.32-216.el6.x86_64

lvm2-2.02.87-6.el6    BUILT: Wed Oct 19 06:46:31 CDT 2011
lvm2-libs-2.02.87-6.el6    BUILT: Wed Oct 19 06:46:31 CDT 2011
lvm2-cluster-2.02.87-6.el6    BUILT: Wed Oct 19 06:46:31 CDT 2011
udev-147-2.40.el6    BUILT: Fri Sep 23 07:51:13 CDT 2011
device-mapper-1.02.66-6.el6    BUILT: Wed Oct 19 06:46:31 CDT 2011
device-mapper-libs-1.02.66-6.el6    BUILT: Wed Oct 19 06:46:31 CDT 2011
device-mapper-event-1.02.66-6.el6    BUILT: Wed Oct 19 06:46:31 CDT 2011
device-mapper-event-libs-1.02.66-6.el6    BUILT: Wed Oct 19 06:46:31 CDT 2011
cmirror-2.02.87-6.el6    BUILT: Wed Oct 19 06:46:31 CDT 2011

Comment 14 errata-xmlrpc 2011-12-06 17:03:22 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2011-1522.html


Note You need to log in before you can comment on or make changes to this bug.