Bug 596367 - when allocating a new leg (due to single device failure), the currently healthy log is also replaced
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: lvm2
Version: 6.0
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Jonathan Earl Brassow
QA Contact: Corey Marthaler
URL:
Whiteboard:
Depends On:
Blocks: 682648
Reported: 2010-05-26 16:49 UTC by Corey Marthaler
Modified: 2011-03-07 06:01 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
When an LVM mirror suffers a device failure, a two-stage recovery takes place. The first stage involves removing the failed devices. This can result in the mirror being reduced to a linear device. The second stage — if configured to do so by the administrator — is to attempt to replace any of the failed devices. Note, however, that there is no guarantee that the second stage will choose devices previously in-use by the mirror that had not been part of the failure if others are available.
Clone Of:
: 682648
Environment:
Last Closed: 2010-11-22 23:14:52 UTC
Target Upstream Version:
Embargoed:



Description Corey Marthaler 2010-05-26 16:49:31 UTC
Description of problem:
This is a regression in behavior. 

Before the primary leg device failure:
[root@taft-01 ~]# lvs -a -o +devices
  LV                               VG             Attr   LSize   Log                        Copy%  Devices
  syncd_primary_2legs_1            helter_skelter mwi-ao 600.00m syncd_primary_2legs_1_mlog 100.00 syncd_primary_2legs_1_mimage_0(0),syncd_primary_2legs_1_mimage_1(0)
  [syncd_primary_2legs_1_mimage_0] helter_skelter iwi-ao 600.00m                                   /dev/sde1(0)
  [syncd_primary_2legs_1_mimage_1] helter_skelter iwi-ao 600.00m                                   /dev/sdb1(0)
  [syncd_primary_2legs_1_mlog]     helter_skelter lwi-ao   4.00m                                   /dev/sdh1(0)


After the primary leg device failure:
[root@taft-01 ~]# lvs -a -o +devices
  /dev/sde1: open failed: No such device or address
  Couldn't find device with uuid HMUsDw-n2Sj-RVKJ-Sdxd-H4RC-Jjdp-8LB0mO.
  LV                               VG             Attr   LSize   Log                        Copy%  Devices
  syncd_primary_2legs_1            helter_skelter mwi-ao 600.00m syncd_primary_2legs_1_mlog  68.67 syncd_primary_2legs_1_mimage_0(0),syncd_primary_2legs_1_mimage_1(0)
  [syncd_primary_2legs_1_mimage_0] helter_skelter Iwi-ao 600.00m                                   /dev/sdb1(0)
  [syncd_primary_2legs_1_mimage_1] helter_skelter Iwi-ao 600.00m                                   /dev/sdh1(1)
  [syncd_primary_2legs_1_mlog]     helter_skelter lwi-ao   4.00m                                   /dev/sdc1(0)


Note how the "good" log now becomes the new secondary leg and a new log is allocated. This seems to be adding unnecessary risk to the device failure allocation process.
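
The placement change can be made explicit by comparing the two listings. The following is only an illustrative sketch: the device names are copied from the lvs output above, while the dictionaries and helper functions are hypothetical names introduced here and are not part of LVM.

```python
# Device under each sub-LV, copied from the two `lvs -a -o +devices` listings.
before = {
    "mimage_0": "/dev/sde1",
    "mimage_1": "/dev/sdb1",
    "mlog": "/dev/sdh1",
}
after = {
    "mimage_0": "/dev/sdb1",
    "mimage_1": "/dev/sdh1",
    "mlog": "/dev/sdc1",
}
failed = {"/dev/sde1"}  # the only device that actually failed


def moved_off_healthy_devices(before, after, failed):
    """Return sub-LVs that changed device although their old device was healthy."""
    return sorted(
        name
        for name, dev in before.items()
        if dev not in failed and after[name] != dev
    )


def newly_used_devices(before, after):
    """Return devices that appear only in the post-failure layout."""
    return sorted(set(after.values()) - set(before.values()))


print(moved_off_healthy_devices(before, after, failed))
print(newly_used_devices(before, after))
```

Both mimage_1 and mlog are reported as moved even though neither sat on the failed device. The mimage_1 entry reflects the surviving leg on /dev/sdb1 being renamed to mimage_0; the mlog entry is the relocation this report is about.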


Version-Release number of selected component (if applicable):
2.6.32-28.el6bz590851_v1.x86_64

lvm2-2.02.65-1.el6    BUILT: Tue May 18 04:46:06 CDT 2010
lvm2-libs-2.02.65-1.el6    BUILT: Tue May 18 04:46:06 CDT 2010
lvm2-cluster-2.02.65-1.el6    BUILT: Tue May 18 04:46:06 CDT 2010
device-mapper-1.02.48-1.el6    BUILT: Tue May 18 04:46:06 CDT 2010
device-mapper-libs-1.02.48-1.el6    BUILT: Tue May 18 04:46:06 CDT 2010
device-mapper-event-1.02.48-1.el6    BUILT: Tue May 18 04:46:06 CDT 2010
device-mapper-event-libs-1.02.48-1.el6    BUILT: Tue May 18 04:46:06 CDT 2010
cmirror-2.02.65-1.el6    BUILT: Wed May 19 11:19:57 CDT 2010


How reproducible:
Often

Comment 4 Alasdair Kergon 2010-07-28 21:00:18 UTC
Well what's going on here?

mimage0 fails.

The code could:
  (1) Reduce the mirror to a linear device, then convert it back to a mirror.  This involves allocating a new mimage1 and a new log.  If you want it to keep the old log then it's the job of that code to remember where the log was and supply that to the allocation code, but I see no need for this.  Or just detach the log before reducing the mirror, and reattach it when creating the new mirror.

  (2) Move the mimage1 to the primary, mark the log as fully out-of-sync, and allocate a new secondary.  This keeps the log in the same place because it never removes it.


Does the code do (1) or (2)?  To see this effect, (1) I presume.  To change the behaviour, it would need to do (2).

There's nothing in the current code that says "logs will only be allocated from particular PVs".  There's nothing to fix in the allocation code here.

It would be nice to improve the behaviour, but how much of the "repair" code would have to be changed to do that?  And is it worth it?
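
To make the difference between (1) and (2) concrete, here is a toy model of the two repair strategies. It is a sketch under stated assumptions, not the actual LVM repair code: the function names, the dictionary layout, and the "first free PV wins" allocator are all hypothetical, and the pool ordering in option (1) was chosen only so that the result matches the listing in the description.

```python
def repair_by_rebuild(mirror, failed, free_pvs):
    """Option (1): reduce to linear, then convert back to a mirror."""
    survivor = next(d for d in mirror["images"] if d != failed)
    # Reducing to linear drops the log, so its PV returns to the free pool.
    # Assumption: the released PV is considered first by the toy allocator.
    pool = [mirror["log"]] + [pv for pv in free_pvs if pv != failed]
    new_image = pool.pop(0)  # the new leg takes the first free PV
    new_log = pool.pop(0)    # the new log takes whatever comes next
    return {"images": [survivor, new_image], "log": new_log}


def repair_in_place(mirror, failed, free_pvs):
    """Option (2): promote the survivor, keep the log, allocate a new secondary."""
    survivor = next(d for d in mirror["images"] if d != failed)
    pool = [pv for pv in free_pvs if pv != failed]
    # The log is never removed, so it stays where it was (it would only be
    # marked fully out-of-sync, which this model does not track).
    return {"images": [survivor, pool.pop(0)], "log": mirror["log"]}


mirror = {"images": ["/dev/sde1", "/dev/sdb1"], "log": "/dev/sdh1"}
free_pvs = ["/dev/sdc1"]

print(repair_by_rebuild(mirror, "/dev/sde1", free_pvs))
print(repair_in_place(mirror, "/dev/sde1", free_pvs))
```

In this model option (1) puts the new leg on the old log device and the log on /dev/sdc1, as in the report, while option (2) leaves the log on /dev/sdh1.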

Comment 5 Corey Marthaler 2010-07-28 21:23:53 UTC
From a QE perspective, issues like this, where behavior changes from one release to another without any requirements alerting us to the coming changes, make it difficult to plan for test development and test changes.

If behavior is going to be changed, it needs to be written down ahead of time so that we can plan for it, instead of just seeing what behavior has changed after code is given to QE and they deal with test failures.

Comment 8 Jonathan Earl Brassow 2010-08-29 19:24:08 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
When an LVM mirror suffers a device failure, a two-stage recovery takes place.  The first stage involves removing the failed devices.  This can result in the mirror being reduced to a linear device.  The second stage is to attempt to replace any of the failed devices, if configured to do so by the administrator.  There is no guarantee that the second stage will choose devices previously in-use by the mirror that had not been part of the failure if others are available.

Comment 11 Ryan Lerch 2010-09-02 02:58:22 UTC
    Technical note updated. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    Diffed Contents:
@@ -1 +1,3 @@
-When an LVM mirror suffers a device failure, a two-stage recovery takes place.  The first stage involves removing the failed devices.  This can result in the mirror being reduced to a linear device.  The second stage is to attempt to replace any of the failed devices, if configured to do so by the administrator.  There is no gaurentee that the second stage will choose devices previously in-use by the mirror that had not been part of the failure if others are available.
+When an LVM mirror suffers a device failure, a two-stage recovery takes place.
+The first stage involves removing the failed devices.  This can result in the
+mirror being reduced to a linear device. The second stage — if configured to do so by the administrator — is to attempt to replace any of the failed devices. Note, however, that there is no guarantee that the second stage will choose devices previously in-use by the mirror that had not been part of the failure if others are available.

Comment 13 RHEL Program Management 2010-11-22 23:14:52 UTC
Development Management has reviewed and declined this request.  You may appeal
this decision by reopening this request.

