Bug 693855

Summary: mirror device failure during HA lvm service relocation may cause service failure
Product: Red Hat Enterprise Linux 5
Reporter: Jonathan Earl Brassow <jbrassow>
Component: rgmanager
Assignee: Jonathan Earl Brassow <jbrassow>
Status: CLOSED ERRATA
QA Contact: Cluster QE <mspqa-list>
Severity: high
Priority: high
Version: 5.6
CC: cluster-maint, cmarthal, edamato, lhh, mjuricek, rmccabe
Target Milestone: rc
Hardware: x86_64
OS: Linux
Fixed In Version: rgmanager-2.0.52-32.el5
Doc Type: Bug Fix
Clone Of: 683213
Last Closed: 2013-01-08 07:05:00 UTC
Bug Depends On: 683213
Bug Blocks: 807971

Description Jonathan Earl Brassow 2011-04-05 18:42:02 UTC
+++ This bug was initially created as a clone of Bug #683213 +++

Description of problem:
I killed a mirror leg device on the service owner. This appears to have caused the service to be relocated, but the relocation failed.

This is with the "old way" of using tags.

Old Owner:
Mar  8 14:30:04 taft-01 rgmanager[3335]: I am node #1
Mar  8 14:30:05 taft-01 rgmanager[3335]: Resource Group Manager Starting
Mar  8 14:30:05 taft-01 rgmanager[3335]: Loading Service Data
Mar  8 14:30:07 taft-01 rgmanager[3335]: Initializing Services
Mar  8 14:30:07 taft-01 rgmanager[3906]: stop: Could not match /dev/TAFT/ha1 with a real device
Mar  8 14:30:07 taft-01 rgmanager[3335]: stop on fs "fs1" returned 2 (invalid argument(s))
Mar  8 14:30:09 taft-01 rgmanager[3961]: Deactivating TAFT/ha1
Mar  8 14:30:09 taft-01 rgmanager[3983]: Making resilient : lvchange -an TAFT/ha1
Mar  8 14:30:09 taft-01 rgmanager[4008]: Resilient command: lvchange -an TAFT/ha1 --config devices{filter=["a|/dev/sda2|","a|/dev/sdb1|","a|/dev/sdc1|","a|/dev/sdd1|",
Mar  8 14:30:10 taft-01 rgmanager[4032]: Removing ownership tag (taft-01) from TAFT/ha1
Mar  8 14:30:11 taft-01 rgmanager[4156]: Unable to delete tag from TAFT/ha1
Mar  8 14:30:11 taft-01 rgmanager[4178]: Failed to stop TAFT/ha1
Mar  8 14:30:11 taft-01 rgmanager[4200]: Failed to stop TAFT/ha1
Mar  8 14:30:11 taft-01 rgmanager[3335]: stop on lvm "lvm" returned 1 (generic error)
Mar  8 14:30:11 taft-01 rgmanager[3335]: Services Initialized
Mar  8 14:30:11 taft-01 rgmanager[3335]: State change: Local UP
Mar  8 14:30:11 taft-01 rgmanager[3335]: State change: taft-02 UP
Mar  8 14:30:11 taft-01 rgmanager[3335]: State change: taft-03 UP
Mar  8 14:30:11 taft-01 rgmanager[3335]: State change: taft-04 UP

Attempted New Owner:
Mar  8 14:30:04 taft-03 rgmanager[3897]: I am node #3
Mar  8 14:30:04 taft-03 rgmanager[3897]: Resource Group Manager Starting
Mar  8 14:30:04 taft-03 rgmanager[3897]: Loading Service Data
Mar  8 14:30:06 taft-03 rgmanager[3897]: Initializing Services
Mar  8 14:30:06 taft-03 rgmanager[4468]: stop: Could not match /dev/TAFT/ha1 with a real device
Mar  8 14:30:06 taft-03 rgmanager[3897]: stop on fs "fs1" returned 2 (invalid argument(s))
Mar  8 14:30:08 taft-03 rgmanager[4523]: Deactivating TAFT/ha1
Mar  8 14:30:08 taft-03 rgmanager[4545]: Making resilient : lvchange -an TAFT/ha1
Mar  8 14:30:09 taft-03 rgmanager[4570]: Resilient command: lvchange -an TAFT/ha1 --config devices{filter=["a|/dev/sda2|","a|/dev/sdb1|","a|/dev/sdc1|","a|/dev/sdd1|",
Mar  8 14:30:09 taft-03 rgmanager[4594]: Removing ownership tag (taft-03) from TAFT/ha1
Mar  8 14:30:10 taft-03 rgmanager[4723]: Unable to delete tag from TAFT/ha1
Mar  8 14:30:10 taft-03 rgmanager[4745]: Failed to stop TAFT/ha1
Mar  8 14:30:10 taft-03 rgmanager[4767]: Failed to stop TAFT/ha1
Mar  8 14:30:10 taft-03 rgmanager[3897]: stop on lvm "lvm" returned 1 (generic error)
Mar  8 14:30:10 taft-03 rgmanager[3897]: Services Initialized
Mar  8 14:30:10 taft-03 rgmanager[3897]: State change: Local UP
Mar  8 14:30:11 taft-03 rgmanager[3897]: State change: taft-01 UP
Mar  8 14:30:11 taft-03 rgmanager[3897]: State change: taft-04 UP
Mar  8 14:30:11 taft-03 rgmanager[3897]: State change: taft-02 UP
Mar  8 14:30:11 taft-03 rgmanager[3897]: Starting stopped service service:halvm
Mar  8 14:30:13 taft-03 rgmanager[4839]: Activating TAFT/ha1
Mar  8 14:30:13 taft-03 rgmanager[4951]: Unable to add tag to TAFT/ha1
Mar  8 14:30:14 taft-03 rgmanager[4973]: Failed to start TAFT/ha1
Mar  8 14:30:14 taft-03 rgmanager[4995]: Attempting cleanup of TAFT
Mar  8 14:30:14 taft-03 rgmanager[5155]: Failed to make TAFT consistent
Mar  8 14:30:14 taft-03 rgmanager[3897]: start on lvm "lvm" returned 1 (generic error)
Mar  8 14:30:14 taft-03 rgmanager[3897]: #68: Failed to start service:halvm; return value: 1
Mar  8 14:30:14 taft-03 rgmanager[3897]: Stopping service service:halvm
Mar  8 14:30:15 taft-03 rgmanager[5192]: stop: Could not match /dev/TAFT/ha1 with a real device
Mar  8 14:30:15 taft-03 rgmanager[3897]: stop on fs "fs1" returned 2 (invalid argument(s))
Mar  8 14:30:16 taft-03 rgmanager[5247]: Deactivating TAFT/ha1
Mar  8 14:30:17 taft-03 rgmanager[5269]: Making resilient : lvchange -an TAFT/ha1
Mar  8 14:30:17 taft-03 rgmanager[5294]: Resilient command: lvchange -an TAFT/ha1 --config devices{filter=["a|/dev/sda2|","a|/dev/sdb1|","a|/dev/sdc1|","a|/dev/sdd1|",
Mar  8 14:30:18 taft-03 rgmanager[5318]: Removing ownership tag (taft-03) from TAFT/ha1
Mar  8 14:30:18 taft-03 rgmanager[5430]: Unable to delete tag from TAFT/ha1
Mar  8 14:30:18 taft-03 rgmanager[5452]: Failed to stop TAFT/ha1
Mar  8 14:30:19 taft-03 rgmanager[5474]: Failed to stop TAFT/ha1
Mar  8 14:30:19 taft-03 rgmanager[3897]: stop on lvm "lvm" returned 1 (generic error)
Mar  8 14:30:19 taft-03 rgmanager[3897]: #12: RG service:halvm failed to stop; intervention required
Mar  8 14:30:19 taft-03 rgmanager[3897]: Service service:halvm is failed
Mar  8 14:30:19 taft-03 rgmanager[3897]: #13: Service service:halvm failed to stop cleanly
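
To pick the failure lines out of logs like the ones above, a small filter can help. The following is only a sketch: the function name rgmanager_failures and the choice of patterns are assumptions made for illustration, and nothing here is part of rgmanager itself.

```shell
#!/bin/sh
# Hypothetical helper (not part of rgmanager): read syslog lines on stdin
# and print only the rgmanager lines that report a problem.
rgmanager_failures() {
    # The patterns are taken from the messages seen in this report:
    # "Unable to ...", "Failed to ...", "failed to ...", "Could not ...",
    # and resource agents returning a non-zero status.
    grep -E 'rgmanager\[[0-9]+\]: .*(Unable to|[Ff]ailed to|Could not|returned [1-9])'
}

# Example use (the log path is an assumption; adjust for your system):
#   rgmanager_failures < /var/log/messages
```

Run against the logs in this report, the filter keeps the tag, stop and start failures and drops the routine startup and state change messages.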


[root@taft-03 ~]# lvs -a -o +devices
  LV             VG        Attr   LSize  Origin Snap%  Move Log      Copy%  Convert Devices
  ha1            TAFT      mwi---  3.00g                    ha1_mlog                ha1_mimage_0(0),ha1_mimage_1(0),ha1_mimage_2(0)
  [ha1_mimage_0] TAFT      Iwi---  3.00g                                            /dev/sde1(0)
  [ha1_mimage_1] TAFT      Iwi---  3.00g                                            /dev/sdg1(0)
  [ha1_mimage_2] TAFT      Iwi---  3.00g                                            /dev/sdb1(0)
  [ha1_mlog]     TAFT      lwi---  4.00m                                            /dev/sdd1(0)
  lv_home        vg_taft03 -wi-ao 25.64g                                            /dev/sda2(8269)
  lv_root        vg_taft03 -wi-ao 32.30g                                            /dev/sda2(0)
  lv_swap        vg_taft03 -wi-ao  9.81g                                            /dev/sda2(14832)
[root@taft-03 ~]# lvchange -an TAFT
[root@taft-03 ~]# lvchange -ay TAFT
  Not activating TAFT/ha1 since it does not pass activation filter.
[root@taft-03 ~]# vgchange --addtag taft-03 TAFT
  Cannot change VG TAFT while PVs are missing.
  Consider vgreduce --removemissing.

[root@taft-03 ~]# vgreduce --removemissing TAFT
  WARNING: Partial LV ha1 needs to be repaired or removed.
  WARNING: Partial LV ha1_mimage_0 needs to be repaired or removed.
  WARNING: There are still partial LVs in VG TAFT.
  To remove them unconditionally use: vgreduce --removemissing --force.
  Proceeding to remove empty missing PVs.
  Command failed with status code 5.
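
The vgchange error above is what LVM reports when the VG has missing PVs. One way to detect that state before attempting to add a tag is to look at the fourth character of the vg_attr field, which is "p" for a partial VG. This is a minimal sketch under that assumption. The function name vg_is_partial is hypothetical, and this is not the patch attached to this bug.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) if the given vg_attr string
# marks the VG as partial, i.e. one or more PVs are missing.
vg_is_partial() {
    # vgs pads its output with spaces, so strip whitespace first.
    attr=$(printf '%s' "$1" | tr -d '[:space:]')
    # The fourth character of vg_attr is "p" when the VG is partial.
    [ "$(printf '%s' "$attr" | cut -c4)" = "p" ]
}

# Example use on a cluster node (VG name taken from this report):
#   if vg_is_partial "$(vgs --noheadings -o vg_attr TAFT)"; then
#       echo "TAFT has missing PVs; adding a tag will be refused"
#   fi
```

Checking the attribute first only reports the condition. It does not repair the mirror or change which node may activate the LV.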


Version-Release number of selected component (if applicable):
2.6.32-94.el6.x86_64

lvm2-2.02.83-2.el6    BUILT: Tue Feb  8 10:10:57 CST 2011
lvm2-libs-2.02.83-2.el6    BUILT: Tue Feb  8 10:10:57 CST 2011
lvm2-cluster-2.02.83-2.el6    BUILT: Tue Feb  8 10:10:57 CST 2011
udev-147-2.31.el6    BUILT: Wed Jan 26 05:39:15 CST 2011
device-mapper-1.02.62-2.el6    BUILT: Tue Feb  8 10:10:57 CST 2011
device-mapper-libs-1.02.62-2.el6    BUILT: Tue Feb  8 10:10:57 CST 2011
device-mapper-event-1.02.62-2.el6    BUILT: Tue Feb  8 10:10:57 CST 2011
device-mapper-event-libs-1.02.62-2.el6    BUILT: Tue Feb  8 10:10:57 CST 2011
cmirror-2.02.83-2.el6    BUILT: Tue Feb  8 10:10:57 CST 2011

--- Additional comment from jbrassow on 2011-04-05 14:35:20 EDT ---

Created attachment 490060
rhel6 patch for this issue

The problem may also be present in RHEL 5. This patch would fix the issue there as well.

Comment 1 RHEL Program Management 2012-04-02 10:46:41 UTC
This request was evaluated by Red Hat Product Management for inclusion
in a Red Hat Enterprise Linux release.  Product Management has
requested further review of this request by Red Hat Engineering, for
potential inclusion in a Red Hat Enterprise Linux release for currently
deployed products.  This request is not yet committed for inclusion in
a release.

Comment 7 errata-xmlrpc 2013-01-08 07:05:00 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-0026.html