Bug 683213 - mirror device failure during HA lvm service relocation may cause service failure
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: resource-agents
Version: 6.1
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Jonathan Earl Brassow
QA Contact: Cluster QE
URL:
Whiteboard:
Depends On:
Blocks: 693855
Reported: 2011-03-08 20:41 UTC by Corey Marthaler
Modified: 2011-05-19 14:21 UTC

Fixed In Version: resource-agents-3.0.12-21.el6
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 693855 (view as bug list)
Environment:
Last Closed: 2011-05-19 14:21:11 UTC
Target Upstream Version:


Attachments
rhel6 patch for this issue (3.86 KB, patch)
2011-04-05 18:35 UTC, Jonathan Earl Brassow


Links
System: Red Hat Product Errata
ID: RHBA-2011:0744
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: resource-agents bug fix and enhancement update
Last Updated: 2011-05-18 18:09:07 UTC

Description Corey Marthaler 2011-03-08 20:41:37 UTC
Description of problem:
I killed a mirror leg device on the service owner, which appears to have caused the service to be relocated, but that relocation failed.

This is with the "old way" of using tags.

Old Owner:
Mar  8 14:30:04 taft-01 rgmanager[3335]: I am node #1
Mar  8 14:30:05 taft-01 rgmanager[3335]: Resource Group Manager Starting
Mar  8 14:30:05 taft-01 rgmanager[3335]: Loading Service Data
Mar  8 14:30:07 taft-01 rgmanager[3335]: Initializing Services
Mar  8 14:30:07 taft-01 rgmanager[3906]: stop: Could not match /dev/TAFT/ha1 with a real device
Mar  8 14:30:07 taft-01 rgmanager[3335]: stop on fs "fs1" returned 2 (invalid argument(s))
Mar  8 14:30:09 taft-01 rgmanager[3961]: Deactivating TAFT/ha1
Mar  8 14:30:09 taft-01 rgmanager[3983]: Making resilient : lvchange -an TAFT/ha1
Mar  8 14:30:09 taft-01 rgmanager[4008]: Resilient command: lvchange -an TAFT/ha1 --config devices{filter=["a|/dev/sda2|","a|/dev/sdb1|","a|/dev/sdc1|","a|/dev/sdd1|",
Mar  8 14:30:10 taft-01 rgmanager[4032]: Removing ownership tag (taft-01) from TAFT/ha1
Mar  8 14:30:11 taft-01 rgmanager[4156]: Unable to delete tag from TAFT/ha1
Mar  8 14:30:11 taft-01 rgmanager[4178]: Failed to stop TAFT/ha1
Mar  8 14:30:11 taft-01 rgmanager[4200]: Failed to stop TAFT/ha1
Mar  8 14:30:11 taft-01 rgmanager[3335]: stop on lvm "lvm" returned 1 (generic error)
Mar  8 14:30:11 taft-01 rgmanager[3335]: Services Initialized
Mar  8 14:30:11 taft-01 rgmanager[3335]: State change: Local UP
Mar  8 14:30:11 taft-01 rgmanager[3335]: State change: taft-02 UP
Mar  8 14:30:11 taft-01 rgmanager[3335]: State change: taft-03 UP
Mar  8 14:30:11 taft-01 rgmanager[3335]: State change: taft-04 UP

Attempted New Owner:
Mar  8 14:30:04 taft-03 rgmanager[3897]: I am node #3
Mar  8 14:30:04 taft-03 rgmanager[3897]: Resource Group Manager Starting
Mar  8 14:30:04 taft-03 rgmanager[3897]: Loading Service Data
Mar  8 14:30:06 taft-03 rgmanager[3897]: Initializing Services
Mar  8 14:30:06 taft-03 rgmanager[4468]: stop: Could not match /dev/TAFT/ha1 with a real device
Mar  8 14:30:06 taft-03 rgmanager[3897]: stop on fs "fs1" returned 2 (invalid argument(s))
Mar  8 14:30:08 taft-03 rgmanager[4523]: Deactivating TAFT/ha1
Mar  8 14:30:08 taft-03 rgmanager[4545]: Making resilient : lvchange -an TAFT/ha1
Mar  8 14:30:09 taft-03 rgmanager[4570]: Resilient command: lvchange -an TAFT/ha1 --config devices{filter=["a|/dev/sda2|","a|/dev/sdb1|","a|/dev/sdc1|","a|/dev/sdd1|",
Mar  8 14:30:09 taft-03 rgmanager[4594]: Removing ownership tag (taft-03) from TAFT/ha1
Mar  8 14:30:10 taft-03 rgmanager[4723]: Unable to delete tag from TAFT/ha1
Mar  8 14:30:10 taft-03 rgmanager[4745]: Failed to stop TAFT/ha1
Mar  8 14:30:10 taft-03 rgmanager[4767]: Failed to stop TAFT/ha1
Mar  8 14:30:10 taft-03 rgmanager[3897]: stop on lvm "lvm" returned 1 (generic error)
Mar  8 14:30:10 taft-03 rgmanager[3897]: Services Initialized
Mar  8 14:30:10 taft-03 rgmanager[3897]: State change: Local UP
Mar  8 14:30:11 taft-03 rgmanager[3897]: State change: taft-01 UP
Mar  8 14:30:11 taft-03 rgmanager[3897]: State change: taft-04 UP
Mar  8 14:30:11 taft-03 rgmanager[3897]: State change: taft-02 UP
Mar  8 14:30:11 taft-03 rgmanager[3897]: Starting stopped service service:halvm
Mar  8 14:30:13 taft-03 rgmanager[4839]: Activating TAFT/ha1
Mar  8 14:30:13 taft-03 rgmanager[4951]: Unable to add tag to TAFT/ha1
Mar  8 14:30:14 taft-03 rgmanager[4973]: Failed to start TAFT/ha1
Mar  8 14:30:14 taft-03 rgmanager[4995]: Attempting cleanup of TAFT
Mar  8 14:30:14 taft-03 rgmanager[5155]: Failed to make TAFT consistent
Mar  8 14:30:14 taft-03 rgmanager[3897]: start on lvm "lvm" returned 1 (generic error)
Mar  8 14:30:14 taft-03 rgmanager[3897]: #68: Failed to start service:halvm; return value: 1
Mar  8 14:30:14 taft-03 rgmanager[3897]: Stopping service service:halvm
Mar  8 14:30:15 taft-03 rgmanager[5192]: stop: Could not match /dev/TAFT/ha1 with a real device
Mar  8 14:30:15 taft-03 rgmanager[3897]: stop on fs "fs1" returned 2 (invalid argument(s))
Mar  8 14:30:16 taft-03 rgmanager[5247]: Deactivating TAFT/ha1
Mar  8 14:30:17 taft-03 rgmanager[5269]: Making resilient : lvchange -an TAFT/ha1
Mar  8 14:30:17 taft-03 rgmanager[5294]: Resilient command: lvchange -an TAFT/ha1 --config devices{filter=["a|/dev/sda2|","a|/dev/sdb1|","a|/dev/sdc1|","a|/dev/sdd1|",
Mar  8 14:30:18 taft-03 rgmanager[5318]: Removing ownership tag (taft-03) from TAFT/ha1
Mar  8 14:30:18 taft-03 rgmanager[5430]: Unable to delete tag from TAFT/ha1
Mar  8 14:30:18 taft-03 rgmanager[5452]: Failed to stop TAFT/ha1
Mar  8 14:30:19 taft-03 rgmanager[5474]: Failed to stop TAFT/ha1
Mar  8 14:30:19 taft-03 rgmanager[3897]: stop on lvm "lvm" returned 1 (generic error)
Mar  8 14:30:19 taft-03 rgmanager[3897]: #12: RG service:halvm failed to stop; intervention required
Mar  8 14:30:19 taft-03 rgmanager[3897]: Service service:halvm is failed
Mar  8 14:30:19 taft-03 rgmanager[3897]: #13: Service service:halvm failed to stop cleanly
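
To make the failure sequence in the logs above easier to follow, here is a minimal Python sketch that pulls the failing steps out of rgmanager syslog lines. The function names and the list of "failure" message prefixes are my own assumptions, chosen from the excerpts in this report; they are not part of rgmanager or resource-agents.

```python
import re

# Matches syslog lines of the form seen in this report, e.g.:
#   Mar  8 14:30:13 taft-03 rgmanager[4951]: Unable to add tag to TAFT/ha1
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) rgmanager\[(?P<pid>\d+)\]: (?P<msg>.*)$"
)

# Assumption: message prefixes treated as failures, taken only from the
# excerpts in this bug report (not an exhaustive list of rgmanager errors).
FAILURE_PREFIXES = ("Unable to", "Failed to", "stop on", "start on")


def parse_line(line):
    """Return a dict with ts/host/pid/msg, or None if the line does not match."""
    m = LINE_RE.match(line.strip())
    return m.groupdict() if m else None


def failures(lines):
    """Return the parsed records whose message starts with a failure prefix."""
    out = []
    for line in lines:
        rec = parse_line(line)
        if rec and rec["msg"].startswith(FAILURE_PREFIXES):
            out.append(rec)
    return out


# Lines copied verbatim from the "Attempted New Owner" log above.
SAMPLE = """\
Mar  8 14:30:13 taft-03 rgmanager[4839]: Activating TAFT/ha1
Mar  8 14:30:13 taft-03 rgmanager[4951]: Unable to add tag to TAFT/ha1
Mar  8 14:30:14 taft-03 rgmanager[4973]: Failed to start TAFT/ha1
Mar  8 14:30:14 taft-03 rgmanager[4995]: Attempting cleanup of TAFT
Mar  8 14:30:14 taft-03 rgmanager[5155]: Failed to make TAFT consistent
Mar  8 14:30:14 taft-03 rgmanager[3897]: start on lvm "lvm" returned 1 (generic error)
""".splitlines()

if __name__ == "__main__":
    for rec in failures(SAMPLE):
        print(rec["ts"], rec["host"], rec["msg"])
```

Run against the full logs, this kind of filter shows the same pattern on both nodes: the ownership tag cannot be added or removed, and every later start or stop step fails as a consequence.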


[root@taft-03 ~]# lvs -a -o +devices
  LV             VG        Attr   LSize  Origin Snap%  Move Log      Copy%  Convert Devices
  ha1            TAFT      mwi---  3.00g                    ha1_mlog                ha1_mimage_0(0),ha1_mimage_1(0),ha1_mimage_2(0)
  [ha1_mimage_0] TAFT      Iwi---  3.00g                                            /dev/sde1(0)
  [ha1_mimage_1] TAFT      Iwi---  3.00g                                            /dev/sdg1(0)
  [ha1_mimage_2] TAFT      Iwi---  3.00g                                            /dev/sdb1(0)
  [ha1_mlog]     TAFT      lwi---  4.00m                                            /dev/sdd1(0)
  lv_home        vg_taft03 -wi-ao 25.64g                                            /dev/sda2(8269)
  lv_root        vg_taft03 -wi-ao 32.30g                                            /dev/sda2(0)
  lv_swap        vg_taft03 -wi-ao  9.81g                                            /dev/sda2(14832)
[root@taft-03 ~]# lvchange -an TAFT
[root@taft-03 ~]# lvchange -ay TAFT
  Not activating TAFT/ha1 since it does not pass activation filter.
[root@taft-03 ~]# vgchange --addtag taft-03 TAFT
  Cannot change VG TAFT while PVs are missing.
  Consider vgreduce --removemissing.

[root@taft-03 ~]# vgreduce --removemissing TAFT
  WARNING: Partial LV ha1 needs to be repaired or removed.
  WARNING: Partial LV ha1_mimage_0 needs to be repaired or removed.
  WARNING: There are still partial LVs in VG TAFT.
  To remove them unconditionally use: vgreduce --removemissing --force.
  Proceeding to remove empty missing PVs.
  Command failed with status code 5.
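
The "does not pass activation filter" message above is consistent with tag-based HA-LVM, where volume_list in lvm.conf limits activation to listed VGs/LVs and to volumes carrying a matching tag. The Python sketch below is only a model of that matching rule as described in the lvm.conf documentation; the volume_list value is a guess, since the reporter's lvm.conf is not shown in this report, and the function name is hypothetical.

```python
def passes_activation_filter(vg, lv, tags, volume_list, host_tags=()):
    """Model of lvm.conf volume_list matching (a sketch, not LVM code).

    vg, lv      -- names of the volume group and logical volume
    tags        -- tags currently set on the LV or its VG
    volume_list -- entries as they would appear in lvm.conf
    host_tags   -- tags defined on this host (only used by the "@*" entry)
    """
    tags = set(tags)
    for entry in volume_list:
        if entry == "@*":
            # Matches if any host tag is also set on the LV or VG.
            if tags & set(host_tags):
                return True
        elif entry.startswith("@"):
            # "@tag" matches any LV or VG carrying that tag.
            if entry[1:] in tags:
                return True
        elif entry == vg or entry == f"{vg}/{lv}":
            # "vgname" and "vgname/lvname" are matched exactly.
            return True
    return False


# Assumption: a plausible volume_list for node taft-03 (local root VG
# plus the node's own hostname tag). Not taken from the report.
VOLUME_LIST = ["vg_taft03", "@taft-03"]

if __name__ == "__main__":
    # No ownership tag could be added, so TAFT/ha1 is rejected.
    print(passes_activation_filter("TAFT", "ha1", [], VOLUME_LIST))
    # With the tag in place, activation would be allowed.
    print(passes_activation_filter("TAFT", "ha1", ["taft-03"], VOLUME_LIST))
```

Under this model, the failed relocation follows from the transcript: the tag cannot be added while PVs are missing, so the LV never satisfies the filter on the new owner.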


Version-Release number of selected component (if applicable):
2.6.32-94.el6.x86_64

lvm2-2.02.83-2.el6    BUILT: Tue Feb  8 10:10:57 CST 2011
lvm2-libs-2.02.83-2.el6    BUILT: Tue Feb  8 10:10:57 CST 2011
lvm2-cluster-2.02.83-2.el6    BUILT: Tue Feb  8 10:10:57 CST 2011
udev-147-2.31.el6    BUILT: Wed Jan 26 05:39:15 CST 2011
device-mapper-1.02.62-2.el6    BUILT: Tue Feb  8 10:10:57 CST 2011
device-mapper-libs-1.02.62-2.el6    BUILT: Tue Feb  8 10:10:57 CST 2011
device-mapper-event-1.02.62-2.el6    BUILT: Tue Feb  8 10:10:57 CST 2011
device-mapper-event-libs-1.02.62-2.el6    BUILT: Tue Feb  8 10:10:57 CST 2011
cmirror-2.02.83-2.el6    BUILT: Tue Feb  8 10:10:57 CST 2011

Comment 1 Jonathan Earl Brassow 2011-04-05 18:35:20 UTC
Created attachment 490060
rhel6 patch for this issue

The problem may also be present in RHEL 5. This patch would fix the issue there as well.

Comment 3 Corey Marthaler 2011-04-05 18:51:34 UTC
Tests ran overnight (failing mirror devices and relocating services) with the fix in comment #1. The only issues seen were bugs 692186 and 688394.

Comment 8 Corey Marthaler 2011-04-08 22:54:49 UTC
Ran 15 failure iterations (mirror device + service relocation) with the test revolution_9 and didn't see any issues. Marking verified with the latest rpms.


2.6.32-130.el6.x86_64
resource-agents-3.0.12-21.el6.x86_64


lvm2-2.02.83-3.el6    BUILT: Fri Mar 18 09:31:10 CDT 2011
lvm2-libs-2.02.83-3.el6    BUILT: Fri Mar 18 09:31:10 CDT 2011
lvm2-cluster-2.02.83-3.el6    BUILT: Fri Mar 18 09:31:10 CDT 2011
udev-147-2.35.el6    BUILT: Wed Mar 30 07:32:05 CDT 2011
device-mapper-1.02.62-3.el6    BUILT: Fri Mar 18 09:31:10 CDT 2011
device-mapper-libs-1.02.62-3.el6    BUILT: Fri Mar 18 09:31:10 CDT 2011
device-mapper-event-1.02.62-3.el6    BUILT: Fri Mar 18 09:31:10 CDT 2011
device-mapper-event-libs-1.02.62-3.el6    BUILT: Fri Mar 18 09:31:10 CDT 2011
cmirror-2.02.83-3.el6    BUILT: Fri Mar 18 09:31:10 CDT 2011

Comment 9 errata-xmlrpc 2011-05-19 14:21:11 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-0744.html

