428448 – HA LVM service fails to relocate when I/O is running

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 5
Classification:	Red Hat
Component:	rgmanager
Sub Component:
Version:	5.2
Hardware:	All
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	rc
Target Release:	---
Assignee:	Jonathan Earl Brassow
Blocks:	428475

Reported:	2008-01-11 16:58 UTC by Corey Marthaler
Modified:	2009-04-16 22:32 UTC
CC List:	2 users
Fixed In Version:	RHBA-2008-0353
Doc Type:	Bug Fix
Last Closed:	2008-05-21 14:30:54 UTC

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2008:0353	0	normal	SHIPPED_LIVE	rgmanager bug fix and enhancement update	2008-05-20 12:46:24 UTC

Description Corey Marthaler 2008-01-11 16:58:08 UTC

Description of problem:
I attempted to relocate my lvm service with I/O running, and it failed.

[root@hayes-02 cluster]# clusvcadm -r halvm -m hayes-03
Trying to relocate service:halvm to hayes-03...Failure

This is a 3-node cluster with 4 LVs/filesystems in the service, and I/O was
running to all of the filesystems during the relocation attempt.

HAYES-01:
Jan 11 05:36:27 hayes-01 clurgmgrd[20500]: <notice> Stopping service service:halvm
Jan 11 05:36:29 hayes-01 clurgmgrd: [20500]: <notice> Forcefully unmounting /mnt/fs4
Jan 11 05:36:29 hayes-01 clurgmgrd: [20500]: <warning> killing process 27558
(root xdoio /mnt/fs4)
Jan 11 05:36:41 hayes-01 clurgmgrd: [20500]: <notice> Forcefully unmounting /mnt/fs3
Jan 11 05:36:41 hayes-01 clurgmgrd: [20500]: <warning> killing process 27556
(root xdoio /mnt/fs3)
Jan 11 05:36:53 hayes-01 clurgmgrd: [20500]: <notice> Forcefully unmounting /mnt/fs2
Jan 11 05:36:53 hayes-01 clurgmgrd: [20500]: <warning> killing process 27557
(root xdoio /mnt/fs2)
Jan 11 05:37:04 hayes-01 clurgmgrd: [20500]: <notice> Forcefully unmounting /mnt/fs1
Jan 11 05:37:04 hayes-01 clurgmgrd: [20500]: <warning> killing process 27555
(root xdoio /mnt/fs1)
Jan 11 05:37:15 hayes-01 clurgmgrd: [20500]: <err> initrd image is newer than
lvm.conf [GOOD]
Jan 11 05:37:16 hayes-01 clurgmgrd[20500]: <notice> Service service:halvm is stopped
Jan 11 05:37:19 hayes-01 clurgmgrd[20500]: <err> #58: Failed opening connection
to member #2
Jan 11 05:37:19 hayes-01 clurgmgrd[20500]: <warning> #70: Failed to relocate
service:halvm; restarting locally
Jan 11 05:37:19 hayes-01 clurgmgrd[20500]: <notice> Recovering failed service
service:halvm
Jan 11 05:37:19 hayes-01 clurgmgrd: [20500]: <err> initrd image is newer than
lvm.conf [GOOD]
Jan 11 05:37:20 hayes-01 kernel: kjournald starting.  Commit interval 5 seconds
Jan 11 05:37:20 hayes-01 kernel: EXT3 FS on dm-2, internal journal
Jan 11 05:37:20 hayes-01 kernel: EXT3-fs: mounted filesystem with ordered data mode.
Jan 11 05:37:20 hayes-01 kernel: kjournald starting.  Commit interval 5 seconds
Jan 11 05:37:20 hayes-01 kernel: EXT3 FS on dm-3, internal journal
Jan 11 05:37:20 hayes-01 kernel: EXT3-fs: mounted filesystem with ordered data mode.
Jan 11 05:37:20 hayes-01 kernel: kjournald starting.  Commit interval 5 seconds
Jan 11 05:37:20 hayes-01 kernel: EXT3 FS on dm-4, internal journal
Jan 11 05:37:20 hayes-01 kernel: EXT3-fs: mounted filesystem with ordered data mode.
Jan 11 05:37:21 hayes-01 kernel: kjournald starting.  Commit interval 5 seconds
Jan 11 05:37:21 hayes-01 kernel: EXT3 FS on dm-5, internal journal
Jan 11 05:37:21 hayes-01 kernel: EXT3-fs: mounted filesystem with ordered data mode.
Jan 11 05:37:21 hayes-01 clurgmgrd[20500]: <notice> Service service:halvm started
Jan 11 05:37:23 hayes-01 clurgmgrd: [20500]: <notice> Getting status


HAYES-02:
Jan 11 05:37:05 hayes-02 clurgmgrd[1633]: <err> #37: Error receiving header from
1 sz=0 CTX 0x2aaaac000cf0


HAYES-03:
Jan 11 05:39:16 hayes-03 clurgmgrd[4180]: <notice> Starting stopped service
service:halvm
Jan 11 05:39:16 hayes-03 clurgmgrd: [4180]: <err> initrd image is newer than
lvm.conf [GOOD]
Jan 11 05:39:17 hayes-03 clurgmgrd: [4180]: <err> Failed to add ownership tag to
HAYES
Jan 11 05:39:17 hayes-03 clurgmgrd: [4180]: <err> Failed to activate volume
group, HAYES
Jan 11 05:39:17 hayes-03 clurgmgrd: [4180]: <notice> Attempting cleanup of HAYES
Jan 11 05:39:17 hayes-03 clurgmgrd: [4180]: <err> Failed to make HAYES consistent
Jan 11 05:39:17 hayes-03 clurgmgrd[4180]: <notice> start on lvm "lvm" returned 1
(generic error)
Jan 11 05:39:17 hayes-03 clurgmgrd[4180]: <warning> #68: Failed to start
service:halvm; return value: 1
Jan 11 05:39:17 hayes-03 clurgmgrd[4180]: <notice> Stopping service service:halvm
Jan 11 05:39:17 hayes-03 clurgmgrd: [4180]: <err> stop: Could not match
/dev/HAYES/ha4 with a real device
Jan 11 05:39:17 hayes-03 clurgmgrd[4180]: <notice> stop on fs "fs4" returned 2
(invalid argument(s))
Jan 11 05:39:17 hayes-03 clurgmgrd: [4180]: <err> stop: Could not match
/dev/HAYES/ha3 with a real device
Jan 11 05:39:17 hayes-03 clurgmgrd[4180]: <notice> stop on fs "fs3" returned 2
(invalid argument(s))
Jan 11 05:39:17 hayes-03 clurgmgrd: [4180]: <err> stop: Could not match
/dev/HAYES/ha2 with a real device
Jan 11 05:39:17 hayes-03 clurgmgrd[4180]: <notice> stop on fs "fs2" returned 2
(invalid argument(s))
Jan 11 05:39:17 hayes-03 clurgmgrd: [4180]: <err> stop: Could not match
/dev/HAYES/ha1 with a real device
Jan 11 05:39:17 hayes-03 clurgmgrd[4180]: <notice> stop on fs "fs1" returned 2
(invalid argument(s))
Jan 11 05:39:17 hayes-03 clurgmgrd: [4180]: <err> initrd image is newer than
lvm.conf [GOOD]
Jan 11 05:39:17 hayes-03 clurgmgrd[4180]: <notice> Service service:halvm is
recovering


After the failure, the service remains running on the initial node:
[root@hayes-01 cluster]# clustat
Cluster Status for HAYES @ Fri Jan 11 05:55:46 2008
Member Status: Quorate

 Member Name                      ID   Status
 ------ ----                      ---- ------
 hayes-01                             1 Online, Local, rgmanager
 hayes-02                             2 Online, rgmanager
 hayes-03                             3 Online, rgmanager

 Service Name            Owner (Last)            State
 ------- ----            ----- ------            -----
 service:halvm           hayes-01                started


[root@hayes-01 cluster]# cman_tool nodes
Node  Sts   Inc   Joined               Name
   1   M    388   2008-01-11 05:04:58  hayes-01
   2   M    392   2008-01-11 05:04:59  hayes-02
   3   M    404   2008-01-11 05:19:03  hayes-03



Version-Release number of selected component (if applicable):
2.6.18-62.el5
rgmanager-2.0.32-4.el5
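
For anyone triaging similar output, the clurgmgrd excerpts above can be reduced to just the <err> and <warning> entries with a small script. This is only a rough sketch and not part of rgmanager: the function names, the Entry type, and the regular expressions are assumptions made for illustration, and the pattern is derived only from the log lines quoted in this report. It assumes the input is pasted log text without the per-node headings, and it rejoins lines that were wrapped when pasted.

```python
import re
from typing import Iterable, List, NamedTuple, Sequence

# A line that starts a new syslog entry, e.g. "Jan 11 05:37:19 ".
STAMP_RE = re.compile(r"^\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2}\s")

# Matches both "clurgmgrd[20500]:" and "clurgmgrd: [20500]:" as seen in the report.
ENTRY_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+"
    r"(?P<host>\S+)\s+clurgmgrd:?\s*\[(?P<pid>\d+)\]:\s+"
    r"<(?P<level>\w+)>\s+(?P<msg>.*)$"
)


class Entry(NamedTuple):
    """Hypothetical record type for one clurgmgrd log entry."""
    stamp: str
    host: str
    level: str
    message: str


def join_wrapped(lines: Iterable[str]) -> List[str]:
    """Re-attach continuation lines (no leading timestamp) to the previous entry."""
    joined: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if STAMP_RE.match(line) or not joined:
            joined.append(line)
        else:
            joined[-1] += " " + line
    return joined


def parse_entries(lines: Iterable[str]) -> List[Entry]:
    """Parse clurgmgrd entries; other lines (e.g. kernel messages) are skipped."""
    entries: List[Entry] = []
    for line in join_wrapped(lines):
        m = ENTRY_RE.match(line)
        if m:
            entries.append(Entry(m["stamp"], m["host"], m["level"], m["msg"]))
    return entries


def problems(entries: Sequence[Entry],
             levels: Sequence[str] = ("err", "warning")) -> List[Entry]:
    """Keep only entries whose severity tag is in `levels`."""
    return [e for e in entries if e.level in levels]


# Sample lines copied from the excerpts in this report, wrapping included.
SAMPLE = """\
Jan 11 05:37:16 hayes-01 clurgmgrd[20500]: <notice> Service service:halvm is stopped
Jan 11 05:37:19 hayes-01 clurgmgrd[20500]: <err> #58: Failed opening connection
to member #2
Jan 11 05:37:19 hayes-01 clurgmgrd[20500]: <warning> #70: Failed to relocate
service:halvm; restarting locally
Jan 11 05:39:17 hayes-03 clurgmgrd: [4180]: <err> Failed to add ownership tag to
HAYES
"""

if __name__ == "__main__":
    for e in problems(parse_entries(SAMPLE.splitlines())):
        print(f"{e.stamp} {e.host} <{e.level}> {e.message}")
```

Note that the "initrd image is newer than lvm.conf [GOOD]" messages are logged at <err> level in the excerpts above even though they are tagged [GOOD], so a filter like this would still list them.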

Comment 1 RHEL Program Management 2008-01-11 21:36:08 UTC

This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 4 errata-xmlrpc 2008-05-21 14:30:54 UTC

An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2008-0353.html
