Bug 639103 - last_owner is not correctly updated on service failover
Summary: last_owner is not correctly updated on service failover
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: rgmanager
Version: 6.0
Hardware: All
OS: Linux
Priority: low
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Lon Hohberger
QA Contact: Cluster QE
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2010-09-30 20:40 UTC by Lon Hohberger
Modified: 2011-05-19 14:18 UTC (History)
CC List: 4 users

Fixed In Version: rgmanager-3.0.12-11.el6
Doc Type: Bug Fix
Doc Text:
Clone Of: 610483
Environment:
Last Closed: 2011-05-19 14:18:21 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2011:0750 0 normal SHIPPED_LIVE rgmanager bug fix and enhancement update 2011-05-18 18:09:01 UTC

Description Lon Hohberger 2010-09-30 20:40:05 UTC
+++ This bug was initially created as a clone of Bug #610483 +++

Description of problem:

After a node crash, when the service that node was running is recovered on another node, the last_owner reported by the 'clustat -l' utility is wrong.

last_owner is only updated by the 'service stop' logic; nothing updates it in the failover case.

Version-Release number of selected component (if applicable):

rgmanager-2.0.52-6.el5.centos


How reproducible:


Steps to Reproduce:
1. Create service SRV1 on node N1 with relocation rules to node N2 on failover
2. Start SRV1 on N1
3. Fence node N1
4. After the service has started on N2, check last_owner for SRV1 with 'clustat'


Additional info:
Temporary fix for the problem.

[root@dim-ws rgmanager-2.0.52]#  diff -u src/daemons/rg_state.c.orig src/daemons/rg_state.c
--- src/daemons/rg_state.c.orig 2010-07-02 12:18:58.000000000 +0400
+++ src/daemons/rg_state.c      2010-07-02 12:19:07.000000000 +0400
@@ -681,6 +681,7 @@
                /*
                 * Service is running but owner is down -> RG_EFAILOVER
                 */
+               svcStatus->rs_last_owner = svcStatus->rs_owner;
                clulog(LOG_NOTICE,
                       "Taking over service %s from down member %s\n",
                       svcName, memb_id_to_name(membership,

Comment 1 Lon Hohberger 2010-09-30 21:20:17 UTC
[root@snap ~]# clustat
Cluster Status for sereal @ Thu Sep 30 17:17:55 2010
Member Status: Quorate

 Member Name                                  ID   Status
 ------ ----                                  ---- ------
 snap                                             1 Online, Local, rgmanager
 crackle                                          2 Online, rgmanager

 Service Name                        Owner (Last)                        State         
 ------- ----                        ----- ------                        -----         
 service:test                        snap                                started       
[root@snap ~]# clusvcadm -r test
Trying to relocate service:test...Failure: Service is frozen
[root@snap ~]# clusvcadm -U test
Local machine unfreezing service:test...Success
[root@snap ~]# clusvcadm -r test
Trying to relocate service:test...clustat -l
Success
service:test is now running on crackle
[root@snap ~]# clustat -l
Cluster Status for sereal @ Thu Sep 30 17:18:32 2010
Member Status: Quorate

 Member Name                                  ID   Status
 ------ ----                                  ---- ------
 snap                                             1 Online, Local, rgmanager
 crackle                                          2 Online, rgmanager

Service Information
------- -----------

Service Name      : service:test
  Current State   : started (112)
  Owner           : crackle
  Last Owner      : snap
  Last Transition : Thu Sep 30 17:18:27 2010


Yeah, this is just for failover.   Ok.

Comment 5 Corey Marthaler 2011-04-14 23:02:10 UTC
Is the last owner now only supposed to show up in the long output format?

The regular (Last) clustat output is still blank.


[root@taft-01 sbin]# clustat
Cluster Status for TAFT @ Thu Apr 14 17:54:48 2011
Member Status: Quorate

 Member Name       ID   Status
 ------ ----       ---- ------
 taft-01               1 Online, Local, rgmanager
 taft-02               2 Online, rgmanager
 taft-03               3 Online, rgmanager
 taft-04               4 Online, rgmanager

 Service Name       Owner (Last)       State
 ------- ----       ----- ------       -----
 service:nfs1       taft-03            started

# FAILOVER/FENCING TAKES PLACE

[root@taft-01 sbin]# clustat
Cluster Status for TAFT @ Thu Apr 14 17:55:43 2011
Member Status: Quorate

 Member Name       ID   Status
 ------ ----       ---- ------
 taft-01               1 Online, Local, rgmanager
 taft-02               2 Online, rgmanager
 taft-03               3 Offline
 taft-04               4 Online, rgmanager

 Service Name       Owner (Last)       State
 ------- ----       ----- ------       -----
 service:nfs1       taft-01            started

[root@taft-01 sbin]# clustat -l
Cluster Status for TAFT @ Thu Apr 14 17:56:19 2011
Member Status: Quorate

 Member Name       ID   Status
 ------ ----       ---- ------
 taft-01               1 Online, Local, rgmanager
 taft-02               2 Online, rgmanager
 taft-03               3 Offline
 taft-04               4 Online, rgmanager

Service Information
------- -----------

Service Name      : service:nfs1
  Current State   : started (112)
  Flags           : none (0)
  Owner           : taft-01
  Last Owner      : taft-03
  Last Transition : Thu Apr 14 17:55:19 2011

Comment 6 Lon Hohberger 2011-04-15 16:12:00 UTC
That's correct.

clustat reports (last_owner) if the service is in the stopped state.

You will see the last owner in the output of:

  clustat -l
  clustat -x

Comment 7 Corey Marthaler 2011-04-15 16:24:39 UTC
Marking verified based on comments #5 and #6.

Comment 8 errata-xmlrpc 2011-05-19 14:18:21 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-0750.html

