Bug 714671

Summary: rgmanager with central processing behaves differently depending on the order in which services appear in cluster.conf

Product: Red Hat Enterprise Linux 5
Component: rgmanager
Version: 5.6
Reporter: Julio Entrena Perez <jentrena>
Assignee: Lon Hohberger <lhh>
QA Contact: Cluster QE <mspqa-list>
CC: cluster-maint, edamato, jentrena
Status: CLOSED DUPLICATE
Severity: medium
Priority: unspecified
Target Milestone: rc
Target Release: ---
Hardware: x86_64
OS: Linux
Doc Type: Bug Fix
Last Closed: 2011-06-21 13:43:05 UTC
Attachments:
- cluster.conf file that shows the deferred RIND event taking place
- cluster.conf file that shows the RIND event taking place when expected

Description Julio Entrena Perez 2011-06-20 12:10:28 UTC
Created attachment 505599 [details]
cluster.conf file that shows the deferred RIND event taking place

> Description of problem:
Depending on the order in which services appear in cluster.conf, rgmanager with central processing defers execution of the RIND script follow-service.sl during a failover.

> Version-Release number of selected component (if applicable):
rgmanager-2.0.52-9.el5_6.1.

> How reproducible:
Always.

> Steps to Reproduce:
1. Create a cluster.conf file with:
- two nodes (1 and 2) and a quorum disk.
- two failover domains:
  - FD1 includes both nodes and prefers the first.
  - FD2 includes the second node only.
- five services (defined in this order):
  - service A uses FD1.
  - service D uses FD2.
  - service E uses FD2.
  - service B uses FD1 and has a hard dependency on service A.
  - service C uses FD1 and has a hard dependency on service A.

- central processing enabled for rgmanager.

- four RIND events:
  - event class service, follow-service.sl A,D,A.
  - event class node, follow-service.sl A,D,A.
  - event class service, follow-service.sl A,E,A.
  - event class node, follow-service.sl A,E,A.

2. Start all services. Services A, B and C will start on node 1; D and E will start on node 2.

3. Stop the rgmanager service on node 1.
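
The layout described in step 1 corresponds roughly to an <rm> section like the sketch below. This is illustrative only, not taken from the attached files: node and service names are hypothetical, and the depend/depend_mode attributes express the "hard dependency on service A" in the rgmanager service-dependency syntax.

  <rm central_processing="1">
          <failoverdomains>
                  <!-- FD1: both nodes, node 1 preferred -->
                  <failoverdomain name="FD1" ordered="1" restricted="1">
                          <failoverdomainnode name="node1" priority="1"/>
                          <failoverdomainnode name="node2" priority="2"/>
                  </failoverdomain>
                  <!-- FD2: node 2 only -->
                  <failoverdomain name="FD2" restricted="1">
                          <failoverdomainnode name="node2"/>
                  </failoverdomain>
          </failoverdomains>

          <!-- Definition order matters: this is the ordering that
               triggers the misbehaviour -->
          <service name="A" domain="FD1"/>
          <service name="D" domain="FD2"/>
          <service name="E" domain="FD2"/>
          <service name="B" domain="FD1" depend="service:A" depend_mode="hard"/>
          <service name="C" domain="FD1" depend="service:A" depend_mode="hard"/>

          <events>
                  <!-- four RIND events invoking follow-service.sl for
                       (A,D) and (A,E), in both the "service" and "node"
                       event classes; see the attached cluster.conf files
                       for the exact <event> syntax -->
          </events>
  </rm>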
  
> Actual results:
- Services A, B and C are stopped on node 1.
- Services A, B and C are started on node 2.
- Services D and E are stopped on node 2.

> Expected results:
- Services A, B and C are stopped on node 1.
- Service A is started on node 2.
- Services D and E are stopped on node 2.
- Services B and C are started on node 2.

> Additional info:
If services are defined in cluster.conf in the following order, the expected behaviour is achieved:
  - service B uses FD1 and has a hard dependency on service A.
  - service C uses FD1 and has a hard dependency on service A.
  - service A uses FD1.
  - service D uses FD2.
  - service E uses FD2.

Please note that the only difference between the two files is the order in which the services appear in cluster.conf; no property of any service is modified at all.

See attached 'cluster.conf.works' and 'cluster.conf.doesntwork' files.

Comment 1 Julio Entrena Perez 2011-06-20 12:11:10 UTC
Created attachment 505600 [details]
cluster.conf file that shows the RIND event taking place when expected

Comment 2 Julio Entrena Perez 2011-06-20 12:12:14 UTC
# diff cluster.conf.works cluster.conf.doesntwork 
2c2
< <cluster config_version="123" name="cl55ase">
---
> <cluster config_version="124" name="cl55ase">
54a55,62
>                 <service autostart="0" domain="fd_n1" exclusive="0" name="IWR_db_p1" nfslock="1" recovery="relocate">
>                         <fs ref="fs_share">
>                                 <nfsexport ref="nfs_share">
>                                         <nfsclient ref="nfsc_share"/>
>                                 </nfsexport>
>                         </fs>
>                         <ip ref="10.33.1.250"/>
>                 </service>
63,70d70
< 		<service autostart="0" domain="fd_n1" exclusive="0" name="IWR_db_p1" nfslock="1" recovery="relocate">
< 			<fs ref="fs_share">
< 				<nfsexport ref="nfs_share">
< 					<nfsclient ref="nfsc_share"/>
< 				</nfsexport>
< 			</fs>
< 			<ip ref="10.33.1.250"/>
< 		</service>

Comment 4 Julio Entrena Perez 2011-06-20 12:14:35 UTC
I forgot to mention that the behaviour is the same regardless of which node is the RG-master and which one is the RG-worker.

Comment 5 Lon Hohberger 2011-06-21 13:43:05 UTC
The way event processing works is highly dependent on configuration order. For example, given:

  <service name="a" depend="b"/>
  <service name="b" />

When a node comes online, you will see the following:

  [node 1 online]
    start service a (can't; dependency not met)
    start service b
  [service b started]
    dependency met, start service a
  [service a started]

If you change the order:

  <service name="b" />
  <service name="a" depend="b"/>

You will see:

  [node 1 online]
    start service b
    start service a
  [service b started]
  [service a started]

You can create specific priority ordering for node events by adding the 'priority' attribute to services:

  <service name="a" depend="b" priority="2" />
  <service name="b" priority="1" />

... or by reordering them in cluster.conf.
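
Putting the two suggestions together: with central processing enabled on the <rm> element, a dependency-safe ordering can be stated explicitly with priorities regardless of file order. A minimal sketch (names and the depend shorthand follow the snippets above; they are illustrative, not the reporter's configuration):

  <rm central_processing="1">
          <!-- priority overrides file order: b (priority 1) is
               started before a (priority 2), so a's dependency is
               already met when a is started -->
          <service name="b" priority="1"/>
          <service name="a" depend="b" priority="2"/>
  </rm>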

*** This bug has been marked as a duplicate of bug 492828 ***