Bug 714671 - rgmanager with central processing behaves differently depending on the order in which services appear in cluster.conf
Keywords:
Status: CLOSED DUPLICATE of bug 492828
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: rgmanager
Version: 5.6
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Lon Hohberger
QA Contact: Cluster QE
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2011-06-20 12:10 UTC by Julio Entrena Perez
Modified: 2011-06-24 13:01 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-06-21 13:43:05 UTC
Target Upstream Version:


Attachments
cluster.conf file that shows the deferred RIND event taking place (4.95 KB, application/octet-stream)
2011-06-20 12:10 UTC, Julio Entrena Perez
cluster.conf file that shows the RIND event taking place when expected (4.77 KB, application/octet-stream)
2011-06-20 12:11 UTC, Julio Entrena Perez

Description Julio Entrena Perez 2011-06-20 12:10:28 UTC
Created attachment 505599 [details]
cluster.conf file that shows the deferred RIND event taking place

> Description of problem:
Depending on the order in which services appear in cluster.conf, rgmanager with central processing defers the RIND script follow-service.sl during a failover.

> Version-Release number of selected component (if applicable):
rgmanager-2.0.52-9.el5_6.1

> How reproducible:
Always.

> Steps to Reproduce:
1. Create a cluster.conf file with:
- two nodes 1, 2 and a quorum disk.
- two failover domains:
  - FD1 includes both nodes and prefers the first.
  - FD2 includes the second node only.
- five services (defined in this order):
  - service A uses FD1.
  - service D uses FD2.
  - service E uses FD2.
  - service B uses FD1 and has a hard dependency on service A.
  - service C uses FD1 and has a hard dependency on service A.

- central processing enabled for rgmanager.

- four RIND events:
  - event class service follow-service.sl A,D,A.
  - event class node follow-service.sl A,D,A.
  - event class service follow-service.sl A,E,A.
  - event class node follow-service.sl A,E,A.

2. Start all services. A, B and C will start on node 1; D and E will start on node 2.

3. Stop the rgmanager service on node 1.

> Actual results:
- Services A, B and C are stopped on node 1.
- Services A, B and C are started on node 2.
- Services D and E are stopped on node 2.

> Expected results:
- Services A, B and C are stopped on node 1.
- Service A is started on node 2.
- Services D and E are stopped on node 2.
- Services B and C are started on node 2.

> Additional info:
If services are defined in cluster.conf in the following order, the expected behaviour is achieved:
  - service B uses FD1 and has a hard dependency on service A.
  - service C uses FD1 and has a hard dependency on service A.
  - service A uses FD1.
  - service D uses FD2.
  - service E uses FD2.

Please note that the only difference is the order in which services appear in cluster.conf. No property of any service is modified.

See attached 'cluster.conf.works' and 'cluster.conf.doesntwork' files.
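
The two orderings described above can be compared programmatically. The following is only a sketch: the XML strings are simplified stand-ins for the attached files (not their real contents), the depend="A" form follows the shorthand used in this report rather than the exact cluster.conf syntax, and the function names are hypothetical. It lists services in document order and reports which ones are defined before the service they depend on.

```python
import xml.etree.ElementTree as ET


def service_order(xml_text):
    """Return (name, domain, depend) for each <service>, in document order."""
    root = ET.fromstring(xml_text)
    return [
        (svc.get("name"), svc.get("domain"), svc.get("depend"))
        for svc in root.iter("service")
    ]


def dependents_listed_first(xml_text):
    """Return names of services that appear before the service they depend on."""
    order = service_order(xml_text)
    position = {name: index for index, (name, _, _) in enumerate(order)}
    early = []
    for index, (name, _, depend) in enumerate(order):
        if depend in position and index < position[depend]:
            early.append(name)
    return early


# Simplified stand-in for the ordering that defers the RIND event.
DEFERRED_ORDER = """<cluster name="example" config_version="1"><rm>
  <service name="A" domain="FD1"/>
  <service name="D" domain="FD2"/>
  <service name="E" domain="FD2"/>
  <service name="B" domain="FD1" depend="A"/>
  <service name="C" domain="FD1" depend="A"/>
</rm></cluster>"""

# Simplified stand-in for the ordering that behaves as expected.
EXPECTED_ORDER = """<cluster name="example" config_version="1"><rm>
  <service name="B" domain="FD1" depend="A"/>
  <service name="C" domain="FD1" depend="A"/>
  <service name="A" domain="FD1"/>
  <service name="D" domain="FD2"/>
  <service name="E" domain="FD2"/>
</rm></cluster>"""

if __name__ == "__main__":
    print(dependents_listed_first(DEFERRED_ORDER))
    print(dependents_listed_first(EXPECTED_ORDER))
```

In this sketch the ordering that behaves as expected is the one where B and C are listed before A, the service they depend on.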

Comment 1 Julio Entrena Perez 2011-06-20 12:11:10 UTC
Created attachment 505600 [details]
cluster.conf file that shows the RIND event taking place when expected

Comment 2 Julio Entrena Perez 2011-06-20 12:12:14 UTC
# diff cluster.conf.works cluster.conf.doesntwork 
2c2
< <cluster config_version="123" name="cl55ase">
---
> <cluster config_version="124" name="cl55ase">
54a55,62
>                 <service autostart="0" domain="fd_n1" exclusive="0" name="IWR_db_p1" nfslock="1" recovery="relocate">
>                         <fs ref="fs_share">
>                                 <nfsexport ref="nfs_share">
>                                         <nfsclient ref="nfsc_share"/>
>                                 </nfsexport>
>                         </fs>
>                         <ip ref="10.33.1.250"/>
>                 </service>
63,70d70
< 		<service autostart="0" domain="fd_n1" exclusive="0" name="IWR_db_p1" nfslock="1" recovery="relocate">
< 			<fs ref="fs_share">
< 				<nfsexport ref="nfs_share">
< 					<nfsclient ref="nfsc_share"/>
< 				</nfsexport>
< 			</fs>
< 			<ip ref="10.33.1.250"/>
< 		</service>

Comment 4 Julio Entrena Perez 2011-06-20 12:14:35 UTC
I forgot to mention that the behaviour is the same regardless of which node is the RG-master and which one is the RG-worker.

Comment 5 Lon Hohberger 2011-06-21 13:43:05 UTC
Event processing depends heavily on the order in which services appear in the configuration. For example:

  <service name="a" depend="b"/>
  <service name="b" />

When a node comes online, you will see the following:

  [node 1 online]
    start service a (can't; dependency not met)
    start service b
  [service b started]
    dependency met, start service a
  [service a started]

If you change the order:

  <service name="b" />
  <service name="a" depend="b"/>

You will see:

  [node 1 online]
    start service b
    start service a
  [service b started]
  [service a started]

You can create specific priority ordering for node events by adding the 'priority' attribute to services:

  <service name="a" depend="b" priority="2" />
  <service name="b" priority="1" />

... or by reordering them in cluster.conf.
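
The traces above can be reproduced with a toy model. This is not rgmanager's implementation, only a sketch under stated assumptions: start requests issued while handling the node event take effect immediately, "service started" events are queued and handled afterwards, a lower 'priority' value is handled first, and services without a priority default to 0. The function name and the dict layout are hypothetical.

```python
from collections import deque


def simulate_node_online(services):
    """Simulate a node-online event for services given in cluster.conf order.

    Each service is a dict with "name" and optional "depend" and "priority".
    """
    # Assumption: lower priority first; sorted() is stable, so services with
    # equal priority keep their configuration order.
    ordered = sorted(services, key=lambda s: s.get("priority", 0))
    running = set()
    started_events = deque()
    log = ["[node online]"]

    # Pass 1: the node event tries to start every service in order.
    for svc in ordered:
        dep = svc.get("depend")
        if dep is not None and dep not in running:
            log.append(f"start service {svc['name']} (can't; dependency not met)")
            continue
        running.add(svc["name"])
        started_events.append(svc["name"])
        log.append(f"start service {svc['name']}")

    # Pass 2: each "service started" event re-checks deferred dependents.
    while started_events:
        started = started_events.popleft()
        log.append(f"[service {started} started]")
        for svc in ordered:
            if svc.get("depend") == started and svc["name"] not in running:
                running.add(svc["name"])
                started_events.append(svc["name"])
                log.append(f"dependency met, start service {svc['name']}")
    return log


if __name__ == "__main__":
    dependent_first = [{"name": "a", "depend": "b"}, {"name": "b"}]
    for line in simulate_node_online(dependent_first):
        print(line)
```

With the dependent service listed first, the model defers service a until the "service b started" event is handled; listing b first, or giving b a lower priority value, starts both services during the node event itself.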

*** This bug has been marked as a duplicate of bug 492828 ***

