Description of problem:
A node should run at most one service marked as exclusive at any moment, but
exclusivity seems to be honored only during failover, relocation without a
specified node, and (in most cases) initial startup. It is possible to run
multiple exclusive services on a single node with "clusvcadm -e <service>"
(without -m/-n), "clusvcadm -r <service> -m <node>", and, in some cases,
during initial daemon startup.
This can lead to serious problems, depending on the reason exclusivity was
required by the application under cluster control. In my case, logical data
corruption can result because one application instance could modify a mix of
data files belonging to two services. I call the corruption logical because
it is not filesystem-level corruption, but it can nonetheless result in
unrecoverable damage. This is bordering on urgent severity.
Version-Release number of selected component (if applicable):
(same result with rgmanager-1.9.34-1.i386.rpm)
How reproducible: always
Steps to Reproduce:
1. Create a simple cluster with 3 nodes (N1, N2, N3) and 2 services (S1 and S2).
Each service can be a simple service with one IP address. Set exclusive="1" in
2. Start the daemons. Sometimes, both S1 and S2 end up getting started on the
same node. This problem doesn't happen all the time.
3a. Start S1 on N1, S2 on N2. Login to N1.
3b. clusvcadm -d S2; clusvcadm -e S2
3c. Both S1 and S2 are started on N1.
4a. Start S1 on N1, S2 on N2. Login to N1.
4b. clusvcadm -r S2 -m N1; clustat
4c. Both S1 and S2 are started on N1.
5a. Start S1 on N1, S2 on N2. Login to N1.
5b. clusvcadm -r S2 : S2 moves to N3.
5c. clusvcadm -r S2 (again) : S2 moves back to N2.
5d. Repeat. S2 moves back and forth between N2 and N3. So relocation works
correctly as long as the target node is not specified.
Actual results: Both exclusive services are started on the same node.
Expected results: An exclusive service should refuse to start on a node that is
already running another exclusive service.
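
The outcome of steps 3c and 4c can be checked mechanically. The following is a
minimal sketch in Python, assuming the service-to-owner mapping has already
been collected by hand (for example by reading clustat output). The function
name and the dictionary layout are hypothetical and not part of rgmanager.

```python
from collections import defaultdict


def colocated_exclusive(owners, exclusive):
    """Return {node: [services]} for nodes running more than one exclusive service.

    owners    -- hypothetical mapping of service name -> node it runs on
    exclusive -- set of service names configured with exclusive="1"
    """
    by_node = defaultdict(list)
    for service, node in owners.items():
        if service in exclusive:
            by_node[node].append(service)
    # Keep only the nodes that violate the "at most one exclusive service" rule.
    return {node: sorted(svcs) for node, svcs in by_node.items() if len(svcs) > 1}


# State after step 3c or 4c: both services ended up on N1.
violations = colocated_exclusive({"S1": "N1", "S2": "N1"}, {"S1", "S2"})
```

An empty result means no node is running more than one exclusive service.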
Additional info: I looked at a somewhat old version of the code, and it's clear
that the exclusivity check is performed only in some cases. It's easy to see
that the -e and -r/-m actions do not perform the exclusivity check, but I don't
know why the initial startup case also fails in some cases. In my test I started rgmanager
on all nodes sequentially, like:
# for n in N1 N2 N3 ; do ssh root@$n 'service rgmanager start' ; done
after ccsd/cman/fenced were started everywhere and quorum was established.
Could the timing have something to do with it?
Explicit specification of where a service should run (using -e/-r X -n X)
overrides the "exclusive" flag. It also overrides failover domain ordering (but
not restriction; though maybe it should).
However, the startup case is *definitely* a bug - rgmanager should not colocate
services even on startup. I think you're right - it sounds like a timing issue.
It should not be hard to fix.
Can you post your <rm> tag and children?
<ip address="10.10.130.11" monitor_link="1"/>
<ip address="10.10.130.12" monitor_link="1"/>
<service autostart="1" exclusive="1" name="S1">
<service autostart="1" exclusive="1" name="S2">
Please reconsider the expected behavior when target node is explicitly
specified. If the user specifies colocating exclusive services, that's a user
error! The software should prevent it rather than trusting the user knows
exactly what he/she is doing.
What makes it particularly error prone is the behavior of the ENABLE ("-e")
command without any other option. With that command, the target node is assumed
to be the localhost, so it's very easy to trigger an explicit colocation.
By the way, I don't know if this is already happening, but the exclusivity check
should consider not only the "started" status but also other status values such
as "starting", "recovering", "failed", etc. When choosing the eligible nodes
for bringing up an exclusive service, every node that has an exclusive service
started, or in a state in which some or all of the service's resources may be
started, must be considered ineligible.
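
The rule proposed above could be sketched roughly as follows. This is an
illustrative Python model, not rgmanager code: the four state names are the
ones mentioned in this comment, while treating exactly that set as occupying a
node, the data layout, the function name, and the extra service S3 are all
assumptions made for the example.

```python
# States in which some or all of a service's resources may still be active.
# The names come from the comment above; treating exactly this set as
# "occupying" a node is an assumption made for illustration.
OCCUPYING_STATES = {"started", "starting", "recovering", "failed"}


def eligible_nodes(nodes, services, placing):
    """Return the nodes on which the exclusive service `placing` may be started.

    nodes    -- list of node names
    services -- list of dicts with hypothetical keys:
                "name", "exclusive" (bool), "state", "owner" (node or None)
    placing  -- name of the service being placed; its own record is ignored
    """
    occupied = {
        svc["owner"]
        for svc in services
        if svc["name"] != placing
        and svc["exclusive"]
        and svc["state"] in OCCUPYING_STATES
        and svc["owner"]
    }
    return [node for node in nodes if node not in occupied]


services = [
    {"name": "S1", "exclusive": True, "state": "started", "owner": "N1"},
    {"name": "S2", "exclusive": True, "state": "failed", "owner": "N2"},
    {"name": "S3", "exclusive": True, "state": "recovering", "owner": "N3"},
]
candidates = eligible_nodes(["N1", "N2", "N3"], services, placing="S2")
```

Here N1 and N3 are ruled out because each holds another exclusive service in an
occupying state, so only N2 remains a candidate for S2.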
Ok, sounds fine. Give me a couple of days.
Ok, this is taking longer than I thought, but ... the good news is that the fix
I've been working on fixes an entire class of issues like this, not just this
Created attachment 151735
I have prepared a patch which I hope fixes this problem.
Could you please take a look at it?
Your patch looks like it would fix the problem. Good work.
- count_resource_groups() is a very expensive operation (because
rg_lock()/get_rg_state()/rg_unlock() is very expensive!). If we could make a
local-only copy of it which uses get_rg_state_local() instead of get_rg_state()
during handle_start_req, this would improve performance by an order of magnitude.
- handle_start_remote_req() might need to have similar changes, except rather
than flipping to relocate, it would just return failure.
What do you think?
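
The two review points above might translate into control flow along these
lines. The sketch is a Python model rather than the actual C patch: only the
names handle_start_req and handle_start_remote_req and the idea of counting
from local state come from the comments. The return values, the local_states
layout, and the helper count_exclusive_local are hypothetical.

```python
from enum import Enum


class Result(Enum):
    STARTED = "started"
    RELOCATE = "relocate"  # ask the cluster to place the service elsewhere
    FAILED = "failed"


def count_exclusive_local(local_states, me, skip):
    """Count exclusive services this node believes it owns, using cached state.

    Stands in for a local-only count that avoids cluster-wide locking.
    """
    return sum(
        1
        for name, st in local_states.items()
        if name != skip and st["exclusive"] and st["owner"] == me
    )


def _would_colocate(local_states, me, service):
    # True if starting `service` here would put two exclusive services on `me`.
    return (
        local_states[service]["exclusive"]
        and count_exclusive_local(local_states, me, service) > 0
    )


def handle_start_req(local_states, me, service):
    """Local start request: flip to relocation instead of colocating."""
    if _would_colocate(local_states, me, service):
        return Result.RELOCATE
    return Result.STARTED


def handle_start_remote_req(local_states, me, service):
    """Start request forwarded from another node: return failure instead."""
    if _would_colocate(local_states, me, service):
        return Result.FAILED
    return Result.STARTED


states = {
    "S1": {"exclusive": True, "owner": "N1"},
    "S2": {"exclusive": True, "owner": None},
}
local_result = handle_start_req(states, "N1", "S2")
remote_result = handle_start_remote_req(states, "N1", "S2")
```

In this model the only difference between the two handlers is the outcome when
the node is already occupied: relocation for a local request, failure for a
remote one.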
Created attachment 152372
I have addressed all your comments. A new version of the patch is attached.
This patch is for rgmanager-1.9.54-1 from RHEL4.
Created attachment 152373
This patch is for rgmanager-2.0.23-1 from RHEL5.
I applied the RHEL5 patch to RHEL5 branch and head of CVS on 4/19, and I'm
applying the RHEL4 patch today. Sorry for the wait.
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.