Fixing this on the ccs side is apparently not enough for flawless
cluster stack operation, but it will at least lower the probability of
running into issues where the configuration is reloaded within the
cluster stack components again before the first such reload has finished
(cf. the likely race condition in rgmanager in the original
[bug 1157951]).
+++ This bug was initially created as a clone of Bug #1157951 +++
--- Additional comment from Jan Pokorný on 2014-11-20 23:52:52 CET ---
You originally used the same (or an equivalent) command as later on, i.e.:
> ccs -h localhost --activate --sync --password "secret" --rmvm iRed2
1. "Updating cluster.conf", followed by symptoms of cluster.conf indeed
   being propagated twice in quick succession on nr-c03n01, seemed
   unnatural
2. indeed, there is a bug in ccs causing the following sequence:
   - if (removevm): remove_vm(name)
     -> set_cluster_conf (while "activate" holds ~ --activate,
                          only against localhost)
        "activate" should be temporarily masked if "sync" is set
        to prevent "double activate", just as the method below does
   - if (sync): sync_cluster_conf()
     -> set_cluster_conf (with "activate" masked,
                          against all nodes via cluster.conf hostnames)
     -> set_cluster_conf (with "activate" unmasked, hence true as above,
                          only against the last enumerated node)
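
The sequence above can be modelled with a small Python sketch. It is a toy
model, not the ccs source: the function names are borrowed from the
description, but their signatures, the `calls` list and the `run` driver
are assumptions made purely for illustration.

```python
# Toy model of the call sequence described above (NOT the real ccs code).
# All signatures and the "calls" bookkeeping are hypothetical.
calls = []  # every (target, activate) pair that would be sent out


def set_cluster_conf(target, activate):
    """Stand-in for pushing cluster.conf to one node."""
    calls.append((target, activate))


def remove_vm(name, activate):
    # The modifier pushes to localhost with the global flag left as-is.
    set_cluster_conf("localhost", activate)


def sync_cluster_conf(nodes, activate):
    for node in nodes:
        set_cluster_conf(node, False)      # "activate" masked
    set_cluster_conf(nodes[-1], activate)  # unmasked, last enumerated node


def run(nodes, removevm, sync, activate):
    """Return how many calls carried activate=True for one invocation."""
    calls.clear()
    if removevm:
        remove_vm(removevm, activate)
    if sync:
        sync_cluster_conf(nodes, activate)
    return sum(1 for _, act in calls if act)
```

In this model, `run(["localhost"], "my_vm", sync=True, activate=True)`
counts two activating calls: one from the modifier and one from the sync
step, which is the "double activate" described above.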
Bottom line: there is still a bug in rgmanager, which in some
circumstances cannot deal with 2+ configuration updates arriving within
a very short time frame (likely a race condition).
Good news: the buggy ccs (working, in a sense, but less efficiently than
it should) helped to discover this bug :)
Created attachment 959802
The solution should be easy: temporarily mask the "activate" flag and
unmask it just before "sync", which is intentionally the last modifier
triggered in a ccs invocation.
> This variant of the patch tries to preserve original behavior that
> standalone --activate (without --sync as suggested per help message)
> will also activate (rule of "no more than once" is respected).
> If not suitable, replace "not(sync) and activate" with "False".
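
A minimal sketch of the masking idea, assuming a simplified call flow: the
helper `count_activations`, its parameters and the inner
`set_cluster_conf` stand-in are hypothetical and only illustrate the
"not(sync) and activate" expression quoted above; the actual patch is in
the attachment.

```python
def count_activations(nodes, removevm, sync, activate):
    """Toy model (not the real ccs code): count calls with activate=True."""
    calls = []

    def set_cluster_conf(target, activate_now):
        # Stand-in for pushing cluster.conf to one node.
        calls.append((target, activate_now))

    if removevm:
        # Masked whenever --sync will follow; a standalone --activate
        # (no --sync) still activates here, preserving the old behavior.
        set_cluster_conf("localhost", (not sync) and activate)
    if sync:
        for node in nodes:
            set_cluster_conf(node, False)      # "activate" masked
        set_cluster_conf(nodes[-1], activate)  # the single real activation
    return sum(1 for _, act in calls if act)
```

Under these assumptions, both `--sync --activate` and a standalone
`--activate` yield exactly one activating call, so the rule of "no more
than once" holds in either case.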
Before Fix (2 propagate commands sent):
[root@ask-03 ~]# rpm -q ccs
[root@ask-03 ~]# rm -f /etc/cluster/cluster.conf
[root@ask-03 ~]# ccs --createcluster test_cluster
[root@ask-03 ~]# ccs --addnode localhost
Node localhost added.
[root@ask-03 ~]# ccs --addvm my_vm
[root@ask-03 ~]# ccs --sync --activate --debug --rmvm my_vm | grep propagate | wc
2 34 678
After Fix (1 propagate command sent):
[root@ask-02 ccs]# rpm -q ccs
[root@ask-02 ccs]# rm -f /etc/cluster/cluster.conf
[root@ask-02 ccs]# ccs --createcluster test_cluster
[root@ask-02 ccs]# ccs --addnode localhost
Node localhost added.
[root@ask-02 ccs]# ccs --addvm my_vm
[root@ask-02 ccs]# ccs --sync --activate --debug --rmvm my_vm | grep propagate | wc
1 17 340
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.