Red Hat Bugzilla – Bug 431382
qdiskd kills cman and self
Last modified: 2009-04-16 18:51:45 EDT
Description of problem:
I have a two node cluster, and when qdiskd starts the following error appears
and the cluster goes down.
Feb 3 00:20:19 nodo2 qdiskd: <info> Assuming master role
Feb 3 00:20:20 nodo2 qdiskd: <err> cman_dispatch: Host is down
Feb 3 00:20:20 nodo2 qdiskd: <err> Halting qdisk operations
Version-Release number of selected component (if applicable):
I have one heuristic for qdiskd in cluster.conf:
<quorumd device="/dev/gnbd2" votes="1" min_score="5" label="quorum" tko="20"
<heuristic interval="2" program="ping -c1 -t1 192.168.1.254"
I started cman and qdiskd in the foreground ("qdiskd -f -d"). qdiskd shows errors
when it detects the active node, and then the cman daemon goes down.
If I remove the heuristic, the qdiskd daemon starts but it shows in
clustat as offline.
Are you sure you have a running two node cluster? That second line says that
cman was down .. ie this node is not in the cluster, or has just left it. Are
there any messages in syslog to indicate why cman has shut down?
If the problem is reproducible then try starting cman with 'cman_tool join -d'
to get more debugging information.
Also ... if this really is RHEL5.0, try upgrading. I think a lot of qdisk
problems were fixed in 5.1
cman-2.0.73-1.el5_1.1 I think is 5.1+errata. I've never seen this before,
however. Could you attach your cluster.conf as well?
I'm sure that there are two nodes. Once qdisk is running it shuts down the cman
daemon. This only happens with one heuristic. I added two more heuristics and
it was fine, but the configured services don't fail over. I also tried
removing the heartbeat link: the quorum is dissolved immediately and the services
don't fail over either. In addition, the rgmanager daemon couldn't be shut down. It
cannot be stopped, not even by rebooting the system; the only way is to reset
the server. I thought that the quorum disk resolves the split-brain problem, but it does not.
Maybe I am wrong or the cluster is badly configured; I only want to report the
lab test that I did. I attach the cluster.conf and messages. Thanks
Created attachment 294072
messages and cluster.conf
It looks like cman is being restarted without restarting the other services. If
you shut cman down, it's important to make sure that everything else is also
shut down. Normally it will check that for you if you use cman_tool or the init
scripts - I'm not entirely sure what's happened in this case.
The best way to be sure of this is to always use the init scripts (at least) or
even to reboot the entire node to make sure that there is no state left lying
around.
If you reboot the node, does it join the cluster properly?
It's probably the openais IPC bug that causes openais to 'splode when qdiskd
advertises master-status.
Feb 3 02:15:44 apache2 openais: [MAIN ] AIS Executive Service RELEASE
'subrev 1324 version 0.80.2'
It wasn't fixed until 0.80.3-some_rev.
Try one of those.
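To check other nodes against the version mentioned above, the release string that openais writes to syslog can be compared with 0.80.3. This is only a rough sketch: the helper names are made up for illustration, the regular expression assumes the log format shown in this comment, and the log line does not carry the package revision ("-some_rev"), so a match on 0.80.3 says nothing about which errata build is installed.

```python
import re

# Assumption: 0.80.3 is the first upstream version with the fix,
# as stated in the comment above. The package revision is not checked.
FIXED_VERSION = (0, 80, 3)


def parse_openais_version(log_line):
    """Return the openais version found in a log line as a tuple of ints, or None."""
    match = re.search(r"version (\d+(?:\.\d+)*)", log_line)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def has_ipc_fix(log_line):
    """True if the logged version is at least FIXED_VERSION (hypothetical helper)."""
    version = parse_openais_version(log_line)
    return version is not None and version >= FIXED_VERSION


# Log line from this report, joined onto one line.
line = ("Feb 3 02:15:44 apache2 openais: [MAIN ] AIS Executive Service RELEASE "
        "'subrev 1324 version 0.80.2'")
print(has_ipc_fix(line))  # False
```

Versions are compared as integer tuples rather than strings so that, for example, 0.80.10 would sort after 0.80.3.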
*** This bug has been marked as a duplicate of 314641 ***
(In reply to comment #6)
> It's probably the openais IPC bug that causes openais to 'splode when qdiskd
> advertises master-status.
> Feb 3 02:15:44 apache2 openais: [MAIN ] AIS Executive Service RELEASE
> 'subrev 1324 version 0.80.2'
> It wasn't fixed until 0.80.3-some_rev.
Yes, the member node joins the cluster successfully.
The openais-0.80.3-7 packages fix at least one problem specific to
qdiskd/cman/openais interaction - let us know if it fixes your qdiskd problem.