Description of problem:
In a quorate 3-node cluster, attempts to join the fence domain fail:

[root@link-11 root]# cat /proc/cluster/nodes
Node  Votes Exp Sts  Name
   1    1    3   M   link-10
   2    1    3   M   link-11
   3    1    3   M   link-12
[root@link-11 root]# fenced -D
Command Line Arguments:
name = default
debug = 1
fenced: fence_domain_add: init_nodes ccs error -1
[root@link-11 root]# ccs_test connect
ccs_connect failed: Connection refused
[root@link-11 root]# pidof ccsd
3377
[root@link-11 root]#

Version-Release number of selected component (if applicable):
[root@link-11 root]# fenced -V
fenced DEVEL.1090872651 (built Jul 26 2004 15:11:58)
Copyright (C) Red Hat, Inc. 2004 All rights reserved.
[root@link-11 root]# ccsd -V
ccsd DEVEL.1090872650 (built Jul 26 2004 15:11:54)
Copyright (C) Red Hat, Inc. 2004 All rights reserved.

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
Raising priority. This is a blocker for doing anything with a filesystem.
I can work around this by running 'ccs_test connect force' on each node before attempting to join the fence domain.
I was able to reproduce this by:
1. forming a quorate cluster
2. on a single node, running cman_tool leave; cman_tool join

The descriptor held on the cluster manager was becoming invalid. Now I close the descriptor on cman shutdown and attempt to reconnect when cman becomes available again. If this is truly the fix for the problem, it may also address bug 128569.
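
For illustration, the close-and-reconnect approach described above could look roughly like the sketch below. This is not the actual ccsd code: struct cm_link, cm_link_on_shutdown, cm_link_ensure and the open_devnull stand-in opener are all hypothetical names, and a real daemon would open its connection to the cluster manager instead of /dev/null.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical connection state; not taken from the ccsd source. */
struct cm_link {
    int fd;                /* -1 while disconnected */
    int (*open_fn)(void);  /* hypothetical opener: new descriptor or -1 */
};

/* Stand-in opener for demonstration only. A real daemon would
 * connect to the cluster manager here. */
int open_devnull(void)
{
    return open("/dev/null", O_RDONLY);
}

/* On cluster manager shutdown: drop the descriptor so it is never
 * reused after it has become invalid. */
void cm_link_on_shutdown(struct cm_link *l)
{
    if (l->fd >= 0) {
        close(l->fd);
        l->fd = -1;
    }
}

/* Called before each use: reconnect if the link is down.
 * Returns 1 when a valid descriptor is held, 0 otherwise. */
int cm_link_ensure(struct cm_link *l)
{
    if (l->fd >= 0)
        return 1;
    l->fd = l->open_fn();
    return l->fd >= 0;
}
```

Marking the descriptor as -1 on shutdown makes the "not connected" state explicit, so a later cman_tool join is picked up by the next cm_link_ensure call rather than by reusing a stale descriptor.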
OK, I needed to call FD_ZERO(&rset) before populating the set. This appears to be what was causing this bug, as well as bug 128569.
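
A minimal sketch of the pattern, for anyone hitting the same thing: an fd_set declared on the stack is uninitialized, so it has to be cleared with FD_ZERO before any FD_SET, otherwise select() may be asked to watch arbitrary descriptors. The wait_readable helper below is a hypothetical name used for illustration, not a function from the ccsd or fenced source.

```c
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if fd is readable within timeout_ms,
 * 0 on timeout, -1 on error. */
int wait_readable(int fd, int timeout_ms)
{
    fd_set rset;
    struct timeval tv;
    int n;

    if (fd < 0 || fd >= FD_SETSIZE)
        return -1;

    FD_ZERO(&rset);     /* clear first: rset starts as stack garbage */
    FD_SET(fd, &rset);  /* then add the descriptor we care about */

    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    n = select(fd + 1, &rset, NULL, NULL, &tv);
    if (n < 0)
        return -1;
    return (n > 0 && FD_ISSET(fd, &rset)) ? 1 : 0;
}
```

Because select() overwrites the set on return, the FD_ZERO/FD_SET pair has to be repeated before every call when this is used in a loop.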
Works now.
Updating version to the right level in the defects. Sorry for the storm.