Bug 128571 - Can't join fence domain b/c ccs_test connect fails
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Cluster Suite
Classification: Retired
Component: gfs
Version: 4
Hardware: All
OS: Linux
Priority: high
Severity: medium
Target Milestone: ---
Assignee: Jonathan Earl Brassow
QA Contact: Cluster QE
Reported: 2004-07-26 16:04 UTC by Derek Anderson
Modified: 2010-01-12 02:55 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2004-07-27 14:34:42 UTC

Description Derek Anderson 2004-07-26 16:04:35 UTC
Description of problem:
In a quorate 3-node cluster, attempts to join the fence domain fail:

[root@link-11 root]# cat /proc/cluster/nodes
Node  Votes Exp Sts  Name
   1    1    3   M   link-10
   2    1    3   M   link-11
   3    1    3   M   link-12
[root@link-11 root]# fenced -D
Command Line Arguments:
  name = default
  debug = 1
fenced: fence_domain_add: init_nodes ccs error -1
[root@link-11 root]# ccs_test connect
ccs_connect failed: Connection refused
[root@link-11 root]# pidof ccsd
3377
[root@link-11 root]#

Version-Release number of selected component (if applicable):
[root@link-11 root]# fenced -V
fenced DEVEL.1090872651 (built Jul 26 2004 15:11:58)
Copyright (C) Red Hat, Inc.  2004  All rights reserved.
[root@link-11 root]# ccsd -V
ccsd DEVEL.1090872650 (built Jul 26 2004 15:11:54)
Copyright (C) Red Hat, Inc.  2004  All rights reserved.


Comment 1 Derek Anderson 2004-07-26 16:06:55 UTC
Raising priority.  This is a blocker for doing anything with a filesystem.

Comment 2 Derek Anderson 2004-07-26 16:40:44 UTC
I can work around this by running 'ccs_test connect force' on each node
before attempting to join the fence domain.

Comment 3 Jonathan Earl Brassow 2004-07-26 20:48:34 UTC
I was able to reproduce this by:
1. forming a quorate cluster
2. on a single node, do cman_tool leave; cman_tool join

The descriptor held on the cluster manager was becoming invalid.  Now I 
close the descriptor on cman shutdown and attempt to reconnect when it 
becomes available again.

If this is truly the fix for the problem, it may also address bug 128569.
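
For readers following along, the close-and-reconnect approach described above might look roughly like the sketch below. This is not the actual ccsd change: the struct cm_link type, the function names, and the stand-in opener (which just opens /dev/null) are all hypothetical and exist only to show the pattern of invalidating a stale descriptor and reopening it lazily.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical wrapper around the descriptor held on the cluster manager. */
struct cm_link {
    int fd; /* -1 means "not connected" */
};

/* Hypothetical opener type: returns a new descriptor, or -1 on failure. */
typedef int (*cm_open_fn)(void);

/* Stand-in opener for illustration only; a real one would talk to cman. */
int open_stand_in(void)
{
    return open("/dev/null", O_RDONLY);
}

/* Call when the cluster manager shuts down: drop the stale descriptor. */
void cm_link_on_shutdown(struct cm_link *link)
{
    assert(link != NULL);
    if (link->fd >= 0) {
        close(link->fd);
        link->fd = -1;
    }
}

/* Call before each use: reconnect if the descriptor was dropped.
 * Returns 0 when a usable descriptor is held, -1 otherwise. */
int cm_link_ensure(struct cm_link *link, cm_open_fn open_cb)
{
    assert(link != NULL && open_cb != NULL);
    if (link->fd < 0)
        link->fd = open_cb();
    return link->fd >= 0 ? 0 : -1;
}
```

Marking the descriptor as -1 on shutdown means later code never reuses a number that may since have been closed or reassigned; the next call to cm_link_ensure() simply retries the open.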

Comment 4 Jonathan Earl Brassow 2004-07-26 22:25:12 UTC
OK, I needed to call
FD_ZERO(&rset);
before populating the variable.

This appears to be what was causing this bug, as well as bug 128569.
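
As a rough illustration of why the FD_ZERO() call matters: an fd_set declared on the stack starts with indeterminate contents, so select() may be asked to watch descriptors that were never added. The sketch below is not the ccsd code; wait_readable() and its parameters are hypothetical, and only the variable name rset is borrowed from the comment above.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical helper: wait up to timeout_ms for fd to become readable.
 * Returns 1 if readable, 0 on timeout, -1 on error. */
int wait_readable(int fd, int timeout_ms)
{
    fd_set rset;
    struct timeval tv;
    int n;

    assert(fd >= 0 && fd < FD_SETSIZE);

    /* Clear the set first; without this, rset holds stack garbage and
     * select() may watch (or report) descriptors that were never added. */
    FD_ZERO(&rset);
    FD_SET(fd, &rset);

    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    /* select() rewrites rset in place, so the set has to be rebuilt
     * (FD_ZERO, then FD_SET) before every call, not just the first. */
    n = select(fd + 1, &rset, NULL, NULL, &tv);
    if (n < 0)
        return -1;
    return (n > 0 && FD_ISSET(fd, &rset)) ? 1 : 0;
}
```

If the call sits inside a loop, the FD_ZERO()/FD_SET() pair belongs inside the loop body, because select() overwrites the set on return.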


Comment 5 Derek Anderson 2004-07-27 14:34:42 UTC
Works now.

Comment 6 Kiersten (Kerri) Anderson 2004-11-16 19:10:18 UTC
Updating version to the right level in the defects.  Sorry for the storm.

