Bug 137021 - ccs doesn't find most recent cluster.conf
Product: Red Hat Cluster Suite
Classification: Red Hat
Component: ccs
Hardware: All / OS: Linux
Priority: medium / Severity: medium
Assigned To: Jonathan Earl Brassow
GFS Bugs
Reported: 2004-10-25 04:21 EDT by David Teigland
Modified: 2010-01-27 13:03 EST

Doc Type: Bug Fix
Last Closed: 2010-01-27 13:03:56 EST

Attachments: None
Description David Teigland 2004-10-25 04:21:46 EDT
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7)
Gecko/20040626 Firefox/0.9.1

Description of problem:
All my nodes had just been rebooted and I wanted to make a change to
cluster.conf.  I made the change on one node, bumped the
config_version, and then started ccsd on all the nodes.

In theory, when cman_tool connects to ccsd, ccsd should query the other
nodes, find the most recent cluster.conf, and put it in /etc/cluster.
The config values read by cman_tool will then be the latest. I ran
"cman_tool join -c <clustername>" on all nodes, but three of the eight
didn't get the new cluster.conf; the other four did.

Version-Release number of selected component (if applicable):

How reproducible:
Didn't try

Steps to Reproduce:
1. reset all nodes
2. edit cluster.conf and increment config_version on one node
3. start ccsd on all nodes
4. run cman_tool join on all nodes

Additional info:

Comment 1 Kiersten (Kerri) Anderson 2004-11-04 10:08:05 EST
Updates with the proper version and component name.
Comment 2 Kiersten (Kerri) Anderson 2004-11-04 10:16:56 EST
Updates with the proper version and component name.
Comment 3 Kiersten (Kerri) Anderson 2004-11-04 10:21:02 EST
Updates with the proper version and component name. Again, just love our tools.
Comment 4 Kiersten (Kerri) Anderson 2004-11-16 14:11:52 EST
Updating version to the right level in the defects.  Sorry for the storm.
Comment 5 Jonathan Earl Brassow 2005-01-05 18:03:04 EST
This boiled down to a timing issue. ccsd processes requests serially
(broadcast or otherwise).

When cman_tool was started, it issued a connect. While ccsd is not yet
quorate, it must broadcast to find out whether any other node has a
more recent version of the config file. This happens as part of
handling the connect; select() is used to set a timeout on how long
ccsd waits for replies.

Two problems were encountered: 1) the timeout was not being properly
reset after select() returned, and 2) every node used the same timeout.
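A minimal sketch of the fix for the first problem, not ccsd's actual code: a bounded wait for broadcast replies on one datagram socket, using a hypothetical wait_for_replies() helper. POSIX allows select() to modify its timeout argument, and Linux overwrites it with the time remaining, so reusing one struct timeval across calls silently shortens (or zeroes) later waits. The loop below therefore recomputes the remaining time from a monotonic clock and rebuilds the timeval and fd set before every select() call.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Milliseconds elapsed on the monotonic clock since 'start'. */
static long elapsed_ms(const struct timespec *start)
{
    struct timespec now;
    clock_gettime(CLOCK_MONOTONIC, &now);
    long long ns = (long long)(now.tv_sec - start->tv_sec) * 1000000000LL
                 + (now.tv_nsec - start->tv_nsec);
    return (long)(ns / 1000000LL);
}

/* Hypothetical helper: count datagrams arriving on 'fd' within
 * 'timeout_ms' milliseconds. Returns -1 on a select() error. */
static int wait_for_replies(int fd, long timeout_ms)
{
    struct timespec start;
    char buf[512];
    int replies = 0;

    assert(timeout_ms >= 0);
    clock_gettime(CLOCK_MONOTONIC, &start);

    for (;;) {
        long remaining = timeout_ms - elapsed_ms(&start);
        if (remaining <= 0)
            break;

        /* Rebuild both the fd set and the timeout on every iteration:
         * select() may have modified them on the previous call. */
        fd_set rfds;
        FD_ZERO(&rfds);
        FD_SET(fd, &rfds);
        struct timeval tv;
        tv.tv_sec = remaining / 1000;
        tv.tv_usec = (remaining % 1000) * 1000;

        int rc = select(fd + 1, &rfds, NULL, NULL, &tv);
        if (rc < 0) {
            if (errno == EINTR)
                continue;   /* interrupted by a signal: recompute, retry */
            return -1;
        }
        if (rc == 0)
            break;          /* deadline reached with no further replies */
        if (recv(fd, buf, sizeof buf, 0) > 0)
            replies++;
    }
    return replies;
}
```

Deriving each wait from a fixed deadline, rather than from whatever select() left in the timeval, keeps the total wait equal to timeout_ms regardless of how many replies arrive or how the platform treats the timeout argument.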

What results is that if a connect is issued simultaneously on every
node, each ccsd first processes its own connect and only then any
queued broadcast requests. Because they all use the same timeout, none
of them receives broadcast responses from its peers, since the peers
are also stuck processing their own connects.
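The failure described above can be illustrated with a deliberately simplified model (hypothetical, not taken from ccsd): every node receives its connect at the same instant, broadcasts immediately, and then waits timeout[i]. Because requests are handled serially, node j can answer node i's broadcast only after its own wait ends, and that answer counts only if node i is still waiting, i.e. timeout[j] < timeout[i]. Network latency is ignored.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified model: replies[i] is the number of peers whose own wait
 * ends before node i stops waiting, i.e. peers that become free to
 * answer node i's broadcast while node i is still listening. */
static void count_replies(const double *timeout, int *replies, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        replies[i] = 0;
        for (size_t j = 0; j < n; j++) {
            if (j != i && timeout[j] < timeout[i])
                replies[i]++;
        }
    }
}
```

With identical timeouts no node ever satisfies timeout[j] < timeout[i], so every count is zero, which matches the report that nodes never received responses from their peers. With staggered timeouts, nodes whose wait runs longer hear from the peers that finished earlier.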

The current solution is to add a random component to the timeout and
to reset the timeout properly after select() returns.
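A sketch of the random component, assuming rand() as the random source and hypothetical helper names (seed_jitter, jittered_timeout_ms, ms_to_timeval); ccsd's actual base timeout and jitter range are not stated here. Seeding matters: an unseeded rand() behaves as if srand(1) had been called, so identical nodes running the same binary would otherwise draw identical "random" timeouts.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <stdlib.h>
#include <sys/time.h>
#include <time.h>
#include <unistd.h>

/* Seed rand() with values likely to differ per node. 'node_id' is a
 * hypothetical node-unique value (e.g. a node ID or hostname hash). */
static void seed_jitter(unsigned node_id)
{
    struct timespec now;
    clock_gettime(CLOCK_REALTIME, &now);
    srand(node_id ^ (unsigned)now.tv_nsec ^ (unsigned)getpid());
}

/* Base timeout plus a random extra in [0, jitter_ms] milliseconds, so
 * nodes that start together stop waiting at different times. */
static long jittered_timeout_ms(long base_ms, long jitter_ms)
{
    assert(base_ms > 0 && jitter_ms >= 0);
    return base_ms + (long)(rand() % (jitter_ms + 1));
}

/* Convert milliseconds to the struct timeval that select() expects. */
static struct timeval ms_to_timeval(long ms)
{
    struct timeval tv;
    tv.tv_sec = ms / 1000;
    tv.tv_usec = (ms % 1000) * 1000;
    return tv;
}
```

A caller would draw the jittered timeout once per broadcast and then, as with the first fix, rebuild the struct timeval from the time remaining before each select() call instead of reusing the value select() returned.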
