Bug 137021
Summary: | ccs doesn't find most recent cluster.conf | ||
---|---|---|---|
Product: | [Retired] Red Hat Cluster Suite | Reporter: | David Teigland <teigland> |
Component: | ccs | Assignee: | Jonathan Earl Brassow <jbrassow> |
Status: | CLOSED CURRENTRELEASE | QA Contact: | GFS Bugs <gfs-bugs> |
Severity: | medium | Docs Contact: | |
Priority: | medium | ||
Version: | 4 | CC: | cluster-maint |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | All | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2010-01-27 18:03:56 UTC | Type: | --- |
Description
David Teigland
2004-10-25 08:21:46 UTC
Updates with the proper version and component name.

Again, just love our tools. Updating version to the right level in the defects. Sorry for the storm.

This boiled down to a timing issue. ccsd processes requests serially (broadcast or otherwise). When cman_tool was started, it issued a connect. While the cluster is not yet quorate, ccsd must broadcast to see whether any node has a more recent version of the config file. This happens as part of the connect. select is used to set a timeout on how long it waits for replies.

Two problems were encountered:
1) The timeout was not being properly reset after the select returned.
2) Every node uses the same timeout.

The result is that if a connect is issued simultaneously on every node, each node first tries to process its connect, and only then any broadcast requests. Because they all have the same timeout, they never receive broadcast responses from their peers (which are also stuck processing connects).

The current solution is to add a random component to the timeout and to make sure the timeout is set properly after the select returns.
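The fix described above can be sketched roughly as follows. This is not the actual ccsd patch: the function names (`seed_timeout_jitter`, `set_jittered_timeout`, `wait_for_reply`) and the millisecond parameters are hypothetical, chosen only to illustrate the two points in the comment. On Linux, select() may overwrite the `struct timeval` with the time remaining, so the timeout has to be rebuilt before every call, and adding a per-node random component keeps nodes from timing out in lockstep.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/select.h>
#include <sys/time.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: seed the PRNG differently per process so that
 * nodes started at the same moment do not draw identical jitter. */
void seed_timeout_jitter(void)
{
    srand((unsigned int)time(NULL) ^ (unsigned int)getpid());
}

/* Hypothetical helper: fill tv with base_ms plus a random component
 * in the range [0, jitter_ms). */
void set_jittered_timeout(struct timeval *tv, long base_ms, long jitter_ms)
{
    long total_ms = base_ms;

    assert(base_ms >= 0 && jitter_ms >= 0);
    if (jitter_ms > 0)
        total_ms += rand() % jitter_ms;

    tv->tv_sec = total_ms / 1000;
    tv->tv_usec = (total_ms % 1000) * 1000;
}

/* Wait for fd to become readable, for at most `rounds` timeout windows.
 * Returns 1 if fd is readable, 0 if every window timed out, -1 on error. */
int wait_for_reply(int fd, long base_ms, long jitter_ms, int rounds)
{
    struct timeval tv;
    fd_set rset;
    int i, n;

    for (i = 0; i < rounds; i++) {
        /* select() overwrites the fd_set, and on Linux it also updates
         * tv to the time not slept, so both are rebuilt on every pass. */
        FD_ZERO(&rset);
        FD_SET(fd, &rset);
        set_jittered_timeout(&tv, base_ms, jitter_ms);

        n = select(fd + 1, &rset, NULL, NULL, &tv);
        if (n < 0) {
            if (errno == EINTR)
                continue;   /* interrupted by a signal: use the next window */
            return -1;
        }
        if (n > 0 && FD_ISSET(fd, &rset))
            return 1;
        /* n == 0: this window timed out. Assumption: a daemon like ccsd
         * would service pending broadcast requests here before retrying. */
    }
    return 0;
}
```

Because each node draws a different window length, one node's timeout is likely to expire while a peer is still waiting, which gives it a chance to answer that peer's broadcast instead of every node blocking at once.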