Bug 144806
Summary: ccsd not handling all clu_connect errors on startup appropriately
Product: [Retired] Red Hat Cluster Suite
Component: ccs
Version: 4
Status: CLOSED CURRENTRELEASE
Severity: medium
Priority: medium
Hardware: All
OS: Linux
Reporter: Adam "mantis" Manthei <amanthei>
Assignee: Jonathan Earl Brassow <jbrassow>
QA Contact: Cluster QE <mspqa-list>
CC: cluster-maint
Doc Type: Bug Fix
Last Closed: 2010-01-27 18:03:52 UTC
Description
Adam "mantis" Manthei
2005-01-11 17:24:19 UTC
Created attachment 109621
add additional error checks on startup
This adds additional error checking on startup. If ccsd can't connect to magma after CCSD_CONNECT_RETRY seconds, it will fail and print an error to stderr. (The #define for CCSD_CONNECT_RETRY is in a gross spot, but it at least demonstrates my intent.)
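A minimal sketch of the bounded retry described above. It assumes a connect call that returns a descriptor >= 0 on success and a negative errno value on failure; that convention, the value 30 for `CCSD_CONNECT_RETRY`, and the names `connect_with_retry()`, `stub_connect()` and `no_sleep()` are hypothetical, and the real clu_connect() signature and the attached patch may differ.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical value: the real number is set in the attached patch. */
#define CCSD_CONNECT_RETRY 30

/* Hypothetical connect callback: descriptor >= 0 on success, negative errno
 * on failure (an assumed convention, not necessarily magma's API). */
typedef int (*connect_fn_t)(void);
/* Same shape as sleep(3), so a daemon could pass sleep directly. */
typedef unsigned int (*sleep_fn_t)(unsigned int);

/* Attempt a connection about once per second for up to retry_secs seconds.
 * On timeout, report to stderr and return the last negative error. */
static int connect_with_retry(connect_fn_t connect_fn, int retry_secs,
                              sleep_fn_t sleep_fn)
{
    int rv = -ETIMEDOUT; /* returned if retry_secs <= 0 */

    assert(connect_fn != NULL && sleep_fn != NULL);
    for (int elapsed = 0; elapsed < retry_secs; elapsed++) {
        rv = connect_fn();
        if (rv >= 0)
            return rv;   /* connected */
        sleep_fn(1);     /* wait one second before the next attempt */
    }
    fprintf(stderr, "ccsd: unable to connect to cluster infrastructure "
            "after %d seconds: %s\n", retry_secs, strerror(-rv));
    return rv;
}

/* Demonstration stub (hypothetical): fails stub_failures times, then succeeds. */
static int stub_failures = 3;
static int stub_connect(void)
{
    if (stub_failures > 0) {
        stub_failures--;
        return -ECONNREFUSED;
    }
    return 7;
}

/* No-op sleep so the example runs without real delays. */
static unsigned int no_sleep(unsigned int seconds)
{
    (void)seconds;
    return 0;
}
```

In a daemon the sleep argument would simply be `sleep` from `<unistd.h>`; injecting it keeps the loop testable without waiting CCSD_CONNECT_RETRY real seconds.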
The above patch does lead to other problems: ccsd will not return until it connects to cman or gulm. This will cause problems for the init scripts, since gulm and cman are started after ccsd. Is it better for ccsd to stop after clu_connect() has failed for so many seconds?
At the very least, some messages should probably be printed after a certain number of failed clu_connect() calls, indicating in the logs that ccs is having issues. (This is not obvious unless you are looking at the code.) We might also want to consider ignoring SIGHUP, or logging a message stating that ccsd is not ready to process the cluster.conf file until the clu_connect() call succeeds, instead of dying by default as we do right now.
A warning is now printed every ten seconds if a connection to the cluster infrastructure cannot be made. This is like saying the user must run 'cman_tool join' or 'lock_gulmd' within 10 seconds of starting ccsd. Perhaps it would be wise to bump this value to a larger number and special-case EAFNOSUPPORT.
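The suggestions above (a periodic warning, special-casing EAFNOSUPPORT, and not dying on SIGHUP before the connection exists) could be combined roughly as follows. This is a sketch under stated assumptions: it assumes EAFNOSUPPORT is what the connect call reports while neither cman nor gulm is running yet, so it retries that error indefinitely and treats every other error as fatal. `wait_for_cluster()`, `install_sighup_handler()`, `CCSD_WARN_INTERVAL` and the stub are hypothetical names, not ccsd code.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <syslog.h>

/* Hypothetical: warn roughly every ten seconds, as in the comment above. */
#define CCSD_WARN_INTERVAL 10

/* Assumed convention: descriptor >= 0 on success, negative errno on failure. */
typedef int (*connect_fn_t)(void);
typedef unsigned int (*sleep_fn_t)(unsigned int);

/* Set by the handler, read and cleared only in the main loop. */
static volatile sig_atomic_t got_sighup = 0;

static void sighup_handler(int sig)
{
    (void)sig;
    got_sighup = 1; /* only record it: syslog() is not async-signal-safe */
}

/* Replace the default action (terminate) so SIGHUP no longer kills ccsd. */
static int install_sighup_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = sighup_handler;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGHUP, &sa, NULL);
}

/* Wait for the cluster infrastructure. EAFNOSUPPORT (assumed to mean "not
 * started yet") is retried; any other error is logged and returned. */
static int wait_for_cluster(connect_fn_t connect_fn, sleep_fn_t sleep_fn)
{
    int waited = 0; /* approximate seconds spent waiting */

    assert(connect_fn != NULL && sleep_fn != NULL);
    for (;;) {
        int rv = connect_fn();

        if (rv >= 0)
            return rv;
        if (rv != -EAFNOSUPPORT) {
            syslog(LOG_ERR, "Unable to connect to cluster infrastructure: %s",
                   strerror(-rv));
            return rv;
        }
        if (got_sighup) {
            got_sighup = 0;
            syslog(LOG_WARNING, "Not ready to process cluster.conf: "
                   "no cluster connection yet");
        }
        sleep_fn(1);
        if (++waited % CCSD_WARN_INTERVAL == 0)
            syslog(LOG_WARNING, "Still waiting for cman or gulm after %d seconds",
                   waited);
    }
}

/* Demonstration stub (hypothetical): reports "no cluster" stub_pending
 * times, then returns stub_final_rv. */
static int stub_pending = 25;
static int stub_final_rv = 3;
static int stub_connect(void)
{
    if (stub_pending > 0) {
        stub_pending--;
        return -EAFNOSUPPORT;
    }
    return stub_final_rv;
}

/* No-op sleep so the example runs without real delays. */
static unsigned int no_sleep(unsigned int seconds)
{
    (void)seconds;
    return 0;
}
```

Deferring the log message to the main loop keeps the handler trivially safe; a SIGHUP that arrives during sleep() merely shortens that one wait.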