Bug 612300
| Summary: | conga will fail to start a new node that is added via luci |
|---|---|
| Product: | Red Hat Enterprise Linux 5 |
| Component: | conga |
| Version: | 5.5 |
| Status: | CLOSED ERRATA |
| Severity: | medium |
| Priority: | low |
| Reporter: | Shane Bradley <sbradley> |
| Assignee: | Ryan McCabe <rmccabe> |
| QA Contact: | Cluster QE <mspqa-list> |
| CC: | bbrock, cluster-maint, tao, wmealing |
| Target Milestone: | rc |
| Hardware: | All |
| OS: | Linux |
| Fixed In Version: | conga-0.12.2-16.el5 |
| Doc Type: | Bug Fix |
| Last Closed: | 2011-01-13 22:29:24 UTC |
Created attachment 437255 [details]
patch to fix bug

*** Bug 629152 has been marked as a duplicate of this bug. ***

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-0033.html
Description of problem:

The install error that luci reports is:

    "A problem occurred when starting this node: service cman start failed: "

When a new node is added in the current implementation of luci/ricci, a skeleton cluster.conf is written to the node with only the cluster name tag filled in. Adding the node then relies on ccsd finding the cluster (by the cluster name in the skeleton cluster.conf) on the network. If for some reason that cluster is not found, starting cman will fail: the skeleton cluster.conf defines no nodes, so a new cluster cannot be created, and the original cluster could not be found to join.

The name of the cluster that this node is trying to join is "zylog_cluster2", which is the name in the skeleton cluster.conf:

```
$ cat /etc/cluster/cluster.conf
<?xml version="1.0"?>
<cluster config_version="1" name="zylog_cluster2">
        <fence_daemon post_fail_delay="0" post_join_delay="3"/>
        <clusternodes/>
        <cman/>
        <fencedevices/>
        <rm/>
</cluster>
```

Here is the log for the 3rd node when it was added to the cluster:

```
$ tail -f /var/log/messages
Jul  7 14:37:25 clusternode3 ricci: startup succeeded
Jul  7 14:42:59 clusternode3 ccsd[9097]: Starting ccsd 2.0.115:
Jul  7 14:42:59 clusternode3 ccsd[9097]: Built: May 25 2010 04:32:01
Jul  7 14:42:59 clusternode3 ccsd[9097]: Copyright (C) Red Hat, Inc. 2004 All rights reserved.
Jul  7 14:42:59 clusternode3 ccsd[9097]: cluster.conf (cluster name = zylog_cluster2, version = 1) found.
Jul  7 14:43:01 clusternode3 openais[9103]: [MAIN ] AIS Executive Service RELEASE 'subrev 1887 version 0.80.6'
Jul  7 14:43:01 clusternode3 openais[9103]: [MAIN ] Copyright (C) 2002-2006 MontaVista Software, Inc and contributors.
Jul  7 14:43:01 clusternode3 openais[9103]: [MAIN ] Copyright (C) 2006 Red Hat, Inc.
Jul  7 14:43:01 clusternode3 openais[9103]: [MAIN ] AIS Executive Service: started and ready to provide service.
Jul  7 14:43:01 clusternode3 openais[9103]: [MAIN ] local node name "clusternode3" not found in cluster.conf
Jul  7 14:43:01 clusternode3 openais[9103]: [MAIN ] Error reading CCS info, cannot start
Jul  7 14:43:01 clusternode3 openais[9103]: [MAIN ] Error reading config from CCS
Jul  7 14:43:01 clusternode3 openais[9103]: [MAIN ] AIS Executive exiting (reason: could not read the main configuration file).
Jul  7 14:43:02 clusternode3 openais[9125]: [MAIN ] AIS Executive Service RELEASE 'subrev 1887 version 0.80.6'
Jul  7 14:43:02 clusternode3 openais[9125]: [MAIN ] Copyright (C) 2002-2006 MontaVista Software, Inc and contributors.
Jul  7 14:43:02 clusternode3 openais[9125]: [MAIN ] Copyright (C) 2006 Red Hat, Inc.
Jul  7 14:43:02 clusternode3 openais[9125]: [MAIN ] AIS Executive Service: started and ready to provide service.
Jul  7 14:43:02 clusternode3 openais[9125]: [MAIN ] local node name "clusternode3" not found in cluster.conf
Jul  7 14:43:02 clusternode3 openais[9125]: [MAIN ] Error reading CCS info, cannot start
Jul  7 14:43:02 clusternode3 openais[9125]: [MAIN ] Error reading config from CCS
Jul  7 14:43:02 clusternode3 openais[9125]: [MAIN ] AIS Executive exiting (reason: could not read the main configuration file).
Jul  7 14:43:28 clusternode3 ccsd[9097]: Unable to connect to cluster infrastructure after 30 seconds.
Jul  7 14:43:58 clusternode3 ccsd[9097]: Unable to connect to cluster infrastructure after 60 seconds.
Jul  7 14:44:28 clusternode3 ccsd[9097]: Unable to connect to cluster infrastructure after 90 seconds.
Jul  7 14:44:58 clusternode3 ccsd[9097]: Unable to connect to cluster infrastructure after 120 seconds.
Jul  7 14:45:28 clusternode3 ccsd[9097]: Unable to connect to cluster infrastructure after 150 seconds.
```

What is happening is that ccsd is not able to find the cluster via the cluster name when it broadcasts. By default, ccsd simply searches the network for a cluster.
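The "local node name not found in cluster.conf" failure in the log can be illustrated with a small sketch (this is not conga/openais code, just a reproduction of the check): the skeleton cluster.conf from above has an empty `<clusternodes/>` element, so the new node's own name is never listed and startup is refused.

```python
# Sketch: reproduce the membership check that fails in the log above.
# SKELETON_CONF is the skeleton cluster.conf quoted in this report.
import xml.etree.ElementTree as ET

SKELETON_CONF = """<?xml version="1.0"?>
<cluster config_version="1" name="zylog_cluster2">
        <fence_daemon post_fail_delay="0" post_join_delay="3"/>
        <clusternodes/>
        <cman/>
        <fencedevices/>
        <rm/>
</cluster>"""

def node_in_conf(conf_xml, nodename):
    """Return True if nodename appears as a <clusternode> in conf_xml."""
    root = ET.fromstring(conf_xml)
    names = [n.get("name") for n in root.iter("clusternode")]
    return nodename in names

# <clusternodes/> is empty, so clusternode3 is missing and cman/openais
# cannot start -- matching "local node name ... not found in cluster.conf".
print(node_in_conf(SKELETON_CONF, "clusternode3"))  # → False
```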
Version-Release number of selected component (if applicable):
luci-0.12.2-12.el5
ricci-0.12.2-12.el5

How reproducible:
Does not reproduce on all clusters. 100% reproducible on clusterA, 0% on clusterB.

Steps to Reproduce:
1. Create a cluster.
2. Add the existing cluster to luci.
3. Add a new node to the cluster.

Actual results:
The new node will not join the cluster, since it cannot find the cluster it was supposed to join.

Expected results:
The new node should join the cluster.

Additional info:
The solution to this issue is a simple one. Instead of relying on ccsd to find the correct cluster and fetch the updated cluster.conf, copy the cluster.conf that was sent to the original nodes to the new node as well. Do not start ccsd on the new node until the updated cluster.conf has been copied to it.
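The proposed ordering can be sketched as follows. This is a hypothetical illustration, not the actual ricci/luci implementation: the updated cluster.conf (with the new node listed and `config_version` bumped) is produced first, and only after it is in place on the new node should ccsd/cman be started there.

```python
# Hypothetical sketch of the fix: add the new node to cluster.conf and bump
# config_version, producing the file to distribute *before* starting ccsd.
import xml.etree.ElementTree as ET

def add_node(conf_xml, nodename, nodeid):
    """Return an updated cluster.conf with nodename added and config_version bumped."""
    root = ET.fromstring(conf_xml)
    root.set("config_version", str(int(root.get("config_version")) + 1))
    nodes = root.find("clusternodes")
    ET.SubElement(nodes, "clusternode",
                  {"name": nodename, "nodeid": str(nodeid), "votes": "1"})
    return ET.tostring(root, encoding="unicode")

skeleton = '<cluster config_version="1" name="zylog_cluster2"><clusternodes/></cluster>'
updated = add_node(skeleton, "clusternode3", 3)
# 'updated' now lists clusternode3; copying this file to the new node before
# starting ccsd/cman avoids the broadcast lookup that fails in this bug.
```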