Bug 612300

Summary: conga will fail to start a new node that is added via luci
Product: Red Hat Enterprise Linux 5
Component: conga
Version: 5.5
Hardware: All
OS: Linux
Status: CLOSED ERRATA
Severity: medium
Priority: low
Reporter: Shane Bradley <sbradley>
Assignee: Ryan McCabe <rmccabe>
QA Contact: Cluster QE <mspqa-list>
CC: bbrock, cluster-maint, tao, wmealing
Fixed In Version: conga-0.12.2-16.el5
Doc Type: Bug Fix
Last Closed: 2011-01-13 22:29:24 UTC

Attachments:

Description	Flags
patch to fix bug	none

Description Shane Bradley 2010-07-07 19:50:23 UTC

Description of problem:

The install error that luci reports: "A problem occurred when starting
this node: service cman start failed: "

When a new node is added in the current implementation of luci/ricci, a
skeleton cluster.conf is written to the node with only the cluster name
set. Adding the node then relies on "ccsd" to find the cluster by name
on the network. If for some reason the cluster (identified by the
cluster name in the skeleton cluster.conf) is not found, starting cman
fails: the skeleton cluster.conf defines no nodes, so a new cluster
cannot be created, and the original cluster was never found.

The name of the cluster that this node is trying to join is:
"zylog_cluster2" which is the name that is in the skeleton
cluster.conf.

$  cat /etc/cluster/cluster.conf
<?xml version="1.0"?>
<cluster config_version="1" name="zylog_cluster2">
       <fence_daemon post_fail_delay="0" post_join_delay="3"/>
       <clusternodes/>
       <cman/>
       <fencedevices/>
       <rm/>
</cluster>
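
The condition that makes cman fail here (no node entry for the local host in cluster.conf) can be checked before anything is started. The following is a minimal Python 3 sketch, not code from luci or ricci. The helper names `defined_nodes` and `node_is_defined` are hypothetical, and it assumes nodes are declared as <clusternode name="..."> elements under <clusternodes>.

```python
import xml.etree.ElementTree as ET

# The skeleton cluster.conf quoted above, verbatim.
SKELETON = """<?xml version="1.0"?>
<cluster config_version="1" name="zylog_cluster2">
       <fence_daemon post_fail_delay="0" post_join_delay="3"/>
       <clusternodes/>
       <cman/>
       <fencedevices/>
       <rm/>
</cluster>
"""


def defined_nodes(conf_text):
    """Return the name of every <clusternode> under <clusternodes>.

    Hypothetical helper: assumes node entries carry a "name" attribute.
    """
    root = ET.fromstring(conf_text)
    return [node.get("name") for node in root.findall("./clusternodes/clusternode")]


def node_is_defined(conf_text, node_name):
    """True if node_name has an entry in the given cluster.conf text."""
    return node_name in defined_nodes(conf_text)


if __name__ == "__main__":
    # The skeleton has an empty <clusternodes/>, so no node is defined.
    print(defined_nodes(SKELETON))
    print(node_is_defined(SKELETON, "clusternode3"))
```

For the skeleton file the node list is empty, which matches the "local node name ... not found in cluster.conf" message in the log.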

Here is the log from the third node when it was added to the cluster:
$ tail -f /var/log/messages
Jul  7 14:37:25 clusternode3 ricci: startup succeeded
Jul  7 14:42:59 clusternode3 ccsd[9097]: Starting ccsd 2.0.115:
Jul  7 14:42:59 clusternode3 ccsd[9097]:  Built: May 25 2010 04:32:01
Jul  7 14:42:59 clusternode3 ccsd[9097]:  Copyright (C) Red Hat, Inc.  2004  All rights reserved.
Jul  7 14:42:59 clusternode3 ccsd[9097]: cluster.conf (cluster name = zylog_cluster2, version = 1) found.
Jul  7 14:43:01 clusternode3 openais[9103]: [MAIN ] AIS Executive Service RELEASE 'subrev 1887 version 0.80.6'
Jul  7 14:43:01 clusternode3 openais[9103]: [MAIN ] Copyright (C) 2002-2006 MontaVista Software, Inc and contributors.
Jul  7 14:43:01 clusternode3 openais[9103]: [MAIN ] Copyright (C) 2006 Red Hat, Inc.
Jul  7 14:43:01 clusternode3 openais[9103]: [MAIN ] AIS Executive Service: started and ready to provide service.
Jul  7 14:43:01 clusternode3 openais[9103]: [MAIN ] local node name "clusternode3" not found in cluster.conf
Jul  7 14:43:01 clusternode3 openais[9103]: [MAIN ] Error reading CCS info, cannot start
Jul  7 14:43:01 clusternode3 openais[9103]: [MAIN ] Error reading config from CCS
Jul  7 14:43:01 clusternode3 openais[9103]: [MAIN ] AIS Executive exiting (reason: could not read the main configuration file).
Jul  7 14:43:02 clusternode3 openais[9125]: [MAIN ] AIS Executive Service RELEASE 'subrev 1887 version 0.80.6'
Jul  7 14:43:02 clusternode3 openais[9125]: [MAIN ] Copyright (C) 2002-2006 MontaVista Software, Inc and contributors.
Jul  7 14:43:02 clusternode3 openais[9125]: [MAIN ] Copyright (C) 2006 Red Hat, Inc.
Jul  7 14:43:02 clusternode3 openais[9125]: [MAIN ] AIS Executive Service: started and ready to provide service.
Jul  7 14:43:02 clusternode3 openais[9125]: [MAIN ] local node name "clusternode3" not found in cluster.conf
Jul  7 14:43:02 clusternode3 openais[9125]: [MAIN ] Error reading CCS info, cannot start
Jul  7 14:43:02 clusternode3 openais[9125]: [MAIN ] Error reading config from CCS
Jul  7 14:43:02 clusternode3 openais[9125]: [MAIN ] AIS Executive exiting (reason: could not read the main configuration file).
Jul  7 14:43:28 clusternode3 ccsd[9097]: Unable to connect to cluster infrastructure after 30 seconds.
Jul  7 14:43:58 clusternode3 ccsd[9097]: Unable to connect to cluster infrastructure after 60 seconds.
Jul  7 14:44:28 clusternode3 ccsd[9097]: Unable to connect to cluster infrastructure after 90 seconds.
Jul  7 14:44:58 clusternode3 ccsd[9097]: Unable to connect to cluster infrastructure after 120 seconds.
Jul  7 14:45:28 clusternode3 ccsd[9097]: Unable to connect to cluster infrastructure after 150 seconds.
--------------------------------------------------------------------------------

What is happening is that "ccsd" cannot find the cluster by cluster
name when it broadcasts. By default, ccsd just searches for a cluster
on the network.
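
Because the error luci shows is empty ("service cman start failed: "), the real cause only appears in /var/log/messages. As a rough diagnostic aid, the Python 3 sketch below scans log lines for the openais message seen above. The function name `nodes_missing_from_conf` is hypothetical, and the pattern is taken only from the log excerpt in this report; other versions may word the message differently.

```python
import re

# Pattern copied from the openais message in the log excerpt above.
NODE_MISSING = re.compile(r'local node name "([^"]+)" not found in cluster\.conf')


def nodes_missing_from_conf(log_lines):
    """Return node names that openais reported as absent from cluster.conf.

    Names are returned once each, in order of first appearance.
    """
    found = []
    for line in log_lines:
        match = NODE_MISSING.search(line)
        if match and match.group(1) not in found:
            found.append(match.group(1))
    return found


if __name__ == "__main__":
    with open("/var/log/messages") as log:
        for name in nodes_missing_from_conf(log):
            print("node not defined in cluster.conf:", name)
```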

Version-Release number of selected component (if applicable):
luci-0.12.2-12.el5
ricci-0.12.2-12.el5

How reproducible:
Does not reproduce on all clusters. 100% reproducible on clusterA, 0%
on clusterB.

Steps to Reproduce:
1. Create a cluster
2. Add the existing cluster to luci
3. Add a new node to the cluster
  
Actual results:
The new node does not join the cluster because it cannot find the
cluster it was supposed to join.

Expected results:
The new node should join the cluster.

Additional info:
The fix is simple. Instead of relying on ccsd to find the correct
cluster and fetch the updated cluster.conf, copy the cluster.conf that
was sent to the original nodes to the new node as well. Do not start
ccsd on the new node until the updated cluster.conf has been copied
there.
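
The ordering proposed above (install the full cluster.conf first, start the cluster services second) can be sketched as follows. This is a minimal Python 3 illustration under stated assumptions, not the attached patch. `install_conf_then_start` and its parameters are hypothetical names, and the start step is passed in as a callable so the sketch makes no claim about how ricci actually starts services.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def install_conf_then_start(conf_text, node_name, dest_path, start_cluster):
    """Write the full cluster.conf to dest_path, then call start_cluster().

    Hypothetical helper. Refuses to write or start anything if node_name
    has no <clusternode> entry, since that is the case that fails in this
    report.
    """
    root = ET.fromstring(conf_text)
    names = [n.get("name") for n in root.findall("./clusternodes/clusternode")]
    if node_name not in names:
        raise ValueError("%s is not defined in the supplied cluster.conf" % node_name)

    # Write to a temporary file in the same directory and rename it into
    # place, so a partially written cluster.conf is never visible.
    dest_dir = os.path.dirname(dest_path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=dest_dir)
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write(conf_text)
        os.replace(tmp_path, dest_path)
    except Exception:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise

    # Only now is it safe to start the cluster services.
    start_cluster()
```

On a real node, `start_cluster` might wrap the "service cman start" command that luci reports as failing, for example with `subprocess.check_call(["service", "cman", "start"])`; that wiring is an assumption and is left to the caller.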

Comment 1 Ryan McCabe 2010-08-06 20:16:25 UTC

Created attachment 437255 [details]
patch to fix bug

Comment 4 Ryan McCabe 2010-10-26 20:22:07 UTC

*** Bug 629152 has been marked as a duplicate of this bug. ***

Comment 8 errata-xmlrpc 2011-01-13 22:29:24 UTC

An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-0033.html