Bug 219156 - Errors in performing node-specific tasks due to ricci outage not reflected in luci display
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: conga   
Version: 5.0
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Ryan McCabe
QA Contact: Corey Marthaler
URL:
Whiteboard:
Keywords:
Depends On:
Blocks:
 
Reported: 2006-12-11 16:34 UTC by Len DiMaggio
Modified: 2009-04-16 22:42 UTC
CC List: 7 users

Fixed In Version: RC
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2007-02-08 02:24:53 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---


Attachments
Screenshot - when ricci is down - compare to 2nd attachment (490.71 KB, image/png)
2006-12-11 16:34 UTC, Len DiMaggio
With ricci running - note how cman and rgmanager status/start up config are listed (500.30 KB, image/png)
2006-12-11 16:35 UTC, Len DiMaggio

Description Len DiMaggio 2006-12-11 16:34:01 UTC
Description of problem:
The status of the ricci service is not reflected in the luci status display for a
cluster node. For example, if the ricci service on a cluster node is stopped, the
node is still listed as green (up) in the cluster display. There is no indication in
the luci display that the ricci service is down, other than the rgmanager and cman
services' status not being listed for the node (see the attached screenshots). The
luci server log correctly shows that luci cannot connect to the stopped ricci
agent on the node.

Version-Release number of selected component (if applicable):
luci-0.8-25.el5
ricci-0.8-25.el5

How reproducible:
100%

Steps to Reproduce:
1. In a functioning cluster, shut down ricci on one of the cluster nodes.
2. View that cluster node's status via luci - see the first screenshot attachment.
3. Attempt an operation on that node (for example, try to reboot the node via luci).
4. The connection failure to ricci on that node is shown in the luci server log
(Dec 11 10:01:00 tng3-5 luci[15925]: ricci error from tng3-1.lab.msp.redhat.com:
Error connecting to tng3-1.lab.msp.redhat.com:11111: (111, 'Connection refused')
), but not in the luci web app display.
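
The "Connection refused" in step 4 is simply ricci's TCP port 11111 no longer
accepting connections once the service is stopped. As a rough, standalone
illustration (a hypothetical helper, not part of luci or ricci), a Python check
might look like:

import socket

def ricci_reachable(host, port=11111, timeout=5.0):
    """Return True if a TCP connection to the ricci agent succeeds."""
    try:
        sock = socket.create_connection((host, port), timeout)
        sock.close()
        return True
    except socket.error as err:
        # errno 111 (ECONNREFUSED) is what the luci server log records
        # when ricci has been stopped on the node.
        print("Error connecting to %s:%d: %s" % (host, port, err))
        return False

if __name__ == "__main__":
    ricci_reachable("tng3-1.lab.msp.redhat.com")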
  
Actual results:
Errors are written to the luci server log but are not shown in the web app.

Expected results:
The fact that ricci is down should be displayed in the luci web app.

Additional info:
See the attachments.

Comment 1 Len DiMaggio 2006-12-11 16:34:02 UTC
Created attachment 143297 [details]
Screenshot - when ricci is down - compare to 2nd attachment

Comment 2 Len DiMaggio 2006-12-11 16:35:41 UTC
Created attachment 143298 [details]
With ricci running - note how cman and rgmanager status/start up config are listed

Comment 3 Jim Parsons 2006-12-11 17:01:33 UTC
This ticket now seems to be mixing behavior from two separate interfaces. Let's
address your concerns about node display color on the cluster list page first.
When the cluster list is generated, a ricci agent on one node in each cluster is
contacted for information about the cluster. If node1 is contacted and there is
no response, then node2 is checked, and so on until contact is made. If no nodes
in a cluster can be reached, then this indicates a severe UI/ricci problem and
the user is informed of this. 

If node1 is unreachable, but node2 is, the cluster status will be retrieved from
it. NOTE: member status in a cluster is COMPLETELY separate from whether ricci
is running; ricci does not even need to be installed, and a node can still be a
functioning member of the cluster, visible from other nodes.
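
A minimal sketch of the fallback just described, assuming a hypothetical helper
get_cluster_status_from() that stands in for luci's real ricci query and raises
when a node's agent cannot be reached:

def cluster_status(nodes, get_cluster_status_from):
    """Return cluster status from the first node whose ricci agent responds."""
    for node in nodes:
        try:
            return get_cluster_status_from(node)
        except Exception:
            continue  # ricci unreachable on this node; try the next one
    # No agent in the cluster could be reached: a severe luci/ricci problem
    # that should be surfaced to the user.
    raise RuntimeError("no ricci agent in the cluster could be contacted")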

Knowing whether every ricci agent in a cluster is running is a conga health check,
and not one we want to pursue; for example, polling every node in a 50-node cluster
for a ricci response would take a while, and you would not actually learn anything
about your cluster from doing so.

If you take a node-specific action in the UI, such as asserting that a node
leave the cluster, or calling up the node config page and performing an action
on it, the UI will quickly discover that the ricci agent is not running and
inform the user. I *think* this is the best behavior. Until you need to actually
connect to the node with the ricci agent down, just display all you know about
the node accurately.

Comment 4 Len DiMaggio 2006-12-11 17:26:02 UTC
That - comment #3 - makes sense - especially for a large cluster!

The problem that I'm seeing today is that when a node-specific action is taken in
luci - for example, rebooting or deleting the node - while ricci is not running on
that node, the operation fails and no error is reported in the luci web GUI.

An error is reported if an attempt is made to retrieve the node's log via luci.
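
A minimal sketch of the requested behavior (hypothetical names, not luci's actual
code): catch the ricci connection failure during a node-specific action and hand a
message back to the web UI instead of only writing it to the server log:

import logging

log = logging.getLogger("luci")

def perform_node_action(node, action):
    """Run a node-specific action; return an error string for the UI on failure."""
    try:
        action(node)
        return None  # success: nothing to report
    except OSError as err:  # e.g. (111, 'Connection refused') from ricci on port 11111
        log.error("ricci error from %s: %s", node, err)
        # Message for the web page instead of a silent failure.
        return ("The ricci agent for this node is unresponsive. "
                "Node-specific information is not available at this time.")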



'...If you take a node-specific action in the UI, such as asserting that a node
leave the cluster, or calling up the node config page and performing an action
on it, the UI will quickly discover that the ricci agent is not running and
inform the user...'

Comment 5 Len DiMaggio 2006-12-11 17:37:31 UTC
Changed the summary to better describe the problem

Comment 6 Ryan McCabe 2006-12-11 22:43:19 UTC
Fixed in -HEAD

Comment 9 Len DiMaggio 2007-01-19 20:05:57 UTC
Verified in:
luci-0.8-29.el5
ricci-0.8-29.el5

This error is displayed:

 The ricci agent for this node is unresponsive. Node-specific information is not
available at this time.


Comment 10 RHEL Product and Program Management 2007-02-08 02:24:53 UTC
A package has been built which should help the problem described in 
this bug report. This report is therefore being closed with a resolution 
of CURRENTRELEASE. You may reopen this bug report if the solution does 
not work for you.


