Bug 219156 - Errors in performing node-specific tasks due to ricci outage not reflected in luci display
Status: CLOSED CURRENTRELEASE
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: conga
Version: 5.0
Hardware: All, OS: Linux
Priority: medium, Severity: medium
Assigned To: Ryan McCabe
QA Contact: Corey Marthaler
Reported: 2006-12-11 11:34 EST by Len DiMaggio
Modified: 2009-04-16 18:42 EDT
CC List: 7 users

Fixed In Version: RC
Doc Type: Bug Fix
Last Closed: 2007-02-07 21:24:53 EST


Attachments
Screenshot - when ricci is down - compare to 2nd attachment (490.71 KB, image/png)
2006-12-11 11:34 EST, Len DiMaggio
With ricci running - note how cman and rgmanager status/start up config are listed (500.30 KB, image/png)
2006-12-11 11:35 EST, Len DiMaggio

Description Len DiMaggio 2006-12-11 11:34:01 EST
Description of problem:
The status of the ricci service is not reflected in the luci status display for a
cluster node. For example, if the ricci service on a cluster node is stopped, the
node is still listed as green (up) in the cluster display. There is no indication
in the luci display that the ricci service is down, other than the rgmanager and
cman services' status not being listed for the node (see the attached screenshots).
The luci server log correctly shows that luci cannot connect to the (shut down)
ricci agent on the node.

Version-Release number of selected component (if applicable):
luci-0.8-25.el5
ricci-0.8-25.el5

How reproducible:
100%

Steps to Reproduce:
1. In a functioning cluster, shut down ricci on one of the cluster nodes.
2. View that cluster node's status via luci - see the first screenshot attachment.
3. Attempt an operation on that node (for example, try to reboot the node via luci).
4. The connection failure to ricci on that node is shown in the luci server log
(Dec 11 10:01:00 tng3-5 luci[15925]: ricci error from tng3-1.lab.msp.redhat.com:
Error connecting to tng3-1.lab.msp.redhat.com:11111: (111, 'Connection refused')),
but not in the luci web app display.
  
Actual results:
Errors are written to the luci server log but are not shown in the web app.

Expected results:
The fact that ricci is down should be displayed in the luci web app.

Additional info:
See the attachments.
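
For illustration only, the following Python sketch probes a node's ricci agent the
same way the log line in step 4 suggests (a TCP connection to port 11111) and prints
the kind of error this bug asks to be surfaced in the web UI. The helper name and
message formatting are assumptions for the sketch, not luci code.

import socket

RICCI_PORT = 11111  # ricci agent port, per the log excerpt in step 4

def ricci_reachable(host, port=RICCI_PORT, timeout=5.0):
    """Return (True, None) if a TCP connection to the ricci agent succeeds,
    otherwise (False, reason) describing the failure."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, None
    except OSError as err:  # e.g. ECONNREFUSED (errno 111) when ricci is stopped
        return False, "(%s, %r)" % (err.errno, err.strerror or str(err))

if __name__ == "__main__":
    host = "tng3-1.lab.msp.redhat.com"  # node name taken from the log excerpt
    ok, reason = ricci_reachable(host)
    if not ok:
        # The point of this bug: this information should reach the luci web UI,
        # not only the luci server log.
        print("ricci error from %s: Error connecting to %s:%d: %s"
              % (host, host, RICCI_PORT, reason))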
Comment 1 Len DiMaggio 2006-12-11 11:34:02 EST
Created attachment 143297 [details]
Screenshot - when ricci is down - compare to 2nd attachment
Comment 2 Len DiMaggio 2006-12-11 11:35:41 EST
Created attachment 143298 [details]
With ricci running - note how cman and rgmanager status/start up config are listed
Comment 3 Jim Parsons 2006-12-11 12:01:33 EST
This ticket now seems to be mixing behavior in two separate interfaces. Let's
address your concerns about node display color on the cluster list page first.
When the cluster list is generated, a ricci agent on one node in each cluster is
contacted for information about the cluster. If node1 is contacted and there is
no response, then node2 is checked, and so on until contact is made. If no nodes
in a cluster can be reached, that indicates a severe UI/ricci problem and the
user is informed of it.

If node1 is unreachable but node2 is reachable, the cluster status will be
retrieved from node2. NOTE: member status in a cluster is COMPLETELY separate
from whether ricci is running; ricci does not even need to be installed, and a
node can still be a functioning member of the cluster, visible from other nodes.
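
A minimal sketch of the fallback just described, with assumed function and
parameter names (this is not the conga source): ask one ricci agent per cluster
for status, trying the nodes in order until one answers.

def cluster_status(nodes, probe):
    """nodes: hostnames in the cluster; probe: a callable(host) that returns a
    status mapping from that node's ricci agent, or raises OSError if the agent
    cannot be reached."""
    for host in nodes:
        try:
            return probe(host)   # the first responsive agent answers for the whole cluster
        except OSError:
            continue             # this agent is unreachable; fall through to the next node
    return None                  # no agent reachable: the severe case reported to the user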

Knowing whether every ricci agent in a cluster is running is a conga health
check, and not one we want to pursue. For example, polling every node of a
50-node cluster for a ricci response would take a while, and you would not
actually learn anything about your cluster from doing so.

If you take a node-specific action in the UI, such as asking a node to leave
the cluster, or calling up the node configuration page and performing an action
on it, the UI will quickly discover that the ricci agent is not running and
inform the user. I *think* this is the best behavior: until you actually need to
connect to the node with the down ricci agent, just display everything you know
about the node accurately.
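
A minimal sketch of that behavior, with assumed helper names (not the actual
fix); the error text is the message the fixed build displays (see comment 9).

def run_node_action(host, action, show_error):
    """action: a callable(host) that performs the operation through the node's
    ricci agent; show_error: a callable(str) that displays a message in the
    luci web UI."""
    try:
        return action(host)
    except OSError:
        # Surface the failure in the web page instead of only logging it.
        show_error("The ricci agent for this node is unresponsive. "
                   "Node-specific information is not available at this time.")
        return None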
Comment 4 Len DiMaggio 2006-12-11 12:26:02 EST
That - comment #3 - makes sense - especially for a large cluster!

The problem I'm seeing today is that when a node-specific action is taken in
luci - for example, rebooting or deleting the node - while ricci is not running
on that node, the operation fails and no error is reported in the luci web GUI.

An error is reported if an attempt is made to retrieve the node's log via luci.



'...If you take a node-specific action in the UI, such as asking a node to leave
the cluster, or calling up the node configuration page and performing an action
on it, the UI will quickly discover that the ricci agent is not running and
inform the user...'
Comment 5 Len DiMaggio 2006-12-11 12:37:31 EST
Changed the summary to better describe the problem.
Comment 6 Ryan McCabe 2006-12-11 17:43:19 EST
Fixed in -HEAD
Comment 9 Len DiMaggio 2007-01-19 15:05:57 EST
Verified in:
luci-0.8-29.el5
ricci-0.8-29.el5

This error is displayed:

The ricci agent for this node is unresponsive. Node-specific information is not
available at this time.
Comment 10 RHEL Product and Program Management 2007-02-07 21:24:53 EST
A package has been built which should help the problem described in 
this bug report. This report is therefore being closed with a resolution 
of CURRENTRELEASE. You may reopen this bug report if the solution does 
not work for you.
