Bug 743727

Summary: Improve availability reporting to users
Product: [Other] RHQ Project
Reporter: Charles Crouch <ccrouch>
Component: Core Server
Assignee: Nobody <nobody>
Status: NEW
Severity: unspecified
Priority: medium
Version: unspecified
CC: hrupp, jshaughn, loleary, mazz
Keywords: FutureFeature, Improvement
Hardware: Unspecified
OS: Unspecified
Doc Type: Enhancement
Bug Blocks: 854805, 741450, 855744    
Attachments: resources-errors.png (flags: none)

Description Charles Crouch 2011-10-05 20:08:23 UTC
This bug is about improving the level of information JON returns to the user when it shows that a resource is down.

When a user sees that a resource is red (or unknown if we've never collected information on it before) it would often be helpful for the user to also see a message about *why* the resource is red. Today we support this to some extent, by indicating on the single resource view if there is a connection error when initially trying to contact the resource, but this could be extended:

a) The avail icon in the resource *list* of the JON UI simply shows UNAVAIL when the connection to the resource fails. Right now there is no message or other user feedback in that condition.
b) Make sure that as the ability to connect to a resource changes, e.g. someone changes the underlying remote JMX password, we propagate this back to the user. Similarly, as soon as the connection to the resource is re-established this error should stop being displayed. Possibly add a reason to the availability report in the plug-in API.
c) All messages need to be user-friendly and as specific as possible about what's going wrong and the possible options to fix it, i.e. we should not be displaying stack traces.
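
As a rough illustration of point b), here is a minimal sketch of an availability result that carries an optional reason alongside the state. The names `AvailState` and `AvailabilityResult` are hypothetical and are not part of the existing plug-in API; they only show the shape such a report could take.

```java
import java.util.Optional;

// Hypothetical types for illustration only; not the actual plug-in API.
enum AvailState { UP, DOWN, UNKNOWN }

final class AvailabilityResult {
    private final AvailState state;
    private final String reason; // null when there is nothing to tell the user

    private AvailabilityResult(AvailState state, String reason) {
        this.state = state;
        this.reason = reason;
    }

    // A healthy resource carries no reason, which lets the UI clear any old message.
    static AvailabilityResult up() {
        return new AvailabilityResult(AvailState.UP, null);
    }

    // A down resource carries a user-friendly explanation, never a stack trace.
    static AvailabilityResult down(String userFriendlyReason) {
        return new AvailabilityResult(AvailState.DOWN, userFriendlyReason);
    }

    AvailState getState() {
        return state;
    }

    Optional<String> getReason() {
        return Optional.ofNullable(reason);
    }
}
```

A component could then return `AvailabilityResult.down("The remote JMX password was rejected.")` and go back to `AvailabilityResult.up()` once the connection works again.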

Comment 2 Charles Crouch 2011-10-05 20:10:57 UTC
More notes from Larry

The end goal should be that when an availability check results in a failure, a user-friendly error message is logged and the message is passed back to the server UI.
Give user visual feedback in the JON UI regarding connection failures
When connection failures occur, the user needs a clear description of why the failure occurred and some suggestions on how to fix the issue.
Network connectivity issues: Invalid host name, connection refused, no route to host, connection timed out, etc.
Credential failures: Invalid user name, invalid password, invalid security domain
Messages should be clear and should not require the interpretation of the stack trace or exception class name. The users are not Java developers.
Users may not even know what a JNP URL is or where a Principal is defined for a connection.
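
The failure categories listed above could be turned into plain-language messages roughly as follows. This is only a sketch: the class `ConnectionFailureMessages` and the message wording are made up for illustration, and it assumes that credential failures surface as `java.lang.SecurityException` (as they do when a JMX connection is rejected), which may not hold for every plug-in.

```java
import java.net.ConnectException;
import java.net.NoRouteToHostException;
import java.net.SocketTimeoutException;
import java.net.UnknownHostException;

// Hypothetical helper; the name and message texts are illustrative assumptions.
final class ConnectionFailureMessages {

    static String describe(Throwable failure) {
        // Walk the cause chain so that wrapped exceptions are still recognized.
        for (Throwable t = failure; t != null; t = t.getCause()) {
            if (t instanceof UnknownHostException) {
                return "The host name could not be found. Check the host name in the connection settings.";
            }
            if (t instanceof NoRouteToHostException) {
                return "There is no network route to the host. Check firewalls and network configuration.";
            }
            if (t instanceof ConnectException) {
                return "The connection was refused. Check that the server is running and that the port is correct.";
            }
            if (t instanceof SocketTimeoutException) {
                return "The connection timed out. The server may be overloaded or unreachable.";
            }
            if (t instanceof SecurityException) {
                return "The login was rejected. Check the user name and password in the connection settings.";
            }
        }
        // Fall back to a generic message rather than exposing a class name or stack trace.
        return "The resource could not be contacted. See the agent log for details.";
    }
}
```

The full exception would still go to the agent log for support purposes; only the short message would travel back to the UI.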

Comment 3 John Mazzitelli 2011-10-05 20:18:24 UTC
This is one of the reasons the ResourceError entity exists. Assuming we even know why the avail went down (I'm not convinced we always do), we can store the info in the ResourceError table. This shows up in the GUI (yes, some people complain they don't like HOW it shows up: the yellow warning triangle in the top right), but at least we have that information. How we show it in the GUI is a minor issue.

Comment 4 Jay Shaughnessy 2012-02-28 20:05:06 UTC
This has not been targeted for the current avail work. More detail is
needed to understand how to support this.  Right now the getAvailability()
calls to the resource component have no way of passing back a message.
We could add a second method to fetch an optional cause, or something like
that. It's not clear the component will be able to supply much info, and
we'd also need to ensure we don't repeatedly send the same message.
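
One way to avoid resending the same message is to remember the last cause sent per resource and only send when it changes. The sketch below assumes a hypothetical `AvailabilityCauseTracker` keyed by an integer resource id; neither the class nor the way a cause is obtained from the component exists in the current code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical agent-side helper; not part of the existing code base.
final class AvailabilityCauseTracker {
    // Last cause that was sent to the server, per resource id.
    private final Map<Integer, String> lastSent = new HashMap<>();

    /**
     * Decides whether a cause needs to be sent.
     * A null cause means the component has nothing to report (for example, it is UP again).
     */
    boolean shouldSend(int resourceId, String cause) {
        if (cause == null) {
            // Send one "cleared" notification only if a cause was previously reported.
            return lastSent.remove(resourceId) != null;
        }
        String previous = lastSent.put(resourceId, cause);
        // Send only when the cause is new or has changed.
        return !cause.equals(previous);
    }
}
```

With this, a resource that stays down for the same reason produces one message, and a recovery produces one notification that lets the UI drop the message.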

Comment 5 Larry O'Leary 2012-09-19 19:05:53 UTC
Another example of this is when a resource availability scan timeout occurs. We get no log message or feedback that shows why this happened or that it even happened. This leaves the "UNAVAILABLE" resource state as a guessing game: why is it DOWN when the user can obviously see it is UP?
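
To make the timeout case concrete, the sketch below runs a check with a time limit and both logs and returns a reason when the limit is hit. It is not how the agent currently schedules availability scans; `TimedAvailCheck`, the `Callable<Boolean>` check, and the message texts are assumptions made for illustration.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.logging.Logger;

// Hypothetical helper; the real agent's scan scheduling is not shown here.
final class TimedAvailCheck {
    private static final Logger LOG = Logger.getLogger(TimedAvailCheck.class.getName());

    /** Returns null when the check reports UP, otherwise a reason suitable for the user. */
    static String run(ExecutorService executor, Callable<Boolean> check, long timeoutMillis) {
        Future<Boolean> future = executor.submit(check);
        try {
            boolean up = Boolean.TRUE.equals(future.get(timeoutMillis, TimeUnit.MILLISECONDS));
            return up ? null : "The resource reported that it is down.";
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the check that is still running
            String reason = "The availability check did not finish within " + timeoutMillis
                    + " ms, so the real state of the resource is not known.";
            LOG.warning(reason);
            return reason;
        } catch (ExecutionException e) {
            String reason = "The availability check failed: " + e.getCause();
            LOG.warning(reason);
            return reason;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return "The availability check was interrupted.";
        }
    }
}
```

The point is only that a timeout becomes a visible, logged reason instead of a silent DOWN.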

Comment 6 Charles Crouch 2012-09-27 14:00:42 UTC
Created attachment 618088
resources-errors.png

Comment 7 Charles Crouch 2012-09-27 14:05:15 UTC
Relating to comments 3 and 5, I've attached a screenshot (resources-errors.png) showing some of the information we send back from the agent today. We clearly send back information on some events, e.g. the type of the underlying server we are connecting to has changed. But as Larry points out, there are still scenarios we are not sending back resource errors for.

As mazz points out, there is probably a separate discussion around how/if we should improve how those resource errors are displayed/dismissed etc. in the UI.

Comment 8 Lukas Krejci 2012-10-15 10:32:25 UTC
Another piece of information I think would be useful is the "age" of the availability, e.g. we show a resource green or red based on information from 5 minutes ago, which might no longer reflect reality. I think it would be useful for the user to see the age of the most recent avail check on a resource, together, perhaps, with a way of requesting a new avail check out of schedule.
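
Computing the age for display is straightforward if the time of the last check is known. A minimal sketch, assuming the UI has that timestamp available as a `java.time.Instant` (the `AvailabilityAge` class and the label wording are hypothetical):

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical display helper; the label texts are illustrative.
final class AvailabilityAge {

    static String describe(Instant lastCheck, Instant now) {
        Duration age = Duration.between(lastCheck, now);
        if (age.isNegative()) {
            age = Duration.ZERO; // tolerate clock skew between agent and server
        }
        long minutes = age.toMinutes();
        if (minutes < 1) {
            return "checked less than a minute ago";
        }
        if (minutes < 60) {
            return "checked " + minutes + " min ago";
        }
        return "checked " + age.toHours() + " h ago";
    }
}
```

For a check made 5 minutes before `now`, this returns "checked 5 min ago".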