Red Hat Bugzilla – Bug 743727
Improve availability reporting to users
Last modified: 2018-01-30 12:56:56 EST
This bug is about improving the level of information JON returns to the user when it shows that a resource is down.
When a user sees that a resource is red (or unknown if we've never collected information on it before) it would often be helpful for the user to also see a message about *why* the resource is red. Today we support this to some extent, by indicating on the single resource view if there is a connection error when initially trying to contact the resource, but this could be extended:
a) Avail icon in resource *list* of JON UI simply shows UNAVAIL when the connection to the resource fails. Right now there is no message/user-feedback in such a condition.
b) Make sure that as the ability to connect to a resource changes, e.g. someone changes the underlying remote JMX password, we propagate this back to the user. Similarly, as soon as the connection to the resource is re-established, this error should stop being displayed. Possibly add a reason to the availability report in the plug-in API.
c) All messages need to be user-friendly and as specific as possible about what's going wrong and the possible options to fix it, i.e. we should not be displaying stack traces.
More notes from Larry
The end goal should be that when an availability check results in a failure, a user-friendly error message is logged and the message is passed back to the server UI.
Give user visual feedback in the JON UI regarding connection failures
When connection failures occur, the user needs a clear description of why the failure occurred and some suggestions on how to fix the issue.
Network connectivity issues: Invalid host name, connection refused, no route to host, connection timed out, etc.
Credential failures: Invalid user name, invalid password, invalid security domain
Messages should be clear and should not require interpretation of a stack trace or exception class name. The users are not Java developers.
Users may not even know what a JNP URL is or where a Principal is defined for a connection.
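To illustrate the kind of translation described above, here is a minimal sketch that maps common connection exceptions to plain-language messages. The class name `ConnectionFailureMessages` and its `describe` method are hypothetical and not part of the JON plug-in API; the sketch also assumes that credential failures surface as `SecurityException`, which is common for remote JMX connections but may not hold for every resource type.

```java
import java.net.ConnectException;
import java.net.NoRouteToHostException;
import java.net.SocketTimeoutException;
import java.net.UnknownHostException;

// Hypothetical helper (not an existing JON/RHQ class): turns a low-level
// connection failure into a message a non-developer can act on.
final class ConnectionFailureMessages {
    private ConnectionFailureMessages() {}

    static String describe(Throwable failure, String host) {
        // Walk the cause chain, because the useful exception is often wrapped.
        for (Throwable t = failure; t != null; t = t.getCause()) {
            if (t instanceof UnknownHostException) {
                return "The host name '" + host + "' could not be resolved. "
                        + "Check the host name in the connection settings.";
            }
            if (t instanceof NoRouteToHostException) {
                return "The host '" + host + "' cannot be reached from the agent. "
                        + "Check network routing and firewalls.";
            }
            if (t instanceof ConnectException) {
                return "The connection to '" + host + "' was refused. "
                        + "Check that the server is running and the port is correct.";
            }
            if (t instanceof SocketTimeoutException) {
                return "The connection to '" + host + "' timed out. "
                        + "The server may be overloaded or unreachable.";
            }
            // Assumption: credential problems are reported as SecurityException.
            if (t instanceof SecurityException) {
                return "The server rejected the supplied credentials. "
                        + "Check the user name and password in the connection settings.";
            }
        }
        return "The resource could not be contacted. See the agent log for details.";
    }
}
```

The fallback message deliberately avoids exception class names and points the user to the agent log instead.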
This is one of the reasons the ResourceError entity exists. If we go by the assumption that we even know why the avail went down (I'm not convinced we always know why, but if we do) we can store the info in the ResourceError table - this shows up in the GUI (yes, some people complain they don't like HOW it shows up - the yellow warning triangle in the top right) but at least we can have that information. How we show it in the GUI is a minor issue.
This has not been targeted for the current avail work. More detail is
needed to understand how to support this. Right now the getAvailability()
calls to the resource component have no way of passing back a message.
We could add a second method to fetch an optional cause, or something like
that. It's not clear the component will be able to supply much info, and
we'd also need to ensure we don't repeatedly send the same message.
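As a rough sketch of the "second method" idea, the following shows an optional cause accessor alongside a small filter that suppresses repeated messages. Everything here is hypothetical: `Avail` is a stand-in for the real availability type, and `AvailabilityCauseFacet` and `CauseReporter` are illustrative names, not existing plug-in API.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the availability type returned by getAvailability().
enum Avail { UP, DOWN }

// Hypothetical optional facet: a component may implement it to explain
// its most recent availability result.
interface AvailabilityCauseFacet {
    Optional<String> getAvailabilityCause();
}

// Hypothetical agent-side filter that avoids sending the same cause repeatedly.
final class CauseReporter {
    // Last cause sent to the server, keyed by resource id.
    private final Map<Integer, String> lastSent = new ConcurrentHashMap<>();

    /** Returns the cause to send to the server, or empty if nothing new needs sending. */
    Optional<String> messageToSend(int resourceId, Avail avail, Optional<String> cause) {
        if (avail == Avail.UP || !cause.isPresent()) {
            // Forget the old cause so that a later failure is reported again.
            lastSent.remove(resourceId);
            return Optional.empty();
        }
        String previous = lastSent.put(resourceId, cause.get());
        // Only report when the cause differs from what was sent last time.
        return cause.get().equals(previous) ? Optional.<String>empty() : cause;
    }
}
```

With this shape, components that cannot supply a cause simply return an empty `Optional`, and the server sees one message per distinct failure rather than one per availability scan.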
Another example of this is when a resource availability scan timeout occurs. We get no log message or feedback that shows why this happened or that it even happened. This leaves the "UNAVAILABLE" resource state as a guessing game: why is it DOWN when the user can obviously see it is UP?
Created attachment 618088 [details]
Relating to comments 3 and 5 I've attached a screenshot (resource-errors.png) showing some of the information we send back from the agent today. We clearly send back information on some events, e.g. the type of the underlying server we are connecting to has changed. But as Larry points out there are still scenarios we are not sending back resource errors for.
As mazz points out, there is probably a separate discussion to be had about whether and how we should improve the way those resource errors are displayed, dismissed, etc. in the UI.
Another piece of information I think would be useful is the "age" of the availability - e.g. we show a resource green or red based on information from 5 minutes ago, which might no longer reflect reality. I think it would be useful for the user to see the age of the most recent avail check on a resource, together, perhaps, with a way of requesting a new avail check out of schedule.
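A minimal sketch of how the age of an availability result could be computed and shown, assuming the timestamp of the last check is available. The class `AvailabilityAge` and the "twice the collection interval" staleness rule are illustrative assumptions, not existing JON behavior.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper for displaying how old an availability result is.
final class AvailabilityAge {
    private AvailabilityAge() {}

    // Age of the last check; clamped to zero in case of clock skew.
    static Duration ageOf(Instant lastCheck, Instant now) {
        Duration age = Duration.between(lastCheck, now);
        return age.isNegative() ? Duration.ZERO : age;
    }

    // Short label such as "45s ago", "5m ago" or "2h ago".
    static String describe(Duration age) {
        long seconds = age.getSeconds();
        if (seconds < 60) {
            return seconds + "s ago";
        }
        if (seconds < 3600) {
            return (seconds / 60) + "m ago";
        }
        return (seconds / 3600) + "h ago";
    }

    // Assumption: a result older than twice the collection interval is stale.
    static boolean isStale(Duration age, Duration collectionInterval) {
        return age.compareTo(collectionInterval.multipliedBy(2)) > 0;
    }
}
```

A stale result could be the trigger for offering the out-of-schedule avail check mentioned above.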