Bug 854805
| Summary: | Tracker: UI visual feedback on interactions of distributed components | ||
|---|---|---|---|
| Product: | [Other] RHQ Project | Reporter: | Charles Crouch <ccrouch> |
| Component: | No Component | Assignee: | Nobody <nobody> |
| Status: | NEW --- | QA Contact: | |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | unspecified | CC: | hrupp |
| Target Milestone: | --- | Keywords: | Tracking |
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | Bug Fix | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | Type: | Bug | |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | 743727, 801153, 858826, 871936, 854807 | ||
| Bug Blocks: | |||
Description
Charles Crouch
2012-09-06 02:56:34 UTC
Changed title a little to remove the notion that these issues are only related to latency. https://bugzilla.redhat.com/show_bug.cgi?id=743727 is a good example of an issue where better UI feedback is needed to help users understand what they need to fix in their system.

While not directly related to this, another important aspect of the "feedback loop" to the UI is that the user has no clue about the load the RHQ server and individual agents are under, and thus cannot anticipate the latency of the individual operations (which are the "things" this BZ is actually about). The tracker bug 855744 deals with that problem and should include the following areas:

1) Data purge job duration. Every hour the RHQ server runs data compaction and purge for the measurement tables, etc. An indication of how busy this job is (i.e. the percentage of the hour the job uses before the next run kicks in) would be a great indicator of how well the RHQ server is keeping up with the inflow of data the agents are generating.

2) A number of subsystems in the RHQ agent run different jobs on a schedule. As with the above, the user should be given an indication of how "saturated" these schedules are, and thus how well the agent is keeping up with the work laid upon it (the discovery, availability, measurement, configuration, event, and content subsystems all run different kinds of schedules).

The big challenge is that we currently lack a way to report errors, status updates, etc. from threads that are *not* processing UI requests. Here are some examples. When resources are imported into inventory, a Quartz job is scheduled to periodically send the updated inventory status of resources to their respective agents. This is part of the inventory sync workflow between server and agent. If any kind of error occurs, we write it to the server log but have nowhere to report it in the UI. Another example is installing or updating a plugin at server startup. If the installation or upgrade fails at server startup, we have nowhere in the UI to report it. It is actually possible for the plugin to appear installed without any of its metadata actually being in the database. This can and does lead to hard-to-debug situations for users.

I should point out that there are plenty of things that happen in non-UI request threads that are reported in the UI. For example, when a user schedules a resource operation, that operation is sent to the agent and executed asynchronously. When the operation completes, the agent sends back the results, which can be viewed in the operation history for that resource. We have other similar audit trails, such as resource configuration history, bundle deployment history, etc.

Instead of all these separate audit trails, we need a global audit trail where events can be reported. Errors that happen outside of a UI request/response can be reported there, and we can provide a place (or places) in the UI where that information can be viewed. This audit trail should also not be tied to individual resources, so that valuable information is not lost when resources are removed from inventory. As an aside, this could be a very good fit for our new metrics database.

(In reply to comment #3) One thing that we need to keep in mind is the difference between events/health/audit data about the JON system and that same data for the customer's environment. IMHO they should be clearly separated: only JON admins should really care about the former, but all regular JON users should care about the latter.

An audit trail for the JON system itself can be logically separated from resource audit trails, but I do not see a compelling reason for physical, implementation-level separation like we have today. We have different classes and tables to represent the same type of data. All we really need is a filtering mechanism.
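To make the "saturation" indicator from items 1) and 2) concrete, here is a minimal sketch of the arithmetic, assuming saturation means the share of a schedule interval that one run of the job consumes. The function names and the 100% threshold are illustrative assumptions, not part of RHQ.

```python
from datetime import timedelta


def schedule_saturation(run_duration: timedelta, interval: timedelta) -> float:
    """Return the percentage of the schedule interval consumed by one run.

    Hypothetical helper: values at or above 100 mean the job is still
    running when the next run is due.
    """
    if interval <= timedelta(0):
        raise ValueError("interval must be positive")
    # Dividing two timedeltas yields a plain float ratio.
    return 100.0 * (run_duration / interval)


def is_falling_behind(run_duration: timedelta, interval: timedelta) -> bool:
    """True when a run takes at least as long as the interval between runs."""
    return schedule_saturation(run_duration, interval) >= 100.0


# An hourly purge job that took 45 minutes uses 75% of its window.
purge_saturation = schedule_saturation(timedelta(minutes=45), timedelta(hours=1))
print(f"purge job saturation: {purge_saturation:.0f}%")
```

The same calculation would apply to each agent subsystem schedule, with the run duration and interval taken from that subsystem.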
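The global audit trail with a filtering mechanism proposed above could look roughly like the following. This is only a sketch under stated assumptions: the `Scope`, `AuditEvent`, and `AuditTrail` names, the in-memory list, and the sample sources and resource id are all hypothetical and do not reflect RHQ's actual classes or tables. The point it illustrates is one event type for both system and resource events, a resource id stored as a plain value so events survive removal of the resource, and views produced by filtering.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import Callable, List, Optional


class Scope(Enum):
    SYSTEM = "system"      # events about the management system itself
    RESOURCE = "resource"  # events about the managed environment


@dataclass(frozen=True)
class AuditEvent:
    scope: Scope
    source: str                        # reporting subsystem or job (hypothetical names)
    message: str
    severity: str = "INFO"
    resource_id: Optional[int] = None  # plain value, not a reference to a live resource
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


class AuditTrail:
    """Single in-memory store; a real one would be backed by a database."""

    def __init__(self) -> None:
        self._events: List[AuditEvent] = []

    def record(self, event: AuditEvent) -> None:
        self._events.append(event)

    def query(
        self, predicate: Callable[[AuditEvent], bool] = lambda e: True
    ) -> List[AuditEvent]:
        """Return the events matching the filter, in insertion order."""
        return [e for e in self._events if predicate(e)]


trail = AuditTrail()
# Errors raised outside any UI request thread get reported to the same trail.
trail.record(AuditEvent(Scope.SYSTEM, "plugin-deployer",
                        "plugin upgrade failed at server startup", severity="ERROR"))
trail.record(AuditEvent(Scope.RESOURCE, "inventory-sync",
                        "could not send inventory status to agent",
                        severity="ERROR", resource_id=10001))

# Admin and user views are just different filters over the same data.
admin_view = trail.query(lambda e: e.scope is Scope.SYSTEM)
resource_view = trail.query(lambda e: e.resource_id == 10001)
```

Because the event carries only the resource id, removing resource 10001 from inventory would leave `resource_view` intact.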