Bug 534891 (RHQ-1643) - figure out a better way to load the current member resource configs that make up a group (for group Configure>Current subtab)
Keywords:
Status: CLOSED NEXTRELEASE
Alias: RHQ-1643
Product: RHQ Project
Classification: Other
Component: Configuration
Version: unspecified
Hardware: All
OS: All
Priority: high
Severity: medium
Target Milestone: ---
: ---
Assignee: Ian Springer
QA Contact:
URL: http://jira.rhq-project.org/browse/RH...
Whiteboard:
Depends On:
Blocks: RHQ-1386
Reported: 2009-02-23 17:56 UTC by Ian Springer
Modified: 2013-08-06 00:32 UTC

Fixed In Version: 1.2
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:
Embargoed:



Description Ian Springer 2009-02-23 17:56:00 UTC
The current implementation of the Configure>Current subtab just loads the latest member resource configs that are persisted to the DB. This means the configs may very well not reflect the latest "live" configs from the managed resources. Also, if no "live" configs have yet been requested from the Agent, the configs will all come back null. After a fresh inventory import, it seems that "live" configs are not immediately reported by the Agent; perhaps they would get reported after an hour when the first managed resource config-change-check job runs. To force the "live" configs to be collected sooner than this it's necessary to visit the Configure>Current tab for each of the member resources.

Here's what we currently do when loading the config to be displayed on the resource Configure>Current subtab (see ConfigurationManagerBean.getLatestResourceConfigurationUpdate()):

1) Look up the current resource config persisted in the DB (this may be null if no resource config has been obtained from the Agent yet).
2) Check if a resource config update is currently in progress. If so, return the DB-persisted config from step 1. If not, proceed to step 3.
3) Ask the Agent (if it's running) for the "live" config.
4) Check if the "live" config differs from the DB-persisted config from step 1 (using Configuration.equals()). If so, persist the "live" config as the current resource config and as part of a new config update, and return it. If not, return the DB-persisted config.
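
The four steps above can be sketched roughly as follows. This is only an illustration, not the actual ConfigurationManagerBean code: ResourceConfigLoader, ConfigStore, AgentClient and their methods are hypothetical stand-ins, and a plain Map<String, String> stands in for Configuration.

```java
import java.util.Map;
import java.util.Objects;
import java.util.Optional;

/** Hypothetical sketch of the four-step load; none of these types are real RHQ classes. */
final class ResourceConfigLoader {
    /** Stand-in for the DB-backed store of current resource configs. */
    interface ConfigStore {
        Map<String, String> findCurrent(int resourceId); // may return null
        boolean isUpdateInProgress(int resourceId);
        void persistAsCurrent(int resourceId, Map<String, String> config);
    }

    /** Stand-in for the Agent; an empty Optional means the Agent is not running. */
    interface AgentClient {
        Optional<Map<String, String>> fetchLiveConfig(int resourceId);
    }

    private final ConfigStore store;
    private final AgentClient agent;

    ResourceConfigLoader(ConfigStore store, AgentClient agent) {
        this.store = store;
        this.agent = agent;
    }

    Map<String, String> loadCurrent(int resourceId) {
        // Step 1: DB-persisted config (null if never collected).
        Map<String, String> persisted = store.findCurrent(resourceId);
        // Step 2: do not race with an update that is already in progress.
        if (store.isUpdateInProgress(resourceId)) {
            return persisted;
        }
        // Step 3: ask the Agent for the live config, if it is reachable.
        Optional<Map<String, String>> live = agent.fetchLiveConfig(resourceId);
        if (!live.isPresent()) {
            return persisted;
        }
        // Step 4: persist and return the live config only if it differs.
        if (!Objects.equals(live.get(), persisted)) {
            store.persistAsCurrent(resourceId, live.get());
            return live.get();
        }
        return persisted;
    }
}
```

Steps 3 and 4 are the expensive part: each call involves an Agent round trip and possibly a DB write.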

Doing the same thing for group configs is risky, especially for large groups, because steps 3 and 4 could cause an unacceptably long delay in the GUI (i.e. while the user waits for the group Configure>Current subtab to load). It also could present some concurrency challenges.

Here are some ideas for how to attack this:

A) let the user wait however long it takes to load the member configs using the individual resource algorithm above - display a nice message and progress meter to let them know gears are turning. 
  1) One possible way to speed up the loading of live configs and comparison with current DB-persisted configs would be to compute the hashCode for the DB-persisted config and send that to the Agent. The Agent computes the live config and then only sends it back to the server if its hashCode differs from the server-side hashCode.
B) just load the DB-persisted member configs as we do today, but make sure the config-update-check gets kicked off immediately for newly committed resources and remove the option to disable the config update check; also, possibly make the check run more often than once an hour by default; this way, we know the DB-persisted config will be a fairly recent snapshot and will be non-null for recently committed resources
  1) One way to potentially improve the config update check would be to add a new polling API that plugin resource components can implement in order to have the PC periodically kick off a polling method that checks if the config has changed since the last poll (e.g. by checking the mtime on the underlying config file). This would be similar to the event polling plugin API.
C) like B, load the DB-persisted configs initially, but provide a button for the user to press to attempt to load the latest "live" configs (probably async)
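
As an illustration of idea A.1, here is a minimal sketch of the Agent-side decision, assuming a config can be modelled as a Map<String, String>. HashGatedConfigFetch and liveConfigIfChanged are hypothetical names, not an existing Agent API. A null server hash models the case where the server has no persisted config yet.

```java
import java.util.Map;
import java.util.Optional;

/** Hypothetical sketch: the Agent returns the live config only when its hash differs. */
final class HashGatedConfigFetch {
    /**
     * @param liveConfig the config the Agent just read from the managed resource
     * @param serverHash hashCode of the server's DB-persisted config, or null if none exists
     * @return the live config if the server needs it, otherwise empty
     */
    static Optional<Map<String, String>> liveConfigIfChanged(Map<String, String> liveConfig,
                                                             Integer serverHash) {
        boolean unchanged = serverHash != null && serverHash.intValue() == liveConfig.hashCode();
        return unchanged ? Optional.<Map<String, String>>empty() : Optional.of(liveConfig);
    }
}
```

One caveat: two different configs can share a hashCode, so a hash match does not strictly prove the configs are equal. This only saves bandwidth on the return trip; the Agent still has to read the live config.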

A and C would also both need to address the fact that some Agents may be down or take too long to respond to the request for the live config. In this case, should we:
1) cause the whole group config load to fail?
2) display only the member configs we successfully loaded?
3) same as 2, but also indicate which configs failed to load and are thus missing from the aggregate and drill-down views?
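
To make option 3 concrete, here is a rough sketch of a result object that keeps the successfully loaded member configs alongside the members that failed. GroupConfigLoadResult and the fetcher parameter are hypothetical; the fetcher is assumed to throw a RuntimeException when an Agent is down or unresponsive.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.IntFunction;

/** Hypothetical sketch of option 3: a partial result plus a record of which members failed. */
final class GroupConfigLoadResult {
    final Map<Integer, Map<String, String>> loaded = new LinkedHashMap<>();
    final Map<Integer, String> failed = new LinkedHashMap<>(); // resource id -> reason

    static GroupConfigLoadResult load(List<Integer> memberIds,
                                      IntFunction<Map<String, String>> fetcher) {
        GroupConfigLoadResult result = new GroupConfigLoadResult();
        for (int id : memberIds) {
            try {
                result.loaded.put(id, fetcher.apply(id));
            } catch (RuntimeException e) {
                // Remember the failure so the UI can flag the missing member.
                result.failed.put(id, String.valueOf(e.getMessage()));
            }
        }
        return result;
    }

    /** True only when every member config was loaded. */
    boolean isComplete() {
        return failed.isEmpty();
    }
}
```

Option 1 then amounts to treating any result where isComplete() is false as an error, and option 2 to ignoring the failed map.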

We need to agree on the best way to do this. I'd appreciate feedback on the ideas above, as well as additional suggestions.


Comment 1 Joseph Marques 2009-02-23 19:20:51 UTC
Another issue common to solutions A, B and C is how we would handle errors. If for some reason we cannot initialize one (or more) of the individual configs, what would the UI look like? What if they are all initialized, but one or more of the Agents are currently down, so we can't get the live config and we know the group update will have at least one failure?

Comment 2 Heiko W. Rupp 2009-02-23 19:57:30 UTC
When the user wants to configure, could we trigger an asynchronous fetch of the config in the background and inform the user when 'the data is ready'?
Of course we still need to decide what to do when an agent is down, but at least the user would not be blocked forever.

Comment 3 Ian Springer 2009-02-24 18:59:51 UTC
After discussing with the team we decided to go with option A, with the following additions:

1) check if current avails for all group members are UP; if not, abort with an error 
2) do all the requests to Agents for live configs in a thread with a timeout of 20-30 seconds; if it times out or any of the requests fail (e.g. because an Agent is down), abort with an error
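
A minimal sketch of addition 2, using the standard java.util.concurrent API. TimedGroupConfigLoad, GroupConfigLoadException and the fetchLive parameter are hypothetical names, and the pool size of 10 is an arbitrary choice for illustration; this is not the code committed for this bug.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.CancellationException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.function.IntFunction;

/** Hypothetical sketch: fetch all live configs under one overall timeout, all-or-nothing. */
final class TimedGroupConfigLoad {
    static final class GroupConfigLoadException extends RuntimeException {
        GroupConfigLoadException(String message) { super(message); }
    }

    static Map<Integer, Map<String, String>> loadAll(List<Integer> memberIds,
            IntFunction<Map<String, String>> fetchLive, long timeout, TimeUnit unit) {
        int threads = Math.max(1, Math.min(memberIds.size(), 10)); // arbitrary cap
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Map<String, String>>> tasks = new ArrayList<>();
            for (final int id : memberIds) {
                tasks.add(() -> fetchLive.apply(id));
            }
            // invokeAll cancels every task that has not finished when the timeout expires.
            List<Future<Map<String, String>>> futures = pool.invokeAll(tasks, timeout, unit);
            Map<Integer, Map<String, String>> result = new LinkedHashMap<>();
            for (int i = 0; i < futures.size(); i++) {
                int id = memberIds.get(i);
                try {
                    result.put(id, futures.get(i).get());
                } catch (CancellationException e) {
                    throw new GroupConfigLoadException("Timed out loading config for resource " + id);
                } catch (ExecutionException e) {
                    throw new GroupConfigLoadException(
                        "Failed to load config for resource " + id + ": " + e.getCause());
                }
            }
            return result;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new GroupConfigLoadException("Interrupted while loading group member configs");
        } finally {
            pool.shutdownNow();
        }
    }
}
```

A call such as loadAll(ids, fetcher, 30, TimeUnit.SECONDS) either returns every member's config or aborts with an error, matching the all-or-nothing behavior described above.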

NOTE: In the future, we should add a new client-side API for config updates that will send down both the server-last-known config and the new config. The Agent would compare the server-last-known config to the live config and only apply the update if the two are identical. This will allow us to skip the live config requests and comparisons at config load time, though we will still need to request the live configs for any resources whose current configs are still null in the DB.
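
The proposed conditional update could look roughly like the sketch below. Nothing here exists yet: ConditionalConfigUpdate, LiveConfigAccess and Outcome are hypothetical names, and a Map<String, String> again stands in for a config.

```java
import java.util.Map;
import java.util.Objects;

/** Hypothetical sketch of an Agent-side "update only if unchanged" operation. */
final class ConditionalConfigUpdate {
    enum Outcome { APPLIED, REJECTED_STALE }

    /** Stand-in for however the plugin reads and writes the managed resource's config. */
    interface LiveConfigAccess {
        Map<String, String> read();
        void write(Map<String, String> config);
    }

    static Outcome updateIfUnchanged(LiveConfigAccess live,
                                     Map<String, String> serverLastKnown,
                                     Map<String, String> newConfig) {
        // Reject the update if the resource changed behind the server's back.
        if (!Objects.equals(live.read(), serverLastKnown)) {
            return Outcome.REJECTED_STALE;
        }
        live.write(newConfig);
        return Outcome.APPLIED;
    }
}
```

Note that the read and the write are not atomic in this sketch, so a real implementation would still need to guard against the config changing between the two.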


Comment 4 Ian Springer 2009-02-25 19:58:40 UTC
Done - r3196. Note, the group config load will also fail if:

a) the group contains more than 100 members
or
b) there are config updates in progress for the group or any of its members
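
For illustration only, the up-front checks (all members UP, at most 100 members, no updates in progress) could be expressed as below. This is not the code from r3196; GroupConfigLoadPreconditions, Member and firstViolation are hypothetical names.

```java
import java.util.List;
import java.util.Optional;

/** Hypothetical sketch of the checks made before a group config load is attempted. */
final class GroupConfigLoadPreconditions {
    static final int MAX_MEMBERS = 100;

    /** Minimal stand-in for a group member's relevant state. */
    static final class Member {
        final int id;
        final boolean up;
        final boolean updateInProgress;

        Member(int id, boolean up, boolean updateInProgress) {
            this.id = id;
            this.up = up;
            this.updateInProgress = updateInProgress;
        }
    }

    /** Returns the first reason the load must be aborted, or empty if it may proceed. */
    static Optional<String> firstViolation(List<Member> members, boolean groupUpdateInProgress) {
        if (members.size() > MAX_MEMBERS) {
            return Optional.of("group has more than " + MAX_MEMBERS + " members");
        }
        if (groupUpdateInProgress) {
            return Optional.of("a config update is in progress for the group");
        }
        for (Member m : members) {
            if (!m.up) {
                return Optional.of("resource " + m.id + " is not UP");
            }
            if (m.updateInProgress) {
                return Optional.of("a config update is in progress for resource " + m.id);
            }
        }
        return Optional.empty();
    }
}
```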


Comment 5 Red Hat Bugzilla 2009-11-10 20:37:22 UTC
This bug was previously known as http://jira.rhq-project.org/browse/RHQ-1643


