Bug 697061 - group config change histories are broken when one fails
Status: CLOSED CURRENTRELEASE
Product: RHQ Project
Classification: Other
Component: Core Server
Version: 4.0.0.Beta1
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Assigned To: Ian Springer
QA Contact: Corey Welton
Depends On:
Blocks: rhq4 rhq401
Reported: 2011-04-15 13:39 EDT by John Mazzitelli
Modified: 2013-08-05 20:39 EDT
Doc Type: Bug Fix
Description John Mazzitelli 2011-04-15 13:39:07 EDT
Description of problem:

If a group configuration change includes a resource whose update fails, the config change histories for the other resources never leave "inprogress", even though those updates actually finished. For the ones that showed inprogress in my test, I went over to the agent machine, looked at the resource's true config, and it had been changed successfully.

How to reproduce:

Create a group config change with multiple resources. Ensure one of them fails. Notice that some of the others are still in progress (though if you check the actual resource's config, you'll see they did change).
Comment 1 Ian Springer 2011-04-18 13:01:08 EDT
I was unable to reproduce this. I created a compat group of DefaultDS datasources from three different AS6 server Resources. I then stopped one of the AS6 servers, edited the group config, made a change, and saved it. The group update completed with a status of failure, and the members had the expected statuses - two were success and the one corresponding to the down server was failure. None of them were stuck in in-progress.

Mazz, if you give more detailed reproduction steps, I can try again.
Comment 2 John Mazzitelli 2011-04-18 14:36:07 EDT
this appears not to be easily reproducible.

I just submitted another, and my group resource history shows INPROGRESS but all individual resources show end states (success or failure).

Jay S. tried it and saw the same.
Comment 3 John Mazzitelli 2011-04-18 14:37:57 EDT
Possible problem - we may need ConfigurationManagerBean.executeResourceConfigUpdate to check whether the update is part of a group and, if so, whether all individual histories are done; if they are, it should update the group status.
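The check proposed above can be sketched roughly as follows. This is only an illustration of the roll-up idea, not the RHQ code: the UpdateStatus enum, the GroupUpdateRollup class, and the rollUp method are hypothetical names, and the rule that a group with no members counts as SUCCESS is an assumption.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical status values, modeled on the states mentioned in this bug.
enum UpdateStatus { INPROGRESS, SUCCESS, FAILURE }

final class GroupUpdateRollup {

    /**
     * Derives the group update status from its member update statuses:
     * INPROGRESS while any member is still running, FAILURE once all members
     * are done and at least one failed, SUCCESS otherwise.
     * (Assumption: an empty member list yields SUCCESS.)
     */
    static UpdateStatus rollUp(List<UpdateStatus> memberStatuses) {
        boolean anyFailed = false;
        for (UpdateStatus status : memberStatuses) {
            if (status == UpdateStatus.INPROGRESS) {
                // At least one member has not reached an end state yet.
                return UpdateStatus.INPROGRESS;
            }
            if (status == UpdateStatus.FAILURE) {
                anyFailed = true;
            }
        }
        return anyFailed ? UpdateStatus.FAILURE : UpdateStatus.SUCCESS;
    }

    public static void main(String[] args) {
        // Two members succeeded, one failed: the group should end as FAILURE,
        // not stay INPROGRESS.
        List<UpdateStatus> members = Arrays.asList(
                UpdateStatus.SUCCESS, UpdateStatus.FAILURE, UpdateStatus.SUCCESS);
        System.out.println(rollUp(members));
    }
}
```

The point of the sketch is that the roll-up has to run after every member reaches an end state, including the failure path; if it only runs when a member succeeds, a failing member can leave the group stuck in INPROGRESS.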
Comment 4 Ian Springer 2011-04-18 17:42:40 EDT
I tried again and reproduced. I think I didn't see it the first time, because the CheckForTimedOutConfigUpdatesJob reaper job, which runs every 10 minutes, happened to run very soon after I initiated the group update.

In any case, it is fixed now - [master c643d2b].
Comment 5 Sunil Kondkar 2011-05-11 06:52:11 EDT
Verified on rhq4 release build (Version: 4.0.0 Build Number: db0c817)

Created a compatible group with multiple resources, stopped one of the resources, and did a group configuration change. The group update completed with failure status, and the members display the expected status.

Marking as verified.
Comment 6 Ian Springer 2011-05-19 12:05:34 EDT
[master f9e768f]:

1) in ConfigurationManagerBean.executeResourceConfigurationUpdate(), if the remote call to an Agent to update a config fails, make sure to call checkForCompletedGroupResourceConfigurationUpdate() to update the status of the parent group config update, in the case that the update is part of a group update

2) in ConfigurationServerServiceImpl.persistUpdatedResourceConfiguration(), remove a LOG.isDebugEnabled() check that was erroneously preventing a return call from executing when debug logging was not enabled
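The second item describes a general pitfall that can be sketched as below. This is not the actual persistUpdatedResourceConfiguration() code: the DebugGuardDemo class, the "unchanged" condition, and the returned strings are hypothetical stand-ins, and the debug flag is passed as a parameter in place of LOG.isDebugEnabled(). It only shows how a return placed inside a debug-logging guard changes behavior depending on the log level.

```java
final class DebugGuardDemo {

    // Buggy shape: the early return sits inside the debug guard, so it only
    // executes when debug logging is enabled.
    static String persistBuggy(boolean debugEnabled, boolean unchanged) {
        if (unchanged) {
            if (debugEnabled) {
                System.out.println("DEBUG: configuration unchanged, skipping");
                return "skipped";
            }
        }
        // With debug logging off, execution falls through to here even
        // though the early return was intended.
        return "persisted";
    }

    // Fixed shape: only the log statement is guarded; the return always runs.
    static String persistFixed(boolean debugEnabled, boolean unchanged) {
        if (unchanged) {
            if (debugEnabled) {
                System.out.println("DEBUG: configuration unchanged, skipping");
            }
            return "skipped";
        }
        return "persisted";
    }

    public static void main(String[] args) {
        // Same input, debug logging off: the two variants disagree.
        System.out.println(persistBuggy(false, true));
        System.out.println(persistFixed(false, true));
    }
}
```

A bug of this shape is easy to miss in development, where debug logging is often on, and only shows up in environments that run at a higher log level.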
Comment 7 Charles Crouch 2011-05-19 18:27:29 EDT
Setting this to ON_QA so it can be retested, after ips also committed the fix to the release-4.0.0 branch:

http://git.fedorahosted.org/git/?p=rhq/rhq.git;a=commit;h=c2b21e1d990a0c4b6ada7cc2c3aa38950fda5baf
Comment 8 Sunil Kondkar 2011-05-20 09:49:11 EDT
Verified on build#38 (Version: 4.0.1-SNAPSHOT Build Number: a104cdf)

Created a compatible group with two RHQ agents. Stopped one agent and navigated to the compatible group. When I clicked on the 'Configuration' tab, the UI displayed the message below and the tab did not display the configuration details.


Failed to retrieve member Resource configurations for [ResourceGroup[id=10031, name=Group-Agent, category=COMPATIBLE, type=RHQ Agent, isDynaGroup=false, isClusterGroup=false]]

The server log displays:

2011-05-20 19:02:23,730 WARN  [gwt-log] Sending exception to client: [1305898343730] 
java.lang.Exception: Current group Resource configuration for 10031 cannot be calculated, because one or more of this group's member Resources are DOWN.
	at org.rhq.enterprise.server.configuration.ConfigurationManagerBean.getResourceConfigurationsForCompatibleGroup(ConfigurationManagerBean.java:560)
	at sun.reflect.GeneratedMethodAccessor1251.invoke(Unknown Source)


It throws an exception and does not allow us to edit anything if one of the group's member resources is down, which is working as expected, so marking it verified.
Comment 9 Ian Springer 2011-05-20 22:15:09 EDT
This can be verified as follows:

1) make sure both the RHQ Agent resources in the group are UP
2) go to the group's Configuration>Current tab and wait for the page to fully load
3) stop the Agent corresponding to one of the RHQ Agent resources
4) wait a minute or so, then click the Save button to save the group config
5) from the History subtab, verify that the member update fails for the Agent that is down and that the group update also fails

Pushing back to ON_QA.
Comment 10 Sunil Kondkar 2011-05-23 05:02:55 EDT
Verified on Version: 4.0.1 Build Number: ecd91b2

Created a group of RHQ Agents. Navigated to group's Configuration>Current tab and waited for the page to fully load.
Stopped one of the member agents, made a change in group config and saved. The group config history shows the failed status and the view member history page displays one success status for the agent which is up and failed for the down agent.

Marking as verified.
Comment 11 Corey Welton 2011-05-23 21:14:41 EDT
Bookkeeping - closing bug - fixed in recent release.