Bug 600543 - A failed group configuration update can leave member status as in progress
Alias: None
Product: RHQ Project
Classification: Other
Component: Configuration   
Version: 1.4
Hardware: All Linux
high vote
Target Milestone: ---
Assignee: Jay Shaughnessy
QA Contact: Corey Welton
Depends On:
Reported: 2010-06-04 23:46 UTC by Jay Shaughnessy
Modified: 2010-08-12 16:46 UTC (History)

Fixed In Version: 2.4
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Last Closed: 2010-08-12 16:46:30 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


External Trackers
Tracker ID Priority Status Summary Last Updated
Red Hat Bugzilla 600365 None None None Never

Description Jay Shaughnessy 2010-06-04 23:46:54 UTC
Related to BZ 600635.

This bug covers the requirement that a completed (FAILED) group config update must never leave behind IN_PROGRESS member updates. A lingering IN_PROGRESS state can interfere with other operations, because it makes it appear that a specific resource update never completes.

To fix it, it looks like we need two things:

First, by the time an implementation of AbstractGroupConfigurationUpdateJob.completeGroupConfigurationUpdate completes, it should ensure that no member updates remain in the IN_PROGRESS state. Most likely this means they should be set to FAILED.
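A minimal sketch of that first point, not the actual RHQ code: the types below (GroupUpdate, MemberUpdate, Status) and the method completeGroupUpdate are hypothetical stand-ins invented for illustration. The idea is only that completion sweeps the member updates and fails any that are still IN_PROGRESS before the group status is set.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model types; these are NOT the RHQ domain classes.
public class GroupUpdateCompletionSketch {
    enum Status { IN_PROGRESS, SUCCESS, FAILED }

    static final class MemberUpdate {
        final int resourceId;
        Status status = Status.IN_PROGRESS;
        String errorMessage;

        MemberUpdate(int resourceId) {
            this.resourceId = resourceId;
        }
    }

    static final class GroupUpdate {
        final List<MemberUpdate> members = new ArrayList<>();
        Status status = Status.IN_PROGRESS;
    }

    // Completes the group update. Any member still IN_PROGRESS is marked
    // FAILED with the given reason, so no member outlives the group update.
    static void completeGroupUpdate(GroupUpdate group, String reason) {
        boolean allSucceeded = true;
        for (MemberUpdate member : group.members) {
            if (member.status == Status.IN_PROGRESS) {
                member.status = Status.FAILED;
                member.errorMessage = reason;
            }
            if (member.status != Status.SUCCESS) {
                allSucceeded = false;
            }
        }
        group.status = allSucceeded ? Status.SUCCESS : Status.FAILED;
    }
}
```

Whether a lingering member should become FAILED or some other terminal state is a design choice; FAILED is used here because that is what the description suggests.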

Second, we need a recurring job to time out member updates for group plugin config updates. This already exists for group resource config updates.
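The timeout check for the second point could look roughly like the sketch below. Everything in it is an assumption made for illustration: the MemberUpdate type, the createdAt field, and the timeOutStaleUpdates method are hypothetical, and the scheduling of the recurring job itself is left out. It shows only the sweep such a job would run on each pass.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.List;

// Hypothetical sketch of the sweep a recurring timeout job might perform.
public class MemberUpdateTimeoutSketch {
    enum Status { IN_PROGRESS, SUCCESS, FAILED }

    static final class MemberUpdate {
        final int resourceId;
        final Instant createdAt;
        Status status = Status.IN_PROGRESS;
        String errorMessage;

        MemberUpdate(int resourceId, Instant createdAt) {
            this.resourceId = resourceId;
            this.createdAt = createdAt;
        }
    }

    // Marks every IN_PROGRESS update older than the timeout as FAILED
    // and returns how many updates were timed out on this pass.
    static int timeOutStaleUpdates(List<MemberUpdate> updates, Instant now, Duration timeout) {
        int timedOut = 0;
        for (MemberUpdate update : updates) {
            boolean stale = Duration.between(update.createdAt, now).compareTo(timeout) > 0;
            if (update.status == Status.IN_PROGRESS && stale) {
                update.status = Status.FAILED;
                update.errorMessage = "Timed out after " + timeout;
                timedOut++;
            }
        }
        return timedOut;
    }
}
```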

Comment 1 Jay Shaughnessy 2010-06-25 15:33:51 UTC
Actually, the previous comment should read BZ 600365...

Comment 2 Jay Shaughnessy 2010-06-25 19:10:01 UTC
The more I thought about this and discussed it with others, the more it snowballed into a discussion of why plugin config updates are synchronous and group updates sequential, as opposed to asynchronous and parallel. From discussions with jmarques, it seems the reason for the synchronous approach may no longer apply; in short, it was done for GUI limitations and user experience reasons.

So, my real recommendation here is that we change plugin config updates to be asynchronous and change the implementation to mirror resource config updates, marshalled with timeout logic applied by Quartz jobs. This is now in RFE ???.

Making this change will prevent us from needing to do some fairly complex work to properly solve this issue, which would entail some questionable timeout logic (hard to guess what it should be given the sequential application to N resources), or possibly individual quartz jobs for each update.  There are server shutdown/crash/HA issues that make some of this non-trivial.

Anyway, with the fix for BZ 600386 it is unlikely that the group update will fail to launch completely. Instead, the main issue would likely be part 1, above. This should be solvable by ensuring that all of the member updates are actually attempted. This is a pretty easy fix, and it's what I'll do as an interim solution until ??? is implemented.
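As a rough illustration of "ensuring that all of the member updates are actually attempted", the sketch below applies an update to each member in sequence and records a terminal status per member, so one failure does not stop the loop. The names here (applyToAll, the agentCall parameter, the Status enum) are hypothetical and are not taken from the RHQ code base.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical sketch: attempt every member update, even if some fail.
public class AttemptAllMembersSketch {
    enum Status { IN_PROGRESS, SUCCESS, FAILED }

    // agentCall stands in for whatever pushes the new plugin config to one
    // resource; it is assumed to throw a RuntimeException on failure.
    static Map<Integer, Status> applyToAll(List<Integer> resourceIds, Consumer<Integer> agentCall) {
        Map<Integer, Status> results = new LinkedHashMap<>();
        for (Integer id : resourceIds) {
            results.put(id, Status.IN_PROGRESS);
        }
        for (Integer id : resourceIds) {
            try {
                agentCall.accept(id);
                results.put(id, Status.SUCCESS);
            } catch (RuntimeException e) {
                // Record the failure and keep going, so later members are
                // still attempted and none is left IN_PROGRESS.
                results.put(id, Status.FAILED);
            }
        }
        return results;
    }
}
```

This does not cover the server going down mid-update; in that case the loop never finishes and members can still be left IN_PROGRESS.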

Note, if the server goes down in the middle of a group plugin config update this issue may still occur.

Comment 3 Jay Shaughnessy 2010-06-25 19:17:58 UTC
see the RFE BZ 608135

Comment 4 Jay Shaughnessy 2010-06-28 13:41:06 UTC
One way to repro this:

1) Import at least two compatible resources with editable plugin config properties. AS Servers work, I think. You can edit something like shutdown method.

2) Create a compatible, non-recursive group of the resources.

3) Make sure the agents are not running (the resources are unavailable).

4) Perform a group plugin config update.

These will fail because the agents cannot be contacted. The important thing is that when the group update gets set to FAILED, both of the resource updates also show FAILED and none are left IN_PROGRESS.

Comment 5 Sunil Kondkar 2010-06-30 06:25:55 UTC
Verified on JON 2.4 GA_QA build#44

Created a compatible group of JBoss AS server resources (EAP 5.0 and EWP 5.0).
Shut down the agents.
Navigated to Inventory-connection in the compatible group created.
Edited and saved the setting for shutdown method from 'JMX Bean' to 'Shutdown Script'.
Observed that the history displays status 'Failure'. Clicking on the 'View members updates' link displays the status as 'Failure' for both members.

Comment 6 Corey Welton 2010-08-12 16:46:30 UTC
Mass-closure of verified bugs against JON.
