600543 – A failed group configuration update can leave member status as in progress

Summary: A failed group configuration update can leave member status as in progress

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	RHQ Project
Classification:	Other
Component:	Configuration
Sub Component:
Version:	1.4
Hardware:	All
OS:	Linux
Priority:	high
Severity:	high
Target Milestone:	---
Target Release:	---
Assignee:	Jay Shaughnessy
QA Contact:	Corey Welton
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2010-06-04 23:46 UTC by Jay Shaughnessy
Modified:	2010-08-12 16:46 UTC
CC List:	1 user
Fixed In Version:	2.4
Clone Of:
Environment:
Last Closed:	2010-08-12 16:46:30 UTC
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Bugzilla	600365	0	urgent	CLOSED	Group config update failed, getting current state is stuck	2021-02-22 00:41:40 UTC

Description Jay Shaughnessy 2010-06-04 23:46:54 UTC

Related to BZ 600635.

It deals with the fact that a completed (FAILED) group config update should never leave behind IN_PROGRESS member updates. A lingering IN_PROGRESS state can cause problems elsewhere, because it makes a specific resource update appear never to complete.

To fix this, it looks like we need two things:

First, by the time an implementation of AbstractGroupConfigurationUpdateJob.completeGroupConfigurationUpdate completes, we should ensure that no member updates remain in the IN_PROGRESS state. Most likely this means they should be set to FAILED.
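A minimal Java sketch of the first requirement follows. The `UpdateStatus`, `MemberUpdate`, `GroupUpdate`, and `GroupUpdateCompleter` types are hypothetical, simplified stand-ins invented for illustration; they are not the actual RHQ entities or job API. The sketch only shows the invariant: when the group update completes, any member still IN_PROGRESS is forced to FAILED.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of a group update and its member updates.
// These are NOT the actual RHQ classes; they only illustrate the invariant.
enum UpdateStatus { IN_PROGRESS, SUCCESS, FAILED }

final class MemberUpdate {
    final int resourceId;
    UpdateStatus status = UpdateStatus.IN_PROGRESS;
    String errorMessage;

    MemberUpdate(int resourceId) {
        this.resourceId = resourceId;
    }
}

final class GroupUpdate {
    final List<MemberUpdate> members = new ArrayList<>();
    UpdateStatus status = UpdateStatus.IN_PROGRESS;
}

final class GroupUpdateCompleter {
    /**
     * Completes a group update. Any member update still IN_PROGRESS is forced
     * to FAILED, so a finished group never leaves dangling member updates.
     * The group is FAILED if any member failed, otherwise SUCCESS.
     */
    static void completeGroupUpdate(GroupUpdate group) {
        boolean anyFailed = false;
        for (MemberUpdate m : group.members) {
            if (m.status == UpdateStatus.IN_PROGRESS) {
                m.status = UpdateStatus.FAILED;
                m.errorMessage = "Group update completed before this member update finished";
            }
            if (m.status == UpdateStatus.FAILED) {
                anyFailed = true;
            }
        }
        group.status = anyFailed ? UpdateStatus.FAILED : UpdateStatus.SUCCESS;
    }
}
```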

Second, we need a recurring job to time out member updates for group plugin config updates. Such a job already exists for group resource config updates.
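The recurring timeout job could look roughly like the sketch below. It assumes each member update records its start time and that a single fixed timeout applies; the class, method, and field names are hypothetical. RHQ schedules this kind of work with Quartz, but a plain `ScheduledExecutorService` is swapped in here so the example is self-contained.

```java
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of a recurring timeout sweep for member updates of
// group plugin config updates. A ScheduledExecutorService stands in for Quartz.
final class MemberUpdateTimeoutSweeper {
    enum Status { IN_PROGRESS, SUCCESS, FAILED }

    static final class MemberUpdate {
        final int resourceId;
        final long startedAtMillis;
        volatile Status status = Status.IN_PROGRESS;

        MemberUpdate(int resourceId, long startedAtMillis) {
            this.resourceId = resourceId;
            this.startedAtMillis = startedAtMillis;
        }
    }

    /**
     * Marks every IN_PROGRESS update that started more than timeoutMillis
     * before nowMillis as FAILED. Returns how many updates were timed out.
     */
    static int timeOutStale(List<MemberUpdate> updates, long nowMillis, long timeoutMillis) {
        int timedOut = 0;
        for (MemberUpdate u : updates) {
            if (u.status == Status.IN_PROGRESS && nowMillis - u.startedAtMillis > timeoutMillis) {
                u.status = Status.FAILED;
                timedOut++;
            }
        }
        return timedOut;
    }

    /**
     * Runs the sweep every periodMillis. The list should be thread-safe
     * (for example a CopyOnWriteArrayList), since other threads add to it.
     */
    static ScheduledExecutorService schedule(List<MemberUpdate> updates, long timeoutMillis, long periodMillis) {
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        executor.scheduleAtFixedRate(
            () -> timeOutStale(updates, System.currentTimeMillis(), timeoutMillis),
            periodMillis, periodMillis, TimeUnit.MILLISECONDS);
        return executor;
    }
}
```

Keeping the sweep (`timeOutStale`) separate from the scheduling makes it easy to test with a fixed clock. A real implementation would most likely change the status inside a database transaction, so the sweep cannot race with a member update that completes at the same moment.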

Comment 1 Jay Shaughnessy 2010-06-25 15:33:51 UTC

Actually, the previous comment should read BZ 600365.

Comment 2 Jay Shaughnessy 2010-06-25 19:10:01 UTC

The more I thought about this and discussed it with others, the more it snowballed into a discussion of why plugin config updates are synchronous, and group updates sequential, as opposed to asynchronous and parallel. From discussions with jmarques, it seems the reason for the synchronous approach may no longer apply; in short, it was done because of GUI limitations and for user-experience reasons.

So, my real recommendation here is that we make plugin config updates asynchronous and, accordingly, change the implementation to mirror resource config updates, marshalled with timeout logic applied by Quartz jobs. This is now in RFE ???.

Making this change would spare us some fairly complex work to solve this issue properly, which would entail questionable timeout logic (it is hard to guess what the timeout should be, given the sequential application to N resources) or possibly an individual Quartz job for each update. Server shutdown, crash, and HA issues make some of this non-trivial.

Anyway, with the fix for BZ 600386 it is unlikely that the group update will fail to launch at all. Instead, the main issue would likely be part 1 above. That can be solved by ensuring that all of the member updates are actually attempted. This is a fairly easy fix, and it's what I'll do as an interim solution until ??? is implemented.
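The interim fix, attempting every member update even when earlier ones fail, can be sketched as follows. `applyToResource` is a hypothetical callback standing in for the agent call and is assumed to throw when the agent cannot be reached; `SequentialGroupUpdater` and its methods are likewise illustrative, not RHQ code.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.IntConsumer;

// Hypothetical sketch: apply a group update to each member sequentially,
// recording a per-member result, so one failure never stops the loop and
// no member is left IN_PROGRESS.
final class SequentialGroupUpdater {
    enum Status { IN_PROGRESS, SUCCESS, FAILED }

    /**
     * Applies the update to each resource in order and returns the final
     * status of every member, keyed by resource id in iteration order.
     */
    static Map<Integer, Status> applyToAll(List<Integer> resourceIds, IntConsumer applyToResource) {
        Map<Integer, Status> results = new LinkedHashMap<>();
        for (int id : resourceIds) {
            results.put(id, Status.IN_PROGRESS);
        }
        for (int id : resourceIds) {
            try {
                applyToResource.accept(id);
                results.put(id, Status.SUCCESS);
            } catch (RuntimeException e) {
                // Record the failure (e.g. agent unreachable) and keep going
                // instead of aborting the remaining member updates.
                results.put(id, Status.FAILED);
            }
        }
        return results;
    }

    /** The group fails if any member failed. */
    static Status groupStatus(Map<Integer, Status> results) {
        return results.containsValue(Status.FAILED) ? Status.FAILED : Status.SUCCESS;
    }
}
```

For example, with resources 10, 20, and 30 where the agent for 20 is unreachable, 10 and 30 end as SUCCESS, 20 ends as FAILED, and the group is FAILED; no member stays IN_PROGRESS.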

Note, if the server goes down in the middle of a group plugin config update this issue may still occur.

Comment 3 Jay Shaughnessy 2010-06-25 19:17:58 UTC

See the RFE, BZ 608135.

Comment 4 Jay Shaughnessy 2010-06-28 13:41:06 UTC

One way to repro this:

1) Import at least two compatible resources with editable plugin config properties. AS servers should work, I think; you can edit something like the shutdown method.

2) Create a compatible, non-recursive group of the resources.

3) Make sure the agents are not running (so the resources are unavailable).

4) Perform a group plugin config update.

These updates will fail because the agents cannot be contacted. The important thing is that when the group update is set to FAILED, both of the resource updates also show FAILED and none are left IN_PROGRESS.

Comment 5 Sunil Kondkar 2010-06-30 06:25:55 UTC

Verified on JON 2.4 GA_QA build#44

Created a compatible group of JBoss AS server resources (EAP 5.0 and EWP 5.0).
Shut down the agents.
Navigated to Inventory > Connection for the created compatible group.
Edited and saved the shutdown method setting, changing it from 'JMX Bean' to 'Shutdown Script'.
Observed that the history displays the status 'Failure'. Clicking the 'View members updates' link displays the status 'Failure' for both members.

Comment 6 Corey Welton 2010-08-12 16:46:30 UTC

Mass-closure of verified bugs against JON.
