1105742 – Storage node can get stuck in MAINTENANCE mode if cluster maintenance is executed when one or more agents are unavailable or restarting


Keywords:
Status:	CLOSED EOL
Alias:	None
Product:	JBoss Operations Network
Classification:	JBoss
Component:	Storage Node
Sub Component:
Version:	JON 3.2.1
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	medium
Target Milestone:	---
Target Release:	JON 4.0.0
Assignee:	Michael Burman
QA Contact:	Mike Foley
Docs Contact:
URL:
Whiteboard:
Depends On:	1120418
Blocks:

Reported:	2014-06-07 00:01 UTC by Larry O'Leary
Modified:	2019-08-05 14:50 UTC
CC List:	1 user
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2019-08-05 14:50:55 UTC
Type:	Bug
Embargoed:


Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Bugzilla	1105743	0	unspecified	CLOSED	Viewing operation details for operations with no results causes Globally uncaught exception in UI - java.lang.IllegalArg...	2021-02-22 00:41:40 UTC
Red Hat Knowledge Base (Solution)	911393	0	None	None	None	Never

Internal Links: 1105743

Description Larry O'Leary 2014-06-07 00:01:44 UTC

Description of problem:
Storage cluster maintenance, whether run by the weekly job or invoked manually, can put a storage node into what appears to be an unrecoverable maintenance state if the node's agent is offline when maintenance runs, as happens when an agent restart coincides with the maintenance job.

For example, if agent processes are automatically restarted once a week, the restart can coincide with the storage cluster's automatic maintenance job, which also runs once a week. The storage node is then left in the MAINTENANCE operation mode, and its cluster status is therefore reported as DOWN.

Version-Release number of selected component (if applicable):
3.2.1

How reproducible:
Always

Steps to Reproduce:
1. Install and start a JBoss ON 3.2 system.
2. Install a second agent and storage node.
3. Verify that both storage nodes are in inventory and are UP/NORMAL.
4. Shut down one of the agents.
5. Invoke the following JBoss ON CLI command:

./rhq-cli.sh -u rhqadmin -p rhqadmin -c 'StorageNodeManager.runClusterMaintenance()'

Actual results:
The storage node managed by the agent that was unavailable has its cluster status reported as DOWN and its operation mode reported as MAINTENANCE.

Expected results:
No error or bad state associated with the node.

Additional info:
Although it is understandable that maintenance cannot complete while the agent is unavailable, this situation should be temporary. As soon as the agent comes back online, cluster maintenance should continue. In other words, the error state should be reported only while the agent is down.

The fact that the node is stuck in MAINTENANCE also seems to indicate a desegregated cluster. Auto maintenance shouldn't cause such situations.
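
To make the expected behavior concrete, here is a minimal sketch of it as a toy model. This is not RHQ/JBoss ON code: StorageNode, run_cluster_maintenance, on_agent_available, and the maintenance_pending flag are all hypothetical names invented for illustration. It only assumes the two operation modes mentioned in this report (NORMAL and MAINTENANCE) and shows maintenance being deferred while an agent is down and resumed once the agent returns.

```python
from dataclasses import dataclass
from enum import Enum


class OperationMode(Enum):
    NORMAL = "NORMAL"
    MAINTENANCE = "MAINTENANCE"


@dataclass
class StorageNode:
    # Hypothetical model of a storage node and the agent that manages it.
    name: str
    agent_available: bool = True
    mode: OperationMode = OperationMode.NORMAL
    maintenance_pending: bool = False


def run_node_maintenance(node: StorageNode) -> bool:
    """Try to run maintenance on one node; defer it if the agent is down."""
    node.mode = OperationMode.MAINTENANCE
    if not node.agent_available:
        # Agent unreachable: remember the work instead of abandoning it.
        node.maintenance_pending = True
        return False
    # Agent reachable: the (simulated) maintenance completes.
    node.maintenance_pending = False
    node.mode = OperationMode.NORMAL
    return True


def run_cluster_maintenance(nodes: list[StorageNode]) -> dict[str, bool]:
    """Run maintenance on every node and report which nodes completed."""
    return {node.name: run_node_maintenance(node) for node in nodes}


def on_agent_available(node: StorageNode) -> None:
    """Hypothetical hook fired when an agent reconnects."""
    node.agent_available = True
    if node.maintenance_pending:
        # Resume the deferred maintenance so the node can leave MAINTENANCE.
        run_node_maintenance(node)
```

In this model, a node whose agent is down stays in MAINTENANCE only until on_agent_available is called for it, after which it returns to NORMAL. That matches the behavior requested above: the bad state is visible only while the agent is down.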

Comment 1 John Sanda 2014-08-29 12:20:38 UTC

Bumping the target release due to time constraints. Work has started, though, in the storage_workflow branch.

Comment 3 Filip Brychta 2019-08-05 14:50:55 UTC

JBoss ON is coming to the end of its product life cycle. For more information regarding this transition, see https://access.redhat.com/articles/3827121.
This bug report/request is being closed. If you feel this issue should not be closed or requires further review, please create a new bug report against the latest supported JBoss ON 3.3 version.
