1232847 – Increase timeout for repair operation

Summary: Increase timeout for repair operation

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	JBoss Operations Network
Classification:	JBoss
Component:	Storage Node
Sub Component:
Version:	JON 3.3.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	medium
Target Milestone:	ER02
Target Release:	JON 3.3.4
Assignee:	Libor Zoubek
QA Contact:	Filip Brychta
Docs Contact:
URL:
Whiteboard:
Depends On:	1265309
Blocks:	1200594

Reported:	2015-06-17 15:55 UTC by John Sanda
Modified:	2015-11-17 00:05 UTC
CC List:	5 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2015-10-28 14:36:43 UTC
Type:	Bug
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Bugzilla	1232869	0	unspecified	CLOSED	Repair resource operation in storage node plugin should handle timeouts	2021-02-22 00:41:40 UTC
Red Hat Product Errata	RHSA-2015:1947	0	normal	SHIPPED_LIVE	Important: Red Hat JBoss Operations Network 3.3.4 update	2015-10-28 18:36:15 UTC

Internal Links: 1232869

Description John Sanda 2015-06-17 15:55:09 UTC

Description of problem:
Cluster-wide read repair runs during the deploy/undeploy processes. The timeout for the repair operation is hard-coded to 6 hours in the StorageNodeOperationHandlersBean.runRepair(Subject subject) method. The resource operation can sometimes take much longer than that to complete. In those situations, the resource operation times out while repair continues to run, and the user is left with no option other than retrying the deployment.

All resource operations have a timeout. If it is not defined in the operation parameters or in the plugin metadata, then the plugin container default is used. Since repair can take an arbitrarily long time, we should set the timeout to a very high value, such as a week.
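
The fallback order described above can be sketched roughly as follows. This is only an illustration and not the RHQ implementation: the class name RepairTimeoutSketch, the method resolveTimeoutSeconds, and its parameters are hypothetical, and expressing the timeout in seconds is an assumption.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch of the timeout fallback; not taken from the RHQ code base. */
final class RepairTimeoutSketch {

    /** The hard-coded repair timeout mentioned in this report (6 hours), in seconds. */
    static final int SIX_HOURS_SECONDS = (int) TimeUnit.HOURS.toSeconds(6);

    /** The proposed "really high" timeout (one week), in seconds. */
    static final int SEVEN_DAYS_SECONDS = (int) TimeUnit.DAYS.toSeconds(7);

    private RepairTimeoutSketch() {
    }

    /**
     * Picks the effective timeout: the operation parameters win, then the
     * plugin metadata, and finally the plugin container default.
     * A null argument means "not defined at that level".
     */
    static int resolveTimeoutSeconds(Integer fromOperationParams,
                                     Integer fromPluginMetadata,
                                     int containerDefault) {
        if (fromOperationParams != null) {
            return fromOperationParams;
        }
        if (fromPluginMetadata != null) {
            return fromPluginMetadata;
        }
        return containerDefault;
    }
}
```

Under these assumptions, passing SEVEN_DAYS_SECONDS as the operation-level value overrides whatever the plugin metadata or the container default would otherwise supply.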

It might also be good to make the timeout configurable. This would have to be exposed through the storage cluster settings in some way.

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 1 Libor Zoubek 2015-09-22 17:38:46 UTC

branch:  master
link:    https://github.com/rhq-project/rhq/commit/d9a7ed8c6
time:    2015-09-22 19:37:01 +0200
commit:  d9a7ed8c64919d9252d5461c508256847f0c7b65
author:  Libor Zoubek - lzoubek
message: Bug 1232847 - Increase timeout for repair operation

         Increase timeout for all long running storage node operations
         to 7 days.

Comment 2 Michael Burman 2015-09-25 07:07:42 UTC

Cherry-picked to release/jon3.3.x:

commit a540faaf2d84be66dcddd6c7f533d6cb71e4ef9b
Author: Libor Zoubek <lzoubek>
Date:   Wed Sep 16 13:02:17 2015 +0200

    Bug 1232847 - Increase timeout for repair operation
    
    Increase timeout for all long running storage node operations to 7 days.
    
    (cherry picked from commit d9a7ed8c64919d9252d5461c508256847f0c7b65)

Comment 3 Libor Zoubek 2015-09-25 08:38:48 UTC

Moving to ER1, as it was already cherry-picked.

Comment 4 Simeon Pinder 2015-10-09 04:40:18 UTC

Moving to ON_QA as available to test with the following build:
https://brewweb.devel.redhat.com/buildinfo?buildID=460382

 *Note: jon-server-patch-3.3.0.GA.zip maps to ER01 build of
 jon-server-3.3.0.GA-update-04.zip.

Comment 5 Simeon Pinder 2015-10-15 05:17:58 UTC

Moving target milestone to ER02 to retest after latest Cassandra changes.

Comment 6 Simeon Pinder 2015-10-15 05:22:35 UTC

Moving to ON_QA as available to test with the following build:
https://brewweb.devel.redhat.com//buildinfo?buildID=461043

 *Note: jon-server-patch-3.3.0.GA.zip maps to ER02 build of
 jon-server-3.3.0.GA-update-04.zip.

Comment 7 Filip Brychta 2015-10-22 06:25:40 UTC

Verified on:
Version:	3.3.0.GA Update 04
Build Number:	e9ed05b:aa79ebd

The repair operation started from the CLI via StorageNodeManager.runClusterMaintanance() is still in progress after 15 hours, well past the previous 6-hour limit, so the timeout has been increased.

Comment 9 errata-xmlrpc 2015-10-28 14:36:43 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-1947.html
