Bug 1232847 - Increase timeout for repair operation
Summary: Increase timeout for repair operation
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: JBoss Operations Network
Classification: JBoss
Component: Storage Node
Version: JON 3.3.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ER02
Target Release: JON 3.3.4
Assignee: Libor Zoubek
QA Contact: Filip Brychta
URL:
Whiteboard:
Depends On: 1265309
Blocks: 1200594
Reported: 2015-06-17 15:55 UTC by John Sanda
Modified: 2015-11-17 00:05 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-10-28 14:36:43 UTC
Type: Bug
Embargoed:


Links
System ID | Private | Priority | Status | Summary | Last Updated
Red Hat Bugzilla 1232869 | 0 | unspecified | CLOSED | Repair resource operation in storage node plugin should handle timeouts | 2021-02-22 00:41:40 UTC
Red Hat Product Errata RHSA-2015:1947 | 0 | normal | SHIPPED_LIVE | Important: Red Hat JBoss Operations Network 3.3.4 update | 2015-10-28 18:36:15 UTC

Internal Links: 1232869

Description John Sanda 2015-06-17 15:55:09 UTC
Description of problem:
Cluster-wide read repair runs during the deploy/undeploy processes. The timeout for the repair operation is hard-coded to 6 hours in the StorageNodeOperationHandlersBean.runRepair(Subject subject) method. The resource operation can sometimes take much longer than that to complete. When it does, the resource operation times out while the repair itself continues to run, and the user is left with no option other than retrying the deployment.

All resource operations have a timeout. If it is not defined in the operation params or in the plugin metadata, the plugin container default is used. Since repair can take an arbitrarily long time, we should set the timeout to something really high, such as a week.
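
The fallback order described above can be sketched roughly as follows. This is only an illustration, not the RHQ implementation: the class name, the method name, and the container default value used here are all hypothetical.

```java
import java.util.OptionalLong;

// Hypothetical sketch of the timeout lookup order; none of these names come from RHQ.
final class OperationTimeoutResolver {

    // Assumed placeholder for the plugin container default, in seconds.
    static final long CONTAINER_DEFAULT_SECONDS = 600L;

    // Operation params win, then plugin metadata, then the container default.
    static long resolveTimeoutSeconds(OptionalLong fromOperationParams,
                                      OptionalLong fromPluginMetadata) {
        if (fromOperationParams.isPresent()) {
            return fromOperationParams.getAsLong();
        }
        if (fromPluginMetadata.isPresent()) {
            return fromPluginMetadata.getAsLong();
        }
        return CONTAINER_DEFAULT_SECONDS;
    }

    public static void main(String[] args) {
        // Nothing is set on the operation or in the metadata: the default applies.
        System.out.println(resolveTimeoutSeconds(OptionalLong.empty(), OptionalLong.empty()));
        // A value passed with the operation params overrides everything else.
        System.out.println(resolveTimeoutSeconds(OptionalLong.of(604800L), OptionalLong.of(21600L)));
    }
}
```

With a lookup like this, passing a large value with the operation params would be enough to override both the metadata and the container default for repair.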

It might also be good to make the timeout configurable. This would have to be exposed through the storage cluster settings in some way.
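
One possible shape for a configurable timeout is sketched below, assuming the storage cluster settings can be read as simple key/value pairs. The setting key, the class, and the method are hypothetical and only illustrate the idea of falling back to a one-week default when nothing valid is configured.

```java
import java.util.Map;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; the setting key and this class are not part of RHQ.
final class RepairTimeoutSetting {

    // Assumed name of the setting that would hold the timeout.
    static final String KEY = "repairTimeoutSeconds";

    // One week, the value suggested in this report.
    static final long DEFAULT_SECONDS = TimeUnit.DAYS.toSeconds(7);

    // Returns the configured timeout, or the default if it is missing or invalid.
    static long fromSettings(Map<String, String> settings) {
        String raw = settings.get(KEY);
        if (raw == null || raw.trim().isEmpty()) {
            return DEFAULT_SECONDS;
        }
        try {
            long seconds = Long.parseLong(raw.trim());
            // Reject zero and negative values rather than timing out immediately.
            return seconds > 0 ? seconds : DEFAULT_SECONDS;
        } catch (NumberFormatException e) {
            return DEFAULT_SECONDS;
        }
    }
}
```

Falling back to the default on bad input keeps a mistyped setting from shortening the repair timeout by accident.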


Comment 1 Libor Zoubek 2015-09-22 17:38:46 UTC
branch:  master
link:    https://github.com/rhq-project/rhq/commit/d9a7ed8c6
time:    2015-09-22 19:37:01 +0200
commit:  d9a7ed8c64919d9252d5461c508256847f0c7b65
author:  Libor Zoubek - lzoubek
message: Bug 1232847 - Increase timeout for repair operation

         Increase timeout for all long running storage node operations
         to 7 days.

Comment 2 Michael Burman 2015-09-25 07:07:42 UTC
Cherry-picked to release/jon3.3.x:

commit a540faaf2d84be66dcddd6c7f533d6cb71e4ef9b
Author: Libor Zoubek <lzoubek>
Date:   Wed Sep 16 13:02:17 2015 +0200

    Bug 1232847 - Increase timeout for repair operation
    
    Increase timeout for all long running storage node operations to 7 days.
    
    (cherry picked from commit d9a7ed8c64919d9252d5461c508256847f0c7b65)

Comment 3 Libor Zoubek 2015-09-25 08:38:48 UTC
Moving to ER01, as it was already cherry-picked.

Comment 4 Simeon Pinder 2015-10-09 04:40:18 UTC
Moving to ON_QA as available to test with the following build:
https://brewweb.devel.redhat.com/buildinfo?buildID=460382

 *Note: jon-server-patch-3.3.0.GA.zip maps to ER01 build of
 jon-server-3.3.0.GA-update-04.zip.

Comment 5 Simeon Pinder 2015-10-15 05:17:58 UTC
Moving target milestone to ER02 to retest after latest Cassandra changes.

Comment 6 Simeon Pinder 2015-10-15 05:22:35 UTC
Moving to ON_QA as available to test with the following build:
https://brewweb.devel.redhat.com//buildinfo?buildID=461043

 *Note: jon-server-patch-3.3.0.GA.zip maps to ER02 build of
 jon-server-3.3.0.GA-update-04.zip.

Comment 7 Filip Brychta 2015-10-22 06:25:40 UTC
Verified on:
Version: 3.3.0.GA Update 04
Build Number: e9ed05b:aa79ebd

The repair operation started from the CLI via StorageNodeManager.runClusterMaintenance() was still in progress after 15 hours, so the timeout has been increased.

Comment 9 errata-xmlrpc 2015-10-28 14:36:43 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-1947.html

