Bug 1232847

Summary: Increase timeout for repair operation
Product: [JBoss] JBoss Operations Network
Reporter: John Sanda <jsanda>
Component: Storage Node
Assignee: Libor Zoubek <lzoubek>
Status: CLOSED ERRATA
QA Contact: Filip Brychta <fbrychta>
Severity: medium
Docs Contact:
Priority: unspecified
Version: JON 3.3.0
CC: fbrychta, lzoubek, miburman, mmahoney, theute
Target Milestone: ER02
Keywords: Triaged
Target Release: JON 3.3.4
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2015-10-28 14:36:43 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1265309
Bug Blocks: 1200594

Description John Sanda 2015-06-17 15:55:09 UTC
Description of problem:
Cluster-wide read repair runs during the deploy/undeploy processes. The timeout for the repair operation is hard-coded to 6 hours in the StorageNodeOperationHandlersBean.runRepair(Subject subject) method. The resource operation can sometimes take much longer than that to complete. When it does, the resource operation times out while the repair itself continues to run, and the user is left with no option other than retrying the deployment.

All resource operations have a timeout. If it is not defined in the operation parameters or in the plugin metadata, the plugin container default is used. Since repair can take an arbitrarily long time, we should set the timeout to something really high, like maybe a week.
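
The fallback order described above can be sketched roughly as follows. This is only an illustration: the class name, method name, and the 10-minute container default are assumptions made for the example and are not taken from the RHQ code base.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of RHQ: illustrates the timeout fallback order.
final class OperationTimeouts {
    // Assumed value for illustration only; the real plugin container default may differ.
    static final long CONTAINER_DEFAULT_SECONDS = TimeUnit.MINUTES.toSeconds(10);

    // "Something really high, like maybe a week", expressed in seconds.
    static final long ONE_WEEK_SECONDS = TimeUnit.DAYS.toSeconds(7);

    /**
     * Picks the effective timeout: the operation parameter first, then the
     * plugin metadata, then the container default. A null argument means
     * "not defined at this level".
     */
    static long resolveSeconds(Long fromParams, Long fromMetadata) {
        if (fromParams != null) {
            return fromParams;
        }
        if (fromMetadata != null) {
            return fromMetadata;
        }
        return CONTAINER_DEFAULT_SECONDS;
    }

    public static void main(String[] args) {
        // Nothing defined: falls through to the container default.
        System.out.println(resolveSeconds(null, null));
        // Operation parameter set to one week: it wins over the metadata value.
        System.out.println(resolveSeconds(ONE_WEEK_SECONDS, 3600L));
    }
}
```

Passing the week-long value as an operation parameter is the least invasive option in this sketch, since it overrides both lower levels without changing them.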

It might also be good to make the timeout configurable. This would have to be exposed through the storage cluster settings in some way.
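
A minimal sketch of what a configurable timeout could look like, assuming the storage cluster settings are available as a simple string map. The setting key, the class name, and the map-based representation are all hypothetical; the actual storage cluster settings API is not shown in this report.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; the setting name and this class are not taken from RHQ.
final class RepairTimeoutSetting {
    // Assumed key name, chosen for illustration only.
    static final String KEY = "storage.repair.timeout.seconds";

    // Fallback used when the setting is absent or unusable: 7 days in seconds.
    static final long DEFAULT_SECONDS = TimeUnit.DAYS.toSeconds(7);

    /** Returns the configured timeout, or the 7-day default if it is missing or invalid. */
    static long fromSettings(Map<String, String> settings) {
        String raw = settings.get(KEY);
        if (raw == null || raw.trim().isEmpty()) {
            return DEFAULT_SECONDS;
        }
        try {
            long value = Long.parseLong(raw.trim());
            // Reject zero and negative values rather than timing out immediately.
            return value > 0 ? value : DEFAULT_SECONDS;
        } catch (NumberFormatException e) {
            return DEFAULT_SECONDS;
        }
    }

    public static void main(String[] args) {
        Map<String, String> settings = new HashMap<>();
        settings.put(KEY, "86400"); // one day
        System.out.println(fromSettings(settings));
        System.out.println(fromSettings(new HashMap<>()));
    }
}
```

Falling back to the default on bad input keeps a typo in the settings from reintroducing a short timeout.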

Comment 1 Libor Zoubek 2015-09-22 17:38:46 UTC
branch:  master
link:    https://github.com/rhq-project/rhq/commit/d9a7ed8c6
time:    2015-09-22 19:37:01 +0200
commit:  d9a7ed8c64919d9252d5461c508256847f0c7b65
author:  Libor Zoubek - lzoubek
message: Bug 1232847 - Increase timeout for repair operation

         Increase timeout for all long running storage node operations
         to 7 days.

Comment 2 Michael Burman 2015-09-25 07:07:42 UTC
Cherry-picked to release/jon3.3.x:

commit a540faaf2d84be66dcddd6c7f533d6cb71e4ef9b
Author: Libor Zoubek <lzoubek>
Date:   Wed Sep 16 13:02:17 2015 +0200

    Bug 1232847 - Increase timeout for repair operation
    
    Increase timeout for all long running storage node operations to 7 days.
    
    (cherry picked from commit d9a7ed8c64919d9252d5461c508256847f0c7b65)

Comment 3 Libor Zoubek 2015-09-25 08:38:48 UTC
Moving to ER01, as the fix was already cherry-picked.

Comment 4 Simeon Pinder 2015-10-09 04:40:18 UTC
Moving to ON_QA as available to test with the following build:
https://brewweb.devel.redhat.com/buildinfo?buildID=460382

 *Note: jon-server-patch-3.3.0.GA.zip maps to ER01 build of
 jon-server-3.3.0.GA-update-04.zip.

Comment 5 Simeon Pinder 2015-10-15 05:17:58 UTC
Moving target milestone to ER02 to retest after latest Cassandra changes.

Comment 6 Simeon Pinder 2015-10-15 05:22:35 UTC
Moving to ON_QA as available to test with the following build:
https://brewweb.devel.redhat.com//buildinfo?buildID=461043

 *Note: jon-server-patch-3.3.0.GA.zip maps to ER02 build of
 jon-server-3.3.0.GA-update-04.zip.

Comment 7 Filip Brychta 2015-10-22 06:25:40 UTC
Verified on:
Version: 3.3.0.GA Update 04
Build Number: e9ed05b:aa79ebd

The repair operation started from the CLI via StorageNodeManager.runClusterMaintanance() was still in progress after 15 hours, which is past the old 6-hour limit, so the timeout has been increased.
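
The reasoning behind this verification can be written out as a small check: an operation still running at 15 hours would already have exceeded the old 6-hour timeout, but is well within the new 7-day one. The class and method names below are hypothetical and exist only for this illustration.

```java
import java.time.Duration;

// Hypothetical illustration of the verification logic; not RHQ code.
final class RepairVerification {
    static final Duration OLD_TIMEOUT = Duration.ofHours(6);
    static final Duration NEW_TIMEOUT = Duration.ofDays(7);

    /** True if an operation that has run for 'elapsed' has not yet hit 'timeout'. */
    static boolean stillWithin(Duration elapsed, Duration timeout) {
        return elapsed.compareTo(timeout) < 0;
    }

    public static void main(String[] args) {
        Duration observed = Duration.ofHours(15);
        System.out.println("within old timeout: " + stillWithin(observed, OLD_TIMEOUT));
        System.out.println("within new timeout: " + stillWithin(observed, NEW_TIMEOUT));
    }
}
```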

Comment 9 errata-xmlrpc 2015-10-28 14:36:43 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-1947.html