Bug 966777

Summary: EAP 6 plug-in is using a hard-coded operation timeout for start and stop instead of using the operation timeout or agent's default operation timeout of 10 minutes
Product: [JBoss] JBoss Operations Network
Reporter: Larry O'Leary <loleary>
Component: Plugin -- JBoss EAP 6
Assignee: Thomas Segismont <tsegismo>
Status: CLOSED CURRENTRELEASE
QA Contact: Mike Foley <mfoley>
Severity: high
Docs Contact:
Priority: unspecified
Version: JON 3.1.2
CC: hokuda, rhatlapa, skondkar, tsegismo
Target Milestone: ER01
Target Release: JON 3.2.0
Hardware: All
OS: All
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2014-01-02 20:37:07 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Larry O'Leary 2013-05-23 22:59:50 UTC
Description of problem:
JBoss ON reports a start or stop operation for EAP 6 as failed if the server takes longer than 20 seconds to start or stop.

As a result, operation results are reported inconsistently and incorrectly, and operations are aborted prematurely even when the user has explicitly set a timeout.

Version-Release number of selected component (if applicable):
4.4.0.JON312GA

How reproducible:
Always

Steps to Reproduce:
1.  Install JON system.
2.  Import EAP 6 standalone server into inventory.
3.  Shut down the EAP standalone resource.
4.  Modify standalone.sh to include a 60-second sleep after the export JBOSS_HOME command:

        sed -i 's/^export JBOSS_HOME$/export JBOSS_HOME\nsleep 60/' "${JBOSS_HOME}/bin/standalone.sh"
        
5.  From JBoss ON, invoke the EAP resource's start operation.

Actual results:
The operation will report failure well before the default 10 minute operation timeout setting.

Expected results:
The operation should report success after approximately 1 minute.

Additional info:
As indicated by Marc in the support ticket, this is due to using hard-coded values in the BaseServerComponent.waitUntilDown() and BaseServerComponent.waitForServerToStart() methods. Specifically, we loop 20 times with a one second sleep between state checks and then error out.

Considering that the plug-in container already uses an operation timeout, defined by the agent configuration and overridable when an operation is invoked, this seems like a major oversight. We should not be using hard-coded configuration values anywhere in the code, especially in execution logic that is controlled by user configuration.
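
To make the difference concrete, here is a minimal, self-contained sketch contrasting the fixed 20-check loop described above with a loop bounded by a caller-supplied timeout. It is not the plug-in's actual code: the class name, method names, and the `upAfterChecks` helper are all hypothetical, and the one-second sleep is scaled down to one millisecond so the example runs quickly.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

public class StartWaitSketch {

    // Hypothetical stand-in for a server state check: reports "up" from the n-th check onward.
    static BooleanSupplier upAfterChecks(int n) {
        AtomicInteger checks = new AtomicInteger();
        return () -> checks.incrementAndGet() >= n;
    }

    // Sleeps for the given time; returns false (and restores the interrupt flag) if interrupted.
    static boolean sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    // Pattern described in this report: a fixed number of checks, regardless of any configured timeout.
    static boolean waitFixedAttempts(BooleanSupplier serverIsUp, long sleepMillis) {
        for (int i = 0; i < 20; i++) {
            if (serverIsUp.getAsBoolean()) {
                return true;
            }
            if (!sleepQuietly(sleepMillis)) {
                return false;
            }
        }
        return false;
    }

    // Alternative: keep checking until the caller-supplied timeout has elapsed.
    static boolean waitUntilDeadline(BooleanSupplier serverIsUp, long timeoutMillis, long sleepMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (true) {
            if (serverIsUp.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false; // timeout elapsed without the server coming up
            }
            if (!sleepQuietly(sleepMillis)) {
                return false; // interrupted while waiting
            }
        }
    }

    public static void main(String[] args) {
        // A server that needs 60 checks to come up exhausts the fixed 20-check budget...
        System.out.println("fixed attempts: " + waitFixedAttempts(upAfterChecks(60), 1));
        // ...but is detected when the wait is bounded by a 5-second timeout instead.
        System.out.println("deadline-based: " + waitUntilDeadline(upAfterChecks(60), 5000, 1));
    }
}
```

In a real plug-in the timeout passed to the second method would presumably come from the operation's own timeout or the agent default rather than a literal; how that value reaches the component is outside this sketch.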

Comment 1 Thomas Segismont 2013-07-02 11:39:38 UTC
Fixed in master.

commit ba84865a0e0172cda10c1350fef054248c43ceed
Author: Thomas Segismont <tsegismo>
Date:   Thu Jun 27 16:09:58 2013 +0200

The component invocation handler now has a transferInterrupt parameter. If set to true, the component invocation thread will be interrupted when the caller thread is. 

Introduce ComponentInvocationContext class. An instance of this class is created by the plugin container and bound to facet-locked component invocation thread.  

Make BaseServerComponent use ComponentInvocationContext to deal with operation timeout or cancellation.
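
The general idea behind this commit message can be illustrated with plain java.util.concurrent classes. The sketch below is an assumption-laden illustration, not the RHQ implementation: `InvocationContextSketch` is a hypothetical stand-in for the ComponentInvocationContext class named above, and `invokeWithTimeout` and `waitForServer` are invented names. The caller waits for the worker with a timeout and, on timeout or on its own interruption, interrupts the worker thread, which notices through the context and stops polling.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.BooleanSupplier;

public class InterruptTransferSketch {

    // Hypothetical stand-in for an invocation context; not the real RHQ class.
    interface InvocationContextSketch {
        boolean isInterrupted();
    }

    // Worker-side loop: polls the server state until it is up or the invocation is interrupted.
    static String waitForServer(InvocationContextSketch context, BooleanSupplier serverIsUp) {
        while (!context.isInterrupted()) {
            if (serverIsUp.getAsBoolean()) {
                return "started";
            }
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                // Restore the flag so the loop condition sees the interruption.
                Thread.currentThread().interrupt();
            }
        }
        return "interrupted";
    }

    // Caller side: run the wait on a worker thread, bounded by the operation timeout.
    static String invokeWithTimeout(BooleanSupplier serverIsUp, long timeoutMillis) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            // Evaluated on the worker thread, because that is where waitForServer calls it.
            InvocationContextSketch context = () -> Thread.currentThread().isInterrupted();
            Callable<String> task = () -> waitForServer(context, serverIsUp);
            Future<String> future = executor.submit(task);
            try {
                return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            } catch (TimeoutException e) {
                future.cancel(true); // interrupts the worker thread
                return "timed out";
            } catch (InterruptedException e) {
                future.cancel(true); // pass the caller's interruption on to the worker
                Thread.currentThread().interrupt();
                return "interrupted";
            } catch (ExecutionException e) {
                return "failed";
            }
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println(invokeWithTimeout(() -> true, 5000));
        System.out.println(invokeWithTimeout(() -> false, 100));
    }
}
```

The design point is that the worker never needs its own hard-coded limit: it simply keeps checking until either the server is up or the caller, which owns the timeout, tells it to stop.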

Comment 2 Thomas Segismont 2013-07-04 08:53:48 UTC
*** Bug 950448 has been marked as a duplicate of this bug. ***

Comment 3 Larry O'Leary 2013-09-06 14:33:48 UTC
As this is MODIFIED or ON_QA, setting milestone to ER1.

Comment 4 Larry O'Leary 2013-09-06 14:34:27 UTC
As this is MODIFIED or ON_QA, setting milestone to ER01.

Comment 5 Sunil Kondkar 2013-11-06 11:32:01 UTC
Verified on Version: 3.2.0.ER4 Build Number: e413566:057b211

Followed the steps and verified that the start operation on EAP 6.1 standalone server shows success after approximately 1 minute.
Also verified that the reload and stop operations display success.