Bug 966777 - EAP 6 plug-in is using a hard-coded operation timeout for start and stop instead of using the operation timeout or agent's default operation timeout of 10 minutes
Status: CLOSED CURRENTRELEASE
Product: JBoss Operations Network
Classification: JBoss
Component: Plugin -- JBoss EAP 6
Version: JON 3.1.2
Hardware: All
OS: All
Priority: unspecified
Severity: high
Target Milestone: ER01
Target Release: JON 3.2.0
Assigned To: Thomas Segismont
QA Contact: Mike Foley
Duplicates: 950448
Depends On:
Blocks:
Reported: 2013-05-23 18:59 EDT by Larry O'Leary
Modified: 2016-01-06 17:54 EST

Doc Type: Bug Fix
Last Closed: 2014-01-02 15:37:07 EST
Type: Bug
External Trackers:
Red Hat Knowledge Base (Solution) 380603 (Priority: None, Status: None, Summary: None, Last Updated: Never)

Description Larry O'Leary 2013-05-23 18:59:50 EDT
Description of problem:
JBoss ON reports the start/stop operation for EAP 6 as failed if the server takes longer than 20 seconds to start or stop.

As a result, operation results are reported inconsistently and incorrectly, and operations are aborted prematurely even when the user has explicitly set a timeout.

Version-Release number of selected component (if applicable):
4.4.0.JON312GA

How reproducible:
Always

Steps to Reproduce:
1.  Install JON system.
2.  Import EAP 6 standalone server into inventory.
3.  Shut down the EAP standalone resource.
4.  Modify standalone.sh to include a 60-second sleep after the export JBOSS_HOME command:

        sed -i 's/^export JBOSS_HOME$/export JBOSS_HOME\nsleep 60/' "${JBOSS_HOME}/bin/standalone.sh"
        
5.  From JBoss ON, invoke the EAP resource's start operation.

Actual results:
The operation reports failure after roughly 20 seconds, well before the default 10-minute operation timeout.

Expected results:
The operation should report success after approximately 1 minute.

Additional info:
As indicated by Marc in the support ticket, this is due to using hard-coded values in the BaseServerComponent.waitUntilDown() and BaseServerComponent.waitForServerToStart() methods. Specifically, we loop 20 times with a one second sleep between state checks and then error out.

The plug-in container already has the concept of an operation timeout: it is defined by the agent configuration and can be overridden when an operation is invoked. Ignoring it here seems like a major oversight. We should not use hard-coded configuration values anywhere in the code, especially in execution logic that is supposed to be controlled by user configuration.
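
For illustration only, here is a minimal Java sketch of the difference between the two approaches. It is not the plug-in's actual code: the class StartupWait, both method names, and the serverUp check are hypothetical stand-ins. The first method mirrors the pattern described above (a fixed 20 state checks with a sleep in between); the second polls until a caller-supplied timeout elapses, which is roughly what honoring the operation timeout would look like.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

// Hypothetical illustration; names are not taken from the plug-in source.
final class StartupWait {

    // Pattern described in the report: exactly 20 state checks, then give up,
    // no matter what operation timeout the caller configured.
    static boolean waitHardCoded(BooleanSupplier serverUp, long sleepMillis) {
        for (int attempt = 0; attempt < 20; attempt++) {
            if (serverUp.getAsBoolean()) {
                return true;
            }
            if (!sleep(sleepMillis)) {
                return false; // interrupted while waiting
            }
        }
        return false;
    }

    // Alternative: keep polling until the caller-supplied timeout has elapsed.
    static boolean waitWithTimeout(BooleanSupplier serverUp, long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (true) {
            if (serverUp.getAsBoolean()) {
                return true;
            }
            long remainingMillis = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
            if (remainingMillis <= 0) {
                return false; // timeout reached
            }
            if (!sleep(Math.min(pollMillis, remainingMillis))) {
                return false; // interrupted while waiting
            }
        }
    }

    // Sleeps for the given time; returns false and restores the interrupt flag if interrupted.
    private static boolean sleep(long millis) {
        try {
            Thread.sleep(millis);
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

With a one-second sleep, the first method gives up after about 20 seconds, so a server that needs 60 seconds to start is reported as failed; the second succeeds as long as the timeout passed in is larger than the startup time.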
Comment 1 Thomas Segismont 2013-07-02 07:39:38 EDT
Fixed in master.

commit ba84865a0e0172cda10c1350fef054248c43ceed
Author: Thomas Segismont <tsegismo@redhat.com>
Date:   Thu Jun 27 16:09:58 2013 +0200

The component invocation handler now has a transferInterrupt parameter. If set to true, the component invocation thread will be interrupted when the caller thread is. 

Introduce ComponentInvocationContext class. An instance of this class is created by the plugin container and bound to facet-locked component invocation thread.  

Make BaseServerComponent use ComponentInvocationContext to deal with operation timeout or cancellation.
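
The idea in this commit message can be sketched roughly as follows. This is an illustrative approximation under stated assumptions, not the code from the commit: the InvocationContext interface below is a hypothetical stand-in for ComponentInvocationContext, and the invoke method is a simplified stand-in for the plugin container's invocation handling. The caller enforces the operation timeout and, on timeout or on its own interruption, interrupts the worker thread; the component's wait loop polls the context instead of counting a fixed number of attempts.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.BooleanSupplier;

// Hypothetical illustration; names and structure are assumptions.
final class InterruptibleWait {

    // Stand-in for the context bound to the component invocation thread.
    interface InvocationContext {
        boolean isInterrupted();
    }

    // Component side: poll until the server is up or the invocation is interrupted.
    static boolean waitForServerToStart(InvocationContext ctx, BooleanSupplier serverUp, long pollMillis) {
        while (!ctx.isInterrupted()) {
            if (serverUp.getAsBoolean()) {
                return true;
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // keep the flag visible to the context
                return false;
            }
        }
        return false;
    }

    // Container side: run the wait on a worker thread and enforce the operation timeout.
    static String invoke(BooleanSupplier serverUp, long timeoutMillis) {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        // Evaluated on the worker thread, so it reflects that thread's interrupt flag.
        InvocationContext ctx = () -> Thread.currentThread().isInterrupted();
        Callable<Boolean> task = () -> waitForServerToStart(ctx, serverUp, 10);
        Future<Boolean> result = worker.submit(task);
        try {
            return result.get(timeoutMillis, TimeUnit.MILLISECONDS) ? "SUCCESS" : "FAILURE";
        } catch (TimeoutException e) {
            result.cancel(true); // interrupt the worker once the timeout elapses
            return "TIMED_OUT";
        } catch (InterruptedException e) {
            result.cancel(true); // transfer the caller's interrupt to the worker
            Thread.currentThread().interrupt();
            return "CANCELED";
        } catch (ExecutionException e) {
            return "FAILURE";
        } finally {
            worker.shutdownNow();
        }
    }
}
```

In this sketch the wait ends only when the server is up, the configured timeout elapses, or the operation is canceled, so the 20-second cutoff no longer applies.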
Comment 2 Thomas Segismont 2013-07-04 04:53:48 EDT
*** Bug 950448 has been marked as a duplicate of this bug. ***
Comment 3 Larry O'Leary 2013-09-06 10:33:48 EDT
As this is MODIFIED or ON_QA, setting milestone to ER1.
Comment 4 Larry O'Leary 2013-09-06 10:34:27 EDT
As this is MODIFIED or ON_QA, setting milestone to ER01.
Comment 5 Sunil Kondkar 2013-11-06 06:32:01 EST
Verified on Version: 3.2.0.ER4 Build Number: e413566:057b211

Followed the steps and verified that the start operation on EAP 6.1 standalone server shows success after approximately 1 minute.
Also verified that the reload and stop operations display success.
