Bug 966777 - EAP 6 plug-in is using a hard-coded operation timeout for start and stop instead of using the operation timeout or agent's default operation timeout of 10 minutes
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: JBoss Operations Network
Classification: JBoss
Component: Plugin -- JBoss EAP 6
Version: JON 3.1.2
Hardware: All
OS: All
Priority: unspecified
Severity: high
Target Milestone: ER01
Target Release: JON 3.2.0
Assignee: Thomas Segismont
QA Contact: Mike Foley
URL:
Whiteboard:
Duplicates: 950448
Depends On:
Blocks:
Reported: 2013-05-23 22:59 UTC by Larry O'Leary
Modified: 2019-09-12 07:44 UTC (History)

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2014-01-02 20:37:07 UTC
Type: Bug
Embargoed:




Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Bugzilla | 982804 | 0 | unspecified | CLOSED | AS7 reload behavior includes hardcoded timeout of 20 seconds | 2021-02-22 00:41:40 UTC
Red Hat Knowledge Base (Solution) | 380603 | 0 | None | None | None | Never

Internal Links: 982804

Description Larry O'Leary 2013-05-23 22:59:50 UTC
Description of problem:
JBoss ON reports the start/stop operation for EAP 6 as failed if the server takes longer than 20 seconds to start or stop.

As a result, operation results are reported inconsistently and incorrectly, and operations are aborted prematurely even when the user has explicitly set a timeout.

Version-Release number of selected component (if applicable):
4.4.0.JON312GA

How reproducible:
Always

Steps to Reproduce:
1.  Install a JON system.
2.  Import an EAP 6 standalone server into inventory.
3.  Shut down the EAP standalone resource.
4.  Modify standalone.sh to include a 60 second sleep after the export JBOSS_HOME command:

        sed -i 's/^export JBOSS_HOME$/export JBOSS_HOME\nsleep 60/' "${JBOSS_HOME}/bin/standalone.sh"
        
5.  From JBoss ON, invoke the EAP resource's start operation.

Actual results:
The operation reports failure after about 20 seconds, well before the default 10 minute operation timeout.

Expected results:
The operation should report success after approximately 1 minute.

Additional info:
As indicated by Marc in the support ticket, this is due to using hard-coded values in the BaseServerComponent.waitUntilDown() and BaseServerComponent.waitForServerToStart() methods. Specifically, we loop 20 times with a one second sleep between state checks and then error out.

The plug-in container already has an operation timeout, defined by the agent configuration, and it can be overridden when an operation is invoked, so this seems like a major oversight. We should not use hard-coded configuration values anywhere in the code, especially in execution logic that is controlled by user configuration.
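
For illustration only, here is a minimal sketch of the difference between the fixed-attempt loop described above and a wait that honors a caller-supplied timeout. It is not the actual BaseServerComponent code: the class name StartWaitSketch, both method names, and the simulated server are assumptions made up for this example, and the one second sleep is scaled down to 1 ms so the demo finishes quickly.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

// Hypothetical sketch; names do not come from the plug-in source.
class StartWaitSketch {

    // Fixed number of state checks, similar to the reported behavior
    // (20 checks with a sleep between them, then give up).
    static boolean waitFixedAttempts(BooleanSupplier isUp, int attempts, long sleepMillis) {
        for (int i = 0; i < attempts; i++) {
            if (isUp.getAsBoolean()) {
                return true;
            }
            if (!sleep(sleepMillis)) {
                return false;
            }
        }
        return false;
    }

    // Polls until the caller-supplied timeout elapses instead of counting attempts.
    static boolean waitUntilDeadline(BooleanSupplier isUp, long timeoutMillis, long sleepMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (true) {
            if (isUp.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            if (!sleep(sleepMillis)) {
                return false;
            }
        }
    }

    // Sleeps and reports whether the sleep completed without interruption.
    private static boolean sleep(long millis) {
        try {
            Thread.sleep(millis);
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return false;
        }
    }

    public static void main(String[] args) {
        // Simulated server that reports "up" only on the 61st state check.
        int[] checks = {0};
        BooleanSupplier slowServer = () -> ++checks[0] > 60;

        System.out.println("fixed 20 attempts: " + waitFixedAttempts(slowServer, 20, 1L));
        checks[0] = 0;
        System.out.println("10 minute budget:  " + waitUntilDeadline(slowServer, 600_000L, 1L));
    }
}
```

With the simulated 60-check delay, the fixed-attempt variant gives up while the deadline-based variant keeps polling until the server reports up, which matches the expected result in this report.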

Comment 1 Thomas Segismont 2013-07-02 11:39:38 UTC
Fixed in master.

commit ba84865a0e0172cda10c1350fef054248c43ceed
Author: Thomas Segismont <tsegismo>
Date:   Thu Jun 27 16:09:58 2013 +0200

The component invocation handler now has a transferInterrupt parameter. If set to true, the component invocation thread will be interrupted when the caller thread is. 

Introduce ComponentInvocationContext class. An instance of this class is created by the plugin container and bound to facet-locked component invocation thread.  

Make BaseServerComponent use ComponentInvocationContext to deal with operation timeout or cancellation.
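
As a rough sketch of the idea in this commit message, a wait loop can consult an invocation context on each iteration and stop as soon as the container signals a timeout or cancellation. The InvocationContext interface, the Outcome enum, and waitForStart below are hypothetical stand-ins written for this example; they are not the real ComponentInvocationContext API or the committed code.

```java
import java.util.function.BooleanSupplier;

// Hypothetical sketch; types and names are assumptions, not the RHQ plug-in API.
class InterruptAwareWaitSketch {

    // Stand-in for a context the plug-in container would bind to the invocation thread.
    interface InvocationContext {
        boolean isInterrupted();
    }

    enum Outcome { STARTED, INTERRUPTED }

    // Polls the server state with no attempt limit of its own; the only way out
    // besides success is the container flagging the invocation as interrupted.
    static Outcome waitForStart(InvocationContext context, BooleanSupplier isUp, long sleepMillis) {
        while (!isUp.getAsBoolean()) {
            if (context.isInterrupted()) {
                return Outcome.INTERRUPTED;
            }
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // keep the interrupt status visible
                return Outcome.INTERRUPTED;
            }
        }
        return Outcome.STARTED;
    }

    public static void main(String[] args) {
        // Server comes up on the 5th check; the context never signals an interrupt.
        int[] checks = {0};
        System.out.println(waitForStart(() -> false, () -> ++checks[0] >= 5, 1L));

        // Server never comes up; the context signals an interrupt on its 3rd query.
        int[] queries = {0};
        System.out.println(waitForStart(() -> ++queries[0] >= 3, () -> false, 1L));
    }
}
```

The design point is that the loop carries no timeout value of its own, so the operation timeout configured on the agent or supplied at invocation time is the only limit.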

Comment 2 Thomas Segismont 2013-07-04 08:53:48 UTC
*** Bug 950448 has been marked as a duplicate of this bug. ***

Comment 3 Larry O'Leary 2013-09-06 14:33:48 UTC
As this is MODIFIED or ON_QA, setting milestone to ER1.

Comment 4 Larry O'Leary 2013-09-06 14:34:27 UTC
As this is MODIFIED or ON_QA, setting milestone to ER01.

Comment 5 Sunil Kondkar 2013-11-06 11:32:01 UTC
Verified on Version: 3.2.0.ER4 Build Number: e413566:057b211

Followed the steps and verified that the start operation on EAP 6.1 standalone server shows success after approximately 1 minute.
Also verified that the reload and stop operations display success.

