Bug 1166383

Summary: JBoss Remoting SSL transport fails when performing streaming due to stale connections
Product: [JBoss] JBoss Operations Network
Reporter: Larry O'Leary <loleary>
Component: Communications Subsystem
Assignee: John Mazzitelli <mazz>
Status: CLOSED CURRENTRELEASE
QA Contact: Mike Foley <mfoley>
Severity: high
Docs Contact:
Priority: unspecified
Version: JON 3.2
CC: fbrychta, gbonocor, miburman, mmahoney
Target Milestone: ER01
Keywords: Regression
Target Release: JON 3.3.1
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 1175851
Environment:
Last Closed: 2015-02-27 19:58:13 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1176183
Bug Blocks: 1175851
Attachments:
Description: patch to add new transport param
Flags: none

Description Larry O'Leary 2014-11-20 22:44:08 UTC
Description of problem:
If the remote connection between the JBoss ON server and agent uses SSL and has been idle for 1 minute, the next request between the server and agent fails, for example when deploying a resource bundle.

This is due to certain SSLExceptions being interpreted as non-recoverable conditions.

Version-Release number of selected component (if applicable):
3.2.3

How reproducible:
Always

Steps to Reproduce:
1.  Install JBoss ON 3.1.2 system.
2.  Configure agent/server SSL encryption.
3.  Start JBoss ON system.
4.  Import platform resource into inventory.
5.  Add platform resource to resource group.
6.  Create helloworld-bundle bundle.
7.  Check agent to ensure server is not currently connected to agent:

        _count=0; while true; do netstat -anpt | grep 16163; sleep 2s; _count=$(($_count+2)); echo "$_count seconds"; done

8.  Invoke the *View Process List* platform operation.
9.  Wait about a minute and check to see that the server's network socket to the agent is in the state *CLOSE_WAIT*.
10. Deploy helloworld-bundle to platform resource group.
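
The check in step 9 can also be scripted instead of read off the loop output by eye. The following is a minimal sketch, assuming Linux `netstat -ant` style output (state in the sixth column, addresses in the fourth and fifth); the function name `has_close_wait` is hypothetical and not part of JBoss ON.

```shell
#!/bin/sh
# Hypothetical helper: reads `netstat -ant` style lines on stdin and
# succeeds if any socket whose local or remote address ends in the
# given port is in the CLOSE_WAIT state.
# Assumes the Linux column layout:
#   Proto Recv-Q Send-Q Local-Address Foreign-Address State
has_close_wait() {
    port=$1
    awk -v port="$port" '
        $6 == "CLOSE_WAIT" && ($4 ~ (":" port "$") || $5 ~ (":" port "$")) {
            found = 1
        }
        END { exit !found }
    '
}
```

For example, `netstat -ant | has_close_wait 16163 && echo "stale connection present"` on the server host would report when the socket to the agent port used in step 7 has gone stale.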

Actual results:
Bundle fails to deploy and server.log includes the following error:

    ERROR [org.rhq.enterprise.communications.command.client.ClientCommandSenderTask] (http-/0.0.0.0:7080-5) {ClientCommandSenderTask.send-failed}Failed to send command [Command: type=[remotepojo]; cmd-in-response=[false]; config=[{rhq.security-token=EhtA72he/fcxoEpuDEQITmL04UgkR4+Jvmaqz9vcYSE3+8D9XOkC4HnvW/uKbSeMi8Y=, rhq.send-throttle=true}]; params=[{invocation=NameBasedInvocation[schedule], targetInterfaceName=org.rhq.core.clientapi.agent.bundle.BundleAgentService}]]. Cause: org.jboss.remoting.InvocationFailureException:Unable to perform invocation; nested exception is: 
        javax.net.ssl.SSLException: Connection has been shutdown: javax.net.ssl.SSLException: java.net.SocketException: Broken pipe -> javax.net.ssl.SSLException:Connection has been shutdown: javax.net.ssl.SSLException: java.net.SocketException: Broken pipe -> javax.net.ssl.SSLException:java.net.SocketException: Broken pipe -> java.net.SocketException:Broken pipe. Cause: org.jboss.remoting.InvocationFailureException: Unable to perform invocation; nested exception is: 
        javax.net.ssl.SSLException: Connection has been shutdown: javax.net.ssl.SSLException: java.net.SocketException: Broken pipe

Expected results:
No errors and bundle gets deployed.

Additional Info:
This issue was fixed in JBoss Remoting 2.5.4 as identified in https://issues.jboss.org/browse/JBREM-1245.

Although JBoss ON is using a version of JBoss Remoting that contains this fix (2.5.4.SP5 as of JBoss ON 3.2), the fix provided in JBREM-1245 also requires the remoting transport parameter generalizeSocketException to be set to true.

I recommend that we add generalizeSocketException=true to the remoting transport-params if it isn't explicitly defined by user-provided configuration.
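
To illustrate the intended behavior (add the parameter only when the user has not set it either way), here is a minimal sketch. It assumes the transport params are a single `key=value&key=value` string; the function name `ensure_generalize_param` and the sample values are hypothetical, and this is not the code from the attached patch.

```shell
#!/bin/sh
# Hypothetical helper: append generalizeSocketException=true to a
# "key=value&key=value" parameter string unless the key is already set.
ensure_generalize_param() {
    params=$1
    # Wrap in "&" so the key only matches at a parameter boundary.
    case "&$params&" in
        *"&generalizeSocketException="*)
            # User already chose a value (true or false); leave it alone.
            printf '%s\n' "$params"
            ;;
        "&&")
            # No parameters at all: the new one stands alone.
            printf '%s\n' 'generalizeSocketException=true'
            ;;
        *)
            printf '%s\n' "$params&generalizeSocketException=true"
            ;;
    esac
}
```

With this sketch, a string that already contains `generalizeSocketException=false` is returned unchanged, so an explicit user override is preserved.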

Comment 2 John Mazzitelli 2014-11-21 19:08:14 UTC
Created attachment 959926 [details]
patch to add new transport param

Attaching a patch that should fix this. I have not run the reproduction steps to test it, but I have run normally (using non-secure endpoints in server and agent) and it worked, so at least the defaults didn't break anything. I also added two unit tests to make sure the transport params get this new param added properly if the user has not already specified it.

Comment 3 Larry O'Leary 2014-11-22 00:03:13 UTC
Mazz, I built your patch and tested it locally with the reproducer steps mentioned above and this seems to resolve the issue.

I also tested overriding the value (i.e. setting it to false) and that too works as expected. Thank you very much.

I would say this should be good to get into master and queued up for a cherry-pick into JBoss ON 3.3.1 when we are ready.

Comment 4 John Mazzitelli 2014-11-22 00:08:46 UTC
committed to master branch:

commit 2a2ffa4dc443f0064365e7ef56deee4b9e3c688d
Author: John Mazzitelli <mazz>
Date:   Fri Nov 21 19:07:43 2014 -0500

    BZ 1166383 - ensure a transport param is added to the remoting locator URL

Comment 5 Giuseppe Bonocore 2014-12-02 12:37:19 UTC
Hello, is there any possibility of getting this fix into JON 3.2?

Comment 7 Larry O'Leary 2014-12-03 01:40:06 UTC
The fix for this is already available for 3.2 by applying the configuration update mentioned in the solution 511833[1]. If you need further assistance, please contact Red Hat Global Support Services.

[1]: https://access.redhat.com/solutions/511833

Comment 9 Michael Burman 2015-01-15 08:40:14 UTC
Cherry-picked to release/jon3.3.x:

commit 9ae3875a38f4b70bb3ecf04413c81a87609fb2dc
Author: John Mazzitelli <mazz>
Date:   Fri Nov 21 19:07:43 2014 -0500

    BZ 1166383 - ensure a transport param is added to the remoting locator URL
    
    (cherry picked from commit 2a2ffa4dc443f0064365e7ef56deee4b9e3c688d)
    
    Conflicts:
        modules/enterprise/agent/src/test/java/org/rhq/enterprise/agent/AgentConfigurationTest.java

Comment 10 Larry O'Leary 2015-01-29 16:41:08 UTC
To verify, in addition to the steps listed in comment 0, confirm that the generalizeSocketException=true parameter appears in the agent's endpoint address on the agent topology page (Administration > Topology > Agents, then view the agent details and verify that Remote Endpoint contains generalizeSocketException=true).
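
If the Remote Endpoint value is copied out of the UI, the same check can be done on the command line. This is a rough sketch, assuming the endpoint is a URL whose parameters follow a `?` and are separated by `&`; the function name `endpoint_has_generalize_true` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if the endpoint URL passed as $1
# carries generalizeSocketException=true in its query string.
endpoint_has_generalize_true() {
    endpoint=$1
    # Strip everything up to and including the first "?".
    query=${endpoint#*\?}
    # If nothing was stripped, there is no query string at all.
    [ "$query" != "$endpoint" ] || return 1
    case "&$query&" in
        *"&generalizeSocketException=true&"*) return 0 ;;
        *) return 1 ;;
    esac
}
```

Called with the copied endpoint string as its only argument, it returns success only when the parameter is present and set to true; an endpoint overridden to false, or one with no parameters, fails the check.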

Comment 11 Larry O'Leary 2015-01-29 17:00:39 UTC
Moving this to ON_QA as this was in ER01 and ready for verification.