Bug 1136220

Summary: [QE] (6.4.0) Intermittent test failures due to 'Could not stop container' and 'Could not start container'
Product: [JBoss] JBoss Enterprise Application Platform 6 Reporter: baranowb <bbaranow>
Component: Testsuite Assignee: Petr Kremensky <pkremens>
Status: CLOSED EOL QA Contact: Petr Kremensky <pkremens>
Severity: medium Docs Contact:
Priority: unspecified    
Version: 6.4.0 CC: cdewolf, jason.greene, jkudrnac, kkhan, olubyans, pkremens
Target Milestone: ---   
Target Release: EAP 6.4.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 1145994 (view as bug list) Environment:
Last Closed: 2019-08-19 12:46:54 UTC Type: Bug
Bug Depends On:    
Bug Blocks: 996500, 1145994    
Attachments:
Description    Flags
Hot fix        none
hot_fix        none

Comment 1 baranowb 2014-09-02 08:03:23 UTC
Currently most of the failing tests are ignored, but the list potentially includes the whole testsuite. Two flags are set; if this is WIP, clone the bug for the proper branch.

Comment 4 Petr Kremensky 2014-09-10 12:32:15 UTC
Taking this build: https://jenkins.mw.lab.eng.bos.redhat.com/hudson/job/eap-63x-patched-testsuite-windows/11/jdk=jdk1.7,label_exp=eap-sustaining%20&&%20w2k8r2%20&&%20x86_64/testReport/

ManagementOpTimeoutTestCase is the first to fail with a timeout exception here. It fails right on the first line of the @Before method (container.start(DEFAULT_JBOSSAS);). The server configuration should be clean at this point, but see the server startup logs for all the mess:
https://jenkins.mw.lab.eng.bos.redhat.com/hudson/job/eap-63x-patched-testsuite-windows/11/jdk=jdk1.7,label_exp=eap-sustaining%20&&%20w2k8r2%20&&%20x86_64/testReport/org.jboss.as.test.manualmode.management.cli/ManagementOpTimeoutTestCase/testTimeoutCausesRestartRequired/

I've asked the security guys to go through the manual-mode tests and try to locate the test that breaks the configuration (see the part that repeats every time the server tries to boot up: "Remoting "management-client" read-1, fatal error: 46: General SSLEngine problem").

Comment 5 Petr Kremensky 2014-09-10 13:38:32 UTC
The problem is caused by HTTPSConnectioWithCLITestCase, which secures the ManagementNativeRealm with SSL and is unable to unsecure it due to the issue described in BZ1105003.
See https://jenkins.mw.lab.eng.bos.redhat.com/hudson/job/eap-63x-patched-testsuite-windows/11/jdk=jdk1.8,label_exp=eap-sustaining && w2k8 && x86_64/testReport/org.jboss.as.test.manualmode.management.cli/HTTPSConnectioWithCLITestCase/resetConfigurationForNativeInterface/

Configuration snippet left by the test:
<security-realm name="ManagementNativeRealm">
    <server-identities>
        <ssl>
            <keystore
                    path="W:\workspace\eap-63x-patched-testsuite-windows\2e26cbc8\testsuite\integration\manualmode\target\workdir\native-if-workdir\server.keystore"
                    keystore-password="123456"/>
        </ssl>
    </server-identities>
    <authentication>
        <truststore
                path="W:\workspace\eap-63x-patched-testsuite-windows\2e26cbc8\testsuite\integration\manualmode\target\workdir\native-if-workdir\server.truststore"
                keystore-password="123456"/>
    </authentication>
</security-realm>

Comment 6 Filip Bogyai 2014-09-10 14:09:50 UTC
In these tests there is an intermittent problem with initialization of the CLI tool, which is configured to use a custom jboss-cli.xml file with SSL settings. The CustomCLIExecutor class is used for testing the SSL connection to the server: https://github.com/jbossas/jboss-eap/blob/6fe2590e7f3ae6adb6987752ba0f3e44401f335b/testsuite/shared/src/main/java/org/jboss/as/test/integration/management/util/CustomCLIExecutor.java

Sometimes the CLI initialization freezes, so the command is not executed and the test therefore fails. This is the output from a frozen initialization:
 
INFO  [org.jboss.modules] JBoss Modules version 1.3.4.Final-redhat-1
INFO  [org.xnio] XNIO Version 3.0.10.GA-redhat-1
INFO  [org.xnio.nio] XNIO NIO Implementation Version 3.0.10.GA-redhat-1
INFO  [org.jboss.remoting] JBoss Remoting version 3.3.3.Final-redhat-1

At that point the CLI is unable to connect to the server and execute operations.

@Alexey: Could you please look at HTTPSConnectioWithCLITestCase, which uses this CustomCLIExecutor, and see whether it can be improved so that it does not cause these intermittent failures? I've tried to investigate the problem, but I haven't found a fix. Or is there a better approach to testing an SSL connection with the CLI tool?

Comment 8 Petr Kremensky 2014-09-11 12:48:49 UTC
Created attachment 936531 [details]
Hot fix

Comment 9 Petr Kremensky 2014-09-11 12:49:23 UTC
A hot fix for HTTPSConnectioWithCLITestCase could be to use a standalone.xml configuration dedicated to this particular test, so that if it fails due to BZ1105003, other tests are not affected.

See attachment 936531 [details]
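
To illustrate the isolation idea described above, here is a minimal sketch in Java. It is not taken from the attachment, which may achieve the same effect differently (for example through the container configuration). The class name IsolatedServerConfig and the file name standalone-https-cli.xml are hypothetical. The sketch only shows that changes made to a private copy of the configuration cannot leak into the shared standalone.xml.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, not part of the EAP testsuite.
public class IsolatedServerConfig {

    /**
     * Copies the shared configuration file to a test-private file in the
     * same directory and returns the path of the copy. A test would then
     * start its container against the copy, so a failed cleanup only
     * damages the private file.
     */
    public static Path createPrivateCopy(Path sharedConfig, String privateName)
            throws IOException {
        Path privateConfig = sharedConfig.resolveSibling(privateName);
        // REPLACE_EXISTING discards whatever a previous failed run left behind.
        return Files.copy(sharedConfig, privateConfig,
                StandardCopyOption.REPLACE_EXISTING);
    }
}
```

With this approach a leftover SSL-secured realm would stay in the private copy and would be overwritten on the next run of the test.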

Comment 10 Petr Kremensky 2014-09-11 13:12:43 UTC
Created attachment 936539 [details]
hot_fix

Comment 11 Alexey Loubyansky 2014-09-11 13:34:48 UTC
What I would try to do first in this case is to use :reload instead of reload and see if it makes any difference.

For users we would advise using the command rather than the operation. The command contains waiting and reconnecting logic, while the operation simply returns immediately. In my experience the reconnecting part of the command is not 100% reliable, simply because the means available to implement that logic did not produce consistent results. It works most of the time, but once in a while I saw connection timeouts for whatever reason.

In this test, after reload is sent to the CLI (which has its own waiting-reconnecting logic), there is still another wait-for-the-server logic in place. So it will be fine to switch to :reload for the CLI, just to see whether it helps and whether what you are seeing is a problem with the CLI waiting for and reconnecting to the controller.
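
The pattern described above (send the operation, which returns immediately, then let the test do its own waiting) could be sketched roughly as follows. This is an illustration under assumptions, not the code used in the testsuite. ReloadWaiter, sendReload and serverIsRunning are hypothetical names, and the two callbacks stand in for whatever the test uses to send :reload and to probe the server. The sketch uses java.util.function, so it needs Java 8.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.BooleanSupplier;

// Hypothetical helper, not part of the EAP testsuite.
public class ReloadWaiter {

    /**
     * Sends a fire-and-forget reload and then polls until the probe reports
     * that the server is running again, or the timeout expires.
     */
    public static void reloadAndWait(Runnable sendReload,
                                     BooleanSupplier serverIsRunning,
                                     long timeoutMillis,
                                     long pollMillis)
            throws InterruptedException, TimeoutException {
        sendReload.run(); // assumed to return immediately, like the operation
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (true) {
            if (probe(serverIsRunning)) {
                return;
            }
            if (System.nanoTime() - deadline >= 0) {
                throw new TimeoutException(
                        "Server did not come back within " + timeoutMillis + " ms");
            }
            Thread.sleep(pollMillis);
        }
    }

    /** Treats a failing probe (e.g. connection refused mid-restart) as "not running". */
    private static boolean probe(BooleanSupplier check) {
        try {
            return check.getAsBoolean();
        } catch (RuntimeException e) {
            return false;
        }
    }
}
```

A real probe would have to make sure it is not answered by the old server instance before the reload has actually begun, for example by checking the server state rather than only checking that a connection can be opened.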

Comment 12 Petr Kremensky 2014-09-12 06:49:21 UTC
Thanks Alexey, we will try that.

I'll do a PR once we finish the CP testing 
https://github.com/pkremens/jboss-eap/commit/d7e44e9a5767856c05358413a4ad012d7c685dd3

Comment 13 JBoss JIRA Server 2014-09-24 09:22:48 UTC
Petr Kremensky <pkremens> updated the status of jira WFLY-3890 to Closed

Comment 14 Petr Kremensky 2014-09-24 09:48:18 UTC
EAP 6.4.0.DR2 run with fix included.
https://jenkins.mw.lab.eng.bos.redhat.com/hudson/job/eap-6x-as-testsuite-solaris/lastCompletedBuild/RELEASE=6.4.0,jdk=java17_default,label_exp=solaris11%20&&%20sparc/testReport/org.jboss.as.test.manualmode.management.cli/HTTPSConnectioWithCLITestCase/resetConfigurationForNativeInterface/

Manual mode tests no longer fail due to 'Could not stop container' and 'Could not start container', but using :reload doesn't fix the root cause.