1136220 – [QE] (6.4.0) Intermittent test failures due to 'Could not stop container' and 'Could not start container'

Keywords:
Status:	CLOSED EOL
Alias:	None
Product:	JBoss Enterprise Application Platform 6
Classification:	JBoss
Component:	Testsuite
Sub Component:
Version:	6.4.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	medium
Target Milestone:	---
Target Release:	EAP 6.4.0
Assignee:	Petr Kremensky
QA Contact:	Petr Kremensky
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	996500 1145994

Reported:	2014-09-02 08:02 UTC by baranowb
Modified:	2019-08-19 12:46 UTC
CC List:	6 users
Fixed In Version:
Clone Of:
Clones:	1145994
Environment:
Last Closed:	2019-08-19 12:46:54 UTC
Type:	Bug
Embargoed:

Attachments
Hot fix (3.01 KB, text/plain) 2014-09-11 12:48 UTC, Petr Kremensky	no flags
hot_fix (2.60 KB, text/plain) 2014-09-11 13:12 UTC, Petr Kremensky	no flags

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Issue Tracker	WFLY-3862	0	Major	Open	Run HTTPSConnectionWithCLITestCase and VaultPasswordsInCLITestCase.java with standalone profile	2016-11-29 13:32:27 UTC

Description baranowb 2014-09-02 08:02:10 UTC

Example build: http://lightning.mw.lab.eng.bos.redhat.com/viewLog.html?buildId=5188&tab=buildResultsDiv&buildTypeId=EAP_6xIgnoreLinux


This is still going on in CI.

Comment 1 baranowb 2014-09-02 08:03:23 UTC

Currently most of the failing tests are ignored, but the list could potentially include the whole testsuite. Two flags; if this is WIP, clone it for the proper branch.

Comment 4 Petr Kremensky 2014-09-10 12:32:15 UTC

Taking this build: https://jenkins.mw.lab.eng.bos.redhat.com/hudson/job/eap-63x-patched-testsuite-windows/11/jdk=jdk1.7,label_exp=eap-sustaining%20&&%20w2k8r2%20&&%20x86_64/testReport/

ManagementOpTimeoutTestCase is the first to fail here, with a timeout exception. It fails right on the first line of the @Before method (container.start(DEFAULT_JBOSSAS);). The server configuration should be clean at this point, but see the server startup logs for all the mess:
https://jenkins.mw.lab.eng.bos.redhat.com/hudson/job/eap-63x-patched-testsuite-windows/11/jdk=jdk1.7,label_exp=eap-sustaining%20&&%20w2k8r2%20&&%20x86_64/testReport/org.jboss.as.test.manualmode.management.cli/ManagementOpTimeoutTestCase/testTimeoutCausesRestartRequired/

I've asked the security guys to go through the manual-mode tests and try to locate the test that breaks the configuration (see the part that repeats every time the server tries to boot: "Remoting "management-client" read-1, fatal error: 46: General SSLEngine problem").

Comment 5 Petr Kremensky 2014-09-10 13:38:32 UTC

The problem is caused by HTTPSConnectioWithCLITestCase, which secures the ManagementNativeRealm with SSL and is unable to unsecure it because of the issue described in BZ1105003.
See https://jenkins.mw.lab.eng.bos.redhat.com/hudson/job/eap-63x-patched-testsuite-windows/11/jdk=jdk1.8,label_exp=eap-sustaining && w2k8 && x86_64/testReport/org.jboss.as.test.manualmode.management.cli/HTTPSConnectioWithCLITestCase/resetConfigurationForNativeInterface/

Configuration snippet left by the test:
<security-realm name="ManagementNativeRealm">
    <server-identities>
        <ssl>
            <keystore
                    path="W:\workspace\eap-63x-patched-testsuite-windows\2e26cbc8\testsuite\integration\manualmode\target\workdir\native-if-workdir\server.keystore"
                    keystore-password="123456"/>
        </ssl>
    </server-identities>
    <authentication>
        <truststore
                path="W:\workspace\eap-63x-patched-testsuite-windows\2e26cbc8\testsuite\integration\manualmode\target\workdir\native-if-workdir\server.truststore"
                keystore-password="123456"/>
    </authentication>
</security-realm>
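One way to catch this kind of leak before it breaks the following tests would be a teardown check that fails loudly when the realm still carries an SSL identity. The sketch below is hypothetical (LeftoverSslCheck and hasSslIdentity are not part of the testsuite). It only parses a configuration string with the JDK DOM API and looks for a server-identities/ssl element under the named realm.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

/** Hypothetical teardown guard: detects SSL settings left behind in a security realm. */
public final class LeftoverSslCheck {

    private LeftoverSslCheck() {
    }

    /** Returns true if the realm named realmName contains server-identities/ssl. */
    public static boolean hasSslIdentity(String configXml, String realmName) {
        Document doc;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // standalone.xml is namespaced, so match on local names in any namespace.
            factory.setNamespaceAware(true);
            doc = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(configXml.getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Cannot parse configuration", e);
        }

        NodeList realms = doc.getElementsByTagNameNS("*", "security-realm");
        for (int i = 0; i < realms.getLength(); i++) {
            Element realm = (Element) realms.item(i);
            if (!realmName.equals(realm.getAttribute("name"))) {
                continue; // some other realm
            }
            NodeList identities = realm.getElementsByTagNameNS("*", "server-identities");
            for (int j = 0; j < identities.getLength(); j++) {
                Element identity = (Element) identities.item(j);
                if (identity.getElementsByTagNameNS("*", "ssl").getLength() > 0) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String leftover = "<management><security-realms>"
                + "<security-realm name=\"ManagementNativeRealm\">"
                + "<server-identities><ssl><keystore path=\"server.keystore\"/></ssl></server-identities>"
                + "</security-realm></security-realms></management>";
        System.out.println(hasSslIdentity(leftover, "ManagementNativeRealm")); // prints true
    }
}
```

In an @After or @AfterClass method, such a check could throw instead of printing, so that the offending test is named in the report rather than the next test that happens to start the container.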

Comment 6 FIlip Bogyai 2014-09-10 14:09:50 UTC

In these tests there is an intermittent problem with the initialization of the CLI tool, which is configured to take a custom jboss-cli.xml file with SSL settings. The CustomCLIExecutor class is used to test the SSL connection to the server: https://github.com/jbossas/jboss-eap/blob/6fe2590e7f3ae6adb6987752ba0f3e44401f335b/testsuite/shared/src/main/java/org/jboss/as/test/integration/management/util/CustomCLIExecutor.java

Sometimes the CLI initialization freezes, so the command is never executed and the test fails. This is the output from a frozen initialization:

INFO  [org.jboss.modules] JBoss Modules version 1.3.4.Final-redhat-1
INFO  [org.xnio] XNIO Version 3.0.10.GA-redhat-1
INFO  [org.xnio.nio] XNIO NIO Implementation Version 3.0.10.GA-redhat-1
INFO  [org.jboss.remoting] JBoss Remoting version 3.3.3.Final-redhat-1

At that point the CLI is unable to connect to the server and execute operations.

@Alexey: Could you please look at HTTPSConnectioWithCLITestCase, which uses this CustomCLIExecutor, and check whether it can be improved so that it does not cause these intermittent failures? I've tried to investigate the problem, but I haven't found a fix. Or is there a better approach to testing the SSL connection with the CLI tool?
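A timeout around the CLI bootstrap would not fix the freeze, but it would turn a silent hang into a fast, explicit failure. Below is a minimal sketch using only java.util.concurrent. BoundedInit and callWithTimeout are hypothetical names, and the task passed in stands for whatever CustomCLIExecutor does during initialization. A thread blocked in socket I/O may ignore the interrupt, which is why the worker thread is a daemon.

```java
import java.util.Optional;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper: bounds a possibly-hanging initialization step with a timeout. */
public final class BoundedInit {

    private BoundedInit() {
    }

    /**
     * Runs task on a worker thread and returns its (non-null) result, or
     * Optional.empty() if it did not finish within the timeout.
     */
    public static <T> Optional<T> callWithTimeout(Callable<T> task, long timeout, TimeUnit unit) {
        ExecutorService executor = Executors.newSingleThreadExecutor(runnable -> {
            Thread thread = new Thread(runnable, "bounded-init");
            thread.setDaemon(true); // a stuck task must not keep the JVM alive
            return thread;
        });
        Future<T> future = executor.submit(task);
        try {
            return Optional.ofNullable(future.get(timeout, unit));
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the task; blocking I/O may ignore this
            return Optional.empty();
        } catch (ExecutionException e) {
            throw new IllegalStateException("Initialization failed", e.getCause());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for initialization", e);
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // Stand-in for a CLI bootstrap that freezes after printing its version banners.
        Optional<String> result = callWithTimeout(() -> {
            Thread.sleep(Long.MAX_VALUE);
            return "connected";
        }, 200, TimeUnit.MILLISECONDS);
        System.out.println(result.isPresent() ? "initialized" : "CLI initialization timed out");
    }
}
```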

Comment 7 Petr Kremensky 2014-09-10 14:25:23 UTC

Fixing the link from comment 5.

https://jenkins.mw.lab.eng.bos.redhat.com/hudson/job/eap-63x-patched-testsuite-windows/11/jdk=jdk1.8,label_exp=eap-sustaining%20&&%20w2k8%20&&%20x86_64/testReport/org.jboss.as.test.manualmode.management.cli/HTTPSConnectioWithCLITestCase/resetConfigurationForNativeInterface/

Comment 8 Petr Kremensky 2014-09-11 12:48:49 UTC

Created attachment 936531
Hot fix

Comment 9 Petr Kremensky 2014-09-11 12:49:23 UTC

A hot fix for HTTPSConnectioWithCLITestCase could be to use the standalone.xml configuration just for this particular test, so that if it fails due to BZ1105003, other tests are not affected.

See attachment 936531.
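The isolation idea can be sketched as a helper that hands each test a private copy of the configuration file, so anything the test leaves behind never reaches the shared file. This is only an illustration under assumptions: IsolatedConfig is hypothetical, and it does not show how the container is pointed at the copy.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical helper: gives one test a private copy of the server configuration. */
public final class IsolatedConfig implements AutoCloseable {

    private final Path copy;

    public IsolatedConfig(Path sharedConfig, Path workDir, String testName) {
        this.copy = workDir.resolve(testName + "-" + sharedConfig.getFileName());
        try {
            Files.createDirectories(workDir);
            // Start from the pristine shared file; the shared file itself is never written.
            Files.copy(sharedConfig, copy, StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Path the container for this test should be started with. */
    public Path path() {
        return copy;
    }

    /** Discards whatever the test left in its private copy. */
    @Override
    public void close() {
        try {
            Files.deleteIfExists(copy);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("isolated-config-demo");
        Path shared = dir.resolve("standalone.xml");
        Files.write(shared, "<server/>".getBytes(StandardCharsets.UTF_8));

        try (IsolatedConfig config = new IsolatedConfig(shared, dir.resolve("work"), "HTTPSConnectioWithCLITestCase")) {
            // A test that cannot undo its SSL changes only dirties its own copy.
            Files.write(config.path(), "<server><ssl/></server>".getBytes(StandardCharsets.UTF_8));
        }
        System.out.println(new String(Files.readAllBytes(shared), StandardCharsets.UTF_8)); // prints <server/>
    }
}
```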

Comment 10 Petr Kremensky 2014-09-11 13:12:43 UTC

Created attachment 936539
hot_fix

Comment 11 Alexey Loubyansky 2014-09-11 13:34:48 UTC

What I would try first in this case is to use the :reload operation instead of the reload command and see whether it makes any difference.

For users, we would advise using the command instead of the operation: the command contains waiting and reconnecting logic, while the operation simply returns immediately. In my experience, the reconnecting part of the command is not 100% reliable, simply because the mechanisms available to implement that logic did not show consistent results. It works most of the time, but once in a while I saw connection timeouts for no obvious reason.

In this test, after reload is sent to the CLI (which has its own waiting and reconnecting logic), there is still another wait-for-the-server step in place. So it is fine to switch to :reload for the CLI, just to see whether it helps and whether what you see is a problem with the CLI waiting for and reconnecting to the controller.
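Because the :reload operation returns immediately, the test's own wait-for-the-server step decides when the server is usable again. A generic polling loop with a deadline might look like the sketch below. WaitFor, waitFor and isPortOpen are hypothetical names. An open port (9999 is the EAP 6 default native management port) is only a weak signal. Reading the server-state attribute through the management API would be a more reliable readiness check.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.time.Duration;
import java.util.function.BooleanSupplier;

/** Hypothetical polling helper for "wait until the server is back" after :reload. */
public final class WaitFor {

    private WaitFor() {
    }

    /** Polls condition until it is true or timeout elapses; returns whether it became true. */
    public static boolean waitFor(BooleanSupplier condition, Duration timeout, Duration pollInterval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false; // deadline passed (overflow-safe comparison)
            }
            try {
                Thread.sleep(pollInterval.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return false;
            }
        }
    }

    /** Weak readiness probe: true if a TCP connection to host:port succeeds. */
    public static boolean isPortOpen(String host, int port, int connectTimeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), connectTimeoutMillis);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumption: 9999 is the EAP 6 default native management port; adjust as needed.
        boolean up = waitFor(() -> isPortOpen("127.0.0.1", 9999, 500),
                Duration.ofSeconds(5), Duration.ofMillis(250));
        System.out.println(up ? "management port reachable" : "server did not come back in time");
    }
}
```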

Comment 12 Petr Kremensky 2014-09-12 06:49:21 UTC

Thanks Alexey, we will try that.

I'll do a PR once we finish the CP testing 
https://github.com/pkremens/jboss-eap/commit/d7e44e9a5767856c05358413a4ad012d7c685dd3

Comment 13 JBoss JIRA Server 2014-09-24 09:22:48 UTC

Petr Kremensky <pkremens> updated the status of jira WFLY-3890 to Closed

Comment 14 Petr Kremensky 2014-09-24 09:48:18 UTC

EAP 6.4.0.DR2 run with the fix included:
https://jenkins.mw.lab.eng.bos.redhat.com/hudson/job/eap-6x-as-testsuite-solaris/lastCompletedBuild/RELEASE=6.4.0,jdk=java17_default,label_exp=solaris11%20&&%20sparc/testReport/org.jboss.as.test.manualmode.management.cli/HTTPSConnectioWithCLITestCase/resetConfigurationForNativeInterface/

Manual mode tests no longer fail due to 'Could not stop container' and 'Could not start container', but using :reload does not fix the root cause.
