Bug 1015098

Summary: Management operations on slave host will corrupt preceding commands (CLI - batch)
Product: [JBoss] JBoss Enterprise Application Platform 6
Reporter: Petr Kremensky <pkremens>
Component: Domain Management
Assignee: Emanuel Muckenhuber <emuckenh>
Status: CLOSED CURRENTRELEASE
QA Contact: Petr Kremensky <pkremens>
Severity: high
Docs Contact: Russell Dickenson <rdickens>
Priority: unspecified
Version: 6.2.0
CC: brian.stansberry, dandread, emuckenh, myarboro
Target Milestone: CR1   
Target Release: EAP 6.2.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Doc Type: Bug Fix
Last Closed: 2013-12-15 16:17:18 UTC
Type: Bug
Attachments:
Description: Issue reproduced on RHEL 6.4 with 6.2.0.ER5
Flags: none

Description Petr Kremensky 2013-10-03 12:55:55 UTC
Description of problem:
 The preceding command in a batch is corrupted when a management command (start|stop|restart) is executed on the slave host controller in the same batch.

Version-Release number of selected component (if applicable):
 EAP 6.2.0.ER3

How reproducible:
 always

Steps to Reproduce on single node:
 1. prepare env and start domain
export IP=aaa.bbb.ccc.ddd # an address different from localhost
unzip -q jboss-eap-6.2.0.ER3.zip
cp -r jboss-eap-6.2/domain/ jboss-eap-6.2/domain2
# start first domain controller
./jboss-eap-6.2/bin/domain.sh &
# start slave
./jboss-eap-6.2/bin/domain.sh --host-config=host-slave.xml -Djboss.domain.base.dir=jboss-eap-6.2/domain2 -Djboss.domain.master.address=127.0.0.1 -Djboss.bind.address=$IP -Djboss.bind.address.management=$IP -Djboss.bind.address.unsecure=$IP &

 2. connect to the CLI and execute the following batch:
batch
/profile=test-profile:add
/host=${slave_name}/server-config=server-one:restart
run-batch
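
For repeated attempts, the batch above could be kept in a script file instead of being typed interactively. This is only a sketch: `write_batch_script`, the file name `repro.cli`, and the host name `slave1` are hypothetical placeholders that do not appear in the report.

```shell
#!/bin/sh
# Sketch: write the reproducer batch to a CLI script file.
# write_batch_script is a hypothetical helper, not part of EAP.
#   $1: slave host name as shown under /host= in the domain
#   $2: path of the script file to create
write_batch_script() {
    slave_name=$1
    out_file=$2
    # Unquoted heredoc delimiter, so ${slave_name} is expanded by the shell.
    cat > "$out_file" <<EOF
batch
/profile=test-profile:add
/host=${slave_name}/server-config=server-one:restart
run-batch
EOF
}

# "slave1" is a placeholder host name (assumption); substitute the real one.
script_file="${TMPDIR:-/tmp}/repro.cli"
write_batch_script "slave1" "$script_file"
cat "$script_file"
```

The generated file could then be passed to the CLI with `./jboss-eap-6.2/bin/jboss-cli.sh --connect --file="$script_file"`, assuming the domain controller listens on the default localhost:9999 as in the prompts shown in this report.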

Actual results:
# profile is not created
[domain@localhost:9999 /] ls profile=
default  full     full-ha  ha

# but trying to create it will throw Duplicate resource
[domain@localhost:9999 /] /profile=test-profile:add
{
    "outcome" => "failed",
    "failure-description" => {"host-failure-descriptions" => {"dhcp-4-200.brq.redhat.com" => "JBAS014803: Duplicate resource [(\"profile\" => \"test\")]"}},
    "rolled-back" => true
}

# test-profile was not written into any of the config files
.../jboss-eap-6.2]$ grep -r test-profile . | wc 
      0       0       0
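
The config-file check above could also be written as a small function that matches the `<profile name="...">` element literally, which makes the search string harder to get wrong. This is a sketch only: the function name is hypothetical, and the demo runs against a throwaway directory rather than a real installation.

```shell
#!/bin/sh
# Sketch: count lines in persisted config files that declare a given profile.
# count_profile_mentions is a hypothetical helper name.
#   $1: profile name (matched as a fixed string, not a regex)
#   $2: directory to search recursively
count_profile_mentions() {
    # -r: recurse, -F: literal match; wc -l reports 0 when grep finds nothing.
    grep -rF -- "<profile name=\"$1\"" "$2" | wc -l | tr -d ' '
}

# Demo against a temporary directory holding a minimal stand-in for domain.xml.
demo_dir=$(mktemp -d)
printf '<profiles>\n  <profile name="default">\n  </profile>\n</profiles>\n' > "$demo_dir/domain.xml"
count_profile_mentions default "$demo_dir"       # prints 1
count_profile_mentions test-profile "$demo_dir"  # prints 0
rm -rf "$demo_dir"
```

Against the installation from the steps above, the call would be along the lines of `count_profile_mentions test-profile jboss-eap-6.2`; a result of 0 would match what the report observed.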

Additional info:
The same commands work when run against the master:
#1 /profile=test:add
#2 /host=master/server-config=server-one:restart

Comment 1 Brian Stansberry 2013-10-05 12:44:28 UTC
I can't reproduce this with the code that will become EAP 6.2 ER5.

When I follow the steps indicated, the servers on the slave have a problem starting. There are lots of messages like this, and the servers never finish starting.

[Server:server-two] 19:55:28,414 WARN  [org.hornetq.core.server] (Thread-1 (HornetQ-server-HornetQServerImpl::serverUUID=0742d2b4-2d45-11e3-97e7-5ded94e6bdd7-1382700530)) HQ222137: Unable to announce backup, retrying

I suspect this has something to do with a conflict between the servers in host-slave.xml and those in the default host.xml. I don't see the problem when the master uses host-master.xml. In any case it's a separate issue from this BZ.

Here's what I get when I execute the CLI commands:

[domain@localhost:9999 /] batch 
[domain@localhost:9999 / #] /profile=test:add
#1 /profile=test:add
[domain@localhost:9999 / #] /host=taozi.local/server-config=server-one:restart
#2 /host=taozi.local/server-config=server-one:restart
[domain@localhost:9999 / #] r
read-attribute     read-operation     reload             remove-batch-line  rollout-plan       run-batch          
[domain@localhost:9999 / #] run-batch 
{"host-failure-descriptions" => {"taozi.local" => {"JBAS014653: Composite operation failed and was rolled back. Steps that failed:" => {"Operation step-2" => "JBAS010946: Cannot restart server server-one as it is not currently started; it is STARTING"}}}}

Proper error there, and when I check for the 'test' profile on both hosts, it does not exist and can be added. The slave servers can also be stopped via the CLI. It's just "restart" that doesn't work, which is valid.

When I use host-master.xml on the master, avoiding the server start completion issue, the batch completes successfully.

In a unit test I wrote using the configs used in the testsuite/domain tests, a composite operation that matches what the batch produces succeeds. In that test the master has servers on it as well, but the conflict with the slave servers mentioned above does not occur. I'm sure there are some differences in the master's host config or in domain.xml that account for that.

When ER5 comes out, I'm interested whether you get equivalent results.

Comment 4 Petr Kremensky 2013-10-08 14:20:00 UTC
Created attachment 809326
Issue reproduced on RHEL 6.4 with 6.2.0.ER5

Comment 5 Petr Kremensky 2013-10-08 14:29:02 UTC
I get the same results with ER5 as well; see attachment 809326. I also get the same result if I use host-master.xml for the DC.

Comment 9 JBoss JIRA Server 2013-11-01 14:18:57 UTC
Emanuel Muckenhuber <emuckenh> updated the status of jira WFLY-2410 to Resolved

Comment 10 Petr Kremensky 2013-11-11 11:53:17 UTC
This issue was verified using the 6.2.0.CR1 preview bits.