1015098 – Management operations on slave host will corrupt preceding commands (CLI - batch)


Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	JBoss Enterprise Application Platform 6
Classification:	JBoss
Component:	Domain Management
Sub Component:
Version:	6.2.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	high
Target Milestone:	CR1
Target Release:	EAP 6.2.0
Assignee:	Emanuel Muckenhuber
QA Contact:	Petr Kremensky
Docs Contact:	Russell Dickenson
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2013-10-03 12:55 UTC by Petr Kremensky
Modified:	2015-02-01 23:05 UTC
CC List:	4 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2013-12-15 16:17:18 UTC
Type:	Bug
Embargoed:

Attachments
Issue reproduced on RHEL 6.4 with 6.2.0.ER5 (46.61 KB, text/plain) 2013-10-08 14:20 UTC, Petr Kremensky

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Issue Tracker	WFLY-2410	0	Major	Resolved	composite operation containing a host op does not get rolled out to all hosts	2015-07-14 10:36:18 UTC

Description Petr Kremensky 2013-10-03 12:55:55 UTC

Description of problem:
 The preceding command in a batch is corrupted when a later step in the same batch executes a management command (start|stop|restart) on a slave host controller.

Version-Release number of selected component (if applicable):
 EAP 6.2.0.ER3

How reproducible:
 always

Steps to Reproduce on single node:
 1. prepare env and start domain
export IP=aaa.bbb.ccc.ddd  # an address other than localhost
unzip -q jboss-eap-6.2.0.ER3.zip
cp -r jboss-eap-6.2/domain/ jboss-eap-6.2/domain2
# start first domain controller
./jboss-eap-6.2/bin/domain.sh &
# start slave
./jboss-eap-6.2/bin/domain.sh --host-config=host-slave.xml -Djboss.domain.base.dir=jboss-eap-6.2/domain2 -Djboss.domain.master.address=127.0.0.1 -Djboss.bind.address=$IP -Djboss.bind.address.management=$IP -Djboss.bind.address.unsecure=$IP &

 2. connect to the CLI and execute the following batch:
batch
/profile=test-profile:add
/host=${slave_name}/server-config=server-one:restart
run-batch
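
The batch above can also be run non-interactively, which makes the reproduction easier to repeat across builds. The following is only a sketch: the function names are hypothetical, `JBOSS_HOME` is assumed to point at the unzipped jboss-eap-6.2 directory, and the domain controller is assumed to listen on the default localhost:9999. The slave host name is passed in by the caller because the report leaves it as `${slave_name}`.

```shell
#!/bin/sh
# Print the two-step batch from the report.
# $1 = profile name, $2 = slave host name, $3 = server-config name
# (all three are supplied by the caller; none is taken from the report).
make_restart_batch() {
    printf 'batch\n'
    printf '/profile=%s:add\n' "$1"
    printf '/host=%s/server-config=%s:restart\n' "$2" "$3"
    printf 'run-batch\n'
}

# Write the batch to a temporary file and pass it to the management CLI.
# Assumption: JBOSS_HOME is set and the domain is already running.
run_restart_batch() {
    script=$(mktemp) || return 1
    make_restart_batch "$1" "$2" "$3" > "$script"
    "$JBOSS_HOME/bin/jboss-cli.sh" --connect --file="$script"
    status=$?
    rm -f "$script"
    return "$status"
}
```

For example, `run_restart_batch test-profile <slave host name> server-one` would submit the same two operations as the interactive session.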

Actual results:
# profile is not created
[domain@localhost:9999 /] ls profile=
default  full     full-ha  ha

# but trying to create it fails with a Duplicate resource error
[domain@localhost:9999 /] /profile=test-profile:add
{
    "outcome" => "failed",
    "failure-description" => {"host-failure-descriptions" => {"dhcp-4-200.brq.redhat.com" => "JBAS014803: Duplicate resource [(\"profile\" => \"test\")]"}},
    "rolled-back" => true
}

# test-profile was not written into any of the config files
.../jboss-eap-6.2]$ grep -r test-profle . | wc 
      0       0       0
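
Note that the grep above searches for `test-profle`, and the error message refers to a profile named `test`, so a count of zero may not be conclusive on its own. A slightly stricter check, sketched below, looks for the `<profile name="...">` element under a given directory. The function name is hypothetical, and the sketch assumes a grep that supports `-r`, as the command above does.

```shell
#!/bin/sh
# Return 0 if any file under DIR contains <profile name="NAME", else non-zero.
# $1 = directory to search (for example a domain base directory)
# $2 = profile name to look for
profile_in_config() {
    matches=$(grep -rlF -- "<profile name=\"$2\"" "$1" 2>/dev/null)
    [ -n "$matches" ]
}
```

Running it once for `test-profile` and once for `test`, against both `jboss-eap-6.2/domain` and `jboss-eap-6.2/domain2`, would show whether either name was persisted on either host.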

Additional info:
The same commands work when run against the master:
#1 /profile=test:add
#2 /host=master/server-config=server-one:restart

Comment 1 Brian Stansberry 2013-10-05 12:44:28 UTC

I can't reproduce this with the code that will become EAP 6.2 ER5.

Following the steps indicated, the servers on the slave have trouble starting: there are lots of messages like the one below, and the servers never finish starting.

[Server:server-two] 19:55:28,414 WARN  [org.hornetq.core.server] (Thread-1 (HornetQ-server-HornetQServerImpl::serverUUID=0742d2b4-2d45-11e3-97e7-5ded94e6bdd7-1382700530)) HQ222137: Unable to announce backup, retrying

I suspect this has something to do with a conflict between the servers in host-slave.xml and those in the default host.xml. I don't see the problem when the master uses host-master.xml. In any case it's a separate issue from this BZ.

Here's what I get when I execute the CLI commands:

[domain@localhost:9999 /] batch 
[domain@localhost:9999 / #] /profile=test:add
#1 /profile=test:add
[domain@localhost:9999 / #] /host=taozi.local/server-config=server-one:restart
#2 /host=taozi.local/server-config=server-one:restart
[domain@localhost:9999 / #] r
read-attribute     read-operation     reload             remove-batch-line  rollout-plan       run-batch          
[domain@localhost:9999 / #] run-batch 
{"host-failure-descriptions" => {"taozi.local" => {"JBAS014653: Composite operation failed and was rolled back. Steps that failed:" => {"Operation step-2" => "JBAS010946: Cannot restart server server-one as it is not currently started; it is STARTING"}}}}

Proper error there, and when I check for the 'test' profile on both hosts, it does not exist and can be added. The slave servers can also be stopped via the CLI. It's just "restart" that doesn't work, which is valid.

When I use host-master.xml on the master, avoiding the server start completion issue, the batch completes successfully.

In a unit test I wrote using the configs used in the testsuite/domain tests, a composite operation that matches what the batch produces succeeds. In that test the master has servers on it as well, but the conflict with the slave servers mentioned above does not occur. I'm sure there are some differences in the master's host config or in domain.xml that account for that.

When ER5 comes out, I'm interested whether you get equivalent results.

Comment 4 Petr Kremensky 2013-10-08 14:20:00 UTC

Created attachment 809326
Issue reproduced on RHEL 6.4 with 6.2.0.ER5

Comment 5 Petr Kremensky 2013-10-08 14:29:02 UTC

I get the same results with ER5; see attachment 809326. I also get the same result if I use host-master.xml for the DC.

Comment 9 JBoss JIRA Server 2013-11-01 14:18:57 UTC

Emanuel Muckenhuber <emuckenh> updated the status of jira WFLY-2410 to Resolved

Comment 10 Petr Kremensky 2013-11-11 11:53:17 UTC

This issue was verified using the 6.2.0.CR1 preview bits.
