| Summary: | Services calling XAResourceRecoveryRegistry.removeXAResourceRecovery in start/stop must use MSC async API | ||||||
|---|---|---|---|---|---|---|---|
| Product: | [JBoss] JBoss Enterprise Application Platform 6 | Reporter: | Brian Stansberry <brian.stansberry> | ||||
| Component: | Server | Assignee: | Paul Ferraro <paul.ferraro> | ||||
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Ondrej Chaloupka <ochaloup> | ||||
| Severity: | unspecified | Docs Contact: | |||||
| Priority: | unspecified | ||||||
| Version: | 6.2.0 | CC: | dandread, lthon, myarboro, rsvoboda | ||||
| Target Milestone: | CR1 | ||||||
| Target Release: | EAP 6.2.0 | ||||||
| Hardware: | Unspecified | ||||||
| OS: | Unspecified | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | Doc Type: | Bug Fix | |||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | Environment: | ||||||
| Last Closed: | 2013-12-15 16:18:03 UTC | Type: | Bug | ||||
| Bug Blocks: | 1022054 | ||||||
| Attachments: |
Description
Brian Stansberry
2013-10-24 13:58:19 UTC
https://github.com/jbossas/jboss-eap/pull/621 completes most of this, but https://github.com/wildfly/wildfly/pull/5330 needs to be backported.
The fix seems to be in the ER7 release (lthon compiled the sources; I've checked the decompiled jar files), but the server gets stuck at startup. I'm attaching a standalone-full-ha.xml config file that reproduces the problem.
Created attachment 819801: standalone-full-ha.xml
Let me comment on this a bit too, as I was helping Ondra with the [failed] attempt to reproduce & verify.
Adding the following bits to the "infinispan" subsystem configuration in the XML makes EAP 6.2.0.ER7 hang during startup:
<cache-container name="aaa" default-cache="default" start="EAGER">
    <transport lock-timeout="60000"/>
    <replicated-cache name="default" mode="SYNC" batching="true">
        <locking isolation="REPEATABLE_READ"/>
        <transaction mode="FULL_XA"/>
    </replicated-cache>
</cache-container>
<cache-container name="bbb" default-cache="default" start="EAGER">
    <transport lock-timeout="60000"/>
    <replicated-cache name="default" mode="SYNC" batching="true">
        <locking isolation="REPEATABLE_READ"/>
        <transaction mode="FULL_XA"/>
    </replicated-cache>
</cache-container>
<cache-container name="ccc" default-cache="default" start="EAGER">
    <transport lock-timeout="60000"/>
    <replicated-cache name="default" mode="SYNC" batching="true">
        <locking isolation="REPEATABLE_READ"/>
        <transaction mode="FULL_XA"/>
    </replicated-cache>
</cache-container>
<cache-container name="eee" default-cache="default" start="EAGER">
    <transport lock-timeout="60000"/>
    <replicated-cache name="default" mode="SYNC" batching="true">
        <locking isolation="REPEATABLE_READ"/>
        <transaction mode="FULL_XA"/>
    </replicated-cache>
</cache-container>
<cache-container name="fff" default-cache="default" start="EAGER">
    <transport lock-timeout="60000"/>
    <replicated-cache name="default" mode="SYNC" batching="true">
        <locking isolation="REPEATABLE_READ"/>
        <transaction mode="FULL_XA"/>
    </replicated-cache>
</cache-container>
<cache-container name="ggg" default-cache="default" start="EAGER">
    <transport lock-timeout="60000"/>
    <replicated-cache name="default" mode="SYNC" batching="true">
        <locking isolation="REPEATABLE_READ"/>
        <transaction mode="FULL_XA"/>
    </replicated-cache>
</cache-container>
I'm not sure how much this is/isn't related to this particular issue, but it's definitely not OK.
Paul, are you looking at it? This is the final week to get fixes in for CR, which should hopefully be the last build.
Deadlocking on startup is indeed an issue, but a separate issue nonetheless. While cache service startup is asynchronous, cache configuration service startup is synchronous. The cache configuration can depend on the TransactionManager or TransactionSynchronizationRegistry, which are probably blocking, causing the deadlock.
I will see if this fixes the issue.
I built the tip of the EAP branch and can confirm that the startup issue is fixed. Thanks Paul :-)
OK, the server does not get stuck. It seems fine to me. Thanks
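For anyone landing on this bug without MSC background, here is a rough sketch of the asynchronous start pattern the summary refers to. It is not the code from either pull request. To stay runnable without the JBoss MSC jar, it uses a hypothetical stand-in `StartContext` interface modeled on the `asynchronous()`, `complete()` and `failed(...)` calls of MSC's lifecycle context; `RecordingContext`, `AsyncStartSketch` and `demo()` are likewise made up for illustration. The idea is that `start` hands the potentially blocking work to an executor and returns right away, so the container thread is never parked on it.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class AsyncStartSketch {

    /** Hypothetical stand-in for the context a service container passes to start(). */
    interface StartContext {
        void asynchronous();          // start() will return before the service is up
        void complete();              // the service finished starting
        void failed(Exception cause); // the service failed to start
    }

    /** Illustration-only context that records what the service reported. */
    static final class RecordingContext implements StartContext {
        final CountDownLatch done = new CountDownLatch(1);
        volatile boolean async;
        volatile Exception failure;

        public void asynchronous() { async = true; }
        public void complete() { done.countDown(); }
        public void failed(Exception cause) { failure = cause; done.countDown(); }
    }

    /** Asynchronous start: the blocking work runs on the executor, not on the caller's thread. */
    static void start(StartContext context, ExecutorService executor, Runnable blockingWork) {
        context.asynchronous();
        executor.execute(() -> {
            try {
                blockingWork.run();
                context.complete();
            } catch (RuntimeException e) {
                context.failed(e);
            }
        });
    }

    /** Returns true if start() returned while the work was still blocked and the work later completed. */
    static boolean demo() {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        RecordingContext context = new RecordingContext();
        CountDownLatch release = new CountDownLatch(1);
        // Simulates a call that blocks until some other party makes progress.
        Runnable blockingWork = () -> {
            try {
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        };
        try {
            start(context, executor, blockingWork);
            // The work cannot have finished yet because nobody has released it.
            boolean returnedEarly = context.async && context.done.getCount() == 1;
            release.countDown();
            boolean finished = context.done.await(5, TimeUnit.SECONDS);
            return returnedEarly && finished && context.failure == null;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

A synchronous `start` would instead call `blockingWork.run()` directly and hold the calling thread until it returns. That is how a service start that waits on another blocking service can hang startup.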