Bug 835628
Summary: | Broker crashes after creating federated links | ||
---|---|---|---|
Product: | Red Hat Enterprise MRG | Reporter: | Jason Dillaman <jdillama> |
Component: | qpid-cpp | Assignee: | Ken Giusti <kgiusti> |
Status: | CLOSED ERRATA | QA Contact: | MRG Quality Engineering <mrgqe-bugs> |
Severity: | high | Docs Contact: | |
Priority: | high | ||
Version: | 2.0 | CC: | iboverma, jneedle, jross, kgiusti, mcressma, pematous |
Target Milestone: | 2.2 | ||
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | qpid-cpp-0.14-19 | Doc Type: | Bug Fix |
Doc Text: | Cause: Creating a federation link where the source broker is not a member of a cluster (source broker is stand-alone). Consequence: This would occasionally cause the destination broker to crash as there was a race between the thread that configures the link and the thread that sends traffic over it. Fix: The race was removed by moving the configuration code to the same thread as the data handling code. Result: The link configuration is completed fully before traffic is sent over it. |
Story Points: | --- |
Clone Of: | Environment: | ||
Last Closed: | 2012-09-17 11:11:18 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 698367 |
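The fix summarized in the Doc Text above (running link configuration on the same thread as data handling) can be illustrated with a minimal sketch. This is not the qpid-cpp code: `Link`, `run_link`, and `demo` are hypothetical names, and a single worker thread draining an ordered queue stands in for the broker's I/O thread. The point is only that when configuration and traffic share one thread, configuration queued first is guaranteed to finish before any message is handled.

```python
import queue
import threading


class Link:
    """Hypothetical stand-in for a federation link; not qpid-cpp code."""

    def __init__(self):
        self.configured = False
        self.delivered = []

    def configure(self):
        self.configured = True

    def handle(self, message):
        # Traffic must never be handled on a half-configured link.
        if not self.configured:
            raise RuntimeError("traffic arrived before configuration finished")
        self.delivered.append(message)


def run_link(link, tasks):
    """Single worker: runs configuration and data tasks strictly in order."""
    while True:
        task = tasks.get()
        if task is None:  # sentinel: shut down the worker
            break
        task(link)


def demo():
    link = Link()
    tasks = queue.Queue()
    worker = threading.Thread(target=run_link, args=(link, tasks))
    worker.start()
    # Configuration is queued first, on the same thread that handles data,
    # so it always completes before any message is processed.
    tasks.put(lambda l: l.configure())
    for msg in ("m1", "m2", "m3"):
        tasks.put(lambda l, m=msg: l.handle(m))
    tasks.put(None)
    worker.join()
    return link.delivered
```

Had `configure` and `handle` been called from two separate threads without synchronization, `handle` could run against a half-configured link, which is the kind of race the Doc Text describes.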
Description
Jason Dillaman
2012-06-26 16:11:08 UTC
This bug was caused by the incorrect backport of this upstream fix:

https://issues.apache.org/jira/browse/QPID-3963

The upstream fix allowed the broker to subscribe for failover events. This patch did not port cleanly to our MRG downstream repos, causing the above crash.

The fix was originally submitted to the mrg_2_ptc_hotfix branch of the MRG git repo:

http://mrg1.lab.bos.redhat.com/git/?p=qpid.git;a=commitdiff;h=fa4ef35981defb5daa0256eebafa0e458a6c3af3

It is relevant only to 0.12 and 0.14-based MRG repos - 0.18 is not affected. I believe mcressman has ported the fix to 0.12 and 0.14 - Mike, can you confirm?

Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
Cause: Creating a federation link where the source broker is not a member of a cluster (source broker is stand-alone).
Consequence: This would occasionally cause the destination broker to crash as there was a race between the thread that configures the link and the thread that sends traffic over it.
Fix: The race was removed by moving the configuration code to the same thread as the data handling code.
Result: The link configuration is completed fully before traffic is sent over it.

While verifying Bug 831365 on the latest packages (qpid-cpp-*14-19), I noticed that the following error message is logged to the broker log. The broker is not crashing any more, but the error message should definitely not appear in the log. The issue is easily reproducible by creating a federated link between two non-clustered brokers.
src_broker log snip:

2012-07-27 17:56:16 notice Broker running
2012-07-27 17:56:28 info Connection is a federation link
2012-07-27 17:56:30 info Queue "qpid.link.032acf3f-260e-47c1-abfe-d51c95a66a85": Policy created: type=reject; maxCount=0; maxSize=104857600
2012-07-27 17:56:30 info Queue "qpid.link.032acf3f-260e-47c1-abfe-d51c95a66a85": Flow limit created: flowStopCount=0, flowResumeCount=0, flowStopSize=83886080, flowResumeSize=73400320
2012-07-27 17:56:30 error Execution exception: not-found: Exchange not found: amq.failover (qpid/broker/ExchangeRegistry.cpp:97)

Ken, shall I move this bz back to assigned or create a separate bz for this issue?

Hi Petr,

The log message is, unfortunately, expected: the broker logs - as an error - any command that it cannot complete. In this case, the remote is attempting to bind to a non-existing exchange (amq.failover). The command will fail, resulting in the log message. Even though it has logged an error, there really is nothing wrong with the broker at this point - the bind fails, the session ends and both sides clean up.

Ideally, the log message shouldn't be issued in this particular case: since the source broker is not part of a cluster, there is no need for failover. Thus, there is no amq.failover exchange. The 0.10 spec is pretty clear about this - amq.failover should only exist if the broker supports failover.

The problem is that the other (destination) broker doesn't know if the source is part of a cluster (and has amq.failover) or not. It only finds out by attempting to bind to amq.failover, and dealing with the result (success or failure).
You'll see the same result if you try to run qpid-receive with the --failover-updates parameter:

qpid-receive -b 127.0.0.1:7777 -a amq.direct --failover-updates
2012-07-30 15:54:00 [Client] warning Exception received from broker: not-found: not-found: Queue not found: amq.failover (../../../qpid/cpp/src/qpid/broker/SessionAdapter.cpp:692) [caused by 2 \x08:\x01]
qpid-receive: Queue amq.failover does not exist

And the same error will be logged by the broker.

Perhaps the destination broker could query for the existence of amq.failover exchange first, before deciding to bind. But that would add another level of complexity to the federation link setup.

Another option would be to reduce the log level for failure to bind to amq.failover - though from my quick glance at the code this would be more difficult than it sounds.

In either case, I'd open a new BZ.

New bug 844655 created for the issue mentioned in Comment 5 and Comment 6.
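
To make the two strategies discussed above concrete, here is a rough sketch contrasting "bind and deal with the result" with "query for amq.failover first". It models the behavior with a toy in-memory broker and does not use any Qpid API: `ToyBroker`, `NotFound`, `bind_and_handle`, and `query_then_bind` are all hypothetical names chosen for illustration.

```python
class NotFound(Exception):
    """Hypothetical stand-in for the broker's not-found execution exception."""


class ToyBroker:
    """Toy source broker; only a clustered one exposes amq.failover."""

    def __init__(self, clustered):
        self.exchanges = {"amq.direct", "amq.topic"}
        if clustered:
            self.exchanges.add("amq.failover")
        self.error_log = []

    def has_exchange(self, name):
        return name in self.exchanges

    def bind(self, exchange, queue_name):
        if exchange not in self.exchanges:
            # As described in the thread: a command that cannot complete
            # is logged as an error before the exception is raised.
            self.error_log.append("not-found: Exchange not found: " + exchange)
            raise NotFound(exchange)


def bind_and_handle(broker, queue_name):
    """Attempt the bind and treat not-found as 'source is not clustered'."""
    try:
        broker.bind("amq.failover", queue_name)
        return True
    except NotFound:
        return False


def query_then_bind(broker, queue_name):
    """Alternative: ask whether the exchange exists, bind only if it does."""
    if not broker.has_exchange("amq.failover"):
        return False
    broker.bind("amq.failover", queue_name)
    return True
```

Against a stand-alone `ToyBroker(False)`, both functions return `False`, but only `bind_and_handle` leaves an entry in `error_log`. The query-first variant avoids the spurious error at the cost of an extra round trip and the added setup complexity noted above.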