| Summary: | Nonoptimal failover strategy can lead to RPC timeout | ||||||
|---|---|---|---|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | Marian Krcmarik <mkrcmari> | ||||
| Component: | python-oslo-messaging | Assignee: | Victor Stinner <vstinner> | ||||
| Status: | CLOSED ERRATA | QA Contact: | Marian Krcmarik <mkrcmari> | ||||
| Severity: | urgent | Docs Contact: | |||||
| Priority: | urgent | ||||||
| Version: | 8.0 (Liberty) | CC: | apevec, fdinitto, hguemar, jschluet, lhh, michele, oblaut, yeylon | ||||
| Target Milestone: | ga | Keywords: | AutomationBlocker, TestBlocker | ||||
| Target Release: | 8.0 (Liberty) | ||||||
| Hardware: | Unspecified | ||||||
| OS: | Unspecified | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | python-oslo-messaging-2.5.0-5.el7ost | Doc Type: | Bug Fix | ||||
| Doc Text: |
Oslo Messaging used the "shuffle" strategy to select a RabbitMQ host from the list of RabbitMQ servers. When a node of the cluster running RabbitMQ was restarted, each OpenStack service connected to this server reconnected to a new RabbitMQ server. Unfortunately, this strategy does not handle dead RabbitMQ servers correctly; it can try to connect to the same dead server multiple times in a row. The strategy also leads to increased reconnection time, and sometimes it may lead to RPC operations timing out because no guarantee is provided on how long the reconnection process will take.
With this update, Oslo Messaging uses the "round-robin" strategy to select a RabbitMQ host. This strategy minimizes reconnection time and avoids RPC timeouts when a node is restarted. It also guarantees that if K of N RabbitMQ hosts are alive, it will take at most N - K + 1 attempts to successfully reconnect to the RabbitMQ cluster, because each of the N - K dead hosts is tried at most once before a live host is reached.
|
Story Points: | --- | ||||
| Clone Of: | |||||||
| : | 1311597 | Environment: | |||||
| Last Closed: | 2016-04-07 21:26:22 UTC | Type: | Bug | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Bug Depends On: | |||||||
| Bug Blocks: | 1311597 | ||||||
| Attachments: |
Description
Marian Krcmarik
2016-01-27 17:27:46 UTC
I think it's safe to backport this patch. I've proposed it upstream and I'll do the backport downstream after some feedback is provided on the upstream patch.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2016-0603.html
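
For readers who want to see why the Doc Text claims a bound of N - K + 1 attempts, here is a minimal Python sketch. It is not the oslo.messaging or kombu code: the function names, the host names, and the modelling of "shuffle" as an independent random pick on every attempt are assumptions made only for illustration, based on the description that the old strategy "can try to connect to the same dead server multiple times in a row".

```python
import itertools
import random


def attempts_round_robin(hosts, alive, start=0):
    """Count attempts until a live host is reached, walking hosts in order.

    Hypothetical helper: each host is tried at most once per pass, so with
    K live hosts out of N the result is never larger than N - K + 1.
    Returns None when no host is alive.
    """
    n = len(hosts)
    # One full pass over the host list, beginning at index `start`.
    one_pass = itertools.islice(itertools.cycle(hosts), start, start + n)
    for attempt, host in enumerate(one_pass, 1):
        if host in alive:
            return attempt
    return None


def attempts_random_pick(hosts, alive, rng, max_attempts=1000):
    """Count attempts when every try picks a host at random (assumed model
    of the old "shuffle" behaviour).

    The same dead host may be picked several times in a row, so there is
    no fixed upper bound; `max_attempts` only stops the simulation.
    """
    for attempt in range(1, max_attempts + 1):
        if rng.choice(hosts) in alive:
            return attempt
    return None


if __name__ == "__main__":
    hosts = ["rabbit1", "rabbit2", "rabbit3"]  # illustrative names
    alive = {"rabbit3"}                        # K = 1 of N = 3
    bound = len(hosts) - len(alive) + 1

    worst = max(attempts_round_robin(hosts, alive, s) for s in range(len(hosts)))
    print("round-robin worst case:", worst, "bound:", bound)

    rng = random.Random()
    samples = [attempts_random_pick(hosts, alive, rng) for _ in range(10000)]
    print("random-pick worst case seen:", max(samples))
```

Running the script compares the worst case of the ordered walk against the bound and against the longest run observed with random picks; the random figure varies from run to run, which is the point of the comparison.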