Bug 1086004
Summary: | Internal Error from python-qpid can cause qpid connection to never recover | ||
---|---|---|---|
Product: | Red Hat OpenStack | Reporter: | Russell Bryant <rbryant> |
Component: | openstack-neutron | Assignee: | Ihar Hrachyshka <ihrachys> |
Status: | CLOSED ERRATA | QA Contact: | Zdenek Kraus <zkraus> |
Severity: | high | Docs Contact: | |
Priority: | medium | ||
Version: | 5.0 (RHEL 7) | CC: | chrisw, freznice, ihrachys, lpeer, mlopes, mrgqe-bugs, ndipanov, nyechiel, oblaut, yeylon, zkraus |
Target Milestone: | rc | Keywords: | ZStream |
Target Release: | 5.0 (RHEL 7) | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | openstack-neutron-2014.1-23.el7ost | Doc Type: | Bug Fix |
Doc Text: |
Prior to this update, certain Qpid exceptions were not properly handled by the Qpid driver.
As a result, the Qpid connection would fail and stop processing subsequent messages.
With this update, all possible exceptions are handled to ensure the Qpid driver does not enter an unrecoverable failure loop. Consequently, Networking will continue to process Qpid messages, even after major exceptions occur.
|
Story Points: | --- |
Clone Of: | 1085997 | Environment: | |
Last Closed: | 2014-07-08 15:36:15 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 1085995, 1085996, 1086001, 1086009, 1086010 |
Description
Russell Bryant
2014-04-09 20:49:49 UTC
Cloned to 5.0 as Neutron uses rpc from oslo-incubator in RHOS 5.0.

We have the patch merged into d/s package as of initial Icehouse builds. Putting the latest build into Fixed in Version.

Ofer, yes, as far as I know, we're going to support Qpid. Though RabbitMQ will be the default and recommended option. As for verification steps, the failure showed up under high load. We don't know how we got into the situation, so we can just make sure regression tests pass for neutron. We did the same at: https://bugzilla.redhat.com/show_bug.cgi?id=1085995#c3 when doing verification.

> Scale will not work, so why to support it ?
Sorry, I didn't get this part. What's not to be supported, specifically?

As far as I know, there is a PoC running in the lab that runs multiple neutron instances on a single machine and load-balances them through a local haproxy. This could solve the scale issues.

It looks like the issue is very difficult to reproduce and will require significant effort. There is still a possibility to trigger it using a focused reproducer apart from RHOS, which seems to be a smaller effort and, in my view, worth trying.
QA automation testing proposal (apart from RHOS):
* single-core VM
* multiple clients running simultaneously
* all with low heartbeats
* all looping over a longer period
* using multiple messaging patterns (receivers on queues as well as on exchanges/topics)

It looks like this defect is most probably a dup of bug 1088004 (which was QAed already). The key is to verify that the bug 1088004 backtrace is identical and to analyze the testing scenarios. Getting Ken's or Gordon's opinion on whether it is a dup would also help.

I've checked the backtrace and the underlying code and I confirm that it is the same scenario. But I have to disagree with the duplicate. Bug 1088004 is the underlying bug filed for qpidd; this bug, on the other hand, is filed for openstack-neutron (with clones for other components), and all of those introduced their own patches. Thus we have to check whether the openstack patches are valid, since the underlying issue was fixed.

(In reply to Zdenek Kraus from comment #23)
> I've checked the backtrace and the underlying code and I confirm that it is
> the same scenario. But I have to disagree with the duplicate. Bug 1088004 is
> the underlying bug filed for qpidd, on the other hand this bug is filed for
> openstack-neutron (and clones for another components) and all of those
> introduced own patched, thus we have to check if the openstack patches are
> valid since the underlying issue was fixed.
Thanks, sounds like a plan. I believe we can skip the comment 17 proposal, as the QA work was already done on the qpid side (and tracked as bug 1088004).

The patches were reviewed by me and gsim, and they correctly broaden the exception catching. Since the underlying problem was fixed and verified (see Bug 1088004) -> VERIFIED

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
http://rhn.redhat.com/errata/RHEA-2014-0848.html
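
For readers unfamiliar with what "broadening exception catching" means in this bug, here is a minimal, self-contained sketch of the idea. It is not the actual neutron or oslo-incubator patch. The exception classes, `FakeConnection`, and `ensure` are all hypothetical stand-ins invented for illustration, and no real Qpid API is used. The point is only that a retry loop which catches the base messaging exception, rather than just the connection-loss subclass, also recovers from an unexpected internal error.

```python
class MessagingError(Exception):
    """Hypothetical stand-in for a messaging library's base exception."""


class BrokerConnectionError(MessagingError):
    """Hypothetical: the connection to the broker was lost."""


class InternalError(MessagingError):
    """Hypothetical: the client library hit an unexpected internal failure."""


class FakeConnection:
    """Toy connection that raises a scripted series of errors, then succeeds."""

    def __init__(self, failures):
        self.failures = list(failures)
        self.reconnects = 0

    def reconnect(self):
        # A real driver would tear down and re-open the broker session here.
        self.reconnects += 1

    def fetch(self):
        if self.failures:
            raise self.failures.pop(0)
        return "message"


def ensure(connection, method, max_attempts=5):
    """Call method(), reconnecting and retrying on any MessagingError.

    Catching the base class (instead of only BrokerConnectionError) is the
    "broadened" part: an InternalError now triggers a reconnect as well.
    """
    last_error = None
    for _ in range(max_attempts):
        try:
            return method()
        except MessagingError as exc:
            last_error = exc
            connection.reconnect()
    raise MessagingError("gave up after %d attempts" % max_attempts) from last_error


# Usage: one internal error and one connection loss, then a successful fetch.
conn = FakeConnection([InternalError("internal error"),
                       BrokerConnectionError("connection lost")])
result = ensure(conn, conn.fetch)
```

With a narrower `except BrokerConnectionError` clause, the first `InternalError` in this sketch would escape the loop without a reconnect, which is the kind of unrecoverable state the Doc Text describes.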