Bug 1340097
Summary: | Review fallback on RPC messaging | ||
---|---|---|---|
Product: | Red Hat OpenStack | Reporter: | Robin Cernin <rcernin> |
Component: | openstack-neutron | Assignee: | Hynek Mlnarik <hmlnarik> |
Status: | CLOSED ERRATA | QA Contact: | Toni Freger <tfreger> |
Severity: | high | Docs Contact: | |
Priority: | high | ||
Version: | 8.0 (Liberty) | CC: | abehl, adahms, amuller, chrisw, felipe.alfaro, hrivero, jlibosva, ndahiya, nyechiel, ochalups, pgadiya, sguha, srevivo |
Target Milestone: | async | Keywords: | ZStream |
Target Release: | 8.0 (Liberty) | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | openstack-neutron-7.0.4-6.el7ost | Doc Type: | If docs needed, set a value |
Doc Text: |
Previously, when the metadata agent experienced an issue with RPC communication, it would start using the neutron client interface as a fallback. The only way to make the metadata agent use RPC again was to restart the agent. As a result, metadata agent services would become unavailable after any disruption of RPC communication if neutron API credentials were not configured for the agent. With this update, the code that activates the neutron client fallback has been removed from the metadata agent so that RPC is always used.
|
Story Points: | --- |
Clone Of: | Environment: | ||
Last Closed: | 2016-06-29 13:59:39 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 1339488 |
Description
Robin Cernin
2016-05-26 12:44:19 UTC
To reflect the description change from Ihar Hrachyshka:

Until Kilo, we used the neutronclient library to get data from neutron-server to the metadata agent. In Kilo we introduced RPC communication between the server and the metadata agent, with a fallback mechanism for those who upgrade agents first. That could lead to a situation where the server is still on Juno, which doesn't have the RPC API needed by the Kilo agent. In that situation, the agent goes back to using neutronclient.

https://github.com/openstack/neutron/blob/stable/liberty/neutron/agent/metadata/agent.py#L131

The fallback mechanism stayed there and made it into Liberty, where it is no longer needed. There is also a problem here, because we fall back on any exception that comes from RPC communication. So for new Liberty deployments, which are not supposed to configure the metadata agent with credentials for the neutron API (it has not been used since Kilo), any error that happens on RPC switches the agent to an unconfigured neutron client until the metadata agent is restarted.

We should just remove the fallback mechanism as we did in Mitaka: https://review.openstack.org/#/c/231065/

I think that backporting the patch to OSP 8 is the right thing to do as the rewards outweigh the risks. Here's an explanation of the bug, the proposed solution, the concern with the solution, and why I think it makes sense to move forward.

The Neutron metadata agent is used in most deployments during the "spawn a VM" flow. No metadata means that OpenStack won't be able to inject SSH keys into VMs, rendering the VMs inaccessible in the common use case. The Neutron metadata agent needs to access information in the Neutron DB, which it used to obtain via the Neutron API. At some point, it was switched over to grab the same information via the messaging bus instead, keeping a fallback to the API in case the RPC implementation was found to be immature.
We very recently found out that the fallback logic was: if any error occurs (including intermittent or transient errors), fall back to using the API client. The issue is twofold: first, that doesn't make any sense; second, TripleO doesn't configure the Neutron metadata agent to use the API client. That means that after any intermittent messaging bus error, the metadata agent would become as useful as a bag of bricks, until the admin figured out that VMs cannot access metadata and restarted the metadata agent. In short, this bug is a ticking time bomb.

The solution that we are proposing is to backport a patch from Mitaka that removes the fallback logic entirely, leaving only the RPC client as a possibility, which will solve both problems outlined above.

The outstanding concern is deployments that use both:
1) A core plugin other than ML2
2) The metadata agent

The RPC endpoint that is responsible for answering the metadata agent in the neutron-server process is implemented via the ML2 plugin. Any deployment with ML2, reference implementation or third party, will work fine. Core plugins that do not implement this RPC endpoint to answer metadata requests, and whose architecture includes the metadata agent, will no longer work if we backport the patch in question.

The reason why I think it is reasonable to backport the patch is that I am not aware of Neutron solutions in certification that meet both conditions outlined above. Furthermore, these solutions will have to adapt to the new world come OSP 9, as the patch was merged to Mitaka upstream. It is therefore likely that a patch to fix the problem on the vendor side already exists, and if it becomes a problem with OSP 8, it will simply be backported.

We will be moving forward with the backport in one week unless anyone objects loudly. I am keeping the needinfos up for now.
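For illustration, the failure mode described above can be sketched in a few lines of Python. This is not the Neutron code: every class and function name here (FlakyRpc, UnconfiguredApiClient, AgentWithFallback, AgentRpcOnly, simulate) is hypothetical and only models the behaviour as described in this bug, assuming a single transient RPC error and an API client with no credentials.

```python
class RpcError(Exception):
    """Hypothetical stand-in for a messaging bus failure."""


class FlakyRpc:
    """Hypothetical RPC client: the first call fails, later calls succeed."""

    def __init__(self):
        self.calls = 0

    def get_ports(self, filters):
        self.calls += 1
        if self.calls == 1:
            raise RpcError("transient messaging error")
        return [{"id": "port-1"}]


class UnconfiguredApiClient:
    """Hypothetical API client deployed without credentials."""

    def get_ports(self, filters):
        raise RuntimeError("no neutron API credentials configured")


class AgentWithFallback:
    """Models the old behaviour: any RPC error disables RPC until restart."""

    def __init__(self, rpc, api):
        self.rpc = rpc
        self.api = api
        self.use_rpc = True

    def get_ports(self, filters):
        if self.use_rpc:
            try:
                return self.rpc.get_ports(filters)
            except Exception:
                # Any error, even a transient one, flips the switch for good.
                self.use_rpc = False
        return self.api.get_ports(filters)


class AgentRpcOnly:
    """Models the proposed behaviour: RPC is the only path."""

    def __init__(self, rpc):
        self.rpc = rpc

    def get_ports(self, filters):
        return self.rpc.get_ports(filters)


def simulate(agent, attempts=3):
    """Issue several requests and record whether each one succeeded."""
    results = []
    for _ in range(attempts):
        try:
            agent.get_ports({})
            results.append("ok")
        except Exception:
            results.append("fail")
    return results


if __name__ == "__main__":
    print(simulate(AgentWithFallback(FlakyRpc(), UnconfiguredApiClient())))
    print(simulate(AgentRpcOnly(FlakyRpc())))
```

In this model the agent with the fallback never recovers after the one transient error, while the RPC-only agent fails a single request and then serves the following ones normally.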
Code tested in latest OSP8 - openstack-neutron-7.0.4-7.el7ost.noarch

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:1353

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days.