| Summary: | L3 agent failing with "Failed to process compatible router" errors | |||
|---|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | PURANDHAR SAIRAM MANNIDI <pmannidi> | |
| Component: | openstack-neutron | Assignee: | anil venkata <vkommadi> | |
| Status: | CLOSED ERRATA | QA Contact: | Alexander Stafeyev <astafeye> | |
| Severity: | urgent | Docs Contact: | ||
| Priority: | high | |||
| Version: | 7.0 (Kilo) | CC: | amuller, chrisw, cmedeiro, dalvarez, ggillies, ihrachys, jmelvin, jschwarz, ljozsa, nyechiel, oblaut, pmannidi, srevivo, tfreger, vkommadi | |
| Target Milestone: | zstream | Keywords: | Triaged, ZStream | |
| Target Release: | 7.0 (Kilo) | |||
| Hardware: | Unspecified | |||
| OS: | Unspecified | |||
| Whiteboard: | ||||
| Fixed In Version: | openstack-neutron-2015.1.4-15.el7ost | Doc Type: | No Doc Update | |
| Doc Text: | | Story Points: | --- | |
| Clone Of: | ||||
| : | 1437813 (view as bug list) | Environment: | ||
| Last Closed: | 2017-07-12 13:15:18 UTC | Type: | Bug | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | Category: | --- | ||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Bug Depends On: | ||||
| Bug Blocks: | 1437813, 1437818, 1437820 | |||
|
Description
PURANDHAR SAIRAM MANNIDI
2016-12-14 03:56:53 UTC
The neutron network/router setup for IPv6 that was causing the issue has been identified and removed. Once these were removed and the services restarted, the instability and packet loss ceased. However, the issue could return at any time if a tenant creates a network using IPv6. In the logs, we see that the router failed to initialize because of a NoFilterMatched error from rootwrap on executing the following:
wrapper.netns.execute(['sysctl', '-w',
'net.ipv4.conf.all.promote_secondaries=1'])
Sadly, rootwrap does not distinguish between command failures and filter matching failures. The l3.filters file contains the needed filter, so I assume that it's the command that failed.
audit.log does not contain any SELinux denials.
Since you mentioned IPv6, I wonder if the sysctl knob is available in namespaces that belong to HA routers that are IPv6 only. I guess it's something to validate on a test setup.
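One way to validate this on a test setup is to issue the same sysctl write directly, outside rootwrap, so that the command's own exit code and stderr are visible. The following is a minimal sketch, not neutron code: the helper names (`sysctl_proc_path`, `netns_sysctl_command`, `run_direct`) are hypothetical, and running the namespace command for real requires root and an existing namespace.

```python
import subprocess
from typing import List, NamedTuple


class CommandResult(NamedTuple):
    returncode: int
    stdout: str
    stderr: str


def sysctl_proc_path(key: str) -> str:
    """Map a dotted sysctl key to its /proc/sys path.

    Simple mapping only: it does not handle interface names that
    themselves contain dots.
    """
    return "/proc/sys/" + key.replace(".", "/")


def netns_sysctl_command(namespace: str, key: str, value: str) -> List[str]:
    """Build the sysctl write from the report, wrapped in `ip netns exec`."""
    return ["ip", "netns", "exec", namespace, "sysctl", "-w", f"{key}={value}"]


def run_direct(cmd: List[str]) -> CommandResult:
    """Run a command without rootwrap and return its real exit status.

    A non-zero returncode here points at the command itself rather than
    at rootwrap filter matching.
    """
    proc = subprocess.run(cmd, capture_output=True, text=True, check=False)
    return CommandResult(proc.returncode, proc.stdout.strip(), proc.stderr.strip())
```

On a test node, `run_direct(netns_sysctl_command(ns, "net.ipv4.conf.all.promote_secondaries", "1"))` with `ns` set to the router's namespace would show whether the knob can be written there. Running `ls` on the path returned by `sysctl_proc_path` inside the namespace would show whether the knob exists at all.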
On second look, I see probably related errors in the DHCP agent log from long before:
2016-01-20 12:35:10.401 82301 TRACE neutron.agent.dhcp.agent Unserializable message: ('#ERROR', FilterMatchNotExecutable())
I wonder if rootwrap is somehow broken, or if the filters are not deployed correctly. It now seems like an issue that is not specific to HA routers.
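Whether a filter for a given command is actually deployed can be checked by parsing the filter files, which are INI-style files with a `[Filters]` section. The sketch below is an illustration under assumptions: `parse_filters` and `has_command_filter` are hypothetical helpers, only plain `CommandFilter` entries are considered, and the sample text is made up rather than copied from the affected system.

```python
import configparser
from typing import Dict, List


def parse_filters(text: str) -> Dict[str, List[str]]:
    """Parse the [Filters] section of a rootwrap filter file.

    Returns a mapping of filter name -> comma-separated fields.
    """
    parser = configparser.RawConfigParser()
    parser.read_string(text)
    filters: Dict[str, List[str]] = {}
    if parser.has_section("Filters"):
        for name, value in parser.items("Filters"):
            filters[name] = [field.strip() for field in value.split(",")]
    return filters


def has_command_filter(filters: Dict[str, List[str]], executable: str) -> bool:
    """Return True if some CommandFilter entry names `executable`."""
    return any(
        len(fields) >= 2 and fields[0] == "CommandFilter" and fields[1] == executable
        for fields in filters.values()
    )


# Hypothetical excerpt for illustration only.
SAMPLE = """
[Filters]
sysctl: CommandFilter, sysctl, root
"""
```

Feeding each file under the rootwrap filter directory through `parse_filters` and calling `has_command_filter(filters, "sysctl")` would show which file, if any, permits the command. A missing entry on one controller would point to a deployment problem rather than to a failing command.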
Speaking of SELinux, I see lots of duplicate messages in the journal log during the error, like:

Dec 13 18:36:41 rhqe-bare-ctrl-1.localdomain kernel: SELinux: initialized (dev sysfs, type sysfs), uses genfs_contexts

Why does SELinux initialize something 10+ times per second?

I concur with Ihar: the error first appeared with a NoFilterMatched issue, as mentioned. Since the l3-agent handles this error by retrying the request, it retries the entire processing request again and again (even though nothing has changed), resulting in [1] (which is the result of the initial request being "partially" committed and the floating IP already being gone).

[1]: http://pastebin.test.redhat.com/439169

(In reply to Ihar Hrachyshka from comment #3)
> In the logs, we see that the router failed to initialize because of
> NoFilterMatched error from rootwrap on executing the following:
>
> wrapper.netns.execute(['sysctl', '-w',
> 'net.ipv4.conf.all.promote_secondaries=1'])
>
> Sadly, rootwrap does not distinguish between command failure and filter
> matching failures. l3.filters file contains the needed filter, so I assume
> that it's the command that failed.
>
> audit.log does not contain any selinux denials.
>
> Since you mentioned ipv6, I wonder if the sysctl knob is available in
> namespaces that belong to HA routers that are ipv6 only. I guess it's
> something to validate on a test setup.

So I used the following commands to check whether this command completes successfully against the controllers:

ansible -m shell -a 'ip netns add ggillies' '*ctrl*'
ansible -m shell -a 'cmd="ip netns exec ggillies sysctl -w net.ipv4.conf.all.promote_secondaries=1"' '*ctrl*'

They all returned:

10.9.38.32 | SUCCESS | rc=0 >>
net.ipv4.conf.all.promote_secondaries = 1

So it seems, at least in production's current state, that this command works without issue (it might have been broken before).

@Graeme Gillies, @Caetano Medeiros
Can you please add the files under "etc/neutron/rootwrap.d" to the sosreports? I am not seeing them in http://collab-shell.usersys.redhat.com/01757392/
Thanks
Anil

Rootwrap filter files are fine. I think the issue is not related to rootwrap filters. I see many broken RabbitMQ connection errors in the neutron server, OVS agent, and L3 agent logs at the times we see neutron errors. Maybe we see neutron errors because of broken RabbitMQ and SQL connections. Can the customer make sure that these connections are stable before creating neutron or nova resources?

Note comment 18.

Hi Purandhar,
Could you please confirm that the fix was tested? If it was OK, I will perform a code existence check and verify the bug.
tnx

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:1747

A unit test will validate that this behavior doesn't occur - https://code.engineering.redhat.com/gerrit/#/c/102096/ |