Bug 1503818
| Summary: | [OSP9] Neutron L3-Agent silently stops updating router namespace | ||
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | Benjamin Schmaus <bschmaus> |
| Component: | openstack-neutron | Assignee: | Brian Haley <bhaley> |
| Status: | CLOSED ERRATA | QA Contact: | Toni Freger <tfreger> |
| Severity: | high | Docs Contact: | |
| Priority: | high | ||
| Version: | 9.0 (Mitaka) | CC: | akaris, amuller, bhaley, chrisw, jlibosva, nyechiel, samccann, srevivo, vkommadi |
| Target Milestone: | zstream | Keywords: | Triaged, ZStream |
| Target Release: | 9.0 (Mitaka) | ||
| Hardware: | x86_64 | ||
| OS: | All | ||
| Whiteboard: | |||
| Fixed In Version: | openstack-neutron-8.4.0-8.el7ost | Doc Type: | If docs needed, set a value |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2017-11-08 18:36:23 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
|
Description
Benjamin Schmaus
2017-10-18 20:17:49 UTC
Packages in configuration:
openstack-neutron-8.4.0-6.el7ost.noarch
openstack-neutron-metering-agent-8.4.0-6.el7ost.noarch
python-neutron-8.4.0-6.el7ost.noarch
python-neutron-lbaas-8.4.0-1.el7ost.noarch
openstack-neutron-common-8.4.0-6.el7ost.noarch
openstack-neutron-lbaas-8.4.0-1.el7ost.noarch
python-neutronclient-4.1.1-2.el7ost.noarch
openstack-neutron-bigswitch-lldp-8.40.7-2.el7ost.noarch
openstack-neutron-openvswitch-8.4.0-6.el7ost.noarch
openstack-neutron-ml2-8.4.0-6.el7ost.noarch
python-neutron-lib-0.0.2-1.el7ost.noarch

This still has similarities to bz 1502572. It could be that we are tripping over the iptables-restore issue here: the NAT rules are not getting added to the namespace, so adding the IP doesn't work. One workaround we've been using is to set admin_state_up=False and then True on the affected router, which triggers the agent to refresh things and get the rules and IP configured. We are still trying to root-cause the underlying issue and will update when we have more info.
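The admin_state_up workaround described above could be scripted roughly as follows. This is only a sketch under stated assumptions: `bounce_router` is a hypothetical helper name, `client` is assumed to be any object exposing `update_router(router_id, body)` the way python-neutronclient's v2.0 `Client` does, and the settle delay is an arbitrary guess rather than a value taken from this bug.

```python
import time


def bounce_router(client, router_id, settle_seconds=5.0, sleep=time.sleep):
    """Set admin_state_up to False, wait, then set it back to True.

    Hypothetical helper, not part of Neutron. `client` is assumed to offer
    update_router(router_id, body); `settle_seconds` is an illustrative
    guess at how long to give the L3 agent to process the first update.
    """
    # Take the router administratively down.
    client.update_router(router_id, {"router": {"admin_state_up": False}})
    # Give the agent a moment before bringing the router back up.
    sleep(settle_seconds)
    # Bring the router back up; per the report, this prompts the agent
    # to refresh the rules and IP configuration for the router.
    client.update_router(router_id, {"router": {"admin_state_up": True}})
```

In a real deployment `client` would typically be an authenticated python-neutronclient instance. Passing it in as a parameter keeps the helper easy to exercise against a stub.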
This does have the iptables traces in the log file. It also has this:

2017-10-12 16:17:03.425 12387 ERROR oslo_service.periodic_task [req-19638c71-4ad9-412f-b5d7-dc9cb84eca4f - - - - -] Error during L3NATAgentWithStateReport.periodic_sync_routers_task
2017-10-12 16:17:03.425 12387 ERROR oslo_service.periodic_task Traceback (most recent call last):
2017-10-12 16:17:03.425 12387 ERROR oslo_service.periodic_task   File "/usr/lib/python2.7/site-packages/oslo_service/periodic_task.py", line 220, in run_periodic_tasks
2017-10-12 16:17:03.425 12387 ERROR oslo_service.periodic_task     task(self, context)
2017-10-12 16:17:03.425 12387 ERROR oslo_service.periodic_task   File "/usr/lib/python2.7/site-packages/neutron/agent/l3/agent.py", line 568, in periodic_sync_routers_task
2017-10-12 16:17:03.425 12387 ERROR oslo_service.periodic_task     self.fetch_and_sync_all_routers(context, ns_manager)
2017-10-12 16:17:03.425 12387 ERROR oslo_service.periodic_task   File "/usr/lib/python2.7/site-packages/neutron/agent/l3/agent.py", line 603, in fetch_and_sync_all_routers
2017-10-12 16:17:03.425 12387 ERROR oslo_service.periodic_task     r['id'], r.get(l3_constants.HA_ROUTER_STATE_KEY))
2017-10-12 16:17:03.425 12387 ERROR oslo_service.periodic_task   File "/usr/lib/python2.7/site-packages/neutron/agent/l3/ha.py", line 120, in check_ha_state_for_router
2017-10-12 16:17:03.425 12387 ERROR oslo_service.periodic_task     if ri and current_state != TRANSLATION_MAP[ri.ha_state]:
2017-10-12 16:17:03.425 12387 ERROR oslo_service.periodic_task   File "/usr/lib/python2.7/site-packages/neutron/agent/l3/ha_router.py", line 81, in ha_state
2017-10-12 16:17:03.425 12387 ERROR oslo_service.periodic_task     ha_state_path = self.keepalived_manager.get_full_config_file_path(
2017-10-12 16:17:03.425 12387 ERROR oslo_service.periodic_task AttributeError: 'NoneType' object has no attribute 'get_full_config_file_path'
2017-10-12 16:17:03.425 12387 ERROR oslo_service.periodic_task

I pinged someone to look at that since it could be related to why an IP did not get configured.

Whenever a floating IP is added, the L3 agent will
1) add it to its internal cache, and then
2) write it to the config file and SIGHUP the keepalived process to reload the new config.
But I suspect step 2 is not happening here because the HA network port status is DOWN. https://review.openstack.org/#/c/512179/ addresses this issue. Once the backports are merged upstream, we will backport it downstream and can provide a hotfix.
Note: restarting the L2 agent (after restarting the L3 agent) should fix this issue as well.

The error below is a different issue:
2017-10-12 16:17:03.425 12387 ERROR oslo_service.periodic_task AttributeError: 'NoneType' object has no attribute 'get_full_config_file_path'
I will propose a patch upstream for that. But the patch mentioned earlier (https://review.openstack.org/#/c/512179/) should fix the floating IP issue.

Maybe tomorrow, if https://review.openstack.org/#/c/514138/ and https://review.openstack.org/#/c/514139/ get merged today (I hope they can be merged today).

Since there were a few related bugs with slightly different descriptions, another was cloned to track all the backports from upstream. https://bugzilla.redhat.com/show_bug.cgi?id=1505771

Steps to reproduce
1) In OSP9 (or OSP10, OSP11), restart the L3 agent.
2) Then spawn a VM and add a floating IP.
3) Ping the floating IP; it should succeed with the fix.
4) Also check that the floating IP is added to the keepalived config file.

Tested on latest OSP9 openstack-neutron-8.4.0-8.el7ost.noarch
Setup: 3 controllers, 1 compute
Reproduction steps:
1) VM spawned and floating IP attached; connectivity tested.
2) L3 agent of the MASTER router restarted several times during continuous ping to the floating IP of the VM.
3) Spawned 2 additional VMs with FIPs. Connectivity to them tested.
4) Keepalived conf contains all FIPs as expected, see below.
vrrp_instance VR_1 {
    state BACKUP
    interface ha-0d868774-03
    virtual_router_id 1
    priority 50
    garp_master_delay 60
    nopreempt
    advert_int 2
    track_interface {
        ha-0d868774-03
    }
    virtual_ipaddress {
        169.254.0.1/24 dev ha-0d868774-03
    }
    virtual_ipaddress_excluded {
        10.0.0.210/24 dev qg-dacf0c94-d3
        10.0.0.211/32 dev qg-dacf0c94-d3
        10.0.0.212/32 dev qg-dacf0c94-d3
        10.0.0.213/32 dev qg-dacf0c94-d3
        30.30.30.1/24 dev qr-2520ff23-1b
        fe80::f816:3eff:fe15:ac1/64 dev qg-dacf0c94-d3 scope link
        fe80::f816:3eff:fe7f:30af/64 dev qr-2520ff23-1b scope link
    }
    virtual_routes {
        0.0.0.0/0 via 10.0.0.1 dev qg-dacf0c94-d3
    }
}
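The "check that the floating IP is in the keepalived config" step can be automated along these lines. This is a minimal sketch, not tooling shipped with Neutron: `excluded_vips` and `missing_fips` are hypothetical helper names, the parser assumes the simple one-entry-per-line layout shown above, and locating the router's keepalived config file is left to the caller.

```python
def excluded_vips(conf_text):
    """Return the addresses listed in virtual_ipaddress_excluded blocks.

    Assumes one entry per line and a closing brace on its own line,
    as in the config shown in this report.
    """
    addresses = []
    in_block = False
    for raw in conf_text.splitlines():
        line = raw.strip()
        if line.startswith("virtual_ipaddress_excluded"):
            in_block = True
        elif in_block and line == "}":
            in_block = False
        elif in_block and line:
            # First token is the address with prefix length, e.g. 10.0.0.211/32.
            addresses.append(line.split()[0])
    return addresses


def missing_fips(conf_text, expected_fips):
    """Return the expected floating IPs that are absent from the config."""
    present = {addr.split("/")[0] for addr in excluded_vips(conf_text)}
    return [fip for fip in expected_fips if fip not in present]


# Shortened sample based on the config above.
sample = """
vrrp_instance VR_1 {
    virtual_ipaddress {
        169.254.0.1/24 dev ha-0d868774-03
    }
    virtual_ipaddress_excluded {
        10.0.0.210/24 dev qg-dacf0c94-d3
        10.0.0.211/32 dev qg-dacf0c94-d3
    }
}
"""
print(missing_fips(sample, ["10.0.0.211", "10.0.0.214"]))  # prints ['10.0.0.214']
```

An empty result means every expected floating IP has been written to the config; a non-empty result matches the symptom in this bug, where the agent cached the address but did not write it out for keepalived.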
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2017:3152 |