Bug 1806963 - Succeed to delete subnet while trying to attach that subnet to the router
Summary: Succeed to delete subnet while trying to attach that subnet to the router
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-neutron
Version: 13.0 (Queens)
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: low
Target Milestone: ---
Target Release: ---
Assignee: Slawek Kaplonski
QA Contact: Eran Kuris
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-02-25 11:16 UTC by Alex Katz
Modified: 2020-03-17 16:06 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-03-17 16:06:34 UTC
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Launchpad 1865891 0 None None None 2020-03-03 14:43:51 UTC
OpenStack gerrit 713045 0 None MERGED Lock subnets during port creation and subnet deletion 2020-09-30 08:58:15 UTC

Description Alex Katz 2020-02-25 11:16:29 UTC
Description of problem:

While verifying bug 1779654, I ran the following actions in a loop in the background:
 1. Create subnet from pool
 2. Attach subnet to router
 3. Detach subnet from router
 4. Delete subnet
 5. Sleep 2 seconds
 6. GOTO 1

It failed with one of the following errors in l3-agent.log:

[-] Error while deleting router 9935b2d9-65af-4d5e-b0d4-7988cd638e66: KeyError: 'subnets'
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/neutron/agent/l3/agent.py", line 385, in _safe_router_removed
    self._router_removed(router_id)
  File "/usr/lib/python2.7/site-packages/neutron/agent/l3/agent.py", line 404, in _router_removed
    ri.delete()
  File "/usr/lib/python2.7/site-packages/neutron/agent/l3/ha_router.py", line 459, in delete
    super(HaRouter, self).delete()
  File "/usr/lib/python2.7/site-packages/neutron/agent/l3/router_info.py", line 421, in delete
    self.process_delete()
  File "/usr/lib/python2.7/site-packages/neutron/common/utils.py", line 165, in call
    self.logger(e)
  File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
    self.force_reraise()
  File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
    six.reraise(self.type_, self.value, self.tb)
  File "/usr/lib/python2.7/site-packages/neutron/common/utils.py", line 162, in call
    return func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/neutron/agent/l3/router_info.py", line 1164, in process_delete
    self._process_internal_ports()
  File "/usr/lib/python2.7/site-packages/neutron/agent/l3/router_info.py", line 575, in _process_internal_ports
    for subnet in p['subnets']:
KeyError: 'subnets'


[-] Failed to process compatible router: 9935b2d9-65af-4d5e-b0d4-7988cd638e66: KeyError: 'mtu'
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/neutron/agent/l3/agent.py", line 628, in _process_routers_if_compatible
    self._process_router_if_compatible(router)
  File "/usr/lib/python2.7/site-packages/neutron/agent/l3/agent.py", line 486, in _process_router_if_compatible
    self._process_updated_router(router)
  File "/usr/lib/python2.7/site-packages/neutron/agent/l3/agent.py", line 527, in _process_updated_router
    ri.process()
  File "/usr/lib/python2.7/site-packages/neutron/agent/l3/ha_router.py", line 474, in process
    super(HaRouter, self).process()
  File "/usr/lib/python2.7/site-packages/neutron/common/utils.py", line 165, in call
    self.logger(e)
  File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
    self.force_reraise()
  File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
    six.reraise(self.type_, self.value, self.tb)
  File "/usr/lib/python2.7/site-packages/neutron/common/utils.py", line 162, in call
    return func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/neutron/agent/l3/router_info.py", line 1181, in process
    self._process_internal_ports()
  File "/usr/lib/python2.7/site-packages/neutron/agent/l3/router_info.py", line 567, in _process_internal_ports
    internal_ports)
  File "/usr/lib/python2.7/site-packages/neutron/agent/l3/router_info.py", line 515, in _get_updated_ports
    mtu_changed = existing_port['mtu'] != current_port['mtu']
KeyError: 'mtu'


In l3-agent.log I can see that the port carries no subnet or IP address information (fixed_ips is empty):

(output is formatted)
2020-02-25 09:59:33.846 875552 DEBUG neutron.agent.l3.router_info [-] appending port 
{
  u'allowed_address_pairs': [
    
  ],
  u'extra_dhcp_opts': [
    
  ],
  u'updated_at': u'2020-02-25T09:59:33Z',
  u'device_owner': u'network:ha_router_replicated_interface',
  u'revision_number': 11,
  u'port_security_enabled': False,
  u'binding:profile': {
    
  },
  u'fixed_ips': [
    
  ],
  u'id': u'30b654b9-0d09-407d-8553-b84c0d36e5ef',
  u'security_groups': [
    
  ],
  u'binding:vif_details': {
    u'port_filter': True,
    u'datapath_type': u'system',
    u'ovs_hybrid_plug': True
  },
  u'binding:vif_type': u'ovs',
  u'qos_policy_id': None,
  u'mac_address': u'fa:16:3e:6b:13:79',
  u'project_id': u'e364e04c62d845a0ac682782a07712ee',
  u'status': u'DOWN',
  u'binding:host_id': u'controller-0.redhat.local',
  u'description': u'',
  u'tags': [
    
  ],
  u'device_id': u'6b7a42d0-12ba-4e07-aa4b-3e58f11974f6',
  u'name': u'',
  u'admin_state_up': True,
  u'network_id': u'2506f745-6581-4b9a-8dde-8c11ebf1d7cb',
  u'tenant_id': u'e364e04c62d845a0ac682782a07712ee',
  u'created_at': u'2020-02-25T09:59:28Z',
  u'binding:vnic_type': u'normal',
  u'ip_allocation': u'immediate'
} 
to internal_ports cache _process_internal_ports /usr/lib/python2.7/site-packages/neutron/agent/l3/router_info.py:583
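
The port dumped above has an empty fixed_ips list and no 'subnets' or 'mtu' keys at all, which matches both tracebacks. The following is only a minimal sketch of how such a port trips unguarded dictionary access and how a tolerant lookup would behave. The helper names port_subnets and mtu_changed, the cached_port dict and its MTU value of 1450 are hypothetical and are not taken from the Neutron code base.

```python
def port_subnets(port):
    """Subnets attached to a port dict, or an empty list if the key is absent."""
    return port.get("subnets", [])


def mtu_changed(existing_port, current_port):
    """Compare MTUs without raising when either port dict lacks 'mtu'."""
    return existing_port.get("mtu") != current_port.get("mtu")


# Trimmed copy of the port from the log above: no fixed IPs, no 'subnets', no 'mtu'.
stale_port = {
    "id": "30b654b9-0d09-407d-8553-b84c0d36e5ef",
    "device_owner": "network:ha_router_replicated_interface",
    "fixed_ips": [],
}

# Hypothetical earlier state of the same port, cached while it still had an
# address. The MTU value is an assumption made up for this illustration.
cached_port = dict(
    stale_port,
    mtu=1450,
    subnets=[{"id": "5561834a-9bf3-41e7-ac87-d2d0eae65ca7"}],
    fixed_ips=[{"ip_address": "10.108.108.1"}],
)

try:
    stale_port["subnets"]
except KeyError as exc:
    print("unguarded access fails with KeyError:", exc)

print(port_subnets(stale_port))
print(mtu_changed(cached_port, stale_port))
```

Whether a missing MTU should count as a change is a design decision for the agent. The sketch only shows that the comparison can be made without raising.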



Based on openvswitch-agent.log, it seems that the subnet is deleted before the port configuration is complete:

2020-02-25 09:59:29.901 107901 DEBUG neutron.agent.resource_cache [req-c5a7718c-c4d6-4dbf-a43b-286f2fb09956 9c16552bff264e21a01ba4b3e8ba0d90 e364e04c62d845a0ac682782a07712ee - - -] Received new resource Port: Port(admin_state_up=True,allowed_address_pairs=[],binding=PortBinding,binding_levels=[],created_at=2020-02-25T09:59:28Z,data_plane_status=<?>,description='',device_id='6b7a42d0-12ba-4e07-aa4b-3e58f11974f6',device_owner='network:ha_router_replicated_interface',dhcp_options=[],distributed_binding=None,dns=None,fixed_ips=[IPAllocation],id=30b654b9-0d09-407d-8553-b84c0d36e5ef,mac_address=fa:16:3e:6b:13:79,name='',network_id=2506f745-6581-4b9a-8dde-8c11ebf1d7cb,project_id='e364e04c62d845a0ac682782a07712ee',qos_policy_id=None,revision_number=5,security=PortSecurity(30b654b9-0d09-407d-8553-b84c0d36e5ef),security_group_ids=set([]),status='DOWN',updated_at=2020-02-25T09:59:29Z) record_resource_update /usr/lib/python2.7/site-packages/neutron/agent/resource_cache.py:187

2020-02-25 09:59:30.022 107901 DEBUG neutron.agent.resource_cache [req-ee7510d2-69cc-49a6-bdfa-4455d7df47ee 9c16552bff264e21a01ba4b3e8ba0d90 e364e04c62d845a0ac682782a07712ee - - -] Resource Subnet deleted: 5561834a-9bf3-41e7-ac87-d2d0eae65ca7 record_resource_delete /usr/lib/python2.7/site-packages/neutron/agent/resource_cache.py:197

2020-02-25 09:59:30.436 107901 DEBUG neutron.agent.resource_cache [req-c5a7718c-c4d6-4dbf-a43b-286f2fb09956 9c16552bff264e21a01ba4b3e8ba0d90 e364e04c62d845a0ac682782a07712ee - - -] Resource Port 30b654b9-0d09-407d-8553-b84c0d36e5ef updated (revision_number 5->7). Old fields: {'fixed_ips': [IPAllocation(ip_address=10.108.108.1,network_id=2506f745-6581-4b9a-8dde-8c11ebf1d7cb,port_id=30b654b9-0d09-407d-8553-b84c0d36e5ef,subnet_id=5561834a-9bf3-41e7-ac87-d2d0eae65ca7)]} New fields: {'fixed_ips': []} record_resource_update /usr/lib/python2.7/site-packages/neutron/agent/resource_cache.py:185
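
The ordering in these three log entries can be replayed in a few lines to make the problem explicit: the subnet deletion arrives while the cached port still holds an address from that subnet. This is only an illustrative sketch. The event tuples and the function deleted_while_in_use are hypothetical, and the subnet ID on the first event is inferred from the "Old fields" of the third log entry, since the first entry only prints fixed_ips=[IPAllocation].

```python
PORT = "30b654b9-0d09-407d-8553-b84c0d36e5ef"
SUBNET = "5561834a-9bf3-41e7-ac87-d2d0eae65ca7"

# (kind, payload) tuples in the order the agent logged them.
EVENTS = [
    ("port", {"id": PORT, "revision": 5, "subnet_ids": [SUBNET]}),
    ("subnet_deleted", {"id": SUBNET}),
    ("port", {"id": PORT, "revision": 7, "subnet_ids": []}),
]


def deleted_while_in_use(events):
    """Return (subnet id, [port ids]) for subnets deleted while still referenced."""
    cached = {}    # port id -> set of subnet ids taken from its fixed_ips
    dangling = []
    for kind, payload in events:
        if kind == "port":
            cached[payload["id"]] = set(payload["subnet_ids"])
        elif kind == "subnet_deleted":
            users = sorted(p for p, subnets in cached.items()
                           if payload["id"] in subnets)
            if users:
                dangling.append((payload["id"], users))
    return dangling


print(deleted_while_in_use(EVENTS))
```

With the events in the logged order the subnet is reported as deleted while port 30b654b9 still used it. Had the port update with empty fixed_ips arrived first, nothing would be reported.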





Version-Release number of selected component (if applicable):
OpenStack-13.0-RHEL-7-20200214.1




Steps to Reproduce:

openstack subnet pool create --pool-prefix 10.108.108.0/24 the_new_subnet_pool
openstack network create the_new_network_1
openstack router create the_new_router

for i in {1..10};
do
    openstack subnet create --subnet-pool the_new_subnet_pool \
        --prefix-length 27 --network the_new_network_1 the_new_subnet_1 &
    openstack router add subnet the_new_router the_new_subnet_1 &
    openstack router remove subnet the_new_router the_new_subnet_1 &
    openstack subnet delete the_new_subnet_1 &
    sleep 2
done



The issue causes the following errors:
1. All the interfaces are removed from router's namespace
2. Can't assign new subnets/ports to the router
3. Can't delete router

Comment 2 Slawek Kaplonski 2020-03-03 12:26:24 UTC
So this is indeed a race condition between removing the subnet and removing the subnet's interface from the router.
In the "normal" case, when you try to delete the subnet first, it should fail with an error like:

Failed to delete subnet with name or ID...One or more ports have an IP allocation from this subnet

But in this case, because the subnet is first removed from the router, everything is fine on the DB side, while later on things go wrong on the L3 agent's side.
So IMO we need to add some protection against this strange case on the L3 agent's side, and that should be enough.

I don't think this is a very critical issue, as it is a corner case and shouldn't happen too often in real life.
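
The "normal" failure described above is a check-then-act sequence: the subnet delete first checks that no port has an allocation from the subnet, then removes it. A toy in-memory model (an assumption for illustration, not Neutron's data model or code) shows how a port creation that lands between the check and the removal leaves a port without an address, and how serializing both operations restores the expected error. The linked gerrit change is titled "Lock subnets during port creation and subnet deletion". The sketch uses one global threading.Lock purely for simplicity, and every class and function name in it is hypothetical.

```python
import threading


class SubnetInUse(Exception):
    """Raised when a subnet still has ports allocated from it."""


class SubnetNotFound(Exception):
    """Raised when a port is requested on a subnet that no longer exists."""


class FakeDb(object):
    """Toy stand-in for the server-side state."""

    def __init__(self):
        self.subnets = {"subnet-1"}
        self.ports = {}  # port id -> subnet id, or None once the allocation is gone
        self.lock = threading.Lock()

    def check_unused(self, subnet_id):
        if subnet_id in self.ports.values():
            raise SubnetInUse(subnet_id)

    def remove_subnet(self, subnet_id):
        self.subnets.discard(subnet_id)
        for port_id, used in list(self.ports.items()):
            if used == subnet_id:
                self.ports[port_id] = None  # the allocation goes away with the subnet

    def create_port(self, port_id, subnet_id):
        if subnet_id not in self.subnets:
            raise SubnetNotFound(subnet_id)
        self.ports[port_id] = subnet_id


def racy_interleaving():
    """Replay the bad ordering step by step, without any locking."""
    db = FakeDb()
    db.check_unused("subnet-1")              # subnet delete: check passes
    db.create_port("router-if", "subnet-1")  # router add subnet runs in between
    db.remove_subnet("subnet-1")             # subnet delete: acts on a stale check
    return db.ports


def locked_delete_subnet(db, subnet_id):
    with db.lock:  # check and removal happen as one unit
        db.check_unused(subnet_id)
        db.remove_subnet(subnet_id)


def locked_create_port(db, port_id, subnet_id):
    with db.lock:
        db.create_port(port_id, subnet_id)


print(racy_interleaving())  # {'router-if': None}

db = FakeDb()
locked_create_port(db, "router-if", "subnet-1")
try:
    locked_delete_subnet(db, "subnet-1")
except SubnetInUse:
    print("subnet delete rejected, port keeps its subnet:", db.ports)
```

With the lock held around both operations, whichever request runs second fails cleanly: either the delete raises SubnetInUse or the port creation raises SubnetNotFound.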

Comment 4 Slawek Kaplonski 2020-03-04 13:43:38 UTC

After some more digging, I found that a minimal reproducer for this issue is something like:

openstack subnet create --subnet-pool demo-subnetpool4 --prefix-length 27 --network the_new_network_1 the_new_subnet_1;
openstack router add subnet the_other_router the_new_subnet_1 &
openstack subnet delete the_new_subnet_1 &

The real problem is on the server side: in this case a router interface port is sometimes left without any fixed IPs, like:

neutron port-show b0db49d0-1c70-4e4c-a46d-9ac34a22ba7c
neutron CLI is deprecated and will be removed in the future. Use openstack CLI instead.
+-----------------------+---------------------------------------------------------------------------------------------------------------------------+
| Field | Value |
+-----------------------+---------------------------------------------------------------------------------------------------------------------------+
| admin_state_up | True |
| allowed_address_pairs | |
| binding:host_id | devstack-ubuntu-ovs |
| binding:profile | {} |
| binding:vif_details | {"connectivity": "l2", "port_filter": true, "ovs_hybrid_plug": false, "datapath_type": "system", "bridge_name": "br-int"} |
| binding:vif_type | ovs |
| binding:vnic_type | normal |
| created_at | 2020-03-02T01:01:28Z |
| description | |
| device_id | 4a650478-54cb-4270-9dcc-fc3383971b2e |
| device_owner | network:router_interface |
| extra_dhcp_opts | |
| fixed_ips | |
| id | b0db49d0-1c70-4e4c-a46d-9ac34a22ba7c |
| ip_allocation | immediate |
| mac_address | fa:16:3e:1e:6e:d0 |
| name | |
| network_id | 5ac97c24-7c51-47cd-b006-aec70b59fdc7 |
| port_security_enabled | False |
| project_id | 0c5d93b067784b609fb5d07873e1b80d |
| qos_network_policy_id | |
| qos_policy_id | |
| resource_request | |
| revision_number | 4 |
| security_groups | |
| status | DOWN |
| tags | |
| tenant_id | 0c5d93b067784b609fb5d07873e1b80d |
| updated_at | 2020-03-02T01:01:31Z |
+-----------------------+---------------------------------------------------------------------------------------------------------------------------+
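
A port in this state can be picked out of a port listing by checking the device owner and the emptiness of fixed_ips. The following is a minimal sketch that assumes the ports are already available as a list of dicts with the field names shown in the table above. The function name, the constant and the second sample port are hypothetical, and how the list is fetched from the API is left out.

```python
# Device owners seen in this report for router interface ports.
ROUTER_INTERFACE_OWNERS = (
    "network:router_interface",
    "network:ha_router_replicated_interface",
)


def router_ports_without_fixed_ips(ports):
    """Return the IDs of router interface ports that have no fixed IPs."""
    return [
        port["id"]
        for port in ports
        if port.get("device_owner") in ROUTER_INTERFACE_OWNERS
        and not port.get("fixed_ips")
    ]


ports = [
    # The port shown in the table above.
    {
        "id": "b0db49d0-1c70-4e4c-a46d-9ac34a22ba7c",
        "device_owner": "network:router_interface",
        "fixed_ips": [],
    },
    # Hypothetical healthy router interface, for contrast.
    {
        "id": "healthy-port",
        "device_owner": "network:router_interface",
        "fixed_ips": [{"subnet_id": "some-subnet", "ip_address": "10.108.108.1"}],
    },
]

print(router_ports_without_fixed_ips(ports))
```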

