Bug 2081766 - [Neutron][OVN] - Synchronizing Neutron and OVN databases maintenance task failures on port groups.
Status: CLOSED NEXTRELEASE
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-neutron
Version: 16.1 (Train)
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: z10
Target Release: 16.1 (Train on RHEL 8.2)
Assignee: Jakub Libosvar
QA Contact: Eran Kuris
Depends On: 2080222 2086899
Reported: 2022-05-04 15:03 UTC by Matt Flusche
Modified: 2023-07-31 17:16 UTC

Clones: 2086899
Last Closed: 2022-11-14 16:25:46 UTC


Links: Red Hat Issue Tracker OSP-15043 (last updated 2022-05-04 15:17:25 UTC)

Description Matt Flusche 2022-05-04 15:03:09 UTC
Description of problem:
In this environment, new instances fail to launch with Neutron internal server errors [1].

In the Neutron server log we found a lookup failure for the port group [2].

In the OVN NB database, we verified that the port group is missing:

  # export NBDB=$(sudo ovs-vsctl get open . external_ids:ovn-remote | sed -e 's/\"//g' | sed -e 's/6642/6641/g')
  # alias ovn-nbctl='sudo podman exec ovn_controller ovn-nbctl --db=$NBDB'
  # ovn-nbctl list Port_Group pg_0f5d1826_f5a1_4702_bf1e_b157b5e95b55
  (nil)
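For reference, the Port_Group name queried above is derived from the Neutron security group UUID: networking-ovn prefixes the UUID with `pg_` and replaces dashes with underscores. An illustrative re-implementation of that naming convention (not the library code itself):

```python
def ovn_port_group_name(sg_id: str) -> str:
    """Map a Neutron security group UUID to its OVN Port_Group name.

    Illustrative sketch of the networking-ovn naming convention:
    prefix with 'pg_' and replace dashes with underscores.
    """
    return "pg_" + sg_id.replace("-", "_")

print(ovn_port_group_name("0f5d1826-f5a1-4702-bf1e-b157b5e95b55"))
# → pg_0f5d1826_f5a1_4702_bf1e_b157b5e95b55
```

This is how the security group behind the missing port group can be identified from the `pg_...` name in the logs.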

A maintenance task runs every 5 minutes to resynchronize the databases and fix such inconsistencies, but it appears to be broken [3].

Instances launch successfully when a new security group is used.


[1] - Nova instance failure.

{'code': 500, 'created': '2022-04-20T20:02:53Z',
 'message': "Exceeded maximum number of retries. Exceeded max scheduling
 attempts 3 for instance dbfc727e-9ccc-4560-99c9-e3a4747ae4b7. Last
 exception: Request Failed: internal server error while processing your
 request.\nNeutron server returns request_ids: ['req-6283cc",
 'details': 'Traceback (most recent call last):\n  File
 "/usr/lib/python3.6/site-packages/nova/conductor/manager.py", line 637,
 in build_instances\n    filter_properties, instances[0].uuid)\n  File
 "/usr/lib/python3.6/site-packages/nova/scheduler/utils.py", line 895,
 in populate_retry\n    raise exception.MaxRetriesExceeded(reason=msg)\n
 nova.exception.MaxRetriesExceeded: Exceeded maximum number of retries.
 Exceeded max scheduling attempts 3 for instance
 dbfc727e-9ccc-4560-99c9-e3a4747ae4b7. Last exception: Request Failed:
 internal server error while processing your request.\nNeutron server
 returns request_ids: [\'req-6283ccf0-6fe5-46cf-8342-80f706bd86d8\']\n'}

flavor: disk='80', ephemeral='0', original_name='m1.large', ram='8192', swap='0', vcpus='4'


[2] - port group failure from neutron server.log

2022-04-20 16:02:51.479 30 ERROR ovsdbapp.backend.ovs_idl.transaction [req-6283ccf0-6fe5-46cf-8342-80f706bd86d8 b7c2463c9c8b1856fb969afd935dbe8666185eba22b83b1f7050d91f9b9fdbfc 6d443e80563646ba8ccfddeeed2380f1 - 2896954cd77544dc8e673b41d318f3e9 2896954cd77544dc8e673b41d318f3e9] Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/ovsdbapp/schema/ovn_northbound/commands.py", line 1329, in run_idl
    pg = self.api.lookup('Port_Group', self.port_group)
  File "/usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/__init__.py", line 204, in lookup
    return self._lookup(table, record)
  File "/usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/__init__.py", line 260, in _lookup
    row = idlutils.row_by_value(self, rl.table, rl.column, record)
  File "/usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/idlutils.py", line 130, in row_by_value
    raise RowNotFound(table=table, col=column, match=match)
ovsdbapp.backend.ovs_idl.idlutils.RowNotFound: Cannot find Port_Group with name=pg_040f6daa_0ab4_4914_b4c7_0ff8228f0fb7


[3] - ovn sync failure example from neutron server.log

2022-04-20 16:03:02.297 50 DEBUG networking_ovn.common.maintenance [req-4fd77dba-3997-426b-b5da-e465c9ebc689 - - - - -] Maintenance task: Synchronizing Neutron and OVN databases check_for_inconsistencies /usr/lib/python3.6/site-packages/networking_ovn/common/maintenance.py:341
2022-04-20 16:03:02.298 50 DEBUG networking_ovn.common.maintenance [req-4fd77dba-3997-426b-b5da-e465c9ebc689 - - - - -] Maintenance task: Number of inconsistencies found at create/update: security_group_rules=6 _log /usr/lib/python3.6/site-packages/networking_ovn/common/maintenance.py:322
2022-04-20 16:03:02.298 50 DEBUG networking_ovn.common.maintenance [req-4fd77dba-3997-426b-b5da-e465c9ebc689 - - - - -] Maintenance task: Fixing resource 9fd01595-30fa-4b65-bd7c-3caa67d1e518 (type: security_group_rules) at create/update check_for_inconsistencies /usr/lib/python3.6/site-packages/networking_ovn/common/maintenance.py:353
2022-04-20 16:03:02.307 50 DEBUG ovsdbapp.backend.ovs_idl.transaction [-] Running txn n=1 command(idx=0): PgAclAddCommand(entity=pg_b3bd2359_6630_479e_84da_ec3cbff07ea7, direction=from-lport, priority=1002, match=inport == @pg_b3bd2359_6630_479e_84da_ec3cbff07ea7 && ip4 && ip4.dst == 0.0.0.0/0 && tcp && tcp.dst == 25, action=allow-related, log=False, may_exist=False, severity=[], name=[], external_ids={'neutron:security_group_rule_id': '9fd01595-30fa-4b65-bd7c-3caa67d1e518'}) do_commit /usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/transaction.py:84
2022-04-20 16:03:02.308 50 ERROR ovsdbapp.backend.ovs_idl.transaction [req-4fd77dba-3997-426b-b5da-e465c9ebc689 - - - - -] Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/connection.py", line 128, in run
    txn.results.put(txn.do_commit())
  File "/usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/transaction.py", line 86, in do_commit
    command.run_idl(txn)
  File "/usr/lib/python3.6/site-packages/ovsdbapp/schema/ovn_northbound/commands.py", line 121, in run_idl
    self.direction, self.priority, self.match))
RuntimeError: ACL (from-lport, 1002, inport == @pg_b3bd2359_6630_479e_84da_ec3cbff07ea7 && ip4 && ip4.dst == 0.0.0.0/0 && tcp && tcp.dst == 25) already exists

2022-04-20 16:03:02.309 50 ERROR ovsdbapp.backend.ovs_idl.command [req-4fd77dba-3997-426b-b5da-e465c9ebc689 - - - - -] Error executing command: RuntimeError: ACL (from-lport, 1002, inport == @pg_b3bd2359_6630_479e_84da_ec3cbff07ea7 && ip4 && ip4.dst == 0.0.0.0/0 && tcp && tcp.dst == 25) already exists
2022-04-20 16:03:02.309 50 ERROR ovsdbapp.backend.ovs_idl.command Traceback (most recent call last):
2022-04-20 16:03:02.309 50 ERROR ovsdbapp.backend.ovs_idl.command   File "/usr/lib/python3.6/site-packages/ovsdbapp/api.py", line 111, in transaction
2022-04-20 16:03:02.309 50 ERROR ovsdbapp.backend.ovs_idl.command     yield self._nested_txns_map[cur_thread_id]
2022-04-20 16:03:02.309 50 ERROR ovsdbapp.backend.ovs_idl.command KeyError: 139979826286216
2022-04-20 16:03:02.309 50 ERROR ovsdbapp.backend.ovs_idl.command 
2022-04-20 16:03:02.309 50 ERROR ovsdbapp.backend.ovs_idl.command During handling of the above exception, another exception occurred:
2022-04-20 16:03:02.309 50 ERROR ovsdbapp.backend.ovs_idl.command 
2022-04-20 16:03:02.309 50 ERROR ovsdbapp.backend.ovs_idl.command Traceback (most recent call last):
2022-04-20 16:03:02.309 50 ERROR ovsdbapp.backend.ovs_idl.command   File "/usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/command.py", line 42, in execute
2022-04-20 16:03:02.309 50 ERROR ovsdbapp.backend.ovs_idl.command     t.add(self)
2022-04-20 16:03:02.309 50 ERROR ovsdbapp.backend.ovs_idl.command   File "/usr/lib64/python3.6/contextlib.py", line 88, in __exit__
2022-04-20 16:03:02.309 50 ERROR ovsdbapp.backend.ovs_idl.command     next(self.gen)
2022-04-20 16:03:02.309 50 ERROR ovsdbapp.backend.ovs_idl.command   File "/usr/lib/python3.6/site-packages/networking_ovn/ovsdb/impl_idl_ovn.py", line 196, in transaction
2022-04-20 16:03:02.309 50 ERROR ovsdbapp.backend.ovs_idl.command     yield t
2022-04-20 16:03:02.309 50 ERROR ovsdbapp.backend.ovs_idl.command   File "/usr/lib64/python3.6/contextlib.py", line 88, in __exit__
2022-04-20 16:03:02.309 50 ERROR ovsdbapp.backend.ovs_idl.command     next(self.gen)
2022-04-20 16:03:02.309 50 ERROR ovsdbapp.backend.ovs_idl.command   File "/usr/lib/python3.6/site-packages/ovsdbapp/api.py", line 119, in transaction
2022-04-20 16:03:02.309 50 ERROR ovsdbapp.backend.ovs_idl.command     del self._nested_txns_map[cur_thread_id]
2022-04-20 16:03:02.309 50 ERROR ovsdbapp.backend.ovs_idl.command   File "/usr/lib/python3.6/site-packages/ovsdbapp/api.py", line 69, in __exit__
2022-04-20 16:03:02.309 50 ERROR ovsdbapp.backend.ovs_idl.command     self.result = self.commit()
2022-04-20 16:03:02.309 50 ERROR ovsdbapp.backend.ovs_idl.command   File "/usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/transaction.py", line 62, in commit
2022-04-20 16:03:02.309 50 ERROR ovsdbapp.backend.ovs_idl.command     raise result.ex
2022-04-20 16:03:02.309 50 ERROR ovsdbapp.backend.ovs_idl.command   File "/usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/connection.py", line 128, in run
2022-04-20 16:03:02.309 50 ERROR ovsdbapp.backend.ovs_idl.command     txn.results.put(txn.do_commit())
2022-04-20 16:03:02.309 50 ERROR ovsdbapp.backend.ovs_idl.command   File "/usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/transaction.py", line 86, in do_commit
2022-04-20 16:03:02.309 50 ERROR ovsdbapp.backend.ovs_idl.command     command.run_idl(txn)
2022-04-20 16:03:02.309 50 ERROR ovsdbapp.backend.ovs_idl.command   File "/usr/lib/python3.6/site-packages/ovsdbapp/schema/ovn_northbound/commands.py", line 121, in run_idl
2022-04-20 16:03:02.309 50 ERROR ovsdbapp.backend.ovs_idl.command     self.direction, self.priority, self.match))
2022-04-20 16:03:02.309 50 ERROR ovsdbapp.backend.ovs_idl.command RuntimeError: ACL (from-lport, 1002, inport == @pg_b3bd2359_6630_479e_84da_ec3cbff07ea7 && ip4 && ip4.dst == 0.0.0.0/0 && tcp && tcp.dst == 25) already exists
2022-04-20 16:03:02.309 50 ERROR ovsdbapp.backend.ovs_idl.command 
2022-04-20 16:03:02.309 50 ERROR networking_ovn.common.maintenance [req-4fd77dba-3997-426b-b5da-e465c9ebc689 - - - - -] Maintenance task: Failed to fix resource 9fd01595-30fa-4b65-bd7c-3caa67d1e518 (type: security_group_rules): RuntimeError: ACL (from-lport, 1002, inport == @pg_b3bd2359_6630_479e_84da_ec3cbff07ea7 && ip4 && ip4.dst == 0.0.0.0/0 && tcp && tcp.dst == 25) already exists
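Note that the PgAclAddCommand in the transaction log above was issued with may_exist=False, so a pre-existing ACL raises instead of being skipped, and the whole repair transaction aborts. A minimal sketch of those semantics (assumed behavior for illustration, not the ovsdbapp source):

```python
class ACLExistsError(RuntimeError):
    """Stands in for the RuntimeError raised by ovsdbapp above."""

def pg_acl_add(existing_acls, acl, may_exist=False):
    # Sketch of the may_exist contract: with may_exist=False a duplicate
    # ACL raises; with may_exist=True the add is idempotent and the
    # existing entry is returned unchanged.
    if acl in existing_acls:
        if may_exist:
            return acl
        raise ACLExistsError("ACL %s already exists" % (acl,))
    existing_acls.add(acl)
    return acl
```

Because the maintenance task commits all of its fixes in one transaction, a single such duplicate-ACL error is enough to prevent every other inconsistency in that batch from being repaired.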


Version-Release number of selected component (if applicable):

container version: openstack-neutron-server-ovn:16.1.7-12

$ podman run --net host registry.redhat.io/rhosp-rhel8/openstack-neutron-server-ovn:16.1.7-12 rpm -qa |grep neutron                                        
python3-neutron-dynamic-routing-15.0.1-1.20210528043020.56de1c4.el8ost.noarch
puppet-neutron-15.5.1-1.20210614113305.7d0406b.el8ost.noarch
python3-neutron-lib-1.29.1-1.20210527195021.4ef4b71.el8ost.noarch
python3-neutron-15.2.1-1.20210712133309.el8ost.noarch
openstack-neutron-ml2-15.2.1-1.20210712133309.el8ost.noarch
python3-neutronclient-6.14.1-1.20210528021924.a09e824.el8ost.noarch
openstack-neutron-common-15.2.1-1.20210712133309.el8ost.noarch
openstack-neutron-15.2.1-1.20210712133309.el8ost.noarch

$ podman run --net host registry.redhat.io/rhosp-rhel8/openstack-neutron-server-ovn:16.1.7-12 rpm -qa |grep ovn                                            
puppet-ovn-15.4.1-1.20210528102649.192ac4e.el8ost.noarch
python3-networking-ovn-7.3.1-1.20210714143310.el8ost.noarch

How reproducible:
100% reproducible in this specific environment.

Steps to Reproduce:
1. Launch an instance with the affected security group.


Additional info:
Will provide

Comment 2 Jakub Libosvar 2022-05-16 18:57:43 UTC
It looks like the maintenance task's inconsistency detection doesn't work on port groups.
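A minimal sketch of how such a blind spot can arise (the names and data shapes below are hypothetical illustrations, not the networking-ovn implementation): if detection only walks a fixed list of tracked resource types and port groups are not on that list, a Port_Group deleted from the OVN DB is never flagged for repair.

```python
# Hypothetical list of tracked types; "port_groups" is deliberately absent.
TRACKED_TYPES = ["ports", "routers", "security_group_rules"]

def find_inconsistencies(neutron_db, ovn_db):
    """Return (type, id) pairs present in Neutron but missing in OVN."""
    missing = []
    for rtype in TRACKED_TYPES:
        for res_id in neutron_db.get(rtype, set()):
            if res_id not in ovn_db.get(rtype, set()):
                missing.append((rtype, res_id))
    # A missing port group never appears here, so the resync task never
    # recreates it.
    return missing
```

This matches the observed behavior: the task keeps retrying security_group_rules fixes while the underlying missing Port_Group goes undetected.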

Comment 10 Jakub Libosvar 2022-11-14 16:25:46 UTC
This will be fixed in 16.2.

