Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 2177931

Summary: [OVN] neutron-server sometimes doesn't detect LogicalSwitchPortUpdateUpEvent and instance creation times out
Product: Red Hat OpenStack
Reporter: yatanaka
Component: openstack-neutron
Assignee: OSP Team <rhos-maint>
Status: CLOSED DUPLICATE
QA Contact: Eran Kuris <ekuris>
Severity: high
Docs Contact:
Priority: high
Version: 16.2 (Train)
CC: averdagu, chrisw, jlibosva, mlavalle, scohen, twilson
Target Milestone: ---
Keywords: Triaged
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version: python-networking-ovn-7.4.2-2.20220409154873.el8ost
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2023-03-27 14:16:55 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 2143592
Bug Blocks:

Description yatanaka 2023-03-14 00:21:23 UTC
Description of problem:

Instance creation sometimes fails with the error message "Failed to allocate the network(s), not rescheduling".

</var/log/containers/nova-compute.log on a compute node in the customer's environment>
~~~
Timeout waiting for [('network-vif-plugged', 'aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa')] for instance with vm_state building and task_state spawning.: eventlet.timeout.Timeout: 300 seconds
~~~
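
To list the ports affected by this timeout, the log line above can be parsed for the port IDs it names. The following is only a minimal sketch: the regular expression is inferred from the single line quoted above, and the function name `timed_out_ports` is hypothetical, not part of nova or neutron.

```python
import re

# One expected event inside nova-compute's timeout message.
# Assumption: the format matches the single log line quoted in this report.
EVENT_RE = re.compile(r"\('network-vif-plugged', '([0-9a-fA-F-]{36})'\)")


def timed_out_ports(lines):
    """Return the port IDs named in 'Timeout waiting for [...]' lines.

    IDs are returned in first-seen order, without duplicates.
    """
    seen = []
    for line in lines:
        if "Timeout waiting for [" not in line:
            continue
        for port_id in EVENT_RE.findall(line):
            if port_id not in seen:
                seen.append(port_id)
    return seen
```

The function accepts any iterable of lines, so an open file object for nova-compute.log can be passed directly.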

In my lab, I can see the following messages in /var/log/containers/neutron/server.log after ovn_controller on a compute node creates the port, but these messages did not appear when the issue occurred in the customer's environment.

</var/log/containers/neutron/server.log on a controller node in my lab environment>
~~~
DEBUG ovsdbapp.backend.ovs_idl.event [-] Matched UPDATE: LogicalSwitchPortUpdateUpEvent(events=('update',), table='Logical_Switch_Port', ........
INFO networking_ovn.ml2.mech_driver [-] OVN reports status up for port: aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa
~~~

I think the cause of this issue is that "Matched UPDATE: LogicalSwitchPortUpdateUpEvent" is never logged, i.e. neutron-server does not match the event, but I'm not sure why this message didn't appear in the customer's environment.
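
One way to confirm which ports never received the "status up" report is to scan server.log for the two messages quoted above. This is a rough sketch only: it assumes the message texts are exactly as shown in the lab excerpt, that the DEBUG line is present only when debug logging is enabled, and the function name `summarize_up_events` is hypothetical.

```python
# Message fragments copied from the lab server.log excerpt in this report.
MATCH_MARKER = "Matched UPDATE: LogicalSwitchPortUpdateUpEvent"
UP_MARKER = "OVN reports status up for port: "


def summarize_up_events(server_log_lines, port_ids):
    """Return (matched_event_count, ports_without_up_report).

    matched_event_count counts DEBUG lines showing the event was matched.
    ports_without_up_report lists the given port IDs for which no
    'OVN reports status up for port' line was found.
    """
    matched_events = 0
    reported = set()
    for line in server_log_lines:
        if MATCH_MARKER in line:
            matched_events += 1
        idx = line.find(UP_MARKER)
        if idx != -1:
            fields = line[idx + len(UP_MARKER):].split()
            if fields:
                reported.add(fields[0])  # first token after the marker is the port ID
    missing = [p for p in port_ids if p not in reported]
    return matched_events, missing
```

A port that appears in the nova-compute timeout but is returned in the second element here would be consistent with the event not being detected by neutron-server.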


Version-Release number of selected component (if applicable):
- Red Hat OpenStack Platform 16.2.4
- RHEL 8.4
- ml2/OVN


How reproducible:
Create an instance with a port.

Actual results:
Instance creation sometimes fails.
The issue occurs only occasionally, and we have not noticed any pattern.
VM creation is slow only in the cases where it ends up failing with this error.

Expected results:
Instance creation succeeds every time.

Additional information:
My colleague pointed me to the following Bugzilla entry:
  https://bugzilla.redhat.com/show_bug.cgi?id=2065897
However, I suspect it is unrelated, because I cannot find the following message in the customer's environment:
  ovsdb_idl|WARN|transaction error: {"details":"cannot delete HA_Chassis_Group row aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa because of 1 remaining reference(s)","error":"referential integrity violation"}

Comment 5 Jakub Libosvar 2023-03-27 14:16:55 UTC

*** This bug has been marked as a duplicate of bug 2143592 ***