Bug 2177931
| Summary: | [OVN] neutron-server sometimes doesn't detect LogicalSwitchPortUpdateUpEvent and instance creation times out | | |
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | yatanaka |
| Component: | openstack-neutron | Assignee: | OSP Team <rhos-maint> |
| Status: | CLOSED DUPLICATE | QA Contact: | Eran Kuris <ekuris> |
| Severity: | high | Priority: | high |
| Version: | 16.2 (Train) | CC: | averdagu, chrisw, jlibosva, mlavalle, scohen, twilson |
| Keywords: | Triaged | Type: | Bug |
| Hardware: | x86_64 | OS: | Linux |
| Fixed In Version: | python-networking-ovn-7.4.2-2.20220409154873.el8ost | Last Closed: | 2023-03-27 14:16:55 UTC |
| Bug Depends On: | 2143592 | | |
*** This bug has been marked as a duplicate of bug 2143592 ***
Description of problem:
Instance creation sometimes fails with the error message "Failed to allocate the network(s), not rescheduling".

</var/log/containers/nova-compute.log on a compute node in the customer's environment>
~~~
Timeout waiting for [('network-vif-plugged', 'aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa')] for instance with vm_state building and task_state spawning.: eventlet.timeout.Timeout: 300 seconds
~~~

In my lab, the following messages appear in /var/log/containers/neutron/server.log after ovn_controller on a compute node creates the port, but these messages did not appear when the issue occurred in the customer's environment.

</var/log/containers/neutron/server.log on a controller node in my lab environment>
~~~
DEBUG ovsdbapp.backend.ovs_idl.event [-] Matched UPDATE: LogicalSwitchPortUpdateUpEvent(events=('update',), table='Logical_Switch_Port', ........
INFO networking_ovn.ml2.mech_driver [-] OVN reports status up for port: aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa
~~~

I think the cause of this issue is the missing "Matched UPDATE: LogicalSwitchPortUpdateUpEvent" message, but I'm not sure why it did not appear in the customer's environment.

Version-Release number of selected component (if applicable):
- Red Hat OpenStack Platform 16.2.4
- RHEL 8.4
- ml2/OVN

How reproducible:
Create an instance with a port.

Actual results:
Instance creation sometimes fails.
The issue occurs only occasionally, and we have not noticed any pattern.
VM creation is slow only in the cases where it ends up failing with this error.

Expected results:
Instance creation succeeds every time.

Additional information:
A colleague pointed me to the following Bugzilla:
https://bugzilla.redhat.com/show_bug.cgi?id=2065897
I'm guessing it is unrelated, because I cannot find the following message in the customer's environment:
ovsdb_idl|WARN|transaction error: {"details":"cannot delete HA_Chassis_Group row aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa because of 1 remaining reference(s)","error":"referential integrity violation"}
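
To confirm for each failed instance that neutron-server never logged the port coming up, the two logs quoted in the description can be cross-referenced. The following is only a minimal sketch in Python, assuming the log lines have the same shape as the excerpts in this report. The function names are hypothetical helpers and are not part of nova, neutron, or networking-ovn.

```python
import re

# Assumption: port IDs are lowercase hex UUIDs, as in the excerpts in this report.
UUID = r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"

# Matches each ('network-vif-plugged', '<port-id>') tuple in a nova timeout line.
VIF_PLUGGED_RE = re.compile(r"\('network-vif-plugged', '(" + UUID + r")'\)")

# Matches the networking_ovn message logged when OVN reports the port as up.
PORT_UP_RE = re.compile(r"OVN reports status up for port: (" + UUID + r")")


def ports_timed_out(nova_lines):
    """Return port IDs that nova-compute timed out waiting on (hypothetical helper)."""
    ports = set()
    for line in nova_lines:
        if "Timeout waiting for" in line:
            ports.update(VIF_PLUGGED_RE.findall(line))
    return ports


def ports_reported_up(neutron_lines):
    """Return port IDs that neutron-server logged as up (hypothetical helper)."""
    ports = set()
    for line in neutron_lines:
        ports.update(PORT_UP_RE.findall(line))
    return ports


def ports_missing_up_event(nova_lines, neutron_lines):
    """Return timed-out ports that have no matching 'status up' message in neutron."""
    return sorted(ports_timed_out(nova_lines) - ports_reported_up(neutron_lines))
```

A possible way to use it is to pass the opened log files directly, for example `ports_missing_up_event(open(nova_log), open(neutron_log))`, where `nova_log` and `neutron_log` are the paths quoted above. Any port ID returned is one where the timeout was not preceded by the "OVN reports status up" message. The comparison is only meaningful if server.log is collected from every controller node, because the event may be handled by any of them.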