Bug 2103149
| Summary: | compute tempest tests are failing with ssh connection timed out | | |
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | myadla |
| Component: | openstack-neutron | Assignee: | Jakub Libosvar <jlibosva> |
| Status: | CLOSED DUPLICATE | QA Contact: | Eran Kuris <ekuris> |
| Severity: | urgent | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 17.0 (Wallaby) | CC: | bgibizer, chrisw, dasmith, eglynn, elicohen, jhakimra, jkreger, jlibosva, kchamart, mlavalle, myadla, sbaker, sbauza, scohen, sgordon, skovili, smooney, vromanso |
| Target Milestone: | --- | Flags: | elicohen: needinfo+ |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2022-08-17 18:08:17 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
myadla
2022-07-01 14:49:33 UTC
I'm moving this to the Networking DFG and the neutron component. I believe what is happening here is caused by a delta in behaviour between how ML2/OVS works and how ML2/OVN works. I think there is a regression in when OVN creates the metadata proxy.

For ML2/OVS or ML2/Linux Bridge, the metadata proxy is configurable and is provided either via the L3 agent or the DHCP agent. When a neutron network or router is created, the metadata proxy for that network/subnet is also created. As a result, with ML2/OVS, ML2/Linux Bridge, and most other backends, the metadata proxy is created before the port is created and before it is ever bound.

For ML2/OVN I believe this is not the case: the metadata proxy is only instantiated after the port is created and bound to a host. As such, there is no guarantee that it is provisioned before the VM is booted, because the OVN mech driver does not ensure the metadata proxy is operational before it sends the network-vif-plugged event. This would introduce a race between the metadata proxy being provisioned and the VM being unpaused. I say "would" because, while I have observed some bug reports and had IRC conversations suggesting this is in fact what is happening, I have not proved it from the logs in this case.

Can someone from the Networking DFG look at this case and review the provisioning-blocks code in neutron, to ensure that when neutron sends network-vif-plugged the OVN mech driver has ensured the metadata proxy is provisioned and functional, so that we cannot race on its creation when nova starts the VM? The contract between nova and neutron is that neutron must not send network-vif-plugged until all networking for the port is configured. That includes the metadata proxy, even if that was an implicit requirement previously due to the architecture of other drivers. For OVN we may need to make this an explicit dependency to ensure there is no race.
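The provisioning-blocks pattern described above can be sketched as follows. This is a simplified, self-contained illustration, not actual Neutron code: the class name `ProvisioningBlocks` and the component names are hypothetical stand-ins for the real machinery in Neutron's `provisioning_blocks` module. The point is only to show why the vif-plugged notification must wait on every registered component, including a metadata-proxy block.

```python
# Hypothetical sketch of the provisioning-blocks pattern: the
# network-vif-plugged notification is emitted only once every
# registered component (L2 wiring, metadata proxy, ...) has
# reported the port ready. Names are illustrative, not Neutron APIs.

L2 = "L2"
METADATA_PROXY = "METADATA_PROXY"  # the block OVN would need to register


class ProvisioningBlocks:
    def __init__(self, on_complete):
        self._blocks = {}            # port_id -> set of pending components
        self._on_complete = on_complete

    def add_block(self, port_id, component):
        """Record that a component has started provisioning for a port."""
        self._blocks.setdefault(port_id, set()).add(component)

    def provisioning_complete(self, port_id, component):
        """Mark a component done; fire the callback only when no
        blocks remain for the port."""
        pending = self._blocks.get(port_id)
        if pending is None:
            return
        pending.discard(component)
        if not pending:
            del self._blocks[port_id]
            self._on_complete(port_id)


events = []
blocks = ProvisioningBlocks(
    lambda pid: events.append(("network-vif-plugged", pid)))

# Port binding starts: both L2 and the metadata proxy must finish first.
blocks.add_block("port-1", L2)
blocks.add_block("port-1", METADATA_PROXY)

blocks.provisioning_complete("port-1", L2)
assert events == []  # still waiting on the metadata proxy

blocks.provisioning_complete("port-1", METADATA_PROXY)
assert events == [("network-vif-plugged", "port-1")]
```

If the metadata-proxy block is never registered (the suspected ML2/OVN behaviour), the L2 completion alone would fire the event, and nova would unpause the VM while the proxy is still being set up.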
If you think we have overlooked something, feel free to send this back to us. I can't see anything wrong that would prevent the DHCP server from working for the given port:

```
2022-06-28T21:29:25.720Z|18232|binding|INFO|Claiming lport 6770bdf5-5aaf-45d7-9051-5aeea2eda8a2 for this chassis.
2022-06-28T21:29:25.720Z|18233|binding|INFO|6770bdf5-5aaf-45d7-9051-5aeea2eda8a2: Claiming fa:16:3e:b1:98:e7 10.100.0.7
2022-06-28T21:29:25.762Z|18245|binding|INFO|Setting lport 6770bdf5-5aaf-45d7-9051-5aeea2eda8a2 ovn-installed in OVS
2022-06-28T21:29:25.762Z|18246|binding|INFO|Setting lport 6770bdf5-5aaf-45d7-9051-5aeea2eda8a2 up in Southbound
```

The port was claimed and the OpenFlow rules were installed. I'll probably need to see this live to check whether there is an issue with the flows.

I noticed the environment has a small misconfiguration on the OVN side: the bridge mappings on the compute node are set for br-ex, but the br-ex bridge is not created and DVR is not enabled. If this is a non-DVR environment, the bridge mappings shouldn't be set on the compute nodes.

I executed the failing test with another RHEL 8.2 image that contains the fix for bug 1846393 - rhel-guest-image-8.2-326.x86_64.qcow2 - and it passes. I'm closing it as a duplicate.

*** This bug has been marked as a duplicate of bug 1846393 ***
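For reference, the bridge-mappings mismatch noted above can be checked on a compute node with standard OVS commands. This is a config-inspection sketch; it assumes the usual `ovn-bridge-mappings` key in the Open_vSwitch table's `external_ids` and the bridge name `br-ex` from this environment.

```shell
# Show the OVN bridge mappings configured on this chassis,
# e.g. "datacentre:br-ex" (physical network -> OVS bridge).
ovs-vsctl get Open_vSwitch . external_ids:ovn-bridge-mappings

# Check whether the mapped bridge actually exists; br-exists
# exits 0 if the bridge is present and 2 if it is not.
ovs-vsctl br-exists br-ex && echo "br-ex exists" || echo "br-ex missing"
```

On a non-DVR compute node, either the mapping should be absent or the mapped bridge should exist; a mapping that points at a missing bridge is the inconsistency described in the comment.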