Description of problem:

2022-06-28 21:29:31,655 366044 INFO [tempest.lib.common.ssh] Creating ssh connection to '10.46.47.191:22' as 'cloud-user' with public key authentication
2022-06-28 21:29:34,730 366044 WARNING [tempest.lib.common.ssh] Failed to establish authenticated ssh connection to cloud-user@10.46.47.191 ([Errno None] Unable to connect to port 22 on 10.46.47.191). Number attempts: 1. Retry after 2 seconds ...
2022-06-28 21:34:34,058 366044 ERROR [tempest.lib.common.ssh] Failed to establish authenticated ssh connection to cloud-user@10.46.47.191 after 20 attempts. Proxy client: no proxy client
2022-06-28 21:34:34.058 366044 ERROR tempest.lib.common.ssh Traceback (most recent call last):
2022-06-28 21:34:34.058 366044 ERROR tempest.lib.common.ssh   File "/usr/lib/python3.9/site-packages/tempest/lib/common/ssh.py", line 131, in _get_ssh_connection
2022-06-28 21:34:34.058 366044 ERROR tempest.lib.common.ssh     ssh.connect(self.host, port=self.port, username=self.username,
2022-06-28 21:34:34.058 366044 ERROR tempest.lib.common.ssh   File "/usr/lib/python3.9/site-packages/paramiko/client.py", line 368, in connect
2022-06-28 21:34:34.058 366044 ERROR tempest.lib.common.ssh     raise NoValidConnectionsError(errors)
2022-06-28 21:34:34.058 366044 ERROR tempest.lib.common.ssh paramiko.ssh_exception.NoValidConnectionsError: [Errno None] Unable to connect to port 22 on 10.46.47.191

Tempest log: https://rhos-ci-jenkins.lab.eng.tlv2.redhat.com/view/DFG/view/enterprise/view/scenario/job/DFG-enterprise-baremetal-scenario-17.0-3control_2compute_1freeipavm_externalceph-anycluster_tls/3/testReport/tempest.api.compute.admin.test_create_server/ServersWithSpecificFlavorTestJSON/test_verify_created_server_ephemeral_disk_id_b3c7bcfc_bb5b_4e22_b517_c7f686b802ca_/
Logs: https://rhos-ci-jenkins.lab.eng.tlv2.redhat.com/view/DFG/view/enterprise/view/scenario/job/DFG-enterprise-baremetal-scenario-17.0-3control_2compute_1freeipavm_externalceph-anycluster_tls/3/testReport/
Failed tests: http://pastebin.test.redhat.com/1062420

Version-Release number of selected component (if applicable):
core_puddle: RHOS-17.0-RHEL-9-20220623.n.1

How reproducible:
100%

Steps to Reproduce:
1. Deploy OpenStack 17.0 via Jenkins: https://rhos-ci-jenkins.lab.eng.tlv2.redhat.com/view/DFG/view/enterprise/view/scenario/job/DFG-enterprise-baremetal-scenario-17.0-3control_2compute_1freeipavm_externalceph-anycluster_tls/
2. At the "Second Tempest Run" stage, tempest tests fail.

Actual results:
25 compute tests are failing.

Expected results:
All tempest runs should pass.

Additional info:
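For anyone who wants to reproduce the connectivity check outside of tempest, here is a minimal sketch of what tempest.lib.common.ssh is doing when it fails above. The host, username, and key path are placeholders taken from this run; adjust for your environment.

```python
import time

import paramiko

HOST = '10.46.47.191'                    # floating IP from the failing test
USERNAME = 'cloud-user'
KEY_FILE = '/path/to/test_keypair.pem'   # placeholder: key created by the test

client = paramiko.SSHClient()
client.set_missing_host_key_policy(paramiko.AutoAddPolicy())

# Tempest retries with a short backoff and gives up after 20 attempts here.
for attempt in range(1, 21):
    try:
        client.connect(HOST, port=22, username=USERNAME,
                       key_filename=KEY_FILE, timeout=10,
                       look_for_keys=False, allow_agent=False)
        print('connected on attempt', attempt)
        break
    except paramiko.ssh_exception.NoValidConnectionsError:
        print('attempt %d failed, retrying in 2s' % attempt)
        time.sleep(2)
else:
    raise SystemExit('no SSH connectivity to %s:22' % HOST)
```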
I'm moving this to the Networking DFG and the neutron component.

I believe what is happening here is caused by a behavioural delta between ml2/ovs and ml2/ovn: I think there is a regression in when OVN creates the metadata proxy. For ml2/ovs and ml2/linuxbridge the metadata proxy is configurable as provided by either the L3 agent or the DHCP agent, and it is created when the neutron network or router is created. As a result, with ml2/ovs, ml2/linuxbridge, and most other backends, the metadata proxy exists before the port is created and before the port is ever bound.

For ml2/ovn I believe this is not the case: the metadata proxy is only instantiated after the port is created and bound to a host. There is therefore no guarantee that it is provisioned before the VM boots, because the OVN mech driver does not ensure the metadata proxy is operational before it sends the network-vif-plugged event. That would introduce a race between the metadata proxy being provisioned and the VM being unpaused. I say "would" because, while I have observed bug reports and had IRC conversations suggesting this is in fact what is happening, I have not proved it from the logs in this case.

Can someone from the Networking DFG look at this and review the provisioning blocks code in neutron, to ensure that when neutron sends network-vif-plugged the OVN mech driver has confirmed the metadata proxy is provisioned and functional, so we cannot race on its creation when nova starts the VM? The contract between nova and neutron is that neutron must not send network-vif-plugged until all networking for the port is configured. That includes the metadata proxy, even if it was previously only an implicit requirement due to the architecture of the other drivers. For OVN we may need to make this an explicit dependency to ensure there is no race; a rough sketch of what I mean follows. If you think we have overlooked something, feel free to send this back to us.
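As a sketch only, this is roughly how the OVN mech driver could gate network-vif-plugged on metadata readiness using neutron's existing provisioning-blocks machinery. The METADATA_READY entity name is hypothetical (not something neutron defines today), and the callback wiring is omitted:

```python
from neutron.db import provisioning_blocks
from neutron_lib.callbacks import resources

METADATA_READY = 'OVN_METADATA'  # hypothetical entity name, for illustration


def register_metadata_block(context, port):
    # Block the port's transition to ACTIVE (and therefore the
    # network-vif-plugged notification to nova) until the metadata
    # component reports ready, in addition to the usual L2 block.
    provisioning_blocks.add_provisioning_component(
        context, port['id'], resources.PORT, METADATA_READY)


def on_metadata_provisioned(context, port_id):
    # Called once the metadata agent has its proxy up for the port's
    # network; releasing the last outstanding block lets the port go
    # ACTIVE and neutron emit network-vif-plugged.
    provisioning_blocks.provisioning_complete(
        context, port_id, resources.PORT, METADATA_READY)
```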
I can't see anything wrong that would prevent the DHCP server from working for the given port:

2022-06-28T21:29:25.720Z|18232|binding|INFO|Claiming lport 6770bdf5-5aaf-45d7-9051-5aeea2eda8a2 for this chassis.
2022-06-28T21:29:25.720Z|18233|binding|INFO|6770bdf5-5aaf-45d7-9051-5aeea2eda8a2: Claiming fa:16:3e:b1:98:e7 10.100.0.7
2022-06-28T21:29:25.762Z|18245|binding|INFO|Setting lport 6770bdf5-5aaf-45d7-9051-5aeea2eda8a2 ovn-installed in OVS
2022-06-28T21:29:25.762Z|18246|binding|INFO|Setting lport 6770bdf5-5aaf-45d7-9051-5aeea2eda8a2 up in Southbound

The port was claimed and the OpenFlow flows were installed. I'll probably need to see this live to tell whether there is an issue with the flows.

I also noticed the environment has a small misconfiguration on the OVN side: the bridge mappings on the compute nodes reference br-ex, but the br-ex bridge is not created and DVR is not enabled. In a non-DVR environment the bridge mappings should not be set on the compute nodes. The snippet below is a quick way to spot this kind of mismatch.
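Something along these lines can be run on a compute node to flag mappings that point at bridges which do not exist. It shells out to standard ovs-vsctl commands; treat it as a triage sketch rather than a supported tool:

```python
import subprocess


def _ovs(*args):
    # Thin wrapper around the ovs-vsctl CLI.
    return subprocess.check_output(('ovs-vsctl',) + args, text=True).strip()


bridges = set(_ovs('list-br').splitlines())
try:
    raw = _ovs('get', 'Open_vSwitch', '.', 'external_ids:ovn-bridge-mappings')
except subprocess.CalledProcessError:
    raw = ''  # key not set on this node

# ovn-bridge-mappings looks like "datacentre:br-ex,tenant:br-tenant"
for mapping in filter(None, raw.strip('"').split(',')):
    net, _, bridge = mapping.partition(':')
    if bridge not in bridges:
        print(f'mapping {net!r} points at missing bridge {bridge!r}')
```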
I executed the failing test with another RHEL 8.2 image that contains the fix for bug 1846393 - rhel-guest-image-8.2-326.x86_64.qcow2 - and it passes. I'm closing this as a duplicate.

*** This bug has been marked as a duplicate of bug 1846393 ***