Bug 2143963

Summary: OSP16.2 pre-provisioned node tempest failures due to no network connectivity
Product: Red Hat OpenStack
Reporter: David Rosenfeld <drosenfe>
Component: openstack-neutron
Assignee: Elvira <egarciar>
Status: CLOSED NEXTRELEASE
QA Contact: David Rosenfeld <drosenfe>
Severity: medium
Priority: medium
Version: 16.2 (Train)
CC: chrisw, egarciar, froyo, ihrachys, mburns, scohen, slinaber
Target Milestone: async
Keywords: Reopened, Triaged
Target Release: 16.2 (Train on RHEL 8.4)
Hardware: Unspecified
OS: Unspecified
Clones: 2308660
Last Closed: 2024-09-26 09:10:03 UTC
Type: Bug

Description David Rosenfeld 2022-11-18 14:23:58 UTC
Description of problem: In Phase 3 regression, pre-provisioned node deployments succeed. The Tempest sanity suite is then run, and all tests fail with errors like:

Public network connectivity check failed or SSHTimeout: Connection to the 10.0.0.165 via SSH timed out

The failure does not appear to be in infrared or Tempest, because submitting the latest CDN 16.2 builds to the same job passes all Tempest tests.


Version-Release number of selected component (if applicable): RHOS-16.2-RHEL-8-20221111.n.1


How reproducible: Every time


Steps to Reproduce:
1. Execute one of the Phase 3 pre-provisioned node jobs Ex: https://rhos-ci-jenkins.lab.eng.tlv2.redhat.com/view/DFG/view/df/view/splitstack/job/DFG-df-splitstack-16.2-virsh-3cont_2comp_3ceph-scaleup/

Actual results: Deployment is successful, but Tempest tests fail due to lack of network connectivity.


Expected results: Deployment is successful and Tempest tests pass.



Comment 2 Ihar Hrachyshka 2022-11-21 15:36:13 UTC
The OVN metadata agent logs are full of:

2022-11-14 20:12:07.725 49120 DEBUG ovsdbapp.backend.ovs_idl.event [-] Matched UPDATE: PortBindingChassisCreatedEvent(events=('update',), table='Port_Binding', conditions=None, old_conditions=None) to row=Port_Binding(parent_port=[], chassis=[<ovs.db.idl.Row object at 0x7f2b2b002390>], mac=['fa:16:3e:08:16:92 10.100.0.8'], options={'mcast_flood_reports': 'true', 'requested-chassis': 'compute-1.redhat.local'}, ha_chassis_group=[], type=, tag=[], requested_chassis=[<ovs.db.idl.Row object at 0x7f2b2b002390>], tunnel_key=3, up=[False], logical_port=43151b08-cd1d-43c9-9deb-e3bfd2f3509c, gateway_chassis=[], encap=[], external_ids={'neutron:cidrs': '10.100.0.8/28', 'neutron:device_id': '5ce923ac-2191-4e30-ade7-c0dc2262b815', 'neutron:device_owner': 'compute:nova', 'neutron:network_name': 'neutron-83e49fbe-343d-45b7-bdd2-d436843aa680', 'neutron:port_name': '', 'neutron:project_id': 'c4a54d25daae41d28c49568e443e3469', 'neutron:revision_number': '2', 'neutron:security_group_ids': '08f18423-4579-4cf5-bef1-d5e639d4b767'}, virtual_parent=[], nat_addresses=[], datapath=a43237b8-5a50-4fe2-a714-81e8ea96fa41) old=Port_Binding(chassis=[]) matches /usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/event.py:44
2022-11-14 20:12:07.727 49120 INFO networking_ovn.agent.metadata.agent [-] Port 43151b08-cd1d-43c9-9deb-e3bfd2f3509c in datapath 83e49fbe-343d-45b7-bdd2-d436843aa680 bound to our chassis
2022-11-14 20:12:07.731 49120 DEBUG networking_ovn.agent.metadata.agent [-] Provisioning metadata for network 83e49fbe-343d-45b7-bdd2-d436843aa680 provision_datapath /usr/lib/python3.6/site-packages/networking_ovn/agent/metadata/agent.py:408
2022-11-14 20:12:07.732 49120 DEBUG networking_ovn.agent.metadata.agent [-] There is no metadata port for network 83e49fbe-343d-45b7-bdd2-d436843aa680 or it has no MAC or IP addresses configured, tearing the namespace down if needed provision_datapath /usr/lib/python3.6/site-packages/networking_ovn/agent/metadata/agent.py:418

So metadata is not provisioned?
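To check whether the skipped provisioning affects only one network or every network the tests use, a small log scan can help. The sketch below counts, per network UUID, how often the agent logged the "There is no metadata port for network ..." message quoted above. It assumes only the message text shown in these logs. The function name is hypothetical and is not part of networking_ovn.

```python
import re
from collections import Counter

# DEBUG message the OVN metadata agent logs when it skips a network
# (text taken from the log excerpt above).
NO_METADATA_PORT_RE = re.compile(
    r"There is no metadata port for network ([0-9a-f-]{36})"
)


def networks_without_metadata_port(lines):
    """Count how often provisioning was skipped for each network UUID."""
    skipped = Counter()
    for line in lines:
        match = NO_METADATA_PORT_RE.search(line)
        if match:
            skipped[match.group(1)] += 1
    return skipped
```

Usage: pass an open log file, e.g. `with open(path) as log: networks_without_metadata_port(log).most_common()`, where `path` is a placeholder for wherever the metadata agent log lives on the compute node.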

Comment 3 Elvira 2022-11-30 08:37:07 UTC
ovn-controller.log is filled with entries like these:

2022-11-14T20:23:05.504Z|00068|binding|INFO|Changing chassis for lport cr-lrp-43c4c39c-1c78-43c9-aae9-11de1cb5b3b0 from 6aed2d52-34f5-4bb9-b5cd-578e88ce53aa to 4c1c1f42-78ae-4361-a7f8-1018a1bd9795.
2022-11-14T20:23:05.504Z|00069|binding|INFO|cr-lrp-43c4c39c-1c78-43c9-aae9-11de1cb5b3b0: Claiming fa:16:3e:34:2e:4d 10.0.0.159/24 2620:52:0:13b8::1000:a/64
2022-11-14T20:23:05.519Z|00070|binding|INFO|Changing chassis for lport cr-lrp-43c4c39c-1c78-43c9-aae9-11de1cb5b3b0 from 2307827e-b7b0-4557-a4c2-cff4ee47a3d7 to 4c1c1f42-78ae-4361-a7f8-1018a1bd9795.
2022-11-14T20:23:05.519Z|00071|binding|INFO|cr-lrp-43c4c39c-1c78-43c9-aae9-11de1cb5b3b0: Claiming fa:16:3e:34:2e:4d 10.0.0.159/24 2620:52:0:13b8::1000:a/64
2022-11-14T20:23:05.527Z|00072|binding|INFO|Changing chassis for lport cr-lrp-43c4c39c-1c78-43c9-aae9-11de1cb5b3b0 from 6aed2d52-34f5-4bb9-b5cd-578e88ce53aa to 4c1c1f42-78ae-4361-a7f8-1018a1bd9795.
2022-11-14T20:23:05.527Z|00073|binding|INFO|cr-lrp-43c4c39c-1c78-43c9-aae9-11de1cb5b3b0: Claiming fa:16:3e:34:2e:4d 10.0.0.159/24 2620:52:0:13b8::1000:a/64
2022-11-14T20:23:05.538Z|00074|binding|INFO|Changing chassis for lport cr-lrp-43c4c39c-1c78-43c9-aae9-11de1cb5b3b0 from 2307827e-b7b0-4557-a4c2-cff4ee47a3d7 to 4c1c1f42-78ae-4361-a7f8-1018a1bd9795.
2022-11-14T20:23:05.538Z|00075|binding|INFO|cr-lrp-43c4c39c-1c78-43c9-aae9-11de1cb5b3b0: Claiming fa:16:3e:34:2e:4d 10.0.0.159/24 2620:52:0:13b8::1000:a/64
2022-11-14T20:23:05.551Z|00076|binding|INFO|Changing chassis for lport cr-lrp-43c4c39c-1c78-43c9-aae9-11de1cb5b3b0 from 2307827e-b7b0-4557-a4c2-cff4ee47a3d7 to 4c1c1f42-78ae-4361-a7f8-1018a1bd9795.
2022-11-14T20:23:05.551Z|00077|binding|INFO|cr-lrp-43c4c39c-1c78-43c9-aae9-11de1cb5b3b0: Claiming fa:16:3e:34:2e:4d 10.0.0.159/24 2620:52:0:13b8::1000:a/64
2022-11-14T20:23:05.557Z|00078|binding|INFO|Changing chassis for lport cr-lrp-43c4c39c-1c78-43c9-aae9-11de1cb5b3b0 from 2307827e-b7b0-4557-a4c2-cff4ee47a3d7 to 4c1c1f42-78ae-4361-a7f8-1018a1bd9795.
2022-11-14T20:23:05.557Z|00079|binding|INFO|cr-lrp-43c4c39c-1c78-43c9-aae9-11de1cb5b3b0: Claiming fa:16:3e:34:2e:4d 10.0.0.159/24 2620:52:0:13b8::1000:a/64
2022-11-14T20:23:05.563Z|00080|binding|INFO|Changing chassis for lport cr-lrp-43c4c39c-1c78-43c9-aae9-11de1cb5b3b0 from 2307827e-b7b0-4557-a4c2-cff4ee47a3d7 to 4c1c1f42-78ae-4361-a7f8-1018a1bd9795.
2022-11-14T20:23:05.563Z|00081|binding|INFO|cr-lrp-43c4c39c-1c78-43c9-aae9-11de1cb5b3b0: Claiming fa:16:3e:34:2e:4d 10.0.0.159/24 2620:52:0:13b8::1000:a/64
2022-11-14T20:23:05.569Z|00082|binding|INFO|Changing chassis for lport cr-lrp-43c4c39c-1c78-43c9-aae9-11de1cb5b3b0 from 2307827e-b7b0-4557-a4c2-cff4ee47a3d7 to 4c1c1f42-78ae-4361-a7f8-1018a1bd9795.
2022-11-14T20:23:05.569Z|00083|binding|INFO|cr-lrp-43c4c39c-1c78-43c9-aae9-11de1cb5b3b0: Claiming fa:16:3e:34:2e:4d 10.0.0.159/24 2620:52:0:13b8::1000:a/64
2022-11-14T20:23:05.577Z|00084|binding|INFO|Changing chassis for lport cr-lrp-43c4c39c-1c78-43c9-aae9-11de1cb5b3b0 from 2307827e-b7b0-4557-a4c2-cff4ee47a3d7 to 4c1c1f42-78ae-4361-a7f8-1018a1bd9795.
2022-11-14T20:23:05.577Z|00085|binding|INFO|cr-lrp-43c4c39c-1c78-43c9-aae9-11de1cb5b3b0: Claiming fa:16:3e:34:2e:4d 10.0.0.159/24 2620:52:0:13b8::1000:a/64
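The excerpt shows the gateway port cr-lrp-43c4c39c-1c78-43c9-aae9-11de1cb5b3b0 being moved to chassis 4c1c1f42-78ae-4361-a7f8-1018a1bd9795 again and again within a fraction of a second. Each move comes from one of two previous owners (6aed2d52-... and 2307827e-...). The sketch below tallies those "Changing chassis" messages per logical port, so flapping ports stand out in a full log. It relies only on the message format visible above. The function names and the `threshold` value are hypothetical choices for illustration.

```python
import re
from collections import Counter, defaultdict

# Matches ovn-controller binding messages of the form
# "Changing chassis for lport <name> from <uuid> to <uuid>."
CHANGE_RE = re.compile(
    r"Changing chassis for lport (\S+) "
    r"from ([0-9a-f-]{36}) to ([0-9a-f-]{36})\."
)


def chassis_transitions(lines):
    """Map each logical port to a Counter of (from_chassis, to_chassis) moves."""
    moves = defaultdict(Counter)
    for line in lines:
        match = CHANGE_RE.search(line)
        if match:
            lport, src, dst = match.groups()
            moves[lport][(src, dst)] += 1
    return moves


def flapping_ports(lines, threshold=3):
    """Return logical ports whose chassis changed at least `threshold` times."""
    moves = chassis_transitions(lines)
    return sorted(
        lport for lport, counter in moves.items()
        if sum(counter.values()) >= threshold
    )
```

Counting (from, to) pairs rather than just totals keeps the information about which chassis keep taking the port back.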

Comment 8 Elvira 2024-07-23 10:49:05 UTC
In the gate jobs we can see that an error is triggered during the port update. The resolv.conf in the environment under test lists an IPv6 nameserver with the link-local, zone-scoped address fe80::5054:ff:fe96:8af7%eth2. The driver code (see get_system_dns_resolvers) accepts this as a valid IPv6 address through netutils, but netaddr then fails to parse it:



    2024-07-18 04:37:36.731 16 ERROR neutron.plugins.ml2.managers [req-6b73b85f-d3d4-430d-8f42-43226341debd 3c961ad0090746b788508b60222e9772 45e663e8cc324da89e5272a96a8287de - default default] Mechanism driver 'ovn' failed in update_port_postcommit: netaddr.core.AddrFormatError: failed to detect a valid IP address from 'fe80::5054:ff:fe96:8af7%eth2'
    2024-07-18 04:37:36.731 16 ERROR neutron.plugins.ml2.managers Traceback (most recent call last):
    2024-07-18 04:37:36.731 16 ERROR neutron.plugins.ml2.managers   File "/usr/lib/python3.6/site-packages/neutron/plugins/ml2/managers.py", line 477, in _call_on_drivers
    2024-07-18 04:37:36.731 16 ERROR neutron.plugins.ml2.managers     getattr(driver.obj, method_name)(context)
    2024-07-18 04:37:36.731 16 ERROR neutron.plugins.ml2.managers   File "/usr/lib/python3.6/site-packages/networking_ovn/ml2/mech_driver.py", line 774, in update_port_postcommit
    2024-07-18 04:37:36.731 16 ERROR neutron.plugins.ml2.managers     retry_on_revision_mismatch=True)
    2024-07-18 04:37:36.731 16 ERROR neutron.plugins.ml2.managers   File "/usr/lib/python3.6/site-packages/networking_ovn/ml2/mech_driver.py", line 660, in _ovn_update_port
    2024-07-18 04:37:36.731 16 ERROR neutron.plugins.ml2.managers     self._ovn_client.update_port(port, port_object=original_port)
    2024-07-18 04:37:36.731 16 ERROR neutron.plugins.ml2.managers   File "/usr/lib/python3.6/site-packages/networking_ovn/common/ovn_client.py", line 547, in update_port
    2024-07-18 04:37:36.731 16 ERROR neutron.plugins.ml2.managers     self._update_subnet_dhcp_options(subnet, network, txn)
    2024-07-18 04:37:36.731 16 ERROR neutron.plugins.ml2.managers   File "/usr/lib/python3.6/site-packages/networking_ovn/common/ovn_client.py", line 2088, in _update_subnet_dhcp_options
    2024-07-18 04:37:36.731 16 ERROR neutron.plugins.ml2.managers     new_options = self._get_ovn_dhcp_options(subnet, network, mac)
    2024-07-18 04:37:36.731 16 ERROR neutron.plugins.ml2.managers   File "/usr/lib/python3.6/site-packages/networking_ovn/common/ovn_client.py", line 1912, in _get_ovn_dhcp_options
    2024-07-18 04:37:36.731 16 ERROR neutron.plugins.ml2.managers     subnet, network, server_mac=server_mac)
    2024-07-18 04:37:36.731 16 ERROR neutron.plugins.ml2.managers   File "/usr/lib/python3.6/site-packages/networking_ovn/common/ovn_client.py", line 1975, in _get_ovn_dhcpv4_opts
    2024-07-18 04:37:36.731 16 ERROR neutron.plugins.ml2.managers     dns_servers = utils.get_dhcp_dns_servers(subnet)
    2024-07-18 04:37:36.731 16 ERROR neutron.plugins.ml2.managers   File "/usr/lib/python3.6/site-packages/networking_ovn/common/utils.py", line 551, in get_dhcp_dns_servers
    2024-07-18 04:37:36.731 16 ERROR neutron.plugins.ml2.managers     filter_ips(get_system_dns_resolvers(), ip_version))
    2024-07-18 04:37:36.731 16 ERROR neutron.plugins.ml2.managers   File "/usr/lib/python3.6/site-packages/networking_ovn/common/utils.py", line 546, in filter_ips
    2024-07-18 04:37:36.731 16 ERROR neutron.plugins.ml2.managers     return [ip for ip in ips
    2024-07-18 04:37:36.731 16 ERROR neutron.plugins.ml2.managers   File "/usr/lib/python3.6/site-packages/networking_ovn/common/utils.py", line 547, in <listcomp>
    2024-07-18 04:37:36.731 16 ERROR neutron.plugins.ml2.managers     if netaddr.IPAddress(ip).version == ip_version]
    2024-07-18 04:37:36.731 16 ERROR neutron.plugins.ml2.managers   File "/usr/lib/python3.6/site-packages/netaddr/ip/__init__.py", line 306, in __init__
    2024-07-18 04:37:36.731 16 ERROR neutron.plugins.ml2.managers     'address from %r' % addr)
    2024-07-18 04:37:36.731 16 ERROR neutron.plugins.ml2.managers netaddr.core.AddrFormatError: failed to detect a valid IP address from 'fe80::5054:ff:fe96:8af7%eth2'
    2024-07-18 04:37:36.731 16 ERROR neutron.plugins.ml2.managers 
    2024-07-18 04:37:36.734 16 ERROR neutron.plugins.ml2.plugin [req-6b73b85f-d3d4-430d-8f42-43226341debd 3c961ad0090746b788508b60222e9772 45e663e8cc324da89e5272a96a8287de - default default] mechanism_manager.update_port_postcommit failed for port fe2e8198-5bf8-4fb2-90c0-5359bcd3bd2b: neutron.plugins.ml2.common.exceptions.MechanismDriverError
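The traceback shows filter_ips() passing every resolver string straight to netaddr.IPAddress(), so a single zone-scoped entry aborts update_port_postcommit. The hypothetical function below is a minimal sketch of one way to make that step tolerant; it is not the actual upstream patch. It keeps the filter_ips(ips, ip_version) call shape from the traceback but makes three changes:

- It uses the standard-library ipaddress module in place of netaddr.
- It skips any address carrying a zone index.
- It ignores entries that fail to parse instead of raising.

The explicit "%" check is deliberate. ipaddress only accepts scope IDs from Python 3.9 onward, while this deployment runs Python 3.6, so checking the string directly gives the same result on both versions.

```python
import ipaddress


def filter_ips(ips, ip_version):
    """Hypothetical tolerant variant of networking_ovn's filter_ips().

    Returns the entries of `ips` whose IP version equals `ip_version`,
    skipping zone-scoped addresses (e.g. "fe80::1%eth2") and anything
    the parser rejects, instead of raising an exception.
    """
    result = []
    for ip in ips:
        if "%" in ip:
            # A zone index ties the address to one host interface, so it
            # is not a usable DNS server for guests; skip it.
            continue
        try:
            parsed = ipaddress.ip_address(ip)
        except ValueError:
            continue  # not a valid IP literal; ignore rather than fail
        if parsed.version == ip_version:
            result.append(ip)
    return result
```

Skipping the scoped entry, rather than stripping "%eth2" and keeping the bare fe80:: address, is a design choice. A link-local address is only meaningful on the host's own link, so advertising it to instances over DHCP would likely not help them.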

I'm currently developing a fix for this in neutron master and will backport it to 16.2.