Bug 1955579

Summary: [OVN migration] Networks created before migration have no route to the metadata port
Product: Red Hat OpenStack
Reporter: Eduardo Olivares <eolivare>
Component: python-networking-ovn
Assignee: Rodolfo Alonso <ralonsoh>
Status: CLOSED DUPLICATE
QA Contact: Eran Kuris <ekuris>
Severity: high
Docs Contact:
Priority: unspecified
Version: 16.1 (Train)
CC: apevec, egarciar, lhh, majopela, ralonsoh, scohen
Target Milestone: ---
Keywords: Triaged
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2021-05-11 14:02:15 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Eduardo Olivares 2021-04-30 12:41:32 UTC
Description of problem:
Issue reproduced with the OVN migration job [1]; see build #123. Four tempest tests failed during this job. All of them ran after the OVN migration was completed, and all of them failed due to the same issue:
- The VMs were connected directly to the external network (the external network, called nova, was created before the migration).
- The test cannot connect to those VMs via ssh.

The reason the ssh connection fails is that the VMs were unable to connect to the metadata port when they were spawned.
The same issue affected the tenant networks that were created before the migration.
The issue does not affect networks created after the migration.


The following command shows that the dhcp_options for the network created before the migration (f239b2ea-984f-4da0-9ee6-5527ccd202f6) do not include the classless_static_route parameter, which is why the VMs cannot connect to the metadata IP (169.254.169.254), while the other network includes that route:
[root@controller-0 neutron]# podman exec -it ovn-dbs-bundle-podman-2 ovn-nbctl list dhcp_opt | grep "external_ids\|options"
external_ids        : {"neutron:revision_number"="0", subnet_id="ffad794b-28ab-4f43-861c-1e11db2b51a1"}
options             : {classless_static_route="{169.254.169.254/32,1.2.3.2, 0.0.0.0/0,1.2.3.1}", dns_server="{172.16.0.1, 10.0.0.1}", domain_name="\"openstackgate.local\"", lease_time="43200", mtu="1442", router="1.2.3.1", server_id="1.2.3.1", server_mac="fa:16:3e:65:c0:88"}
external_ids        : {"neutron:revision_number"="0", subnet_id="d80e1fc0-3b1e-46c6-a904-e2d3b4a0b64c"}
options             : {dns_server="{172.16.0.1, 10.0.0.1}", domain_name="\"openstackgate.local\"", lease_time="43200", mtu="1442", router="192.168.168.1", server_id="192.168.168.1", server_mac="fa:16:3e:57:0e:9f"}
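The check above can also be done programmatically. The following is a minimal Python sketch, not part of networking-ovn, that scans text shaped like the grep output above and reports the subnets whose DHCP options carry no route to the metadata IP. The function name subnets_missing_metadata_route and the SAMPLE text (the option values are shortened) are assumptions made for illustration.

```python
import re

# Text shaped like the `ovn-nbctl list dhcp_opt | grep ...` output in this
# report. Option values are shortened for readability (illustrative sample).
SAMPLE = '''\
external_ids        : {"neutron:revision_number"="0", subnet_id="ffad794b-28ab-4f43-861c-1e11db2b51a1"}
options             : {classless_static_route="{169.254.169.254/32,1.2.3.2, 0.0.0.0/0,1.2.3.1}", router="1.2.3.1"}
external_ids        : {"neutron:revision_number"="0", subnet_id="d80e1fc0-3b1e-46c6-a904-e2d3b4a0b64c"}
options             : {dns_server="{172.16.0.1, 10.0.0.1}", router="192.168.168.1"}
'''

SUBNET_RE = re.compile(r'subnet_id="([0-9a-f-]+)"')


def subnets_missing_metadata_route(text, metadata_ip="169.254.169.254"):
    """Return subnet IDs whose options line has no /32 route to metadata_ip.

    Hypothetical helper: it assumes each `external_ids` line is followed by
    the `options` line of the same DHCP_Options row, as in the output above.
    """
    missing = []
    subnet_id = None
    for line in text.splitlines():
        # Split on the first colon only; later colons belong to the value.
        key, _, value = line.partition(":")
        key = key.strip()
        if key == "external_ids":
            match = SUBNET_RE.search(value)
            subnet_id = match.group(1) if match else None
        elif key == "options" and subnet_id is not None:
            if f"{metadata_ip}/32" not in value:
                missing.append(subnet_id)
            subnet_id = None
    return missing


if __name__ == "__main__":
    for subnet in subnets_missing_metadata_route(SAMPLE):
        print("no metadata route in dhcp_options for subnet", subnet)
```

With the sample text, only subnet d80e1fc0-3b1e-46c6-a904-e2d3b4a0b64c is reported, which matches the entry without classless_static_route in the output above.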



According to the ovn-db-sync logs, the metadata port for the external network was created, but it could not be found [2]:
2021-04-29 16:49:59.207 60 WARNING networking_ovn.ovn_db_sync [req-a515e376-7b76-4c70-9de0-d4525a4ae750 - - - - -] Metadata port 40b802e8-ec99-4bb9-9543-446f04131821 for network de06a6ac-23ca-44c6-bc03-115b48818cc5 found in Neutron but not in OVN
2021-04-29 16:49:59.207 60 WARNING networking_ovn.ovn_db_sync [req-a515e376-7b76-4c70-9de0-d4525a4ae750 - - - - -] Creating metadata port 40b802e8-ec99-4bb9-9543-446f04131821 for network de06a6ac-23ca-44c6-bc03-115b48818cc5 in OVN
...
2021-04-29 16:49:59.320 60 DEBUG networking_ovn.db.revision [req-a515e376-7b76-4c70-9de0-d4525a4ae750 - - - - -] create_initial_revision uuid=40b802e8-ec99-4bb9-9543-446f04131821, type=ports, rev=-1 create_initial_revision /usr/lib/python3.6/site-packages/networking_ovn/db/revision.py:59
2021-04-29 16:49:59.506 60 ERROR networking_ovn.common.ovn_client [req-a515e376-7b76-4c70-9de0-d4525a4ae750 - - - - -] Metadata port couldn't be found for network de06a6ac-23ca-44c6-bc03-115b48818cc5
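To find every affected network in a sync log rather than reading it by eye, the ERROR line above can be matched with a regular expression. This is a minimal sketch under the assumption that the message format is exactly the one shown in this report; the function name networks_without_metadata_port is hypothetical.

```python
import re

# Two of the log lines quoted in this report, used as sample input.
SAMPLE_LOG = """\
2021-04-29 16:49:59.207 60 WARNING networking_ovn.ovn_db_sync [req-a515e376-7b76-4c70-9de0-d4525a4ae750 - - - - -] Creating metadata port 40b802e8-ec99-4bb9-9543-446f04131821 for network de06a6ac-23ca-44c6-bc03-115b48818cc5 in OVN
2021-04-29 16:49:59.506 60 ERROR networking_ovn.common.ovn_client [req-a515e376-7b76-4c70-9de0-d4525a4ae750 - - - - -] Metadata port couldn't be found for network de06a6ac-23ca-44c6-bc03-115b48818cc5
"""

# Assumption: the message text is stable across the log; adjust if it differs.
NOT_FOUND_RE = re.compile(
    r"ERROR \S+ \[[^\]]*\] Metadata port couldn't be found "
    r"for network (?P<network>[0-9a-f-]{36})"
)


def networks_without_metadata_port(log_text):
    """Return network IDs reported as lacking a metadata port, in log order."""
    seen = []
    for line in log_text.splitlines():
        match = NOT_FOUND_RE.search(line)
        if match and match.group("network") not in seen:
            seen.append(match.group("network"))
    return seen


if __name__ == "__main__":
    for network in networks_without_metadata_port(SAMPLE_LOG):
        print("metadata port not found for network", network)
```

For real use, the text of the neutron-ovn-db-sync-util.log file referenced in [2] would be passed in instead of SAMPLE_LOG.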






Version-Release number of selected component (if applicable):
RHOS-16.1-RHEL-8-20210421.n.1


Steps to Reproduce:
1. Run ovn migration job [1]


How reproducible:
2/3 -> build #121 did not reproduce it, builds #122 and #123 did
See this job [1]. All three builds were executed with RHOS-16.1-RHEL-8-20210421.n.1. The difference between them is that the workload created before the migration was deleted after the migration in build #121, but was not deleted in builds #122 and #123 (see [3]).


[1] https://rhos-ci-jenkins.lab.eng.tlv2.redhat.com/view/DFG/view/network/view/networking-ovn/job/DFG-network-networking-ovn-16.1_director-rhel-virthost-3cont_2comp-ipv4-vxlan-ml2ovs-to-ovn-migration/
[2] http://rhos-ci-logs.lab.eng.tlv2.redhat.com/logs/rcj/DFG-network-networking-ovn-16.1_director-rhel-virthost-3cont_2comp-ipv4-vxlan-ml2ovs-to-ovn-migration/123/controller-0/var/log/containers/neutron/neutron-ovn-db-sync-util.log.gz
[3] https://code.engineering.redhat.com/gerrit/#/c/238068/

Comment 2 Elvira 2021-05-11 14:02:15 UTC
The patch that fixes this has already been submitted and is related to https://bugzilla.redhat.com/show_bug.cgi?id=1956060, so this bug can be considered a duplicate.

*** This bug has been marked as a duplicate of bug 1956060 ***