Bug 1994427 - [OVN] SR-IOV port cannot connect to metadata
Summary: [OVN] SR-IOV port cannot connect to metadata
Keywords:
Status: ON_DEV
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: python-networking-ovn
Version: 16.2 (Train)
Hardware: Unspecified
OS: Unspecified
medium
high
Target Milestone: z6
: 16.2 (Train on RHEL 8.4)
Assignee: Elvira
QA Contact: Eran Kuris
URL:
Whiteboard:
Depends On: 2196286 1974898 2189267
Blocks:
TreeView+ depends on / blocked
 
Reported: 2021-08-17 09:12 UTC by Eduardo Olivares
Modified: 2023-08-16 08:03 UTC (History)
9 users (show)

Fixed In Version: ovn22.03-22.03.0-95.el8fdp ovn22.03-22.03.0-95.el9fdp
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker OSP-7124 0 None None None 2021-11-15 12:57:02 UTC

Description Eduardo Olivares 2021-08-17 09:12:26 UTC
Description of problem:
This issue is similar to BZ1985717, which was verified recently. I think it is a different issue because it happens very infrequently. While BZ1985717 seemed to be a deterministic issue, this bug seems to be related to some race condition.

This warning message from the VM console logs shows that the metadata service could not be contacted:
[   38.974877] cloud-init[812]: Cloud-init v. 18.5 running 'init' at Mon, 16 Aug 2021 11:15:24 +0000. Up 28.88 seconds.
[   38.977105] cloud-init[812]: ci-info: ++++++++++++++++++++++++++++++++++++++++Net device info++++++++++++++++++++++++++++++++++++++++
[   38.979302] cloud-init[812]: ci-info: +--------+------+------------------------------+-----------------+--------+-------------------+
[   38.981469] cloud-init[812]: ci-info: | Device |  Up  |           Address            |       Mask      | Scope  |     Hw-Address    |
[   38.983646] cloud-init[812]: ci-info: +--------+------+------------------------------+-----------------+--------+-------------------+
[   38.985801] cloud-init[812]: ci-info: |  eth0  | True |         10.46.21.241         | 255.255.255.192 | global | fa:16:3e:a2:5c:c9 |
[   38.987963] cloud-init[812]: ci-info: |  eth0  | True | fe80::f816:3eff:fea2:5cc9/64 |        .        |  link  | fa:16:3e:a2:5c:c9 |
[   38.990136] cloud-init[812]: ci-info: |   lo   | True |          127.0.0.1           |    255.0.0.0    |  host  |         .         |
[   38.992279] cloud-init[812]: ci-info: |   lo   | True |           ::1/128            |        .        |  host  |         .         |
[   38.994445] cloud-init[812]: ci-info: +--------+------+------------------------------+-----------------+--------+-------------------+
[   38.996601] cloud-init[812]: ci-info: ++++++++++++++++++++++++++++++++Route IPv4 info+++++++++++++++++++++++++++++++++
[   38.998586] cloud-init[812]: ci-info: +-------+-----------------+--------------+-----------------+-----------+-------+
[   39.000690] cloud-init[812]: ci-info: | Route |   Destination   |   Gateway    |     Genmask     | Interface | Flags |
[   39.002692] cloud-init[812]: ci-info: +-------+-----------------+--------------+-----------------+-----------+-------+
[   39.004696] cloud-init[812]: ci-info: |   0   |     0.0.0.0     | 10.46.21.254 |     0.0.0.0     |    eth0   |   UG  |
[   39.006704] cloud-init[812]: ci-info: |   1   |   10.46.21.192  |   0.0.0.0    | 255.255.255.192 |    eth0   |   U   |
[   39.008691] cloud-init[812]: ci-info: |   2   | 169.254.169.254 | 10.46.21.213 | 255.255.255.255 |    eth0   |  UGH  |
[   39.010681] cloud-init[812]: ci-info: +-------+-----------------+--------------+-----------------+-----------+-------+
[   39.012669] cloud-init[812]: ci-info: +++++++++++++++++++Route IPv6 info+++++++++++++++++++
[   39.014339] cloud-init[812]: ci-info: +-------+-------------+---------+-----------+-------+
[   39.015997] cloud-init[812]: ci-info: | Route | Destination | Gateway | Interface | Flags |
[   39.017661] cloud-init[812]: ci-info: +-------+-------------+---------+-----------+-------+
[   39.019341] cloud-init[812]: ci-info: |   1   |  fe80::/64  |    ::   |    eth0   |   U   |
[   39.021013] cloud-init[812]: ci-info: |   3   |   ff00::/8  |    ::   |    eth0   |   U   |
[   39.022677] cloud-init[812]: ci-info: +-------+-------------+---------+-----------+-------+
[   39.024349] cloud-init[812]: 2021-08-16 11:15:34,391 - util.py[WARNING]: No active metadata service found



The corresponding neutron port is 2bdd32d4-efea-477c-92d3-ec84cbdaa61c:
$ openstack port list --fixed-ip ip-address=10.46.21.241 -c ID -f value
2bdd32d4-efea-477c-92d3-ec84cbdaa61c


For some reason, that port was bound and unbound from the chassis for several times. That took more than a minute and I think this is the reason why the metadata information for this port was not ready yet:
2021-08-16 11:14:12.734 111507 INFO networking_ovn.agent.metadata.agent [-] Port 14f0abf0-4704-40e7-8a02-2b58bf584fa0 in datapath 7e3c462c-7771-49ff-afb9-e6d7ecd620c6 unbound from our chassis                                              
2021-08-16 11:14:13.473 111507 INFO networking_ovn.agent.metadata.agent [-] Port 65446967-02b4-4a76-991d-a1c2ec1410d2 in datapath 7e3c462c-7771-49ff-afb9-e6d7ecd620c6 unbound from our chassis                                              
2021-08-16 11:14:13.483 111507 INFO networking_ovn.agent.metadata.agent [-] Cleaning up ovnmeta-7e3c462c-7771-49ff-afb9-e6d7ecd620c6 namespace which is not needed anymore                                                                   
2021-08-16 11:14:46.288 111507 INFO networking_ovn.agent.metadata.agent [-] Port 2bdd32d4-efea-477c-92d3-ec84cbdaa61c in datapath 7e3c462c-7771-49ff-afb9-e6d7ecd620c6 bound to our chassis
2021-08-16 11:14:48.616 111507 INFO networking_ovn.agent.metadata.agent [-] Port 2bdd32d4-efea-477c-92d3-ec84cbdaa61c in datapath 7e3c462c-7771-49ff-afb9-e6d7ecd620c6 unbound from our chassis
2021-08-16 11:14:48.802 111507 INFO networking_ovn.agent.metadata.agent [-] Port 2bdd32d4-efea-477c-92d3-ec84cbdaa61c in datapath 7e3c462c-7771-49ff-afb9-e6d7ecd620c6 bound to our chassis
2021-08-16 11:14:48.984 111507 INFO networking_ovn.agent.metadata.agent [-] Port 2bdd32d4-efea-477c-92d3-ec84cbdaa61c in datapath 7e3c462c-7771-49ff-afb9-e6d7ecd620c6 unbound from our chassis
2021-08-16 11:14:49.155 111507 INFO networking_ovn.agent.metadata.agent [-] Port 2bdd32d4-efea-477c-92d3-ec84cbdaa61c in datapath 7e3c462c-7771-49ff-afb9-e6d7ecd620c6 bound to our chassis
2021-08-16 11:14:49.330 111507 INFO networking_ovn.agent.metadata.agent [-] Port 2bdd32d4-efea-477c-92d3-ec84cbdaa61c in datapath 7e3c462c-7771-49ff-afb9-e6d7ecd620c6 unbound from our chassis
...
2021-08-16 11:15:36.888 111507 INFO networking_ovn.agent.metadata.agent [-] Port 2bdd32d4-efea-477c-92d3-ec84cbdaa61c in datapath 7e3c462c-7771-49ff-afb9-e6d7ecd620c6 unbound from our chassis
2021-08-16 11:15:37.056 111507 INFO networking_ovn.agent.metadata.agent [-] Port 2bdd32d4-efea-477c-92d3-ec84cbdaa61c in datapath 7e3c462c-7771-49ff-afb9-e6d7ecd620c6 bound to our chassis



When I rebooted the VM, the metadata information was obtained successfully:
[   42.184307] cloud-init[787]: Cloud-init v. 18.5 running 'init' at Mon, 16 Aug 2021 11:29:40 +0000. Up 30.23 seconds.
...
[[0;32m  OK  [0m] Started Initial cloud-init job (metadata service crawler).

As we can see in the ovn-metadata-agent logs, several requests were received from the VM IP:
2021-08-16 11:29:41.426 111911 INFO eventlet.wsgi.server [-] 10.46.21.241,<local> "GET /openstack HTTP/1.1" status: 200  len: 230 time: 0.4844251                                                                                            
2021-08-16 11:29:41.907 111911 INFO eventlet.wsgi.server [-] 10.46.21.241,<local> "GET /openstack HTTP/1.1" status: 200  len: 230 time: 0.4769990                                                                                            
2021-08-16 11:29:43.014 111911 INFO eventlet.wsgi.server [-] 10.46.21.241,<local> "GET /openstack/2018-08-27/meta_data.json HTTP/1.1" status: 200  len: 2187 time: 1.1038783
2021-08-16 11:29:43.514 111911 INFO eventlet.wsgi.server [-] 10.46.21.241,<local> "GET /openstack/2018-08-27/user_data HTTP/1.1" status: 200  len: 135 time: 0.4969020
2021-08-16 11:29:44.043 111911 INFO eventlet.wsgi.server [-] 10.46.21.241,<local> "GET /openstack/2018-08-27/vendor_data.json HTTP/1.1" status: 200  len: 128 time: 0.5247002
2021-08-16 11:29:45.139 111911 INFO eventlet.wsgi.server [-] 10.46.21.241,<local> "GET /openstack/2018-08-27/network_data.json HTTP/1.1" status: 200  len: 451 time: 1.0909605
...





I will provide the full console logs from two VMs that were spawned almost simultaneously and failed to contact the metadata service and the ovn metadata agent log file.






Version-Release number of selected component (if applicable):
RHOS-16.2-RHEL-8-20210811.n.1
[root@controller-1 ~]# podman exec -ituroot ovn_metadata_agent rpm -qa | grep ovn
python3-networking-ovn-7.4.2-2.20210601204824.el8ost.10.noarch
python3-networking-ovn-metadata-agent-7.4.2-2.20210601204824.el8ost.10.noarch
[root@controller-1 ~]# podman exec -ituroot ovn_controller rpm -qa | grep ovn                                                                                                                                                                
ovn-2021-21.06.0-17.el8fdp.x86_64
rhosp-ovn-2021-4.el8ost.1.noarch
ovn-2021-host-21.06.0-17.el8fdp.x86_64
rhosp-ovn-host-2021-4.el8ost.1.noarch


How reproducible:
It doesn't happen often. I had to rerun all our tempest SR-IOV tests several times in order to reproduce it once.

Steps to Reproduce:
1. Create a direct type port, connected to the external network
2. Create a VM with that port
3. Check the VM could not connect to the metadata service (hostname is wrong, authorized_keys file is empty)

Actual results:
Metadata service is not accessible from a VM so SSH key can not be obtained. It is not possible to connect to VM using SSH.

Expected results:
Metadata service is accessible from VM. It is possible to connect to VM using SSH.


Note You need to log in before you can comment on or make changes to this bug.