DescriptionEduardo Olivares
2021-08-17 09:12:26 UTC
Description of problem:
This issue is similar to BZ1985717, which was verified recently. I think it is a different issue because it happens very infrequently. While BZ1985717 seemed to be a deterministic issue, this bug seems to be related to some race condition.
This warning message from the VM console logs shows that the metadata service could not be contacted:
[ 38.974877] cloud-init[812]: Cloud-init v. 18.5 running 'init' at Mon, 16 Aug 2021 11:15:24 +0000. Up 28.88 seconds.
[ 38.977105] cloud-init[812]: ci-info: ++++++++++++++++++++++++++++++++++++++++Net device info++++++++++++++++++++++++++++++++++++++++
[ 38.979302] cloud-init[812]: ci-info: +--------+------+------------------------------+-----------------+--------+-------------------+
[ 38.981469] cloud-init[812]: ci-info: | Device | Up | Address | Mask | Scope | Hw-Address |
[ 38.983646] cloud-init[812]: ci-info: +--------+------+------------------------------+-----------------+--------+-------------------+
[ 38.985801] cloud-init[812]: ci-info: | eth0 | True | 10.46.21.241 | 255.255.255.192 | global | fa:16:3e:a2:5c:c9 |
[ 38.987963] cloud-init[812]: ci-info: | eth0 | True | fe80::f816:3eff:fea2:5cc9/64 | . | link | fa:16:3e:a2:5c:c9 |
[ 38.990136] cloud-init[812]: ci-info: | lo | True | 127.0.0.1 | 255.0.0.0 | host | . |
[ 38.992279] cloud-init[812]: ci-info: | lo | True | ::1/128 | . | host | . |
[ 38.994445] cloud-init[812]: ci-info: +--------+------+------------------------------+-----------------+--------+-------------------+
[ 38.996601] cloud-init[812]: ci-info: ++++++++++++++++++++++++++++++++Route IPv4 info+++++++++++++++++++++++++++++++++
[ 38.998586] cloud-init[812]: ci-info: +-------+-----------------+--------------+-----------------+-----------+-------+
[ 39.000690] cloud-init[812]: ci-info: | Route | Destination | Gateway | Genmask | Interface | Flags |
[ 39.002692] cloud-init[812]: ci-info: +-------+-----------------+--------------+-----------------+-----------+-------+
[ 39.004696] cloud-init[812]: ci-info: | 0 | 0.0.0.0 | 10.46.21.254 | 0.0.0.0 | eth0 | UG |
[ 39.006704] cloud-init[812]: ci-info: | 1 | 10.46.21.192 | 0.0.0.0 | 255.255.255.192 | eth0 | U |
[ 39.008691] cloud-init[812]: ci-info: | 2 | 169.254.169.254 | 10.46.21.213 | 255.255.255.255 | eth0 | UGH |
[ 39.010681] cloud-init[812]: ci-info: +-------+-----------------+--------------+-----------------+-----------+-------+
[ 39.012669] cloud-init[812]: ci-info: +++++++++++++++++++Route IPv6 info+++++++++++++++++++
[ 39.014339] cloud-init[812]: ci-info: +-------+-------------+---------+-----------+-------+
[ 39.015997] cloud-init[812]: ci-info: | Route | Destination | Gateway | Interface | Flags |
[ 39.017661] cloud-init[812]: ci-info: +-------+-------------+---------+-----------+-------+
[ 39.019341] cloud-init[812]: ci-info: | 1 | fe80::/64 | :: | eth0 | U |
[ 39.021013] cloud-init[812]: ci-info: | 3 | ff00::/8 | :: | eth0 | U |
[ 39.022677] cloud-init[812]: ci-info: +-------+-------------+---------+-----------+-------+
[ 39.024349] cloud-init[812]: 2021-08-16 11:15:34,391 - util.py[WARNING]: No active metadata service found
The corresponding neutron port is 2bdd32d4-efea-477c-92d3-ec84cbdaa61c:
$ openstack port list --fixed-ip ip-address=10.46.21.241 -c ID -f value
2bdd32d4-efea-477c-92d3-ec84cbdaa61c
For some reason, that port was bound and unbound from the chassis for several times. That took more than a minute and I think this is the reason why the metadata information for this port was not ready yet:
2021-08-16 11:14:12.734 111507 INFO networking_ovn.agent.metadata.agent [-] Port 14f0abf0-4704-40e7-8a02-2b58bf584fa0 in datapath 7e3c462c-7771-49ff-afb9-e6d7ecd620c6 unbound from our chassis
2021-08-16 11:14:13.473 111507 INFO networking_ovn.agent.metadata.agent [-] Port 65446967-02b4-4a76-991d-a1c2ec1410d2 in datapath 7e3c462c-7771-49ff-afb9-e6d7ecd620c6 unbound from our chassis
2021-08-16 11:14:13.483 111507 INFO networking_ovn.agent.metadata.agent [-] Cleaning up ovnmeta-7e3c462c-7771-49ff-afb9-e6d7ecd620c6 namespace which is not needed anymore
2021-08-16 11:14:46.288 111507 INFO networking_ovn.agent.metadata.agent [-] Port 2bdd32d4-efea-477c-92d3-ec84cbdaa61c in datapath 7e3c462c-7771-49ff-afb9-e6d7ecd620c6 bound to our chassis
2021-08-16 11:14:48.616 111507 INFO networking_ovn.agent.metadata.agent [-] Port 2bdd32d4-efea-477c-92d3-ec84cbdaa61c in datapath 7e3c462c-7771-49ff-afb9-e6d7ecd620c6 unbound from our chassis
2021-08-16 11:14:48.802 111507 INFO networking_ovn.agent.metadata.agent [-] Port 2bdd32d4-efea-477c-92d3-ec84cbdaa61c in datapath 7e3c462c-7771-49ff-afb9-e6d7ecd620c6 bound to our chassis
2021-08-16 11:14:48.984 111507 INFO networking_ovn.agent.metadata.agent [-] Port 2bdd32d4-efea-477c-92d3-ec84cbdaa61c in datapath 7e3c462c-7771-49ff-afb9-e6d7ecd620c6 unbound from our chassis
2021-08-16 11:14:49.155 111507 INFO networking_ovn.agent.metadata.agent [-] Port 2bdd32d4-efea-477c-92d3-ec84cbdaa61c in datapath 7e3c462c-7771-49ff-afb9-e6d7ecd620c6 bound to our chassis
2021-08-16 11:14:49.330 111507 INFO networking_ovn.agent.metadata.agent [-] Port 2bdd32d4-efea-477c-92d3-ec84cbdaa61c in datapath 7e3c462c-7771-49ff-afb9-e6d7ecd620c6 unbound from our chassis
...
2021-08-16 11:15:36.888 111507 INFO networking_ovn.agent.metadata.agent [-] Port 2bdd32d4-efea-477c-92d3-ec84cbdaa61c in datapath 7e3c462c-7771-49ff-afb9-e6d7ecd620c6 unbound from our chassis
2021-08-16 11:15:37.056 111507 INFO networking_ovn.agent.metadata.agent [-] Port 2bdd32d4-efea-477c-92d3-ec84cbdaa61c in datapath 7e3c462c-7771-49ff-afb9-e6d7ecd620c6 bound to our chassis
When I rebooted the VM, the metadata information was obtained successfully:
[ 42.184307] cloud-init[787]: Cloud-init v. 18.5 running 'init' at Mon, 16 Aug 2021 11:29:40 +0000. Up 30.23 seconds.
...
[[0;32m OK [0m] Started Initial cloud-init job (metadata service crawler).
As we can see in the ovn-metadata-agent logs, several requests were received from the VM IP:
2021-08-16 11:29:41.426 111911 INFO eventlet.wsgi.server [-] 10.46.21.241,<local> "GET /openstack HTTP/1.1" status: 200 len: 230 time: 0.4844251
2021-08-16 11:29:41.907 111911 INFO eventlet.wsgi.server [-] 10.46.21.241,<local> "GET /openstack HTTP/1.1" status: 200 len: 230 time: 0.4769990
2021-08-16 11:29:43.014 111911 INFO eventlet.wsgi.server [-] 10.46.21.241,<local> "GET /openstack/2018-08-27/meta_data.json HTTP/1.1" status: 200 len: 2187 time: 1.1038783
2021-08-16 11:29:43.514 111911 INFO eventlet.wsgi.server [-] 10.46.21.241,<local> "GET /openstack/2018-08-27/user_data HTTP/1.1" status: 200 len: 135 time: 0.4969020
2021-08-16 11:29:44.043 111911 INFO eventlet.wsgi.server [-] 10.46.21.241,<local> "GET /openstack/2018-08-27/vendor_data.json HTTP/1.1" status: 200 len: 128 time: 0.5247002
2021-08-16 11:29:45.139 111911 INFO eventlet.wsgi.server [-] 10.46.21.241,<local> "GET /openstack/2018-08-27/network_data.json HTTP/1.1" status: 200 len: 451 time: 1.0909605
...
I will provide the full console logs from two VMs that were spawned almost simultaneously and failed to contact the metadata service and the ovn metadata agent log file.
Version-Release number of selected component (if applicable):
RHOS-16.2-RHEL-8-20210811.n.1
[root@controller-1 ~]# podman exec -ituroot ovn_metadata_agent rpm -qa | grep ovn
python3-networking-ovn-7.4.2-2.20210601204824.el8ost.10.noarch
python3-networking-ovn-metadata-agent-7.4.2-2.20210601204824.el8ost.10.noarch
[root@controller-1 ~]# podman exec -ituroot ovn_controller rpm -qa | grep ovn
ovn-2021-21.06.0-17.el8fdp.x86_64
rhosp-ovn-2021-4.el8ost.1.noarch
ovn-2021-host-21.06.0-17.el8fdp.x86_64
rhosp-ovn-host-2021-4.el8ost.1.noarch
How reproducible:
It doesn't happen often. I had to rerun all our tempest SR-IOV tests several times in order to reproduce it once.
Steps to Reproduce:
1. Create a direct type port, connected to the external network
2. Create a VM with that port
3. Check the VM could not connect to the metadata service (hostname is wrong, authorized_keys file is empty)
Actual results:
Metadata service is not accessible from a VM so SSH key can not be obtained. It is not possible to connect to VM using SSH.
Expected results:
Metadata service is accessible from VM. It is possible to connect to VM using SSH.