+++ This bug was initially created as a clone of Bug #1803928 +++ The same race condition as the one described in BZ#1803928 with cloud-init for RHEL8 happens on Fedora 34 with cloud-init-20.4-2.fc34.noarch +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ Description of problem: A race condition occurs when starting cloud-init. Sometimes it's started before NetworkManager starts resulting in unreachable instance when running in OpenStack. Version-Release number of selected component (if applicable): cloud-init-18.5-9.src.rpm (RHEL-8.2.0-20200207.n.0) How reproducible: sometimes Steps to Reproduce: 1. Launch an instance on OpenStack (http://rhos-d.infra.prod.upshift.rdu2.redhat.com) with RHEL-8.2.0-20200207.n.0 image, can be the same as soon as cloud-init matches 2. Attempt to log in (or check openstack console log show <uuid>) Actual results: Sometimes we get into non-working state: [[0;32m OK [0m] Reached target Network. Starting Dynamic System Tuning Daemon... Starting Hostname Service... [[0;32m OK [0m] Started Hostname Service. Starting Network Manager Script Dispatcher Service... [ 22.082582] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready [ 22.092830] cloud-init[797]: Cloud-init v. 18.5 running 'init' at Tue, 11 Feb 2020 17:24:51 +0000. Up 21.99 seconds. [ 22.096952] cloud-init[797]: ci-info: +++++++++++++++++++++++++++Net device info++++++++++++++++++++++++++++ [ 22.100207] cloud-init[797]: ci-info: +--------+-------+-----------+-----------+-------+-------------------+ [ 22.103408] cloud-init[797]: ci-info: | Device | Up | Address | Mask | Scope | Hw-Address | [ 22.106786] cloud-init[797]: ci-info: +--------+-------+-----------+-----------+-------+-------------------+ [ 22.109515] cloud-init[797]: ci-info: | eth0 | False | . | . | . | fa:16:3e:6d:fe:0c | [ 22.112389] cloud-init[797]: ci-info: | lo | True | 127.0.0.1 | 255.0.0.0 | host | . | [ 22.115274] cloud-init[797]: ci-info: | lo | True | ::1/128 | . | host | . | [ 22.118561] cloud-init[797]: ci-info: +--------+-------+-----------+-----------+-------+-------------------+ [ 22.121644] cloud-init[797]: ci-info: +++++++++++++++++++Route IPv6 info+++++++++++++++++++ [ 22.124375] cloud-init[797]: ci-info: +-------+-------------+---------+-----------+-------+ [ 22.127132] cloud-init[797]: ci-info: | Route | Destination | Gateway | Interface | Flags | [ 22.129836] cloud-init[797]: ci-info: +-------+-------------+---------+-----------+-------+ [ 22.132641] cloud-init[797]: ci-info: +-------+-------------+---------+-----------+-------+ [ 22.135363] cloud-init[797]: 2020-02-11 17:24:51,261 - util.py[WARNING]: No active metadata service found [[0;32m OK [0m] Started Network Manager Script Dispatcher Service. [[0;32m OK [0m] Started Network Manager Wait Online. [[0;32m OK [0m] Started Dynamic System Tuning Daemon. [[0;32m OK [0m] Started Initial cloud-init job (metadata service crawler). Starting OpenSSH server daemon... [[0;32m OK [0m] Reached target Network is Online. Starting System Logging Service... Starting Crash recovery kernel arming... Starting Notify NFS peers of a restart... Starting Permit User Sessions... Sometimes it works: [[0;32m OK [0m] Started Hostname Service. Starting Network Manager Script Dispatcher Service... [ 11.516135] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready [[0;32m OK [0m] Started Network Manager Script Dispatcher Service. [[0;32m OK [0m] Started Network Manager Wait Online. [[0;32m OK [0m] Started Dynamic System Tuning Daemon. [ 12.360228] cloud-init[742]: Cloud-init v. 18.5 running 'init' at Wed, 12 Feb 2020 11:05:46 +0000. Up 11.97 seconds. [ 12.363435] cloud-init[742]: ci-info: +++++++++++++++++++++++++++++++++++++++Net device info+++++++++++++++++++++++++++++++++++++++ [ 12.366925] cloud-init[742]: ci-info: +--------+------+------------------------------+---------------+--------+-------------------+ [ 12.370133] cloud-init[742]: ci-info: | Device | Up | Address | Mask | Scope | Hw-Address | [ 12.373383] cloud-init[742]: ci-info: +--------+------+------------------------------+---------------+--------+-------------------+ [ 12.376795] cloud-init[742]: ci-info: | eth0 | True | 10.0.154.137 | 255.255.252.0 | global | fa:16:3e:b5:f9:03 | [ 12.380451] cloud-init[742]: ci-info: | eth0 | True | fe80::f816:3eff:feb5:f903/64 | . | link | fa:16:3e:b5:f9:03 | [ 12.384549] cloud-init[742]: ci-info: | lo | True | 127.0.0.1 | 255.0.0.0 | host | . | [ 12.387948] cloud-init[742]: ci-info: | lo | True | ::1/128 | . | host | . | [ 12.391372] cloud-init[742]: ci-info: +--------+------+------------------------------+---------------+--------+-------------------+ [ 12.394757] cloud-init[742]: ci-info: ++++++++++++++++++++++++++++++++Route IPv4 info+++++++++++++++++++++++++++++++++ [ 12.397952] cloud-init[742]: ci-info: +-------+-----------------+--------------+-----------------+-----------+-------+ [ 12.401026] cloud-init[742]: ci-info: | Route | Destination | Gateway | Genmask | Interface | Flags | [ 12.404052] cloud-init[742]: ci-info: +-------+-----------------+--------------+-----------------+-----------+-------+ [ 12.407339] cloud-init[742]: ci-info: | 0 | 0.0.0.0 | 10.0.155.254 | 0.0.0.0 | eth0 | UG | [ 12.410376] cloud-init[742]: ci-info: | 1 | 10.0.152.0 | 0.0.0.0 | 255.255.252.0 | eth0 | U | [ 12.413578] cloud-init[742]: ci-info: | 2 | 169.254.169.254 | 10.0.152.41 | 255.255.255.255 | eth0 | UGH | [ 12.416569] cloud-init[742]: ci-info: +-------+-----------------+--------------+-----------------+-----------+-------+ [ 12.419499] cloud-init[742]: ci-info: +++++++++++++++++++Route IPv6 info+++++++++++++++++++ [ 12.421897] cloud-init[742]: ci-info: +-------+-------------+---------+-----------+-------+ [ 12.424975] cloud-init[742]: ci-info: | Route | Destination | Gateway | Interface | Flags | [ 12.428249] cloud-init[742]: ci-info: +-------+-------------+---------+-----------+-------+ [ 12.431559] cloud-init[742]: ci-info: | 1 | fe80::/64 | :: | eth0 | U | [ 12.434824] cloud-init[742]: ci-info: | 3 | ff00::/8 | :: | eth0 | U | [ 12.438151] cloud-init[742]: ci-info: +-------+-------------+---------+-----------+-------+ [[0;32m OK [0m] Started Initial cloud-init job (metadata service crawler). Starting Permit User Sessions... Starting OpenSSH server daemon... [[0;32m OK [0m] Reached target Cloud-config availability. [[0;32m OK [0m] Reached target Network is Online. Starting Crash recovery kernel arming... Starting System Logging Service... Starting Notify NFS peers of a restart... Starting Apply the settings specified in cloud-config... If it is not working and we attempt to log in as cloud-user.155.12 we're getting: cloud-user.upshift.rdu2.redhat.com: Permission denied (publickey,gssapi-keyex,gssapi-with-mic). Expected results: Able to ssh in. Network manager started before cloud-init. Additional info: If you need any assistance with OpenStack please reach out to me or kaolivei. --- Additional comment from Trivino on 2020-04-21 10:33:01 UTC --- rhel-guest-image-8.2-bug1803928-0420 works for me. I disabled the workaround and configured a simple pipeline where I spawn tons of instances using the scratch image. All instances (~300 VMs) are accessible. These are the pipelines: https://idm-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/blue/organizations/jenkins/ftrivino-dev-pipeline/detail/bug1803928/3/pipeline/33 https://idm-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/blue/organizations/jenkins/ftrivino-dev-pipeline/detail/bug1803928/4/pipeline/ Additional info. ----------------- The console log for a particular instance using rhel-guest-image-8.2-bug1803928-0420 image shows: ... ... [[0;32m OK [0m] Stopped OpenSSH server daemon. [[0;32m OK [0m] Stopped target sshd-keygen.target. Stopping sshd-keygen.target. [[0;32m OK [0m] Reached target sshd-keygen.target. Starting OpenSSH server daemon... [[0;32m OK [0m] Started OpenSSH server daemon. [ 30.249663] cloud-init[942]: Cloud-init v. 18.5 running 'modules:config' at Tue, 21 Apr 2020 10:09:40 +0000. Up 29.85 seconds. [[0;32m OK [0m] Started Apply the settings specified in cloud-config. Starting Execute cloud user/final scripts... ci-info: +++Authorized keys from /home/cloud-user/.ssh/authorized_keys for user cloud-user+++ ci-info: +---------+-------------------------------------------------+---------+------------+ ci-info: | Keytype | Fingerprint (md5) | Options | Comment | ci-info: +---------+-------------------------------------------------+---------+------------+ ci-info: | ssh-rsa | 35:e4:6b:01:0d:6f:9b:66:d0:5b:bc:65:09:c7:eb:b1 | - | jslave-idm | ci-info: +---------+-------------------------------------------------+---------+------------+ <14>Apr 21 06:09:42 ec2: <14>Apr 21 06:09:42 ec2: ############################################################# <14>Apr 21 06:09:42 ec2: -----BEGIN SSH HOST KEY FINGERPRINTS----- <14>Apr 21 06:09:42 ec2: 256 SHA256:poGl7+PHFJZ55uH7WYK6dTrXoZEmOmgVXMhaRaaQWI0 no comment (ECDSA) <14>Apr 21 06:09:42 ec2: 256 SHA256:3R5CUaSIsWWiexq/bGKMHGBj1ANteO+AtXsZpYXXMu8 no comment (ED25519) <14>Apr 21 06:09:42 ec2: 3072 SHA256:mN/JhUeBlvnS2+EBAjJbQFyALPjp6bMd6iWJQlHBMS8 no comment (RSA) <14>Apr 21 06:09:42 ec2: -----END SSH HOST KEY FINGERPRINTS----- <14>Apr 21 06:09:42 ec2: ############################################################# -----BEGIN SSH HOST KEY KEYS----- ... ... -----END SSH HOST KEY KEYS----- [ 31.199284] cloud-init[1343]: Cloud-init v. 18.5 running 'modules:final' at Tue, 21 Apr 2020 10:09:41 +0000. Up 30.94 seconds. [ 31.204040] cloud-init[1343]: Cloud-init v. 18.5 finished at Tue, 21 Apr 2020 10:09:42 +0000. Datasource DataSourceOpenStack [net,ver=2]. Up 31.17 seconds [ 31.282796] echo[1911]: try restart NetworkManager.service [[0;32m OK [0m] Stopped Network Manager Wait Online. Stopping Network Manager Wait Online... Stopping Network Manager... [[0;32m OK [0m] Stopped Network Manager. Starting Network Manager... [[0;32m OK [0m] Started Network Manager. Starting Network Manager Wait Online... [[0;32m OK [0m] Started Execute cloud user/final scripts. [[0;32m OK [0m] Reached target Cloud-init target. [[0;32m OK [0m] Started Network Manager Wait Online. Red Hat Enterprise Linux 8.2 (Ootpa) Kernel 4.18.0-193.el8.x86_64 on an x86_64 Activate the web console with: systemctl enable --now cockpit.socket bug1 login: ftrivino@t480s:~>openstack server show bug1.ipadomain.test_c7aef6 +-----------------------------+-----------------------------------------------------------------------------+ | Field | Value | +-----------------------------+-----------------------------------------------------------------------------+ | OS-DCF:diskConfig | MANUAL | | OS-EXT-AZ:availability_zone | nova | | OS-EXT-STS:power_state | Running | | OS-EXT-STS:task_state | None | | OS-EXT-STS:vm_state | active | | OS-SRV-USG:launched_at | 2020-04-21T10:09:07.000000 | | OS-SRV-USG:terminated_at | None | | accessIPv4 | | | accessIPv6 | | | addresses | provider_net_cci_6=10.0.154.218 | | config_drive | | | created | 2020-04-21T10:08:50Z | | flavor | ci.m1.medium (df66ba33-2528-4cdd-9a01-f770961bc25b) | | hostId | 6589ef6f29b0136fa57af8a7856b997e2af00991cd8e33a32dbf6173 | | id | 67cdda7f-07bd-4408-b42f-9540cce6f562 | | image | rhel-guest-image-8.2-bug1803928-0420 (9cfc3340-bf9d-412a-a4a9-ef3aab889fd4) | | key_name | idm-jenkins | | name | bug1.ipadomain.test_c7aef6 | | progress | 0 | | project_id | d11b9f000a1a4682905f244022d9c995 | | properties | | | security_groups | name='default' | | status | ACTIVE | | updated | 2020-04-21T10:09:07Z | | user_id | 8168cf6928354163271027d8f7eea126703fec4f1717195acbef886b4db3b03e | | volumes_attached | | +-----------------------------+-----------------------------------------------------------------------------+ --- Additional comment from Rick Barry on 2020-04-21 13:00:11 UTC --- --- Additional comment from Trivino on 2020-06-10 14:42:39 UTC --- I have tested RHEL-8.3.0-x86_64-nightly-latest that points to RHEL-8.3.0-20200608.n.0 (one compose after the one is solving the problem) with _no WA_ here: https://idm-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/blue/organizations/jenkins/load_pipeline/detail/master/37/pipeline/28 the test provisioned > 40 VMs with no issues at all. I verify the bug fix is in place and works fine. --- Additional comment from errata-xmlrpc on 2020-11-04 02:40:39 UTC --- Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: cloud-init security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2020:4650
This message is a reminder that Fedora Linux 34 is nearing its end of life. Fedora will stop maintaining and issuing updates for Fedora Linux 34 on 2022-06-07. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as EOL if it remains open with a 'version' of '34'. Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, change the 'version' to a later Fedora Linux version. Thank you for reporting this issue and we are sorry that we were not able to fix it before Fedora Linux 34 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora Linux, you are encouraged to change the 'version' to a later version prior to this bug being closed.
Fedora Linux 34 entered end-of-life (EOL) status on 2022-06-07. Fedora Linux 34 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug. If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version. If you are unable to reopen this bug, please file a new report against the current release. Thank you for reporting this bug and we are sorry it could not be fixed.