RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets here. Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED". If you cannot log in to RH Jira, please consult article #7032570. That failing, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "RHEL project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that that bug has been migrated.
Bug 1803928 - [RHEL8.3] Race condition of starting cloud-init and NetworkManager
Summary: [RHEL8.3] Race condition of starting cloud-init and NetworkManager
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: cloud-init
Version: 8.2
Hardware: x86_64
OS: Linux
high
high
Target Milestone: rc
: 8.0
Assignee: Eduardo Otubo
QA Contact: xiachen
URL:
Whiteboard:
Depends On:
Blocks: 1789719 1826262 1831646
TreeView+ depends on / blocked
 
Reported: 2020-02-17 17:49 UTC by Ladislav Jozsa
Modified: 2021-09-13 06:55 UTC (History)
14 users (show)

Fixed In Version: cloud-init-19.4-3.el8
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1826262 1831646 2006693 (view as bug list)
Environment:
Last Closed: 2020-11-04 02:40:39 UTC
Type: Bug
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)

Description Ladislav Jozsa 2020-02-17 17:49:59 UTC
Description of problem:
A race condition occurs when starting cloud-init. Sometimes it's started before NetworkManager starts resulting in unreachable instance when running in OpenStack.

Version-Release number of selected component (if applicable):
cloud-init-18.5-9.src.rpm (RHEL-8.2.0-20200207.n.0)

How reproducible:
sometimes

Steps to Reproduce:
1. Launch an instance on OpenStack (http://rhos-d.infra.prod.upshift.rdu2.redhat.com) with RHEL-8.2.0-20200207.n.0 image, can be the same as soon as cloud-init matches
2. Attempt to log in (or check openstack console log show <uuid>)


Actual results:
Sometimes we get into non-working state:

[[0;32m  OK  [0m] Reached target Network.
         Starting Dynamic System Tuning Daemon...
         Starting Hostname Service...
[[0;32m  OK  [0m] Started Hostname Service.
         Starting Network Manager Script Dispatcher Service...
[   22.082582] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
[   22.092830] cloud-init[797]: Cloud-init v. 18.5 running 'init' at Tue, 11 Feb 2020 17:24:51 +0000. Up 21.99 seconds.
[   22.096952] cloud-init[797]: ci-info: +++++++++++++++++++++++++++Net device info++++++++++++++++++++++++++++
[   22.100207] cloud-init[797]: ci-info: +--------+-------+-----------+-----------+-------+-------------------+
[   22.103408] cloud-init[797]: ci-info: | Device |   Up  |  Address  |    Mask   | Scope |     Hw-Address    |
[   22.106786] cloud-init[797]: ci-info: +--------+-------+-----------+-----------+-------+-------------------+
[   22.109515] cloud-init[797]: ci-info: |  eth0  | False |     .     |     .     |   .   | fa:16:3e:6d:fe:0c |
[   22.112389] cloud-init[797]: ci-info: |   lo   |  True | 127.0.0.1 | 255.0.0.0 |  host |         .         |
[   22.115274] cloud-init[797]: ci-info: |   lo   |  True |  ::1/128  |     .     |  host |         .         |
[   22.118561] cloud-init[797]: ci-info: +--------+-------+-----------+-----------+-------+-------------------+
[   22.121644] cloud-init[797]: ci-info: +++++++++++++++++++Route IPv6 info+++++++++++++++++++
[   22.124375] cloud-init[797]: ci-info: +-------+-------------+---------+-----------+-------+
[   22.127132] cloud-init[797]: ci-info: | Route | Destination | Gateway | Interface | Flags |
[   22.129836] cloud-init[797]: ci-info: +-------+-------------+---------+-----------+-------+
[   22.132641] cloud-init[797]: ci-info: +-------+-------------+---------+-----------+-------+
[   22.135363] cloud-init[797]: 2020-02-11 17:24:51,261 - util.py[WARNING]: No active metadata service found
[[0;32m  OK  [0m] Started Network Manager Script Dispatcher Service.
[[0;32m  OK  [0m] Started Network Manager Wait Online.
[[0;32m  OK  [0m] Started Dynamic System Tuning Daemon.
[[0;32m  OK  [0m] Started Initial cloud-init job (metadata service crawler).
         Starting OpenSSH server daemon...
[[0;32m  OK  [0m] Reached target Network is Online.
         Starting System Logging Service...
         Starting Crash recovery kernel arming...
         Starting Notify NFS peers of a restart...
         Starting Permit User Sessions...


Sometimes it works:

[[0;32m  OK  [0m] Started Hostname Service.
         Starting Network Manager Script Dispatcher Service...
[   11.516135] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
[[0;32m  OK  [0m] Started Network Manager Script Dispatcher Service.
[[0;32m  OK  [0m] Started Network Manager Wait Online.
[[0;32m  OK  [0m] Started Dynamic System Tuning Daemon.
[   12.360228] cloud-init[742]: Cloud-init v. 18.5 running 'init' at Wed, 12 Feb 2020 11:05:46 +0000. Up 11.97 seconds.
[   12.363435] cloud-init[742]: ci-info: +++++++++++++++++++++++++++++++++++++++Net device info+++++++++++++++++++++++++++++++++++++++
[   12.366925] cloud-init[742]: ci-info: +--------+------+------------------------------+---------------+--------+-------------------+
[   12.370133] cloud-init[742]: ci-info: | Device |  Up  |           Address            |      Mask     | Scope  |     Hw-Address    |
[   12.373383] cloud-init[742]: ci-info: +--------+------+------------------------------+---------------+--------+-------------------+
[   12.376795] cloud-init[742]: ci-info: |  eth0  | True |         10.0.154.137         | 255.255.252.0 | global | fa:16:3e:b5:f9:03 |
[   12.380451] cloud-init[742]: ci-info: |  eth0  | True | fe80::f816:3eff:feb5:f903/64 |       .       |  link  | fa:16:3e:b5:f9:03 |
[   12.384549] cloud-init[742]: ci-info: |   lo   | True |          127.0.0.1           |   255.0.0.0   |  host  |         .         |
[   12.387948] cloud-init[742]: ci-info: |   lo   | True |           ::1/128            |       .       |  host  |         .         |
[   12.391372] cloud-init[742]: ci-info: +--------+------+------------------------------+---------------+--------+-------------------+
[   12.394757] cloud-init[742]: ci-info: ++++++++++++++++++++++++++++++++Route IPv4 info+++++++++++++++++++++++++++++++++
[   12.397952] cloud-init[742]: ci-info: +-------+-----------------+--------------+-----------------+-----------+-------+
[   12.401026] cloud-init[742]: ci-info: | Route |   Destination   |   Gateway    |     Genmask     | Interface | Flags |
[   12.404052] cloud-init[742]: ci-info: +-------+-----------------+--------------+-----------------+-----------+-------+
[   12.407339] cloud-init[742]: ci-info: |   0   |     0.0.0.0     | 10.0.155.254 |     0.0.0.0     |    eth0   |   UG  |
[   12.410376] cloud-init[742]: ci-info: |   1   |    10.0.152.0   |   0.0.0.0    |  255.255.252.0  |    eth0   |   U   |
[   12.413578] cloud-init[742]: ci-info: |   2   | 169.254.169.254 | 10.0.152.41  | 255.255.255.255 |    eth0   |  UGH  |
[   12.416569] cloud-init[742]: ci-info: +-------+-----------------+--------------+-----------------+-----------+-------+
[   12.419499] cloud-init[742]: ci-info: +++++++++++++++++++Route IPv6 info+++++++++++++++++++
[   12.421897] cloud-init[742]: ci-info: +-------+-------------+---------+-----------+-------+
[   12.424975] cloud-init[742]: ci-info: | Route | Destination | Gateway | Interface | Flags |
[   12.428249] cloud-init[742]: ci-info: +-------+-------------+---------+-----------+-------+
[   12.431559] cloud-init[742]: ci-info: |   1   |  fe80::/64  |    ::   |    eth0   |   U   |
[   12.434824] cloud-init[742]: ci-info: |   3   |   ff00::/8  |    ::   |    eth0   |   U   |
[   12.438151] cloud-init[742]: ci-info: +-------+-------------+---------+-----------+-------+
[[0;32m  OK  [0m] Started Initial cloud-init job (metadata service crawler).
         Starting Permit User Sessions...
         Starting OpenSSH server daemon...
[[0;32m  OK  [0m] Reached target Cloud-config availability.
[[0;32m  OK  [0m] Reached target Network is Online.
         Starting Crash recovery kernel arming...
         Starting System Logging Service...
         Starting Notify NFS peers of a restart...
         Starting Apply the settings specified in cloud-config...

If it is not working and we attempt to log in as cloud-user.155.12 we're getting:


cloud-user.upshift.rdu2.redhat.com: Permission denied (publickey,gssapi-keyex,gssapi-with-mic).

Expected results:

Able to ssh in. Network manager started before cloud-init.


Additional info:
If you need any assistance with OpenStack please reach out to me or kaolivei.

Comment 34 Trivino 2020-04-21 10:33:01 UTC
rhel-guest-image-8.2-bug1803928-0420 works for me.

I disabled the workaround and configured a simple pipeline where I spawn tons of instances using the scratch image. All instances (~300 VMs) are accessible.

These are the pipelines:

https://idm-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/blue/organizations/jenkins/ftrivino-dev-pipeline/detail/bug1803928/3/pipeline/33

https://idm-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/blue/organizations/jenkins/ftrivino-dev-pipeline/detail/bug1803928/4/pipeline/


Additional info.
-----------------

The console log for a particular instance using rhel-guest-image-8.2-bug1803928-0420 image shows: 

...
...
[[0;32m  OK  [0m] Stopped OpenSSH server daemon.
[[0;32m  OK  [0m] Stopped target sshd-keygen.target.
         Stopping sshd-keygen.target.
[[0;32m  OK  [0m] Reached target sshd-keygen.target.
         Starting OpenSSH server daemon...
[[0;32m  OK  [0m] Started OpenSSH server daemon.
[   30.249663] cloud-init[942]: Cloud-init v. 18.5 running 'modules:config' at Tue, 21 Apr 2020 10:09:40 +0000. Up 29.85 seconds.
[[0;32m  OK  [0m] Started Apply the settings specified in cloud-config.
         Starting Execute cloud user/final scripts...
ci-info: +++Authorized keys from /home/cloud-user/.ssh/authorized_keys for user cloud-user+++
ci-info: +---------+-------------------------------------------------+---------+------------+
ci-info: | Keytype |                Fingerprint (md5)                | Options |  Comment   |
ci-info: +---------+-------------------------------------------------+---------+------------+
ci-info: | ssh-rsa | 35:e4:6b:01:0d:6f:9b:66:d0:5b:bc:65:09:c7:eb:b1 |    -    | jslave-idm |
ci-info: +---------+-------------------------------------------------+---------+------------+
<14>Apr 21 06:09:42 ec2: 
<14>Apr 21 06:09:42 ec2: #############################################################
<14>Apr 21 06:09:42 ec2: -----BEGIN SSH HOST KEY FINGERPRINTS-----
<14>Apr 21 06:09:42 ec2: 256 SHA256:poGl7+PHFJZ55uH7WYK6dTrXoZEmOmgVXMhaRaaQWI0 no comment (ECDSA)
<14>Apr 21 06:09:42 ec2: 256 SHA256:3R5CUaSIsWWiexq/bGKMHGBj1ANteO+AtXsZpYXXMu8 no comment (ED25519)
<14>Apr 21 06:09:42 ec2: 3072 SHA256:mN/JhUeBlvnS2+EBAjJbQFyALPjp6bMd6iWJQlHBMS8 no comment (RSA)
<14>Apr 21 06:09:42 ec2: -----END SSH HOST KEY FINGERPRINTS-----
<14>Apr 21 06:09:42 ec2: #############################################################
-----BEGIN SSH HOST KEY KEYS-----
...
...
-----END SSH HOST KEY KEYS-----
[   31.199284] cloud-init[1343]: Cloud-init v. 18.5 running 'modules:final' at Tue, 21 Apr 2020 10:09:41 +0000. Up 30.94 seconds.
[   31.204040] cloud-init[1343]: Cloud-init v. 18.5 finished at Tue, 21 Apr 2020 10:09:42 +0000. Datasource DataSourceOpenStack [net,ver=2].  Up 31.17 seconds
[   31.282796] echo[1911]: try restart NetworkManager.service
[[0;32m  OK  [0m] Stopped Network Manager Wait Online.
         Stopping Network Manager Wait Online...
         Stopping Network Manager...
[[0;32m  OK  [0m] Stopped Network Manager.
         Starting Network Manager...
[[0;32m  OK  [0m] Started Network Manager.
         Starting Network Manager Wait Online...
[[0;32m  OK  [0m] Started Execute cloud user/final scripts.
[[0;32m  OK  [0m] Reached target Cloud-init target.
[[0;32m  OK  [0m] Started Network Manager Wait Online.

Red Hat Enterprise Linux 8.2 (Ootpa)
Kernel 4.18.0-193.el8.x86_64 on an x86_64

Activate the web console with: systemctl enable --now cockpit.socket

bug1 login: 


ftrivino@t480s:~>openstack server show bug1.ipadomain.test_c7aef6 

+-----------------------------+-----------------------------------------------------------------------------+
| Field                       | Value                                                                       |
+-----------------------------+-----------------------------------------------------------------------------+
| OS-DCF:diskConfig           | MANUAL                                                                      |
| OS-EXT-AZ:availability_zone | nova                                                                        |
| OS-EXT-STS:power_state      | Running                                                                     |
| OS-EXT-STS:task_state       | None                                                                        |
| OS-EXT-STS:vm_state         | active                                                                      |
| OS-SRV-USG:launched_at      | 2020-04-21T10:09:07.000000                                                  |
| OS-SRV-USG:terminated_at    | None                                                                        |
| accessIPv4                  |                                                                             |
| accessIPv6                  |                                                                             |
| addresses                   | provider_net_cci_6=10.0.154.218                                             |
| config_drive                |                                                                             |
| created                     | 2020-04-21T10:08:50Z                                                        |
| flavor                      | ci.m1.medium (df66ba33-2528-4cdd-9a01-f770961bc25b)                         |
| hostId                      | 6589ef6f29b0136fa57af8a7856b997e2af00991cd8e33a32dbf6173                    |
| id                          | 67cdda7f-07bd-4408-b42f-9540cce6f562                                        |
| image                       | rhel-guest-image-8.2-bug1803928-0420 (9cfc3340-bf9d-412a-a4a9-ef3aab889fd4) |
| key_name                    | idm-jenkins                                                                 |
| name                        | bug1.ipadomain.test_c7aef6                                                  |
| progress                    | 0                                                                           |
| project_id                  | d11b9f000a1a4682905f244022d9c995                                            |
| properties                  |                                                                             |
| security_groups             | name='default'                                                              |
| status                      | ACTIVE                                                                      |
| updated                     | 2020-04-21T10:09:07Z                                                        |
| user_id                     | 8168cf6928354163271027d8f7eea126703fec4f1717195acbef886b4db3b03e            |
| volumes_attached            |                                                                             |
+-----------------------------+-----------------------------------------------------------------------------+

Comment 38 Rick Barry 2020-04-21 13:00:11 UTC
*** Bug 1789719 has been marked as a duplicate of this bug. ***

Comment 50 Trivino 2020-06-10 14:42:39 UTC
I have tested RHEL-8.3.0-x86_64-nightly-latest that points to RHEL-8.3.0-20200608.n.0 (one compose after the one is solving the problem) with _no WA_ here:

https://idm-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/blue/organizations/jenkins/load_pipeline/detail/master/37/pipeline/28

the test provisioned > 40 VMs with no issues at all.

I verify the bug fix is in place and works fine.

Comment 54 errata-xmlrpc 2020-11-04 02:40:39 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: cloud-init security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:4650


Note You need to log in before you can comment on or make changes to this bug.