Bug 1983773

Summary: [4.8] coreos-installer fails to download Ignition (DNS error, failed to lookup address)
Product: OpenShift Container Platform Reporter: Jonathan Lebon <jlebon>
Component: RHCOS Assignee: Jonathan Lebon <jlebon>
Status: CLOSED ERRATA QA Contact: Michael Nguyen <mnguyen>
Severity: medium Docs Contact:
Priority: low    
Version: 4.7 CC: aivaras.laimikis, bgalvani, bgilbert, chdeshpa, dornelas, dustymabe, hhei, jlebon, jligon, jnordell, lucab, miabbott, mnguyen, mrussell, nstielau
Target Milestone: ---   
Target Release: 4.8.z   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Cause: NetworkManager-wait-online.service timed out too early, preventing a connection from being established before coreos-installer started. Consequence: coreos-installer failed to fetch the Ignition config if the network took too long to come up. Fix: The NetworkManager-wait-online.service timeout has been increased to its default upstream value. Result: coreos-installer no longer fails to fetch the Ignition config, because it runs only after networking is up.
Story Points: ---
Clone Of: 1967483 Environment:
Last Closed: 2022-06-30 16:35:30 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1967483, 1991712, 2006965    
Bug Blocks: 1983774    

Comment 1 Micah Abbott 2021-08-09 18:47:49 UTC
*** Bug 1991712 has been marked as a duplicate of this bug. ***

Comment 2 RHCOS Bug Bot 2021-09-02 16:36:08 UTC
This bug has been reported fixed in a new RHCOS build.  Do not move this bug to MODIFIED until the fix has landed in a new bootimage.

Comment 3 RHCOS Bug Bot 2021-09-28 14:05:25 UTC
The fix for this bug has landed in a bootimage bump, as tracked in bug 1982001 (now in status MODIFIED).  Moving this bug to MODIFIED.

Comment 6 Michael Nguyen 2021-10-01 14:39:28 UTC
The timeout is still being overridden on 48.84.202109241901-0.

[core@localhost 35coreos-live]$ ls | grep nm-wait-online
coreos-liveiso-reconfigure-nm-wait-online.service
[core@localhost 35coreos-live]$ grep -R nm-wait-online
live-generator:add_requires coreos-liveiso-reconfigure-nm-wait-online.service initrd.target
module-setup.sh:    inst_simple "$moddir/coreos-liveiso-reconfigure-nm-wait-online.service" \
module-setup.sh:        "$systemdsystemunitdir/coreos-liveiso-reconfigure-nm-wait-online.service"
[core@localhost 35coreos-live]$ rpm-ostree status
State: idle
Deployments:
* ostree://13c18da5e6fee09fade484c3903209730cbb73e9ebcab806b9e9000cf97fd719
                   Version: 48.84.202109241901-0 (2021-09-24T19:04:29Z)
[core@localhost 35coreos-live]$ cat coreos-liveiso-reconfigure-nm-wait-online.service | grep ExecStart
# Right now we are keeping the same ExecStart but we are making it
ExecStartPre=/usr/bin/mkdir -p /run/systemd/system/NetworkManager-wait-online.service.d
ExecStart=/bin/bash -c 'echo -e "[Service]\nExecStart=\nExecStart=-/usr/bin/nm-online -s -q --timeout=5" > /run/systemd/system/NetworkManager-wait-online.service.d/liveiso.conf'
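
For reference, the drop-in written by the ExecStart line above can be reproduced outside the live ISO. The following is only a sketch: it writes the same payload into a scratch directory instead of /run/systemd/system, and the DROPIN_DIR and OVERRIDE_TIMEOUT names are assumptions made for illustration, not part of the shipped unit.

```shell
#!/bin/bash
set -euo pipefail

# Hypothetical scratch location. The real unit writes to
# /run/systemd/system/NetworkManager-wait-online.service.d instead.
DROPIN_DIR="$(mktemp -d)/NetworkManager-wait-online.service.d"
mkdir -p "$DROPIN_DIR"

# Same payload as the ExecStart line quoted above, written with printf
# rather than `echo -e`. The empty "ExecStart=" line clears the command
# inherited from the main unit before the replacement is set.
printf '[Service]\nExecStart=\nExecStart=-/usr/bin/nm-online -s -q --timeout=5\n' \
    > "$DROPIN_DIR/liveiso.conf"

# Show the generated drop-in.
cat "$DROPIN_DIR/liveiso.conf"

# Pull out the timeout (in seconds) that the override imposes.
OVERRIDE_TIMEOUT="$(sed -n 's/.*--timeout=\([0-9]*\).*/\1/p' "$DROPIN_DIR/liveiso.conf")"
echo "override timeout: ${OVERRIDE_TIMEOUT}s"
```

On a booted live system, `systemctl cat NetworkManager-wait-online.service` should list any such drop-in after the main unit file, which is a quick way to see whether the override is in effect.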

Comment 8 HuijingHei 2021-10-11 13:56:14 UTC
Pretested with RHCOS 48.84.202110072059-0, which includes the fix

[core@cosa-devsh 35coreos-live]$ ls | grep nm-wait-online
[core@cosa-devsh 35coreos-live]$ pwd
/usr/lib/dracut/modules.d/35coreos-live
[core@cosa-devsh 35coreos-live]$ cat live-generator | grep nm-wait-online
[core@cosa-devsh 35coreos-live]$ cat module-setup.sh | grep nm-wait-online

$ cd ../35coreos-multipath
$ grep -E "^After|OnFailure" coreos-propagate-multipath-conf.service
After=initrd-root-fs.target
OnFailure=emergency.target
OnFailureJobMode=isolate

[core@cosa-devsh 35coreos-live]$ rpm-ostree status
State: idle
Deployments:
* ostree://1eabb5b58514f98afc3a2b31970e66ac34a18109f8f219dc0499944b10753bf8
                   Version: 48.84.202110072059-0 (2021-10-07T21:02:47Z)

Comment 9 RHCOS Bug Bot 2022-06-17 15:26:01 UTC
The fix for this bug has landed in a bootimage bump, as tracked in bug 2006965 (now in status MODIFIED).  Moving this bug to MODIFIED.

Comment 11 HuijingHei 2022-06-21 12:00:33 UTC
Verification passed with rhcos-48.84.202206172122-0-qemu.x86_64.qcow2, following the steps in Comment #8

[core@cosa-devsh ~]$ cd /usr/lib/dracut/modules.d/35coreos-live

[core@cosa-devsh 35coreos-live]$ ls | grep nm-wait-online
[core@cosa-devsh 35coreos-live]$ cat live-generator | grep nm-wait-online
[core@cosa-devsh 35coreos-live]$ cat module-setup.sh | grep nm-wait-online
[core@cosa-devsh 35coreos-live]$ cd ../35coreos-multipath
[core@cosa-devsh 35coreos-multipath]$ grep -E "^After|OnFailure" coreos-propagate-multipath-conf.service
After=initrd-root-fs.target
OnFailure=emergency.target
OnFailureJobMode=isolate
[core@cosa-devsh 35coreos-multipath]$ rpm-ostree status
State: idle
Deployments:
● ostree://170ace4e7eb28e850782ecb4532cab0c53dfbf33748dbfb18ad4ec69b19cc255
                   Version: 48.84.202206172122-0 (2022-06-17T21:25:43Z)
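
The manual `ls | grep` and `cat | grep` checks above could be wrapped in a small script along these lines. This is a sketch only: the no_override_left function name and the MODULE_DIR variable are assumptions for illustration, and the check covers just the nm-wait-online references, not the multipath unit.

```shell
#!/bin/bash
set -euo pipefail

# Returns 0 when neither a file name nor any file content under the
# given directory mentions the pattern, and 1 otherwise.
no_override_left() {
    local dir="$1" pattern="${2:-nm-wait-online}"
    # File names: equivalent to the `ls | grep` step.
    if [ -n "$(find "$dir" -name "*${pattern}*")" ]; then
        return 1
    fi
    # File contents: equivalent to the `cat FILE | grep` steps.
    if grep -R -q -- "$pattern" "$dir"; then
        return 1
    fi
    return 0
}

# Defaults to the dracut module directory used in the comments above.
MODULE_DIR="${MODULE_DIR:-/usr/lib/dracut/modules.d/35coreos-live}"
if [ -d "$MODULE_DIR" ]; then
    if no_override_left "$MODULE_DIR"; then
        echo "PASS: no nm-wait-online override in $MODULE_DIR"
    else
        echo "FAIL: nm-wait-online override still present in $MODULE_DIR"
    fi
else
    echo "SKIP: $MODULE_DIR not found"
fi
```

Run against the 48.84.202109241901-0 image from Comment 6, this would be expected to report FAIL, since both the unit file and its references were still present there.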

Comment 14 errata-xmlrpc 2022-06-30 16:35:30 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.8.45 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:5167