Bug 2005507

Summary: SNO spoke cluster failing to reach coreos.live.rootfs_url is missing url in console
Product: OpenShift Container Platform Reporter: Chad Crum <ccrum>
Component: RHCOS Assignee: Joseph Marrero <jmarrero>
Status: CLOSED ERRATA QA Contact: HuijingHei <hhei>
Severity: low Docs Contact:
Priority: unspecified    
Version: 4.9 CC: aos-bugs, bgilbert, ccrum, dornelas, fpercoco, hhei, jligon, mfilanov, mrussell, nstielau, rfreiman, trwest, yfirst
Target Milestone: ---   
Target Release: 4.10.0   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2022-03-10 16:11:55 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 2016008    
Bug Blocks:    

Description Chad Crum 2021-09-17 20:08:43 UTC
Description of problem:
When a SNO cluster boots from a discovery ISO and the RHCOS image host is not reachable, the console prints "coreos.live.rootfs_url= " with an empty value. In this case the value was definitely set, as confirmed by checking the discovery ISO; the actual problem was connectivity between the image mirror and the spoke node.

This adds confusion when troubleshooting as one may think the issue is that the url is missing from the discovery iso, when that is not the case.
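
One way to confirm that the parameter really is set, independently of the console message, is to parse the kernel command line directly. The following is a minimal sketch in Python, not part of RHCOS or the assisted installer; the helper name `get_karg` and the example command line are hypothetical, and taking the last occurrence of the argument is a choice made for this sketch.

```python
import shlex


def get_karg(cmdline: str, name: str):
    """Return the value of the last `name=value` kernel argument, or None if absent."""
    value = None
    for token in shlex.split(cmdline):
        # Split only on the first "=", so URLs with query strings stay intact.
        key, sep, val = token.partition("=")
        if sep and key == name:
            value = val
    return value


# Hypothetical command line, e.g. copied from the ISO boot config or /proc/cmdline.
cmdline = (
    "ignition.firstboot ignition.platform.id=metal "
    "coreos.live.rootfs_url=https://mirror.example.com/rootfs.img"
)
url = get_karg(cmdline, "coreos.live.rootfs_url")
print(f"coreos.live.rootfs_url={url}" if url else "coreos.live.rootfs_url is not set")
```

If this reports a URL while the console shows an empty value, the problem is in how the message is printed, not in the discovery ISO.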

Version-Release number of selected component (if applicable):
assisted index image master git_revision: 34c986435c1e85776b2bcb0544eab2e6e77a2c2b

Hub/spoke 4.9.0-rc.0

How reproducible:
100%

Steps to Reproduce:
1. Start a spoke cluster from discovery image, but make the rhcos iso url unreachable from the spoke node
2. Check the spoke node console

Actual results:
"Couldn't establish connectivity with the server specified by coreos.live.rootfs_url= "

Expected results:
"Couldn't establish connectivity with the server specified by coreos.live.rootfs_url=https://the-url-to-the-iso-image"


Additional info:

Comment 3 Pawan Pinjarkar 2021-10-12 16:12:08 UTC
This looks to be an issue from RHCOS. I don't see any reference in our code regarding this error.

Comment 5 Flavio Percoco 2021-10-21 13:39:21 UTC
This is an issue in fedora-coreos-config:

https://github.com/coreos/fedora-coreos-config/blob/818e650768998c5462e23c0a10d31cb8567fd7c9/overlay.d/05core/usr/lib/dracut/modules.d/35coreos-live/coreos-livepxe-rootfs.sh#L34-L60

(not sure if `fedora-coreos` is the right component here)

Comment 6 Joseph Marrero 2021-10-21 19:56:13 UTC
Proposed fix, under test/review: https://github.com/coreos/fedora-coreos-config/pull/1302

Comment 7 HuijingHei 2021-10-28 09:38:15 UTC
It seems this bug is missing the `Bootimage bump tracker` bug; the PR is merged in https://github.com/openshift/os/pull/658

Built locally with the latest os repo and tested with iPXE on libvirt, following https://dustymabe.com/2019/09/13/update-on-easy-pxe-boot-testing-post-minus-pxelinux/


1) Test with coreos.live.rootfs_url=http://abcdddd.eeeee.com/ddddd

[   38.865316] coreos-livepxe-rootfs[704]: Couldn't establish connectivity with the server specified by:
[   38.882251] coreos-livepxe-rootfs[704]: coreos.live.rootfs_url=http://abcdddd.eeeee.com/ddddd
[   38.897457] coreos-livepxe-rootfs[704]: Retrying in 5s...
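
The retry behaviour seen in this log can be sketched as follows. This is an illustrative Python model, not the actual dracut shell script; the function names, the bounded number of attempts, and the injectable `probe`, `sleep`, and `log` parameters are assumptions made so the example can run offline. Only the message text and the 5s delay are taken from the log above.

```python
import time
import urllib.error
import urllib.request


def server_reachable(url: str, timeout: float = 5.0) -> bool:
    """Return True if the server sends any HTTP response, even an error status."""
    try:
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        return True  # the server answered, so connectivity exists
    except (urllib.error.URLError, OSError):
        return False  # DNS failure, refused connection, timeout, ...


def wait_for_server(url, probe=server_reachable, attempts=3, delay=5,
                    sleep=time.sleep, log=print):
    """Probe the server up to `attempts` times, always naming the URL on failure."""
    for attempt in range(1, attempts + 1):
        if probe(url):
            return True
        log("Couldn't establish connectivity with the server specified by:")
        log(f"coreos.live.rootfs_url={url}")
        if attempt < attempts:
            log(f"Retrying in {delay}s...")
            sleep(delay)
    return False


# Offline demonstration with a stub probe that always fails and a no-op sleep.
wait_for_server("http://abcdddd.eeeee.com/ddddd",
                probe=lambda u: False, attempts=2, sleep=lambda s: None)
```

The point of the sketch is that the URL is part of every failure message, so a connectivity problem cannot be mistaken for a missing parameter.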


2) Test with coreos.live.rootfs_url=ftp://abcdddd.eeeee.com/ddddd

-- Logs begin at Thu 2021-10-28 07:55:58 UTC, end at Thu 2021-10-28 07:56:10 UTC. --
Oct 28 07:56:08 systemd[1]: Starting Acquire live PXE rootfs image...
Oct 28 07:56:08 coreos-livepxe-rootfs[693]: Fetching rootfs image from ftp://abcdddd.eeeee.com/ddddd...
Oct 28 07:56:08 coreos-livepxe-rootfs[693]: Unsupported scheme for image specified by:
Oct 28 07:56:08 coreos-livepxe-rootfs[693]: coreos.live.rootfs_url=ftp://abcdddd.eeeee.com/ddddd
Oct 28 07:56:08 coreos-livepxe-rootfs[693]: Only HTTP and HTTPS are supported. Please fix your PXE configuration.
Oct 28 07:56:08 systemd[1]: coreos-livepxe-rootfs.service: Main process exited, code=exited, status=1/FAILURE
Oct 28 07:56:08 systemd[1]: coreos-livepxe-rootfs.service: Failed with result 'exit-code'.
Oct 28 07:56:08 systemd[1]: Failed to start Acquire live PXE rootfs image.
Oct 28 07:56:08 systemd[1]: coreos-livepxe-rootfs.service: Triggering OnFailure= dependencies.
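
For illustration, the scheme check exercised by this test could look like the sketch below. It is written in Python rather than the shell used by the real script, and the names `SUPPORTED_SCHEMES` and `check_scheme` are hypothetical; the message text is copied from the log above.

```python
from urllib.parse import urlsplit

SUPPORTED_SCHEMES = {"http", "https"}


def check_scheme(url: str):
    """Return the error lines to print, or an empty list if the scheme is accepted."""
    scheme = urlsplit(url).scheme.lower()
    if scheme in SUPPORTED_SCHEMES:
        return []
    return [
        "Unsupported scheme for image specified by:",
        f"coreos.live.rootfs_url={url}",
        "Only HTTP and HTTPS are supported. Please fix your PXE configuration.",
    ]


for line in check_scheme("ftp://abcdddd.eeeee.com/ddddd"):
    print(line)
```

Unlike a connectivity failure, a rejected scheme cannot succeed on retry, which matches the service exiting with status 1 in the log.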


3) Test with coreos.live.rootfs_url=http://192.168.122.1/testfile

-- Logs begin at Thu 2021-10-28 08:04:26 UTC, end at Thu 2021-10-28 08:04:37 UTC. --
Oct 28 08:04:35 coreos-livepxe-rootfs[692]: Fetching rootfs image from http://192.168.122.1/testfile...
Oct 28 08:04:35 systemd[1]: Starting Acquire live PXE rootfs image...
Oct 28 08:04:35 coreos-livepxe-rootfs[692]: Error: hash mismatch at offset 0; expected bb53568a974f88beaf5f52f81d1df38d2a7a8643c4ec42c3753477ccb9bb565e, found 80c3fe2ae1062abf56456f52518bd670f9ec3917b7f85e152b347ac6b6faf880
Oct 28 08:04:36 coreos-livepxe-rootfs[692]: Couldn't fetch, verify, and unpack image specified by:
Oct 28 08:04:36 coreos-livepxe-rootfs[692]: coreos.live.rootfs_url=http://192.168.122.1/testfile
Oct 28 08:04:36 coreos-livepxe-rootfs[692]: Check that the URL is correct and that the rootfs version matches the initramfs.
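
This report does not show how the hash check is implemented. As a simplified illustration of the idea only, the Python sketch below compares a SHA-256 digest of a downloaded payload against an expected value. The log reports the mismatch "at offset 0", which suggests the real check may work on chunks of the image; the function name `verify_sha256` and the whole-payload comparison are assumptions of this sketch.

```python
import hashlib


def verify_sha256(data: bytes, expected_hex: str) -> None:
    """Raise ValueError if the SHA-256 digest of `data` differs from `expected_hex`."""
    found = hashlib.sha256(data).hexdigest()
    if found != expected_hex.lower():
        raise ValueError(f"hash mismatch; expected {expected_hex}, found {found}")


# Hypothetical payloads: a matching rootfs and an unrelated file served at the same URL.
expected = hashlib.sha256(b"rootfs contents").hexdigest()
verify_sha256(b"rootfs contents", expected)  # passes silently
try:
    verify_sha256(b"some other file", expected)
except ValueError as err:
    print(f"Error: {err}")
```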


4) Test with no parameter coreos.live.rootfs_url

Oct 28 09:23:55 coreos-livepxe-rootfs[692]: No rootfs image found.  Modify your PXE configuration to add the rootfs
Oct 28 09:23:55 coreos-livepxe-rootfs[692]: image as a second initrd, or use the coreos.live.rootfs_url kernel parameter
Oct 28 09:23:55 coreos-livepxe-rootfs[692]: to specify an HTTP or HTTPS URL to the rootfs.

Comment 8 RHCOS Bug Bot 2021-10-28 12:34:26 UTC
This bug has been reported fixed in a new RHCOS build and is ready for QE verification.  To mark the bug verified, set the Verified field to Tested.  This bug will automatically move to MODIFIED once the fix has landed in a new bootimage.

Comment 9 Micah Abbott 2021-10-28 12:34:48 UTC
The fix for this landed in RHCOS 410.84.202110261839-0 and can be verified.

Comment 10 RHCOS Bug Bot 2021-10-28 13:37:43 UTC
The fix for this bug will not be delivered to customers until it lands in an updated bootimage.  That process is tracked in bug 2016008, which has status ASSIGNED.  Moving this bug back to POST.

Comment 11 HuijingHei 2021-10-29 02:22:08 UTC
Thanks Micah!
Pre-verification passed with 410.84.202110261839-0; the results are the same as in Comment #7. Changed the Verified status to Tested.

Comment 12 RHCOS Bug Bot 2021-11-13 05:35:57 UTC
The fix for this bug has landed in a bootimage bump, as tracked in bug 2016008 (now in status MODIFIED).  Moving this bug to MODIFIED.

Comment 14 HuijingHei 2021-11-15 10:37:12 UTC
The dependent bootimage bump BZ#2016008 is VERIFIED.
Verification passed with RHCOS 410.84.202111111322-0; the results are the same as in Comment 7.

Comment 17 errata-xmlrpc 2022-03-10 16:11:55 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056