Bug 1867091 - RHCOS 46.82.202008030340-0 not starting on baremetal when using live artifacts
Summary: RHCOS 46.82.202008030340-0 not starting on baremetal when using live artifacts
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: RHCOS
Version: 4.6
Hardware: Unspecified
OS: Unspecified
medium
urgent
Target Milestone: ---
Target Release: 4.6.0
Assignee: Benjamin Gilbert
QA Contact: Michael Nguyen
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-08-07 10:42 UTC by David Sanz
Modified: 2021-04-21 15:17 UTC
CC List: 7 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-10-27 16:25:54 UTC
Target Upstream Version:


Attachments
PXE boot (141.63 KB, text/plain)
2020-08-07 10:42 UTC, David Sanz


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | coreos fedora-coreos-config pull 551 | 0 | None | closed | 20live: add missing coreos-livepxe-rootfs dep to persist-osmet | 2021-02-01 12:34:17 UTC
Red Hat Product Errata | RHBA-2020:4196 | 0 | None | None | None | 2020-10-27 16:26:16 UTC

Description David Sanz 2020-08-07 10:42:23 UTC
Created attachment 1710781
PXE boot

Description of problem:

PXE Script:

```
#!ipxe

kernel https://releases-art-rhcos.svc.ci.openshift.org/art/storage/releases/rhcos-4.6/46.82.202008030340-0/x86_64/rhcos-46.82.202008030340-0-live-kernel-x86_64 ip=dhcp rd.neednet=1 initrd=https://releases-art-rhcos.svc.ci.openshift.org/art/storage/releases/rhcos-4.6/46.82.202008030340-0/x86_64/rhcos-46.82.202008030340-0-live-initramfs.x86_64.img console=ttyS1,115200n8 console=tty0 coreos.inst=yes coreos.inst.install_dev=sda coreos.inst.image_url=https://releases-art-rhcos.svc.ci.openshift.org/art/storage/releases/rhcos-4.6/46.82.202008030340-0/x86_64/rhcos-46.82.202008030340-0-metal.x86_64.raw.gz coreos.inst.ignition_url=http://147.75.83.233:8000/rhcos/ignitions//bootstrap.ign coreos.live.rootfs_url=https://releases-art-rhcos.svc.ci.openshift.org/art/storage/releases/rhcos-4.6/46.82.202008030340-0/x86_64/rhcos-46.82.202008030340-0-live-rootfs.x86_64.img
initrd https://releases-art-rhcos.svc.ci.openshift.org/art/storage/releases/rhcos-4.6/46.82.202008030340-0/x86_64/rhcos-46.82.202008030340-0-live-initramfs.x86_64.img
boot || reboot
```


Version-Release number of selected component (if applicable):
46.82.202008030340-0


Bootstrap log attached.

How reproducible:


Steps to Reproduce:
1. Install UPI on bare metal using live artifacts.

Actual results:
RHCOS does not finish installing, and SSH never comes up, so there is no way to log into the node and check its status.

Expected results:


Additional info:

Comment 2 Micah Abbott 2020-08-07 13:58:37 UTC
You are providing a coreos.inst.image_url and a coreos.live.rootfs_url at the same time; this looks incorrect.

@Benjamin could you help out here by providing an example iPXE script that uses coreos.inst.image_url and another that uses coreos.live.rootfs_url?

I think we need better docs/examples on this ASAP.


Reducing severity to medium for now, as I think this is just a misconfiguration.

Comment 3 Benjamin Gilbert 2020-08-07 14:32:32 UTC
It's legal to supply an image_url and a rootfs_url at the same time, since they serve two different purposes.  The rootfs_url provides the rootfs for the live system, and the image_url supplies the bare-metal image to be installed to disk.  However, image_url usually shouldn't be specified anymore: the live image can install its own contents, and image_url overrides the automatic selection between the 512-byte-sector and 4Kn images.
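
Whether a target disk is a 512-byte-sector or a 4Kn disk can be checked from the running live system. A minimal sketch, assuming the standard Linux sysfs attribute /sys/block/<dev>/queue/logical_block_size; the function name disk_sector_kind and its optional sysfs-root argument (there only so it can be exercised against a fake tree) are hypothetical, and this is not how the installer itself makes the choice:

```shell
# Hypothetical helper: print "512" or "4kn" for a block device name such as sda.
# $2 optionally overrides the sysfs root (default /sys), e.g. for testing.
disk_sector_kind() {
    f="${2:-/sys}/block/$1/queue/logical_block_size"
    if [ ! -r "$f" ]; then
        echo "cannot read $f" >&2
        return 1
    fi
    size=$(cat "$f")
    case "$size" in
        512)  echo 512 ;;
        4096) echo 4kn ;;
        *)    echo "unexpected logical block size: $size" >&2; return 1 ;;
    esac
}
```

For example, disk_sector_kind sda on the target machine prints 512 or 4kn. 512e drives report a 512-byte logical block size, so they fall into the 512 case.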

While your kernel command line contains several deprecated arguments and specifies the initrd twice, there's nothing there that should cause a boot to fail.  Recommended config:

```
#!ipxe

kernel https://releases-art-rhcos.svc.ci.openshift.org/art/storage/releases/rhcos-4.6/46.82.202008030340-0/x86_64/rhcos-46.82.202008030340-0-live-kernel-x86_64 initrd=https://releases-art-rhcos.svc.ci.openshift.org/art/storage/releases/rhcos-4.6/46.82.202008030340-0/x86_64/rhcos-46.82.202008030340-0-live-initramfs.x86_64.img coreos.live.rootfs_url=https://releases-art-rhcos.svc.ci.openshift.org/art/storage/releases/rhcos-4.6/46.82.202008030340-0/x86_64/rhcos-46.82.202008030340-0-live-rootfs.x86_64.img console=ttyS1,115200n8 console=tty0 coreos.inst.install_dev=/dev/sda coreos.inst.ignition_url=http://147.75.83.233:8000/rhcos/ignitions/bootstrap.ign
boot || reboot
```

If you're still having trouble with that config, please post a console log.
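
The differences between the submitted and the recommended kernel arguments can be checked mechanically. A minimal sketch; the function name check_kargs is hypothetical, and its rules are inferred from comparing the two configs in this report, not taken from an official list of deprecated arguments:

```shell
# Hypothetical linter: flag kernel arguments that the recommended config
# drops or rewrites. Prints one line per finding; returns 1 if any were found.
check_kargs() {
    status=0
    set -f                          # split on whitespace without glob-expanding URLs
    for arg in $1; do
        case "$arg" in
            coreos.inst=yes)
                echo "flag: $arg (dropped in the recommended config)"; status=1 ;;
            coreos.inst.install_dev=/dev/*)
                ;;                  # absolute device path, as recommended
            coreos.inst.install_dev=*)
                echo "flag: $arg (use coreos.inst.install_dev=/dev/${arg#*=})"; status=1 ;;
            coreos.inst.image_url=*)
                echo "flag: coreos.inst.image_url set (overrides automatic 512b/4Kn image selection)"; status=1 ;;
        esac
    done
    set +f
    return "$status"
}
```

Run against the kernel arguments from the original PXE script, it flags coreos.inst=yes, the bare sda device, and the image URL, and it stays silent on the recommended config.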

Comment 4 Benjamin Gilbert 2020-08-07 16:21:03 UTC
It turns out there's an RHCOS bug when invoked in this configuration.  (The osmet file copy happens before the rootfs is downloaded.)  This should work as a workaround for now:

```
#!ipxe

kernel https://releases-art-rhcos.svc.ci.openshift.org/art/storage/releases/rhcos-4.6/46.82.202008030340-0/x86_64/rhcos-46.82.202008030340-0-live-kernel-x86_64 initrd=https://releases-art-rhcos.svc.ci.openshift.org/art/storage/releases/rhcos-4.6/46.82.202008030340-0/x86_64/rhcos-46.82.202008030340-0-live-initramfs.x86_64.img,https://releases-art-rhcos.svc.ci.openshift.org/art/storage/releases/rhcos-4.6/46.82.202008030340-0/x86_64/rhcos-46.82.202008030340-0-live-rootfs.x86_64.img console=ttyS1,115200n8 console=tty0 coreos.inst.install_dev=/dev/sda coreos.inst.ignition_url=http://147.75.83.233:8000/rhcos/ignitions/bootstrap.ign
boot || reboot
```

(Note the two comma-separated initrds.)  But that config is a worse one in the long run, so it'd be good to switch back to using coreos.live.rootfs_url once we get it fixed.
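
The underlying problem is a missing ordering dependency between systemd units in the live image: the fix listed under Links is titled "20live: add missing coreos-livepxe-rootfs dep to persist-osmet". A sketch of what such a dependency looks like, expressed as a systemd drop-in. The unit name coreos-persist-osmet.service, the drop-in file name, and the choice of Requires= plus After= are assumptions; only coreos-livepxe-rootfs.service appears in this report, and the real change lives in the live initramfs rather than in a drop-in you would install yourself:

```shell
# Sketch only: write a drop-in that makes the (hypothetically named)
# osmet-persisting unit depend on coreos-livepxe-rootfs.service, which
# appears to be the unit that makes the rootfs available.
write_osmet_dropin() {
    dir="$1/coreos-persist-osmet.service.d"       # unit name is hypothetical
    mkdir -p "$dir" || return 1
    cat > "$dir/10-wait-for-rootfs.conf" <<'EOF'
[Unit]
# Pull in the rootfs unit and order this unit after it.
Requires=coreos-livepxe-rootfs.service
After=coreos-livepxe-rootfs.service
EOF
}
```

After= is what fixes the ordering; Requires= additionally makes the osmet copy fail instead of running without a rootfs if the rootfs unit fails.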

Comment 5 David Sanz 2020-08-07 17:41:27 UTC
Tried the following iPXE script. (Without the second line, which specifies the initrd again, the boot ends in a kernel panic [1].)

It seems that coreos-livepxe-rootfs does not recognise the comma-separated initrd:

```
#!ipxe

kernel https://releases-art-rhcos.svc.ci.openshift.org/art/storage/releases/rhcos-4.6/46.82.202008030340-0/x86_64/rhcos-46.82.202008030340-0-live-kernel-x86_64 initrd=https://releases-art-rhcos.svc.ci.openshift.org/art/storage/releases/rhcos-4.6/46.82.202008030340-0/x86_64/rhcos-46.82.202008030340-0-live-initramfs.x86_64.img,https://releases-art-rhcos.svc.ci.openshift.org/art/storage/releases/rhcos-4.6/46.82.202008030340-0/x86_64/rhcos-46.82.202008030340-0-live-rootfs.x86_64.img console=ttyS1,115200n8 console=tty0 coreos.inst.install_dev=/dev/sda coreos.inst.ignition_url=http://147.75.83.233:8000/rhcos/ignitions/mrnd-packet-46-live-7905/bootstrap.ign
initrd https://releases-art-rhcos.svc.ci.openshift.org/art/storage/releases/rhcos-4.6/46.82.202008030340-0/x86_64/rhcos-46.82.202008030340-0-live-initramfs.x86_64.img
boot || reboot
```

```
[   11.745028] coreos-livepxe-rootfs[1213]: No rootfs image found.  Modify your PXE configuration to add the rootfs
[   11.745043] coreos-livepxe-rootfs[1213]: image as a second initrd, or use the coreos.live.rootfs_url= kernel parameter
[   11.745054] coreos-livepxe-rootfs[1213]: to specify an HTTP or HTTPS URL to the rootfs.
```
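
The message names two ways of supplying the rootfs: a second initrd, or a coreos.live.rootfs_url= kernel parameter. A sketch of checking which one a given kernel command line relies on; it is not the actual coreos-livepxe-rootfs logic, and the function name rootfs_url_from_cmdline is hypothetical:

```shell
# Hypothetical helper: print the coreos.live.rootfs_url value from a kernel
# command line string. Returns 1 if the parameter is absent, i.e. the rootfs
# would have to arrive as a second initrd instead.
rootfs_url_from_cmdline() {
    set -f                          # split on whitespace without glob-expanding URLs
    for arg in $1; do
        case "$arg" in
            coreos.live.rootfs_url=*)
                set +f
                echo "${arg#coreos.live.rootfs_url=}"
                return 0 ;;
        esac
    done
    set +f
    return 1
}
```

On a live system, rootfs_url_from_cmdline "$(cat /proc/cmdline)" prints the URL. For the comma-separated script in this comment it fails, so that boot depends entirely on the rootfs arriving as a second initrd.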

[1] Log from booting without the second initrd line in the iPXE file:

```
[    7.035427] List of all partitions:
[    7.039024] No filesystem could mount root, tried: 
[    7.039024] 
[    7.045610] Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0)
[    7.054040] CPU: 5 PID: 1 Comm: swapper/0 Not tainted 4.18.0-211.el8.x86_64 #1
[    7.061423] Hardware name: Dell Inc. PowerEdge R6515/04F3CJ, BIOS 1.3.1 01/31/2020
[    7.069152] Call Trace:
[    7.071715]  dump_stack+0x5c/0x80
[    7.075145]  panic+0xe7/0x2a9
[    7.078222]  mount_block_root+0x2c5/0x2e9
[    7.082346]  ? do_early_param+0x91/0x91
[    7.086294]  prepare_namespace+0x135/0x16b
[    7.090498]  kernel_init_freeable+0x22e/0x258
[    7.094964]  ? rest_init+0xaa/0xaa
[    7.098472]  kernel_init+0xa/0x104
[    7.101982]  ret_from_fork+0x22/0x40
[    7.106965] Kernel Offset: 0x7400000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[    7.117845] ---[ end Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0) ]---
```

Comment 6 Benjamin Gilbert 2020-08-07 18:29:27 UTC
Great.  Okay, might just be best to wait for the fix.

Comment 9 Benjamin Gilbert 2020-08-08 03:33:27 UTC
Repro:

```
KERNEL=https://releases-art-rhcos.svc.ci.openshift.org/art/storage/releases/rhcos-4.6/46.82.202008030340-0/x86_64/rhcos-46.82.202008030340-0-live-kernel-x86_64
INITRD=https://releases-art-rhcos.svc.ci.openshift.org/art/storage/releases/rhcos-4.6/46.82.202008030340-0/x86_64/rhcos-46.82.202008030340-0-live-initramfs.x86_64.img
ROOTFS=https://releases-art-rhcos.svc.ci.openshift.org/art/storage/releases/rhcos-4.6/46.82.202008030340-0/x86_64/rhcos-46.82.202008030340-0-live-rootfs.x86_64.img
wget $KERNEL
wget $INITRD
qemu-img create -f qcow2 test.qcow2 16G
qemu-system-x86_64 -m 4096 -accel kvm -object rng-random,filename=/dev/urandom,id=rng0 \
	-netdev user,id=eth0 -device virtio-net-pci,netdev=eth0 -hda test.qcow2 \
	-kernel $(basename $KERNEL) -initrd $(basename $INITRD) \
	-append "coreos.live.rootfs_url=$ROOTFS coreos.inst.install_dev=/dev/sda"
```

If it installs Fedora CoreOS, you have the bug.

Comment 10 Micah Abbott 2020-08-11 19:49:45 UTC
Verified using 4.6.0-0.nightly-2020-08-11-151449

```
$ RELEASE=$(oc image info --output json $(oc adm release info -a ~/openshift-cluster-installs/all-the-pull-secrets.json --image-for=machine-os-content registry.svc.ci.openshift.org/ocp/release:4.6.0-0.nightly-2020-08-11-151449) | jq -r .config.config.Labels.version)
$ KERNEL=https://releases-art-rhcos.svc.ci.openshift.org/art/storage/releases/rhcos-4.6/$RELEASE/x86_64/rhcos-$RELEASE-live-kernel-x86_64
$ INITRD=https://releases-art-rhcos.svc.ci.openshift.org/art/storage/releases/rhcos-4.6/$RELEASE/x86_64/rhcos-$RELEASE-live-initramfs.x86_64.img
$ ROOTFS=https://releases-art-rhcos.svc.ci.openshift.org/art/storage/releases/rhcos-4.6/$RELEASE/x86_64/rhcos-$RELEASE-live-rootfs.x86_64.img
$ wget $KERNEL
$ wget $INITRD
$ qemu-img create -f qcow2 test.qcow2 16G
$ qemu-system-x86_64 -m 4096 -accel kvm -object rng-random,filename=/dev/urandom,id=rng0 \
	-netdev user,id=eth0 -device virtio-net-pci,netdev=eth0 -hda test.qcow2 \
	-kernel $(basename $KERNEL) -initrd $(basename $INITRD) \
	-append "coreos.live.rootfs_url=$ROOTFS coreos.inst.install_dev=/dev/sda"
```

Observed RHCOS being installed in the QEMU session

Comment 12 errata-xmlrpc 2020-10-27 16:25:54 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196

