Created attachment 1710781 [details]
PXE boot

Description of problem:

PXE Script:

#!ipxe
kernel https://releases-art-rhcos.svc.ci.openshift.org/art/storage/releases/rhcos-4.6/46.82.202008030340-0/x86_64/rhcos-46.82.202008030340-0-live-kernel-x86_64 ip=dhcp rd.neednet=1 initrd=https://releases-art-rhcos.svc.ci.openshift.org/art/storage/releases/rhcos-4.6/46.82.202008030340-0/x86_64/rhcos-46.82.202008030340-0-live-initramfs.x86_64.img console=ttyS1,115200n8 console=tty0 coreos.inst=yes coreos.inst.install_dev=sda coreos.inst.image_url=https://releases-art-rhcos.svc.ci.openshift.org/art/storage/releases/rhcos-4.6/46.82.202008030340-0/x86_64/rhcos-46.82.202008030340-0-metal.x86_64.raw.gz coreos.inst.ignition_url=http://147.75.83.233:8000/rhcos/ignitions//bootstrap.ign coreos.live.rootfs_url=https://releases-art-rhcos.svc.ci.openshift.org/art/storage/releases/rhcos-4.6/46.82.202008030340-0/x86_64/rhcos-46.82.202008030340-0-live-rootfs.x86_64.img
initrd https://releases-art-rhcos.svc.ci.openshift.org/art/storage/releases/rhcos-4.6/46.82.202008030340-0/x86_64/rhcos-46.82.202008030340-0-live-initramfs.x86_64.img
boot || reboot

Version-Release number of selected component (if applicable):
46.82.202008030340-0

The bootstrap log is attached.

How reproducible:

Steps to Reproduce:
1. Install UPI on baremetal using live artifacts

Actual results:
RHCOS does not finish installing, and SSH never comes up, so the node cannot be entered to check its status.

Expected results:

Additional info:
You are providing a coreos.inst.image_url along with a coreos.live.rootfs_url at the same time; this looks incorrect.

@Benjamin, could you help out here by providing an example iPXE script that uses coreos.inst.image_url and another that uses coreos.live.rootfs_url? I think we need better docs/examples on this ASAP.

Reducing severity to medium for now, as I think this is just a misconfiguration.
It's legal to supply an image_url and a rootfs_url at the same time, since they serve two different purposes. The rootfs_url provides the rootfs for the live system, and the image_url supplies the bare-metal image to be installed to disk. However, image_url usually shouldn't be specified anymore; the live image has the ability to install its own contents, and image_url will override automatic image selection for 512b/4Kn disks.

While your kernel command line contains several deprecated arguments and specifies the initrd twice, there's nothing there that should cause a boot to fail.

Recommended config:

#!ipxe
kernel https://releases-art-rhcos.svc.ci.openshift.org/art/storage/releases/rhcos-4.6/46.82.202008030340-0/x86_64/rhcos-46.82.202008030340-0-live-kernel-x86_64 initrd=https://releases-art-rhcos.svc.ci.openshift.org/art/storage/releases/rhcos-4.6/46.82.202008030340-0/x86_64/rhcos-46.82.202008030340-0-live-initramfs.x86_64.img coreos.live.rootfs_url=https://releases-art-rhcos.svc.ci.openshift.org/art/storage/releases/rhcos-4.6/46.82.202008030340-0/x86_64/rhcos-46.82.202008030340-0-live-rootfs.x86_64.img console=ttyS1,115200n8 console=tty0 coreos.inst.install_dev=/dev/sda coreos.inst.ignition_url=http://147.75.83.233:8000/rhcos/ignitions/bootstrap.ign
boot || reboot

If you're still having trouble with that config, please post a console log.
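To see which arguments the recommended config drops relative to the original script, the two kernel command lines can be compared by key. The sketch below is only an illustration: the function name `dropped_args` and the `IMG`/`ROOTFS` placeholder values are made up here, and the thread does not spell out which of the dropped arguments are the deprecated ones, so the output is just the difference between the two lines.

```shell
# Print the keys of kernel arguments that appear in the first command line
# but not in the second. A key is the text before the first "=".
# Hypothetical helper, not part of any RHCOS tooling.
dropped_args() (
    set -f                          # no glob expansion while word-splitting $1
    for arg in $1; do
        key=${arg%%=*}
        case " $2 " in
            *" $key="* | *" $key "*) ;;        # key is still present
            *) printf '%s\n' "$key" ;;         # key was dropped
        esac
    done
)

# Shortened versions of the two command lines from this report
# (IMG and ROOTFS stand in for the long URLs).
old='ip=dhcp rd.neednet=1 console=tty0 coreos.inst=yes coreos.inst.install_dev=sda coreos.inst.image_url=IMG coreos.live.rootfs_url=ROOTFS'
new='coreos.live.rootfs_url=ROOTFS console=tty0 coreos.inst.install_dev=/dev/sda'

dropped_args "$old" "$new"
# prints, one per line: ip, rd.neednet, coreos.inst, coreos.inst.image_url
```

Arguments whose value changed but whose key stayed (here `coreos.inst.install_dev`, from `sda` to `/dev/sda`) are not reported, since only keys are compared.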
It turns out there's an RHCOS bug when invoked in this configuration. (The osmet file copy happens before the rootfs is downloaded.) This should work as a workaround for now:

#!ipxe
kernel https://releases-art-rhcos.svc.ci.openshift.org/art/storage/releases/rhcos-4.6/46.82.202008030340-0/x86_64/rhcos-46.82.202008030340-0-live-kernel-x86_64 initrd=https://releases-art-rhcos.svc.ci.openshift.org/art/storage/releases/rhcos-4.6/46.82.202008030340-0/x86_64/rhcos-46.82.202008030340-0-live-initramfs.x86_64.img,https://releases-art-rhcos.svc.ci.openshift.org/art/storage/releases/rhcos-4.6/46.82.202008030340-0/x86_64/rhcos-46.82.202008030340-0-live-rootfs.x86_64.img console=ttyS1,115200n8 console=tty0 coreos.inst.install_dev=/dev/sda coreos.inst.ignition_url=http://147.75.83.233:8000/rhcos/ignitions/bootstrap.ign
boot || reboot

(Note the two comma-separated initrds.)

But that config is a worse one in the long run, so it'd be good to switch back to using coreos.live.rootfs_url once we get it fixed.
Tried the following iPXE script. (Without the second line containing the initrd again, it causes a kernel panic [1].) It seems that coreos-livepxe-rootfs is not recognising the comma-separated initrd:

#!ipxe
kernel https://releases-art-rhcos.svc.ci.openshift.org/art/storage/releases/rhcos-4.6/46.82.202008030340-0/x86_64/rhcos-46.82.202008030340-0-live-kernel-x86_64 initrd=https://releases-art-rhcos.svc.ci.openshift.org/art/storage/releases/rhcos-4.6/46.82.202008030340-0/x86_64/rhcos-46.82.202008030340-0-live-initramfs.x86_64.img,https://releases-art-rhcos.svc.ci.openshift.org/art/storage/releases/rhcos-4.6/46.82.202008030340-0/x86_64/rhcos-46.82.202008030340-0-live-rootfs.x86_64.img console=ttyS1,115200n8 console=tty0 coreos.inst.install_dev=/dev/sda coreos.inst.ignition_url=http://147.75.83.233:8000/rhcos/ignitions/mrnd-packet-46-live-7905/bootstrap.ign
initrd https://releases-art-rhcos.svc.ci.openshift.org/art/storage/releases/rhcos-4.6/46.82.202008030340-0/x86_64/rhcos-46.82.202008030340-0-live-initramfs.x86_64.img
boot || reboot

[ 11.745028] coreos-livepxe-rootfs[1213]: No rootfs image found. Modify your PXE configuration to add the rootfs
[ 11.745043] coreos-livepxe-rootfs[1213]: image as a second initrd, or use the coreos.live.rootfs_url= kernel parameter
[ 11.745054] coreos-livepxe-rootfs[1213]: to specify an HTTP or HTTPS URL to the rootfs.

[1] Log without the second initrd line in the iPXE file:

[ 7.035427] List of all partitions:
[ 7.039024] No filesystem could mount root, tried:
[ 7.039024]
[ 7.045610] Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0)
[ 7.054040] CPU: 5 PID: 1 Comm: swapper/0 Not tainted 4.18.0-211.el8.x86_64 #1
[ 7.061423] Hardware name: Dell Inc. PowerEdge R6515/04F3CJ, BIOS 1.3.1 01/31/2020
[ 7.069152] Call Trace:
[ 7.071715] dump_stack+0x5c/0x80
[ 7.075145] panic+0xe7/0x2a9
[ 7.078222] mount_block_root+0x2c5/0x2e9
[ 7.082346] ? do_early_param+0x91/0x91
[ 7.086294] prepare_namespace+0x135/0x16b
[ 7.090498] kernel_init_freeable+0x22e/0x258
[ 7.094964] ? rest_init+0xaa/0xaa
[ 7.098472] kernel_init+0xa/0x104
[ 7.101982] ret_from_fork+0x22/0x40
[ 7.106965] Kernel Offset: 0x7400000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 7.117845] ---[ end Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0) ]---
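The two failures above leave distinct messages on the console, so a saved console log can be sorted into one case or the other with a couple of fixed-string matches. This is a rough sketch: the function name `classify_console_log` and its result labels are invented for illustration, and the mapping from message to cause only reflects what was observed in this report.

```shell
# Classify a saved console log by the two failure messages seen in this report.
# $1 = path to the console log. Hypothetical helper; labels are illustrative.
classify_console_log() {
    if grep -qF 'VFS: Unable to mount root fs on unknown-block' "$1"; then
        # Seen here when the iPXE script had no separate "initrd" line.
        echo 'initrd-not-loaded'
    elif grep -qF 'No rootfs image found' "$1"; then
        # Emitted by coreos-livepxe-rootfs when it finds neither a second
        # initrd nor a coreos.live.rootfs_url= argument.
        echo 'rootfs-not-found'
    else
        echo 'unrecognized'
    fi
}
```

For example, `classify_console_log console.log` would print `rootfs-not-found` for the first log excerpt above, assuming it was saved as `console.log`.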
Great. Okay, might just be best to wait for the fix.
Repro:

KERNEL=https://releases-art-rhcos.svc.ci.openshift.org/art/storage/releases/rhcos-4.6/46.82.202008030340-0/x86_64/rhcos-46.82.202008030340-0-live-kernel-x86_64
INITRD=https://releases-art-rhcos.svc.ci.openshift.org/art/storage/releases/rhcos-4.6/46.82.202008030340-0/x86_64/rhcos-46.82.202008030340-0-live-initramfs.x86_64.img
ROOTFS=https://releases-art-rhcos.svc.ci.openshift.org/art/storage/releases/rhcos-4.6/46.82.202008030340-0/x86_64/rhcos-46.82.202008030340-0-live-rootfs.x86_64.img
wget $KERNEL
wget $INITRD
qemu-img create -f qcow2 test.qcow2 16G
qemu-system-x86_64 -m 4096 -accel kvm -object rng-random,filename=/dev/urandom,id=rng0 \
    -netdev user,id=eth0 -device virtio-net-pci,netdev=eth0 -hda test.qcow2 \
    -kernel $(basename $KERNEL) -initrd $(basename $INITRD) \
    -append "coreos.live.rootfs_url=$ROOTFS coreos.inst.install_dev=/dev/sda"

If it installs Fedora CoreOS, you have the bug.
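One way to tell which OS ended up on the disk is to look at the `ID` field of the installed system's `/etc/os-release`. The sketch below assumes you have a copy of that file (for instance, copied out of the booted guest), and assumes that Fedora CoreOS reports `ID=fedora` while RHCOS reports `ID="rhcos"`; neither the helper names nor those values come from this report, so check them against your own images.

```shell
# Print the ID field of an os-release file. The file format is
# shell-compatible, so it is sourced inside a subshell to keep the variables
# contained. Only use this on a file you trust, since sourcing runs it as
# shell code. $1 = path to a copy of /etc/os-release.
installed_os_id() (
    case $1 in
        */*) file=$1 ;;
        *)   file=./$1 ;;           # "." would otherwise search PATH
    esac
    unset ID
    . "$file"
    printf '%s\n' "${ID:-unknown}"
)

# Map the ID to the outcome described in the repro (assumed ID values).
check_repro() {
    case $(installed_os_id "$1") in
        fedora) echo 'bug reproduced: Fedora CoreOS was installed' ;;
        rhcos)  echo 'ok: RHCOS was installed' ;;
        *)      echo 'unrecognized os-release ID' ;;
    esac
}
```

Running `check_repro ./os-release` against the copied file then gives a one-line verdict.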
Verified using 4.6.0-0.nightly-2020-08-11-151449

```
$ RELEASE=$(oc image info --output json $(oc adm release info -a ~/openshift-cluster-installs/all-the-pull-secrets.json --image-for=machine-os-content registry.svc.ci.openshift.org/ocp/release:4.6.0-0.nightly-2020-08-11-151449) | jq -r .config.config.Labels.version)
$ KERNEL=https://releases-art-rhcos.svc.ci.openshift.org/art/storage/releases/rhcos-4.6/$RELEASE/x86_64/rhcos-$RELEASE-live-kernel-x86_64
$ INITRD=https://releases-art-rhcos.svc.ci.openshift.org/art/storage/releases/rhcos-4.6/$RELEASE/x86_64/rhcos-$RELEASE-live-initramfs.x86_64.img
$ ROOTFS=https://releases-art-rhcos.svc.ci.openshift.org/art/storage/releases/rhcos-4.6/$RELEASE/x86_64/rhcos-$RELEASE-live-rootfs.x86_64.img
$ wget $KERNEL
$ wget $INITRD
$ qemu-img create -f qcow2 test.qcow2 16G
$ qemu-system-x86_64 -m 4096 -accel kvm -object rng-random,filename=/dev/urandom,id=rng0 \
    -netdev user,id=eth0 -device virtio-net-pci,netdev=eth0 -hda test.qcow2 \
    -kernel $(basename $KERNEL) -initrd $(basename $INITRD) \
    -append "coreos.live.rootfs_url=$ROOTFS coreos.inst.install_dev=/dev/sda"
```

Observed RHCOS being installed in the QEMU session
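The three artifact URLs in the verification follow one pattern, so they can be derived from the build version in a single place. A minimal sketch, assuming the same host and `rhcos-4.6` path used throughout this report; the function name `rhcos_live_url` is made up for illustration.

```shell
# Build the URL of a live artifact for a given RHCOS build version.
# $1 = build version (e.g. the value of $RELEASE above)
# $2 = artifact: kernel | initramfs | rootfs
# Host and rhcos-4.6 path are taken from this report and hardcoded.
rhcos_live_url() {
    _rhcos_base="https://releases-art-rhcos.svc.ci.openshift.org/art/storage/releases/rhcos-4.6/$1/x86_64"
    case $2 in
        kernel)    echo "$_rhcos_base/rhcos-$1-live-kernel-x86_64" ;;
        initramfs) echo "$_rhcos_base/rhcos-$1-live-initramfs.x86_64.img" ;;
        rootfs)    echo "$_rhcos_base/rhcos-$1-live-rootfs.x86_64.img" ;;
        *)         echo "unknown artifact: $2" >&2; return 1 ;;
    esac
}
```

With this in place, `KERNEL=$(rhcos_live_url "$RELEASE" kernel)` and the matching `initramfs` and `rootfs` calls would replace the three hand-written assignments.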
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:4196