Bug 2107674
Summary: Unable to install RHCOS 4.11.0-rc2-ppc64le on Bare-metal Power system
Product: OpenShift Container Platform
Component: RHCOS
Version: 4.11
Hardware: ppc64le
OS: Linux
Status: CLOSED CURRENTRELEASE
Severity: high
Priority: unspecified
Target Milestone: ---
Target Release: 4.12.0
Reporter: Thomas L Falcon <tfalcon>
Assignee: Benjamin Gilbert <bgilbert>
QA Contact: Michael Nguyen <mnguyen>
CC: bgilbert, bugproxy, dornelas, jligon, manokuma, mhamzy, miabbott, mrussell, mtarsel, nstielau, tfalcon, travier
Doc Type: No Doc Update
Story Points: ---
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
Category: ---
oVirt Team: ---
Cloudforms Team: ---
Last Closed: 2022-08-25 05:09:11 UTC
Description
Thomas L Falcon
2022-07-15 15:48:31 UTC
In the future, please collect all required information as noted in our template before opening a BZ. The promise of more information doesn't allow us to act on this BZ in the current state and increases the chance that it could be overlooked in the future.

Created attachment 1897494 [details]
console-and-ign-files.tar.gz
Sorry for the trouble. I am attaching console output from one of the nodes and ignition files. I cannot gather journalctl output. After I pxeboot the nodes to kick off the installation, petitboot no longer detects any disks. I cannot progress further.

I don't know petitboot very well; are you sure it's completely failing to boot? The log suggests that it has started the kernel, but since the console is not directed to ttyS1, you aren't seeing any output. Is it possible to intercept the boot and manually add "console=ttyS1" to the kernel arguments?

I don't know about intercepting the boot, but the "console=ttyS1" parameter was already present in the nodes' configuration file.

LABEL pxeboot
    KERNEL http://192.168.79.1:8080/assets/rhcos-live-kernel-ppc64le
    APPEND initrd=http://192.168.79.1:8080/assets/rhcos-live-initramfs.ppc64le.img console=tty0 console=ttyS1 <------ Here ip=dhcp rd.neednet=1 coreos.inst.install_dev=/dev/nvme0n1 coreos.live.rootfs_url=http://192.168.79.1:8080/assets/rhcos-live-rootfs.ppc64le.img coreos.inst.ignition_url=http://192.168.79.1:8080/ignition?mac=08:94:ef:80:c5:51

I don't know how to grab the output from it though.

That only affects the PXE boot, not the installed system. If you can't intercept the boot, the next easiest approach is to remove "coreos.inst.install_dev=/dev/nvme0n1" from the APPEND line, boot from PXE, connect to the console, and manually run coreos-installer at the prompt:

sudo coreos-installer install /dev/nvme0n1 --ignition-url http://192.168.79.1:8080/ignition?mac=08:94:ef:80:c5:51 --insecure-ignition --append-karg console=ttyS1

Then run "sudo reboot" and see what output you get when booting from the installed system.

Thanks, I was able to boot that way but I don't know how to proceed. We're using ssh keys, so I don't know the password. However, ssh also is not working.
Red Hat Enterprise Linux CoreOS 411.86.202207090936-0 (Ootpa) 4.11
SSH host key: SHA256:3RTAHY3cLSrqVdiR5q/79etki64sFwVO41+eB6pebpw (ECDSA)
SSH host key: SHA256:CrnZwblQXJR7uv7UtE9qP0ESX8s9A/ebvyc8yp2dE3g (ED25519)
SSH host key: SHA256:qZZb5gZb8nxp/eE4NWc/f2dVE3Geq1ElCRjcOhEQz/A (RSA)
enP49p3s0f1: 192.168.79.25 fe80::a94:efff:fe80:c551
master-1 login:
Red Hat Enterprise Linux CoreOS 411.86.202207090936-0 (Ootpa) 4.11
SSH host key: SHA256:3RTAHY3cLSrqVdiR5q/79etki64sFwVO41+eB6pebpw (ECDSA)
SSH host key: SHA256:CrnZwblQXJR7uv7UtE9qP0ESX8s9A/ebvyc8yp2dE3g (ED25519)
SSH host key: SHA256:qZZb5gZb8nxp/eE4NWc/f2dVE3Geq1ElCRjcOhEQz/A (RSA)
enP49p3s0f1: 192.168.79.25 fe80::a94:efff:fe80:c551
master-1 login:

Please post the console log from the successful boot.

I tried this on another system (to ensure that nvme devices and 4k block sizes were not causing the installation failure).
The coreos-installer run seems to complete with some warnings.
Installing Red Hat Enterprise Linux CoreOS 411.86.202207090936-0 (Ootpa) ppc64le (512-byte sectors)
[ 4610.624181] GPT:Primary header thinks Alt. header is not at the end of the disk.
[ 4610.624181] GPT:Primary header thinks Alt. header is not at the end of the disk.
[ 4610.674346] GPT:8454143 != 1953525167
[ 4610.674346] GPT:8454143 != 1953525167
[ 4610.714445] GPT:Alternate GPT header not at the end of the disk.
[ 4610.714445] GPT:Alternate GPT header not at the end of the disk.
[ 4610.764567] GPT:8454143 != 1953525167
[ 4610.764567] GPT:8454143 != 1953525167
[ 4610.804659] GPT: Use GNU Parted to correct GPT errors.
[ 4610.804659] GPT: Use GNU Parted to correct GPT errors.
[ 4610.844768] sda: sda1 sda2 sda3 sda4
[ 4610.844768] sda: sda1 sda2 sda3 sda4
> Read disk 4.0 GiB/4.0 GiB (100%)
[ 4690.651453] GPT:Primary header thinks Alt. header is not at the end of the disk.
[ 4690.651453] GPT:Primary header thinks Alt. header is not at the end of the disk.
[ 4690.701629] GPT:8454143 != 1953525167
[ 4690.701629] GPT:8454143 != 1953525167
[ 4690.741712] GPT:Alternate GPT header not at the end of the disk.
[ 4690.741712] GPT:Alternate GPT header not at the end of the disk.
[ 4690.801836] GPT:8454143 != 1953525167
[ 4690.801836] GPT:8454143 != 1953525167
[ 4690.841922] GPT: Use GNU Parted to correct GPT errors.
[ 4690.841922] GPT: Use GNU Parted to correct GPT errors.
[ 4690.892035] sda: sda1 sda2 sda3 sda4
[ 4690.892035] sda: sda1 sda2 sda3 sda4
[ 4691.271404] EXT4-fs (sda3): mounted filesystem with ordered data mode. Opts: (null)
[ 4691.271404] EXT4-fs (sda3): mounted filesystem with ordered data mode. Opts: (null)
Writing Ignition config
Modifying kernel arguments
[ 4691.502798] GPT:Primary header thinks Alt. header is not at the end of the disk.
[ 4691.502798] GPT:Primary header thinks Alt. header is not at the end of the disk.
[ 4691.552950] GPT:8454143 != 1953525167
[ 4691.552950] GPT:8454143 != 1953525167
[ 4691.583048] GPT:Alternate GPT header not at the end of the disk.
[ 4691.583048] GPT:Alternate GPT header not at the end of the disk.
[ 4691.633162] GPT:8454143 != 1953525167
[ 4691.633162] GPT:8454143 != 1953525167
[ 4691.663248] GPT: Use GNU Parted to correct GPT errors.
[ 4691.663248] GPT: Use GNU Parted to correct GPT errors.
[ 4691.713364] sda: sda1 sda2 sda3 sda4
[ 4691.713364] sda: sda1 sda2 sda3 sda4
Install complete.
[ 4692.130617] GPT:Primary header thinks Alt. header is not at the end of the disk.
[ 4692.130617] GPT:Primary header thinks Alt. header is not at the end of the disk.
[ 4692.181393] GPT:8454143 != 1953525167
[ 4692.181393] GPT:8454143 != 1953525167
[ 4692.222040] GPT:Alternate GPT header not at the end of the disk.
[ 4692.222040] GPT:Alternate GPT header not at the end of the disk.
[ 4692.272734] GPT:8454143 != 1953525167
[ 4692.272734] GPT:8454143 != 1953525167
[ 4692.313398] GPT: Use GNU Parted to correct GPT errors.
[ 4692.313398] GPT: Use GNU Parted to correct GPT errors.
[ 4692.354094] sda: sda1 sda2 sda3 sda4
[ 4692.354094] sda: sda1 sda2 sda3 sda4
bash-4.4# [ 4692.411483] GPT:Primary header thinks Alt. header is not at the end of the disk.
[ 4692.411483] GPT:Primary header thinks Alt. header is not at the end of the disk.
[ 4692.462238] GPT:8454143 != 1953525167
[ 4692.462238] GPT:8454143 != 1953525167
[ 4692.502961] GPT:Alternate GPT header not at the end of the disk.
[ 4692.502961] GPT:Alternate GPT header not at the end of the disk.
[ 4692.553717] GPT:8454143 != 1953525167
[ 4692.553717] GPT:8454143 != 1953525167
[ 4692.594448] GPT: Use GNU Parted to correct GPT errors.
[ 4692.594448] GPT: Use GNU Parted to correct GPT errors.
[ 4692.635195] sda: sda1 sda2 sda3 sda4
[ 4692.635195] sda: sda1 sda2 sda3 sda4
bash-4.4# lsblk
NAME   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
loop0    7:0    0 127.7G  0 loop /run/ephemeral
loop1    7:1    0 937.8M  0 loop /sysroot
sda      8:0    1 931.5G  0 disk
|-sda1   8:1    1     4M  0 part
|-sda2   8:2    1     1M  0 part
|-sda3   8:3    1   384M  0 part
`-sda4   8:4    1   3.7G  0 part
sdb      8:16   1 931.5G  0 disk
bash-4.4# fdisk /dev/sda
Welcome to fdisk (util-linux 2.32.1).
Changes will remain in memory only, until you decide to write them.
Be careful before using the write command.
GPT PMBR size mismatch (8454143 != 1953525167) will be corrected by write.
The backup GPT table is not on the end of the device. This problem will be corrected by write.
[ 980.509508] GPT:Primary header thinks Alt. header is not at the end of the disk.
[ 980.509508] GPT:Primary header thinks Alt. header is not at the end of the disk.
[ 980.570297] GPT:8454143 != 1953525167
[ 980.570297] GPT:8454143 != 1953525167
[ 980.611009] GPT:Alternate GPT header not at the end of the disk.
[ 980.611009] GPT:Alternate GPT header not at the end of the disk.
[ 980.661746] GPT:8454143 != 1953525167
[ 980.661746] GPT:8454143 != 1953525167
[ 980.692452] GPT: Use GNU Parted to correct GPT errors.
[ 980.692452] GPT: Use GNU Parted to correct GPT errors.
[ 980.743197] sda: sda1 sda2 sda3 sda4
[ 980.743197] sda: sda1 sda2 sda3 sda4

https://bugzilla.redhat.com/show_bug.cgi?id=2107674#c8 seems to indicate that the installation succeeded. Even though the coreos-install seems to complete, I could not boot on to the installed image.

Those GPT warnings are normal, and are corrected automatically on first boot. I agree that the installation appears to have succeeded. At this point we'd need to see logs from the failed boot.

Created attachment 1897962 [details]
pxe-boot-console-logs
Logs after removing "coreos.inst.install_dev=/dev/nvme0n1" from kernel arguments.
@tfalcon That's the log from the PXE boot. We need the log from the boot of the installed system. Did you follow the full instructions in comment 5?

I'm still trying to figure out how to connect to the console and login to run the command.

Red Hat Enterprise Linux CoreOS 411.86.202207090936-0 (Ootpa) 4.11
SSH host key: SHA256:fc3NlioetXjw0i3C2B1U+drAlVpBtQVH2H+JK7dMbZs (ECDSA)
SSH host key: SHA256:z2kQhaMpcxjgYInABwQbEaEZqpC1jN3ahDgJuDDC0Mc (ED25519)
SSH host key: SHA256:feM2Q+9mxFS3IwQJMK/6OL8G7UQBR72Ue8VS7AtwjC8 (RSA)
enP49p3s0f1: 192.168.79.25 fe80::a94:efff:fe80:c551
master-1 login:
Red Hat Enterprise Linux CoreOS 411.86.202207090936-0 (Ootpa) 4.11
SSH host key: SHA256:fc3NlioetXjw0i3C2B1U+drAlVpBtQVH2H+JK7dMbZs (ECDSA)
SSH host key: SHA256:z2kQhaMpcxjgYInABwQbEaEZqpC1jN3ahDgJuDDC0Mc (ED25519)
SSH host key: SHA256:feM2Q+9mxFS3IwQJMK/6OL8G7UQBR72Ue8VS7AtwjC8 (RSA)
enP49p3s0f1: 192.168.79.25 fe80::a94:efff:fe80:c551
master-1 login:

Created attachment 1898018 [details]
bootstrap-autologin.ign
I tried to add an autologin unit to the bootstrap ignition file, but something isn't working. I'm still blocked on accessing the console after booting a node in PXE.
Whoops, yeah, sorry about that. In ISO boots autologin happens automatically if an Ignition config is omitted, but that doesn't happen in PXE boots. This config, based on the one in https://docs.fedoraproject.org/en-US/fedora-coreos/tutorial-autologin/#_first_ignition_config_via_butane, should enable autologin on ttyS1:

{
  "ignition": {
    "version": "3.3.0"
  },
  "systemd": {
    "units": [
      {
        "dropins": [
          {
            "contents": "[Service]\nExecStart=\nExecStart=-/usr/sbin/agetty --autologin core --noclear %I $TERM\n",
            "name": "autologin-core.conf"
          }
        ],
        "name": "serial-getty"
      }
    ]
  }
}

You can use it by hosting it on a web server, then passing "ignition.platform.id=metal ignition.firstboot ignition.config.url=https://example.com/ignition/config" in the kernel arguments.

Thanks! I booted a node with that file, but I am still being prompted to login. I am trying to confirm that I am connected to ttyS1.

Manoj said he did the installation on infnod-1, and this is what we noticed there. Petitboot did mount sda. And we could see the following:

# ls -l /var/petitboot/mnt/dev/sda3
total 23
lrwxrwxrwx 1 root root 1 Jul 9 10:28 boot -> .
drwxr-xr-x 5 root root 1024 Jul 9 10:28 grub2
drwx------ 2 root root 1024 Jul 17 22:56 ignition
-rw-r--r-- 1 root root 0 Jul 9 10:28 ignition.firstboot
lrwxrwxrwx 1 root root 8 Jul 9 10:28 loader -> loader.1
drwxr-xr-x 3 root root 1024 Jul 9 10:28 loader.1
drwx------ 2 root root 12288 Jul 9 10:27 lost+found
drwxr-xr-x 3 root root 1024 Jul 9 10:28 ostree
# ls -l /var/petitboot/mnt/dev/sda3/ostree/
total 2
drwxr-xr-x 2 root root 1024 Jul 9 10:28 rhcos-797b8f8e1f9f0b6a5118d56f126a1fd518cfbadfaf7cf28deaa744924e38c87e
# ls -l /var/petitboot/mnt/dev/sda3/ostree/rhcos-797b8f8e1f9f0b6a5118d56f126a1fd518cfbadfaf7cf28deaa744924e38c87e/
total 122614
-rw-r--r-- 1 root root 89918534 Jul 9 10:28 initramfs-4.18.0-372.13.1.el8_6.ppc64le.img
-rwxr-xr-x 1 root root 35634157 Jul 9 10:28 vmlinuz-4.18.0-372.13.1.el8_6.ppc64le
# ls -l /var/petitboot/mnt/dev/sda4
total 0
drwxr-xr-x 2 root root 6 Jul 9 10:27 boot
drwxr-xr-x 5 root root 62 Jul 9 10:28 ostree
# cat /var/petitboot/mnt/dev/sda3/loader/entries/ostree-1-rhcos.conf
title Red Hat Enterprise Linux CoreOS 411.86.202207090936-0 (Ootpa) (ostree:0)
version 1
options random.trust_cpu=on console=tty0 console=hvc0,115200n8 ignition.platform.id=metal $ignition_firstboot ostree=/ostree/boot.1/rhcos/797b8f8e1f9f0b6a5118d56f126a1fd518cfbadfaf7cf28deaa744924e38c87e/0 console=ttyS1
linux /ostree/rhcos-797b8f8e1f9f0b6a5118d56f126a1fd518cfbadfaf7cf28deaa744924e38c87e/vmlinuz-4.18.0-372.13.1.el8_6.ppc64le
initrd /ostree/rhcos-797b8f8e1f9f0b6a5118d56f126a1fd518cfbadfaf7cf28deaa744924e38c87e/initramfs-4.18.0-372.13.1.el8_6.ppc64le.img

We tried to kexec manually into the image, but encountered an error:

# cd /var/petitboot/mnt/dev/sda3/ostree/rhcos-797b8f8e1f9f0b6a5118d56f126a1fd518cfbadfaf7cf28deaa744924e38c87e/
# kexec -l vmlinuz-4.18.0-372.13.1.el8_6.ppc64le -i initramfs-4.18.0-372.13.1.el8_6.ppc64le.img -c "random.trust_cpu=on console=tty0 console=hvc0,115200n8 ignition.platform.id=metal ostree=/ostree/boot.1/rhcos/797b8f8e1f9f0b6a5118d56f126a1fd518cfbadfaf7cf28deaa744924e38c87e/0 console=ttyS1"
kexec syscall failed: Operation not permitted

Unfortunately, petitboot does not have the "file" command, so we could not verify if the kernel was valid.

We've been able to kexec manually into the image from the petitboot menu.

# cd /var/petitboot/mnt/dev/*1p3/ostree/rhcos-*/
# kexec -l vmlinuz-*.ppc64le -i initramfs-*.img -c "$(grep options /var/petitboot/mnt/dev/*1p3/loader/entries/ostree-1-rhcos.conf | sed 's,^options ,,')"
# kexec -e

------- Comment From chavez.com 2022-07-22 17:45 EDT-------
Hi Tom and Mark, Trying to figure out if you need some help from IBM side for petitboot or if y'all are still doing some more investigation on your own. Can we get a summary of the investigation and the issues that you are facing?

Luciano, sorry for the late response. We were hoping to get some help with figuring out why the petitboot entries are not being populated. Unfortunately the cluster is being used for a different issue at the moment.

Timothee, We were able to work around the petitboot issue by running the following commands in the shell

# cd /var/petitboot/mnt/dev/nvme0n1p3/ostree/rhcos-*/
# kexec -l vmlinuz-*.ppc64le -i initramfs-*.img -c "ignition.firstboot rd.neednet=1 ip=dhcp $(grep options /var/petitboot/mnt/dev/nvme0n1p3/loader/entries/ostree-1-rhcos.conf | sed 's,^options ,,')" && kexec -e

However, for some reason the bootkube service did not run on the bootstrap node and needed to be run manually. I wasn't able to get the installation to work before we had to repurpose the cluster for another issue. I don't know if the bootstrap issue is related to this one.

Is this still an issue that you are facing or have you found a solution? If this is still an issue then we need a clearer summary as we do not understand how we could help here.

Hi, we were able to work around it by installing 4.10 and upgrading to 4.11 using the openshift client.
The system is being used by other teams for their work so I have not had the opportunity to attempt any installations with newer versions. The issue seems to be that petitboot is unable to parse the grub configuration file when rhcos 4.11.0-rc.2-ppc64le live images are used for installation on our bare-metal cluster. As a result there are no entries populated in the boot menu. We were forced to boot the live image manually with kexec from the petitboot shell.

------- Comment From kumarmn.com 2022-08-11 15:37 EDT-------
chavez.com: Any update on this?

------- Comment From arbab.com 2022-08-12 12:55 EDT-------
(In reply to comment #3)
> # cat /var/petitboot/mnt/dev/sda3/loader/entries/ostree-1-rhcos.conf
> title Red Hat Enterprise Linux CoreOS 411.86.202207090936-0 (Ootpa) (ostree:0)
> version 1
> options random.trust_cpu=on console=tty0 console=hvc0,115200n8 ignition.platform.id=metal $ignition_firstboot ostree=/ostree/boot.1/rhcos/797b8f8e1f9f0b6a5118d56f126a1fd518cfbadfaf7cf28deaa744924e38c87e/0 console=ttyS1
> linux /ostree/rhcos-797b8f8e1f9f0b6a5118d56f126a1fd518cfbadfaf7cf28deaa744924e38c87e/vmlinuz-4.18.0-372.13.1.el8_6.ppc64le
> initrd /ostree/rhcos-797b8f8e1f9f0b6a5118d56f126a1fd518cfbadfaf7cf28deaa744924e38c87e/initramfs-4.18.0-372.13.1.el8_6.ppc64le.img

Could someone please grab /var/log/petitboot/pb-discover.log when this happens? It may tell us why the above is not appearing in the menu.

Created attachment 1906052 [details]
pb-discover.log
Hi, I pulled this from one of the nodes. I see this output a few times in the logs.
[18:38:51] Registering new progress struct
[18:38:51] boot option enP49p3s0f1@0x118d2128 is resolved, sending to clients
[18:38:51] process_read_stdout_once: read failed: Bad file descriptor
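When reading a pb-discover.log like this one, it can help to pull out only the messages from petitboot's grub2 parser, since they are easy to miss among the network and DHCP lines. The following is a minimal sketch, assuming the log keeps the `[HH:MM:SS] grub2: ...` line shape that appears in the log pasted in this bug; `grub2_parser_messages` is a hypothetical helper name, not a petitboot tool.

```shell
#!/bin/sh
# Hypothetical helper (not part of petitboot): print only the lines that
# petitboot's grub2 parser wrote to a pb-discover.log file.
# Assumes each log line starts with a "[HH:MM:SS]" timestamp.
grub2_parser_messages() {
    grep -E '^\[[0-9:]+\] grub2: ' "$1"
}

# Example (path as used in this bug):
#   grub2_parser_messages /var/log/petitboot/pb-discover.log
```

An empty result only means that the parser logged nothing in that format; it does not by itself show that grub.cfg was parsed correctly.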
Have you been able to review the attachment? I'll provide the logs here in case there is an issue reading it.

# cat /var/petitboot/mnt/dev/nvme0n1p3/loader/entries/ostree-1-rhcos.conf
title Red Hat Enterprise Linux CoreOS 411.86.202208022159-0 (Ootpa) (ostree:0)
version 1
options random.trust_cpu=on console=tty0 console=hvc0,115200n8 ignition.platform.id=metal $ignition_firstboot ostree=/ostree/boot.1/rhcos/69ef35526678094d9ab331a87f97458d144ef220af08e8eaf683bf78d8cfcb32/0
linux /ostree/rhcos-69ef35526678094d9ab331a87f97458d144ef220af08e8eaf683bf78d8cfcb32/vmlinuz-4.18.0-372.19.1.el8_6.ppc64le
initrd /ostree/rhcos-69ef35526678094d9ab331a87f97458d144ef220af08e8eaf683bf78d8cfcb32/initramfs-4.18.0-372.19.1.el8_6.ppc64le.img
# cat /var/log/petitboot/pb-discover.log
[18:28:49] --- pb-discover ---
[18:28:49] config_set_defaults: lang: en_US.utf8
[18:28:49] Detected platform type: powerpc
[18:28:49] Running command: exe: nvram argv: 'nvram' '--print-config' '--partition' 'common'
[18:28:49] platform: non-zero completion code 128 from IPMI req
[18:28:49] platform: non-zero completion code 255 from IPMI network req
[18:28:49] configuration:
[18:28:49] autoboot: disabled
[18:28:49] boot device 0: network
[18:28:49] boot device 1: any
[18:28:49] IPMI boot device 0x00
[18:28:49] Modifications allowed to disks: yes
[18:28:49] Default UI to boot on: /dev/hvc0 [IPMI / Serial]
[18:28:49] language:
[18:28:49] Interface enP49p3s0f0 ready
[18:28:49] Running command: exe: /sbin/ip argv: '/sbin/ip' 'link' 'set' 'enP49p3s0f0' 'up'
[18:28:50] network: bringing up interface enP49p3s0f0
[18:28:50] Interface enP49p3s0f1 ready
[18:28:50] Running command: exe: /sbin/ip argv: '/sbin/ip' 'link' 'set' 'enP49p3s0f1' 'up'
[18:28:50] network: bringing up interface enP49p3s0f1
[18:28:50] SKIP: loop0: ignored (path=/devices/virtual/block/loop0)
[18:28:50] SKIP: loop1: ignored (path=/devices/virtual/block/loop1)
[18:28:50] SKIP: loop2: ignored (path=/devices/virtual/block/loop2)
[18:28:50] SKIP: loop3: ignored (path=/devices/virtual/block/loop3)
[18:28:50] SKIP: loop4: ignored (path=/devices/virtual/block/loop4)
[18:28:50] SKIP: loop5: ignored (path=/devices/virtual/block/loop5)
[18:28:50] SKIP: loop6: ignored (path=/devices/virtual/block/loop6)
[18:28:50] SKIP: loop7: ignored (path=/devices/virtual/block/loop7)
[18:28:50] Running command: exe: /sbin/ip argv: '/sbin/ip' 'link' 'set' 'enP49p3s0f0' 'up'
[18:28:50] network: bringing up interface enP49p3s0f0
[18:28:50] Running command: exe: /sbin/ip argv: '/sbin/ip' 'link' 'set' 'lo' 'up'
[18:28:50] sit0 not marked ready yet
[18:28:50] Running command: exe: /sbin/ip argv: '/sbin/ip' 'link' 'set' 'enP49p3s0f0' 'up'
[18:28:50] network: bringing up interface enP49p3s0f0
[18:28:50] Running command: exe: /sbin/ip argv: '/sbin/ip' 'link' 'set' 'enP49p3s0f1' 'up'
[18:28:50] network: bringing up interface enP49p3s0f1
[18:28:50] network: configuring interface enP49p3s0f0
[18:28:50] Running DHCPv4 client
[18:28:50] Running command: exe: /sbin/udhcpc argv: '/sbin/udhcpc' '-R' '-f' '-O' 'pxeconffile' '-O' 'pxepathprefix' '-O' 'reboottime' '-p' '/var/petitboot//udhcpc-enP49p3s0f0.pid' '-i' 'enP49p3s0f0' '-x' '0x5d:000e'
[18:28:50] Running DHCPv6 client
[18:28:50] Running command: exe: /usr/bin/udhcpc6 argv: '/usr/bin/udhcpc6' '-R' '-f' '-O' 'bootfile_url' '-O' 'bootfile_param' '-O' 'pxeconffile' '-O' 'pxepathprefix' '-p' '/var/petitboot//udhcpc-enP49p3s0f0.pid' '-i' 'enP49p3s0f0' '-x' '0x3d:000e'
[18:28:50] network: configuring interface enP49p3s0f1
[18:28:50] Running DHCPv4 client
[18:28:50] Running command: exe: /sbin/udhcpc argv: '/sbin/udhcpc' '-R' '-f' '-O' 'pxeconffile' '-O' 'pxepathprefix' '-O' 'reboottime' '-p' '/var/petitboot//udhcpc-enP49p3s0f1.pid' '-i' 'enP49p3s0f1' '-x' '0x5d:000e'
[18:28:50] Running DHCPv6 client
[18:28:50] Running command: exe: /usr/bin/udhcpc6 argv: '/usr/bin/udhcpc6' '-R' '-f' '-O' 'bootfile_url' '-O' 'bootfile_param' '-O' 'pxeconffile' '-O' 'pxepathprefix' '-p' '/var/petitboot//udhcpc-enP49p3s0f1.pid' '-i' 'enP49p3s0f1' '-x' '0x3d:000e'
udhcpc: started, v1.33.0
udhcpc: started, v1.33.0
udhcpc6: can't get link-local IPv6 address
udhcpc6: started, v1.33.0
udhcpc: sending discover
udhcpc: sending discover
udhcpc6: sending discover
udhcpc: sending select for 192.168.79.26
udhcpc: lease of 192.168.79.26 obtained, lease time 900
deleting routers
adding dns 192.168.79.1
[18:28:50] trying parsers for enP49p3s0f1
[18:28:50] Running command: exe: /usr/bin/tftp argv: '/usr/bin/tftp' '-g' '-l' '/tmp/pb-SpYSa8' '-r' '/pxelinux/pxelinux.cfg/01-08-94-ef-80-c2-e8.cfg' '192.168.79.1' '69'
[18:28:50] Registering new progress struct
[18:28:50] boot option enP49p3s0f1@0x118cfbd8 is resolved, sending to clients
[18:28:50] process_read_stdout_once: read failed: Bad file descriptor
[18:28:51] eth0 not marked ready yet
[18:28:52] SKIP: nvme0n1: no ID_FS_TYPE property
[18:28:52] Snapshot successfully created for nvme0n1p4
[18:28:52] mounting device /dev/nvme0n1p4 read-only
[18:28:52] trying parsers for nvme0n1p4
[18:28:52] Running command: exe: /usr/sbin/pb-plugin argv: '/usr/sbin/pb-plugin' 'scan' '/var/petitboot/mnt/dev/nvme0n1p4'
Scanning device /var/petitboot/mnt/dev/nvme0n1p4
No plugins found
[18:28:52] SKIP: nvme0n1p2: no ID_FS_TYPE property
[18:28:52] SKIP: nvme0n1p1: no ID_FS_TYPE property
[18:28:52] SKIP: dm-0: no ID_FS_TYPE property
[18:28:52] SKIP: dm-1: no ID_FS_TYPE property
[18:28:52] Snapshot successfully created for nvme0n1p3
[18:28:52] mounting device /dev/nvme0n1p3 read-only
[18:28:52] trying parsers for nvme0n1p3
[18:28:52] grub2: undefined function 'serial'
[18:28:52] grub2: undefined function 'terminal_input'
[18:28:52] grub2: undefined function 'terminal_output'
[18:28:52] grub2: undefined function 'blscf'
[18:28:52] Running command: exe: /usr/sbin/pb-plugin argv: '/usr/sbin/pb-plugin' 'scan' '/var/petitboot/mnt/dev/nvme0n1p3'
Scanning device /var/petitboot/mnt/dev/nvme0n1p3
No plugins found
[18:28:52] eth1 not marked ready yet
[18:28:52] SKIP: dm-2: no ID_FS_TYPE property
[18:28:52] SKIP: dm-3: no ID_FS_TYPE property
[18:28:52] SKIP: dm-4: no ID_FS_TYPE property
[18:28:52] SKIP: dm-5: no ID_FS_TYPE property
[18:28:52] enp1s0f1 not marked ready yet
[18:28:52] Interface enp1s0f1 ready
[18:28:52] Running command: exe: /sbin/ip argv: '/sbin/ip' 'link' 'set' 'enp1s0f1' 'up'
udhcpc: sending discover
[18:28:54] network: bringing up interface enp1s0f1
[18:28:54] enp1s0f0 not marked ready yet
[18:28:54] Interface enp1s0f0 ready
[18:28:54] Running command: exe: /sbin/ip argv: '/sbin/ip' 'link' 'set' 'enp1s0f0' 'up'
udhcpc6: sending discover
[18:28:56] network: bringing up interface enp1s0f0
[18:28:56] network: configuring interface enp1s0f0
[18:28:56] Running DHCPv4 client
[18:28:56] Running command: exe: /sbin/udhcpc argv: '/sbin/udhcpc' '-R' '-f' '-O' 'pxeconffile' '-O' 'pxepathprefix' '-O' 'reboottime' '-p' '/var/petitboot//udhcpc-enp1s0f0.pid' '-i' 'enp1s0f0' '-x' '0x5d:000e'
[18:28:56] Running DHCPv6 client
[18:28:56] Running command: exe: /usr/bin/udhcpc6 argv: '/usr/bin/udhcpc6' '-R' '-f' '-O' 'bootfile_url' '-O' 'bootfile_param' '-O' 'pxeconffile' '-O' 'pxepathprefix' '-p' '/var/petitboot//udhcpc-enp1s0f0.pid' '-i' 'enp1s0f0' '-x' '0x3d:000e'
udhcpc: started, v1.33.0
udhcpc6: started, v1.33.0
udhcpc: sending discover
udhcpc: sending discover
udhcpc6: sending discover
udhcpc6: sending discover
...
[18:33:50] Running DHCPv4 client
[18:33:50] Running command: exe: /sbin/udhcpc argv: '/sbin/udhcpc' '-R' '-f' '-O' 'pxeconffile' '-O' 'pxepathprefix' '-O' 'reboottime' '-p' '/var/petitboot//udhcpc-enP49p3s0f1.pid' '-i' 'enP49p3s0f1' '-x' '0x5d:000e'
udhcpc: received SIGTERM
udhcpc: unicasting a release of 192.168.79.26 to 192.168.79.1
udhcpc: sending release
udhcpc: entering released state
[18:33:50] Running DHCPv6 client
udhcpc: started, v1.33.0
[18:33:50] Running command: exe: /usr/bin/udhcpc6 argv: '/usr/bin/udhcpc6' '-R' '-f' '-O' 'bootfile_url' '-O' 'bootfile_param' '-O' 'pxeconffile' '-O' 'pxepathprefix' '-p' '/var/petitboot//udhcpc-enP49p3s0f1.pid' '-i' 'enP49p3s0f1' '-x' '0x3d:000e'
udhcpc6: started, v1.33.0
udhcpc: sending discover
udhcpc6: sending discover
udhcpc: sending select for 192.168.79.26
udhcpc: lease of 192.168.79.26 obtained, lease time 900
deleting routers
adding dns 192.168.79.1
[18:33:50] Couldn't find interface matching 08:94:ef:80:c2:e8
[18:33:50] trying parsers for enP49p3s0f1
[18:33:50] Running command: exe: /usr/bin/tftp argv: '/usr/bin/tftp' '-g' '-l' '/tmp/pb-IyS8G7' '-r' '/pxelinux/pxelinux.cfg/01-08-94-ef-80-c2-e8.cfg' '192.168.79.1' '69'
[18:33:50] Registering new progress struct
[18:33:50] boot option enP49p3s0f1@0x118e72f8 is resolved, sending to clients
[18:33:50] process_read_stdout_once: read failed: Bad file descriptor
udhcpc: sending discover
udhcpc6: sending discover
...
[18:38:50] Running DHCPv4 client
udhcpc: received SIGTERM
udhcpc: unicasting a release of 192.168.79.26 to 192.168.79.1
udhcpc: sending release
[18:38:50] Running command: exe: /sbin/udhcpc argv: '/sbin/udhcpc' '-R' '-f' '-O' 'pxeconffile' '-O' 'pxepathprefix' '-O' 'reboottime' '-p' '/var/petitboot//udhcpc-enP49p3s0f1.pid' '-i' 'enP49p3s0f1' '-x' '0x5d:000e'
udhcpc: entering released state
udhcpc6: received SIGTERM
udhcpc6: entering released state
[18:38:50] Running DHCPv6 client
[18:38:50] Running command: exe: /usr/bin/udhcpc6 argv: '/usr/bin/udhcpc6' '-R' '-f' '-O' 'bootfile_url' '-O' 'bootfile_param' '-O' 'pxeconffile' '-O' 'pxepathprefix' '-p' '/var/petitboot//udhcpc-enP49p3s0f1.pid' '-i' 'enP49p3s0f1' '-x' '0x3d:000e'
udhcpc: started, v1.33.0
udhcpc6: started, v1.33.0
udhcpc: sending discover
udhcpc6: sending discover
udhcpc: sending select for 192.168.79.26
udhcpc: lease of 192.168.79.26 obtained, lease time 900
deleting routers
adding dns 192.168.79.1
[18:38:51] Couldn't find interface matching 08:94:ef:80:c2:e8
[18:38:51] trying parsers for enP49p3s0f1
[18:38:51] Running command: exe: /usr/bin/tftp argv: '/usr/bin/tftp' '-g' '-l' '/tmp/pb-mWf4c6' '-r' '/pxelinux/pxelinux.cfg/01-08-94-ef-80-c2-e8.cfg' '192.168.79.1' '69'
[18:38:51] Registering new progress struct
[18:38:51] boot option enP49p3s0f1@0x118d2128 is resolved, sending to clients
[18:38:51] process_read_stdout_once: read failed: Bad file descriptor

------- Comment From arbab.com 2022-08-23 14:16 EDT-------
(In reply to comment #17)
> Have you been able to review the attachment? I'll provide the logs here in
> case there is an issue reading it.

Here's the relevant part of pb-discover.log:

I think the problem is this:

[18:28:52] grub2: undefined function 'blscf'

In the grub.cfg file, the command should be "blscfg" (with a 'g' at the end), not "blscf".
I mocked up a recreation environment, and sure enough, without "blscfg" petitboot does not parse the files under /loader/entries, which is why the menu entry is not showing up. Could you please verify what's in your grub.cfg?

Hey, good catch! I looked at grub.cfg and blscfg is there, but there is some kind of corruption? in the file. Viewing the file with cat produces strange results.

# cat /var/petitboot/mnt/dev/nvme0n1p3/grub2/grub.cfg
set pager=1
# petitboot doesn't support -e and doesn't support an empty path part
if [ -d (md/md-boot)/grub2 ]; then
  # fcct currently creates /boot RAID with superblock 1.0, which allows
  # component partitions to be read directly as filesystems. This is
  # necessary because transposefs doesn't yet rerun grub2-install on BIOS,
  # so GRUB still expects /boot to be a partition on the first disk.
  #
  # There are two consequences:
  # 1. On BIOS and UEFI, the search command might pick an individual RAID
  # component, but we want it to use the full RAID in case there are bad
  # sectors etc. The undocumented --hint option is supposed to support
  # this sort of override, but it doesn't seem to work, so we set $boot
  # directly.
  # 2. On BIOS, the "normal" module has already been loaded from an
  # individual RAID component, and $prefix still points there. We want
  # future module loads to come from the RAID, so we reset $prefix.
  # (On UEFI, the stub grub.cfg has already set $prefix properly.)
  set boot=md/md-boot
  set prefix=($boot)/grub2
else
  if [ -f ${config_directory}/bootuuid.cfg ]; then
    source ${config_directory}/bootuuid.cfg
  fi
  if [ -n "${BOOT_UUID}" ]; then
    search --fs-uuid "${BOOT_UUID}" --set boot --no-floppy
  else
    search --label boot --set boot --no-floppy
  fi
fi
set root=$boot
if [ -f ${config_directory}/grubenv ]; then
  load_env -f ${config_directory}/grubenv
elif [ -s $prefix/grubenv ]; then
  load_env
fi
if [ x"${feature_menuentry_id}" = xy ]; then
  menuentry_id_option="--id"
else
  menuentry_id_option=""
fi
function load_video {
  if [ x$feature_all_video_module = xy ]; then
    insmod all_video
  else
    insmod efi_gop
    insmod efi_uga
    insmod ieee1275_fb
    insmod vbe
    insmod vga
    insmod video_bochs
    insmod video_cirrus
  fi
}
serial --speed=115200
terminal_input serial console
terminal_output serial console
if [ x$feature_timeout_style = xy ] ; then
  set timeout_style=menu
  set timeout=1
# Fallback normal timeout code in case the timeout_style feature is
# unavailable.
else
  set timeout=1
fi
# Determine if this is a first boot and set the ${ignition_firstboot} variable
# which is used in the kernel command line.
set ignition_firstboot=""
if [ -f "/ignition.firstboot" ]; then
  # Default networking parameters to be used with ignition.
  set ignition_network_kcmdline=''
  # Source in the `ignition.firstboot` file which could override the
  # above $ignition_network_kcmdline with static networking config.
  # This override feature is also by coreos-installer to persist static
  # networking config provided during install to the first boot of the machine.
  source "/ignition.firstboot"
  set ignition_firstboot="ignition.firstboot ${ignition_network_kcmdline}"
fi
# Import user defined configuration
# tracker: https://github.com/coreos/fedora-coreos-tracker/issues/805
if [ -f $prefix/user.cfg ]; then
  source $prefix/user.cfg
fi
blscfg#

For comparison:

# cat /var/petitboot/mnt/dev/nvme0n1p3/loader/entries/ostree-1-rhcos.conf
title Red Hat Enterprise Linux CoreOS 411.86.202208022159-0 (Ootpa) (ostree:0)
version 1
options random.trust_cpu=on console=tty0 console=hvc0,115200n8 ignition.platform.id=metal $ignition_firstboot ostree=/ostree/boot.1/rhcos/69ef35526678094d9ab331a87f97458d144ef220af08e8eaf683bf78d8cfcb32/0
linux /ostree/rhcos-69ef35526678094d9ab331a87f97458d144ef220af08e8eaf683bf78d8cfcb32/vmlinuz-4.18.0-372.19.1.el8_6.ppc64le
initrd /ostree/rhcos-69ef35526678094d9ab331a87f97458d144ef220af08e8eaf683bf78d8cfcb32/initramfs-4.18.0-372.19.1.el8_6.ppc64le.img
#

------- Comment From arbab.com 2022-08-23 17:37 EDT-------
(In reply to comment #19)
> Hey, good catch! I looked at grub.cfg and blscfg is there, but there is some
> kind of corruption? in the file. Viewing the file with cat produces strange
> results.
>
> # cat /var/petitboot/mnt/dev/nvme0n1p3/grub2/grub.cfg
[snip]
> blscfg#

The file may be truncated, but more likely it's just missing a new-line character at the end of that last line. Red Hat, can you see why this might be?

Nice find! The missing trailing newline was introduced by https://github.com/coreos/coreos-assembler/commit/c9036faecb, which is in 4.11, and fixed by https://github.com/coreos/coreos-assembler/commit/45fc1e7518, which isn't. Could you try adding the missing newline (something like "echo >> /var/petitboot/mnt/dev/nvme0n1p3/grub2/grub.cfg") and confirm that the system then boots correctly?

Sorry, I am unable to edit the file, at least not through the petitboot shell.
# echo >> /var/petitboot/mnt/dev/nvme0n1p3/grub2/grub.cfg
-sh: can't create /var/petitboot/mnt/dev/nvme0n1p3/grub2/grub.cfg: Read-only file system

Okay, try "mount -o remount,rw /var/petitboot/mnt/dev/nvme0n1p3/grub2" beforehand and ...remount,ro... afterward?

Alright, thanks, yes, the image is visible now with that change.

Petitboot (v1.11) 9183-22X 13001DA
──────────────────────────────────────────────────────────────────────────────
[Disk: nvme0n1p3 / 96d15588-3596-4b3c-adca-a2ff7279ea63]
  Red Hat Enterprise Linux CoreOS 411.86.202208022159-0 (Ootpa) (ostree:0)
[Network: enP49p3s0f1 / 08:94:ef:80:c2:e8]
  pxeboot
System information
System configuration
System status log
Language
Rescan devices
Retrieve config from URL
Plugins (0)
*Exit to shell

Great, thanks for checking. The code fix has landed in coreos-assembler for RHCOS 4.11. However, the paperwork on this one is a bit weird, because a) the problem was already fixed in the 4.12 branch but needs a backport, b) the fix won't be visible to users until we bump the RHCOS bootimage, and c) the Bugzilla OCP product is closed to new bugs. Net result, I have to close this bug CURRENTRELEASE and file a Jira bug for 4.11. You can track the progress of the 4.11 fix in https://issues.redhat.com/browse/OCPBUGS-565. Thanks for reporting this.
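
For anyone hitting the same symptom on an affected 4.11 bootimage, the check and the workaround from this bug can be sketched in POSIX shell as below. This is only a sketch under stated assumptions: `ends_with_newline` and `append_newline_if_missing` are hypothetical helper names, not part of petitboot or coreos-installer, and it assumes the shell provides `tail -c`. The filesystem holding grub.cfg must already be remounted read-write, as discussed in the comments above.

```shell
#!/bin/sh
# Hypothetical helpers; the names are illustrative only.

# Return 0 if the file is empty or its last byte is a newline.
ends_with_newline() {
    # Command substitution strips a trailing newline, so the captured
    # text is empty exactly when the last byte is "\n" (or no byte exists).
    [ -z "$(tail -c 1 "$1")" ]
}

# Append a single newline only when the file does not already end with one,
# so running it twice does not keep growing the file.
append_newline_if_missing() {
    ends_with_newline "$1" || echo >> "$1"
}

# Example (path as used in this bug; requires a writable mount):
#   append_newline_if_missing /var/petitboot/mnt/dev/nvme0n1p3/grub2/grub.cfg
```

Compared with a bare `echo >>`, the guard keeps the workaround idempotent, which matters if it is scripted across several nodes.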