Bug 1782885
| Summary: | qemu unable to boot rhel8.2 pxe images on rhel7.6 host | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | michal novacek <mnovacek> |
| Component: | ipxe | Assignee: | Laszlo Ersek <lersek> |
| ipxe sub component: | ipxe-bootimgs | QA Contact: | FuXiangChun <xfu> |
| Status: | CLOSED NOTABUG | Docs Contact: | |
| Severity: | high | ||
| Priority: | medium | CC: | chayang, jinzhao, jkortus, juzhang, leiyang, lersek, virt-maint, xfu, yfu |
| Version: | 7.6 | Keywords: | TestBlocker |
| Target Milestone: | rc | ||
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2020-01-06 16:34:30 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Description
michal novacek
2019-12-12 14:55:48 UTC
I tried to test on different hosts, but I cannot reproduce this issue.

1. RHEL 8.2.0 host, test versions:
   - 4.18.0-164.el8.x86_64
   - qemu-kvm-2.12.0-93.module+el8.2.0+5173+94838aaa.x86_64
   - ipxe-roms-20181214-3.git133f4c47.el8.noarch
2. RHEL 7.8 host, test versions:
   - kernel-3.10.0-1118.el7.x86_64
   - qemu-kvm-rhev-2.12.0-40.el7.x86_64
   - ipxe-roms-qemu-20180825-2.git133f4c.el7.noarch
3. RHEL 7.6 host, test versions:
   - kernel-3.10.0-957.38.3.el7.x86_64
   - qemu-kvm-1.5.3-160.el7_6.3.x86_64
   - ipxe-roms-qemu-20180825-2.git133f4c.el7.noarch

I will set up a new PXE server to test it, then update the test result.

Per https://access.redhat.com/solutions/4391971, this looks more like a file corruption issue.

I set up a new PXE server; the RHEL-8.2.0 image (http://download.lab.eng.brq.redhat.com/nightly/RHEL-8.2.0-20191210.n.0/compose/BaseOS/x86_64/os/) boots and installs. PXE server versions:

- kernel-4.18.0-161.el8.x86_64
- ipxe-roms-20181214-3.git133f4c47.el8.noarch
- syslinux-tftpboot-6.04-4.el8.noarch
- dhcp-server-4.3.6-40.el8.x86_64
- tftp-server-5.2-24.el8.x86_64

What command do you use for testing?

(In reply to michal novacek from comment #5)
> What command do you use for testing?

Hi Michal,

Test command line:

    /usr/libexec/qemu-kvm -name rhel8 \
        -M pc -m 4G \
        -cpu Haswell \
        -nodefaults \
        -smp 4,sockets=1,cores=4,threads=1 \
        -drive file=/home/ipxe.qcow2,format=qcow2,if=none,id=drive-virtio-disk0,cache=unsafe,aio=threads \
        -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=2 \
        -vnc :3 \
        -vga qxl \
        -monitor stdio \
        -serial unix:/tmp/monitor1,server,nowait \
        -boot menu=on \
        -device virtio-net-pci,netdev=tap11,mac=22:57:f8:dd:fe:3a \
        -netdev tap,id=tap11,vhost=on \
        -kernel /root/vmlinuz \
        -initrd /root/initrd.img \
        -append method=http://download.lab.eng.brq.redhat.com/nightly/RHEL-8.2.0-20191210.n.0/compose/BaseOS/x86_64/os/

I wrote a document that contains the test commands and the process of setting up a PXE server: https://mojo.redhat.com/docs/DOC-1212767

Best regards
LeiYang

I had some luck replicating the problem with virt-install. On the RHEL 7.6 host I can boot RHEL-8.1.1-20191217.n.1 correctly, but I cannot boot the RHEL-8.2.0-20191217.n.1 PXE images.

Our setup consists of a central server (RHEL 6.8) providing DHCP and tftpd, with its root at /tftpboot. /tftpboot is mounted via NFS on the virtualization host. The image is copied manually from /mnt/redhat, which is an NFS mount of ntap-rdu2-c01-eng01-nfs01b.storage.rdu2.redhat.com:/bos_eng01_engineering_sm/devarchive/redhat.

    big-01$ virt-install --disk=none --graphics=none --name test \
    --network bridge=br0,mac=1a:00:00:00:00:06 \
    --boot network,useserial=on --pxe \
    --ram 512
    --connect qemu:///system
    ...
    SeaBIOS (version 1.11.0-2.el7)
    Machine UUID 11258876-ac2f-433e-b5f6-6f201544e4ee

    iPXE (http://ipxe.org) 00:03.0 C100 PCI2.10 PnP PMM+1FF94550+1FEF4550 C100

    Booting from ROM...
    iPXE (PCI 00:03.0) starting execution...ok
    iPXE initialising devices...ok

    iPXE 1.0.0+ (4e85b27) -- Open Source Network Boot Firmware -- http://ipxe.org
    Features: DNS HTTP iSCSI TFTP AoE ELF MBOOT PXE bzImage Menu PXEXT

    net0: 1a:00:00:00:00:06 using rtl8139 on 0000:00:03.0 (open)
    [Link:up, TX:0 TXE:0 RX:0 RXE:0]
    Configuring (net0 1a:00:00:00:00:06)............... ok
    net0: 10.37.166.133/255.255.252.0 gw 10.37.167.254
    Next server: 10.37.166.1
    Filename: /pxelinux.0
    tftp://10.37.166.1//pxelinux.0... ok
    pxelinux.0 : 13148 bytes [PXE-NBP]

    PXELINUX 3.11 2005-09-02 Copyright (C) 1994-2005 H. Peter Anvin
    UNDI data segment at: 0009CB00
    UNDI data segment size: 2CE0
    UNDI code segment at: 0009C2C0
    UNDI code segment size: 0802
    PXE entry point found (we hope) at 9C2C:0160
    My IP address seems to be 0A25A685 10.37.166.133
    ip=10.37.166.133:10.37.166.1:10.37.167.254:255.255.252.0
    TFTP prefix: /
    Trying to load: pxelinux.cfg/01-1a-00-00-00-00-06
    Invalid or corrupt kernel image.
    boot:

This is the content of the file pxelinux.cfg/01-1a-00-00-00-00-06:

    big-01$ cat /tftpboot/pxelinux.cfg/01-1a-00-00-00-00-06
    default linux
    prompt 0
    timeout 100
    label linux
    kernel /images/default/vmlinuz
    ipappend 2
    append initrd=/images/default/initrd.img ks=http://beaker.cluster-qe.lab.eng.brq.redhat.com/bkr/kickstart/304286 ksdevice=bootif netboot_method=pxe

    [root@big-01 test]# sha256sum /mnt/redhat/rhel-8/nightly/RHEL-8/RHEL-8.2.0-20191217.n.0/compose/BaseOS/x86_64/os/images/pxeboot/vmlinuz /tftpboot/images/default/vmlinuz
    68b28e39acb44491bd746d5d7d37e1078d9582145dd1282a3d92f92640762ccc  /mnt/redhat/rhel-8/nightly/RHEL-8/RHEL-8.2.0-20191217.n.0/compose/BaseOS/x86_64/os/images/pxeboot/vmlinuz
    68b28e39acb44491bd746d5d7d37e1078d9582145dd1282a3d92f92640762ccc  /tftpboot/images/default/vmlinuz

So why is my kernel image deemed corrupt by iPXE?

Michal - Speaking as virt-maint, no idea. I've been watching the interactions before attempting to assign the BZ. It seems this may be an iPXE problem, so I'll move this along through the process.

I found out that the same thing happens with a libvirt-configured PXE boot, which makes it really easy to reproduce (no fancy TFTP setup or bridging needed):

    > [root@kiff-01 ~]# virsh net-dumpxml default
    <network connections='1'>
      <name>default</name>
      <uuid>5c451bbc-6d2b-42c6-9e14-99b20ebbc40d</uuid>
      <forward mode='nat'>
        <nat>
          <port start='1024' end='65535'/>
        </nat>
      </forward>
      <bridge name='virbr00' stp='on' delay='0'/>
      <mac address='52:54:00:ff:01:00'/>
      <ip address='192.168.1.253' netmask='255.255.254.0'>
    >   <tftp root='/tftpboot'/>
        <dhcp>
          <range start='192.168.0.21' end='192.168.1.252'/>
          ...
    >     <bootp file='pxelinux.0' server='192.168.1.253'/>
          <host mac='2A:00:00:00:00:01' ip='192.168.0.1'/>
          ...
        </dhcp>
      </ip>
    </network>

    > [root@kiff-01 ~]# cat /tftpboot/pxelinux.cfg/01-52-54-00-74-84-45
    default linux
    prompt 0
    timeout 100
    label linux
    kernel images/default/vmlinuz
    ipappend 2
    append initrd=images/default/initrd.img console=ttyS0,115200 ksdevice=bootif netboot_method=pxe

    > [root@kiff-01 ~] sha256sum /mnt/redhat/rhel-8/nightly/RHEL-8/RHEL-8.2.0-20191217.n.0/compose/BaseOS/x86_64/os/images/pxeboot/vmlinuz /tftpboot/images/default/vmlinuz
    68b28e39acb44491bd746d5d7d37e1078d9582145dd1282a3d92f92640762ccc  /mnt/redhat/rhel-8/nightly/RHEL-8/RHEL-8.2.0-20191217.n.0/compose/BaseOS/x86_64/os/images/pxeboot/vmlinuz
    68b28e39acb44491bd746d5d7d37e1078d9582145dd1282a3d92f92640762ccc  /tftpboot/images/default/vmlinuz
    [root@kiff-01 ~] sha256sum /mnt/redhat/rhel-8/nightly/RHEL-8/RHEL-8.2.0-20191217.n.0/compose/BaseOS/x86_64/os/images/pxeboot/initrd.img /tftpboot/images/default/initrd.img
    111fa26f96755e166e4ab5593ccdb8a387dfe32afea1599f9dbb0e0991a48a78  /mnt/redhat/rhel-8/nightly/RHEL-8/RHEL-8.2.0-20191217.n.0/compose/BaseOS/x86_64/os/images/pxeboot/initrd.img
    111fa26f96755e166e4ab5593ccdb8a387dfe32afea1599f9dbb0e0991a48a78  /tftpboot/images/default/initrd.img

    > [root@kiff-01 ~]# rpm -qa ipxe-roms-qemu qemu-xvm libvirt kernel | sort
    ipxe-roms-qemu-20170123-1.git4e85b27.el7_4.1.noarch
    kernel-3.10.0-327.83.1.el7.x86_64
    libvirt-4.5.0-10.el7_6.15.x86_64

    > [root@kiff-01 ~] virt-install --disk=none --graphics=none --name test --network network=default,mac=52:54:00:74:84:45 --boot network,useserial=on --pxe --ram 512 --connect qemu:///system
    SeaBIOS (version 1.11.0-2.el7)
    Machine UUID 17596046-b1be-4c7e-b9a8-08bf8e6cd01a
    iPXE (http://ipxe.org) 00:03.0 C100 PCI2.10 PnP PMM+1FF94550+1FEF4550 C100
    Booting from ROM...
    iPXE (PCI 00:03.0) starting execution...ok
    iPXE initialising devices...ok
    iPXE 1.0.0+ (4e85b27) -- Open Source Network Boot Firmware -- http://ipxe.org
    Features: DNS HTTP iSCSI TFTP AoE ELF MBOOT PXE bzImage Menu PXEXT
    net0: 52:54:00:74:84:45 using rtl8139 on 0000:00:03.0 (open)
    [Link:up, TX:0 TXE:0 RX:0 RXE:0]
    Configuring (net0 52:54:00:74:84:45)............... ok
    net0: 192.168.1.66/255.255.254.0 gw 192.168.1.253
    Next server: 192.168.1.253
    Filename: pxelinux.0
    tftp://192.168.1.253/pxelinux.0... ok
    pxelinux.0 : 13148 bytes [PXE-NBP]
    PXELINUX 3.11 2005-09-02 Copyright (C) 1994-2005 H. Peter Anvin
    UNDI data segment at: 0009CB00
    UNDI data segment size: 2CE0
    UNDI code segment at: 0009C2C0
    UNDI code segment size: 0802
    PXE entry point found (we hope) at 9C2C:0160
    My IP address seems to be C0A80142 192.168.1.66
    ip=192.168.1.66:192.168.1.253:192.168.1.253:255.255.254.0
    TFTP prefix:
    Trying to load: pxelinux.cfg/01-52-54-00-74-84-45
    Invalid or corrupt kernel image.
    boot:

Hello Michal,

(1) Just to be sure, please try the same procedure on a different virtualization host (i.e., run the PXE client VM on a different host). By "different", I mean:

- switch from AMD to Intel
- try disabling nested paging.

This idea is inspired by BZ#1627022 -> BZ#1655873 -> BZ#1673779. It's not a very smart idea, but we've been surprised before, and such a test should not take much of your time.

(2) That said, I think the issue is that you are using an extremely outdated version of "pxelinux.0" to boot a bleeding-edge RHEL-8.2 kernel. That's not expected to work. In support of my claim:

(2a) In comment#4, Lei Yang used "syslinux-tftpboot-6.04-4.el8.noarch" as part of a successful boot test. Extracting "/tftpboot/pxelinux.0" from "syslinux-tftpboot-6.04-4.el8.noarch.rpm", we can see that the size of "pxelinux.0" is 42,821 bytes. Furthermore, running the following pipeline on that binary:

    $ strings pxelinux.0 | egrep 'PXELINUX|Copyright'

yields the strings

    Copyright (C) 1994-2015 H. Peter Anvin et al
    PXELINUX 6.04 PXE

Now compare the PXELINUX size and PXELINUX banner from your comment#7 and comment#16 (which report failed boot attempts):

> pxelinux.0 : 13148 bytes [PXE-NBP]
>
> PXELINUX 3.11 2005-09-02 Copyright (C) 1994-2005 H. Peter Anvin

Both this version number and the binary size match the file "/usr/lib/syslinux/pxelinux.0" extracted from "syslinux-3.11-7.i386.rpm" (Brew buildID=182898). Note that "syslinux-3.11-7.i386.rpm" is part of RHEL-5.8 (see the "dist-5E-U8" tag on the Brew build in question, and/or one of our internal RHEL-5.8 package repositories, such as .../released/RHEL-5-Server/U8/i386/os/Server/).

In other words, you seem to be using the RHEL-5.8 "pxelinux.0" binary to load a RHEL-8.2 kernel.

(2b) In comment#12, you prevented iPXE from downloading the outdated "pxelinux.0" binary, thereby also preventing "pxelinux.0" from downloading (mis-loading) the kernel and the initial ramdisk. Instead, you instructed iPXE to download the kernel image and the initial ramdisk directly, without an intermediate network boot program (NBP). Eliminating "pxelinux.0", and only "pxelinux.0", from the chain enabled the boot to succeed.

(2c) In the libvirt-level reproducer in comment#16, you inserted the

    <bootp file='pxelinux.0' server='192.168.1.253'/>

element into the "default" network's configuration XML. Nothing in comment#16 makes me doubt that you copied the same RHEL-5.8 "pxelinux.0" file into this (libvirt-based) reproducer, rather than extracting it afresh from the RHEL-8.2 candidate "syslinux-tftpboot-6.04-4.el8.noarch.rpm" package. The "pxelinux.0" size and banner captured lower down in comment#16 confirm the RHEL-5.8 origin (namely: 13,148 bytes, "PXELINUX 3.11 2005-09-02").

--*--

Summary: the RHEL-5.8 pxelinux.0 binary mis-loads the RHEL-8.2 kernel image, and then the kernel's self-check mechanism, referenced by Chao Yang in comment#3, kicks in.
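The size-and-banner comparison in (2a) through (2c) is easy to repeat on any TFTP root. Below is a minimal Python sketch that does roughly what `strings pxelinux.0 | egrep 'PXELINUX|Copyright'` does, plus a size lookup against the two builds named in this bug. It assumes the version number directly follows the word "PXELINUX" in the binary, as `strings` showed for 6.04; if that assumption does not hold for a given build, the reported version is simply `None`. The function name `pxelinux_fingerprint` and the `KNOWN_BUILDS` table are hypothetical, and a size match is only a hint, not proof of origin.

```python
import re
import sys
from pathlib import Path

# Sizes quoted in this bug report only (hypothetical table); a size match is a hint, not proof.
KNOWN_BUILDS = {
    13148: "PXELINUX 3.11 (syslinux-3.11-7, RHEL-5.8)",
    42821: "PXELINUX 6.04 (syslinux-tftpboot-6.04-4.el8)",
}

# Assumes "PXELINUX <major>.<minor>" is stored contiguously, as strings(1)
# showed for the 6.04 binary.
BANNER_RE = re.compile(rb"PXELINUX (\d+)\.(\d+)")


def pxelinux_fingerprint(data: bytes) -> dict:
    """Report size, banner version and a known-build guess for a pxelinux.0 image."""
    match = BANNER_RE.search(data)
    version = (int(match.group(1)), int(match.group(2))) if match else None
    return {
        "size": len(data),
        "version": version,
        "known_as": KNOWN_BUILDS.get(len(data)),
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 pxelinux_fingerprint.py /tftpboot/pxelinux.0
    path = Path(sys.argv[1])
    info = pxelinux_fingerprint(path.read_bytes())
    print(f"{path}: {info['size']} bytes, banner version {info['version']}, "
          f"known as {info['known_as']}")
```

Run against the file actually served (e.g. /tftpboot/pxelinux.0 in comment#7), a size of 13148 or a version of (3, 11) points at the RHEL-5.8 binary described above.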
Please replace your "pxelinux.0" binary with the one from "syslinux-tftpboot-6.04-4.el8.noarch", and retry. Thanks!

FWIW, Lei Yang's Mojo document from comment#6 (DOC-1212767) specifies the correct syslinux version (excerpt):

    # rpm2cpio syslinux-tftpboot-6.04-4.el8.noarch.rpm | cpio -dimv
    # cp /var/lib/tftpboot/pxelinux/pxelinux.0 /var/lib/tftpboot/

Yes, this solves the problem, and the bug can be closed. Thank you for the time you took explaining the tftpboot problem. I appreciate it, as it was a great pain for my team.

Thank you for confirming!
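The sha256sum checks in comment#7 and comment#16 showed that vmlinuz and initrd.img were copied intact; as Laszlo's analysis explains, that rules out corruption in transit but says nothing about whether the loader in front of them is compatible. After swapping in the 6.04 binary, the same kind of check can confirm that the TFTP root really serves the file extracted from the RPM. Below is a minimal Python sketch, equivalent in spirit to running `sha256sum` on both files. The helper names `sha256_of` and `same_content` are hypothetical, and the two paths are whatever you pass on the command line (for example, the extracted copy under /var/lib/tftpboot/pxelinux/ and the copy served from /tftpboot).

```python
import hashlib
import sys
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 of a file, read in chunks; the same digest sha256sum prints."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_content(a: Path, b: Path) -> bool:
    """True when both files have identical SHA-256 digests."""
    return sha256_of(a) == sha256_of(b)


if __name__ == "__main__" and len(sys.argv) == 3:
    # argv[1]: copy extracted from the RPM; argv[2]: copy served from the TFTP root
    extracted, served = Path(sys.argv[1]), Path(sys.argv[2])
    for p in (extracted, served):
        print(f"{sha256_of(p)}  {p}")
    print("identical" if same_content(extracted, served) else "DIFFERENT")
```

Matching digests here only prove the copy step worked; combined with a 6.04 banner on the served file, they cover both failure modes discussed in this bug.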