Bug 1782885

Summary: qemu unable to boot rhel8.2 pxe images on rhel7.6 host
Product: Red Hat Enterprise Linux 7 Reporter: michal novacek <mnovacek>
Component: ipxe Assignee: Laszlo Ersek <lersek>
ipxe sub component: ipxe-bootimgs QA Contact: FuXiangChun <xfu>
Status: CLOSED NOTABUG Docs Contact:
Severity: high    
Priority: medium CC: chayang, jinzhao, jkortus, juzhang, leiyang, lersek, virt-maint, xfu, yfu
Version: 7.6 Keywords: TestBlocker
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Last Closed: 2020-01-06 16:34:30 UTC Type: Bug

Description michal novacek 2019-12-12 14:55:48 UTC
Description of problem:

We have a RHEL7.6 host that acts as our own Beaker labcontroller. It downloads PXE images and installs virtual machines from nightly RHEL snapshots served by a local reverse proxy (http://download.lab.eng.brq.redhat.com/nightly/RHEL-8.2.0-20191210.n.0/compose/BaseOS/x86_64/os/).

We are able to PXE boot and install virtual machines from RHEL-8.1.0 images, but we are not able to boot any recent RHEL-8.2.0 images. The virtual machine stops at kernel loading with the message "Invalid or corrupt kernel image." and drops to a "boot:" prompt.

The kernel image and initrd are copied to local disk before they are exposed for the virtual machines to boot from, and both match the originals when their md5sums are compared. The virtual machines are PXE booted from a bridged network. The same virtual machine can PXE boot and install RHEL-7.6 and RHEL-8.1.0 stable.
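The copy check described above can be scripted. This is a minimal sketch, not the labcontroller's actual tooling: the helper names `sha256_of` and `copies_match` are hypothetical, and SHA-256 is used here instead of md5sum. Note that matching digests only show that the copy is faithful to the original, not that a given boot loader can load the image.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        # Read fixed-size chunks so large initrd images are not held in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copies_match(original, copy):
    """True when both files have identical SHA-256 digests."""
    return sha256_of(original) == sha256_of(copy)
```

For example, `copies_match` could be called with the vmlinuz path on the compose mirror and the path of the copy under the tftp root.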


Version-Release number of selected component (if applicable):
RHEL7.6 with z updates from November 27.

> $ rpm -q libvirt qemu-kvm kernel
libvirt-4.5.0-10.el7_6.13.x86_64
qemu-kvm-1.5.3-160.el7_6.3.x86_64
kernel-3.10.0-957.el7.x86_64
kernel-3.10.0-957.38.3.el7.x86_64

How reproducible: always

Steps to Reproduce:
1. try to install rhel8.2.0 nightly from clusterqe beaker

Actual results: "Invalid or corrupt kernel image."

Expected results: Happy installation.

Additional info:

Command running the virt:

/usr/libexec/qemu-kvm -name virt-367.cluster-qe.lab.eng.brq.redhat.com -S -machine rhel6.3.0,accel=kvm,usb=off,dump-guest-core=off -m 4000 -realtime mlock=off -smp 1,sockets=1,cores=1,threads=1 -uuid eeab5b89-0e27-4867-a91e-201460bde536 -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-204-virt-367.cluster-qe./monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=localtime -no-shutdown -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file=/dev/vg_virts/root-virt-367.cluster-qe.lab.eng.brq.redhat.com,format=raw,if=none,id=drive-virtio-disk0,cache=unsafe,aio=threads -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=2 -netdev tap,fd=46,id=hostnet0,vhost=on,vhostfd=49 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=1a:00:00:00:01:6f,bus=pci.0,addr=0x3,bootindex=1 -netdev tap,fd=51,id=hostnet1,vhost=on,vhostfd=52 -device virtio-net-pci,netdev=hostnet1,id=net1,mac=52:54:00:01:00:07,bus=pci.0,addr=0x5 -netdev tap,fd=54,id=hostnet2,vhost=on,vhostfd=53 -device virtio-net-pci,netdev=hostnet2,id=net2,mac=52:54:00:02:00:07,bus=pci.0,addr=0x7 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -vnc 0.0.0.0:6 -vga cirrus -device i6300esb,id=watchdog0,bus=pci.0,addr=0x8 -watchdog-action reset -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6 -msg timestamp=on


We have another labcontroller that is rhel7.2 where this exact scenario runs
correctly meaning rhel-8.2.0 nightly virts are able to boot and install. It has
the following versions:

$ rpm -q kernel libvirt qemu-kvm
kernel-3.10.0-327.36.3.el7.x86_64
libvirt-1.2.17-13.el7_2.5.x86_64
qemu-kvm-1.5.3-105.el7_2.7.x86_64

Comment 2 Lei Yang 2019-12-13 07:46:27 UTC
I tried to test on different hosts, but I cannot reproduce this issue.

1. rhel8.2.0 host
Test Version:
4.18.0-164.el8.x86_64
qemu-kvm-2.12.0-93.module+el8.2.0+5173+94838aaa.x86_64
ipxe-roms-20181214-3.git133f4c47.el8.noarch

2. rhel7.8 host
Test Version:
kernel-3.10.0-1118.el7.x86_64
qemu-kvm-rhev-2.12.0-40.el7.x86_64
ipxe-roms-qemu-20180825-2.git133f4c.el7.noarch

3. rhel7.6 host
Test Version:
kernel-3.10.0-957.38.3.el7.x86_64
qemu-kvm-1.5.3-160.el7_6.3.x86_64
ipxe-roms-qemu-20180825-2.git133f4c.el7.noarch

I will set up a new PXE server to test this, then update with the test result.

Comment 3 Chao Yang 2019-12-13 10:00:31 UTC
Per https://access.redhat.com/solutions/4391971 , it seems more like a file corruption issue

Comment 4 Lei Yang 2019-12-16 05:41:52 UTC
I set up a new PXE server; the RHEL-8.2.0 image (http://download.lab.eng.brq.redhat.com/nightly/RHEL-8.2.0-20191210.n.0/compose/BaseOS/x86_64/os/) is able to boot and install.

Pxeserver Version:
kernel-4.18.0-161.el8.x86_64
ipxe-roms-20181214-3.git133f4c47.el8.noarch
syslinux-tftpboot-6.04-4.el8.noarch
dhcp-server-4.3.6-40.el8.x86_64
tftp-server-5.2-24.el8.x86_64

Comment 5 michal novacek 2019-12-16 09:31:55 UTC
What command do you use for testing?

Comment 6 Lei Yang 2019-12-16 10:46:26 UTC
(In reply to michal novacek from comment #5)
> What command do you use for testing?

Hi, Michal

Test Command line:
/usr/libexec/qemu-kvm -name rhel8 \
-M pc -m 4G \
-cpu Haswell \
-nodefaults \
-smp 4,sockets=1,cores=4,threads=1 \
-drive file=/home/ipxe.qcow2,format=qcow2,if=none,id=drive-virtio-disk0,cache=unsafe,aio=threads \
-device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=2 \
-vnc :3 \
-vga qxl \
-monitor stdio \
-serial unix:/tmp/monitor1,server,nowait \
-boot menu=on \
-device virtio-net-pci,netdev=tap11,mac=22:57:f8:dd:fe:3a \
-netdev tap,id=tap11,vhost=on \
-kernel /root/vmlinuz \
-initrd /root/initrd.img \
-append method=http://download.lab.eng.brq.redhat.com/nightly/RHEL-8.2.0-20191210.n.0/compose/BaseOS/x86_64/os/

==>
I wrote a document that contains the test commands and the process for setting up a PXE server.

link:https://mojo.redhat.com/docs/DOC-1212767

Best regards
LeiYang

Comment 7 michal novacek 2019-12-17 15:19:50 UTC
I had some luck replicating the problem with virt-install.

On the rhel7.6 host I can boot RHEL-8.1.1-20191217.n.1 correctly, but I cannot boot the RHEL-8.2.0-20191217.n.1 pxe images.


Our setup consists of a central server (rhel6.8) providing dhcp and tftpd, with the tftp root at /tftpboot.
/tftpboot is mounted via nfs on the host running the virt. The provided image is
copied manually from /mnt/redhat, which is an nfs mount of ntap-rdu2-c01-eng01-nfs01b.storage.rdu2.redhat.com:/bos_eng01_engineering_sm/devarchive/redhat



> big-01$ virt-install --disk=none --graphics=none --name test \
> --network bridge=br0,mac=1a:00:00:00:00:06 \
> --boot network,useserial=on --pxe \
> --ram 512 \
> --connect qemu:///system
> ...
> SeaBIOS (version 1.11.0-2.el7)
> Machine UUID 11258876-ac2f-433e-b5f6-6f201544e4ee
>
>
> iPXE (http://ipxe.org) 00:03.0 C100 PCI2.10 PnP PMM+1FF94550+1FEF4550 C100
>
>
>
> Booting from ROM...
> iPXE (PCI 00:03.0) starting execution...ok
> iPXE initialising devices...ok
>
>
>
> iPXE 1.0.0+ (4e85b27) -- Open Source Network Boot Firmware -- http://ipxe.org
> Features: DNS HTTP iSCSI TFTP AoE ELF MBOOT PXE bzImage Menu PXEXT
>
> net0: 1a:00:00:00:00:06 using rtl8139 on 0000:00:03.0 (open)
>   [Link:up, TX:0 TXE:0 RX:0 RXE:0]
> Configuring (net0 1a:00:00:00:00:06)............... ok
> net0: 10.37.166.133/255.255.252.0 gw 10.37.167.254
> Next server: 10.37.166.1
> Filename: /pxelinux.0
> tftp://10.37.166.1//pxelinux.0... ok
> pxelinux.0 : 13148 bytes [PXE-NBP]
>
> PXELINUX 3.11 2005-09-02  Copyright (C) 1994-2005 H. Peter Anvin
> UNDI data segment at:   0009CB00
> UNDI data segment size: 2CE0
> UNDI code segment at:   0009C2C0
> UNDI code segment size: 0802
> PXE entry point found (we hope) at 9C2C:0160
> My IP address seems to be 0A25A685 10.37.166.133
> ip=10.37.166.133:10.37.166.1:10.37.167.254:255.255.252.0
> TFTP prefix: /
> Trying to load: pxelinux.cfg/01-1a-00-00-00-00-06
> Invalid or corrupt kernel image.
> boot:
>
>
> This is the content of the file pxelinux.cfg/01-1a-00-00-00-00-06
>
> big-01$ cat /tftpboot/pxelinux.cfg/01-1a-00-00-00-00-06
> default linux
> prompt 0
> timeout 100
> label linux
>     kernel /images/default/vmlinuz
>     ipappend 2
>     append initrd=/images/default/initrd.img ks=http://beaker.cluster-qe.lab.eng.brq.redhat.com/bkr/kickstart/304286 ksdevice=bootif netboot_method=pxe
>
> [root@big-01 test]# sha256sum /mnt/redhat/rhel-8/nightly/RHEL-8/RHEL-8.2.0-20191217.n.0/compose/BaseOS/x86_64/os/images/pxeboot/vmlinuz /tftpboot/images/default/vmlinuz
> 68b28e39acb44491bd746d5d7d37e1078d9582145dd1282a3d92f92640762ccc  /mnt/redhat/rhel-8/nightly/RHEL-8/RHEL-8.2.0-20191217.n.0/compose/BaseOS/x86_64/os/images/pxeboot/vmlinuz
> 68b28e39acb44491bd746d5d7d37e1078d9582145dd1282a3d92f92640762ccc  /tftpboot/images/default/vmlinuz
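The "Trying to load:" and "My IP address seems to be 0A25A685" lines in the log above reflect how PXELINUX derives its configuration file names: first from the client MAC address (prefixed with the ARP hardware type 01 for Ethernet), then from the IPv4 address in uppercase hex, shortened one digit at a time. The following is a small sketch of that naming scheme for checking which file a client will request; the function names are hypothetical, and the UUID-based name and the final "default" fallback are left out.

```python
import ipaddress


def mac_config_name(mac):
    """Config file name PXELINUX derives from an Ethernet MAC address.

    "01" is the ARP hardware type for Ethernet; the MAC follows in
    lowercase hex with dash separators.
    """
    octets = mac.lower().replace("-", ":").split(":")
    if len(octets) != 6 or any(len(octet) != 2 for octet in octets):
        raise ValueError(f"not a MAC address: {mac!r}")
    int("".join(octets), 16)  # raises ValueError on non-hex digits
    return "pxelinux.cfg/01-" + "-".join(octets)


def ip_config_names(ip):
    """Fallback names: the IPv4 address in uppercase hex, one digit shorter each time."""
    hex_ip = f"{int(ipaddress.IPv4Address(ip)):08X}"
    return ["pxelinux.cfg/" + hex_ip[:length] for length in range(8, 0, -1)]
```

With the values from the log, `mac_config_name("1a:00:00:00:00:06")` gives the file name that the client tried to load, and the first entry of `ip_config_names("10.37.166.133")` ends in the same hex string that PXELINUX printed.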


So why is my kernel image deemed corrupt by ipxe?
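Matching checksums only show that the copy equals the original. A separate sanity check is whether the file carries a valid x86 Linux boot header at all. The sketch below reads the fields defined by the Linux x86 boot protocol (boot flag 0xAA55 at offset 0x1FE, magic "HdrS" at 0x202, protocol version at 0x206); the function name is hypothetical. It cannot tell whether a particular boot loader supports the image, only that the header is present.

```python
import struct

BOOT_FLAG_OFFSET = 0x1FE     # 2 bytes, must be 0xAA55
HEADER_MAGIC_OFFSET = 0x202  # 4 bytes, must be b"HdrS"
VERSION_OFFSET = 0x206       # 2 bytes, boot protocol version (major, minor)


def boot_protocol_version(image):
    """Return (major, minor) of the x86 boot protocol, or None if no valid header."""
    if len(image) < VERSION_OFFSET + 2:
        return None
    (boot_flag,) = struct.unpack_from("<H", image, BOOT_FLAG_OFFSET)
    magic = image[HEADER_MAGIC_OFFSET:HEADER_MAGIC_OFFSET + 4]
    if boot_flag != 0xAA55 or magic != b"HdrS":
        return None
    (version,) = struct.unpack_from("<H", image, VERSION_OFFSET)
    # High byte is the major version, low byte the minor version.
    return version >> 8, version & 0xFF
```

Passing the first 520 bytes of /tftpboot/images/default/vmlinuz is enough; a result of None would point to a damaged or non-bzImage file rather than to the boot loader.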

Comment 8 John Ferlan 2019-12-17 16:11:39 UTC
Michal -

Speaking as virt-maint, no idea.  I've been watching the interactions before attempting to assign the bz. 

Seems this may be an iPXE problem, so I'll move this along thru the process

Comment 16 michal novacek 2019-12-19 11:03:12 UTC
I found out that the same thing happens with libvirt-configured pxe boot, which makes it really easy to reproduce (no fancy tftp setup/bridging needed):

> [root@kiff-01 ~]# virsh net-dumpxml default
<network connections='1'>
  <name>default</name>
  <uuid>5c451bbc-6d2b-42c6-9e14-99b20ebbc40d</uuid>
  <forward mode='nat'>
    <nat>
      <port start='1024' end='65535'/>
    </nat>
  </forward>
  <bridge name='virbr00' stp='on' delay='0'/>
  <mac address='52:54:00:ff:01:00'/>
  <ip address='192.168.1.253' netmask='255.255.254.0'>
>     <tftp root='/tftpboot'/>
    <dhcp>
      <range start='192.168.0.21' end='192.168.1.252'/>
      ...
>       <bootp file='pxelinux.0' server='192.168.1.253'/>
      <host mac='2A:00:00:00:00:01' ip='192.168.0.1'/>
      ...
    </dhcp>
  </ip>
</network>

> [root@kiff-01 ~]# cat /tftpboot/pxelinux.cfg/01-52-54-00-74-84-45
default linux
prompt 0
timeout 100
label linux
    kernel images/default/vmlinuz
    ipappend 2
    append initrd=images/default/initrd.img console=ttyS0,115200 ksdevice=bootif netboot_method=pxe

> [root@kiff-01 ~]# sha256sum /mnt/redhat/rhel-8/nightly/RHEL-8/RHEL-8.2.0-20191217.n.0/compose/BaseOS/x86_64/os/images/pxeboot/vmlinuz /tftpboot/images/default/vmlinuz
68b28e39acb44491bd746d5d7d37e1078d9582145dd1282a3d92f92640762ccc  /mnt/redhat/rhel-8/nightly/RHEL-8/RHEL-8.2.0-20191217.n.0/compose/BaseOS/x86_64/os/images/pxeboot/vmlinuz
68b28e39acb44491bd746d5d7d37e1078d9582145dd1282a3d92f92640762ccc  /tftpboot/images/default/vmlinuz

> [root@kiff-01 ~]# sha256sum /mnt/redhat/rhel-8/nightly/RHEL-8/RHEL-8.2.0-20191217.n.0/compose/BaseOS/x86_64/os/images/pxeboot/initrd.img /tftpboot/images/default/initrd.img
111fa26f96755e166e4ab5593ccdb8a387dfe32afea1599f9dbb0e0991a48a78  /mnt/redhat/rhel-8/nightly/RHEL-8/RHEL-8.2.0-20191217.n.0/compose/BaseOS/x86_64/os/images/pxeboot/initrd.img
111fa26f96755e166e4ab5593ccdb8a387dfe32afea1599f9dbb0e0991a48a78  /tftpboot/images/default/initrd.img

> [root@kiff-01 ~]# rpm -qa ipxe-roms-qemu qemu-xvm libvirt kernel | sort
ipxe-roms-qemu-20170123-1.git4e85b27.el7_4.1.noarch
kernel-3.10.0-327.83.1.el7.x86_64
libvirt-4.5.0-10.el7_6.15.x86_64

> [root@kiff-01 ~]# virt-install --disk=none --graphics=none --name test --network network=default,mac=52:54:00:74:84:45 --boot network,useserial=on --pxe --ram 512 --connect qemu:///system

SeaBIOS (version 1.11.0-2.el7)
Machine UUID 17596046-b1be-4c7e-b9a8-08bf8e6cd01a

iPXE (http://ipxe.org) 00:03.0 C100 PCI2.10 PnP PMM+1FF94550+1FEF4550 C100



Booting from ROM...
iPXE (PCI 00:03.0) starting execution...ok
iPXE initialising devices...ok



iPXE 1.0.0+ (4e85b27) -- Open Source Network Boot Firmware -- http://ipxe.org
Features: DNS HTTP iSCSI TFTP AoE ELF MBOOT PXE bzImage Menu PXEXT

net0: 52:54:00:74:84:45 using rtl8139 on 0000:00:03.0 (open)
  [Link:up, TX:0 TXE:0 RX:0 RXE:0]
Configuring (net0 52:54:00:74:84:45)............... ok
net0: 192.168.1.66/255.255.254.0 gw 192.168.1.253
Next server: 192.168.1.253
Filename: pxelinux.0
tftp://192.168.1.253/pxelinux.0... ok
pxelinux.0 : 13148 bytes [PXE-NBP]

PXELINUX 3.11 2005-09-02  Copyright (C) 1994-2005 H. Peter Anvin
UNDI data segment at:   0009CB00
UNDI data segment size: 2CE0
UNDI code segment at:   0009C2C0
UNDI code segment size: 0802
PXE entry point found (we hope) at 9C2C:0160
My IP address seems to be C0A80142 192.168.1.66
ip=192.168.1.66:192.168.1.253:192.168.1.253:255.255.254.0
TFTP prefix:
Trying to load: pxelinux.cfg/01-52-54-00-74-84-45

Invalid or corrupt kernel image.
boot:

Comment 17 Laszlo Ersek 2020-01-03 16:17:26 UTC
Hello Michal,

(1) just to be sure, please try the same procedure on a different
virtualization host (i.e., run the PXE client VM on a different host).
By "different", I mean:

- switch from AMD to Intel
- try disabling nested paging.

This idea is inspired by BZ#1627022 -> BZ#1655873 -> BZ#1673779. It's
not a very smart idea, but we've been surprised before, and such a test
should not take much of your time.

(2) That said, I think the issue is that you are using an extremely
outdated version of "pxelinux.0", for booting a bleeding edge RHEL-8.2
kernel. That's not expected to work.

In support of my claim:


(2a) In comment#4, Lei Yang used "syslinux-tftpboot-6.04-4.el8.noarch",
as part of a successful boot test.

Extracting "/tftpboot/pxelinux.0" from
"syslinux-tftpboot-6.04-4.el8.noarch.rpm", we can see that the size of
"pxelinux.0" is 42,821 bytes.

Furthermore, running the following pipeline on that binary:

  $ strings pxelinux.0 | egrep 'PXELINUX|Copyright'

yields the strings

  Copyright (C) 1994-2015 H. Peter Anvin et al
  PXELINUX 6.04 PXE

Now compare the PXELINUX size, and PXELINUX banner, from your comment#7
and comment#16 (which report failed boot attempts):

> pxelinux.0 : 13148 bytes [PXE-NBP]
>
> PXELINUX 3.11 2005-09-02  Copyright (C) 1994-2005 H. Peter Anvin

This version number, and binary size both, match the file
"/usr/lib/syslinux/pxelinux.0" extracted from "syslinux-3.11-7.i386.rpm"
(Brew buildID=182898).

Note that "syslinux-3.11-7.i386.rpm" is part of RHEL-5.8 (see the
"dist-5E-U8" tag on the Brew build in question, and/or one of our
internal RHEL-5.8 package repositories, such as
.../released/RHEL-5-Server/U8/i386/os/Server/).

In other words, you seem to be using the RHEL-5.8 "pxelinux.0" binary to
load a RHEL-8.2 kernel.


(2b) In comment#12, you prevented iPXE from downloading the outdated
"pxelinux.0" binary -- thereby preventing "pxelinux.0" from downloading
(mis-loading) the kernel and the initial ramdisk, too.

Instead, you instructed iPXE to download the kernel image and the
initial ramdisk directly, without an intermediate network boot program
(NBP). Eliminating "pxelinux.0" -- and "pxelinux.0" only! -- from the
chain enabled the boot to succeed.


(2c) In the libvirt-level reproducer, in comment#16, you inserted the

  <bootp file='pxelinux.0' server='192.168.1.253'/>

element to the "default" network's configuration XML.

Nothing in comment#16 makes me doubt that you copied the same RHEL-5.8
"pxelinux.0" file into this (libvirt-based) reproducer, rather than
extracting it afresh from the RHEL-8.2 candidate
"syslinux-tftpboot-6.04-4.el8.noarch.rpm" package.

The "pxelinux.0" size and banner captured lower down in comment#16
confirm the RHEL-5.8 origin (namely: 13,148 bytes, "PXELINUX 3.11
2005-09-02").
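The size and banner comparison used in (2a) through (2c) can be repeated against any deployed "pxelinux.0". This is a rough equivalent of the `strings | egrep` pipeline, assuming the banner is stored in the binary as contiguous ASCII text of the form "PXELINUX x.y" (as the strings output for 6.04 suggests); the function names are hypothetical.

```python
import re

# Matches banners such as "PXELINUX 3.11 2005-09-02" or "PXELINUX 6.04 PXE".
BANNER_RE = re.compile(rb"PXELINUX (\d+\.\d+)")


def pxelinux_version(blob):
    """Return the version from the first 'PXELINUX x.y' banner in blob, or None."""
    match = BANNER_RE.search(blob)
    return match.group(1).decode("ascii") if match else None


def describe_nbp(path):
    """Report the size and banner version of a pxelinux.0 file on disk."""
    with open(path, "rb") as handle:
        blob = handle.read()
    return {"size": len(blob), "version": pxelinux_version(blob)}
```

Running `describe_nbp` on the file under the tftp root and comparing the result with the sizes and versions quoted above (13,148 bytes and 3.11 versus 42,821 bytes and 6.04) shows which package the binary came from.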

--*--

Summary: the RHEL-5.8 pxelinux.0 binary mis-loads the RHEL-8.2 kernel
image, and then the kernel's self-check mechanism, referenced by Chao
Yang in comment#3, kicks in.

Please replace your "pxelinux.0" binary with the one from
"syslinux-tftpboot-6.04-4.el8.noarch", and retry. Thanks!

Comment 18 Laszlo Ersek 2020-01-03 16:31:23 UTC
FWIW, Lei Yang's Mojo document from comment#6 (DOC-1212767) specifies the correct syslinux version (excerpt):

# rpm2cpio syslinux-tftpboot-6.04-4.el8.noarch.rpm | cpio -dimv
# cp /var/lib/tftpboot/pxelinux/pxelinux.0 /var/lib/tftpboot/

Comment 19 michal novacek 2020-01-06 12:51:26 UTC
Yes, this solves the problem and can be closed.

Thank you for the time you took explaining the tftpboot problem. I appreciate it, as it was a great pain for my team.

Comment 20 Laszlo Ersek 2020-01-06 16:34:30 UTC
Thank you for confirming!