RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets here. Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED". If you cannot log in to RH Jira, please consult article #7032570. That failing, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "RHEL project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that that bug has been migrated.
Bug 1926561 - boot vm from pxe failed
Summary: boot vm from pxe failed
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: ipxe
Version: 8.4
Hardware: x86_64
OS: Linux
urgent
high
Target Milestone: rc
: 8.4
Assignee: Laszlo Ersek
QA Contact: Erico Nunes
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2021-02-09 03:50 UTC by Yanan Fu
Modified: 2021-05-18 16:18 UTC (History)
18 users (show)

Fixed In Version: ipxe-20181214-8.git133f4c47.el8
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-05-18 16:17:51 UTC
Type: Bug
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
seabios log (5.32 KB, text/plain)
2021-02-09 03:50 UTC, Yanan Fu
no flags Details
Screenshot for the vm when boot failed (4.17 KB, image/png)
2021-02-09 03:53 UTC, Yanan Fu
no flags Details
[PATCH] spec: combine BIOS and EFI roms using "util/catrom.pl" -- on top of dist-git 6f7a5f398e2d (2.64 KB, patch)
2021-02-18 04:24 UTC, Laszlo Ersek
no flags Details | Diff

Description Yanan Fu 2021-02-09 03:50:54 UTC
Created attachment 1755840 [details]
seabios log

Description of problem:
Install vm from pxe failed

Version-Release number of selected component (if applicable):
ipxe: ipxe-roms-qemu-20181214-7.git133f4c47.el8.noarch

How reproducible:
100%

Steps to Reproduce:
1. Launch QEMU with the following command, to boot vm from pxe 
    -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pci.0,addr=0x4 \
    -blockdev node-name=file_pxe,driver=file,auto-read-only=on,discard=unmap,aio=threads,filename=/home/kvm_autotest_root/images/pxe-test.qcow2,cache.direct=on,cache.no-flush=off \
    -blockdev node-name=drive_pxe,driver=qcow2,read-only=off,cache.direct=on,cache.no-flush=off,file=file_pxe \
    -device scsi-hd,id=pxe,drive=drive_pxe,write-cache=on \
    -device virtio-net-pci,mac=9a:a7:ed:a9:97:7a,id=id735AGB,netdev=idd4LUEH,bus=pci.0,addr=0x5  \
    -netdev tap,id=idd4LUEH,vhost=on \


2. It failed and show:
 Installation failed - cannot continue

Actual results:
Boot from pxe failed

Expected results:
Boot from pxe success


Additional info:
This is a regression issue with ipxe-roms-qemu-20181214-7.git133f4c47.el8.noarch, the pre version ipxe-roms-qemu-20181214-6.git133f4c47.el8.noarch is ok

Comment 1 Yanan Fu 2021-02-09 03:53:44 UTC
Created attachment 1755841 [details]
Screenshot for the vm when boot failed

Comment 2 Yanan Fu 2021-02-10 02:02:25 UTC
This regression issue block our gating test now, so add 'TestBlocker' and change priority to 'urgent'.
Thanks all!

Comment 6 Jeff Nelson 2021-02-10 07:31:11 UTC
It seems as though the fix for BZ 1913719 has caused this regression.

Comment 7 Jarod Wilson 2021-02-10 13:59:32 UTC
Yeah, that's... odd. The only change was indeed to enable building the 'ping' command into the roms, nothing else. I don't have a clue how that could cause this. I know that a newer compiler toolchain was used here, so the first thing to test would be to disable ping and re-rebuild w/the current toolchain, and see if that gives working results or not.

Comment 9 Laszlo Ersek 2021-02-11 07:58:56 UTC
The eax error code 0x1b101b10 that's visible in the screenshot (attachment 1755841 [details]) corresponds to "install_block_death" in "libprefix.S".

Comment 10 Laszlo Ersek 2021-02-11 08:25:14 UTC
I cannot reproduce this issue, with an ad-hoc test. My environment is a RHEL7 host, RHEL7 SeaBIOS, with an upstream QEMU build, and the two noted ipxe-roms-qemu packages (from RHEL8), using the "usr/share/ipxe/1af41000.rom" image from each.

qemu-system-x86_64 \
  -nodefaults \
  -device VGA \
  -enable-kvm \
  -device virtio-net-pci,romfile=2/usr/share/ipxe/1af41000.rom,netdev=x \
  -netdev bridge,br=virbr0,helper=/usr/libexec/qemu-bridge-helper,id=x

where "2/usr/share/ipxe/1af41000.rom" is the supposedly broken iPXE ROM file (sha256sum f7a79b97bde6a77bb6a990127e16db73ea436729810db01455c1f0dec539281d).

The above command line correctly attempts PXE-boot (it even downloads the NBP).

I can even enter the iPXE command prompt with Ctrl-B, run the "dhcp" command successfully, and then run "ping 184.50.173.220" (www.redhat.com) successfully.


At this point I'm not convinced that the symptom is specific to the iPXE version difference. Yanan, please include a complete environment description (all virt package versions etc). Even better would be if you had a reproducer environment for us to access. Thanks.

Comment 15 Laszlo Ersek 2021-02-17 18:14:58 UTC
Minimal reproducer command line:

/usr/libexec/qemu-kvm \
  -nodefaults \
  -nographic \
  -chardev stdio,signal=off,mux=on,id=char0 \
  -mon chardev=char0,mode=readline \
  -serial chardev:char0 \
  -device sga \
  -device virtio-net-pci,netdev=net0,bootindex=0 \
  -netdev user,id=net0

Comment 16 Laszlo Ersek 2021-02-17 18:32:20 UTC
The symptom reproduces with "-machine accel=tcg" added to the QEMU command line, so the problem is independent of the host kernel at least.

Comment 17 Laszlo Ersek 2021-02-17 18:56:08 UTC
Rebuilding the "ipxe-20181214-6.git133f4c47.el8" source RPM with "gcc-8.4.1-1.el8.x86_64" and "binutils-2.30-90.el8.x86_64" does not reproduce the problem.

Rebuilding the "ipxe-20181214-7.git133f4c47.el8" source RPM in the same environment reproduces the problem.

This shows that the symptom is independent of the toolchain. The symptom is really related to the change(s) in the iPXE package.

Comment 18 Laszlo Ersek 2021-02-17 19:35:59 UTC
Sanity checked the failing iPXE version with some previous qemu builds as well, but qemu seems irrelevant:

fails with: qemu-kvm-5.2.0-1.module+el8.4.0+9091+650b220a.x86_64
fails with: qemu-kvm-5.1.0-20.module+el8.3.1+9918+230f5c26.x86_64
fails with: qemu-kvm-5.1.0-14.module+el8.3.0+8790+80f9c6d8.1.x86_64

Comment 19 Laszlo Ersek 2021-02-17 19:44:15 UTC
The symptom is specific to the "virtio-net-pci" device model. It does not reproduce with either "e1000" or "rtl8139".

Comment 20 Laszlo Ersek 2021-02-17 20:50:33 UTC
Something horrible is going on; if I add "DEBUG=pci:2" to the "make" command in the make_ipxe() shell function in the SPEC file, then the symptom disappears :/

Comment 21 Laszlo Ersek 2021-02-17 21:58:38 UTC
Made some progress.

On RHEL8, the ipxe-roms-qemu package provides two sets of files:

/usr/share/ipxe.efi/10222000.rom
/usr/share/ipxe.efi/10ec8029.rom
/usr/share/ipxe.efi/10ec8139.rom
/usr/share/ipxe.efi/15ad07b0.rom
/usr/share/ipxe.efi/1af41000.rom
/usr/share/ipxe.efi/8086100e.rom
/usr/share/ipxe.efi/808610d3.rom
/usr/share/ipxe.efi/80861209.rom

/usr/share/ipxe/10222000.rom
/usr/share/ipxe/10ec8029.rom
/usr/share/ipxe/10ec8139.rom
/usr/share/ipxe/15ad07b0.rom
/usr/share/ipxe/1af41000.rom
/usr/share/ipxe/8086100e.rom
/usr/share/ipxe/808610d3.rom
/usr/share/ipxe/80861209.rom

This is *unlike* on RHEL7.

The first set contains the combined oproms, the second set contains the BIOS-only oproms.

In my previous testing (comment 10), I used the *wrong file*, for my ad-hoc reproduction attempt. I used the BIOS-only oprom for virtio-net-pci. That file works fine.

However, the combined oprom breaks -- it breaks even in my original (comment 10) repro environment.

Note that QEMU loads the *combined* oproms, via symlinks:

/usr/share/qemu-kvm/efi-e1000.rom     ->  ../ipxe.efi/8086100e.rom
/usr/share/qemu-kvm/efi-e1000e.rom    ->  ../ipxe.efi/808610d3.rom
/usr/share/qemu-kvm/efi-ne2k_pci.rom  ->  ../ipxe.efi/10ec8029.rom
/usr/share/qemu-kvm/efi-pcnet.rom     ->  ../ipxe.efi/10222000.rom
/usr/share/qemu-kvm/efi-rtl8139.rom   ->  ../ipxe.efi/10ec8139.rom
/usr/share/qemu-kvm/efi-virtio.rom    ->  ../ipxe.efi/1af41000.rom

This means that the EfiRom utility from the edk2-tools package that we use for combining the BIOS ROM and the efidrv images, into the combined oproms, *corrupts* the output.

Comment 22 Laszlo Ersek 2021-02-17 23:25:34 UTC
Said corruption is selective. If I build the BIOS-only ROM file manually, then the combined oprom is fine as well. The breakage in the combined oprom only occurs if the BIOS-only ROM file (= the first half of the combined ROM) is prepared by the RPM spec file.

Comment 23 Laszlo Ersek 2021-02-18 00:09:40 UTC
Another partial result.

Going from RHEL7 to RHEL8, the following snippet in the spec file was lost:

> # The src/Makefile.housekeeping relies on .git/index existing
> # but since we pass GITVERSION= to make, we don't actally need
> # it to be the real deal, so just touch it to let the build pass
> mkdir .git
> touch .git/index

Because of this, in RHEL8 there is no .git subdirectory when the package is built.

Consequently, in the following "make" command:

> make_ipxe() {
>     make %{?_smp_mflags} \
>         NO_WERROR=1 V=1 \
>         GITVERSION=%{hash} \
> %if 0%{?cross}
>         CROSS_COMPILE=x86_64-linux-gnu- \
> %endif
>         "$@"
> }

the GITVERSION macro assignment *has no effect*. It's easy to see in the build logs in Brew as well:

  -DVERSION="\"1.0.0+\""

If the GITVERSION assignment had any effect, then the VERSION C-language macro would include the value of GITVERSION.

Why is this relevant? Because I've found that, with the GITVERSION assignment having an effect (i.e., when a .git directory exists), the build *works*.

Comment 24 Laszlo Ersek 2021-02-18 01:44:00 UTC
More progress.

The iPXE startup failure is due to LZMA decompression failure (CRC32
verification or similar).

The "arch/x86/prefix/libprefix.S" file defines a function called
"install_block", which performs LZMA decompression (calling
decompress16). If that call returns with CF set, the decompression is
deemed a failure, and we hang with the known symptoms.

"install_block" is called 4 times in total, with all calls being made
from "install_prealloc" (same file). By writing a byte to the QEMU debug
port just after the decompression result check:

>         /* Decompress (or copy) source to destination */
>  #if COMPRESS
>         movw    $decompress16, %bx
>  #else
>         movw    $copy_bytes, %bx
>  #endif
>         call    process_bytes
>         jc      99f
>
> +       pushw   %dx
> +       pushw   %ax
> +       movw    $0x402, %dx
> +       outb    %al, %dx
> +       popw    %ax
> +       popw    %dx
> +

I managed to determine that it's the fourth call to install_block that
fails:

>         /* Install .text and .data to temporary area in high memory,
>          * prior to reading the E820 memory map and relocating
>          * properly.
>          */
>         pushl   %edi
>         movl    $_textdata_filesz, %ecx
>         movl    $_textdata_memsz, %edx
>         progress "  .textdata      ", %esi, %edi, %ecx, %edx
>         call    install_block
>         jc      install_block_death

Note that in the debug code I added, I didn't actually set the AL
register to anything, so basically the byte written is unpredictable.
First, that's fine, only the byte count (call count) was relevant at
that space. Second, the reason for not setting AL is that the
*slightest* modification to the generated machine code may cause the
LZMA compressed stream to change in (padded) size, and then the symptom
disappears. This is why enabling iPXE's built-in debug facilities
doesn't help, and also why changing the VERSION macro perturbs the
result.

So, given all the info above, it finally dawned on me to run a simple
binary diff between the original BIOS-only image, and the
EfiRom-produced combined image. In theory, the first part of the
combined image should match the BIOS-only image bit-for-bit, except the
"last image" indicator. However, in the particular failing case, I've
found that EfiRom *corrupts the very last byte* of the BIOS-only image,
when copying it into the combined image. This causes a CRC32 error when
iPXE decompresses itself from the (corrupted) BIOS image that's embedded
in the combined image. The corruption seems to be that bit#7 of the last
byte gets cleared by EfiRom.

Comparing the files from
"ipxe-roms-qemu-20181214-7.git133f4c47.el8.noarch":

> # ls -l /usr/share/ipxe/1af41000.rom /usr/share/ipxe.efi/1af41000.rom
>
> -rw-r--r--. 1 root root 262144 Feb  4 19:07 /usr/share/ipxe.efi/1af41000.rom
> -rw-r--r--. 1 root root  74240 Feb  4 19:07 /usr/share/ipxe/1af41000.rom

we get:

> # cmp -l /usr/share/ipxe/1af41000.rom /usr/share/ipxe.efi/1af41000.rom
>    50 200   0
> 74240 233  33
> cmp: EOF on /usr/share/ipxe/1af41000.rom after byte 74240

(Note that "cmp" prints offsets in 1-based decimal notation.)

Expressed differently:

> diff -u \
>   <(hexdump -C /usr/share/ipxe/1af41000.rom) \
>   <(hexdump -C /usr/share/ipxe.efi/1af41000.rom)
>
> @@ -1,7 +1,7 @@
>  00000000  55 aa 91 e9 a2 00 85 00  00 00 00 00 00 00 00 00  |U...............|
>  00000010  9c 00 00 00 00 00 84 00  1c 00 40 00 50 43 49 52  |..........@.PCIR|
>  00000020  f4 1a 00 10 bf 04 1c 00  03 02 00 00 91 00 01 00  |................|
> -00000030  00 80 07 00 00 00 00 00  8d b4 00 00 8d b4 00 00  |................|
> +00000030  00 00 07 00 00 00 00 00  8d b4 00 00 8d b4 00 00  |................|
>  00000040  24 50 6e 50 01 02 00 00  00 7d 00 00 00 00 60 00  |$PnP.....}....`.|
>  00000050  70 00 02 00 00 f4 00 00  00 00 85 03 00 00 00 00  |p...............|
>  00000060  68 74 74 70 3a 2f 2f 69  70 78 65 2e 6f 72 67 00  |http://ipxe.org.|
> @@ -4637,5 +4637,6221 @@
>  000121c0  f0 71 0e 5b b9 cf d1 2f  d2 14 ff 55 ab 92 8d b7  |.q.[.../...U....|
>  000121d0  16 d4 5c 2b 57 a2 ca 89  89 7f 26 9a 38 85 ad d2  |..\+W.....&.8...|
>  000121e0  cc cc 16 2a 0f c2 56 7c  59 06 e4 05 6c 3b 45 3e  |...*..V|Y...l;E>|
> -000121f0  29 94 1b 72 4e 4a cc ff  fe 66 52 21 cd ef 41 9b  |)..rNJ...fR!..A.|
> -00012200
> +000121f0  29 94 1b 72 4e 4a cc ff  fe 66 52 21 cd ef 41 1b  |)..rNJ...fR!..A.|
> +00012200  55 aa c3 00 f1 0e 00 00  0b 00 64 86 01 00 00 00  |U.........d.....|
> [...]

The first difference, at offset 49 (zero-based, decimal), is the "last
image" indicator, and it is supposed to be zero for the initial
(BIOS-only) driver image, in the combined image.

The second difference, at offset 74239 (zero-based, decimal) is
unjustified however; that's the bug in EfiRom.

Comment 25 Laszlo Ersek 2021-02-18 01:56:57 UTC
Stunning: the last byte in question is supposed to be a checksum byte. According to the PCI Firmware Spec, the entire image must sum to zero.

EfiRom recalculates that byte. Given that the last indicator flag goes from 0x80 to 0x00, it seems justified that the checksum changes from 0x9b to 0x1b.

And now I'm again confused because this seems... valid. Why does it throw off the LZMA decompressor in iPXE?

Comment 26 Yanan Fu 2021-02-18 02:18:33 UTC
Hi Laszlo,

Sorry for the late reply as the Chinese holiday ~
From the comment above seems you have reproduce this issue. If you need any other support, feel free to contact me again,
Thanks a lot!


Best regards
Yanan Fu

Comment 27 Laszlo Ersek 2021-02-18 02:39:04 UTC
iPXE pads any BIOS image to the nearest multiple of 512 bytes, with
0xFF:

> arch/x86/Makefile.pcbios:PAD_rom                = $(PERL) $(PADIMG) --blksize=512 --byte=0xff

Compare the tails of the "/usr/share/ipxe/1af41000.rom" files between:
- ipxe-roms-qemu-20181214-6.git133f4c47.el8.noarch
- ipxe-roms-qemu-20181214-7.git133f4c47.el8.noarch

> 00011eb0  f2 59 bc 1f 77 02 6f 99  ff fa 95 1c 41 15 a3 d0  |.Y..w.o.....A...|
> 00011ec0  10 ff ff ff ff ff ff ff  ff ff ff ff ff ff ff ff  |................|
> 00011ed0  ff ff ff ff ff ff ff ff  ff ff ff ff ff ff ff ff  |................|
> *
> 00012000

vs.

> 000121d0  16 d4 5c 2b 57 a2 ca 89  89 7f 26 9a 38 85 ad d2  |..\+W.....&.8...|
> 000121e0  cc cc 16 2a 0f c2 56 7c  59 06 e4 05 6c 3b 45 3e  |...*..V|Y...l;E>|
> 000121f0  29 94 1b 72 4e 4a cc ff  fe 66 52 21 cd ef 41 9b  |)..rNJ...fR!..A.|
> 00012200

In the -6 build, there is padding.

In the -7 build, there is no padding -- the ROM file size is a whole
multiple of 512!

This can be confirmed by modifying the above-quoted Makefile rule,
appending the "--verbose" flag to PAD_rom.

From the "util/padimg.pl" script:

> while ( my $filename = shift ) {
>   die "$filename is not a file\n" unless -f $filename;
>   my $oldsize = -s $filename;
>   my $padsize = ( ( -$oldsize ) % $blksize );
>   my $newsize = ( $oldsize + $padsize );
>   next unless $padsize;
>   if ( $verbosity >= 1 ) {
>       printf "Padding %s from %d to %d bytes with %d x 0x%02x\n",
>              $filename, $oldsize, $newsize, $padsize, $byte;
>   }

Note that, even with the --verbose flag added, there is no message if
padding is unnecessary (see "next unless $padsize").

And that's exactly what I'm seeing -- no padding message.

BIOS ROM files without trailing padding cannot be embedded by EfiRom
into a combined oprom. In the -6 build, EfiRom's checksum update falls
into the padding, so the LZMA decompression is not affected; in the -7
build, the checksum update by EfiFrom corrupts the LZMA stream.

BTW, during the build, iPXE also updates checksums. Right after invoking
"padimg" (which may or *may not* append any bytes), "util/fixrom.pl" is
invoked. And that script updates *multiple* checksums in one go:

>   while ( $image ) {
>     $image->pnp_header->fix_checksum() if $image->pnp_header;
>     $image->undi_header->fix_checksum() if $image->undi_header;
>     $image->ipxe_header->fix_checksum() if $image->ipxe_header;
>     $image->fix_checksum();
>     $image = $image->next_image();
>   }

Comment 28 Laszlo Ersek 2021-02-18 04:24:13 UTC
Created attachment 1757711 [details]
[PATCH] spec: combine BIOS and EFI roms using "util/catrom.pl" -- on top of dist-git 6f7a5f398e2d

Comment 29 Laszlo Ersek 2021-02-18 04:41:29 UTC
(In reply to Laszlo Ersek from comment #28)
> Created attachment 1757711 [details]
> [PATCH] spec: combine BIOS and EFI roms using "util/catrom.pl" -- on
> top of dist-git 6f7a5f398e2d

- Successfully tested the availability of the "ping" command, when
  booted on SeaBIOS.

- Successfully tested the SNP driver with OVMF (UEFI PXE Boot).

- Sanity checks on the "/usr/share/ipxe.efi/1af41000.rom" file, between
  "ipxe-roms-qemu-20181214-7.git133f4c47.el8.noarch" and the scratch
  build
  ("ipxe-roms-qemu-20181214-7.bz1926561.01.git133f4c47.el8.noarch"):

  - "EfiRom -d" output unchanged:

> Image 1 -- Offset 0x0
>   ROM header contents
>     Signature              0xAA55
>     PCIR offset            0x001C
>     Signature               PCIR
>     Vendor ID               0x1AF4
>     Device ID               0x1000
>     Length                  0x001C
>     Revision                0x0003
>     DeviceListOffset        0x4BF
>     Device list contents
>       0x1000
>     Class Code              0x000002
>     Image size              0x12200
>     Code revision:          0x0001
>     MaxRuntimeImageLength   0x07
>     ConfigUtilityCodeHeaderOffset 0x00
>     DMTFCLPEntryPointOffset 0x00
>     Indicator               0x00
>     Code type               0x00
> Image 2 -- Offset 0x12200
>   ROM header contents
>     Signature              0xAA55
>     PCIR offset            0x001C
>     Signature               PCIR
>     Vendor ID               0x1AF4
>     Device ID               0x1000
>     Length                  0x0018
>     Revision                0x0000
>     DeviceListOffset        0x00
>     Class Code              0x000000
>     Image size              0x18600
>     Code revision:          0x0000
>     MaxRuntimeImageLength   0x00
>     ConfigUtilityCodeHeaderOffset 0x83FA
>     DMTFCLPEntryPointOffset 0x01
>     Indicator               0x80   (last image)
>     Code type               0x03   (EFI image)
>   EFI ROM header contents
>     EFI Signature          0x0EF1
>     Compression Type       0x0001 (compressed)
>     Machine type           0x8664 (X64)
>     Subsystem              0x000B (EFI boot service driver)
>     EFI image offset       0x0034 (@0x12234)


  - "cmp -l" shows updates to internal iPXE checksum field:

>      7 205   5
> [...]
>  74240  33 233
> [...]

- Comparison between "/usr/share/ipxe/1af41000.rom" and
  "/usr/share/ipxe.efi/1af41000.rom" inside
  "ipxe-roms-qemu-20181214-7.bz1926561.01.git133f4c47.el8.noarch":

>     7 205   5
>    50 200   0
> cmp: EOF on bios

  Note how offset 6 (zero-based, decimal) compensates for the "last
  image indicator" update at offset 49, and that there is no change at
  offset 74239.

Comment 32 Laszlo Ersek 2021-02-18 07:06:49 UTC
(In reply to Laszlo Ersek from comment #28)
> Created attachment 1757711 [details]
> [PATCH] spec: combine BIOS and EFI roms using "util/catrom.pl" -- on top of
> dist-git 6f7a5f398e2d

On further thinking, reality is likely simpler than the commit message on this patch suggests:

- The PCI firmware spec doesn't seem to designate a particular "checksum byte" anywhere, it just says that the image must sum to zero,

- edk2's EfiRom assumes that the BIOS ROM image always ends with padding, and steals the last byte for checksum purposes,

- whereas ipxe's "util/catrom.pl" uses one of the reserved bytes in the PCI Expansion ROM Header for checksum purposes.

If there is no trailing padding (= the BIOS ROM is a whole multiple of 512 bytes), then EfiRom's assumption fails, and its choice for the checksum byte corrupts the image (e.g., it breaks the LZMA stream of iPXE). The choice of "util/catrom.pl" is safer, as it can never conflict with the ROM contents.

Anyway, this would only qualify as a commit message update; it does not affect the code in the patch.

Comment 33 Jarod Wilson 2021-02-18 15:50:00 UTC
Sorry, been a little distracted by other work... Laszlo, I see this bz has been reassigned, which suggests maybe you're planning to commit and push a new build with your patch. Or was the reassign accidental, and you're still wanting me to take care of it?

Comment 34 Laszlo Ersek 2021-02-18 16:30:06 UTC
Hi Jarod,

previously we used to follow a kernel-like workflow with the ipxe
package; we'd post patches to rhvirt-patches, the patches would be
reviewed, and then our maintainers would apply them to an "exploded git
tree". The maintainers would reflect the patches to dist-git with some
scripting, and then they'd build the new package. Regarding BZ
ownership, the Assignee field would reflect who was responsible for
solving the issue, not who was supposed to apply any patches and prepare
any builds.

However, at this time, we no longer follow this process with iPXE. IIUC,
the "exploded git tree" has been abandoned, and now dist-git is the
primary repository. I simply don't know how this process works. I
assigned the BZ to myself because I thought that it reflected the fact
that I provided a patch, and not because I thought it should be me to
push the patch to dist-git, and/or to prepare the next official
development build. I don't understand how "patch provider" and "package
maintainer" are represented / distinguished in a purely dist-git-based
package. "Attaching" a patch to a BZ ticket is also much inferior to
posting it to an internal mailing list, but at this point there seems to
be no other option for the iPXE package.

Ultimately I would like to ask you to please push the patch (from
comment 28) and to prepare a new build (see my earlier request in
comment 30 too). If that *requires* the BZ to be assigned to you, then
please correct the Assignee field as well. Thanks!

(

  General rant:

  The iPXE package remains a really bad fit for our workflows. In fact
  we should distinguish *three roles* in its handling: (1) the virt team
  should be the BZ assignee on issues related to the "ipxe-roms-qemu"
  subpackage, (2) another team should be the BZ assignee on tickets that
  relate to the "ipxe-roms" and "ipxe-bootimgs" subpackages, and (3) a
  third team should apply patches to an exploded git tree, and push new
  builds, in response to patches related to *either* of the three
  subpackages.

  iPXE is much more complex than it seems on the surface, and it would
  deserve significantly more process and "human resources" downstream
  than what it has historically received. In other words, we should
  treat iPXE like a "small kernel", with its own mailing list, exploded
  git tree, and so on.

  For years I've been pushing for assigning "ipxe-roms" and
  "ipxe-bootimgs" to people *outside* of the virt team, but that has
  never meant that the iPXE package should regress to a plain
  dist-git-based workflow. iPXE is both too complex for that, and too
  widely used for that, both inside and outside of Red Hat.

  iPXE is extremely advanced and it serves a multitude of use cases and
  user groups; those facts should be honored by our engineering
  *organization*. iPXE should never be the responsibilty of a sole
  maintainer who "occasionally" gets some help from a different team.

  To emphasize, the above is an organizational bug.

)

Laszlo

Comment 36 Jarod Wilson 2021-02-19 21:02:26 UTC
Ah. Every userspace package I've ever worked with, the package owner was the assignee on the bug. Reassigning usually meant the new assignee was going to take care of the fix, for whatever reason. In any case, I've submitted an updated build w/the patch:

https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=35048574

Thanks for your work on this, I really know nothing of the package, I just got assigned it as a stop-gap...

Comment 44 errata-xmlrpc 2021-05-18 16:17:51 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (ipxe bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:1941


Note You need to log in before you can comment on or make changes to this bug.