RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets there. Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED". If you cannot log in to RH Jira, please consult article #7032570. Failing that, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "RHEL project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that the bug has been migrated.
Bug 1809246 - [RFE] GRUB does not consider information from proxy dhcp server
Summary: [RFE] GRUB does not consider information from proxy dhcp server
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: grub2
Version: 8.1
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
: 8.0
Assignee: Bootloader engineering team
QA Contact: Release Test Team
URL:
Whiteboard:
Depends On:
Blocks: 2018329
Reported: 2020-03-02 17:09 UTC by Jacob Hunt
Modified: 2023-12-15 17:26 UTC
CC List: 16 users

Fixed In Version: grub2-2.02-113.el8
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 2018329
Environment:
Last Closed: 2022-05-10 15:31:42 UTC
Type: Feature Request
Target Upstream Version:
Embargoed:


Attachments
tcpdump of failed pxe boot (2.63 MB, application/vnd.tcpdump.pcap)
2020-03-02 17:13 UTC, Jacob Hunt
no flags
UEFI guest trying to get the netboot image (14.94 KB, image/png)
2020-03-02 17:20 UTC, Jacob Hunt
no flags
UEFI guest drops to the grub prompt (3.72 KB, image/png)
2020-03-02 17:21 UTC, Jacob Hunt
no flags
Case number two pcap (6.23 KB, application/octet-stream)
2021-04-02 23:08 UTC, Ian Hands
no flags
Case number one pcap (5.46 KB, application/octet-stream)
2021-04-02 23:55 UTC, Ian Hands
no flags


Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker RTT-4224 0 None None None 2022-02-09 22:42:51 UTC
Red Hat Issue Tracker RTT-4225 0 None None None 2022-02-09 22:42:58 UTC
Red Hat Product Errata RHSA-2022:2110 0 None None None 2022-05-10 15:32:26 UTC

Description Jacob Hunt 2020-03-02 17:09:42 UTC
Description of problem:

dnsmasq is unable to successfully PXE boot a UEFI client while running in proxyDHCP mode.

Version-Release number of selected component (if applicable):

dnsmasq-2.79-6.el8.x86_64

How reproducible:

100%

Steps to Reproduce:
1. This can be easily reproduced in a libvirt environment
2. Set up the /var/lib/tftpboot directory structure
3. Configure dnsmasq for proxyDHCP mode (config included)
4. Configure a libvirt guest to use UEFI rather than BIOS
5. netboot the guest

Actual results:

The guest fetches 'shim.efi' and 'grubx64.efi' but cannot download the netboot image; it eventually fails and drops to the grub prompt.

Expected results:

The UEFI client can successfully PXE boot

Additional info:

This works fine if dnsmasq is not in proxyDHCP mode, or if dnsmasq is acting only as a tftp server.

Comment 1 Jacob Hunt 2020-03-02 17:11:50 UTC
This seems to be a known problem, but I haven't found anywhere that it was successfully addressed.

http://lists.thekelleys.org.uk/pipermail/dnsmasq-discuss/2015q4/009907.html

"""
The problem in known, but not the solution. I did start working on that
about six months ago, but got bogged down in creating a test system.

What would be really useful would be to find an implementation that
works with UEFI and proxy DHCP, and getting for packet captures to show
what should be sent.
"""

- Jacob

Comment 2 Jacob Hunt 2020-03-02 17:13:00 UTC
Created attachment 1667025 [details]
tcpdump of failed pxe boot

Comment 3 Jacob Hunt 2020-03-02 17:19:29 UTC
Here is my dnsmasq.conf:

"""
# Don't function as a DNS server:
port=0
dhcp-no-override

# Log lots of extra information about DHCP transactions.
log-dhcp

# Set the root directory for files available via FTP.
tftp-root=/var/lib/tftpboot
enable-tftp

dhcp-range=192.168.122.0,proxy,255.255.255.0

dhcp-match=bios,option:client-arch,0
dhcp-match=uefi,option:client-arch,7

dhcp-boot=tag:bios,pxelinux.0
dhcp-boot=tag:uefi,grub2/shim.efi

pxe-service=tag:bios,x86PC,"x86 bios boot msg",pxelinux.0
pxe-service=tag:uefi,X86-64_EFI,"x86 uefi boot msg",grub2/shim.efi
"""

# netstat -tulpn | grep dnsmasq
udp        0      0 0.0.0.0:4011            0.0.0.0:*                           3502/dnsmasq
udp        0      0 0.0.0.0:67              0.0.0.0:*                           3502/dnsmasq
udp        0      0 0.0.0.0:69              0.0.0.0:*                           3502/dnsmasq
udp6       0      0 :::69                   :::*                                3502/dnsmasq

Here are the logs from the dnsmasq server:

"""
Feb 25 22:12:22 pxe8.example.com dnsmasq[3502]: started, version 2.79 DNS disabled
Feb 25 22:12:22 pxe8.example.com dnsmasq[3502]: compile time options: IPv6 GNU-getopt DBus no-i18n IDN2 DHCP DHCPv6 no-Lua TFTP no-conntrack ipset auth DNSSEC loop-detect inotify
Feb 25 22:12:22 pxe8.example.com dnsmasq[3502]: DBus support enabled: connected to system bus
Feb 25 22:12:22 pxe8.example.com dnsmasq-dhcp[3502]: DHCP, proxy on subnet 192.168.122.0
Feb 25 22:12:22 pxe8.example.com dnsmasq-tftp[3502]: TFTP root is /var/lib/tftpboot
Feb 25 22:12:34 pxe8.example.com dnsmasq-dhcp[3502]: 101020230 available DHCP subnet: 192.168.122.0/255.255.255.0
Feb 25 22:12:34 pxe8.example.com dnsmasq-dhcp[3502]: 101020230 vendor class: PXEClient:Arch:00007:UNDI:003001
Feb 25 22:12:34 pxe8.example.com dnsmasq-dhcp[3502]: 101020230 PXE(enp1s0) 52:54:00:84:b1:75 proxy
Feb 25 22:12:34 pxe8.example.com dnsmasq-dhcp[3502]: 101020230 tags: uefi, enp1s0
Feb 25 22:12:34 pxe8.example.com dnsmasq-dhcp[3502]: 101020230 next server: 192.168.122.195
Feb 25 22:12:34 pxe8.example.com dnsmasq-dhcp[3502]: 101020230 broadcast response
Feb 25 22:12:34 pxe8.example.com dnsmasq-dhcp[3502]: 101020230 sent size:  1 option: 53 message-type  2
Feb 25 22:12:34 pxe8.example.com dnsmasq-dhcp[3502]: 101020230 sent size:  4 option: 54 server-identifier  192.168.122.195
Feb 25 22:12:34 pxe8.example.com dnsmasq-dhcp[3502]: 101020230 sent size:  9 option: 60 vendor-class  50:58:45:43:6c:69:65:6e:74
Feb 25 22:12:34 pxe8.example.com dnsmasq-dhcp[3502]: 101020230 sent size: 17 option: 97 client-machine-id  00:b4:46:3c:7e:d8:d0:49:43:a2:d2:96:6e:a2...
Feb 25 22:12:42 pxe8.example.com dnsmasq-dhcp[3502]: 101020230 available DHCP subnet: 192.168.122.0/255.255.255.0
Feb 25 22:12:42 pxe8.example.com dnsmasq-dhcp[3502]: 101020230 vendor class: PXEClient:Arch:00007:UNDI:003001
Feb 25 22:12:42 pxe8.example.com dnsmasq-dhcp[3502]: 3887940701 available DHCP subnet: 192.168.122.0/255.255.255.0
Feb 25 22:12:42 pxe8.example.com dnsmasq-dhcp[3502]: 3887940701 vendor class: PXEClient:Arch:00007:UNDI:003001
Feb 25 22:12:42 pxe8.example.com dnsmasq-dhcp[3502]: 3887940701 PXE(enp1s0) 52:54:00:84:b1:75 proxy
Feb 25 22:12:42 pxe8.example.com dnsmasq-dhcp[3502]: 3887940701 tags: uefi, enp1s0
Feb 25 22:12:42 pxe8.example.com dnsmasq-dhcp[3502]: 3887940701 bootfile name: grub2/shimx64-redhat.efi
Feb 25 22:12:42 pxe8.example.com dnsmasq-dhcp[3502]: 3887940701 server name: 192.168.122.195
Feb 25 22:12:42 pxe8.example.com dnsmasq-dhcp[3502]: 3887940701 next server: 192.168.122.195
Feb 25 22:12:42 pxe8.example.com dnsmasq-dhcp[3502]: 3887940701 sent size:  1 option: 53 message-type  5
Feb 25 22:12:42 pxe8.example.com dnsmasq-dhcp[3502]: 3887940701 sent size:  4 option: 54 server-identifier  192.168.122.195
Feb 25 22:12:42 pxe8.example.com dnsmasq-dhcp[3502]: 3887940701 sent size:  9 option: 60 vendor-class  50:58:45:43:6c:69:65:6e:74
Feb 25 22:12:42 pxe8.example.com dnsmasq-dhcp[3502]: 3887940701 sent size: 17 option: 97 client-machine-id  00:b4:46:3c:7e:d8:d0:49:43:a2:d2:96:6e:a2...
Feb 25 22:12:43 pxe8.example.com dnsmasq-tftp[3502]: error 8 User aborted the transfer received from 192.168.122.94
Feb 25 22:12:43 pxe8.example.com dnsmasq-tftp[3502]: failed sending /var/lib/tftpboot/grub2/shimx64-redhat.efi to 192.168.122.94
Feb 25 22:12:43 pxe8.example.com dnsmasq-tftp[3502]: sent /var/lib/tftpboot/grub2/shimx64-redhat.efi to 192.168.122.94
Feb 25 22:12:43 pxe8.example.com dnsmasq-tftp[3502]: sent /var/lib/tftpboot/grub2/grubx64.efi to 192.168.122.94
"""
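The vendor class that dnsmasq logs above (`PXEClient:Arch:00007:UNDI:003001`) carries the client architecture code that the `dhcp-match` lines key on. The sketch below assumes the usual `PXEClient:Arch:AAAAA:UNDI:MMMmmm` layout. It extracts that code and maps it to the boot file that the dnsmasq.conf above would hand out. Note one simplification: dnsmasq's `option:client-arch` match actually reads DHCP option 93, not this string, although in these logs both agree (Arch:00007, tag `uefi`). `parse_pxe_vendor_class` and `boot_file_for` are hypothetical helpers.

```python
import re
from typing import Dict, Optional

# Assumed layout of the PXE vendor class (option 60) as logged by dnsmasq:
# "PXEClient:Arch:" + 5-digit arch + ":UNDI:" + 3-digit major + 3-digit minor.
PXE_VENDOR_RE = re.compile(r"^PXEClient:Arch:(\d{5}):UNDI:(\d{3})(\d{3})$")

# Architecture codes from the dhcp-match lines in the dnsmasq.conf above.
ARCH_TAGS = {0: "bios", 7: "uefi"}

# Boot files from the dhcp-boot lines in the dnsmasq.conf above.
BOOT_FILES = {"bios": "pxelinux.0", "uefi": "grub2/shim.efi"}


def parse_pxe_vendor_class(vendor_class: str) -> Dict[str, int]:
    """Split a full PXE vendor class string into its numeric fields."""
    m = PXE_VENDOR_RE.match(vendor_class)
    if m is None:
        raise ValueError(f"not a full PXE vendor class: {vendor_class!r}")
    arch, undi_major, undi_minor = (int(g) for g in m.groups())
    return {"arch": arch, "undi_major": undi_major, "undi_minor": undi_minor}


def boot_file_for(vendor_class: str) -> Optional[str]:
    """Return the boot file the config would select, or None if no tag matches."""
    tag = ARCH_TAGS.get(parse_pxe_vendor_class(vendor_class)["arch"])
    return BOOT_FILES.get(tag) if tag is not None else None


print(boot_file_for("PXEClient:Arch:00007:UNDI:003001"))  # prints grub2/shim.efi
```

A bare `PXEClient` string, as seen later in case #3, is rejected with `ValueError` instead of being guessed at.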

Initially /var/lib/tftpboot/grub2/shimx64-redhat.efi fails to send, but I believe that is only because the client negotiates transfer options (such as the block size) first; once that is done, the shim and grub files are sent.

NOTE: the logs above are using shimx64-redhat.efi but it fails the same way with shim.efi.

I'm including screenshots of the guest trying to get the Netboot image, and dropping to the grub prompt.

Let me know what additional information may be helpful to gather.

- Jacob

Comment 4 Jacob Hunt 2020-03-02 17:20:13 UTC
Created attachment 1667026 [details]
UEFI guest trying to get the netboot image

Comment 5 Jacob Hunt 2020-03-02 17:21:07 UTC
Created attachment 1667027 [details]
UEFI guest drops to the grub prompt

Comment 7 Christophe Besson 2020-04-07 15:11:51 UTC
A customer is also experiencing the same issue. Here are my tests; I can provide pcaps if needed.

Using dnsmasq as a proxy DHCP server in a PXE setup, to supply the TFTP server and the boot filename, behaves differently in BIOS and EFI modes:
In BIOS mode, the boot menu is displayed and then the system boots successfully.
In EFI mode, grubx64.efi is loaded, but the NBP cannot download its configuration file (grub.cfg) over TFTP, so it drops to the grub prompt.

From my (limited) understanding, on a network boot grub sets its "$root" device from the information contained in the DHCP ACK. dnsmasq sends a DHCP ACK only in BIOS mode.
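To make the suspected difference concrete, here is a hypothetical model, not grub's code. A UEFI client ends up with more than one reply: the DHCP ACK from dhcpd and the proxy reply from dnsmasq. Only the proxy reply names a TFTP server. `BootReply` and `next_server` are illustrative names. The assumption that the dhcpd ACK carries next-server 0.0.0.0 comes from the dhcpd.conf in case #1 below, which sets no `next-server`.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BootReply:
    """The two BOOTP reply fields that matter for locating grub.cfg."""
    siaddr: str     # "next server" address; "0.0.0.0" when unset
    boot_file: str  # "file" field; "" when unset


def next_server(replies: List[BootReply]) -> Optional[str]:
    """Return the first non-zero next-server address, in the order given.

    Hypothetical model: the list stands for the replies a boot loader is
    willing to consult. None corresponds to $root not being derivable.
    """
    for reply in replies:
        if reply.siaddr != "0.0.0.0":
            return reply.siaddr
    return None


# Replies in case #1 below (assumed values, see the lead-in).
dhcpd_ack = BootReply(siaddr="0.0.0.0", boot_file="")
proxy_reply = BootReply(siaddr="10.115.16.4", boot_file="uefi/grubx64.efi")

print(next_server([dhcpd_ack]))               # prints None
print(next_server([dhcpd_ack, proxy_reply]))  # prints 10.115.16.4
```

In this model, consulting only the ACK leaves no server to fetch grub.cfg from. Also consulting the proxy reply yields the dnsmasq host.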



=== REPRODUCER ===

Dedicated/Isolated network => 10.115.16.0/20
DHCPD SERVER => 10.115.16.3
DNSMASQ SERVER => 10.115.16.4
GW + HTTPD SERVER => 10.115.16.1
EFI PXE CLIENT => 10.115.16.100

# tree /var/lib/tftpboot
/var/lib/tftpboot
├── bios
│   ├── ldlinux.c32
│   ├── libcom32.c32
│   ├── libutil.c32
│   ├── pxelinux.0
│   ├── pxelinux.cfg
│   │   └── default
│   ├── rhel8
│   │   ├── initrd.img
│   │   └── vmlinuz
│   └── vesamenu.c32
└── uefi
    ├── grub.cfg
    ├── grubx64.efi
    └── rhel8
        ├── initrd.img
        └── vmlinuz

# cat /var/lib/tftpboot/uefi/grub.cfg 
set timeout=20
menuentry 'Install Red Hat Enterprise Linux 8.1.0' --class fedora --class gnu-linux --class gnu --class os {
	linuxefi /uefi/rhel8/vmlinuz inst.repo=http://10.115.16.1/rhel81
	initrdefi /uefi/rhel8/initrd.img
}

# cat /var/lib/tftpboot/bios/pxelinux.cfg/default 
default vesamenu.c32
timeout 20
label RHEL8
  menu label ^Install Red Hat Enterprise Linux 8
  kernel rhel8/vmlinuz
  append initrd=rhel8/initrd.img inst.repo=http://10.115.16.1/rhel81



=== [CASE #1] EFI PXE with DHCP proxy ===

# cat /etc/dhcp/dhcpd.conf
subnet 10.115.16.0 netmask 255.255.240.0 {
	option routers 10.115.16.1;
	range 10.115.16.100 10.115.16.110;
}

# cat /etc/dnsmasq.conf
server=192.168.122.1
user=dnsmasq
group=dnsmasq
bind-interfaces
dhcp-range=10.115.16.0,proxy
dhcp-vendorclass=BIOS,PXEClient:Arch:00000
dhcp-vendorclass=UEFI,PXEClient:Arch:00007
dhcp-vendorclass=UEFI64,PXEClient:Arch:00009
pxe-service=x86PC, "Install Red Hat Enterprise Linux", bios/pxelinux.0
pxe-service=x86-64_EFI, "Boot UEFI PXE-64", uefi/grubx64.efi
pxe-service=BC_EFI, "Boot UEFI BC PXE-64", uefi/grubx64.efi
enable-tftp
tftp-root=/var/lib/tftpboot
tftp-no-fail
tftp-secure
log-dhcp
conf-dir=/etc/dnsmasq.d,.rpmnew,.rpmsave,.rpmorig

# DHCPD logs on 10.115.16.3
Apr  7 09:30:50 infra dhcpd: DHCPDISCOVER from 52:54:00:f4:95:78 via ens10
Apr  7 09:30:51 infra dhcpd: DHCPOFFER on 10.115.16.100 to 52:54:00:f4:95:78 via ens10
Apr  7 09:30:58 infra dhcpd: DHCPREQUEST for 10.115.16.100 (10.115.16.3) from 52:54:00:f4:95:78 via ens10
Apr  7 09:30:58 infra dhcpd: DHCPACK on 10.115.16.100 to 52:54:00:f4:95:78 via ens10

# Capture from 10.115.16.3 (efi-dhcp-server-withproxy.pcap)
  1          0      0.0.0.0 -> 255.255.255.255 DHCP 389 DHCP Discover - Transaction ID 0x692f60a9
  2          0  10.115.16.3 -> 10.115.16.100 ICMP 62 Echo (ping) request  id=0x16fe, seq=0/0, ttl=64
  3          0  10.115.16.4 -> 255.255.255.255 DHCP 342 DHCP Offer    - Transaction ID 0x692f60a9
  4          1  10.115.16.3 -> 255.255.255.255 DHCP 342 DHCP Offer    - Transaction ID 0x692f60a9
---[cut]---
  8          8      0.0.0.0 -> 255.255.255.255 DHCP 401 DHCP Request  - Transaction ID 0x692f60a9
  9          8  10.115.16.3 -> 255.255.255.255 DHCP 342 DHCP ACK      - Transaction ID 0x692f60a9

# DNSMASQ logs on 10.115.16.4
Apr  7 09:30:50 foo dnsmasq-dhcp[9108]: 1764712617 available DHCP subnet: 10.115.16.0/255.255.240.0
Apr  7 09:30:50 foo dnsmasq-dhcp[9108]: 1764712617 vendor class: PXEClient:Arch:00007:UNDI:003001
Apr  7 09:30:50 foo dnsmasq-dhcp[9108]: 1764712617 PXE(ens11) 52:54:00:f4:95:78 proxy
Apr  7 09:30:50 foo dnsmasq-dhcp[9108]: 1764712617 tags: UEFI, ens11
Apr  7 09:30:50 foo dnsmasq-dhcp[9108]: 1764712617 next server: 10.115.16.4
Apr  7 09:30:50 foo dnsmasq-dhcp[9108]: 1764712617 broadcast response
Apr  7 09:30:50 foo dnsmasq-dhcp[9108]: 1764712617 sent size:  1 option: 53 message-type  2
Apr  7 09:30:50 foo dnsmasq-dhcp[9108]: 1764712617 sent size:  4 option: 54 server-identifier  10.115.16.4
Apr  7 09:30:50 foo dnsmasq-dhcp[9108]: 1764712617 sent size:  9 option: 60 vendor-class  50:58:45:43:6c:69:65:6e:74
Apr  7 09:30:50 foo dnsmasq-dhcp[9108]: 1764712617 sent size: 17 option: 97 client-machine-id  00:25:00:2c:72:d9:67:36:49:bf:57:77:ea:97...
Apr  7 09:30:58 foo dnsmasq-dhcp[9108]: 1764712617 available DHCP subnet: 10.115.16.0/255.255.240.0
Apr  7 09:30:58 foo dnsmasq-dhcp[9108]: 1764712617 vendor class: PXEClient:Arch:00007:UNDI:003001
Apr  7 09:30:58 foo dnsmasq-dhcp[9108]: 3225508914 available DHCP subnet: 10.115.16.0/255.255.240.0
Apr  7 09:30:58 foo dnsmasq-dhcp[9108]: 3225508914 vendor class: PXEClient:Arch:00007:UNDI:003001
Apr  7 09:30:58 foo dnsmasq-dhcp[9108]: 3225508914 PXE(ens11) 52:54:00:f4:95:78 proxy
Apr  7 09:30:58 foo dnsmasq-dhcp[9108]: 3225508914 tags: UEFI, ens11
Apr  7 09:30:58 foo dnsmasq-dhcp[9108]: 3225508914 bootfile name: uefi/grubx64.efi
Apr  7 09:30:58 foo dnsmasq-dhcp[9108]: 3225508914 server name: 10.115.16.4
Apr  7 09:30:58 foo dnsmasq-dhcp[9108]: 3225508914 next server: 10.115.16.4
Apr  7 09:30:58 foo dnsmasq-dhcp[9108]: 3225508914 sent size:  1 option: 53 message-type  5
Apr  7 09:30:58 foo dnsmasq-dhcp[9108]: 3225508914 sent size:  4 option: 54 server-identifier  10.115.16.4
Apr  7 09:30:58 foo dnsmasq-dhcp[9108]: 3225508914 sent size:  9 option: 60 vendor-class  50:58:45:43:6c:69:65:6e:74
Apr  7 09:30:58 foo dnsmasq-dhcp[9108]: 3225508914 sent size: 17 option: 97 client-machine-id  00:25:00:2c:72:d9:67:36:49:bf:57:77:ea:97...
Apr  7 09:30:59 foo dnsmasq-tftp[9108]: error 8 User aborted the transfer received from 10.115.16.100
Apr  7 09:30:59 foo dnsmasq-tftp[9108]: failed sending /var/lib/tftpboot/uefi/grubx64.efi to 10.115.16.100
Apr  7 09:30:59 foo dnsmasq-tftp[9108]: sent /var/lib/tftpboot/uefi/grubx64.efi to 10.115.16.100

# Capture from 10.115.16.4 (efi-dhcp-proxy.pcap)
    1   0.000000      0.0.0.0 → 255.255.255.255 DHCP 389 DHCP Discover - Transaction ID 0x692f60a9
    2   0.000275  10.115.16.3 → 10.115.16.100 ICMP 62 Echo (ping) request  id=0x16fe, seq=0/0, ttl=64
    3   0.000433  10.115.16.4 → 255.255.255.255 DHCP 342 DHCP Offer    - Transaction ID 0x692f60a9
    4   1.001498  10.115.16.3 → 255.255.255.255 DHCP 342 DHCP Offer    - Transaction ID 0x692f60a9
---[cut]---
    8   8.000157      0.0.0.0 → 255.255.255.255 DHCP 401 DHCP Request  - Transaction ID 0x692f60a9
    9   8.011987  10.115.16.3 → 255.255.255.255 DHCP 342 DHCP ACK      - Transaction ID 0x692f60a9
---[cut]---
   12   8.014242 10.115.16.100 → 10.115.16.4  UDP 389 4011 → 4011 Len=347
   13   8.014462  10.115.16.4 → 10.115.16.100 UDP 342 4011 → 4011 Len=300
---[cut]---
   16   9.000971 10.115.16.100 → 10.115.16.4  TFTP 88 Read Request, File: uefi/grubx64.efi, Transfer type: octet, tsize=0, blksize=1468
   17   9.001146  10.115.16.4 → 10.115.16.100 TFTP 71 Option Acknowledgement, blksize=1468, tsize=1168848
   18   9.001422 10.115.16.100 → 10.115.16.4  TFTP 72 Error Code, Code: Option negotiation failed, Message: User aborted the transfer
   19   9.002836 10.115.16.100 → 10.115.16.4  TFTP 80 Read Request, File: uefi/grubx64.efi, Transfer type: octet, blksize=1468
   20   9.002902  10.115.16.4 → 10.115.16.100 TFTP 57 Option Acknowledgement, blksize=1468
   21   9.003205 10.115.16.100 → 10.115.16.4  TFTP 60 Acknowledgement, Block: 0
   22   9.003514  10.115.16.4 → 10.115.16.100 TFTP 1514 Data Packet, Block: 1
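Packets 16 to 21 follow the TFTP option extension (RFC 2347, with `blksize` from RFC 2348 and `tsize` from RFC 2349). The client's first read request asks for `tsize=0`, and the server answers with an OACK giving the real size. The client then aborts with error code 8, which RFC 2347 reserves for terminating a transfer during option negotiation, and requests the file again without `tsize`. This looks like a size probe by the firmware rather than a real failure, though reading the firmware's intent that way is an assumption. The sketch below encodes these packets; the helper names are hypothetical. The byte counts in the test come from subtracting 42 bytes of Ethernet, IPv4 and UDP headers from the frame lengths in the capture (88, 72, 80 and 71).

```python
import struct
from typing import List, Tuple

# TFTP opcodes (RFC 1350) plus OACK from the option extension (RFC 2347).
TFTP_RRQ = 1
TFTP_ERROR = 5
TFTP_OACK = 6


def build_rrq(filename: str, mode: str = "octet", **options: int) -> bytes:
    """Read request: opcode | filename NUL | mode NUL | (name NUL value NUL)*."""
    fields = [filename, mode]
    for name, value in options.items():  # keyword order is preserved
        fields += [name, str(value)]
    body = b"".join(f.encode("ascii") + b"\0" for f in fields)
    return struct.pack("!H", TFTP_RRQ) + body


def build_error(code: int, message: str) -> bytes:
    """Error packet: opcode | 16-bit error code | message NUL."""
    return struct.pack("!HH", TFTP_ERROR, code) + message.encode("ascii") + b"\0"


def parse_string_packet(pkt: bytes) -> Tuple[int, List[str]]:
    """Split an RRQ or OACK into its opcode and NUL-terminated fields."""
    (opcode,) = struct.unpack_from("!H", pkt, 0)
    fields = pkt[2:].split(b"\0")[:-1]  # the last split item is the empty tail
    return opcode, [f.decode("ascii") for f in fields]


# Packets 16, 18 and 19 from the capture above.
probe = build_rrq("uefi/grubx64.efi", tsize=0, blksize=1468)
abort = build_error(8, "User aborted the transfer")
retry = build_rrq("uefi/grubx64.efi", blksize=1468)
# Packet 17, the server's OACK, written out by hand.
oack = struct.pack("!H", TFTP_OACK) + b"blksize\x001468\x00tsize\x001168848\x00"

for pkt in (probe, retry, oack):
    print(len(pkt), parse_string_packet(pkt))
print(len(abort))
```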

=> This leads to the grub prompt. The $root variable is not defined, so grub.cfg cannot be downloaded. Workaround:
   grub> set root=tftp,10.115.16.4
   grub> configfile uefi/grub.cfg



=== [CASE #2] EFI PXE with standard DHCP server (no proxy) ===

# Note: specific DNSMASQ and DHCPD configs for this test.

# cat /etc/dhcp/dhcpd.conf 
option architecture-type code 93 = unsigned integer 16;
subnet 10.115.16.0 netmask 255.255.240.0 {
	option routers 10.115.16.1;
	range 10.115.16.100 10.115.16.110;
	next-server 10.115.16.4;
	if option architecture-type = 00:07 {
	    filename "uefi/grubx64.efi";
	} else {
	    filename "bios/pxelinux.0";
	}
}

# cat /etc/dnsmasq.conf
server=192.168.122.1
user=dnsmasq
group=dnsmasq
bind-interfaces
enable-tftp
tftp-root=/var/lib/tftpboot
tftp-no-fail
tftp-secure
log-dhcp
conf-dir=/etc/dnsmasq.d,.rpmnew,.rpmsave,.rpmorig

# DHCPD logs on 10.115.16.3
Apr  7 10:12:48 infra dhcpd: DHCPDISCOVER from 52:54:00:f4:95:78 via ens10
Apr  7 10:12:49 infra dhcpd: DHCPOFFER on 10.115.16.100 to 52:54:00:f4:95:78 via ens10
Apr  7 10:12:56 infra dhcpd: DHCPREQUEST for 10.115.16.100 (10.115.16.3) from 52:54:00:f4:95:78 via ens10
Apr  7 10:12:56 infra dhcpd: DHCPACK on 10.115.16.100 to 52:54:00:f4:95:78 via ens10

# Capture from 10.115.16.3 (efi-dhcp-server.pcap)
  1          0      0.0.0.0 -> 255.255.255.255 DHCP 389 DHCP Discover - Transaction ID 0x6a266dcf
  2          0  10.115.16.3 -> 10.115.16.100 ICMP 62 Echo (ping) request  id=0x2c1d, seq=0/0, ttl=64
  3          1  10.115.16.3 -> 255.255.255.255 DHCP 342 DHCP Offer    - Transaction ID 0x6a266dcf
---[cut]---
  7          8      0.0.0.0 -> 255.255.255.255 DHCP 401 DHCP Request  - Transaction ID 0x6a266dcf
  8          8  10.115.16.3 -> 255.255.255.255 DHCP 342 DHCP ACK      - Transaction ID 0x6a266dcf

# DNSMASQ logs on 10.115.16.4
Apr  7 10:12:56 foo dnsmasq-tftp[9137]: error 8 User aborted the transfer received from 10.115.16.100
Apr  7 10:12:56 foo dnsmasq-tftp[9137]: failed sending /var/lib/tftpboot/uefi/grubx64.efi to 10.115.16.100
Apr  7 10:12:56 foo dnsmasq-tftp[9137]: sent /var/lib/tftpboot/uefi/grubx64.efi to 10.115.16.100
Apr  7 10:12:56 foo dnsmasq-tftp[9137]: file /var/lib/tftpboot/uefi/grub.cfg-01-52-54-00-f4-95-78 not found
Apr  7 10:12:56 foo dnsmasq-tftp[9137]: file /var/lib/tftpboot/uefi/grub.cfg-0A731064 not found
---[cut]---
Apr  7 10:12:56 foo dnsmasq-tftp[9137]: sent /var/lib/tftpboot/uefi/grub.cfg to 10.115.16.100
Apr  7 10:13:01 foo dnsmasq-tftp[9137]: sent /var/lib/tftpboot/uefi/rhel8/vmlinuz to 10.115.16.100
Apr  7 10:13:12 foo dnsmasq-tftp[9137]: sent /var/lib/tftpboot/uefi/rhel8/initrd.img to 10.115.16.100
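The "not found" lines show GRUB trying per-client config names before it falls back to `grub.cfg`. pxelinux does the same in case #3, with a UUID-based name first. The sketch below generates such a list. It assumes the documented pxelinux order: UUID, then `01-` plus the MAC, then the IPv4 address as upper-case hex shortened one digit at a time, then the default. The middle entries are hidden behind `---[cut]---` in the logs, so they are an assumption here. `config_candidates` is a hypothetical helper.

```python
import ipaddress
from typing import List, Optional


def config_candidates(mac: str, ip: str, prefix: str, fallback: str,
                      uuid: Optional[str] = None) -> List[str]:
    """Config file names in the assumed lookup order (see lead-in)."""
    names = []
    if uuid:  # pxelinux tries the client UUID first
        names.append(prefix + uuid)
    # "01" is the ARP hardware type for Ethernet, followed by the MAC.
    names.append(prefix + "01-" + mac.lower().replace(":", "-"))
    # 10.115.16.100 -> "0A731064", then "0A73106", ..., "0".
    hex_ip = f"{int(ipaddress.IPv4Address(ip)):08X}"
    names += [prefix + hex_ip[:n] for n in range(len(hex_ip), 0, -1)]
    names.append(fallback)
    return names


for name in config_candidates("52:54:00:f4:95:78", "10.115.16.100",
                              "uefi/grub.cfg-", "uefi/grub.cfg"):
    print(name)
```

The first two names match the two "not found" lines above, and the last one is the `grub.cfg` that is finally sent.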

# Capture from 10.115.16.4 (efi-dhcp-noproxy.pcap)
    1   0.000000      0.0.0.0 → 255.255.255.255 DHCP 389 DHCP Discover - Transaction ID 0x6a266dcf
    2   1.001582  10.115.16.3 → 255.255.255.255 DHCP 342 DHCP Offer    - Transaction ID 0x6a266dcf
    3   8.000237      0.0.0.0 → 255.255.255.255 DHCP 401 DHCP Request  - Transaction ID 0x6a266dcf
    4   8.012047  10.115.16.3 → 255.255.255.255 DHCP 342 DHCP ACK      - Transaction ID 0x6a266dcf
    5   8.013286 RealtekU_f4:95:78 → Broadcast    ARP 60 Who has 10.115.16.4? Tell 10.115.16.100
    6   8.013293 RealtekU_e0:e4:bc → RealtekU_f4:95:78 ARP 42 10.115.16.4 is at 52:54:00:e0:e4:bc
    7   8.013480 10.115.16.100 → 10.115.16.4  TFTP 88 Read Request, File: uefi/grubx64.efi, Transfer type: octet, tsize=0, blksize=1468
    8   8.013693  10.115.16.4 → 10.115.16.100 TFTP 71 Option Acknowledgement, blksize=1468, tsize=1168848
    9   8.013891 10.115.16.100 → 10.115.16.4  TFTP 72 Error Code, Code: Option negotiation failed, Message: User aborted the transfer
   10   8.015288 10.115.16.100 → 10.115.16.4  TFTP 80 Read Request, File: uefi/grubx64.efi, Transfer type: octet, blksize=1468
   11   8.015352  10.115.16.4 → 10.115.16.100 TFTP 57 Option Acknowledgement, blksize=1468
   12   8.015562 10.115.16.100 → 10.115.16.4  TFTP 60 Acknowledgement, Block: 0
---[cut]---
 1627   8.258732 10.115.16.100 → 10.115.16.4  TFTP 86 Read Request, File: /uefi/grub.cfg, Transfer type: octet, blksize=1024, tsize=0
 1628   8.258776  10.115.16.4 → 10.115.16.100 TFTP 67 Option Acknowledgement, blksize=1024, tsize=226
 1629   8.258907 10.115.16.100 → 10.115.16.4  TFTP 60 Acknowledgement, Block: 0
 1630   8.258930  10.115.16.4 → 10.115.16.100 TFTP 272 Data Packet, Block: 1 (last)
 1631   8.259069 10.115.16.100 → 10.115.16.4  TFTP 60 Acknowledgement, Block: 1
---[cut]---
 1645  12.218699 10.115.16.100 → 10.115.16.4  TFTP 91 Read Request, File: /uefi/rhel8/vmlinuz, Transfer type: octet, blksize=1024, tsize=0
 1646  12.218862  10.115.16.4 → 10.115.16.100 TFTP 71 Option Acknowledgement, blksize=1024, tsize=8106848
 1647  12.219094 10.115.16.100 → 10.115.16.4  TFTP 60 Acknowledgement, Block: 0
 1648  12.219142  10.115.16.4 → 10.115.16.100 TFTP 1070 Data Packet, Block: 1
---[cut]---
17484  13.399606 10.115.16.100 → 10.115.16.4  TFTP 94 Read Request, File: /uefi/rhel8/initrd.img, Transfer type: octet, blksize=1024, tsize=0
17485  13.399755  10.115.16.4 → 10.115.16.100 TFTP 72 Option Acknowledgement, blksize=1024, tsize=62248424
17486  13.399997 10.115.16.100 → 10.115.16.4  TFTP 60 Acknowledgement, Block: 0
17487  13.400099  10.115.16.4 → 10.115.16.100 TFTP 1070 Data Packet, Block: 1

=> The system boots without issue.


=== [CASE #3] BIOS PXE with DHCP proxy ===

# Note: using the same DNSMASQ and DHCPD configs as in the EFI proxy case (case #1).

# DHCPD logs on 10.115.16.3
Apr  7 09:11:28 infra dhcpd: DHCPDISCOVER from 52:54:00:10:f8:13 via ens10
Apr  7 09:11:29 infra dhcpd: DHCPOFFER on 10.115.16.100 to 52:54:00:10:f8:13 via ens10
Apr  7 09:11:29 infra dhcpd: DHCPREQUEST for 10.115.16.100 (10.115.16.3) from 52:54:00:10:f8:13 via ens10
Apr  7 09:11:29 infra dhcpd: DHCPACK on 10.115.16.100 to 52:54:00:10:f8:13 via ens10

# Capture from 10.115.16.3 (bios-dhcp-server.pcap)
  1          0      0.0.0.0 -> 255.255.255.255 DHCP 442 DHCP Discover - Transaction ID 0xa1d9ba29
  2          0  10.115.16.3 -> 10.115.16.100 ICMP 62 Echo (ping) request  id=0x73fd, seq=0/0, ttl=64
  3          0  10.115.16.4 -> 255.255.255.255 DHCP 380 DHCP Offer    - Transaction ID 0xa1d9ba29
  4          1  10.115.16.3 -> 10.115.16.100 DHCP 342 DHCP Offer    - Transaction ID 0xa1d9ba29
  5          1      0.0.0.0 -> 255.255.255.255 DHCP 454 DHCP Request  - Transaction ID 0xa1d9ba29
  6          1  10.115.16.3 -> 10.115.16.100 DHCP 342 DHCP ACK      - Transaction ID 0xa1d9ba29

# DNSMASQ logs on 10.115.16.4
Apr  7 09:11:28 foo dnsmasq-dhcp[8991]: 2715400745 available DHCP subnet: 10.115.16.0/255.255.240.0
Apr  7 09:11:28 foo dnsmasq-dhcp[8991]: 2715400745 vendor class: PXEClient:Arch:00000:UNDI:002001
Apr  7 09:11:28 foo dnsmasq-dhcp[8991]: 2715400745 user class: iPXE
Apr  7 09:11:28 foo dnsmasq-dhcp[8991]: 2715400745 PXE(ens11) 52:54:00:10:f8:13 proxy
Apr  7 09:11:28 foo dnsmasq-dhcp[8991]: 2715400745 tags: BIOS, ens11
Apr  7 09:11:28 foo dnsmasq-dhcp[8991]: 2715400745 broadcast response
Apr  7 09:11:28 foo dnsmasq-dhcp[8991]: 2715400745 sent size:  1 option: 53 message-type  2
Apr  7 09:11:28 foo dnsmasq-dhcp[8991]: 2715400745 sent size:  4 option: 54 server-identifier  10.115.16.4
Apr  7 09:11:28 foo dnsmasq-dhcp[8991]: 2715400745 sent size:  9 option: 60 vendor-class  50:58:45:43:6c:69:65:6e:74
Apr  7 09:11:28 foo dnsmasq-dhcp[8991]: 2715400745 sent size: 17 option: 97 client-machine-id  00:73:19:0c:fe:5b:ca:30:4a:92:e5:57:0c:65...
Apr  7 09:11:28 foo dnsmasq-dhcp[8991]: 2715400745 sent size: 56 option: 43 vendor-encap  06:01:03:0a:04:00:50:58:45:08:07:80:00:01...
Apr  7 09:11:29 foo dnsmasq-dhcp[8991]: 2715400745 available DHCP subnet: 10.115.16.0/255.255.240.0
Apr  7 09:11:29 foo dnsmasq-dhcp[8991]: 2715400745 vendor class: PXEClient:Arch:00000:UNDI:002001
Apr  7 09:11:29 foo dnsmasq-dhcp[8991]: 2715400745 user class: iPXE
Apr  7 09:11:29 foo dnsmasq-dhcp[8991]: 0 available DHCP subnet: 10.115.16.0/255.255.240.0
Apr  7 09:11:29 foo dnsmasq-dhcp[8991]: 0 vendor class: PXEClient
Apr  7 09:11:29 foo dnsmasq-dhcp[8991]: 0 user class: iPXE
Apr  7 09:11:29 foo dnsmasq-dhcp[8991]: 0 PXE(ens11) 10.115.16.100 52:54:00:10:f8:13 bios/pxelinux.0
Apr  7 09:11:29 foo dnsmasq-dhcp[8991]: 0 tags: ens11
Apr  7 09:11:29 foo dnsmasq-dhcp[8991]: 0 bootfile name: bios/pxelinux.0
Apr  7 09:11:29 foo dnsmasq-dhcp[8991]: 0 next server: 10.115.16.4
Apr  7 09:11:29 foo dnsmasq-dhcp[8991]: 0 broadcast response
Apr  7 09:11:29 foo dnsmasq-dhcp[8991]: 0 sent size:  1 option: 53 message-type  5
Apr  7 09:11:29 foo dnsmasq-dhcp[8991]: 0 sent size:  4 option: 54 server-identifier  10.115.16.4
Apr  7 09:11:29 foo dnsmasq-dhcp[8991]: 0 sent size:  9 option: 60 vendor-class  50:58:45:43:6c:69:65:6e:74
Apr  7 09:11:29 foo dnsmasq-dhcp[8991]: 0 sent size: 17 option: 97 client-machine-id  00:73:19:0c:fe:5b:ca:30:4a:92:e5:57:0c:65...
Apr  7 09:11:29 foo dnsmasq-dhcp[8991]: 0 sent size:  7 option: 43 vendor-encap  47:04:80:00:00:00:ff
Apr  7 09:11:29 foo dnsmasq-tftp[8991]: sent /var/lib/tftpboot/bios/pxelinux.0 to 10.115.16.100
Apr  7 09:11:29 foo dnsmasq-tftp[8991]: sent /var/lib/tftpboot/bios/ldlinux.c32 to 10.115.16.100
Apr  7 09:11:29 foo dnsmasq-tftp[8991]: file /var/lib/tftpboot/bios/pxelinux.cfg/73190cfe-5bca-304a-92e5-570c65180589 not found
Apr  7 09:11:29 foo dnsmasq-tftp[8991]: file /var/lib/tftpboot/bios/pxelinux.cfg/01-52-54-00-10-f8-13 not found
Apr  7 09:11:29 foo dnsmasq-tftp[8991]: file /var/lib/tftpboot/bios/pxelinux.cfg/0A731064 not found
---[cut]---
Apr  7 09:11:29 foo dnsmasq-tftp[8991]: sent /var/lib/tftpboot/bios/pxelinux.cfg/default to 10.115.16.100
Apr  7 09:11:29 foo dnsmasq-tftp[8991]: sent /var/lib/tftpboot/bios/vesamenu.c32 to 10.115.16.100
Apr  7 09:11:29 foo dnsmasq-tftp[8991]: sent /var/lib/tftpboot/bios/libcom32.c32 to 10.115.16.100
Apr  7 09:11:29 foo dnsmasq-tftp[8991]: sent /var/lib/tftpboot/bios/libutil.c32 to 10.115.16.100
Apr  7 09:11:29 foo dnsmasq-tftp[8991]: sent /var/lib/tftpboot/bios/pxelinux.cfg/default to 10.115.16.100
Apr  7 09:11:32 foo dnsmasq-tftp[8991]: sent /var/lib/tftpboot/bios/rhel8/vmlinuz to 10.115.16.100
Apr  7 09:11:39 foo dnsmasq-tftp[8991]: sent /var/lib/tftpboot/bios/rhel8/initrd.img to 10.115.16.100

# Capture from 10.115.16.4 (bios-dhcp-proxy.pcap)
    1   0.000000      0.0.0.0 → 255.255.255.255 DHCP 442 DHCP Discover - Transaction ID 0xa1d9ba29
    2   0.000193  10.115.16.4 → 255.255.255.255 DHCP 380 DHCP Offer    - Transaction ID 0xa1d9ba29
    3   1.001801      0.0.0.0 → 255.255.255.255 DHCP 454 DHCP Request  - Transaction ID 0xa1d9ba29
---[cut]---
    7   1.045659 10.115.16.100 → 10.115.16.4  DHCP 428 DHCP Request  - Transaction ID 0x0
    8   1.045840  10.115.16.4 → 10.115.16.100 DHCP 342 DHCP ACK      - Transaction ID 0x0
    9   1.061819 10.115.16.100 → 10.115.16.4  TFTP 87 Read Request, File: bios/pxelinux.0, Transfer type: octet, blksize=1432, tsize=0
   10   1.061976  10.115.16.4 → 10.115.16.100 TFTP 69 Option Acknowledgement, blksize=1432, tsize=42821
---[cut]---
  241   1.116588 10.115.16.100 → 10.115.16.4  TFTP 126 Read Request, File: bios/pxelinux.cfg/73190cfe-5bca-304a-92e5-570c65180589, Transfer type: octet, tsize=0, blksize=1408
  242   1.116669  10.115.16.4 → 10.115.16.100 TFTP 134 Error Code, Code: File not found, Message: file /var/lib/tftpboot/bios/pxelinux.cfg/73190cfe-5bca-304a-92e5-570c65180589 not found
  243   1.116802 10.115.16.100 → 10.115.16.4  TFTP 110 Read Request, File: bios/pxelinux.cfg/01-52-54-00-10-f8-13, Transfer type: octet, tsize=0, blksize=1408
  244   1.116859  10.115.16.4 → 10.115.16.100 TFTP 118 Error Code, Code: File not found, Message: file /var/lib/tftpboot/bios/pxelinux.cfg/01-52-54-00-10-f8-13 not found
  245   1.117000 10.115.16.100 → 10.115.16.4  TFTP 98 Read Request, File: bios/pxelinux.cfg/0A731064, Transfer type: octet, tsize=0, blksize=1408
  246   1.117063  10.115.16.4 → 10.115.16.100 TFTP 106 Error Code, Code: File not found, Message: file /var/lib/tftpboot/bios/pxelinux.cfg/0A731064 not found
---[cut]---
  261   1.118376 10.115.16.100 → 10.115.16.4  TFTP 97 Read Request, File: bios/pxelinux.cfg/default, Transfer type: octet, tsize=0, blksize=1408
  262   1.118414  10.115.16.4 → 10.115.16.100 TFTP 67 Option Acknowledgement, blksize=1408, tsize=186
  263   1.118536 10.115.16.100 → 10.115.16.4  TFTP 60 Acknowledgement, Block: 0
  264   1.118564  10.115.16.4 → 10.115.16.100 TFTP 232 Data Packet, Block: 1 (last)
---[cut]---
  612   3.249780 10.115.16.100 → 10.115.16.4  TFTP 90 Read Request, File: bios/rhel8/vmlinuz, Transfer type: octet, tsize=0, blksize=1408
  613   3.250217  10.115.16.4 → 10.115.16.100 TFTP 71 Option Acknowledgement, blksize=1408, tsize=8106848
  614   3.251178 10.115.16.100 → 10.115.16.4  TFTP 60 Acknowledgement, Block: 0
---[cut]---
12131   4.156249 10.115.16.100 → 10.115.16.4  TFTP 93 Read Request, File: bios/rhel8/initrd.img, Transfer type: octet, tsize=0, blksize=1408
12132   4.156328  10.115.16.4 → 10.115.16.100 TFTP 72 Option Acknowledgement, blksize=1408, tsize=62248424
12133   4.156477 10.115.16.100 → 10.115.16.4  TFTP 60 Acknowledgement, Block: 0
12134   4.156515  10.115.16.4 → 10.115.16.100 TFTP 1454 Data Packet, Block: 1

=> The system boots without issue.


=== Main differences between BIOS and EFI ===

* In both cases with the proxy, we have 2 DHCP offers: one from DHCPD with the IP address and gateway, and one from DNSMASQ without a boot filename (in EFI mode it carries the next-server IP).

* In BIOS mode, we can see two DHCP ACKs, one from each service:
10.115.16.3 -> 10.115.16.100 DHCP 342 DHCP ACK      - Transaction ID 0xa1d9ba29
10.115.16.4 → 10.115.16.100 DHCP 342 DHCP ACK      - Transaction ID 0x0

* In BIOS mode, the DNSMASQ DHCP offer doesn't contain the next-server, but a specific option 43:
    Option: (43) Vendor-Specific Information (PXEClient)
        Length: 56
        Option 43 Suboption: (6) PXE discovery control
            Length: 1
            discovery control: 0x03
        Option 43 Suboption: (10) PXE menu prompt
            Length: 4
            menu prompt: 00505845
        Option 43 Suboption: (8) PXE boot servers
            Length: 7
            boot servers: 8000010a731004
        Option 43 Suboption: (9) PXE boot menu
            Length: 35
            boot menu: 800020496e7374616c6c205265642048617420456e746572...
        PXE Client End: 255

* In BIOS mode, dnsmasq sends a DHCP ACK that contains all the information the network boot program (here pxelinux.0) needs:
Internet Protocol Version 4, Src: 10.115.16.4, Dst: 10.115.16.100
---[cut]---
User Datagram Protocol, Src Port: 4011, Dst Port: 68
---[cut]---
Bootstrap Protocol (ACK)
    Message type: Boot Reply (2)
    Hardware type: Ethernet (0x01)
    Hardware address length: 6
    Hops: 0
    Transaction ID: 0x00000000
    Seconds elapsed: 4
    Bootp flags: 0x8000, Broadcast flag (Broadcast)
        1... .... .... .... = Broadcast flag: Broadcast
        .000 0000 0000 0000 = Reserved flags: 0x0000
    Client IP address: 0.0.0.0
    Your (client) IP address: 10.115.16.100
    Next server IP address: 10.115.16.4
    Relay agent IP address: 0.0.0.0
    Client MAC address: RealtekU_10:f8:13 (52:54:00:10:f8:13)
    Client hardware address padding: 00000000000000000000
    Server host name not given
    Boot file name: bios/pxelinux.0
    Magic cookie: DHCP
    Option: (53) DHCP Message Type (ACK)
        Length: 1
        DHCP: ACK (5)
    Option: (54) DHCP Server Identifier
        Length: 4
        DHCP Server Identifier: 10.115.16.4


* In EFI mode, we see only one DHCP ACK, broadcast by dhcpd:
10.115.16.3 -> 255.255.255.255 DHCP 342 DHCP ACK      - Transaction ID 0x692f60a9

* In EFI mode, there is an exchange on UDP port 4011 in which the proxy DHCP server appears to supply the boot filename:
10.115.16.100 → 10.115.16.4  UDP 389 4011 → 4011 Len=347
10.115.16.4 → 10.115.16.100 UDP 342 4011 → 4011 Len=300

User Datagram Protocol, Src Port: 4011, Dst Port: 4011
    Source Port: 4011
    Destination Port: 4011
    Length: 308
    Checksum: 0x3693 [unverified]
    [Checksum Status: Unverified]
    [Stream index: 3]
Data (300 bytes)
0000  02 01 06 00 c0 41 5c 32 00 00 00 00 0a 73 10 64   .....A\2.....s.d
0010  00 00 00 00 0a 73 10 04 00 00 00 00 52 54 00 f4   .....s......RT..
0020  95 78 00 00 00 00 00 00 00 00 00 00 31 30 2e 31   .x..........10.1
0030  31 35 2e 31 36 2e 34 00 00 00 00 00 00 00 00 00   15.16.4.........
0040  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00   ................
0050  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00   ................
0060  00 00 00 00 00 00 00 00 00 00 00 00 75 65 66 69   ............uefi
0070  2f 67 72 75 62 78 36 34 2e 65 66 69 00 00 00 00   /grubx64.efi....
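
The option 43 dump in the BIOS bullet above can be checked by walking the suboption TLVs (tag, length, value). This is a minimal C sketch, assuming the PXE specification's layout for suboption 8 (a 2-byte big-endian server type, a 1-byte IP count, then that many IPv4 addresses). The function name pxe_boot_servers() is hypothetical, and only the first server-type entry is decoded. Fed the bytes shown for suboption 8 (80 00 01 0a 73 10 04), it yields server type 0x8000 and a single boot server, 10.115.16.4, which is the dnsmasq host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PXE_SUBOPT_BOOT_SERVERS 8
#define PXE_SUBOPT_END        255

/* Walk the option 43 suboptions and decode the first entry of the
   (assumed) PXE boot servers suboption: 2-byte server type, 1-byte IP
   count, then that many IPv4 addresses.
   Returns the number of addresses copied into ips (at most max_ips),
   or -1 if the suboption is missing or malformed. */
static int pxe_boot_servers(const uint8_t *opt, size_t len, uint16_t *type,
                            uint8_t ips[][4], int max_ips)
{
    size_t i = 0;
    assert(opt != NULL && type != NULL && ips != NULL && max_ips >= 0);
    while (i < len && opt[i] != PXE_SUBOPT_END) {
        if (i + 1 >= len)
            return -1;                      /* tag without a length byte */
        size_t slen = opt[i + 1];
        const uint8_t *val = opt + i + 2;
        if (i + 2 + slen > len)
            return -1;                      /* value runs past the buffer */
        if (opt[i] == PXE_SUBOPT_BOOT_SERVERS) {
            if (slen < 3)
                return -1;
            *type = (uint16_t)((val[0] << 8) | val[1]);
            size_t count = val[2];
            if (3 + 4 * count > slen)
                return -1;                  /* count does not fit the length */
            size_t n = count < (size_t)max_ips ? count : (size_t)max_ips;
            for (size_t k = 0; k < n; k++)
                memcpy(ips[k], val + 3 + 4 * k, 4);
            return (int)n;
        }
        i += 2 + slen;                      /* skip to the next suboption */
    }
    return -1;
}
```

The boot menu suboption (9) is left out of any test input because its value is truncated in the capture above.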

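The port 4011 hex dump above is a raw BOOTP reply, so its fixed header can be decoded with the standard RFC 951 / RFC 2131 field offsets (client IP at byte 12, next-server IP at byte 20, server name at byte 44, boot file name at byte 108). This is a minimal C sketch; the struct bootp_summary and the function names are hypothetical, and the decoder tolerates a truncated capture like this one, which stops after 128 bytes, partway into the 128-byte file field.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Fixed BOOTP/DHCP header offsets (RFC 951 / RFC 2131). */
#define BOOTP_OFF_OP       0
#define BOOTP_OFF_CIADDR  12   /* client IP address */
#define BOOTP_OFF_SIADDR  20   /* "next server" IP address */
#define BOOTP_OFF_SNAME   44   /* server host name, 64 bytes */
#define BOOTP_OFF_FILE   108   /* boot file name, 128 bytes */
#define BOOTP_SNAME_LEN   64
#define BOOTP_FILE_LEN   128

/* Hypothetical summary of the fields relevant to PXE booting. */
struct bootp_summary {
    uint8_t op;                        /* 1 = request, 2 = reply */
    char ciaddr[16];
    char siaddr[16];
    char sname[BOOTP_SNAME_LEN + 1];
    char file[BOOTP_FILE_LEN + 1];
};

static void ipv4_to_str(const uint8_t *p, char out[16])
{
    snprintf(out, 16, "%d.%d.%d.%d", p[0], p[1], p[2], p[3]);
}

/* Copy a NUL-padded string field, bounded by the field size and by the
   bytes actually present in the (possibly truncated) capture. */
static void copy_str_field(const uint8_t *buf, size_t len, size_t off,
                           size_t field_len, char *out)
{
    size_t n = 0;
    while (n < field_len && off + n < len && buf[off + n] != '\0')
        n++;
    memcpy(out, buf + off, n);
    out[n] = '\0';
}

/* Returns 0 on success, -1 if the buffer ends before the file field. */
static int bootp_decode(const uint8_t *buf, size_t len,
                        struct bootp_summary *out)
{
    assert(buf != NULL && out != NULL);
    if (len <= BOOTP_OFF_FILE)
        return -1;
    out->op = buf[BOOTP_OFF_OP];
    ipv4_to_str(buf + BOOTP_OFF_CIADDR, out->ciaddr);
    ipv4_to_str(buf + BOOTP_OFF_SIADDR, out->siaddr);
    copy_str_field(buf, len, BOOTP_OFF_SNAME, BOOTP_SNAME_LEN, out->sname);
    copy_str_field(buf, len, BOOTP_OFF_FILE, BOOTP_FILE_LEN, out->file);
    return 0;
}
```

Applied to the 128 bytes in the dump, it reports op 2 (reply), client IP 10.115.16.100, next-server 10.115.16.4, server name 10.115.16.4 and boot file uefi/grubx64.efi. In other words, the proxy reply on port 4011 carries exactly the next-server and boot-file information that the EFI-mode DHCP ACK lacks.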

=== Grub code ===

Looking at the GRUB code:
=> file grub-2.02~beta2/grub-x86_64-efi-2.02~beta2/grub-core/net/efi/net.c
=> function grub_efi_net_config_from_handle():
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      pxe_get_boot_location (
                (const struct grub_net_bootp_packet *) &pxe->mode->dhcp_ack,
                device,
                path,
                1);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The function pxe_get_boot_location() is called to set the variable device (which corresponds to the $root variable at the GRUB prompt). It is derived from the DHCP ACK packet, and in EFI mode that packet does not contain the next-server field.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
pxe_get_boot_location (const struct grub_net_bootp_packet *bp,
                       char **device,
                       char **path,
                       int is_default)
{
  char *server = grub_xasprintf ("%d.%d.%d.%d",
                                 ((grub_uint8_t *) &bp->server_ip)[0],
                                 ((grub_uint8_t *) &bp->server_ip)[1],
                                 ((grub_uint8_t *) &bp->server_ip)[2],
                                 ((grub_uint8_t *) &bp->server_ip)[3]);

  *device = grub_xasprintf ("tftp,%s", server);

  *path = grub_strndup (bp->boot_file, sizeof (bp->boot_file));
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
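
To see what this logic produces when the ACK has no next-server, here is a stand-alone sketch of the excerpt above. It is not GRUB code: struct bootp_stub is a hypothetical, simplified stand-in for struct grub_net_bootp_packet that keeps only the two fields the excerpt reads, boot_location_sketch() is a hypothetical name, and snprintf/memcpy replace grub_xasprintf/grub_strndup with caller-provided buffers.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for struct grub_net_bootp_packet: only the two
   fields that pxe_get_boot_location() reads in the excerpt. */
struct bootp_stub {
    uint8_t server_ip[4];  /* next-server IP, the four octets in wire order */
    char boot_file[128];   /* may lack a NUL terminator if completely full */
};

/* Same logic as the excerpt: device = "tftp,<server_ip>", path = boot_file. */
static void boot_location_sketch(const struct bootp_stub *bp,
                                 char device[32], char path[129])
{
    assert(bp != NULL && device != NULL && path != NULL);

    snprintf(device, 32, "tftp,%d.%d.%d.%d",
             bp->server_ip[0], bp->server_ip[1],
             bp->server_ip[2], bp->server_ip[3]);

    /* Bounded copy, like grub_strndup (bp->boot_file, sizeof (bp->boot_file)). */
    const char *nul = memchr(bp->boot_file, '\0', sizeof bp->boot_file);
    size_t n = nul ? (size_t)(nul - bp->boot_file) : sizeof bp->boot_file;
    memcpy(path, bp->boot_file, n);
    path[n] = '\0';
}
```

With a zero-filled ACK (next server unset, no boot file), the sketch yields the device tftp,0.0.0.0 and an empty path. That leaves no usable TFTP server from which to fetch the configuration, even though the proxy offer held both values.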

Comment 10 Javier Martinez Canillas 2020-05-21 13:02:19 UTC
Hello Tomas,

This is a known limitation of GRUB: it does not support a proxy DHCP configuration. It just looks at the main DHCP server's ACK to get
the bootfile and next-server information. There's a bug filed upstream about this [0].

The UEFI environment (tianocore/edk2) provides information about the proxy DHCP offer in EFI_PXE_BASE_CODE_MODE::ProxyOfferReceived
and EFI_PXE_BASE_CODE_MODE::ProxyOffer [1], but GRUB just ignores it. Shim has parsed that info since commit [2], which is why it
is able to fetch the GRUB binary correctly. But since GRUB doesn't support that, it fails to fetch the config file over TFTP.

I think that dnsmasq proxy DHCP support is working correctly here; otherwise the client wouldn't have fetched the GRUB binary over
TFTP. So either the component of this bugzilla should be moved to grub2, or it should be closed and a new RFE filed for grub2.

[0]: https://savannah.gnu.org/bugs/index.php?55636
[1]: https://github.com/tianocore/edk2/blob/master/MdePkg/Include/Protocol/PxeBaseCode.h#L266
[2]: https://github.com/rhboot/shim/commit/5f4fd536410

Comment 17 Ian Hands 2021-04-02 23:07:57 UTC
(In reply to Christophe Besson from comment #7)
> # Capture from 10.115.16.3 (efi-dhcp-server-withproxy.pcap)
>   1          0      0.0.0.0 -> 255.255.255.255 DHCP 389 DHCP Discover - Transaction ID 0x692f60a9
>   2          0  10.115.16.3 -> 10.115.16.100 ICMP 62 Echo (ping) request id=0x16fe, seq=0/0, ttl=64
>   3          0  10.115.16.4 -> 255.255.255.255 DHCP 342 DHCP Offer    - Transaction ID 0x692f60a9
>   4          1  10.115.16.3 -> 255.255.255.255 DHCP 342 DHCP Offer    - Transaction ID 0x692f60a9
> ---[cut]---
>   8          8      0.0.0.0 -> 255.255.255.255 DHCP 401 DHCP Request  - Transaction ID 0x692f60a9
>   9          8  10.115.16.3 -> 255.255.255.255 DHCP 342 DHCP ACK      - Transaction ID 0x692f60a9


I am also hitting this issue _but_ I think my pcap looks somewhat different.
There are two cases I observe:

#######
Case #1: the primary DHCP server does not set a "Next server IP address" (or sets it to 0.0.0.0).
I need to take another pcap, but I don't think I ever see GRUB do the Request/ACK exchange of frames 8 and 9 in your example.
I see a request, offers, and proxy offers, all while shim is doing its thing. I see TFTP load GRUB, then things go silent.


#######
Case #2: my primary DHCP server _does_ set its own IP as the "Next server IP address".
I observed this in the car, where I don't control the DHCP server I'm using when tethered.

Here I see the same as above while in shim: offers arrive, and the primary DHCP server hands back an IP but no BOOTP info.
The proxy DHCP requests work properly, and I see shim use that info to load the GRUB binary over TFTP.

Then GRUB loads and starts using the primary DHCP server's next-server IP to try to fetch grub.cfg.

You can see it happening in the attached pcap.
Shim loads grubx64.efi from 192.168.183.142.
Then, immediately after, GRUB tries to load grub.cfg from 192.168.183.171.

Comment 18 Ian Hands 2021-04-02 23:08:25 UTC
Created attachment 1768672
Case number two pcap

Comment 19 Ian Hands 2021-04-02 23:55:51 UTC
Created attachment 1768677
Case number one pcap

Comment 20 Ian Hands 2021-04-02 23:57:10 UTC
I have attached a pcap for case #1.
And yes, after GRUB takes over there is no Request/ACK exchange at all.
GRUB just stops cold and drops us to the rescue shell.

Comment 21 Ian Hands 2021-04-05 18:32:17 UTC
I made this PR. https://github.com/rhboot/grub2/pull/82
As far as I can see, if we don't have a valid server_ip in the dhcp_ack, we should just assume that we got here via the proxy_offer...
Because how else would we have gotten to this point?

Thus, merging the proxy_offer struct with the dhcp_ack struct just before it is passed to configure_by_dhcp_ack seems appropriate.

You could instead pass both structs into configure_by_dhcp_ack and merge them there, though this requires touching far more code paths.

Also, the patch/PR could be much more defensive: only copy fields over if they are also non-empty (not 0.0.0.0), with three checks (one for each item it copies).

Thoughts?
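
The defensive per-field merge described above could look roughly like the following minimal C sketch. It uses a hypothetical, simplified packet struct rather than GRUB's real types, and it is not the code from the pull request. It also assumes the three copied items are the next-server IP, the server host name and the boot file name, since the comment does not list them.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified packet: not GRUB's grub_net_bootp_packet. */
struct pxe_packet_stub {
    uint8_t server_ip[4];   /* next-server IP */
    char server_name[64];   /* server host name */
    char boot_file[128];    /* boot file name */
};

static int ip_is_zero(const uint8_t ip[4])
{
    return ip[0] == 0 && ip[1] == 0 && ip[2] == 0 && ip[3] == 0;
}

/* Fill boot-location fields that are empty in the DHCP ACK with the proxy
   offer's values. Each field is checked separately and copied only when
   the ACK's copy is empty and the proxy's copy is not.
   Returns the number of fields copied. */
static int merge_proxy_offer(struct pxe_packet_stub *ack,
                             const struct pxe_packet_stub *proxy)
{
    int copied = 0;
    assert(ack != NULL && proxy != NULL);
    if (ip_is_zero(ack->server_ip) && !ip_is_zero(proxy->server_ip)) {
        memcpy(ack->server_ip, proxy->server_ip, sizeof ack->server_ip);
        copied++;
    }
    if (ack->server_name[0] == '\0' && proxy->server_name[0] != '\0') {
        memcpy(ack->server_name, proxy->server_name, sizeof ack->server_name);
        copied++;
    }
    if (ack->boot_file[0] == '\0' && proxy->boot_file[0] != '\0') {
        memcpy(ack->boot_file, proxy->boot_file, sizeof ack->boot_file);
        copied++;
    }
    return copied;
}
```

This per-field check alone would not cover Case #2 from Comment 17. There, the primary server's ACK already carries a non-zero next-server address (192.168.183.171), so the sketch keeps that address and fills in only the boot file name. Handling that case would need a different rule, for example preferring the proxy offer whenever one was received.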

Comment 22 Ian Page Hands 2021-04-21 22:08:55 UTC
(In reply to Javier Martinez Canillas from comment #10)
> I think that dnsmasq proxy DHCP support is working correctly here, otherwise
> the client wouldn't have fetched the GRUB binary over
> TFTP. So either the component of this bugzilla should be moved to grub2, or it should
> be closed and a new RFE filed for grub2.

The component of this BZ is already grub2. Is there anything else we need to do to move this forward?
Let me know how we can push to get this resolved.

Comment 23 Ian Page Hands 2021-04-21 22:15:44 UTC
The big PITA here is that we can't self-serve. Sure, I can patch my way out of this issue, but thanks to Secure Boot we are unable to boot with our open source change.
Red Hat is one of the trusted vendors that fought for shim in the boot chain precisely so the OSS world would be able to self-serve here. Yet this is going unresolved despite open source contributions.

Can we take a hard look here, please?
Again, the urgency is different from a normal fix, as the open source world can't solve this by itself.

I'm happy to devote some extra cycles to clean up patches or explore a different approach, but the silence here stinks.

Comment 25 Ian Page Hands 2021-04-22 21:30:45 UTC
FWIW I would also be happy to use a signed Fedora build.
Fedora's GRUB builds are signed as well, eh?

Comment 26 Petr Janda 2021-05-05 13:08:43 UTC
There are other, higher-priority bugs, so I'm setting RPL-.
This doesn't mean we cannot fix it, but we need to focus on other issues now.

Comment 27 josh.schofield 2021-05-05 15:05:34 UTC
Thank you for that response, @pjanda. Since, as you mentioned, other items are higher priority, is there any estimate of when this may move out of the backlog?

Comment 28 Ian Page Hands 2021-05-05 19:39:06 UTC
(In reply to Petr Janda from comment #26)
> There are other, higher-priority bugs, so I'm setting RPL-.
> This doesn't mean we cannot fix it, but we need to focus on other issues now.

I mean... I'm willing to work on the issue here. Are you turning down help?

Comment 29 Javier Martinez Canillas 2021-05-05 20:45:08 UTC
Hello Ian,

(In reply to Ian Page Hands from comment #28)
> (In reply to Petr Janda from comment #26)
> > There are other, higher-priority bugs, so I'm setting RPL-.
> > This doesn't mean we cannot fix it, but we need to focus on other issues now.
> 
> I mean... I'm willing to work on the issue here. Are you turning down help?

No, we really appreciate your effort to investigate this issue and the patch
you proposed. But please understand that we have many other bugs and RFEs to
work on and limited capacity.

Even though you are proposing the patch, we need to review the changes, validate
that they won't cause regressions for other use cases / customers, do proper QE, etc.

As Petr said, setting RPL- doesn't mean that we won't work on this bug for
this release, just that it won't be in the Release Priority List (RPL) due
to other bugs having higher priority for different reasons.

I hope this helps to clarify it. Again, thanks a lot for your pull request,
and we will try to review it as soon as possible.

Best regards,
Javier

Comment 30 Javier Martinez Canillas 2021-06-01 11:14:54 UTC
I'm able to reproduce the issue by following Christophe's Comment 7. I'll take a look at Ian's proposed patch.

Comment 31 Javier Martinez Canillas 2021-06-01 12:43:16 UTC
(In reply to Ian Hands from comment #21)
> I made this PR. https://github.com/rhboot/grub2/pull/82

Thanks for the patch! I had a few comments; let's move the conversation
there, and once we agree on a patch, we can pull it into Fedora and just
use this BZ to track the RHEL 8 backporting effort.

Comment 46 errata-xmlrpc 2022-05-10 15:31:42 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Low: grub2 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:2110

