Created attachment 676801
wireshark-capture.pcap

Description of problem:
UEFI PXE boot fails and drops to the GRUB command line. The packet capture shows that GRUB successfully loaded grub.cfg and a few other files. It then sends two malformed packets consisting of 60 zero bytes, after which no further packets are sent. The capture also contains ICMPv6 messages such as Router Solicitation and Multicast Listener Report.

Version-Release number of selected component (if applicable):
grub2-efi-2.00-15.fc18, with the tftp module built in.

How reproducible:
Always. Reproduced on a Dell XPS 8500 (firmware: EFI v2.31, probably EDK2-based), which supports both IPv4 and IPv6 PXE.

Steps to Reproduce:
1. Configure a DHCP and a TFTP server.
2. Boot.

Actual results:
UEFI PXE boot fails and drops to the command line.

Expected results:
It boots into a menu.

Additional info:
The attached wireshark-capture.pcap contains all packets on the wire (except packets matching tftp.block > 1).

I've been curious about how the all-zero packets are generated. Adding random link delay with netem doesn't change the number or order of the malformed packets; they are sent after certain TFTP requests in a deterministic manner.

I added a printf in net/drivers/efi/efinet.c:

    static grub_err_t
    send_card_buffer (struct grub_net_card *dev, struct grub_net_buff *pack)
    {
      ...
      /* A previous transmit is still outstanding: poll GetStatus until the
         firmware hands our transmit buffer back.  */
      if (dev->txbusy)
        while (1)
          {
            void *txbuf = NULL;
            st = efi_call_3 (net->get_status, net, 0, &txbuf);
            if (st != GRUB_EFI_SUCCESS)
              return grub_error (GRUB_ERR_IO, N_("couldn't send network packet"));
            /* Our buffer was recycled, so the previous frame has been sent.  */
            if (txbuf == dev->txbuf)
              {
                dev->txbusy = 0;
                break;
              }
            /* Added diagnostic: GetStatus returned a buffer we did not submit.  */
            if (txbuf)
              grub_printf ("not my txbuf: txbuf=%p dev->txbuf=%p\n", txbuf, dev->txbuf);
            if (limit_time < grub_get_time_ms ())
              return grub_error (GRUB_ERR_TIMEOUT, N_("couldn't send network packet"));
          }
      ...
    }

I got "not my txbuf: txbuf=0xd237a698 dev->txbuf=0xce9b6160" just before it dropped to the command line. I haven't seen any txbuf value other than 0xd237a698, but dev->txbuf sometimes changes.
My guess is that the firmware's IPv6 stack, which sends the ICMPv6 messages, gets into a race with GRUB efinet's calls to GetStatus. In SNP, GetStatus removes the returned txbuf from the queue of transmitted buffers, indicating that the buffer has finished transmission. The "not my txbuf" message probably shows that GRUB took the txbuf of an ICMPv6 Router Solicitation and could no longer get its own txbuf back, so the network stalls. GRUB's efinet uses SNP, which only allows exclusive access by one application, while the firmware's (EDK2) IPv6 stack uses MNP, an abstraction over SNP that allows concurrent use. This looks quite problematic. However, I still have no idea what causes the all-zero packets.
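To make the hypothesized race concrete, here is a toy model in C. It assumes that GetStatus hands back recycled transmit buffers from a single FIFO shared by every user of the NIC, and that the firmware's own stack also polls GetStatus and discards addresses it did not submit. All names (snp_model, snp_tx_complete, snp_get_status, wait_for_own_txbuf, simulate) are hypothetical; this is not EDK2 or GRUB code, and it does not attempt to explain the all-zero packets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical model: one FIFO of recycled transmit buffers shared by all
   clients of the NIC (an assumption, not the EDK2 implementation). */
#define MAX_RECYCLED 8

struct snp_model {
    void *recycled[MAX_RECYCLED];
    size_t head, count;
};

/* Firmware side: a frame finished transmitting, so its buffer is queued. */
static void snp_tx_complete(struct snp_model *snp, void *buf)
{
    assert(snp->count < MAX_RECYCLED);
    snp->recycled[(snp->head + snp->count) % MAX_RECYCLED] = buf;
    snp->count++;
}

/* GetStatus model: pops the oldest recycled buffer, whoever submitted it,
   or returns NULL when the queue is empty. */
static void *snp_get_status(struct snp_model *snp)
{
    void *buf;
    if (snp->count == 0)
        return NULL;
    buf = snp->recycled[snp->head];
    snp->head = (snp->head + 1) % MAX_RECYCLED;
    snp->count--;
    return buf;
}

/* Mimics the efinet.c loop: accept only our own buffer, drop anything else.
   Returns 1 if our buffer shows up within max_polls calls, 0 on "timeout". */
static int wait_for_own_txbuf(struct snp_model *snp, void *own, int max_polls)
{
    for (int i = 0; i < max_polls; i++) {
        void *txbuf = snp_get_status(snp);
        if (txbuf == own)
            return 1;
        if (txbuf)
            printf("not my txbuf: txbuf=%p own=%p\n", txbuf, own);
    }
    return 0;
}

/* Replays the interleaving guessed above. Returns 1 if GRUB recovers its
   buffer, 0 if it stalls. */
static int simulate(int firmware_polls_first)
{
    static char grub_buf[64], fw_buf[64]; /* stand-ins for the two frames */
    struct snp_model snp = {{NULL}, 0, 0};

    snp_tx_complete(&snp, grub_buf); /* GRUB's TFTP packet is sent */
    snp_tx_complete(&snp, fw_buf);   /* firmware's ICMPv6 RS is sent */

    if (firmware_polls_first)
        (void)snp_get_status(&snp);  /* firmware pops grub_buf and drops it */

    return wait_for_own_txbuf(&snp, grub_buf, 100);
}
```

With firmware_polls_first set to 0, GRUB pops its own buffer immediately and simulate returns 1. With it set to 1, GRUB pops the firmware's buffer (printing a "not my txbuf" line like the one in the report), then sees only NULL until the poll limit, and simulate returns 0, which corresponds to the stall.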
There's nothing in the spec that says you can't use SNP while the firmware is using MNP; in fact, MNP has specific APIs for telling when this is happening! There's also no requirement that MNP be used for IPv6; SNP explicitly supports it. But even aside from that, the firmware should not be filling the network with garbage data. There is no case in which that isn't a firmware bug. With that in mind, I'm closing this.