Bug 1335440

Summary: chainloading big ipxe config file causes boot to hang while booting through http
Product: [Fedora] Fedora Reporter: mkovacik
Component: ipxe Assignee: Fedora Virtualization Maintainers <virt-maint>
Status: CLOSED INSUFFICIENT_DATA QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 23 CC: berrange, crobinso, dtantsur, lmartins, mkovacik, pbonzini, virt-maint
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2016-12-04 23:01:13 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
screen log of node-1 loop booting longer chainloaded ipxe config file
none
tcpdump pcap of the guest failing to chainload longer ipxe config file
none
tcpdump pcap of the guest able to chainload shorter ipxe config
none
screen log of node-1 booting with chainloaded shorter ipxe config
none
node-1 virsh dumpxml
none
my devstack local.conf
none

Description mkovacik 2016-05-12 08:49:37 UTC
Created attachment 1156523
screen log of node-1 loop booting longer chainloaded ipxe config file

Description of problem:
I encountered this issue while trying to boot a qemu virt host via the OpenStack Ironic project.
It seems that chainloading a long enough ipxe config file over http causes the boot to hang.
I was able to capture the http and dhcp traffic between the host and the guest, both with a "hanging" guest and with a booting one.
The guest and host are the same in both scenarios: the host is an F23 virtualbox instance and the guest is a qemu guest inside the host (see the attached virsh dumpxml).
From this traffic log, my impression is that if the chainloaded configuration file is too big (it does not fit in a single http response packet), packets with wrong checksums come from the guest system while it downloads the rest of the config file through http.
httpd eventually cuts off the traffic, which leaves the guest waiting in a loop forever.
If the config file is reduced in size (so that it fits in a single http response packet), the guest system boots with no issue.
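
To double-check whether a captured packet really carries a wrong checksum, the 16-bit Internet checksum (RFC 1071) can be recomputed by hand. The following is only a minimal sketch: the function names are hypothetical, and it assumes the bytes passed in already include everything the checksum covers (for TCP that means the pseudo-header followed by the segment, with the checksum field left in place).

```python
def internet_checksum(data: bytes) -> int:
    """Return the 16-bit one's-complement Internet checksum (RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        # Sum the data as big-endian 16-bit words
        total += (data[i] << 8) | data[i + 1]
    # Fold any carries back into the low 16 bits
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def checksum_is_valid(covered_bytes: bytes) -> bool:
    """Intact data with its checksum field in place yields a result of zero."""
    return internet_checksum(covered_bytes) == 0
```

A packet whose covered bytes do not pass `checksum_is_valid` would match the "wrong checksum" observation above. Extracting those bytes from the pcap is left out of this sketch.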

Version-Release number of selected component (if applicable):
qemu-common-2.4.1-8.fc23.x86_64
ipxe-roms-qemu-20150407-3.gitdc795b9f.fc23.noarch
libvirt-daemon-driver-qemu-1.2.18.2-3.fc23.x86_64
qemu-system-x86-2.4.1-8.fc23.x86_64
qemu-img-2.4.1-8.fc23.x86_64
qemu-kvm-2.4.1-8.fc23.x86_64

How reproducible:
Always

Steps to Reproduce:
## through my OpenStack deployment
0. deploy Devstack env with IRONIC_IPXE_ENABLED=True (see my local.conf attached)
1. ironic node-set-provision-state node-1 manage
2. ironic node-set-provision-state node-1 provide
3. booting node-1 hangs (ironic node-list shows node-1 in clean-wait state forever)


## through manipulating the chainloaded ipxe config file
0. use the attached longer chainloading ipxe file (in my case /opt/stack/data/ironic/httpboot/pxelinux.cfg/52-54-00-45-4a-d6)
1. virsh reset node-1
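
For the second reproducer, a longer config file could also be produced from a short, working one by padding it. This is a hypothetical helper, not something used in this report: it relies on iPXE scripts starting with a `#!ipxe` line and treating lines that begin with `#` as comments, and it is only an assumption that padding alone is enough to trigger the hang.

```python
def pad_ipxe_script(script: str, min_bytes: int) -> str:
    """Append iPXE comment lines until the script is at least min_bytes long."""
    if not script.startswith("#!ipxe"):
        raise ValueError("not an iPXE script: missing '#!ipxe' first line")
    if not script.endswith("\n"):
        script += "\n"
    filler = "# padding to enlarge the script\n"  # comment line, ignored by iPXE
    while len(script.encode("utf-8")) < min_bytes:
        script += filler
    return script
```

For example, `pad_ipxe_script(short_script, 3000)` returns the original script followed by comment lines, with a total size of at least 3000 bytes; the result would then replace the file under httpboot before running `virsh reset node-1`.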




Actual results:
 node-1 loops downloading the chainloaded ipxe file forever

Expected results:
 node-1 manages to download a chainloaded ipxe file even if it is larger than a single http response packet
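
As a rough way to tell when a response stops fitting in a single packet, the number of TCP segments can be estimated from the response size. This sketch assumes a 1500-byte MTU and 20-byte IP and TCP headers with no options; none of these values come from this report, and the real ones should be read from the attached pcaps. The function name is hypothetical.

```python
import math


def tcp_segments_needed(response_bytes: int, mtu: int = 1500,
                        ip_header: int = 20, tcp_header: int = 20) -> int:
    """Estimate how many TCP segments a response of the given size occupies."""
    mss = mtu - ip_header - tcp_header  # payload bytes available per segment
    if mss <= 0:
        raise ValueError("headers leave no room for payload")
    return max(1, math.ceil(response_bytes / mss))
```

Under these assumptions, any response (http headers plus the ipxe config) above 1460 bytes needs more than one segment, which is the case that hangs here.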

Additional info:

Comment 1 mkovacik 2016-05-12 08:52:09 UTC
Created attachment 1156524
tcpdump pcap of the guest failing to chainload longer ipxe config file

see traffic around packets 70--90

Comment 2 mkovacik 2016-05-12 08:55:28 UTC
Created attachment 1156526
tcpdump pcap of the guest able to chainload shorter ipxe config

see packet #77

Comment 3 mkovacik 2016-05-12 08:56:44 UTC
Created attachment 1156528
screen log of node-1 booting with chainloaded shorter ipxe config

Comment 4 mkovacik 2016-05-12 08:58:10 UTC
Created attachment 1156543
node-1 virsh dumpxml

Comment 5 mkovacik 2016-05-12 09:00:10 UTC
Created attachment 1156563
my devstack local.conf

Comment 6 Dmitry Tantsur 2016-05-12 09:14:22 UTC
FYI: in RHOSP we had to update the iPXE ROM to 20160127-1.git6366fa7a.el7 to fix numerous similar issues.

Comment 7 Lucas Alvares Gomes 2016-05-12 14:33:51 UTC
Thanks Milan, great report.

It would be good to test this on bare metal to see whether it also affects the iPXE ROM that we chainload (the one in /tftpboot) or only the iPXE QEMU ROMs.

Comment 8 mkovacik 2016-05-16 16:39:28 UTC
(In reply to Dmitry Tantsur from comment #6)
> FYI: in RHOSP we had to update the iPXE ROM to 20160127-1.git6366fa7a.el7 to
> fix numerous similar issues.

Unfortunately, that version doesn't work for me either.

Comment 9 Cole Robinson 2016-08-03 17:10:39 UTC
I just built a newer version of ipxe in rawhide; it should install fine on older Fedora. Can someone give it a spin and see if the issue persists? Grab the RPMs with:

  koji download-build ipxe-20160622-1.git0418631.fc26

Comment 10 Fedora End Of Life 2016-11-25 09:00:44 UTC
This message is a reminder that Fedora 23 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 23. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as EOL if it remains open with a Fedora 'version'
of '23'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not
able to fix it before Fedora 23 is end of life. If you would still like
to see this bug fixed and are able to reproduce it against a later version
of Fedora, you are encouraged to change the 'version' to a later Fedora
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 11 Cole Robinson 2016-12-04 23:01:13 UTC
Since the NEEDINFO has gone unanswered for a while, closing as INSUFFICIENT_DATA. If anyone is still hitting this bug on F24+, please try one of the newer ipxe builds, as suggested in comment #9.