Bug 1335440 - chainloading big ipxe config file causes boot to hang while booting through http
Summary: chainloading big ipxe config file causes boot to hang while booting through http
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Fedora
Classification: Fedora
Component: ipxe
Version: 23
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Fedora Virtualization Maintainers
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-05-12 08:49 UTC by mkovacik
Modified: 2016-12-04 23:01 UTC

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2016-12-04 23:01:13 UTC
Type: Bug
Embargoed:


Attachments
screen log of node-1 loop booting longer chainloaded ipxe config file (2.40 KB, text/plain)
2016-05-12 08:49 UTC, mkovacik
tcpdump pcap of the guest failing to chainload longer ipxe config file (40.00 KB, application/octet-stream)
2016-05-12 08:52 UTC, mkovacik
tcpdump pcap of the guest able to chainload shorter ipxe config (42.12 KB, application/octet-stream)
2016-05-12 08:55 UTC, mkovacik
screen log of node-1 booting with chainloaded shorter ipxe config (5.25 KB, text/plain)
2016-05-12 08:56 UTC, mkovacik
node-1 virsh dumpxml (3.18 KB, text/plain)
2016-05-12 08:58 UTC, mkovacik
my devstack local.conf (2.39 KB, text/plain)
2016-05-12 09:00 UTC, mkovacik

Description mkovacik 2016-05-12 08:49:37 UTC
Created attachment 1156523 [details]
screen log of node-1 loop booting longer chainloaded ipxe config file

Description of problem:
I encountered this issue while trying to boot a QEMU virtual host via the OpenStack Ironic project.
It seems that chainloading a sufficiently long iPXE config file over HTTP causes the boot to hang.
I was able to capture the HTTP and DHCP traffic between the host and the guest for both a "hanging" guest and a successfully booting one.
The guest and host are the same in both scenarios: the host is an F23 VirtualBox instance and the guest is a QEMU guest inside the host (see the attached virsh dumpxml).
From this traffic log, my impression is that if the chainloaded configuration file is too big (it does not fit in a single HTTP response packet), packets with wrong checksums come from the guest while it downloads the rest of the config file over HTTP.
httpd eventually cuts off the traffic, leaving the guest waiting in a loop forever.
If the config file is reduced in size so that it fits in a single HTTP response packet, the guest boots with no issue.
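
As a rough illustration of the size threshold described above, the following Python sketch estimates how many TCP segments an HTTP response would need. The 1460-byte MSS (a 1500-byte Ethernet MTU minus 20-byte IP and TCP headers, no options) and the function names are assumptions for illustration, not values taken from the attached captures.

```python
# Assumption: a 1500-byte Ethernet MTU, giving a TCP MSS of 1460 bytes
# (1500 - 20 bytes IP header - 20 bytes TCP header, no options).
# The real MSS depends on the link and on what the two ends negotiate.
ASSUMED_MSS = 1460


def segments_needed(header_len: int, body_len: int, mss: int = ASSUMED_MSS) -> int:
    """Return how many TCP segments an HTTP response of this size needs."""
    total = header_len + body_len
    # Ceiling division; even an empty response occupies one segment.
    return max(1, -(-total // mss))


def fits_single_segment(header_len: int, body_len: int, mss: int = ASSUMED_MSS) -> bool:
    """True if HTTP headers plus body fit into one TCP segment."""
    return segments_needed(header_len, body_len, mss) == 1
```

For example, with roughly 200 bytes of response headers, a config file of up to 1260 bytes would fit in one segment under these assumptions, while anything larger would be split across several.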

Version-Release number of selected component (if applicable):
qemu-common-2.4.1-8.fc23.x86_64
ipxe-roms-qemu-20150407-3.gitdc795b9f.fc23.noarch
libvirt-daemon-driver-qemu-1.2.18.2-3.fc23.x86_64
qemu-system-x86-2.4.1-8.fc23.x86_64
qemu-img-2.4.1-8.fc23.x86_64
qemu-kvm-2.4.1-8.fc23.x86_64

How reproducible:
Always

Steps to Reproduce:
## through my OpenStack deployment
0. deploy Devstack env with IRONIC_IPXE_ENABLED=True (see my local.conf attached)
1. ironic node-set-provision-state node-1 manage
2. ironic node-set-provision-state node-1 provide
3. booting node-1 hangs (ironic node-list shows node-1 in clean-wait state forever)


## through manipulating the chainloaded ipxe config file
0. use the attached longer chainloading ipxe file (in my case /opt/stack/data/ironic/httpboot/pxelinux.cfg/52-54-00-45-4a-d6)
1. virsh reset node-1
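
To vary the size of the chainloaded file without changing its behaviour, one option is to pad it with comment lines. The sketch below is a hypothetical helper, not part of Ironic or iPXE; it assumes the file is an iPXE script whose first line is `#!ipxe` and relies on iPXE treating lines that begin with `#` as comments.

```python
def pad_ipxe_script(script: str, target_size: int) -> str:
    """Pad an iPXE script with comment lines until it is at least target_size bytes.

    Hypothetical helper for reproducing the hang at different file sizes.
    Assumes the script starts with the '#!ipxe' shebang line.
    """
    lines = script.splitlines()
    if not lines or lines[0].strip() != "#!ipxe":
        raise ValueError("expected an iPXE script starting with '#!ipxe'")

    filler = "# padding " + "x" * 60
    padded = list(lines)
    # Insert comment lines directly after the shebang so that the original
    # commands keep their order and the final command stays last.
    while len(("\n".join(padded) + "\n").encode()) < target_size:
        padded.insert(1, filler)
    return "\n".join(padded) + "\n"
```

Writing the result over the per-node config file at a few different target sizes and then running virsh reset node-1 would show roughly where the download starts to fail.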




Actual results:
 node-1 loops downloading the chainloaded ipxe file forever

Expected results:
 node-1 manages to download a chainloaded ipxe file even if it is larger than a single http response packet

Additional info:

Comment 1 mkovacik 2016-05-12 08:52:09 UTC
Created attachment 1156524 [details]
tcpdump pcap of the guest failing to chainload longer ipxe config file

see traffic around packets 70--90

Comment 2 mkovacik 2016-05-12 08:55:28 UTC
Created attachment 1156526 [details]
tcpdump pcap of the guest able to chainload shorter ipxe config

see packet #77

Comment 3 mkovacik 2016-05-12 08:56:44 UTC
Created attachment 1156528 [details]
screen log of node-1 booting with chainloaded shorter ipxe config

Comment 4 mkovacik 2016-05-12 08:58:10 UTC
Created attachment 1156543 [details]
node-1 virsh dumpxml

Comment 5 mkovacik 2016-05-12 09:00:10 UTC
Created attachment 1156563 [details]
my devstack local.conf

Comment 6 Dmitry Tantsur 2016-05-12 09:14:22 UTC
FYI: in RHOSP we had to update the iPXE ROM to 20160127-1.git6366fa7a.el7 to fix numerous similar issues.

Comment 7 Lucas Alvares Gomes 2016-05-12 14:33:51 UTC
Thanks Milan, great report.

It would be good to test this on bare metal to see whether it also affects the iPXE ROM that we chainload (the one in /tftpboot) or only the iPXE QEMU ROMs.

Comment 8 mkovacik 2016-05-16 16:39:28 UTC
(In reply to Dmitry Tantsur from comment #6)
> FYI: in RHOSP we had to update the iPXE ROM to 20160127-1.git6366fa7a.el7 to
> fix numerous similar issues.

Unfortunately, that version doesn't work for me either.

Comment 9 Cole Robinson 2016-08-03 17:10:39 UTC
I just built a newer version of ipxe in rawhide; it should install fine on older Fedora. Can someone give it a spin and see if the issue persists? Grab the RPMs with:

  koji download-build ipxe-20160622-1.git0418631.fc26

Comment 10 Fedora End Of Life 2016-11-25 09:00:44 UTC
This message is a reminder that Fedora 23 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 23. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as EOL if it remains open with a Fedora 'version'
of '23'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 23 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged to change the 'version' to a later Fedora 
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 11 Cole Robinson 2016-12-04 23:01:13 UTC
Since the NEEDINFO has gone unanswered for a while, closing as INSUFFICIENT_DATA. If anyone is still hitting this bug on F24+, please try one of the newer ipxe builds, as suggested in comment #9.

