Bug 1310778
Summary: | ipxe freeze during HTTP download in virtual and hardware env | ||||||
---|---|---|---|---|---|---|---|
Product: | Red Hat OpenStack | Reporter: | Gonéri Le Bouder <goneri> | ||||
Component: | openstack-ironic | Assignee: | Lucas Alvares Gomes <lmartins> | ||||
Status: | CLOSED ERRATA | QA Contact: | Toure Dunnon <tdunnon> | ||||
Severity: | low | Docs Contact: | |||||
Priority: | urgent | ||||||
Version: | 8.0 (Liberty) | CC: | arkady_kanevsky, christopher_dearborn, gael_rehault, kbasil, mburns, mcornea, racedoro, rhel-osp-director-maint, sbaker, srevivo | ||||
Target Milestone: | async | Keywords: | OtherQA, Rebase, ZStream | ||||
Target Release: | 8.0 (Liberty) | ||||||
Hardware: | Unspecified | ||||||
OS: | Unspecified | ||||||
Whiteboard: | |||||||
Fixed In Version: | openstack-ironic-4.2.3-1.el7ost | Doc Type: | Bug Fix | ||||
Doc Text: | Story Points: | --- | |||||
Clone Of: | |||||||
: | 1337206 | Environment: | |||||
Last Closed: | 2016-06-09 19:40:04 UTC | Type: | Bug | ||||
Regression: | --- | Mount Type: | --- | ||||
Documentation: | --- | CRM: | |||||
Verified Versions: | Category: | --- | |||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
Cloudforms Team: | --- | Target Upstream Version: | |||||
Embargoed: | |||||||
Bug Depends On: | |||||||
Bug Blocks: | 1261979, 1310828, 1337206 | ||||||
Description
Gonéri Le Bouder
2016-02-22 16:09:36 UTC
According to http://lists.ipxe.org/pipermail/ipxe-devel/2014-October/003829.html, this is the purpose of the --timeout parameter, but I don't see this argument in the configuration file generated by Ironic:

-------------------
#!ipxe

dhcp

goto deploy

:deploy
kernel http://192.0.2.240:8088/bfc53b24-9f79-4e17-8bc0-b9657047a3c4/deploy_kernel selinux=0 disk=cciss/c0d0,sda,hda,vda iscsi_target_iqn=iqn.2008-10.org.openstack:bfc53b24-9f79-4e17-8bc0-b9657047a3c4 deployment_id=bfc53b24-9f79-4e17-8bc0-b9657047a3c4 deployment_key=2Q1KKOTSH42OS9FYPYYHF0DT564PDWCU ironic_api_url=http://192.0.2.240:6385 troubleshoot=0 text nofb nomodeset vga=normal boot_option=local ip=${ip}:${next-server}:${gateway}:${netmask} BOOTIF=${mac} ipa-api-url=http://192.0.2.240:6385 ipa-driver-name=pxe_ipmitool coreos.configdrive=0

initrd http://192.0.2.240:8088/bfc53b24-9f79-4e17-8bc0-b9657047a3c4/deploy_ramdisk
boot

:boot_partition
kernel http://192.0.2.240:8088/bfc53b24-9f79-4e17-8bc0-b9657047a3c4/kernel root={{ ROOT }} ro text nofb nomodeset vga=normal
initrd http://192.0.2.240:8088/bfc53b24-9f79-4e17-8bc0-b9657047a3c4/ramdisk
boot

:boot_whole_disk
kernel chain.c32
append mbr:{{ DISK_IDENTIFIER }}

If you're using OVB, you'll likely need to adjust the undercloud MTU settings. In my environment I do it with:

mtu=$(ifconfig eth0 | egrep -o "mtu [^ ]*")
ifconfig eth1 $mtu
ifconfig eth2 $mtu

This comment may not be helpful if you actually want iPXE to have better timeout handling.

Thanks Steve. Indeed, this seems to be the root of the problem here.

https://review.openstack.org/283893
I pushed a review to add a default timeout to reduce the impact of this kind of problem.

I opened https://review.openstack.org/#/c/288041 to make the MTU adjustable through undercloud.conf.

Chris, I'm pretty sure this is not a blocker in your case. The problem happens only when the MTU is < 1500. We had the issue once with Gael, but it was caused by a misconfiguration.

Ok, I just had a freeze during the agent.ramdisk download.
This time the MTU was OK (1400). I use the upstream iPXE image, not the one provided by Red Hat.

The final patch to get --timeout merged is here: https://review.openstack.org/#/c/294787
It is blocked by: https://bugs.launchpad.net/ironic/+bug/1567449

This is where we stand with this: the bug went away for a while and reappeared in the Beta9/RC/GA releases. In https://bugzilla.redhat.com/show_bug.cgi?id=1322056, Goneri pointed out that the ipxe packages had been updated, so we tried a workaround (on both Beta9 and GA): installing the previous packages and locking them before installing the undercloud.

[osp_admin@director ~]$ rpm -qa | grep ipxe
ipxe-bootimgs-20150821-1.git4e03af8e.el7.noarch
ipxe-roms-qemu-20150821-1.git4e03af8e.el7.noarch

Unfortunately, that does not prevent the freezes for me (I ran into it in 2 out of 3 deployments today with the above in place), so something other than those package versions may be at play.

Also, to clarify, this bug can occur at 2 different steps of the deployment process:
1 - during the bulk introspection of the nodes
2 - during the overcloud deployment

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:1220
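For context on the --timeout discussion in the description, here is a minimal Python sketch of how a generated iPXE script could carry that option on its kernel and initrd lines. It is not Ironic's template or the merged patch: the helper names ipxe_fetch_line and render_deploy_section are hypothetical, and the timeout is assumed to be configured in seconds and converted to the milliseconds that iPXE's --timeout option takes.

```python
# Hypothetical sketch, not Ironic's implementation: build iPXE
# 'kernel'/'initrd' lines that optionally carry --timeout.

MS_PER_SECOND = 1000


def ipxe_fetch_line(command, url, timeout_s=0, args=""):
    """Return one iPXE image-fetch line, e.g. 'initrd --timeout 60000 <url>'.

    timeout_s is assumed to be configured in seconds; iPXE's --timeout
    takes milliseconds, so the value is converted. 0 omits the option.
    """
    if command not in ("kernel", "initrd"):
        raise ValueError("unsupported iPXE command: %s" % command)
    parts = [command]
    if timeout_s > 0:
        parts.append("--timeout %d" % (timeout_s * MS_PER_SECOND))
    parts.append(url)
    if args:
        parts.append(args)
    return " ".join(parts)


def render_deploy_section(base_url, kernel_args, timeout_s=0):
    """Render a ':deploy' section shaped like the generated script above."""
    lines = [
        ":deploy",
        ipxe_fetch_line("kernel", base_url + "/deploy_kernel", timeout_s, kernel_args),
        ipxe_fetch_line("initrd", base_url + "/deploy_ramdisk", timeout_s),
        "boot",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_deploy_section(
        "http://192.0.2.240:8088/bfc53b24-9f79-4e17-8bc0-b9657047a3c4",
        "selinux=0 troubleshoot=0 text",
        timeout_s=60,
    ))
```

With timeout_s=0 the option is omitted and the lines have the same shape as those in the generated script quoted in the description; a non-zero value adds --timeout to both downloads, which is the behavior the ipxe-devel thread linked there describes as the purpose of that parameter.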