Red Hat Bugzilla – Bug 1283436
[RFE] Diskless boot via iSCSI & iBFT
Last modified: 2017-09-09 22:12:12 EDT
Description: When using the rhel-osp-director (RHEL7.1), the /usr/lib/python2.7/site-packages/ironic/drivers/modules/ipxe_config.template file can be amended with the options "rd.iscsi.ibft=1 rd.iscsi.firmware=1" in order to facilitate iSCSI boots for remotely presented iSCSI LUNs to servers for stateless booting (no local disks present in the server itself). The file /httpboot/discoverd.ipxe can also be amended with "rd.iscsi.firmware=1" so that Ironic discovers the remote LUN during discovery/introspection, which it does.
However, the booted server nodes incorrectly begin the "targetcli-wrapper /backstores/block create block1 dev=/dev/sda" command and hang. Udev then appears to crash. I'm not certain, but I think the other iSCSI IQN of the Ironic server presented to the provisioned node (for kernel and ramdisk) may be accidentally chosen by the targetcli-wrapper script, complicating matters. How can we install RHEL-OSP7 on server nodes with LUNs provided by iSCSI instead of local disks? The ability to pass "ip=ibft" to probe the iSCSI Boot Firmware Table used to work in RHEL-OSP6, but no longer does with Ironic and iPXE in RHEL-OSP7. I need some help debugging this.
If this is the incorrect component for this bug, I apologize. Can it be redirected please? We will really need this ability for OSP8 GA.
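For illustration, the template amendment described above (appending "rd.iscsi.ibft=1 rd.iscsi.firmware=1" to the kernel line) could be sketched in Python roughly as follows. This is only a sketch: the helper name add_kernel_args is hypothetical, and it assumes the template has a single-line "kernel ..." entry, which may not match the real ipxe_config.template exactly.

```python
def add_kernel_args(ipxe_text, extra_args):
    """Append each missing argument to every iPXE 'kernel' line.

    Assumption: kernel entries are single lines starting with 'kernel '.
    Arguments that are already present are not added a second time.
    """
    out = []
    for line in ipxe_text.splitlines():
        if line.strip().startswith("kernel "):
            for arg in extra_args:
                if arg not in line.split():
                    line = line.rstrip() + " " + arg
        out.append(line)
    return "\n".join(out) + "\n"
```

Because arguments that are already present are skipped, running the helper twice over the same template should leave it unchanged.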
Details: Base install of RHEL 7.1 with partner provided key (NetApp) for OSP7 repositories. Specific package levels:
Bugzilla dependencies (if any): N/A
Hardware dependencies (if any): N/A
Date it will be upstream: N/A
Severity (U/H/M/L): H
Business Priority: Must
Hey Lucas, do you think this is an RFE or a defect?
Ironic currently does not support booting from volume. We have related specs upstream about booting Cinder volumes in Ironic.
Hi @Lucas, this is actually outside of Cinder itself.
Normally one can provide a block device through either Fibre Channel, iSCSI, or even FCoE to the host as a root volume for the operating system directly. Booting from SAN through any of these protocols can be done instead of having the OS installed on local disks in the system. This provides significant advantages such as reduced management cost, allowing server profiles to "move" throughout the overall infrastructure (since its disk is hosted on the managed storage device and is stateless), and the ability to employ space efficiency on the volume containing server LUNs.
This used to work in OSP6 by passing the "ip=ibft" option on the kernel command line; the kernel would then probe the iSCSI Boot Firmware Table (iBFT) and see the remote LUN hosted on a managed storage device (like NetApp).
In the past few days, I've been able to figure out how to get the discovery image to detect and find the LUN and add it to the Ironic node database. Doing an ironic node-validate against the discovered node shows a UUID for the block device, which is the LUN. It even looks as if Ironic installs RHEL 7.1 on the discovered nodes properly through the deployment ramdisk and associated kernel. The server even reboots and I see the RHEL7.1 grub screen, which is a great sign! However, the server starts to boot and then never finds its root disk, dropping to a dracut shell. Upon further inspection I see the following issues:
(1) When "iscsi_firmware" or "ip=ibft" is passed to the kernel, it appears that the corresponding modules are not included in the overcloud image.
(2) all of the physical interfaces are down on the server, which would prevent iscsistart from logging into the remote storage device.
(3) Grub inserts root=UUID=<UUIDX>, where UUIDX is the same disk used to install RHEL7.1 by the deployment ramdisk in Ironic. This won't work.
1) It's quite possible that we don't include all modules. Are these shipped in separate packages?
2) I suspect here the problem is that we bootstrap DHCP in user space after root disk is mounted. I will defer the response to TripleO folks, as I'm not sure how to proceed with it.
3) Could you elaborate on why it's wrong? What should be done instead?
I'm retargeting this bug to a more generic component, as it's not only an Ironic problem.
Dmitry, thanks for responding to my e-mail here.
It looks like older overcloud images don't have the iscsi_ibft.ko kernel module in the /lib/modules directory. My beta copy of OSP8 seems to have that now, so that's great.
For the root=UUID, since the overcloud image and subsequent provisioning process do not accommodate LVM and use just straight partitions (/dev/sd[a-x]), I would rather it target the symbolic link /dev/disk/by-label/img-rootfs, which appears to be more properly mapped to (on my system at least) /dev/dm-4, so that dm-multipath mounts the proper root device. This could be a red herring, but comparing this to my installed OSP6 environment with RHEL 7.1, which uses LVM, has me somewhat concerned.
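To make the suggestion above concrete: on the kernel command line, the by-label symlink corresponds to root=LABEL=<label>. A minimal sketch of rewriting the root argument follows. The helper name retarget_root and the sample command line are hypothetical, and whether a label-based root actually resolves the multipath concern is untested here.

```python
def retarget_root(cmdline, label):
    """Replace any root=... argument with root=LABEL=<label>.

    All other kernel arguments are kept in their original order.
    """
    args = []
    for arg in cmdline.split():
        if arg.startswith("root="):
            arg = "root=LABEL=" + label
        args.append(arg)
    return " ".join(args)
```

Called with the label from this comment, retarget_root("ro root=UUID=<UUIDX> rd.iscsi.firmware=1", "img-rootfs") would yield a command line containing root=LABEL=img-rootfs in place of the UUID.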
Can't say if that's completely the problem as the NIC cards are all down. Any way that you know of for me to force them up, short of opening up and modifying the overcloud image?
I can boot a rescue image and mount the provisioned overcloud image and see the disk, so I'm feeling a bit better and am more confident that it provisions now, just can't get it to boot right. Thoughts? I'm attaching a picture of some command output on my provisioned overcloud server and booting it with a rescue image to see what's going on in it.
Created attachment 1112961
BZ1283436 Picture of provisioned Overcloud server booted via rescue mode and associated commands
Dan, when is dhcp-all-interfaces run on the overcloud? I have a feeling that it happens after root is mounted which can be a problem in this case.
We bootstrap DHCP on the nodes via this udev rule:
That rule starts a service which ultimately runs this script:
The idea is we want to run DHCP as early as possible when network interfaces are discovered and active. I'd be curious to know why any of this might be causing udev to crash though...
Dan, I haven't seen udev crash since trying to get this functionality to work in OSP7. I moved on hoping that things were better in OSP8, and I have not seen that specific failure since.
With respect to the network interfaces, it looks as if my provisioned bare-metal overcloud node starts DHCP on the ibft0 and ibft1 interfaces as read from the iscsi boot firmware table. I can ping both of the interfaces that are set (if I use rd.debug) from my storage hardware, so I think we're good there. What doesn't seem to happen is logging into the fabric via "iscsistart -b". I never see those iscsi sessions on my storage hardware, so I don't think it is getting there. Does it try and look for the root disk before launching that command from dracut scripts?
Dmitry was helping me over IRC and trying to get it to where I could look at the console here. No luck so far to report with passing "rd.break=pre-mount" or "rd.break=initqueue" to the kernel. It always freezes with:
dracut-initqueue: Warning: dracut-initqueue timeout - starting timeout scripts
dracut-initqueue: Warning: Could not boot.
Note that dhcp-all-interfaces only happens after dracut switches execution to the root partition. If dracut is failing then it isn't related to that.
You may need to stop the boot process even earlier to see what's going on. Probably pre-udev or pre-trigger. I'm not sure whether initqueue is considered a valid stopping point. It's possible that rd.break only recognizes the steps prefixed with Hook: on https://www.kernel.org/pub/linux/utils/boot/dracut/dracut.html#_hook_pre_udev
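As a rough aid for picking a stopping point, the sketch below validates a stage name before building the rd.break argument. The list of breakpoints is taken from my reading of dracut.cmdline(7) and should be checked against the dracut version in the image; the names RD_BREAK_POINTS and rd_break_arg are hypothetical.

```python
# Breakpoints as listed in dracut.cmdline(7); this list is an assumption
# and should be verified against the dracut version actually in use.
RD_BREAK_POINTS = (
    "cmdline", "pre-udev", "pre-trigger", "initqueue",
    "pre-mount", "mount", "pre-pivot", "cleanup",
)


def rd_break_arg(stage):
    """Return 'rd.break=<stage>' if the stage is a known breakpoint."""
    if stage not in RD_BREAK_POINTS:
        raise ValueError("unknown rd.break stage: %r" % (stage,))
    return "rd.break=" + stage
```

For example, rd_break_arg("pre-udev") builds the argument for the earliest hook suggested in this comment.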
I can't say I have any experience booting with an iscsi root partition, but I see there are a pile of dracut options here: https://www.kernel.org/pub/linux/utils/boot/dracut/dracut.html#_iscsi Do any of those look helpful?
Part of the problem here I think is that the overcloud image that I have, overcloud-full-8.0-20151203.1-beta-2.tar, does not have the "multipath", "md", or "iscsi" modules inserted into the initramfs. I think that's why dracut whines about not being able to boot, it doesn't have the ability to find the disk at that stage.
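One way to confirm which dracut modules an initramfs contains is to inspect the output of lsinitrd. The sketch below parses such output. It assumes the output has a "dracut modules:" header followed by one module name per line and terminated by a line of "=" characters, which may differ between dracut versions; both helper names are hypothetical.

```python
def dracut_modules(lsinitrd_output):
    """Return module names listed under the 'dracut modules:' header.

    Assumption: the section ends at a blank line or a line of '=' signs.
    """
    modules = []
    in_section = False
    for line in lsinitrd_output.splitlines():
        line = line.strip()
        if line == "dracut modules:":
            in_section = True
        elif in_section:
            if not line or line.startswith("==="):
                break
            modules.append(line)
    return modules


def missing_modules(lsinitrd_output, required):
    """Return the required module names that are not in the initramfs."""
    present = set(dracut_modules(lsinitrd_output))
    return [name for name in required if name not in present]
```

Feeding it the lsinitrd output of the extracted ramdisk together with the module names of interest gives a quick list of what would need to be added.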
If I get onto the provisioned server using the rescue image and regenerate the ramdisk with dracut and reboot, it now boots. I just got a two node deployment working on remote LUNs via iSCSI among a slew of other modifications in the discoverd portion of ironic, and the overcloud image itself.
I'm looking for a more automated way to fix this problem though as a part of an OpenStack deployment using the Director. Who can I talk to about the overcloud image to see what can be done here?
Okay, that makes me think the problem is that the overcloud images are built from a RHEL cloud image base. I would guess the cloud image doesn't include those modules in the interest of saving space, and since we just pull the initramfs out of the image we only get what was already included.
One workaround would be to save a copy of the regenerated initramfs and replace the ramdisk image from the tar file with the regenerated one. If you then run "openstack overcloud image upload --update-existing" it will use the regenerated image for future deployments.
Obviously this isn't ideal since it would potentially need to be done again for every release, but until we can get it resolved properly it might unblock you. I'll have to look into whether just running dracut as part of our image build will generate an initramfs with the necessary modules.
Ben, what you're describing is exactly what I've done to get me going: repackaging the overcloud deployment image with a newly generated initramfs (with "multipath", "md", or "iscsi") and re-uploaded it to Glance for future deployments. Totally agree with you, we 100% need to get functionality added to the overcloud images to support diskless provisioning/boot so we're not opening up the overcloud image each release to do this. Please do let me know if there's a different way to get that initramfs updated in the meantime.
One thing that is not working throughout all of this, however, is the Ironic Python Agent (IPA). The discovery process appears to use this now, and I can't get it to find the iSCSI LUN no matter what I pass to the kernel. I've been using the old bash discoverd boot disk from OSP7 for now to get me over this hump. It looks as if the deployment ramdisk in my copy of the OSP8 beta (8.0-20151203.1-beta-2) is the old bash-based one, so I fear the same problem will be evident if OSP8 GA only includes the IPA for both discovery and deployment. How can we handle this?
(In reply to Ben Nemec from comment #13)
> ...I'll have to look into whether just running dracut as part of
> our image build will generate an initramfs with the necessary modules.
Hey Ben, what is the best way to regenerate the initramfs inside of the overcloud image? I have the kernel version present in the beta 2 image on a server of mine, and I've been running dracut with the -m flag specifying the modules + multipath, md, and iscsi. I then copy the initramfs into the overcloud-full.qcow2 file and re-upload to Glance using the "--update-existing" switch as you've indicated.
Is there a better way to do the regeneration?
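For reference, the regeneration step described above could be scripted along these lines. Note that this sketch uses dracut's --add option (add modules on top of the defaults) rather than the -m flag mentioned in the comment, so it is a variation and not the exact procedure used. The helper name and the example path and kernel version are hypothetical, and the dracut module names to pass should be checked against /usr/lib/dracut/modules.d on the build host.

```python
def dracut_command(image_path, kernel_version, extra_modules):
    """Build the argument list for regenerating an initramfs.

    --force overwrites an existing image; --add takes a space-separated
    list of dracut modules to include in addition to the defaults.
    """
    return [
        "dracut", "--force",
        "--add", " ".join(extra_modules),
        image_path, kernel_version,
    ]


# Example (hypothetical path and kernel version; not executed here):
#   import subprocess
#   subprocess.check_call(dracut_command(
#       "/tmp/overcloud-full.initrd", "3.10.0-327.el7.x86_64",
#       ["multipath", "iscsi"]))
```

Building the argument list separately from running it makes it easy to log or review the exact command before touching the image.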
That's about what I would do too. You may not need to insert it into the qcow2 file though - I think Ironic will install the ramdisk from Glance for you at deploy time, so the one in the image itself doesn't matter. I could be wrong about that though.
Okay, I think I finally understand why the modules weren't being included in my test ramdisks, and it really is as simple as re-running dracut during the image build. I'm linking an upstream patch to the bug that I believe should fix the problem. At least it seems to be including the needed modules for this:
I don't have a way to test this further, so if anyone who is doing this can verify the list of modules above is sufficient I would appreciate it. Thanks.
This bug did not make the OSP 8.0 release. It is being deferred to OSP 10.
(In reply to Ben Nemec from comment #19)
> I don't have a way to test this further, so if anyone who is doing this can
> verify the list of modules above is sufficient I would appreciate it.
Ben, this list looks good to me.
To whom it may concern -
We have Emulex 'OneConnect', which is configured at the BIOS phase to point to remote SCSI drives. The servers have no local disks.
We had the same issue described in https://bugzilla.redhat.com/show_bug.cgi?id=1328585
I have managed to overcome the issue (for the introspection part only, for now) using the suggested patch (https://bugzilla.redhat.com/attachment.cgi?id=1158689&action=diff)
And adding the following on top of it:
"""Wait for the udev event queue to settle.
Wait for the udev event queue to settle to make sure all devices
are detected once the machine boots up.
except processutils.ProcessExecutionError as e:
LOG.warning('Something went wrong when waiting for udev '
'to settle. Error: %s', e)
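The snippet above appears to have lost some lines when it was pasted. A self-contained sketch of what such a helper might look like is given below. It is a guess, not the code from the patch: it uses subprocess instead of processutils, the function name settle_and_start_iscsi is hypothetical, and the order of the two commands is an assumption.

```python
import logging
import subprocess

LOG = logging.getLogger(__name__)


def settle_and_start_iscsi(run=subprocess.check_call):
    """Wait for udev to settle, then log in to iBFT-described targets.

    Failures are logged as warnings and do not abort start-up.
    The 'run' parameter exists only so the calls can be faked in tests.
    """
    for cmd in (["udevadm", "settle"], ["iscsistart", "-b"]):
        try:
            run(cmd)
        except (OSError, subprocess.CalledProcessError) as e:
            LOG.warning("Command '%s' failed. Error: %s", " ".join(cmd), e)
```

Catching OSError as well as CalledProcessError covers the case where iscsistart is not installed in the ramdisk at all.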
Hi! That's interesting, does this 'iscsistart -b' try to initialize iSCSI devices from kernel command line (the man page is a bit confusing)? Does it make sense for us to try running it every time?
That's great Yossi,
Might it make more sense to put the iscsistart before the udevadm? That way udev will have a chance to get acquainted with the newly discovered block devices.
Good work finding that one..
I did some more digging (I simply ran grep -r for iscsistart -b).
this is what I've found -
In usr/lib/dracut/modules.d/95iscsi/iscsiroot.sh there's:
    if getargbool 0 rd.iscsi.firmware -d -y iscsi_firmware ; then
        if [ "$netif" = "timeout" ] || [ "$netif" = "online" ]; then
            handle_firmware
        fi
    fi
Where handle_firmware is
    if ! iscsistart -f; then
        warn "iscistart: Could not get list of targets from firmware."
    fi
    for p in $(getargs rd.iscsi.param -d iscsi_param); do
        iscsi_param="$iscsi_param --param $p"
    done
    if ! iscsistart -b $iscsi_param; then
        warn "'iscsistart -b $iscsi_param' failed with return code $?"
    fi
    echo 'started' > "/tmp/iscsistarted-iscsi:"
    echo 'started' > "/tmp/iscsistarted-firmware"
My /httpboot/inspector.ipxe has 'iscsi_firmware'
I'll need to investigate what's going on there.
This seems like a better place for a fix.
Interesting, seems like rd.iscsi.firmware=1 might do the trick for you. Please let me know how it ends up.
I have tried rd.iscsi.firmware=1 and it fails with the exact same error (I also tried iscsi_firmware).
Status update: the fix for the initramfs was merged to master and mitaka upstream. Hopefully it will merge to liberty as well.
Now we're working on making IPA aware of iBFT by trying to call iscsistart -b on start-up. Please stay tuned.
(In reply to Itamar from comment #31)
> That's great Yossi,
> Might it make more sense to put the iscsistart before the udevadm? that way
> udev will have a chance to make itself aquatinted to the the newly
> discovered block devices.
> Good work finding that one..
Two things I did have improved the situation:
- I noticed that running iscsistart -b several times halts the system, so we need to add code that makes sure it is executed only once.
- I have changed the template to the latest RH templates (this resolved the 'callback exception').
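The "executed only once" requirement in the first point could be met with a marker file, similar to the way the dracut script quoted earlier writes /tmp/iscsistarted-firmware. A minimal sketch follows; the function name and the marker path are hypothetical, and the 'run' parameter exists only so the call can be faked in tests.

```python
import os
import subprocess


def iscsistart_once(marker, run=subprocess.check_call):
    """Run 'iscsistart -b' unless the marker file already exists.

    Returns True if the command was run, False if it was skipped.
    The marker is written only after the command returns without raising.
    """
    if os.path.exists(marker):
        return False
    run(["iscsistart", "-b"])
    with open(marker, "w") as f:
        f.write("started\n")
    return True
```

A marker under /tmp does not survive a reboot, which is the intended behaviour here: the login should be attempted once per boot, not once ever.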
Now this is where I am and need assistance:
When running the overcloud deployment I notice the following behaviour:
Two PXE boots:
On the first boot, it loads deploy_kernel and deploy_ramdisk; note that deploy_ramdisk is my modified image. Everything looks fine, then it reboots.
On the second boot, it loads 'kernel' and 'ramdisk' (see below).
What I am seeing on the second boot is that it drops to a dracut shell and is unable to proceed.
I don't know how to modify the 'ramdisk' file (I was unable to extract it the same way as I did with deploy_ramdisk).
[root@undercloud httpboot]# cd 485e57a4-eeb5-49c1-a379-ed7d4e92afe7/
[root@undercloud 485e57a4-eeb5-49c1-a379-ed7d4e92afe7]# ll
-rw-r--r--. 1 ironic ironic 1049 Jun 21 18:48 config
-rw-r--r--. 5 ironic ironic 5153536 Jun 21 18:08 deploy_kernel
-rw-r--r--. 5 ironic ironic 392371696 Jun 21 18:08 deploy_ramdisk
-rw-r--r--. 5 ironic ironic 5153408 Jun 15 17:32 kernel
-rw-r--r--. 5 ironic ironic 40324447 Jun 15 17:32 ramdisk
The attached screenshots show the two different boots.
Please advise!
Created attachment 1170964
First Overcloud boot
Created attachment 1170965
Second overcloud boot
Moving to https://bugzilla.redhat.com/show_bug.cgi?id=1347430
Regarding ( by Dmitry )
"Status update: the fix for the initramfs was merged to master and mitaka upstream. Hopefully it will merge to liberty as well."
How/where can I get that fixed initramfs?
Can I get a description of the fix (I assume it's DIB?) so I can try it on the current version of OSP8?
Ben, what's the status of the initramfs fix in OSP8?
I see the multipath module in the current OSP 8 ramdisk image, so I'm inclined to believe it's done. The upstream change is attached to this bz, but here's a direct link: https://review.openstack.org/#/c/298439/
Thanks all for collaboration! I'm closing this bug in favor of 1347430 just to reset to clean history, as this one already contains a lot of not entirely related issues. Please feel free to open more bugs for specific issues you see.
*** This bug has been marked as a duplicate of bug 1347430 ***
I am reopening this bug. It was marked as duplicate of
which itself is marked as a duplicate of private bug 1276147
Please note that flagging public bugs as duplicates of private bugs defies the purpose. We are an Open Source company, we should keep our customers up to date via public bugzillas.
Please feel free to close this one as dup and set 1347430 as open, but don't trace progress in a private bug report.
Also, while this case here is generic and would IMO also include diskless boots on Cisco UCS with iBFT, 1276147 seems to address only Nokia technology in particular.
Converting to an RFE, as we never properly supported it in the first place.
What is the status of this bug in OSP 10?