Bug 1283436 - [RFE] Diskless boot via iSCSI & iBFT
Status: NEW
Product: Red Hat OpenStack
Classification: Red Hat
Component: rhosp-director
Version: 8.0 (Liberty)
Hardware: x86_64 All
Priority: unspecified
Severity: urgent
Assigned To: Ben Nemec
QA Contact: Shai Revivo
Keywords: FutureFeature, Reopened
Depends On: 1382890 1411366 1467377
Blocks: 1273812 1317731
Reported: 2015-11-18 18:52 EST by Dave Cain
Modified: 2017-09-09 22:12 EDT (History)

Doc Type: Bug Fix
Last Closed: 2016-09-21 06:21:06 EDT
Type: Bug

Attachments
BZ1283436 Picture of provisioned Overcloud server booted via rescue mode and associated commands (456.56 KB, image/jpeg)
2016-01-08 14:03 EST, Dave Cain
First Overcloud boot (124.62 KB, image/jpeg)
2016-06-22 13:23 EDT, Yossi Ovadia
Second overcloud boot (89.11 KB, image/jpeg)
2016-06-22 13:24 EDT, Yossi Ovadia


External Trackers
Tracker ID Priority Status Summary Last Updated
Launchpad 1590606 None None None 2016-06-10 07:32 EDT
Red Hat Knowledge Base (Solution) 2352991 None None None 2016-10-11 17:31 EDT
OpenStack gerrit 298439 None None None 2016-03-28 17:40 EDT
OpenStack gerrit 327807 None None None 2016-06-10 07:34 EDT

Description Dave Cain 2015-11-18 18:52:03 EST
Description: When using the rhel-osp-director (RHEL7.1), the /usr/lib/python2.7/site-packages/ironic/drivers/modules/ipxe_config.template file can be amended with the options "rd.iscsi.ibft=1 rd.iscsi.firmware=1" in order to facilitate iSCSI boots for remotely presented iSCSI LUNs to servers for stateless booting (no local disks present in the server itself).  The file /httpboot/discoverd.ipxe can also be amended with "rd.iscsi.firmware=1" so that Ironic discovers the remote LUN during discovery/introspection, which it does.

However, the booted server nodes incorrectly begin the "targetcli-wrapper /backstores/block create block1 dev=/dev/sda" command and hang.  Udev then appears to crash.  I'm not certain, but I think the other iSCSI IQN that the Ironic server presents to the provisioned node (for kernel and ramdisk) may be getting picked up accidentally by the targetcli-wrapper script, complicating matters.  How can we install RHEL-OSP7 on server nodes with LUNs provided by iSCSI instead of local disks?  Passing "ip=ibft" to probe the iSCSI Boot Firmware Table used to work in RHEL-OSP6, but no longer does with Ironic and iPXE in RHEL-OSP7.  I need some help debugging this.

If this is the incorrect component for this bug, I apologize.  Can it be redirected please?  We will really need this ability for OSP8 GA.

Details: Base install of RHEL 7.1 with partner provided key (NetApp) for OSP7 repositories.  Specific package levels:

python-rdomanager-oscplugin-0.0.10-8.el7ost.noarch
openstack-ironic-api-2015.1.1-4.el7ost.noarch
openstack-ironic-conductor-2015.1.1-4.el7ost.noarch
python-ironicclient-0.5.1-11.el7ost.noarch
openstack-ironic-discoverd-1.1.0-8.el7ost.noarch
python-ironic-discoverd-1.1.0-8.el7ost.noarch
openstack-ironic-common-2015.1.1-4.el7ost.noarch
ipxe-roms-qemu-20130517-6.gitc4bce43.el7.noarch
ipxe-bootimgs-20130517-6.gitc4bce43.el7.noarch

Bugzilla dependencies (if any): N/A

Hardware dependencies (if any): N/A

Upstream information

Date it will be upstream: N/A

Version: RHEL-OSP7

External links:


Severity (U/H/M/L): H

Business Priority: Must
Comment 2 Dave Cain 2015-12-04 11:46:00 EST
Hey Lucas, do you think this is an RFE or a defect?
Comment 3 Lucas Alvares Gomes 2015-12-08 07:14:13 EST
Hi @Dave,

Ironic currently does not support booting from volume. We have related specs upstream [0][1] about booting Cinder volumes in Ironic.

[0] https://review.openstack.org/#/c/200496/
[1] https://wiki.openstack.org/wiki/Ironic/blueprints/cinder-integration
Comment 4 Dave Cain 2015-12-14 16:14:45 EST
Hi @Lucas, this is actually outside of Cinder itself.  

Normally one can provide a block device through Fibre Channel, iSCSI, or even FCoE to the host as a root volume for the operating system directly.  Booting from SAN through any of these protocols can be done instead of having the OS installed on local disks in the system. This provides significant advantages such as reduced management cost, allowing server profiles to "move" throughout the overall infrastructure (since the server's disk is hosted on the managed storage device and the server itself is stateless), and the ability to employ space efficiency on the volume containing server LUNs.

This used to work in OSP6 by passing the "ip=ibft" option to the kernel command line, and it would probe the iSCSI Boot Firmware table (ibft) and see the remote LUN hosted on a managed storage device (like NetApp).

In the past few days, I've been able to figure out how to get the discovery image to detect and find the LUN and add it to the Ironic node database.  Doing an ironic node-validate against the discovered node shows a UUID for the block device, which is the LUN.  It even looks as if Ironic installs RHEL 7.1 on the discovered nodes properly through the deployment ramdisk and associated kernel.  The server even reboots and I see the RHEL7.1 grub screen, which is a great sign!  However, the server starts to boot and then never finds its root disk, dropping to a dracut shell.  Upon further inspection I see the following issues:

(1) When passing "iscsi_firmware" or "ip=ibft" to the kernel, it appears as if the corresponding modules are not included in the overcloud image.
(2) All of the physical interfaces are down on the server, which would prevent iscsistart from logging into the remote storage device.
(3) Grub inserts root=UUID=<UUIDX>, where UUIDX is the same disk used to install RHEL7.1 by the deployment ramdisk in Ironic.  This won't work.
Comment 5 Dmitry Tantsur 2016-01-06 05:54:15 EST
Hi!

1) It's quite possible that we don't include all modules. Are these shipped in separate packages?
2) I suspect the problem here is that we bootstrap DHCP in user space after the root disk is mounted. I will defer the response to TripleO folks, as I'm not sure how to proceed with it.
3) Could you elaborate on why it's wrong? What should be done instead?

I'm retargeting this bug to a more generic component, as it's not only an Ironic problem.
Comment 6 Dave Cain 2016-01-08 14:02:43 EST
Dmitry, thanks for responding to my e-mail here.

It looks like older overcloud images don't have the iscsi_ibft.ko kernel module in the /lib/modules directory.  My beta copy of OSP8 seems to have that now, so that's great.

For the root=UUID, since the overcloud image and subsequent provisioning process do not accommodate LVM and use just straight partitions (/dev/sd[a-x]), I would rather it target the symbolic link /dev/disk/by-label/img-rootfs, which appears to be more properly mapped to (on my system at least) /dev/dm-4, so that dm-multipath mounts the proper root device.  This could be a red herring, but comparing this to my installed OSP6 environment w/ RHEL 7.1, which uses LVM, has me somewhat concerned.

I can't say whether that's the whole problem, as the NICs are all down.  Is there any way that you know of for me to force them up, short of opening up and modifying the overcloud image?

I can boot a rescue image and mount the provisioned overcloud image and see the disk, so I'm feeling a bit better and am more confident that it provisions now, just can't get it to boot right.  Thoughts?  I'm attaching a picture of some command output on my provisioned overcloud server and booting it with a rescue image to see what's going on in it.
Comment 7 Dave Cain 2016-01-08 14:03 EST
Created attachment 1112961
BZ1283436 Picture of provisioned Overcloud server booted via rescue mode and associated commands
Comment 8 Dmitry Tantsur 2016-01-12 05:17:15 EST
Adding Dan.

Dan, when is dhcp-all-interfaces run on the overcloud? I have a feeling that it happens after root is mounted which can be a problem in this case.
Comment 9 Dan Prince 2016-01-12 14:13:06 EST
We bootstrap DHCP on the nodes via this udev rule:

http://git.openstack.org/cgit/openstack/diskimage-builder/tree/elements/dhcp-all-interfaces/install.d/dhcp-all-interfaces-udev.rules

That rule starts a service which ultimately runs this script:

http://git.openstack.org/cgit/openstack/diskimage-builder/tree/elements/dhcp-all-interfaces/install.d/dhcp-all-interfaces.sh

The idea is we want to run DHCP as early as possible when network interfaces are discovered and active. I'd be curious to know why any of this might be causing udev to crash though...
Comment 10 Dave Cain 2016-01-12 15:30:17 EST
Dan, I haven't seen udev crash since trying to get this functionality to work in OSP7.  I moved on hoping that things were better in OSP8, and I have not seen that specific failure since.

With respect to the network interfaces, it looks as if my provisioned bare-metal overcloud node starts DHCP on the ibft0 and ibft1 interfaces as read from the iscsi boot firmware table.  I can ping both of the interfaces that are set (if I use rd.debug) from my storage hardware, so I think we're good there.  What doesn't seem to happen is logging into the fabric via "iscsistart -b".  I never see those iscsi sessions on my storage hardware, so I don't think it is getting there.  Does it try and look for the root disk before launching that command from dracut scripts?

Dmitry was helping me over IRC and trying to get it to where I could look at the console here.  No luck so far passing "rd.break=pre-mount" or "rd.break=initqueue" to the kernel.  It always freezes with:

dracut-initqueue[564]: Warning: dracut-initqueue timeout - starting timeout scripts
dracut-initqueue[564]: Warning: Could not boot.
Comment 11 Ben Nemec 2016-01-13 10:39:06 EST
Note that dhcp-all-interfaces only happens after dracut switches execution to the root partition.  If dracut is failing then it isn't related to that.

You may need to stop the boot process even earlier to see what's going on.  Probably pre-udev or pre-trigger.  I'm not sure whether initqueue is considered a valid stopping point.  It's possible that rd.break only recognizes the steps prefixed with Hook: on https://www.kernel.org/pub/linux/utils/boot/dracut/dracut.html#_hook_pre_udev

I can't say I have any experience booting with an iscsi root partition, but I see there are a pile of dracut options here: https://www.kernel.org/pub/linux/utils/boot/dracut/dracut.html#_iscsi  Do any of those look helpful?
Comment 12 Dave Cain 2016-01-20 17:26:46 EST
Part of the problem here I think is that the overcloud image that I have, overcloud-full-8.0-20151203.1-beta-2.tar, does not have the "multipath", "md", or "iscsi" modules inserted into the initramfs.  I think that's why dracut whines about not being able to boot, it doesn't have the ability to find the disk at that stage.  

If I get onto the provisioned server using the rescue image and regenerate the ramdisk with dracut and reboot, it now boots.  I just got a two node deployment working on remote LUNs via iSCSI among a slew of other modifications in the discoverd portion of ironic, and the overcloud image itself.

I'm looking for a more automated way to fix this problem though as a part of an OpenStack deployment using the Director.  Who can I talk to about the overcloud image to see what can be done here?
Comment 13 Ben Nemec 2016-01-28 16:19:42 EST
Okay, that makes me think the problem is that the overcloud images are built from a RHEL cloud image base.  I would guess the cloud image doesn't include those modules in the interest of saving space, and since we just pull the initramfs out of the image we only get what was already included.

One workaround would be to save a copy of the regenerated initramfs and replace the ramdisk image from the tar file with the regenerated one.  If you then run "openstack overcloud image upload --update-existing" it will use the regenerated image for future deployments.

Obviously this isn't ideal since it would potentially need to be done again for every release, but until we can get it resolved properly it might unblock you.  I'll have to look into whether just running dracut as part of our image build will generate an initramfs with the necessary modules.
Comment 14 Dave Cain 2016-01-28 20:01:53 EST
Ben, what you're describing is exactly what I've done to get me going: repackaging the overcloud deployment image with a newly generated initramfs (with "multipath", "md", and "iscsi") and re-uploading it to Glance for future deployments.  Totally agree with you, we 100% need to get functionality added to the overcloud images to support diskless provisioning/boot so we're not opening up the overcloud image each release to do this.  Please do let me know if there's a different way to get that initramfs updated in the meantime.

One thing that is not working throughout all of this, however, is the Ironic Python Agent (IPA).  The discovery process appears to use this now, and I can't get it to find the iSCSI LUN no matter what I pass to the kernel.  I've been using the old bash discoverd boot disk from OSP7 for now to get me over this hump.  It looks as if the deployment ramdisk in my copy of the OSP8 beta (8.0-20151203.1-beta-2) is the old bash-based one, so I fear the same problem will be evident if OSP8 GA only includes the IPA for both discovery and deployment.  How can we handle this?
Comment 15 Dave Cain 2016-02-09 08:52:03 EST
(In reply to Ben Nemec from comment #13)
> ...I'll have to look into whether just running dracut as part of
> our image build will generate an initramfs with the necessary modules.

Hey Ben, what is the best way to regenerate the initramfs inside of the overcloud image?  I have the kernel version present in the beta 2 image on a server of mine, and I've been running dracut with the -m flag specifying the modules + multipath, md, and iscsi.  I then copy the initramfs into the overcloud-full.qcow2 file and re-upload to Glance using the "--update-existing" switch as you've indicated.

Is there a better way to do the regeneration?
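
Editorial sketch of the regeneration step discussed in comments 13 to 15, for illustration only. It only builds a dracut command line and does not run it. The function name, the example paths, and the choice of "--add" (which adds to the default module set, rather than the "-m" flag mentioned above) are assumptions; "mdraid" is the dracut module name assumed to correspond to "md" in this thread.

```python
def dracut_regenerate_argv(image_path, kernel_version,
                           extra_modules=("iscsi", "multipath", "mdraid")):
    """Build (but do not run) a dracut command line that adds extra modules.

    Hypothetical helper: the module names are assumptions taken from the
    discussion in this bug, not a verified list.
    """
    return [
        "dracut",
        "--force",                         # overwrite an existing image
        "--add", " ".join(extra_modules),  # space-separated module list
        image_path,
        kernel_version,
    ]


if __name__ == "__main__":
    # Placeholder path and kernel version; substitute the real ones.
    print(" ".join(dracut_regenerate_argv("/tmp/initramfs-test.img",
                                          "KERNEL_VERSION")))
```

The resulting list could be passed to subprocess.check_call on a host that has the same kernel version installed as the overcloud image.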
Comment 16 Ben Nemec 2016-02-12 17:17:51 EST
That's about what I would do too.  You may not need to insert it into the qcow2 file though - I think Ironic will install the ramdisk from Glance for you at deploy time, so the one in the image itself doesn't matter.  I could be wrong about that though.
Comment 19 Ben Nemec 2016-03-28 17:40:32 EDT
Okay, I think I finally understand why the modules weren't being included in my test ramdisks, and it really is as simple as re-running dracut during the image build.  I'm linking an upstream patch to the bug that I believe should fix the problem.  At least it seems to be including the needed modules for this:

bash
modsign
nss-softokn
i18n
network
ifcfg
btrfs
crypt
dm
kernel-modules
lvm
mdraid
multipath
qemu
qemu-net
iscsi
nfs
resume
rootfs-block
terminfo
udev-rules
virtfs
biosdevname
systemd
usrmount
base
fs-lib
shutdown

I don't have a way to test this further, so if anyone who is doing this can verify the list of modules above is sufficient I would appreciate it.  Thanks.
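
Editorial sketch of one way to do the verification requested above, for illustration only. It compares a dracut module listing (one module name per line, such as the list in this comment or the text printed by "lsinitrd -m") against a required set. The required set and the function name are assumptions based on comments 12 and 19, not a confirmed minimum.

```python
# Hypothetical helper: report which required dracut modules are absent
# from a module listing (one module name per line).
REQUIRED_MODULES = frozenset({"iscsi", "multipath", "mdraid"})  # assumption


def missing_dracut_modules(listing, required=REQUIRED_MODULES):
    """Return the sorted required module names not present in `listing`."""
    # Lines that are not bare module names (headers, separators) are
    # harmless here: they simply never match a required name.
    present = {line.strip() for line in listing.splitlines() if line.strip()}
    return sorted(set(required) - present)


if __name__ == "__main__":
    sample = "bash\nnetwork\ndm\nkernel-modules\nlvm\n"
    print(missing_dracut_modules(sample))
```

An empty result means every module in the assumed required set is present in the listing.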
Comment 20 Mike Burns 2016-04-07 16:57:01 EDT
This bug did not make the OSP 8.0 release.  It is being deferred to OSP 10.
Comment 23 Ruchika K 2016-04-11 14:09:23 EDT
http://pastebin.com/xF59JX0k
Comment 24 Dave Cain 2016-04-11 17:51:21 EDT
(In reply to Ben Nemec from comment #19)
> I don't have a way to test this further, so if anyone who is doing this can
> verify the list of modules above is sufficient I would appreciate it. 
> Thanks.

Ben, this list looks good to me.
Comment 29 Yossi Ovadia 2016-06-07 23:01:45 EDT
To whom it may concern - 
We have Emulex 'OneConnect', which is configured at the BIOS phase to point to remote SCSI drives. The servers have no local disks.

We had the same issue described in https://bugzilla.redhat.com/show_bug.cgi?id=1328585

I have managed to overcome the issue (introspection part only for now...) using the suggested patch (https://bugzilla.redhat.com/attachment.cgi?id=1158689&action=diff)

And adding the following on top of it: 

def _udev_settle():
    """Wait for the udev event queue to settle.

    Wait for the udev event queue to settle to make sure all devices
    are detected once the machine boots up.

    """
    try:
        utils.execute('udevadm', 'settle')
+        utils.execute('iscsistart','-b')
    except processutils.ProcessExecutionError as e:
        LOG.warning('Something went wrong when waiting for udev '
                    'to settle. Error: %s', e)
        return
Comment 30 Dmitry Tantsur 2016-06-08 03:05:45 EDT
Hi! That's interesting, does this 'iscsistart -b' try to initialize iSCSI devices from kernel command line (the man page is a bit confusing)? Does it make sense for us to try running it every time?
Comment 31 Itamar 2016-06-08 03:59:05 EDT
That's great Yossi,
Might it make more sense to put the iscsistart before the udevadm? That way udev will have a chance to get acquainted with the newly discovered block devices.
Good work finding that one..
Comment 32 Yossi Ovadia 2016-06-08 13:41:42 EDT
Hi, 

I did some more digging (I simply ran grep -r for "iscsistart -b"),
and this is what I found:

In usr/lib/dracut/modules.d/95iscsi/iscsiroot.sh there is:

 if getargbool 0 rd.iscsi.firmware -d -y iscsi_firmware ; then
     if [ "$netif" = "timeout" ] || [ "$netif" = "online" ]; then
         handle_firmware
         ret=$?
     fi
 fi

Where handle_firmware is 
 handle_firmware()
 {
     if ! iscsistart -f; then
         warn "iscistart: Could not get list of targets from firmware."
         return 1
     fi  
 
     for p in $(getargs rd.iscsi.param -d iscsi_param); do
         iscsi_param="$iscsi_param --param $p"
     done  
 
     if ! iscsistart -b $iscsi_param; then
          warn "'iscsistart -b $iscsi_param' failed with return code $?"
     fi  
 
     echo 'started' > "/tmp/iscsistarted-iscsi:"
     echo 'started' > "/tmp/iscsistarted-firmware" 
 
     need_shutdown
     return 0
 }


My /httpboot/inspector.ipxe has 'iscsi_firmware' 

I'll need to investigate what's going on there.

This seems like a better place for a fix.
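
Editorial sketch, for illustration only, of the condition the shell snippet above tests with "getargbool 0 rd.iscsi.firmware -d -y iscsi_firmware". This is a simplified approximation written for this thread, not dracut's actual parsing code; the function name and the exact rules (last occurrence wins, 0/no/off disable) are assumptions.

```python
# Simplified approximation of dracut's
#   getargbool 0 rd.iscsi.firmware -d -y iscsi_firmware
# The function name and parsing rules are assumptions for illustration.
def iscsi_firmware_requested(cmdline):
    """Return True if the kernel command line asks for firmware (iBFT) iSCSI."""
    enabled = False  # default of 0, as in the dracut call
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if key == "rd.iscsi.firmware":
            # A bare key counts as enabled; 0/no/off disable it.
            enabled = (not sep) or value not in ("0", "no", "off")
        elif key == "iscsi_firmware" and not sep:
            # Legacy spelling, treated here as a bare flag only.
            enabled = True
    return enabled


if __name__ == "__main__":
    print(iscsi_firmware_requested("ro rd.iscsi.firmware=1 ip=ibft"))
```

A check like this could be pointed at the contents of /proc/cmdline inside the ramdisk to confirm the option actually reached the kernel.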
Comment 33 Dmitry Tantsur 2016-06-09 03:23:53 EDT
Interesting, seems like rd.iscsi.firmware=1 might do the trick for you. Please let me know how it ends up.
Comment 34 Yossi Ovadia 2016-06-09 10:28:05 EDT
I have tried rd.iscsi.firmware=1 and it fails with exact same error. ( also tried iscsi_firmware )
Comment 35 Dmitry Tantsur 2016-06-10 07:32:36 EDT
Thanks Yossi!

Status update: the fix for the initramfs was merged to master and mitaka upstream. Hopefully it will merge to liberty as well.

Now we're working on making IPA aware of iBFT by trying to call iscsistart -b on startup. Please stay tuned.
Comment 36 Yossi Ovadia 2016-06-10 16:58:11 EDT
+1 !
(In reply to Itamar from comment #31)
> That's great Yossi,
> Might it make more sense to put the iscsistart before the udevadm? That way
> udev will have a chance to get acquainted with the newly
> discovered block devices.
> Good work finding that one..
Comment 37 Yossi Ovadia 2016-06-22 13:22:12 EDT
Hi,

Two things I did have improved the situation:

- I noticed that running iscsistart -b several times halts the system, so I added code to make sure it is executed only once.
- I changed the template to the latest RH templates (this resolved the 'callback exception').
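
Editorial sketch of the run-once guard described above, for illustration only. It is not the code that was actually used or merged; the function and variable names are hypothetical, and the command runner is injectable so the logic can be exercised without touching a real system. It also uses the ordering suggested in comment 31 (iscsistart first, then udevadm settle).

```python
import subprocess

_iscsi_started = False  # module-level guard (hypothetical name)


def start_ibft_iscsi_once(execute=subprocess.check_call):
    """Run `iscsistart -b` at most once, then wait for udev to settle.

    Returns True if the commands were run, False if the call was skipped.
    """
    global _iscsi_started
    if _iscsi_started:
        return False
    # Set the guard before running so that a failure is not retried;
    # repeated runs were reported to halt the system.
    _iscsi_started = True
    execute(["iscsistart", "-b"])
    # Settle afterwards so udev can process newly attached block devices.
    execute(["udevadm", "settle"])
    return True


if __name__ == "__main__":
    recorded = []
    print(start_ibft_iscsi_once(execute=recorded.append), recorded)
    print(start_ibft_iscsi_once(execute=recorded.append), recorded)
```

Setting the guard before the first call is a deliberate choice in this sketch: a failed login attempt is not repeated, at the cost of needing a reboot to retry.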

This is where I am now and where I need assistance:

When running the overcloud deployment I notice the following behaviour:
There are two PXE boots.
On the first boot, it loads deploy_kernel and deploy_ramdisk. Note that deploy_ramdisk is my modified image. Everything looks fine, then it reboots.
On the second boot, it loads 'kernel' and 'ramdisk' (see below).
What I am seeing on the second boot is that it drops to dracut and is unable to proceed.

I don't know how to modify the 'ramdisk' file (I was unable to extract it the same way as with deploy_ramdisk).


[root@undercloud httpboot]# cd 485e57a4-eeb5-49c1-a379-ed7d4e92afe7/                                                   
[root@undercloud 485e57a4-eeb5-49c1-a379-ed7d4e92afe7]# ll                                                             
total 432632                                                                                                           
-rw-r--r--. 1 ironic ironic      1049 Jun 21 18:48 config                                                              
-rw-r--r--. 5 ironic ironic   5153536 Jun 21 18:08 deploy_kernel                                                       
-rw-r--r--. 5 ironic ironic 392371696 Jun 21 18:08 deploy_ramdisk                                                      
-rw-r--r--. 5 ironic ironic   5153408 Jun 15 17:32 kernel                                                              
-rw-r--r--. 5 ironic ironic  40324447 Jun 15 17:32 ramdisk

The attached screenshots show the two different boots.

Please advise !

Thanks.
Comment 38 Yossi Ovadia 2016-06-22 13:23 EDT
Created attachment 1170964
First Overcloud boot
Comment 39 Yossi Ovadia 2016-06-22 13:24 EDT
Created attachment 1170965
Second overcloud boot
Comment 40 Yossi Ovadia 2016-06-22 15:18:11 EDT
Moving to https://bugzilla.redhat.com/show_bug.cgi?id=1347430
Comment 41 Yossi Ovadia 2016-07-07 18:20:24 EDT
Hi,
Regarding this (from Dmitry):
 "Status update: the fix for the initramfs was merged to master and mitaka upstream.    Hopefully it will merge to liberty as well."

How/where can I get that fixed initramfs?

Can I get a description of the fix (I assume it's in DIB?) so I can try it on the current version of OSP8?

-Yossi
Comment 42 Dmitry Tantsur 2016-07-08 05:29:33 EDT
Ben, what's the status of the initramfs fix in OSP8?
Comment 43 Ben Nemec 2016-07-08 13:23:14 EDT
I see the multipath module in the current OSP 8 ramdisk image, so I'm inclined to believe it's done.  The upstream change is attached to this bz, but here's a direct link: https://review.openstack.org/#/c/298439/
Comment 49 Dmitry Tantsur 2016-09-21 06:21:06 EDT
Thanks all for the collaboration! I'm closing this bug in favor of 1347430 just to reset to a clean history, as this one already contains a lot of not entirely related issues. Please feel free to open more bugs for specific issues you see.

*** This bug has been marked as a duplicate of bug 1347430 ***
Comment 50 Andreas Karis 2016-12-26 09:39:59 EST
Hello,

I am reopening this bug. It was marked as a duplicate of bug 1347430, which itself is marked as a duplicate of private bug 1276147.

Please note that flagging public bugs as duplicates of private bugs defeats the purpose. We are an Open Source company, and we should keep our customers up to date via public bugzillas.

Please feel free to close this one as dup and set 1347430 as open, but don't trace progress in a private bug report.

Also, while this case is generic and would IMO also include diskless boots on Cisco UCS with iBFT, 1276147 seems to address only Nokia technology in particular.

Thanks,

- Andreas
Comment 51 Dmitry Tantsur 2017-02-09 05:06:51 EST
Converting to an RFE, as we never properly supported it in the first place.
Comment 52 MUHAMMAD AFZAL 2017-08-28 15:55:05 EDT
What is the status of this bug in OSP 10?