Bug 1295132

Summary: Ironic fails to install the boot loader on PERC RAID1 volume
Product: Red Hat OpenStack
Reporter: Gonéri Le Bouder <goneri>
Component: openstack-ironic
Assignee: Lucas Alvares Gomes <lmartins>
Status: CLOSED NOTABUG
QA Contact: Toure Dunnon <tdunnon>
Severity: unspecified
Priority: unspecified
Version: 8.0 (Liberty)
CC: arkady_kanevsky, christopher_dearborn, gael_rehault, goneri, John_walsh, mburns, randy_perryman, rhel-osp-director-maint, srevivo
Keywords: ZStream
Target Release: 8.0 (Liberty)
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Last Closed: 2016-10-06 13:02:32 UTC
Type: Bug
Attachments (description, flags):
definition of the RAID1 volume (flags: none)
RAID1 disk definition (flags: none)
(partial) list of disk (flags: none)
final error screen (flags: none)
devices as seen from dracut mini shell (flags: none)

Description Gonéri Le Bouder 2016-01-02 17:23:49 UTC
Description of problem:


Version-Release number of selected component (if applicable):

I use a DELL R730xd with a PERC H730 Mini RAID controller. The machine has 12 hard drives. The last two are aggregated in a RAID1 virtual drive.
This virtual drive is the bootable device. I took some screenshots of the RAID configuration.

I use the latest puddles, OSP-d 2015-12-03.1 / OSP 2015-12-22.2, with the following images:
mburns/8.0/2015-12-03.1/images/deploy-ramdisk-ironic.tar
mburns/8.0/2015-12-03.1/images/ironic-python-agent.tar
mburns/8.0/2015-12-03.1/images/overcloud-full.tar

The kernel sees the two drives of the RAID (sda and sdb) instead of a single virtual disk. Ironic fails to install the boot loader. The machine then restarts on the old grub and fails to boot.

I tried to create a new ramdisk from the master branch of diskimage-builder (f389b3a04d75a35bb99ac68d13eaaa634cba7650) with the following command:
./bin/ramdisk-image-create -o deploy.ramdisk --ramdisk-element dracut-ramdisk ironic-agent centos7
And I get the same issue.

How reproducible:

Use "nova boot" to deploy a bare-metal node on this server.



Comment 2 Gonéri Le Bouder 2016-01-02 17:25:18 UTC
Created attachment 1111056 [details]
definition of the RAID1 volume

Comment 3 Gonéri Le Bouder 2016-01-02 17:26:00 UTC
Created attachment 1111057 [details]
RAID1 disk definition

Comment 4 Gonéri Le Bouder 2016-01-02 17:27:31 UTC
Created attachment 1111058 [details]
(partial) list of disk

Comment 5 Gonéri Le Bouder 2016-01-02 17:28:57 UTC
Created attachment 1111059 [details]
final error screen

Comment 6 Gonéri Le Bouder 2016-01-02 19:38:10 UTC
Created attachment 1111074 [details]
devices as seen from dracut mini shell

Comment 7 Gonéri Le Bouder 2016-01-02 22:29:44 UTC
*** Bug 1295131 has been marked as a duplicate of this bug. ***

Comment 8 Gonéri Le Bouder 2016-01-03 07:06:15 UTC
I managed to get my nodes deployed by doing both of the following:

 - I dropped all the virtual drives and configured the disks as plain JBOD.
 - I used a ramdisk built from the current diskimage-builder master branch.

In this case Ironic happily picks the first hard drive. Sadly, this means I have to sacrifice one of the SSDs: the SATA disks dedicated to the operating system are at the end of the list, and sda is one of the SSDs.

Comment 9 Randy Perryman 2016-01-06 16:55:17 UTC
Looking through the bug items to check:
1. Your R1 is probably at /dev/sdm or /dev/sdl, not sda. The volume will come last (I do not know why).
2. Will a 200GB SSD handle the partition scheme?

Comment 10 Gonéri Le Bouder 2016-01-06 18:13:11 UTC
1. Yes, I think so.
2. Yes it does.

An easy solution, if it's acceptable to lose the RAID1 volume for the system, is to swap/reorder the SSD and SATA disks.

Comment 11 Chris Dearborn 2016-01-20 00:22:05 UTC
By default, Ironic installs the OS on the first disk of size >= 4 gigs.  As a result, it's trying to install the OS on sda, when it should be installing on sdm or sdl.  I'm currently working on a patch to JS 5.0 that will tell ironic which drive to install on.
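
The default selection described above can be illustrated with a minimal Python sketch. This is not Ironic's actual code: the function name, the Disk type, and the disk inventory are hypothetical, and the only rule modeled is the one stated in this comment (take the first disk of at least 4 GiB in enumeration order).

```python
from typing import List, NamedTuple, Optional

GIB = 1024 ** 3
MIN_ROOT_SIZE = 4 * GIB  # threshold quoted in the comment above


class Disk(NamedTuple):
    name: str
    size_bytes: int


def pick_default_root_disk(disks: List[Disk]) -> Optional[Disk]:
    """Return the first disk, in enumeration order, of at least 4 GiB."""
    for disk in disks:
        if disk.size_bytes >= MIN_ROOT_SIZE:
            return disk
    return None


# Hypothetical inventory: the SSDs enumerate first, the RAID1 volume last.
disks = [
    Disk("/dev/sda", 186 * GIB),  # SSD
    Disk("/dev/sdb", 186 * GIB),  # SSD
    Disk("/dev/sdl", 930 * GIB),  # RAID1 virtual drive meant for the OS
]

chosen = pick_default_root_disk(disks)
print(chosen.name)
```

With this ordering the sketch selects /dev/sda, which matches the behavior reported in this bug: the RAID1 volume is never considered because an earlier disk already satisfies the size threshold.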

Comment 12 Lucas Alvares Gomes 2016-10-03 15:52:11 UTC
(In reply to Chris Dearborn from comment #11)
> By default, Ironic installs the OS on the first disk of size >= 4 gigs.  As
> a result, it's trying to install the OS on sda, when it should be installing
> on sdm or sdl.  I'm currently working on a patch to JS 5.0 that will tell
> ironic which drive to install on.

Hi Chris, Goneri,

We do have a mechanism in place to tell Ironic which disk to pick when deploying the node. It's called "root device hints" [0]. Can you please try it out and see if it works for you?

[0] http://docs.openstack.org/project-install-guide/baremetal/draft/advanced.html#specifying-the-disk-for-deployment-root-device-hints
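
As a rough illustration of how hint-based selection differs from the default, here is a small Python sketch. It is an assumption-laden model, not Ironic's implementation: the function name, the device dictionaries, and the sizes are hypothetical, and it only models exact matching on the "size" (in GiB) and "name" hints discussed in this thread.

```python
from typing import Dict, List, Optional, Union

Device = Dict[str, Union[str, int]]


def find_root_device(devices: List[Device], hints: Device) -> Optional[Device]:
    """Return the first device whose fields equal every hint exactly."""
    for device in devices:
        if all(device.get(key) == value for key, value in hints.items()):
            return device
    return None


# Hypothetical inventory; sizes are in GiB.
devices = [
    {"name": "/dev/sda", "size": 186},
    {"name": "/dev/sdb", "size": 186},
    {"name": "/dev/sdl", "size": 930},
]

by_size = find_root_device(devices, {"size": 930})
by_name = find_root_device(devices, {"name": "/dev/sdl"})
near_miss = find_root_device(devices, {"size": 931})  # off by one GiB
print(by_size, by_name, near_miss)
```

In this model a hint that is off by even one unit matches nothing, which is why exact matching is awkward to use in practice.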

Comment 13 Chris Dearborn 2016-10-05 22:05:09 UTC
Hey Lucas,

so since this defect was created, we have started using root device hints and have gotten things working.  At the time, the only way to identify the OS disk was by size (most of the other hints are not supported by the iDRAC), so we have a pretty ugly hack in place: when we create the OS RAID volume, we make sure to create it with a unique size.  Since then, device name was added to root device hints (thanks for that!).  We will be switching our RAID creation over to use the new Ironic RAID API, and when we do that, we'll look at switching from using size to using the device name for root device.

The end result is that I believe you can close this bug.
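
The "unique size" workaround can be checked with a short Python sketch: a size hint is only useful if exactly one device reports that size. The function name and the inventories below are hypothetical and only illustrate the idea; sizes are in GiB.

```python
from collections import Counter
from typing import Dict, List


def unambiguous_sizes(sizes_by_device: Dict[str, int]) -> List[int]:
    """Return the sizes (GiB) that belong to exactly one device."""
    counts = Counter(sizes_by_device.values())
    return sorted(size for size, n in counts.items() if n == 1)


# Hypothetical: the RAID1 volume (/dev/sdl) has the same size as the raw SATA disks.
before = {"/dev/sda": 186, "/dev/sdb": 186,
          "/dev/sdc": 930, "/dev/sdd": 930, "/dev/sdl": 930}

# Hypothetical: the RAID1 volume is created slightly smaller so its size is unique.
after = dict(before, **{"/dev/sdl": 900})

print(unambiguous_sizes(before))
print(unambiguous_sizes(after))
```

In the first inventory no size identifies a single device, so a size hint cannot single out the OS volume. Once the RAID volume is created with a size no other disk has, the size hint becomes unambiguous.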

Comment 14 Lucas Alvares Gomes 2016-10-06 13:02:32 UTC
(In reply to Chris Dearborn from comment #13)
> Hey Lucas,
> 
> so since this defect was created, we have started using root device hints
> and have gotten things working.  At the time, the only way to identify the
> OS disk was by size (most of the other hints are not supported by the
> iDRAC), so we have a pretty ugly hack in place: when we create the OS RAID
> volume, we make sure to create it with a unique size.  Since then, device
> name was added to root device hints (thanks for that!).  We will be
> switching our RAID creation over to use the new Ironic RAID API, and when we
> do that, we'll look at switching from using size to using the device name
> for root device.
> 
> The end result is that I believe you can close this bug.

Thanks for the reply Chris!

Yes, I totally agree that root device hints were (and still are) painful to use. The feature is too ossified because it only does exact matching of the values (like the exact size, as you mentioned). There's some work going on in this cycle (Ocata) to make it a bit more flexible [0]; I hope it will come in handy!

[0] https://bugs.launchpad.net/ironic/+bug/1561137 

Cheers,
Lucas