Bug 1295132 - Ironic fails to install the boot loader on PERC RAID1 volume
Status: CLOSED NOTABUG
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-ironic
8.0 (Liberty)
Assigned To: Lucas Alvares Gomes
Toure Dunnon
: ZStream
Duplicates: 1295131
Depends On:
Blocks:
Reported: 2016-01-02 12:23 EST by Gonéri Le Bouder
Modified: 2017-07-04 14:32 EDT (History)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-10-06 09:02:32 EDT
Type: Bug


Attachments
definition of the RAID1 volume (72.36 KB, image/png)
2016-01-02 12:25 EST, Gonéri Le Bouder
RAID1 disk definition (72.36 KB, image/png)
2016-01-02 12:26 EST, Gonéri Le Bouder
(partial) list of disk (117.10 KB, image/png)
2016-01-02 12:27 EST, Gonéri Le Bouder
final error screen (215.26 KB, image/png)
2016-01-02 12:28 EST, Gonéri Le Bouder
devices as seen from dracut mini shell (161.47 KB, image/png)
2016-01-02 14:38 EST, Gonéri Le Bouder

Description Gonéri Le Bouder 2016-01-02 12:23:49 EST
Description of problem:


Version-Release number of selected component (if applicable):

I use a Dell R730xd with a PERC H730 Mini RAID controller. The machine has 12 hard drives. The last two are aggregated in a RAID1 virtual drive.
This virtual drive is the bootable device. I took some screenshots of the RAID configuration.

I use the last poodles OSP-d 2015-12-03.1 / OSP 2015-12-22.2 with the following images:
mburns/8.0/2015-12-03.1/images/deploy-ramdisk-ironic.tar
mburns/8.0/2015-12-03.1/images/ironic-python-agent.tar
mburns/8.0/2015-12-03.1/images/overcloud-full.tar

The kernel sees the two drives of the RAID (sda and sdb) instead of just one virtual disk. Ironic fails to install the boot loader. The machine then restarts on the old GRUB and fails to boot.

I tried to create a new ramdisk from the master branch of diskimage-builder (f389b3a04d75a35bb99ac68d13eaaa634cba7650) with the following command:
./bin/ramdisk-image-create -o deploy.ramdisk --ramdisk-element dracut-ramdisk ironic-agent centos7
And I get the same issue.

How reproducible:

Use "nova boot" to deploy a bare-metal node on this server.


Comment 2 Gonéri Le Bouder 2016-01-02 12:25 EST
Created attachment 1111056 [details]
definition of the RAID1 volume
Comment 3 Gonéri Le Bouder 2016-01-02 12:26 EST
Created attachment 1111057 [details]
RAID1 disk definition
Comment 4 Gonéri Le Bouder 2016-01-02 12:27 EST
Created attachment 1111058 [details]
(partial) list of disk
Comment 5 Gonéri Le Bouder 2016-01-02 12:28 EST
Created attachment 1111059 [details]
final error screen
Comment 6 Gonéri Le Bouder 2016-01-02 14:38 EST
Created attachment 1111074 [details]
devices as seen from dracut mini shell
Comment 7 Gonéri Le Bouder 2016-01-02 17:29:44 EST
*** Bug 1295131 has been marked as a duplicate of this bug. ***
Comment 8 Gonéri Le Bouder 2016-01-03 02:06:15 EST
I managed to get my nodes deployed:

 - I dropped all virtual drives and use plain JBOD only
   AND
 - I use a ramdisk built from the current diskimage-builder master branch

In this case Ironic happily picks the first hard drive. Sadly, this means I have to sacrifice one of the SSDs. The SATA disks dedicated to the operating system are at the end of the list, and sda is one of the SSDs.
Comment 9 Randy Perryman 2016-01-06 11:55:17 EST
Looking through the bug items to check:
1. Your R1 is probably at /dev/sdm or sdl, not sda. The volume will come last (I do not know why).
2. Will a 200GB SSD handle the partition scheme?
Comment 10 Gonéri Le Bouder 2016-01-06 13:13:11 EST
1. Yes, I think so.
2. Yes it does.

An easy solution, if it's acceptable to lose the RAID1 volume for the system, is to swap/reorder the SSDs and SATA disks.
Comment 11 Chris Dearborn 2016-01-19 19:22:05 EST
By default, Ironic installs the OS on the first disk of size >= 4 gigs.  As a result, it's trying to install the OS on sda, when it should be installing on sdm or sdl.  I'm currently working on a patch to JS 5.0 that will tell ironic which drive to install on.
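
The default behaviour described above can be sketched roughly as follows. This is a minimal illustration, not Ironic's actual code: the `Disk` class, the function name, and the disk sizes are hypothetical, and reading "4 gigs" as 4 GiB is an assumption.

```python
from dataclasses import dataclass

GIB = 1024 ** 3
# Threshold quoted in the comment above; interpreting it as GiB is an assumption.
MIN_ROOT_DISK_BYTES = 4 * GIB


@dataclass(frozen=True)
class Disk:
    name: str
    size_bytes: int


def pick_default_root_disk(disks):
    """Return the first disk, in enumeration order, that is at least 4 GiB."""
    for disk in disks:
        if disk.size_bytes >= MIN_ROOT_DISK_BYTES:
            return disk
    return None


# Hypothetical inventory: an SSD enumerated first, the RAID1 volume last.
disks = [
    Disk("/dev/sda", 200 * GIB),
    Disk("/dev/sdb", 200 * GIB),
    Disk("/dev/sdm", 500 * GIB),
]

chosen = pick_default_root_disk(disks)
print(chosen.name)
```

With an inventory ordered like this, the first-fit rule lands on sda regardless of which device was meant to hold the OS, which matches the failure reported here.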
Comment 12 Lucas Alvares Gomes 2016-10-03 11:52:11 EDT
(In reply to Chris Dearborn from comment #11)
> By default, Ironic installs the OS on the first disk of size >= 4 gigs.  As
> a result, it's trying to install the OS on sda, when it should be installing
> on sdm or sdl.  I'm currently working on a patch to JS 5.0 that will tell
> ironic which drive to install on.

Hi Chris, Goneri,

We do have a mechanism in place to tell Ironic which disk to pick when deploying the node. It's called "root device hints" [0]. Can you please try it out and see if it works for you?

[0] http://docs.openstack.org/project-install-guide/baremetal/draft/advanced.html#specifying-the-disk-for-deployment-root-device-hints
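
As a rough sketch of what setting such a hint could look like, the snippet below builds a JSON Patch document for a node update. It assumes the hints are stored under the `root_device` key of the node's `properties`, as the linked guide describes; the helper name and the `size` value are hypothetical, and the snippet only builds the document without talking to any API.

```python
import json


def root_device_patch(hints):
    """Build a JSON Patch (RFC 6902) document that sets root device hints."""
    # Assumption: hints live at /properties/root_device on the node.
    return [{"op": "add", "path": "/properties/root_device", "value": dict(hints)}]


# Hypothetical hint: select the disk by its size (value chosen for illustration).
patch = root_device_patch({"size": 200})
print(json.dumps(patch, indent=2))
```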
Comment 13 Chris Dearborn 2016-10-05 18:05:09 EDT
Hey Lucas,

so since this defect was created, we have started using root device hints and have gotten things working.  At the time, the only way to identify the OS disk was by size (most of the other hints are not supported by the iDRAC), so we have a pretty ugly hack in place: when we create the OS RAID volume, we make sure to create it with a unique size.  Since then, device name was added to root device hints (thanks for that!).  We will be switching our RAID creation over to use the new Ironic RAID API, and when we do that, we'll look at switching from using size to using the device name for root device.
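
To illustrate why a unique size is needed when size is the only usable hint, here is a minimal sketch of exact matching. It is not Ironic's implementation: the `Disk` class, `match_exact`, and the inventory are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Disk:
    name: str
    size_gib: int


def match_exact(disks, hints):
    """Return the disks whose attributes equal every hint value exactly."""
    return [d for d in disks if all(getattr(d, k) == v for k, v in hints.items())]


# Hypothetical inventory: two identical SSDs plus an OS volume with a unique size.
disks = [Disk("/dev/sda", 200), Disk("/dev/sdb", 200), Disk("/dev/sdm", 557)]

ambiguous = match_exact(disks, {"size_gib": 200})     # two candidates
unique = match_exact(disks, {"size_gib": 557})        # exactly one candidate
by_name = match_exact(disks, {"name": "/dev/sdm"})    # device-name hint instead

print([d.name for d in ambiguous], [d.name for d in unique], [d.name for d in by_name])
```

A size shared by several disks leaves the choice ambiguous, while a unique size or a device name pins down a single disk.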

The end result is that I believe you can close this bug.
Comment 14 Lucas Alvares Gomes 2016-10-06 09:02:32 EDT
(In reply to Chris Dearborn from comment #13)
> Hey Lucas,
> 
> so since this defect was created, we have started using root device hints
> and have gotten things working.  At the time, the only way to identify the
> OS disk was by size (most of the other hints are not supported by the
> iDRAC), so we have a pretty ugly hack in place: when we create the OS RAID
> volume, we make sure to create it with a unique size.  Since then, device
> name was added to root device hints (thanks for that!).  We will be
> switching our RAID creation over to use the new Ironic RAID API, and when we
> do that, we'll look at switching from using size to using the device name
> for root device.
> 
> The end result is that I believe you can close this bug.

Thanks for the reply Chris!

Yes, I totally agree that root device hints were (and still are) painful to use. The mechanism is too ossified because it only does exact matching of the values (like the exact size, as you mentioned). There's some work going on in this cycle (Ocata) to make it a bit more flexible [0]; I hope it will come in handy!

[0] https://bugs.launchpad.net/ironic/+bug/1561137 
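
For illustration only, the difference between exact and more flexible matching might look like the sketch below. The operator syntax and the `matches` helper are hypothetical and are not necessarily what the linked work adopts.

```python
import operator

# Hypothetical comparison operators for illustration.
_OPS = {">=": operator.ge, "<=": operator.le, "==": operator.eq}


def matches(value, hint):
    """Compare value against a hint such as '>= 500'; a bare number means equality."""
    parts = str(hint).split()
    if len(parts) == 2 and parts[0] in _OPS:
        return _OPS[parts[0]](value, int(parts[1]))
    return value == int(hint)


print(matches(557, 557))        # exact match, as hints work today
print(matches(557, ">= 500"))   # flexible match on a lower bound
```

With a lower-bound hint, the OS volume would no longer need an exactly known, unique size.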

Cheers,
Lucas