Bug 1253959 - Disk ordering on overcloud deployment and discovery kernel differs from installed RHEL 7.1 kernel
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: rhosp-director
Version: 7.0 (Kilo)
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Target Release: 10.0 (Newton)
Assignee: Dmitry Tantsur
QA Contact: Shai Revivo
URL:
Whiteboard:
Depends On:
Blocks: 1191185 1243520 1290377
Reported: 2015-08-16 04:24 UTC by Karthik Prabhakar
Modified: 2019-09-12 08:46 UTC
CC List: 15 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-08-22 11:01:09 UTC
Target Upstream Version:
Embargoed:



Description Karthik Prabhakar 2015-08-16 04:24:11 UTC
Description of problem:
The discovery and deployment kernels order disks differently from the RHEL 7.1 kernel that is eventually installed.


Version-Release number of selected component (if applicable):
python-rdomanager-oscplugin-0.0.8-44.el7ost.noarch

How reproducible:
Requires hardware with multiple disk types (e.g., SAS drives and SSDs)

Steps to Reproduce:
1. Deploy undercloud

2. Desired disk ordering (based on a standard RHEL 7.1 kernel detecting SAS drives before SSDs):
/dev/sda (sas): root/boot
/dev/sdb...sdg (sas): ceph data disks
/dev/sdk, /dev/sdl (ssd): ceph journal partitions

3. Customize hieradata ceph.yaml with:
ceph::profile::params::osd_journal_size: 0
ceph::profile::params::osds:
    '/dev/sdb':
        journal: '/dev/disk/by-partlabel/jnl1'
    '/dev/sdc':
        journal: '/dev/disk/by-partlabel/jnl2'
    '/dev/sdd':
        journal: '/dev/disk/by-partlabel/jnl3'
    '/dev/sde':
        journal: '/dev/disk/by-partlabel/jnl4'
    '/dev/sdf':
        journal: '/dev/disk/by-partlabel/jnl5'
    '/dev/sdg':
        journal: '/dev/disk/by-partlabel/jnl6'
    '/dev/sdh':
        journal: '/dev/disk/by-partlabel/jnl7'
    '/dev/sdi':
        journal: '/dev/disk/by-partlabel/jnl8'
    '/dev/sdj':
        journal: '/dev/disk/by-partlabel/jnl9'

4. The OSP-Director discovery and deploy kernels find disks in a different order (SSDs before SAS drives) than the RHEL 7.1 kernel that is eventually deployed.
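
The hieradata above follows a regular pattern: each data disk is paired with a numbered journal partition label. A minimal Python sketch that generates the same mapping is shown here, assuming the device letters and `jnl` label prefix from the report; the function names `build_osd_map` and `render_hieradata` are hypothetical helpers, not part of Director or puppet-ceph. Note that the journal side uses stable `/dev/disk/by-partlabel/` paths, while the data-disk keys still depend on kernel device naming, which is the fragile part this bug is about.

```python
def build_osd_map(data_devices, label_prefix="jnl"):
    """Pair each data device with a journal partition label, numbered from 1."""
    return {
        dev: {"journal": f"/dev/disk/by-partlabel/{label_prefix}{i}"}
        for i, dev in enumerate(data_devices, start=1)
    }


def render_hieradata(osd_map, journal_size=0):
    """Render the mapping in the same layout as the ceph.yaml snippet above."""
    lines = [
        f"ceph::profile::params::osd_journal_size: {journal_size}",
        "ceph::profile::params::osds:",
    ]
    for dev, opts in osd_map.items():
        lines.append(f"    '{dev}':")
        lines.append(f"        journal: '{opts['journal']}'")
    return "\n".join(lines)


# Data disks /dev/sdb through /dev/sdj, as listed in the report.
devices = [f"/dev/sd{letter}" for letter in "bcdefghij"]
osd_map = build_osd_map(devices)
print(render_hieradata(osd_map))
```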

Actual results:
[heat-admin@overcloud-cephstorage-0 ~]$ sudo ceph-disk list
WARNING:ceph-disk:Old blkid does not support ID_PART_ENTRY_* fields, trying sgdisk; may not correctly identify ceph volumes with dmcrypt
/dev/sda other, unknown
/dev/sdb :
 /dev/sdb1 ceph data, prepared, cluster ceph, osd.24
/dev/sdc :
 /dev/sdc1 ceph data, prepared, cluster ceph, osd.4
/dev/sdd :
 /dev/sdd1 ceph data, prepared, cluster ceph, osd.10
/dev/sde :
 /dev/sde1 ceph data, prepared, cluster ceph, osd.7
/dev/sdf :
 /dev/sdf1 ceph data, prepared, cluster ceph, osd.13
/dev/sdg :
 /dev/sdg1 ceph data, active, cluster ceph, osd.26
/dev/sdh :
 /dev/sdh1 ceph data, active, cluster ceph, osd.16
/dev/sdi :
 /dev/sdi1 ceph data, active, cluster ceph, osd.0
/dev/sdj :
 /dev/sdj1 ceph data, active, cluster ceph, osd.21
/dev/sdk :
 /dev/sdk1 other, iso9660
 /dev/sdk2 other, xfs, mounted on /
/dev/sdl :
 /dev/sdl1 other
 /dev/sdl2 other
 /dev/sdl3 other
 /dev/sdl4 other
 /dev/sdl5 other
/dev/sdm other, unknown
/dev/sdn other, unknown
/dev/sdo other, unknown
/dev/sr0 other, udf
/dev/sr1 other, unknown
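
To make the misplacement easier to spot, the listing above can be scanned for the partition mounted on `/`. The following is a rough Python sketch, assuming the line format shown in this report (it is not a general `ceph-disk` parser); `SAMPLE` is an abbreviated excerpt of the output above, and the helper names are hypothetical.

```python
import re

# Abbreviated excerpt of the `ceph-disk list` output from this report.
SAMPLE = """\
/dev/sda other, unknown
/dev/sdb :
 /dev/sdb1 ceph data, prepared, cluster ceph, osd.24
/dev/sdk :
 /dev/sdk1 other, iso9660
 /dev/sdk2 other, xfs, mounted on /
/dev/sdl :
 /dev/sdl1 other
"""


def find_root_partition(listing):
    """Return the partition reported as mounted on /, or None if absent."""
    for line in listing.splitlines():
        match = re.match(r"\s*(/dev/\S+) .*mounted on /$", line.rstrip())
        if match:
            return match.group(1)
    return None


def parent_disk(partition):
    """Strip the trailing partition number (valid for sdX-style names only)."""
    return re.sub(r"\d+$", "", partition)


root = find_root_partition(SAMPLE)
print(root, parent_disk(root))
```

In this report the root filesystem ends up on /dev/sdk rather than the intended /dev/sda.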

Expected results:

The discovery/deployment kernel finds the disks in the same order as the installed RHEL 7.1 kernel.

Additional info:

Comment 3 chris alfonso 2015-08-18 18:39:51 UTC
Karthik, what is the net impact of the ordering change?

Comment 4 Dmitry Tantsur 2015-08-19 11:20:08 UTC
Is it somehow related to https://bugzilla.redhat.com/show_bug.cgi?id=1252437 ?

Comment 5 Karthik Prabhakar 2015-08-21 06:41:36 UTC
Yes, it may be similar to BZ#1252437.

This is not just an issue with Ceph deployments by OSP-D, but potentially on compute or controller nodes as well if they happen to have multiple disk controllers.

The impact is that the OS boot/root gets installed on the wrong disk, which leads to a sub-optimal deployed config. If the boot policy on the deployed node is set to boot from the disk intended for OS boot/root, the node will not boot properly after deployment (and needs a manual reset of the boot policy).

Comment 7 Karthik Prabhakar 2015-10-03 18:16:18 UTC
A fix would be to have Director use the Ironic root-device-hints blueprint implementation which now appears to be in Kilo:
http://specs.openstack.org/openstack/ironic-specs/specs/kilo-implemented/root-device-hints.html
https://blueprints.launchpad.net/ironic/+spec/root-device-hints

Comment 9 Dmitry Tantsur 2015-10-08 15:37:44 UTC
Hi and sorry for the delay. Yes, root device hints should be the answer in your case, at least as far as the OS device is concerned. Is there anything else we could do here to help you?

Comment 10 Karthik Prabhakar 2015-10-10 17:37:46 UTC
Any tips on the syntax for using root device hints with the overcloud image? According to BZ#1252437 the fix was in the 10-8 errata.

Comment 11 Dmitry Tantsur 2015-10-12 08:21:24 UTC
Looks like for your case you'll need to do

 ironic node-update UUID add properties/root_device='{"size": NNN}'

before deploy, where NNN is replaced with the size of the root device. You can also use vendor, model, and everything else stated in http://specs.openstack.org/openstack/ironic-specs/specs/kilo-implemented/root-device-hints.html. I can't give a more specific example, as I don't know the exact properties that might be used in your case.

Note that BZ#1252437 was about introspection, not about deploy. Root device hints for introspection are not supported in OSPd7 as of now, so if properties/local_gb is incorrect for you, you might need to fix it manually, e.g.

 ironic node-update UUID replace properties/local_gb=NNN

Hope that helps.
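
For scripting this across many nodes, the hint argument can be assembled programmatically. Below is a minimal Python sketch that only builds and prints the command line quoted above without running it; the helper name `root_device_hint_args` is hypothetical, and the node identifier `UUID` and size `100` are placeholders, not values from this bug.

```python
import json
import shlex


def root_device_hint_args(node_uuid, hints):
    """Build the argv for the `ironic node-update` call quoted above."""
    hint_json = json.dumps(hints, sort_keys=True)
    return [
        "ironic", "node-update", node_uuid,
        "add", f"properties/root_device={hint_json}",
    ]


# Placeholder values: substitute the real node UUID and root disk size.
args = root_device_hint_args("UUID", {"size": 100})

# Print a shell-safe rendering for review instead of executing it.
print(" ".join(shlex.quote(a) for a in args))
```

Keeping the hints as a dictionary makes it straightforward to add `vendor` or `model` keys where size alone does not identify the disk uniquely.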

Comment 15 Mike Burns 2016-04-07 20:47:27 UTC
This bug did not make the OSP 8.0 release.  It is being deferred to OSP 10.

Comment 18 Dmitry Tantsur 2016-08-22 11:01:09 UTC
Hello!

It seems like using root device hints is the correct solution in this case. Ordering of disks is not guaranteed and should not be relied upon.
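
The idea behind selecting a disk by its attributes rather than by its name can be illustrated with a small Python sketch. This is not Ironic's implementation; the inventory below is entirely hypothetical (vendor strings, sizes, and names are made up), and `pick_root_disk` is an illustrative helper. It shows the same three disks as enumerated by two kernels in different orders, with the hints selecting the same physical disk in both cases.

```python
def pick_root_disk(disks, hints):
    """Return the single disk whose attributes equal every hint."""
    matches = [
        d for d in disks
        if all(d.get(key) == value for key, value in hints.items())
    ]
    if len(matches) != 1:
        raise ValueError(f"expected exactly one match, found {len(matches)}")
    return matches[0]


# Hypothetical inventory: the same disks, named differently by each kernel.
deploy_kernel_view = [
    {"name": "/dev/sda", "vendor": "ExampleSSD", "size": 200},
    {"name": "/dev/sdb", "vendor": "ExampleSAS", "size": 300},
    {"name": "/dev/sdc", "vendor": "ExampleSAS", "size": 900},
]
installed_kernel_view = [
    {"name": "/dev/sda", "vendor": "ExampleSAS", "size": 300},
    {"name": "/dev/sdb", "vendor": "ExampleSAS", "size": 900},
    {"name": "/dev/sdc", "vendor": "ExampleSSD", "size": 200},
]

hints = {"vendor": "ExampleSAS", "size": 300}
for view in (deploy_kernel_view, installed_kernel_view):
    disk = pick_root_disk(view, hints)
    print(disk["name"], disk["vendor"], disk["size"])
```

The device name differs between the two views, but the selected disk has the same vendor and size each time.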

