Bug 1253959

Summary: Disk ordering on overcloud deployment and discovery kernel differs from installed RHEL 7.1 kernel
Product: Red Hat OpenStack Reporter: Karthik Prabhakar <kprabhak>
Component: rhosp-director Assignee: Dmitry Tantsur <dtantsur>
Status: CLOSED NOTABUG QA Contact: Shai Revivo <srevivo>
Severity: high Docs Contact:
Priority: high    
Version: 7.0 (Kilo) CC: chorn, dhill, dmesser, dtantsur, ealcaniz, gdrapeau, jdonohue, kamil.rogon, kprabhak, mburns, mcornea, mtessun, racedoro, rhel-osp-director-maint, vumrao
Target Milestone: ---   
Target Release: 10.0 (Newton)   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2016-08-22 11:01:09 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1191185, 1243520, 1290377    

Description Karthik Prabhakar 2015-08-16 04:24:11 UTC
Description of problem:
The discovery and deployment kernels order disks differently from the RHEL 7.1 kernel that eventually gets installed.


Version-Release number of selected component (if applicable):
python-rdomanager-oscplugin-0.0.8-44.el7ost.noarch

How reproducible:
Requires hardware with multiple disk types (e.g., SAS and SSDs).

Steps to Reproduce:
1. Deploy undercloud

2. Desired disk ordering (based on a standard RHEL 7.1 kernel detecting SAS drives before SSDs):
/dev/sda (sas): root/boot
/dev/sdb...sdj (sas): ceph data disks
/dev/sdk, /dev/sdl (ssd): ceph journal partitions

3. Customize the hieradata ceph.yaml with:
ceph::profile::params::osd_journal_size: 0
ceph::profile::params::osds:
    '/dev/sdb':
        journal: '/dev/disk/by-partlabel/jnl1'
    '/dev/sdc':
        journal: '/dev/disk/by-partlabel/jnl2'
    '/dev/sdd':
        journal: '/dev/disk/by-partlabel/jnl3'
    '/dev/sde':
        journal: '/dev/disk/by-partlabel/jnl4'
    '/dev/sdf':
        journal: '/dev/disk/by-partlabel/jnl5'
    '/dev/sdg':
        journal: '/dev/disk/by-partlabel/jnl6'
    '/dev/sdh':
        journal: '/dev/disk/by-partlabel/jnl7'
    '/dev/sdi':
        journal: '/dev/disk/by-partlabel/jnl8'
    '/dev/sdj':
        journal: '/dev/disk/by-partlabel/jnl9'

4. The OSP-Director discovery and deploy kernel finds disks in a different order (SSDs before SAS drives) from the eventually deployed RHEL 7.1 kernel.
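As a minimal sketch (a hypothetical helper, not part of OSP-Director), the osds hash in the step 3 hieradata follows a simple pattern, one data disk per journal partition label jnlN, so it can be generated instead of typed by hand. The device list is a parameter because, as this bug shows, the /dev/sdX names can change between kernels.

```python
# Hypothetical helper (not part of OSP-Director): renders the
# ceph::profile::params::osds hieradata, pairing the Nth data disk with
# journal partition label jnlN. Plain strings, so no YAML library is needed.


def osds_hieradata(data_disks, label_prefix="jnl"):
    """Return hieradata text mapping each data disk to a by-partlabel journal."""
    lines = [
        "ceph::profile::params::osd_journal_size: 0",
        "ceph::profile::params::osds:",
    ]
    for n, disk in enumerate(data_disks, start=1):
        lines.append(f"    '{disk}':")
        lines.append(f"        journal: '/dev/disk/by-partlabel/{label_prefix}{n}'")
    return "\n".join(lines)


if __name__ == "__main__":
    # /dev/sdb .. /dev/sdj, matching the hieradata in step 3
    print(osds_hieradata([f"/dev/sd{c}" for c in "bcdefghij"]))
```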

Actual results:
[heat-admin@overcloud-cephstorage-0 ~]$ sudo ceph-disk list
WARNING:ceph-disk:Old blkid does not support ID_PART_ENTRY_* fields, trying sgdisk; may not correctly identify ceph volumes with dmcrypt
/dev/sda other, unknown
/dev/sdb :
 /dev/sdb1 ceph data, prepared, cluster ceph, osd.24
/dev/sdc :
 /dev/sdc1 ceph data, prepared, cluster ceph, osd.4
/dev/sdd :
 /dev/sdd1 ceph data, prepared, cluster ceph, osd.10
/dev/sde :
 /dev/sde1 ceph data, prepared, cluster ceph, osd.7
/dev/sdf :
 /dev/sdf1 ceph data, prepared, cluster ceph, osd.13
/dev/sdg :
 /dev/sdg1 ceph data, active, cluster ceph, osd.26
/dev/sdh :
 /dev/sdh1 ceph data, active, cluster ceph, osd.16
/dev/sdi :
 /dev/sdi1 ceph data, active, cluster ceph, osd.0
/dev/sdj :
 /dev/sdj1 ceph data, active, cluster ceph, osd.21
/dev/sdk :
 /dev/sdk1 other, iso9660
 /dev/sdk2 other, xfs, mounted on /
/dev/sdl :
 /dev/sdl1 other
 /dev/sdl2 other
 /dev/sdl3 other
 /dev/sdl4 other
 /dev/sdl5 other
/dev/sdm other, unknown
/dev/sdn other, unknown
/dev/sdo other, unknown
/dev/sr0 other, udf
/dev/sr1 other, unknown

Expected results:

The discovery/deployment kernel finds the disks in the same order as the installed RHEL 7.1 kernel.

Additional info:

Comment 3 chris alfonso 2015-08-18 18:39:51 UTC
Karthik, what is the net impact of the ordering change?

Comment 4 Dmitry Tantsur 2015-08-19 11:20:08 UTC
Is it somehow related to https://bugzilla.redhat.com/show_bug.cgi?id=1252437 ?

Comment 5 Karthik Prabhakar 2015-08-21 06:41:36 UTC
Yes, it may be similar to BZ#1252437.

This is not just an issue with Ceph deployments by OSP-D; it can also affect compute or controller nodes if they have multiple disk controllers.

The impact is that the OS boot/root gets installed on the wrong disk, which leads to a sub-optimal deployed configuration. If the boot policy on the deployed node is set to boot from the disk intended for OS boot/root, the node will not boot properly after deployment and needs a manual reset of the boot policy.

Comment 7 Karthik Prabhakar 2015-10-03 18:16:18 UTC
A fix would be to have Director use the Ironic root-device-hints blueprint implementation, which now appears to be in Kilo:
http://specs.openstack.org/openstack/ironic-specs/specs/kilo-implemented/root-device-hints.html
https://blueprints.launchpad.net/ironic/+spec/root-device-hints

Comment 9 Dmitry Tantsur 2015-10-08 15:37:44 UTC
Hi and sorry for the delay. Yes, root device hints should be the answer in your case, at least as far as the OS device is concerned. Is there anything else we could do here to help you?

Comment 10 Karthik Prabhakar 2015-10-10 17:37:46 UTC
Any tips on the syntax for using root device hints with the overcloud image? According to BZ#1252437, the fix was in the 10-8 errata.

Comment 11 Dmitry Tantsur 2015-10-12 08:21:24 UTC
It looks like in your case you'll need to run:

 ironic node-update UUID add properties/root_device='{"size": NNN}'

before deploying, where NNN is the size of the root device in GiB. You can also use vendor, model, and the other properties listed in http://specs.openstack.org/openstack/ironic-specs/specs/kilo-implemented/root-device-hints.html. I can't give a more specific example, as I don't know the exact properties that might apply in your case.
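To illustrate why hints sidestep the ordering problem, here is a simplified model of hint matching in which the root disk is chosen by its properties, not by its position in the list. This is not Ironic's actual code; the inventory fields, the size in whole GiB, case-insensitive string comparison, and the first-match rule are all assumptions made for the sketch.

```python
# Simplified model of root device hint matching (NOT Ironic's code).
# Assumptions: each disk is a dict, "size" is in whole GiB, string
# properties compare case-insensitively, and the first matching disk wins.


def matches(disk, hints):
    """True if every hinted property equals the disk's value."""
    for key, wanted in hints.items():
        have = disk.get(key)
        if isinstance(wanted, str) and isinstance(have, str):
            if wanted.lower() != have.lower():
                return False
        elif wanted != have:
            return False
    return True


def pick_root_device(disks, hints):
    """Return the first disk satisfying all hints, or None."""
    return next((d for d in disks if matches(d, hints)), None)


if __name__ == "__main__":
    # Hypothetical inventory in the order a deploy kernel might list it (SSDs first).
    disks = [
        {"name": "/dev/sda", "vendor": "VendorA", "model": "SSD-1", "size": 372},
        {"name": "/dev/sdb", "vendor": "VendorA", "model": "SSD-1", "size": 372},
        {"name": "/dev/sdc", "vendor": "VendorB", "model": "SAS-1", "size": 278},
        {"name": "/dev/sdd", "vendor": "VendorB", "model": "SAS-2", "size": 1117},
    ]
    print(pick_root_device(disks, {"size": 278})["name"])        # /dev/sdc
    print(pick_root_device(disks[::-1], {"size": 278})["name"])  # /dev/sdc
```

Reversing the inventory does not change the result. If several disks match (for example, identical SAS drives of one size), this sketch just returns the first, so a more specific hint such as a serial number is safer.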

Note that BZ#1252437 was about introspection, not about deploy. Root device hints for introspection are not supported in OSPd7 as of now, so if properties/local_gb is incorrect for you, you may need to fix it manually, e.g.:

 ironic node-update UUID replace properties/local_gb=NNN
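Both the size hint and local_gb are expressed in GiB. A minimal sketch, assuming you have the root disk size in bytes (for example from `lsblk -b -d -o NAME,SIZE`), that converts it to whole GiB and builds the two commands above with proper shell quoting. Rounding down and reusing the same value for local_gb are assumptions; check them against what Ironic reports for your node.

```python
import json
import shlex

GIB = 1024 ** 3


def bytes_to_gib(size_bytes):
    """Whole GiB, rounded down (assumption: rounding down is what you want)."""
    return size_bytes // GIB


def node_update_commands(node_uuid, root_disk_bytes):
    """Build the two ironic CLI commands from this comment for one node.
    Reusing the same GiB value for local_gb is only an illustration."""
    gib = bytes_to_gib(root_disk_bytes)
    hint = json.dumps({"size": gib})
    uuid = shlex.quote(node_uuid)
    return [
        f"ironic node-update {uuid} add properties/root_device={shlex.quote(hint)}",
        f"ironic node-update {uuid} replace properties/local_gb={gib}",
    ]


if __name__ == "__main__":
    # A nominal 300 GB disk is 300,000,000,000 bytes, i.e. 279 GiB rounded down.
    for cmd in node_update_commands("UUID", 300_000_000_000):
        print(cmd)
```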

Hope that helps.

Comment 15 Mike Burns 2016-04-07 20:47:27 UTC
This bug did not make the OSP 8.0 release.  It is being deferred to OSP 10.

Comment 18 Dmitry Tantsur 2016-08-22 11:01:09 UTC
Hello!

It seems like using root device hints is the correct solution in this case. Ordering of disks is not guaranteed and should not be relied upon.
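The same caveat applies to the /dev/sdX names used in the Ceph hieradata in the description. A minimal sketch, assuming a udev-managed host where /dev/disk/by-id exists, that lists the persistent names currently pointing at each kernel device name; this can help when picking device paths that do not depend on detection order.

```python
import os


def stable_names(link_dir="/dev/disk/by-id"):
    """Map kernel device names (e.g. 'sda', 'sda1') to the persistent
    symlink names in link_dir that currently resolve to them."""
    mapping = {}
    if not os.path.isdir(link_dir):
        return mapping
    for entry in sorted(os.listdir(link_dir)):
        # udev links are relative (e.g. ../../sda); realpath resolves them
        target = os.path.realpath(os.path.join(link_dir, entry))
        mapping.setdefault(os.path.basename(target), []).append(entry)
    return mapping


if __name__ == "__main__":
    for kname, links in sorted(stable_names().items()):
        print(kname, " ".join(links))
```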