Bug 1261886 - [OpenStack Director] Deployment fails due to virtual media attached to host
Status: CLOSED WONTFIX
Product: Red Hat OpenStack
Classification: Red Hat
Component: rhosp-director
Version: 7.0 (Kilo)
Hardware: All Linux
Priority: high Severity: medium
Target Milestone: ---
Target Release: 10.0 (Newton)
Assigned To: Lucas Alvares Gomes
QA Contact: Shai Revivo
Keywords: Triaged
Depends On:
Blocks:
Reported: 2015-09-10 07:52 EDT by Joe Talerico
Modified: 2016-10-14 12:28 EDT
CC List: 7 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-10-14 12:28:19 EDT
Type: Bug


Description Joe Talerico 2015-09-10 07:52:04 EDT
Description of problem:
When attempting to deploy, we had 4 hosts drop into dracut for what looked to be storage issues. It seems that the deployment defaults to installing on whatever sda/vda is. In this case, that was virtual media that had been attached to the host at one point.

How reproducible:
100%


Steps to Reproduce:
1. Hosts have to have a ironic node-update <node-uuid> add properties/root_device='{"key": "value"}'virtual drive attached (we saw this with Dell)
2. Run through an OSPD deployment

Actual results:
Host drops into dracut.

Expected results:
1) Present the user information about the failure (dump the vendor/model of the storage device -- focusing on Storage, but there might be other useful information to present.)

2) Preferably determine a way to build some intelligence into which disk the installer chooses to install on. 

Additional info:
We attempted to provide a hint on where to install, using `ironic node-update <node-uuid> add properties/root_device='{"model": "ata"}'`; however, the deployment would only stall at that point.
Comment 4 Mike Burns 2016-04-07 16:50:54 EDT
This bug did not make the OSP 8.0 release.  It is being deferred to OSP 10.
Comment 6 Lucas Alvares Gomes 2016-05-19 06:20:32 EDT
Hi Joe,

(In reply to Joe Talerico from comment #0)
> Description of problem:
> When attempting to deploy we had 4 hosts drop into dracut for what looked to
> be storage issues. It seems that the deployment defaults to whatever sda/vda
> is to install on, in this case, it was virtual media that was attached to
> the host at one point. 
> 
> How reproducible:
> 100%
> 
> 
> Steps to Reproduce:
> 1. Hosts have to have a ironic node-update <node-uuid> add
> properties/root_device='{"key": "value"}'virtual drive attached (we saw this
> with Dell)
> 2. Run through a OSPD Deployment
> 

Not sure if I follow, so the root_device was pointing to the virtual media device? 

> Actual results:
> Host drops into dracut.
> 
> Expected results:
> 1) Present the user information about the failure (dump the vendor/model of
> the storage device -- focusing on Storage, but there might be other useful
> information to present.)
> 
> 2) Preferably determine a way to build some intelligence into which disk the
> installer chooses to install on. 
> 
> Additional info:
> We attempted to provide hints on where to install the media to using `ironic
> node-update <node-uuid> add properties/root_device='{"model": "ata"}'
> however the deployment would only stall at that point.
Comment 7 Joe Talerico 2016-05-19 19:25:50 EDT
Lucas, from what we saw, yes. It was pointing to the Dell virtual media. This was in a team member's lab; I will add him as NEEDINFO so he can provide more information if needed.
Comment 8 Ben England 2016-05-30 18:58:29 EDT
So we were using an older Dell DRAC which has a virtual CDROM and a virtual flash (not kidding), and if you don't disable these, they get discovered before the real SCSI devices that you want it to use. Someone had left those enabled from a previous test run. So the system disk that we wanted to use in this case was /dev/sdc instead of /dev/sda, for example. I think virtually all x86_64 servers have similar functionality in the BIOS.

Linux has been very clear about this for decades: you CANNOT DEPEND ON DEVICE NAMES TO BE STABLE or MEANINGFUL, full stop. Device names are just determined by order of discovery. And the OpenStack deployment in the yaml files specifies device names only, with no other way to identify correct devices, if I recall correctly.

To identify the correct device target for OSP install, it would be more useful to search for a candidate by its stable attributes, such as size and whether or not it is removable or non-rotational, and to have some sane defaults for this. You could have a choice rule such as "choose the smallest device > 30 GB that is non-removable and rotational" and a fallback choice such as "choose the smallest device > 30 GB that is non-removable". Or the yaml file could let the user specify that introspection should filter out devices with certain strings like "DRAC" or "CDROM" in the vendor or model name. This would allow you to choose the right device more often with less intervention (not having to go into the BIOS on all the systems and change their configuration). For examples of fields that would be stable regardless of order of discovery see:

/sys/block/sd[a-z]*/{removable,size}
/sys/block/sd[a-z]*/device/{model,vendor}
/sys/block/sd[a-z]*/queue/rotational

BTW the rotational field is not always accurate - a MegaRAID controller may pass a non-rotational SSD device as a "Logical Drive" that the controller indicates is rotational (uggh).   But NVM SSDs always show up as non-rotational.
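
The choice rule and fallback described above could be sketched roughly as follows. This is only an illustration, not anything Director or Ironic ships: the `Disk` class, `scan`, `choose_root_disk`, and the `EXCLUDED` list are hypothetical names, the vendor/model filter is applied to both rules as an assumption, and the rotational flag is read from `queue/rotational`, which is where current kernels expose it.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Iterable, List, Optional

SECTOR_BYTES = 512           # /sys/block/<dev>/size counts 512-byte sectors
MIN_BYTES = 30 * 10**9       # "> 30 GB" threshold from the proposed rule
EXCLUDED = ("DRAC", "CDROM")  # example filter strings, matched case-insensitively


@dataclass
class Disk:
    name: str
    size_bytes: int
    removable: bool
    rotational: bool
    vendor: str = ""
    model: str = ""


def _read(path: Path) -> str:
    """Return the stripped contents of a sysfs attribute, or '' if unreadable."""
    try:
        return path.read_text().strip()
    except OSError:
        return ""


def scan(sys_block: Path = Path("/sys/block")) -> List[Disk]:
    """Collect discovery-order-independent attributes for every sd* device."""
    disks = []
    for dev in sorted(sys_block.glob("sd*")):
        disks.append(Disk(
            name=dev.name,
            size_bytes=int(_read(dev / "size") or 0) * SECTOR_BYTES,
            removable=_read(dev / "removable") == "1",
            rotational=_read(dev / "queue" / "rotational") == "1",
            vendor=_read(dev / "device" / "vendor"),
            model=_read(dev / "device" / "model"),
        ))
    return disks


def choose_root_disk(disks: Iterable[Disk]) -> Optional[Disk]:
    """Smallest non-removable disk > 30 GB, preferring rotational ones."""
    def excluded(d: Disk) -> bool:
        label = f"{d.vendor} {d.model}".upper()
        return any(s in label for s in EXCLUDED)

    candidates = [d for d in disks
                  if d.size_bytes > MIN_BYTES and not d.removable and not excluded(d)]
    preferred = [d for d in candidates if d.rotational]
    # Fall back to any non-removable candidate if no rotational one exists.
    pool = preferred or candidates
    return min(pool, key=lambda d: d.size_bytes, default=None)
```

Because the rotational flag can be wrong behind a RAID controller, the sketch only uses it as a preference and never as a hard requirement.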

Also, information about the discovered values of these attributes and the device selected by the above rules should be displayed/logged in summary form, so that an OpenStack sysadmin can see what's going wrong without debugging the install like we did. The user could then retry introspection with improved filters.
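
A summary of that kind might look something like the sketch below. The function name `format_disk_summary`, the column set, and the dictionary keys are all assumptions made for illustration; nothing here reflects an existing Director or Ironic log format.

```python
from typing import Mapping, Optional, Sequence

# Columns shown in the summary; missing values are printed as "?".
FIELDS = ("name", "size_gb", "removable", "rotational", "vendor", "model")


def format_disk_summary(disks: Sequence[Mapping[str, object]],
                        selected: Optional[str]) -> str:
    """Render discovered disk attributes as an aligned table, marking the pick."""
    rows = [FIELDS] + [tuple(str(d.get(f, "?")) for f in FIELDS) for d in disks]
    widths = [max(len(r[i]) for r in rows) for i in range(len(FIELDS))]
    lines = []
    for n, row in enumerate(rows):
        # Row 0 is the header; "*" marks the device chosen as root disk.
        mark = "*" if n > 0 and row[0] == selected else " "
        cells = "  ".join(c.ljust(w) for c, w in zip(row, widths))
        lines.append((mark + " " + cells).rstrip())
    lines.append(f"selected root device: {selected or 'none (no candidate matched)'}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_disk_summary(
        [
            {"name": "sda", "size_gb": 1, "removable": True,
             "rotational": True, "vendor": "iDRAC", "model": "Virtual CD"},
            {"name": "sdc", "size_gb": 300, "removable": False,
             "rotational": True, "vendor": "DELL", "model": "PERC H710"},
        ],
        selected="sdc",
    ))
```

Printing every discovered device, not just the selected one, is the point: a virtual CDROM sitting at sda is then visible at a glance.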
Comment 9 Dmitry Tantsur 2016-10-14 12:28:19 EDT
Hi! Since the move to IPA, we've changed the logic to detect the default device. However, one should not rely on it. You have to use root device hints if you have more than one disk device, regardless of their nature.
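
As a rough sketch of setting such a hint, the helper below builds the same `ironic node-update` command line that appears in the description. The function name `root_device_command` and the `ASSUMED_HINT_KEYS` set are hypothetical; which hint keys are accepted depends on the Ironic release, so check the set against the deployed version. A specific hint such as a serial number is likely to be safer than a generic model string.

```python
import json
import shlex

# Assumption: hint keys accepted by the deployed Ironic release.
ASSUMED_HINT_KEYS = {"model", "vendor", "serial", "wwn", "size"}


def root_device_command(node_uuid: str, hints: dict) -> str:
    """Build a shell-safe `ironic node-update` command setting root_device."""
    if not hints:
        raise ValueError("at least one root device hint is required")
    unknown = set(hints) - ASSUMED_HINT_KEYS
    if unknown:
        raise ValueError(f"unsupported hint keys: {sorted(unknown)}")
    payload = json.dumps(hints, sort_keys=True)
    # Quote the whole key=value argument so the JSON survives the shell.
    return "ironic node-update {} add {}".format(
        shlex.quote(node_uuid),
        shlex.quote("properties/root_device=" + payload),
    )


if __name__ == "__main__":
    # "<node-uuid>" is a placeholder; the serial below is made up.
    print(root_device_command("<node-uuid>", {"serial": "EXAMPLE123"}))
```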

As for better diagnostics, it would be awesome, but to the best of my knowledge, IPMI does not expose virtual media information, and the DRAC driver does not support it either. So we can only know it when we fail.

Now, it might be interesting to have something like "deployment summary" before the actual deployment, but it's going to be a separate and pretty big RFE.
