Red Hat Bugzilla – Bug 1466045
[RFE] Better logic for the default root disk selection
Last modified: 2017-12-19 10:15:56 EST
Created attachment 1292701
tarball containing ironic discovery results
Description of problem:
The Ironic introspection database incorrectly reports the root disk as /dev/nvme0n1 when it is actually /dev/sda.
Version-Release number of selected component (if applicable):
not sure yet.
Steps to Reproduce:
1. introspect cluster including Dell R610 servers (older model)
2. openstack baremetal node list
3. for each uuid: openstack baremetal introspection data save <uuid>
The 4 Dell R610 servers with an NVMe SSD incorrectly report that SSD as the root disk when I parse the output of the command in step 3. Example:
[stack@gprfc052 introspect_dir]$ jq '.root_disk' c0778747-342a-4abe-9c20-212156615354.params
"model": "INTEL SSDPEDMD400G4",
It should show info for /dev/sda as root device.
I'll attach a tarball with the files containing the output of step 3 above.
This bugzilla has been removed from the release and needs to be reviewed and Triaged for another Target Release.
Hi! Why do you think it should use /dev/sda? Have you used root device hints to point at it? Otherwise, ironic-inspector will choose the disk following its internal logic, which does not have to match your expectations.
Dmitry, sigh, you are correct. Introspection guesses that the smallest disk in the config is the root disk; in this case it was the 400-GB /dev/nvme0n1. The real system disk was 1 TB and the OSD data disks were 500 GB. Ouch. Furthermore, if you run
# openstack baremetal introspection data save $uuid | jq '.extra.disk'
you see that it does not include /dev/nvme0n1 in this list, presumably because it thinks it is a system disk.
Thanks for your clarification. I think I understand why this happened, the question is: should it happen? How well does this system-disk-guessing heuristic work in practice? If it isn't correct almost all the time, what's the point of guessing the system disk if we frequently guess wrong? Guessing wrong will result in lots of failed deployments. We need to make this easier to get right the first time.
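For reference, the smallest-disk guess described above can be sketched as follows. This is a minimal illustration, not the actual ironic-inspector or IPA code: the `Disk` type, the `smallest_disk` helper, and the number of 500 GB disks are assumptions; only the 400 GB, 500 GB, and 1 TB sizes come from this report.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Disk:
    name: str
    size_gb: int
    rotational: bool


def smallest_disk(disks):
    """Return the disk with the smallest size (ties broken by name)."""
    return min(disks, key=lambda d: (d.size_gb, d.name))


# Illustrative inventory; the count of 500 GB OSD disks is assumed.
disks = [
    Disk("/dev/sda", 1000, True),       # real system disk
    Disk("/dev/sdb", 500, True),        # OSD data disk
    Disk("/dev/sdc", 500, True),        # OSD data disk
    Disk("/dev/nvme0n1", 400, False),   # NVMe SSD
]

print(smallest_disk(disks).name)  # /dev/nvme0n1
```

With this inventory the guess lands on the NVMe SSD rather than the 1 TB system disk, which matches the behaviour reported here.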
Another way these heuristics get us into trouble: older Dell R610s had virtual CD-ROM devices and virtual flash devices that, if defined, would show up as the smallest devices. Therefore RHOSP 7 (Kilo) would try to deploy to them. Of course no one needs these now, but older servers may still have them.
Maybe it would be better to leave the root_disk uninitialized, and force the user to provide it before a deployment could go forward. For example, the user knows the kind of hardware being deployed on and could provide hints like "choose a system disk that is rotational and is of size 1000 GB", and OOO could search introspection DB, and print each UUID + change the root_disk field where there was a match. This process only has to be done once, regardless of how many deployments are done on the same nodes, as long as the undercloud node survives.
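A rough sketch of the search proposed above, assuming the introspection data has already been reduced to a simple per-node list of disks. The data layout, the node IDs, and both function names are hypothetical and do not reflect the real introspection schema or any existing TripleO command.

```python
def match_root_disk(disks, rotational, size_gb):
    """Return the single disk matching both hints, or None if zero or several match."""
    matches = [d for d in disks
               if d["rotational"] == rotational and d["size_gb"] == size_gb]
    return matches[0] if len(matches) == 1 else None


def assign_root_disks(nodes, rotational, size_gb):
    """Map each node ID to the matched disk name, or None when the hint does not resolve."""
    result = {}
    for node_id, disks in nodes.items():
        disk = match_root_disk(disks, rotational, size_gb)
        result[node_id] = disk["name"] if disk else None
    return result


# Hypothetical per-node inventories keyed by placeholder node IDs.
nodes = {
    "node-1": [
        {"name": "/dev/sda", "size_gb": 1000, "rotational": True},
        {"name": "/dev/sdb", "size_gb": 500, "rotational": True},
        {"name": "/dev/nvme0n1", "size_gb": 400, "rotational": False},
    ],
    "node-2": [
        {"name": "/dev/sda", "size_gb": 500, "rotational": True},
        {"name": "/dev/sdb", "size_gb": 500, "rotational": True},
    ],
}

for node_id, root in assign_root_disks(nodes, rotational=True, size_gb=1000).items():
    print(node_id, root or "no unique match, needs manual selection")
```

Returning None for ambiguous or missing matches keeps the decision with the user instead of falling back to a guess.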
A better guess would be smallest rotational device, I think, although that would have worked terribly in this config as well - it might work for most production-class hardware configurations, where OSD drives are usually bigger than internal system disk.
One clue is the number of drives of a given size. In this case, there is only one HDD that is 1 TB, and the rest of the HDDs are all 500 GB; this is a hint that the 500-GB drives are intended for something other than the system disk.
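The two alternative guesses discussed above (smallest rotational device, and the rotational device whose size is unique) can be compared with a short sketch. The inventory below is illustrative: the sizes come from this report, but the number of 500 GB disks and both function names are assumptions.

```python
from collections import Counter

# (name, size_gb, rotational); the count of 500 GB disks is assumed.
DISKS = [
    ("/dev/sda", 1000, True),
    ("/dev/sdb", 500, True),
    ("/dev/sdc", 500, True),
    ("/dev/nvme0n1", 400, False),
]


def smallest_rotational(disks):
    """Name of the smallest rotational disk, or None if there is none."""
    rotational = [d for d in disks if d[2]]
    best = min(rotational, key=lambda d: (d[1], d[0]), default=None)
    return best[0] if best else None


def unique_size_rotational(disks):
    """Name of the only rotational disk whose size no other rotational disk shares."""
    rotational = [d for d in disks if d[2]]
    counts = Counter(size for _, size, _ in rotational)
    unique = [d for d in rotational if counts[d[1]] == 1]
    return unique[0][0] if len(unique) == 1 else None


print(smallest_rotational(DISKS))     # /dev/sdb
print(unique_size_rotational(DISKS))  # /dev/sda
```

In this configuration the smallest-rotational guess picks an OSD data disk, while the unique-size clue picks the real system disk. The unique-size variant returns None whenever the clue is ambiguous.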
We need to better define the request before we can consider this for inclusion in Queens.
Renaming the RFE to what it really asks for.
To be honest, I think we should just make root device hints required for TripleO. Changing the defaults is breaking, and it will surely start a big flame war, as everyone has their own idea of defaults ;)
We could probably have several named strategies in IPA, and then a kernel command line option to pick one of them. That would allow us to keep the default as it is, while changing it for TripleO specifically.
That being said, I don't believe our team will have time to work on it in the near future.
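The idea of named strategies selected by a kernel command line option could look roughly like the sketch below. Nothing here exists in IPA as far as this report shows: the option name `ipa-root-disk-strategy`, the strategy names, and the disk layout are all hypothetical.

```python
# Hypothetical option name; it is not defined anywhere in this report.
OPTION = "ipa-root-disk-strategy"


def smallest(disks):
    """Smallest disk by size (ties broken by name), or None for an empty list."""
    return min(disks, key=lambda d: (d["size_gb"], d["name"]), default=None)


def smallest_rotational(disks):
    """Smallest disk among rotational devices only."""
    return smallest([d for d in disks if d["rotational"]])


STRATEGIES = {
    "smallest": smallest,
    "smallest-rotational": smallest_rotational,
}


def strategy_from_cmdline(cmdline, default="smallest"):
    """Return the strategy named on the command line, or the default if absent or unknown."""
    for token in cmdline.split():
        key, _, value = token.partition("=")
        if key == OPTION and value in STRATEGIES:
            return value
    return default


def pick_root_disk(disks, cmdline):
    """Apply the strategy selected by the command line to the disk list."""
    return STRATEGIES[strategy_from_cmdline(cmdline)](disks)


disks = [
    {"name": "/dev/sda", "size_gb": 1000, "rotational": True},
    {"name": "/dev/sdb", "size_gb": 500, "rotational": True},
    {"name": "/dev/nvme0n1", "size_gb": 400, "rotational": False},
]

print(pick_root_disk(disks, "quiet")["name"])
print(pick_root_disk(disks, "quiet ipa-root-disk-strategy=smallest-rotational")["name"])
```

Because the default stays `smallest` when the option is missing, existing deployments would keep their current behaviour, and only a deployment that sets the option would change.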