Created attachment 1292701 [details]
tarball containing ironic discovery results

Description of problem:
Ironic introspection database reports incorrectly that root disk is /dev/nvme0n1 when it is /dev/sda.

Version-Release number of selected component (if applicable):
python-ironicclient-1.11.1-1.el7ost.noarch
openstack-ironic-api-7.0.1-1.el7ost.noarch
python-ironic-lib-2.5.2-1.el7ost.noarch
openstack-ironic-common-7.0.1-1.el7ost.noarch
python-ironic-inspector-client-1.11.0-1.el7ost.noarch
puppet-ironic-10.4.0-3.el7ost.noarch
openstack-ironic-conductor-7.0.1-1.el7ost.noarch

How reproducible:
not sure yet.

Steps to Reproduce:
1. introspect cluster including Dell R610 servers (older model)
2. openstack baremetal node list
3. for each uuid: openstack baremetal introspection data save <uuid>

Actual results:
The 4 Dell R610 servers with NVM SSD show this, incorrectly, when I parse the output of command in step 3. Example:

[stack@gprfc052 introspect_dir]$ jq '.root_disk' c0778747-342a-4abe-9c20-212156615354.params
{
  "size": 400088457216,
  "serial": "PHFT548200SB400BGN",
  "rotational": false,
  "vendor": null,
  "name": "/dev/nvme0n1",
  "wwn_vendor_extension": null,
  "hctl": null,
  "wwn_with_extension": null,
  "model": "INTEL SSDPEDMD400G4",
  "wwn": null
}

Expected results:
It should show info for /dev/sda as root device.

Additional info:
I'll attach a tarball with the files containing the output of step 3 above.
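For anyone checking many nodes at once, the per-file jq call in the report could be replaced by something like the sketch below. It assumes the output of step 3 was saved as one `<uuid>.params` file per node in a single directory, as the example prompt in the report suggests; the function name `summarize_root_disks` is made up for illustration.

```python
import json
from pathlib import Path


def summarize_root_disks(directory):
    """Map each node UUID (the file stem) to its reported root disk.

    Assumes every *.params file holds the JSON written by
    "openstack baremetal introspection data save <uuid>".
    """
    summary = {}
    for path in sorted(Path(directory).glob("*.params")):
        data = json.loads(path.read_text())
        # root_disk may be absent or null; fall back to an empty dict.
        root = data.get("root_disk") or {}
        summary[path.stem] = (root.get("name"), root.get("size"))
    return summary
```

Calling `summarize_root_disks("introspect_dir")` returns a dict of UUID to (device name, size in bytes), which makes nodes with an unexpected root device easy to spot.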
This bugzilla has been removed from the release and needs to be reviewed and Triaged for another Target Release.
Hi! Why do you think it should use /dev/sda? Have you used root device hints to point at it? Otherwise, ironic-inspector will choose the disk following its internal logic, which does not have to match your expectations.
Dmitry, sigh, you are correct. Introspection guesses that the smallest disk in the config is the root disk; in this case it was the 400-GB /dev/nvme0n1. The real system disk was 1 TB and the OSD data disks were 500 GB. Ouch. Furthermore, if you do

# openstack baremetal introspection data save $uuid | jq '.extra.disk'

you see that this list does not include /dev/nvme0n1, presumably because it thinks it is a system disk.

Thanks for your clarification. I think I understand why this happened; the question is: should it happen? How well does this system-disk-guessing heuristic work in practice? If it isn't correct almost all the time, what's the point of guessing the system disk? Guessing wrong will result in lots of failed deployments. We need to make this easier to get right the first time.

Another way these heuristics get us into trouble: older Dell R610s had virtual CDROM devices and virtual flash devices that, if defined, would come out as the smallest devices. Therefore RHOSP 7 (Havana?) would try to deploy to them. Of course no one needs these now, but older servers may still have them.

Maybe it would be better to leave root_disk uninitialized and force the user to provide it before a deployment can go forward. For example, the user knows the kind of hardware being deployed on and could provide hints like "choose a system disk that is rotational and is of size 1000 GB", and OOO could search the introspection DB, print each matching UUID, and change the root_disk field where there was a match. This process only has to be done once, regardless of how many deployments are done on the same nodes, as long as the undercloud node survives.

A better guess would be the smallest rotational device, I think. That would have worked terribly in this config as well, but it might work for most production-class hardware configurations, where OSD drives are usually bigger than the internal system disk. One clue is the number of drives of a given size. In this case, there is only one 1-TB HDD, and the rest of the HDDs are all 500 GB; this is a hint that the 500-GB drives are intended for something other than the system disk. Opinions?
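To make the comparison concrete, here is a rough Python sketch of the three ideas in the comment above: the smallest-disk guess, matching on user-supplied hints, and the "only one drive of this size" clue. It is not the actual Ironic or ironic-python-agent code. The function names, the 4 GiB minimum size, the 10 GB matching tolerance, and the sizes and count of the /dev/sd* disks are all assumptions made for illustration; only the /dev/nvme0n1 size comes from the report.

```python
from collections import Counter

GB = 1000 ** 3
GIB = 1024 ** 3

# Illustrative inventory. Only the nvme0n1 size is taken from the report;
# the sd* entries are invented to mimic "one 1 TB HDD plus 500 GB HDDs".
EXAMPLE_DISKS = [
    {"name": "/dev/nvme0n1", "size": 400088457216, "rotational": False},
    {"name": "/dev/sda", "size": 1000204886016, "rotational": True},
    {"name": "/dev/sdb", "size": 500107862016, "rotational": True},
    {"name": "/dev/sdc", "size": 500107862016, "rotational": True},
]


def guess_smallest(disks, min_size=4 * GIB):
    """Smallest-disk guess; the min_size cutoff is an assumed safeguard."""
    candidates = [d for d in disks if d["size"] >= min_size]
    return min(candidates, key=lambda d: d["size"], default=None)


def match_hints(disks, rotational=None, size_gb=None, tolerance_gb=10):
    """Return every disk that satisfies the operator-supplied hints."""
    matches = []
    for d in disks:
        if rotational is not None and d["rotational"] != rotational:
            continue
        if size_gb is not None and abs(d["size"] / GB - size_gb) > tolerance_gb:
            continue
        matches.append(d)
    return matches


def unique_size_rotational(disks):
    """Rotational disks whose size occurs exactly once among the HDDs."""
    hdds = [d for d in disks if d["rotational"]]
    counts = Counter(d["size"] for d in hdds)
    return [d for d in hdds if counts[d["size"]] == 1]
```

On the example inventory, `guess_smallest` picks the NVMe card, while both `match_hints(EXAMPLE_DISKS, rotational=True, size_gb=1000)` and `unique_size_rotational(EXAMPLE_DISKS)` single out /dev/sda. Returning a list from the hint matcher is deliberate: zero or several matches would be a reason to stop and ask the user rather than guess.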
We need to better define the request before we can consider this for inclusion in Queens.
Renaming the RFE to what it really asks for. To be honest, I think we should just make root device hints required for TripleO. Changing the defaults is breaking, and it will surely start a big flame war, as everyone has their own idea of defaults ;) We could probably have several named strategies in IPA, and then a kernel command line option to pick one of them. That will allow us to keep the default as it is, while changing it for TripleO specifically. That being said, I don't believe our team will have time to work on it in the near future.
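The named-strategies idea could look roughly like the sketch below. Nothing here exists in IPA: the kernel parameter name `ipa-root-disk-strategy`, the strategy names, and the functions are all hypothetical, chosen only to show how a default could be kept while letting a deployment tool select a different one.

```python
# Hypothetical kernel command line parameter; not a real IPA option.
PARAM = "ipa-root-disk-strategy"


def smallest(disks):
    """Pick the smallest disk of any kind (the behaviour to keep as default)."""
    return min(disks, key=lambda d: d["size"], default=None)


def smallest_rotational(disks):
    """Pick the smallest rotational disk, ignoring SSD and NVMe devices."""
    hdds = [d for d in disks if d["rotational"]]
    return min(hdds, key=lambda d: d["size"], default=None)


STRATEGIES = {
    "smallest": smallest,
    "smallest-rotational": smallest_rotational,
}


def strategy_from_cmdline(cmdline, default="smallest"):
    """Return the strategy named on the kernel command line, else the default."""
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if sep and key == PARAM:
            if value not in STRATEGIES:
                raise ValueError(f"unknown root disk strategy: {value!r}")
            return STRATEGIES[value]
    return STRATEGIES[default]
```

In a real agent the string would come from /proc/cmdline. Note that on the hardware in this report `smallest-rotational` would still choose a 500 GB data disk, so a strategy switch alone would not have fixed this case.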
As for defining the request, it's just common sense that you don't put the operating system on the most expensive, high-performance storage device in the system, preventing it from being used for anything else. Who would object to that?
Hi Dmitry, Any update here?
Sai Malleni thought a mandatory root device hint was an acceptable solution; I could live with that. This problem will continue, though: for example, if your server has both PCI NVM SSD cards and NVDIMM-N pmem modules, Ironic would probably choose the smaller /dev/pmem0 device, which would be a mistake.
It's an RFE, so there will be an update only when it gets targeted to a release. Given the current priorities, it's unlikely to happen any time soon. Ben, we can consider excluding pmem devices explicitly if there is no chance anybody would use them for deployment. Please file a separate bug if you believe that is the case.
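Excluding pmem devices from the guess could be as simple as a name filter applied before the smallest-disk selection. The sketch below is only an illustration of that idea, not a proposed patch: the function names are made up, and matching on the /dev/pmem name prefix is an assumption about how such devices would be recognized.

```python
import re

# Assumes persistent-memory block devices are named /dev/pmem<N>...
PMEM_NAME = re.compile(r"^/dev/pmem\d+")


def without_pmem(disks):
    """Drop every device whose name looks like a pmem block device."""
    return [d for d in disks if not PMEM_NAME.match(d["name"])]


def guess_smallest_non_pmem(disks):
    """Smallest-disk guess over the remaining, non-pmem devices."""
    return min(without_pmem(disks), key=lambda d: d["size"], default=None)
```

With a small /dev/pmem0 and a larger NVMe card in the inventory, the guess falls back to the NVMe card, which is the behaviour before pmem was present rather than a correct root disk choice.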
Given a significant reduction in capacity within the team and the age of this RFE, closing as WONTFIX. Please open a new RFE with updated requirements should there remain a customer need for this feature in the future.