Description of problem:
If the root hint specified when deploying a node does not point at the first boot device, IPMI cannot update the boot device (apparently some IPMI drivers behave this way), and there is a bootable disk earlier in the boot sequence, the node boots from that earlier disk. Even though the overcloud image is pushed to the disk matching the root hint (by WWN/HCTL/etc), what actually boots is another OS. This causes the OC deploy to hang until the timeout and end in a failed state.

Version-Release number of selected component (if applicable):
OSP11 - any version

How reproducible:
always

Steps to Reproduce:
1. Deploy on virt with two disks, where the first disk has a working OS
2. Point the deploy root hint to the second disk
3. Try to deploy the overcloud

Actual results:
OC deploy fails on timeout. Checking the nodes, I can see they booted from the first disk, and if I manually mount the second disk, I can see it contains the correct image. The root hint worked, but the node booted from the first disk.

Expected results:
If the original disks are cleaned upon deploy, the boot sequence can simply continue iterating over all disks until it hits the one that the overcloud is deployed on.

Additional info:
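The failure mode in this report can be illustrated with a small simulation. This is only a sketch under stated assumptions: the `Disk` class, the `deploy` function, and matching the root hint by disk name are all hypothetical stand-ins (real root hints match on WWN, HCTL, etc.), and the firmware is assumed to boot the first bootable disk in a fixed order.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Disk:
    # Hypothetical model of a disk as the firmware sees it.
    name: str
    bootable: bool = False  # True if the disk carries something the firmware can boot
    os: str = ""            # label of whatever OS is installed, empty if none


def pick_root_device(disks: List[Disk], hint_name: str) -> Disk:
    """Return the disk matching the root hint (matched by name here for simplicity)."""
    for disk in disks:
        if disk.name == hint_name:
            return disk
    raise ValueError(f"no disk matches root hint {hint_name!r}")


def first_bootable(disks: List[Disk]) -> Optional[Disk]:
    """Model a fixed boot order: the firmware boots the first bootable disk."""
    for disk in disks:
        if disk.bootable:
            return disk
    return None


def deploy(disks: List[Disk], hint_name: str, clean_first: bool) -> str:
    """Write the image to the hinted disk and return the OS that actually boots."""
    if clean_first:
        # Assumption: cleaning leaves every disk non-bootable.
        for disk in disks:
            disk.bootable = False
            disk.os = ""
    target = pick_root_device(disks, hint_name)
    target.bootable = True
    target.os = "overcloud"
    booted = first_bootable(disks)
    return booted.os if booted else ""
```

With a first disk that holds a working OS and a root hint pointing at the second disk, `deploy(..., clean_first=False)` returns the old OS, while `clean_first=True` lets the boot order fall through to the overcloud disk.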
Currently undercloud.conf has clean_nodes = false by default. The goal would be:
1. Allow specifying whether to clean data + metadata, just metadata, or nothing
2. Default to cleaning metadata
Do I get it right that we need to set clean_nodes=True by default? I agree, but that's an RFE for sure. We can try fitting it into 12, as code-wise it's just one variable flip. But I'd prefer to get buy-in from the TripleO team, possibly via the ML.

> Default to clean metadata

This is done.

> Allow specifying whether to clean data + metadata, just metadata or nothing

This can be done via hiera overrides.
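As a rough illustration of the three requested modes, the sketch below maps each one to priorities for the two Ironic clean steps, `erase_devices` and `erase_devices_metadata`. The `CleanMode` enum and `clean_step_priorities` function are hypothetical, and the non-zero priority values are illustrative assumptions; the only behaviour relied on is that a clean step with priority 0 is disabled. The actual hiera keys used to apply such overrides are not shown because this thread does not name them.

```python
from enum import Enum
from typing import Dict


class CleanMode(Enum):
    # Hypothetical names for the three modes requested in this report.
    NONE = "none"          # clean nothing
    METADATA = "metadata"  # wipe partition tables and similar metadata only
    FULL = "full"          # wipe data and metadata


def clean_step_priorities(mode: CleanMode) -> Dict[str, int]:
    """Return illustrative priorities for the two Ironic clean steps.

    A priority of 0 disables the step; the non-zero values are assumptions
    chosen for this example, not recommended settings.
    """
    if mode is CleanMode.NONE:
        return {"erase_devices": 0, "erase_devices_metadata": 0}
    if mode is CleanMode.METADATA:
        return {"erase_devices": 0, "erase_devices_metadata": 10}
    if mode is CleanMode.FULL:
        return {"erase_devices": 10, "erase_devices_metadata": 99}
    raise ValueError(f"unknown mode: {mode!r}")
```

In practice, the "nothing" mode corresponds more closely to leaving clean_nodes = false than to zeroing both priorities; it is included here only to make the mapping complete.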
Whether or not this is an RFE is a question of semantics, since a more reasonable default setting doesn't always need to be an RFE, and the default is pretty much why the BZ was opened. But that's not really a big deal atm, since we're talking about a corner case. What we really need here is a good documentation note on cleaning that warns customers their disks are going to be cleaned even when the root hint doesn't point at them. We don't want to take destructive actions without fair warning to the user.
Given the upstream CI condition, and the size of our backlog, this has to be moved to 13.
Removing from the release. Given that Queens is about stabilization, this had better be deferred.
Hi all, it seems that enabling cleaning by default meets resistance from both TripleO team and TripleO consumers - see thread http://lists.openstack.org/pipermail/openstack-dev/2018-April/129826.html. Thus, I'm closing this RFE in favour of a separate command to run cleaning: https://bugzilla.redhat.com/show_bug.cgi?id=1573790. We can re-evaluate it if the upstream situation changes.