Because this is getting a bit confusing, here's a summary that I'll reproduce in the other affected BZs:

- BZ 1986406: Documentation for controller replacement
- BZ 1990034 (this BZ): Documentation for FFU with Ironic as the virt driver
- BZ 1977667: Nova fix to allow service deletion when a service has no associated compute nodes
Thinking about this some more, I'm not sure it's a valid bug. Here's our original thought process:

During FFU [1], specifically after steps 3.iv and 3.v (let's call this "state_3_end"), we end up with the bootstrap controller upgraded, and all the compute services in hybrid mode (16.1 containers on RHEL 7). This can be a problem when running the Ironic virt driver. With Ironic, we're running nova-compute on the controllers, so the assumption is that in state_3_end we end up with the bootstrap controller fully upgraded, but the compute services on the other controllers still registered in the database with their old OSP 13 service version numbers. This would then break certain checks that we do, such as the NUMA live migration check.

However, there are open questions and problems with that reasoning:

- Does hybrid state even apply in the Ironic virt driver case, when the compute services are running on the controllers?
- Why are we assuming that in state_3_end the compute services on the other controllers haven't been put in the hybrid state as well?
- NUMA live migration checks don't apply for Ironic, thus making the whole thing moot. We're not aware of any similar checks that happen when the Ironic virt driver is in use.

We need to talk to HardProv QE to get answers to the above questions before doing anything else.

[1] https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/16.1/html-single/framework_for_upgrades_13_to_16.1/index#upgrading-controller-nodes-with-director-deployed-ceph-storage_upgrading-overcloud-standard