Bug 1990034

Summary: [Docs][FFU] controller replacement does not remove compute services before the node is deleted
Product: Red Hat OpenStack Reporter: Irina <igallagh>
Component: documentation Assignee: Irina <igallagh>
Status: CLOSED WORKSFORME QA Contact: Vlada Grosu <vgrosu>
Severity: high Docs Contact:
Priority: high    
Version: 16.1 (Train) CC: alifshit, aschultz, dasmith, eglynn, igallagh, jhakimra, jkreger, jpretori, kchamart, lbezdick, ltamagno, mschuppe, nova-maint, pbabbar, rdiwakar, rhos-docs, sbauza, sgordon, smooney, vromanso
Target Milestone: --- Keywords: Triaged
Target Release: --- Flags: pbabbar: needinfo-
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1986406 Environment:
Last Closed: 2021-09-22 15:02:30 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1986406    
Bug Blocks:    

Comment 1 Artom Lifshitz 2021-08-04 17:04:57 UTC
Because this is getting a bit confusing, here's a summary that I'll reproduce in the other affected BZs.

BZ 1986406: Documentation for controller replacement

BZ 1990034 (this BZ): Documentation for FFU with Ironic as the virt driver

BZ 1977667: Nova fix to allow service deletion when a service has no associated compute nodes.

Comment 2 Artom Lifshitz 2021-08-05 14:32:15 UTC
Thinking about this some more, I'm not sure it's a valid bug.

Here's our original thought process:

During FFU [1], specifically after steps 3.iv and 3.v - let's call this "state_3_end" - we end up with the bootstrap controller upgraded, and all the compute services in hybrid mode (16.1 containers on RHEL 7). This can be a problem when running the Ironic virt driver. With Ironic, we're running nova-compute on the controllers, so the assumption is that in state_3_end we end up with the bootstrap controller fully upgraded, but the compute services on the other controllers still registered in the database with their old OSP 13 service version numbers. This would then break certain checks that we do, such as the NUMA live migration check.
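To make the concern concrete, here is a minimal toy sketch of a "minimum service version" style check. It is not Nova's code: the Service class, the version numbers, and the MIN_VERSION_FOR_FEATURE threshold are all hypothetical and only illustrate why stale service records left by non-upgraded controllers could cause such a check to fail.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical threshold for illustration only; the real value is defined
# inside Nova and is not stated in this bug.
MIN_VERSION_FOR_FEATURE = 40


@dataclass
class Service:
    """Toy stand-in for a row in the services table (hypothetical shape)."""
    host: str
    binary: str
    version: int


def minimum_compute_version(services: List[Service]) -> Optional[int]:
    """Return the lowest version among nova-compute records, or None."""
    versions = [s.version for s in services if s.binary == "nova-compute"]
    return min(versions) if versions else None


def feature_allowed(services: List[Service],
                    required: int = MIN_VERSION_FOR_FEATURE) -> bool:
    """A feature gated on service version passes only if every compute
    service record is at or above the required version."""
    minimum = minimum_compute_version(services)
    return minimum is not None and minimum >= required


# Assumed state_3_end: bootstrap controller upgraded, the other two
# controllers still registered with an old (made-up) version number.
services = [
    Service("controller-0", "nova-compute", 45),
    Service("controller-1", "nova-compute", 30),
    Service("controller-2", "nova-compute", 30),
]

print(minimum_compute_version(services))  # lowest registered version
print(feature_allowed(services))          # gated check with stale records
print(feature_allowed(services[:1]))      # same check, stale records gone
```

In this toy model, a single old record drags the minimum down for the whole deployment, which is the failure mode assumed above.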

However, there are open questions and problems with that reasoning:

- Does hybrid state even apply in the Ironic virt driver case, when the compute services are running on the controllers?
- Why are we assuming that in state_3_end the compute services on the other controllers haven't been put in the hybrid state as well?
- NUMA live migration checks don't apply for Ironic, thus making the whole thing moot. We're not aware of any similar checks that happen when the Ironic virt driver is in use.

We need to talk to HardProv QE to validate the above questions before doing anything else.

[1] https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/16.1/html-single/framework_for_upgrades_13_to_16.1/index#upgrading-controller-nodes-with-director-deployed-ceph-storage_upgrading-overcloud-standard