During a fast forward (FFWD) upgrade from 13 to 16.1, the step that upgrades compute-0 (openstack overcloud upgrade run --stack $STACK --limit $STACK-novacompute-0) finished and created a redundant entry in my list of available compute services. I had the nova-compute hosts compute-0.localdomain and compute-1.localdomain, but when the above step finished I also got novacompute-0.localdomain. I expect that running the same command to upgrade compute-1 will leave me with four entries (novacompute-1.localdomain will be added) when I should really have two.

I think this is more than a cosmetic problem: nova may be confused about what is running where and be unable to act on existing instances, e.g. live migration could break during upgrades.

(osp-test-octopi-zorillas) [stack@osp-test-octopi-zorillas-undercloud ~]$ openstack compute service list
+--------------------------------------+------------------+----------------------------------------------------+----------+---------+-------+----------------------------+
| ID                                   | Binary           | Host                                               | Zone     | Status  | State | Updated At                 |
+--------------------------------------+------------------+----------------------------------------------------+----------+---------+-------+----------------------------+
| f9bebb07-77a6-4106-8234-fb82fd8f972c | nova-scheduler   | osp-test-octopi-zorillas-controller-1.localdomain  | internal | enabled | up    | 2020-06-11T21:57:54.000000 |
| 24cfae43-8169-4209-9ad5-f20184f7eef3 | nova-scheduler   | osp-test-octopi-zorillas-controller-2.localdomain  | internal | enabled | up    | 2020-06-11T21:58:00.000000 |
| 8357f8fa-09e3-4095-bf45-48577c17965c | nova-scheduler   | osp-test-octopi-zorillas-controller-0.localdomain  | internal | enabled | up    | 2020-06-11T21:57:54.000000 |
| 04bc62b2-b75f-4aea-82ba-8153a1a1d770 | nova-consoleauth | osp-test-octopi-zorillas-controller-1.localdomain  | internal | enabled | down  | 2020-06-11T12:00:49.000000 |
| 095a09d1-b022-49ab-9b57-03f8482bb426 | nova-consoleauth | osp-test-octopi-zorillas-controller-0.localdomain  | internal | enabled | down  | 2020-06-09T21:04:23.000000 |
| 3bd05935-00a5-46e4-be09-0ce6c522c8f6 | nova-consoleauth | osp-test-octopi-zorillas-controller-2.localdomain  | internal | enabled | down  | 2020-06-11T16:41:01.000000 |
| 033a61af-4b71-41f5-810c-ef5439437c34 | nova-conductor   | osp-test-octopi-zorillas-controller-1.localdomain  | internal | enabled | up    | 2020-06-11T21:57:59.000000 |
| da759763-4c67-4d50-9fe4-16edfbdb57ab | nova-conductor   | osp-test-octopi-zorillas-controller-2.localdomain  | internal | enabled | up    | 2020-06-11T21:57:59.000000 |
| 0a49739d-4114-4b80-8d63-a0a0c8df7434 | nova-conductor   | osp-test-octopi-zorillas-controller-0.localdomain  | internal | enabled | up    | 2020-06-11T21:57:54.000000 |
| 6e76b986-60af-40ad-bd70-12300caf08b1 | nova-compute     | osp-test-octopi-zorillas-compute-0.localdomain     | nova     | enabled | down  | 2020-06-11T19:52:28.000000 |
| cec1a217-2ae7-421c-9982-d1d2bf3ddb3a | nova-compute     | osp-test-octopi-zorillas-compute-1.localdomain     | nova     | enabled | up    | 2020-06-11T21:57:53.000000 |
| 0456e78c-f378-416d-b92f-53b7f55edf0f | nova-compute     | osp-test-octopi-zorillas-novacompute-0.localdomain | nova     | enabled | up    | 2020-06-11T21:57:57.000000 |
+--------------------------------------+------------------+----------------------------------------------------+----------+---------+-------+----------------------------+
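To make the duplication in the listing above easier to spot, here is a minimal Python sketch that groups nova-compute hosts by node index and flags any index registered under more than one role name. It assumes the host names have already been extracted from the listing; the function name `find_renamed_computes` and the host name pattern are illustrative assumptions, not part of any OpenStack tool.

```python
import re
from collections import defaultdict

# Assumed host name layout: "<stack>-<role>-<index>.<domain>", where the role
# is either the legacy "compute" or the new "novacompute".
HOST_RE = re.compile(r"^(?P<stack>.+)-(?P<role>compute|novacompute)-(?P<index>\d+)\.")


def find_renamed_computes(hosts):
    """Return {index: sorted role names} for indexes seen under more than one role."""
    roles_by_index = defaultdict(set)
    for host in hosts:
        match = HOST_RE.match(host)
        if match:  # hosts that are not compute nodes are ignored
            roles_by_index[int(match.group("index"))].add(match.group("role"))
    return {i: sorted(r) for i, r in roles_by_index.items() if len(r) > 1}


# nova-compute hosts taken from the listing in this report.
hosts = [
    "osp-test-octopi-zorillas-compute-0.localdomain",
    "osp-test-octopi-zorillas-compute-1.localdomain",
    "osp-test-octopi-zorillas-novacompute-0.localdomain",
]
duplicates = find_renamed_computes(hosts)
print(duplicates)
```

With the hosts from this report, only index 0 is flagged, which matches the state after upgrading the first compute node.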
Bug 1563866 is back, and it will keep coming back until we stop carrying this change as a downstream-only patch.
Instead of carrying the change downstream yet again, can we write a procedure to switch the compute names and do the removal either in OSP13 or during FFWD 13->16? If we go with FFWD 13->16, we will need a procedure for renaming in both directions so that we can support rollback.
We would like Compute team to propose way out of this rabbit hole.
(In reply to Lukas Bezdicka from comment #3)
> We would like Compute team to propose way out of this rabbit hole.

I mean, don't rename the service? Sorry for sounding flippant, but what's the purpose/use case behind renaming the compute services?
(In reply to Artom Lifshitz from comment #4)
> I mean, don't rename the service? Sorry for sounding flippant, but what's
> the purpose/use case behind renaming the compute services?

That is something that appears to have been done upstream between Mitaka and Newton. See https://bugzilla.redhat.com/show_bug.cgi?id=1365789 for more details. Ever since then there has been a downstream patch to prevent the rename during the upgrade process.
(In reply to Lukas Bezdicka from comment #3)
> We would like Compute team to propose way out of this rabbit hole.

It's not possible to change this value upstream, since that would cause the same issue in reverse for everyone else. We would also have the same issue if we forward-ported it: that patch doesn't appear to have been included in 15.0, so anyone upgrading from 15.0 to 16.1 would break.

Looking at that linked patch, it looks like this is configurable via the 'ComputeHostnameFormat' parameter. Forgive my ignorance, but couldn't this be overridden and set to the legacy value when doing a fast forward upgrade of an OSP 13.0 deployment? This would need to be done for any future upgrades to 17.1 etc., but it makes this a documentation issue and would remove the need to carry the downstream patch indefinitely. If that's not possible now, we'll have to make it so.
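As a rough illustration of the override suggested above, the sketch below expands a hostname format string and renders a Heat environment snippet that pins `ComputeHostnameFormat` to the legacy value. The parameter name and the `%stackname%` and `%index%` placeholders come from this bug; the legacy format value is inferred from the host names in the original report, and the helper names are hypothetical. Whether setting this parameter is sufficient during a fast forward upgrade is the open question in this comment, so treat that as an assumption to verify.

```python
# Assumed legacy value, inferred from the OSP 13 host names in this report.
LEGACY_COMPUTE_FORMAT = "%stackname%-compute-%index%"


def expand_hostname_format(fmt, stackname, index):
    """Substitute the %stackname% and %index% placeholders in a format string."""
    return fmt.replace("%stackname%", stackname).replace("%index%", str(index))


def render_environment(fmt):
    """Render a minimal environment file body that sets ComputeHostnameFormat."""
    return "parameter_defaults:\n  ComputeHostnameFormat: '{}'\n".format(fmt)


hostname = expand_hostname_format(LEGACY_COMPUTE_FORMAT, "osp-test-octopi-zorillas", 0)
environment = render_environment(LEGACY_COMPUTE_FORMAT)
print(hostname)
print(environment)
```

The rendered text would be saved as an environment file and included with the other environment files used for the upgrade, so that the existing compute names are kept.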
I think this needs to be solved in two parts:

1. Document that the HostnameFormatDefault changes from '%stackname%-compute-%index%' to '%stackname%-novacompute-%index%' and explain (a) how to add the applicable configuration to maintain the previous naming standard, or (b) what the effect of not overriding the new default is and how to clean up the old service names.
2. Implement a check that stops the upgrade (unless --yes or something like that is given) if the current stack uses the old naming convention and the to-be configuration will change it.

The first is essential for GA. The second is ideal to have for GA because it will prevent people from making unexpected changes in their cluster.
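The check-and-stop idea in the second part could look roughly like the Python sketch below. It compares the current compute host names with the names the to-be format would produce and refuses to continue unless the operator has confirmed. Everything here is hypothetical (the function names, the `assume_yes` flag standing in for something like --yes, and the exception type); it is not an existing tripleo or openstack client feature, and the example uses the host names from the original report.

```python
class HostnameChangeError(RuntimeError):
    """Raised when an upgrade would rename existing compute hosts."""


def expected_hostname(fmt, stackname, index):
    """Expand the %stackname% and %index% placeholders of a hostname format."""
    return fmt.replace("%stackname%", stackname).replace("%index%", str(index))


def check_hostname_format(current_hosts, stackname, new_format, assume_yes=False):
    """Return the {old: new} renames that new_format would cause.

    current_hosts maps node index -> current short host name.
    Raises HostnameChangeError if a rename is found and assume_yes is False.
    """
    renames = {}
    for index, current in sorted(current_hosts.items()):
        wanted = expected_hostname(new_format, stackname, index)
        if wanted != current:
            renames[current] = wanted
    if renames and not assume_yes:
        raise HostnameChangeError("upgrade would rename compute hosts: %s" % renames)
    return renames


current = {
    0: "osp-test-octopi-zorillas-compute-0",
    1: "osp-test-octopi-zorillas-compute-1",
}
try:
    check_hostname_format(current, "osp-test-octopi-zorillas",
                          "%stackname%-novacompute-%index%")
except HostnameChangeError as exc:
    print("stopping:", exc)
```

Raising by default and requiring an explicit opt-in keeps the rename from happening silently, which is the behaviour the second part asks for.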
I've registered an RFE in https://bugzilla.redhat.com/show_bug.cgi?id=1876464 and will therefore convert this into a docs bug.
Closing this BZ as the docs component has already been implemented.