Bug 1846557 - [upgrade] novacompute vs compute naming diff in 13 vs 16 results in redundant entries in 'openstack compute service list'
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: documentation
Version: 16.1 (Train)
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ga
Target Release: 16.1 (Train on RHEL 8.2)
Assignee: Dan Macpherson
QA Contact: RHOS Documentation Team
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-06-11 22:17 UTC by John Fulton
Modified: 2020-09-07 15:05 UTC (History)

Fixed In Version:
Doc Type: Known Issue
Doc Text:
There is a known issue when upgrading from RHOSP 13 to RHOSP 16.1. The value of `HostnameFormatDefault` has changed from `%stackname%-compute-%index%` to `%stackname%-novacompute-%index%`. This change in default value can result in duplicate service entries and can affect operations such as live migration.
Workaround: If you upgrade from RHOSP 13 to RHOSP 16.1, you must override `HostnameFormatDefault` with the previous default value so that the previous hostname format is retained. If you upgrade from RHOSP 15 or RHOSP 16.0, no action is required.
Clone Of:
Environment:
Last Closed: 2020-09-07 15:05:04 UTC
Target Upstream Version:
Embargoed:



Links
System           | ID     | Private | Priority | Status | Summary                                            | Last Updated
OpenStack gerrit | 735863 | 0       | None     | NEW    | WIP: persist hostname format in parameter_defaults | 2020-10-28 00:37:59 UTC

Internal Links: 1876464

Description John Fulton 2020-06-11 22:17:02 UTC
While FFWD upgrading from 13 to 16.1, the step to upgrade compute-0 (openstack overcloud upgrade run --stack $STACK --limit $STACK-novacompute-0) finished and created a redundant entry in my list of available computes.

I had the nova-compute hosts compute-0.localdomain and compute-1.localdomain, but when the above step finished I also got novacompute-0.localdomain. I predict that when I run the same command to upgrade compute-1, I'll have four entries (novacompute-1.localdomain will be added) when I should really have two.

I think this is more than a matter of appearances, since nova might be confused about what is running where and be unable to act on existing instances; e.g., live migration could break during upgrades.


(osp-test-octopi-zorillas) [stack@osp-test-octopi-zorillas-undercloud ~]$ openstack compute service list
+--------------------------------------+------------------+----------------------------------------------------+----------+---------+-------+----------------------------+
| ID                                   | Binary           | Host                                               | Zone     | Status  | State | Updated At                 |
+--------------------------------------+------------------+----------------------------------------------------+----------+---------+-------+----------------------------+
| f9bebb07-77a6-4106-8234-fb82fd8f972c | nova-scheduler   | osp-test-octopi-zorillas-controller-1.localdomain  | internal | enabled | up    | 2020-06-11T21:57:54.000000 |
| 24cfae43-8169-4209-9ad5-f20184f7eef3 | nova-scheduler   | osp-test-octopi-zorillas-controller-2.localdomain  | internal | enabled | up    | 2020-06-11T21:58:00.000000 |
| 8357f8fa-09e3-4095-bf45-48577c17965c | nova-scheduler   | osp-test-octopi-zorillas-controller-0.localdomain  | internal | enabled | up    | 2020-06-11T21:57:54.000000 |
| 04bc62b2-b75f-4aea-82ba-8153a1a1d770 | nova-consoleauth | osp-test-octopi-zorillas-controller-1.localdomain  | internal | enabled | down  | 2020-06-11T12:00:49.000000 |
| 095a09d1-b022-49ab-9b57-03f8482bb426 | nova-consoleauth | osp-test-octopi-zorillas-controller-0.localdomain  | internal | enabled | down  | 2020-06-09T21:04:23.000000 |
| 3bd05935-00a5-46e4-be09-0ce6c522c8f6 | nova-consoleauth | osp-test-octopi-zorillas-controller-2.localdomain  | internal | enabled | down  | 2020-06-11T16:41:01.000000 |
| 033a61af-4b71-41f5-810c-ef5439437c34 | nova-conductor   | osp-test-octopi-zorillas-controller-1.localdomain  | internal | enabled | up    | 2020-06-11T21:57:59.000000 |
| da759763-4c67-4d50-9fe4-16edfbdb57ab | nova-conductor   | osp-test-octopi-zorillas-controller-2.localdomain  | internal | enabled | up    | 2020-06-11T21:57:59.000000 |
| 0a49739d-4114-4b80-8d63-a0a0c8df7434 | nova-conductor   | osp-test-octopi-zorillas-controller-0.localdomain  | internal | enabled | up    | 2020-06-11T21:57:54.000000 |
| 6e76b986-60af-40ad-bd70-12300caf08b1 | nova-compute     | osp-test-octopi-zorillas-compute-0.localdomain     | nova     | enabled | down  | 2020-06-11T19:52:28.000000 |
| cec1a217-2ae7-421c-9982-d1d2bf3ddb3a | nova-compute     | osp-test-octopi-zorillas-compute-1.localdomain     | nova     | enabled | up    | 2020-06-11T21:57:53.000000 |
| 0456e78c-f378-416d-b92f-53b7f55edf0f | nova-compute     | osp-test-octopi-zorillas-novacompute-0.localdomain | nova     | enabled | up    | 2020-06-11T21:57:57.000000 |
+--------------------------------------+------------------+----------------------------------------------------+----------+---------+-------+----------------------------+
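
The redundancy in the listing above can be spotted mechanically. The following is a minimal Python sketch and not part of the original report. It assumes that the two names of one node differ only in the role segment (compute vs. novacompute); the helper name find_redundant and the regular expression are illustrative.

```python
import re
from collections import defaultdict

# Hosts copied from the nova-compute rows of the listing above.
COMPUTE_HOSTS = [
    "osp-test-octopi-zorillas-compute-0.localdomain",
    "osp-test-octopi-zorillas-compute-1.localdomain",
    "osp-test-octopi-zorillas-novacompute-0.localdomain",
]

# Matches "<stack>-compute-<index><domain>" and "<stack>-novacompute-<index><domain>".
HOST_RE = re.compile(
    r"^(?P<stack>.+)-(?P<role>novacompute|compute)-(?P<index>\d+)(?P<domain>\..*)?$"
)


def find_redundant(hosts):
    """Group hosts by (stack, index, domain) and return groups with more than one name."""
    groups = defaultdict(list)
    for host in hosts:
        match = HOST_RE.match(host)
        if match is None:
            continue  # not a compute-style hostname; ignore it
        key = (match["stack"], int(match["index"]), match["domain"] or "")
        groups[key].append(host)
    return {key: sorted(names) for key, names in groups.items() if len(names) > 1}


if __name__ == "__main__":
    for key, names in find_redundant(COMPUTE_HOSTS).items():
        print(key, "->", names)
```

With the three hosts above, only index 0 is reported, because compute-1 has not been upgraded yet and has no novacompute-1 counterpart.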

Comment 1 Alex Schultz 2020-06-11 22:34:33 UTC
Bug 1563866 is back and will continue to be back until we stop down-streaming this change.

Comment 2 Lukas Bezdicka 2020-06-15 10:38:37 UTC
Instead of down-streaming the change yet again, can we create a procedure to switch the compute name and do the removal either in OSP13 or in FFWD 13->16? If we go with FFWD 13->16, we will need a procedure for renaming both ways so we can support rollback.

Comment 3 Lukas Bezdicka 2020-06-15 13:04:48 UTC
We would like the Compute team to propose a way out of this rabbit hole.

Comment 4 Artom Lifshitz 2020-06-18 15:59:59 UTC
(In reply to Lukas Bezdicka from comment #3)
> We would like the Compute team to propose a way out of this rabbit hole.

I mean, don't rename the service? Sorry for sounding flippant, but what's the purpose/use case behind renaming the compute services?

Comment 5 Jesse Pretorius 2020-06-22 14:22:45 UTC
(In reply to Artom Lifshitz from comment #4)
> I mean, don't rename the service? Sorry for sounding flippant, but what's
> the purpose/use case behind renaming the compute services?

That is something that appears to have been done upstream between Mitaka and Newton. See https://bugzilla.redhat.com/show_bug.cgi?id=1365789 for more details. Ever since then there has been a downstream patch to prevent the rename during the upgrade process.

Comment 7 Stephen Finucane 2020-06-22 15:49:29 UTC
(In reply to Lukas Bezdicka from comment #3)
> We would like the Compute team to propose a way out of this rabbit hole.

It's not possible to change this value upstream since it will result in the same issue in reverse for everyone else. We would also have the same issue if we forward ported it since that patch doesn't appear to have been included in 15.0, meaning anyone trying to upgrade from 15.0 -> 16.1 will break.

Looking at that linked patch, it looks like this is configurable via the 'ComputeHostnameFormat' parameter. Forgive my ignorance, but couldn't this be overridden and set to the legacy value when doing a fast forward upgrade of an OSP 13.0 deployment? This would need to be done for any future upgrades to 17.1 etc., but it makes this a documentation issue and would remove the need to carry the downstream patch indefinitely. If that's not possible now, we'll have to make it so.
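
As a rough illustration of the override suggested above, here is a minimal Python sketch. It uses the two format strings quoted in this bug's Doc Text. It assumes that setting 'ComputeHostnameFormat' under parameter_defaults in an environment file is how the override would be expressed; that placement and the helper names are not confirmed in this thread. The real substitution is presumably performed by the deployment tooling, so render_hostname only mimics it.

```python
# Format strings as quoted in the Doc Text of this bug.
LEGACY_FORMAT = "%stackname%-compute-%index%"    # RHOSP 13 default
NEW_FORMAT = "%stackname%-novacompute-%index%"   # RHOSP 16.1 default


def render_hostname(fmt, stackname, index):
    """Mimic the placeholder substitution (illustrative only)."""
    return fmt.replace("%stackname%", stackname).replace("%index%", str(index))


def override_snippet(fmt=LEGACY_FORMAT):
    """Return environment-file text that pins the compute hostname format.

    Assumption: the override lives under parameter_defaults as
    ComputeHostnameFormat, as hinted at in the comment above.
    """
    return "parameter_defaults:\n  ComputeHostnameFormat: '{}'\n".format(fmt)


if __name__ == "__main__":
    stack = "osp-test-octopi-zorillas"
    print(render_hostname(LEGACY_FORMAT, stack, 0))
    print(render_hostname(NEW_FORMAT, stack, 0))
    print(override_snippet())
```

The printed snippet would be saved to an environment file and passed to the upgrade commands alongside the other environment files; the exact file name and the commands that need it are left open here.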

Comment 8 Jesse Pretorius 2020-06-23 09:30:02 UTC
I think this needs to be solved in two parts:

1. Document that the HostnameFormatDefault changes from '%stackname%-compute-%index%' to '%stackname%-novacompute-%index%', and describe either (a) how to add the applicable configuration to maintain the previous naming standard, or (b) the effect of not overriding the new default and how to clean up the old service names.
2. Implement a check that stops the upgrade (unless --yes or something like that is passed) if the current stack has the old naming convention and the to-be config will change it.

The first is essential for GA. The second is ideal to have for GA because it will prevent people making unexpected changes in their cluster.
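
The check described in item 2 could look roughly like the Python sketch below. It is a hypothetical illustration, not the implementation tracked in the RFE: the function names, the assume_yes flag (standing in for "--yes or something like that"), and the idea of comparing short hostnames against the rendered target format are all assumptions.

```python
def render_hostname(fmt, stackname, index):
    """Mimic the %stackname% / %index% substitution (illustrative only)."""
    return fmt.replace("%stackname%", stackname).replace("%index%", str(index))


def hosts_that_would_be_renamed(current_hosts, stackname, target_format):
    """Return (current, expected) short-name pairs that the target format would change."""
    renamed = []
    for host in current_hosts:
        short = host.split(".", 1)[0]           # drop the domain part
        index = int(short.rsplit("-", 1)[1])    # trailing "-<index>" of the hostname
        expected = render_hostname(target_format, stackname, index)
        if short != expected:
            renamed.append((short, expected))
    return renamed


def check_or_stop(current_hosts, stackname, target_format, assume_yes=False):
    """Stop when hostnames would change, unless the operator opted in."""
    renamed = hosts_that_would_be_renamed(current_hosts, stackname, target_format)
    if renamed and not assume_yes:
        lines = ["{} -> {}".format(old, new) for old, new in renamed]
        raise SystemExit("Hostname format would change:\n" + "\n".join(lines))
    return renamed
```

Called with the two compute-N hosts from the description and the '%stackname%-novacompute-%index%' format, check_or_stop would exit with a list of both renames; with the legacy format it would return an empty list and let the upgrade proceed.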

Comment 16 Jesse Pretorius 2020-09-07 09:40:35 UTC
I've registered an RFE in https://bugzilla.redhat.com/show_bug.cgi?id=1876464 and will therefore convert this into a docs bug.

Comment 17 Dan Macpherson 2020-09-07 15:05:04 UTC
Closing this BZ as the docs component has already been implemented.

