Description of problem:

After an OSP13 -> OSP16.1 upgrade, "openstack overcloud deploy" always fails because of an error in the "Check Keystone public endpoint status" task. By removing no_log: true in /usr/share/ansible/roles/tripleo-keystone-resources/tasks/endpoints.yml, I captured the following error from tripleo-keystone-resources.

~~~
...
TASK [tripleo-keystone-resources : Check Keystone public endpoint status] ******
Sunday 13 September 2020 11:14:39 +0000 (0:00:05.000) 0:25:13.826 ******
...
failed: [undercloud] (item={'started': 1, 'finished': 0, 'ansible_job_id': '668760418547.976504', 'results_file': '/root/.ansible_async/668760418547.976504', 'changed': True, 'failed': False, 'tripleo_keystone_resources_data': {'key': 'cinderv3', 'value': {'endpoints': {'admin': 'http://172.17.1.25:8776/v3/%(tenant_id)s', 'internal': 'http://172.17.1.25:8776/v3/%(tenant_id)s', 'public': 'http://10.0.0.145:8776/v3/%(tenant_id)s'}, 'region': 'regionOne', 'service': 'volumev3', 'users': {'cinderv3': {'password': 'NAR9UexZ7u28rCGRuEEArtt7v', 'roles': ['admin', 'service']}}}}, 'ansible_loop_var': 'tripleo_keystone_resources_data'}) => {"ansible_job_id": "668760418547.976504", "ansible_loop_var": "tripleo_keystone_resources_endpoint_async_result_item", "attempts": 1, "changed": false, "finished": 1, "msg": "Multiple matches found for cinderv3", "tripleo_keystone_resources_endpoint_async_result_item": {"ansible_job_id": "668760418547.976504", "ansible_loop_var": "tripleo_keystone_resources_data", "changed": true, "failed": false, "finished": 0, "results_file": "/root/.ansible_async/668760418547.976504", "started": 1, "tripleo_keystone_resources_data": {"key": "cinderv3", "value": {"endpoints": {"admin": "http://172.17.1.25:8776/v3/%(tenant_id)s", "internal": "http://172.17.1.25:8776/v3/%(tenant_id)s", "public": "http://10.0.0.145:8776/v3/%(tenant_id)s"}, "region": "regionOne", "service": "volumev3", "users": {"cinderv3": {"password": "NAR9UexZ7u28rCGRuEEArtt7v", "roles": ["admin", "service"]}}}}}}
...
NO MORE HOSTS LEFT *************************************************************

PLAY RECAP *********************************************************************
compute-0 : ok=256 changed=103 unreachable=0 failed=0 skipped=118 rescued=0 ignored=0
compute-1 : ok=252 changed=103 unreachable=0 failed=0 skipped=118 rescued=0 ignored=0
controller-0 : ok=317 changed=138 unreachable=0 failed=0 skipped=127 rescued=0 ignored=0
controller-1 : ok=306 changed=138 unreachable=0 failed=0 skipped=128 rescued=0 ignored=0
controller-2 : ok=306 changed=138 unreachable=0 failed=0 skipped=128 rescued=0 ignored=0
undercloud : ok=51 changed=16 unreachable=0 failed=1 skipped=44 rescued=0 ignored=0

Sunday 13 September 2020 11:14:42 +0000 (0:00:02.533) 0:25:16.360 ******
===============================================================================
Ansible failed, check log at /var/lib/mistral/overcloud/ansible.log.
Overcloud configuration failed.
~~~

The Cinder v1 ("volume" type) catalog entries are created during an OSP13 deployment, but they are no longer created by an OSP16.1 deployment. The problem is that they are not purged during the upgrade; they are left behind in the upgraded OSP16.1 deployment and then cause the deployment failure.

Fresh RHOSP13 deployment
~~~
(overcloud) [stack@undercloud-0 ~]$ openstack endpoint list --service cinderv3
Multiple service matches found for 'cinderv3', use an ID to be more specific.
(overcloud) [stack@undercloud-0 ~]$ openstack endpoint list | grep cinderv3
| 0232351c41fc41eb8dc281a01f27e20b | regionOne | cinderv3 | volumev3 | True | internal | http://172.17.1.103:8776/v3/%(tenant_id)s |
| 1cfa53ca39184f94a55a91ed6d3ae8af | regionOne | cinderv3 | volume | True | admin | http://172.17.1.103:8776/v3/%(tenant_id)s |
| 5a6b1149cc4142d08e411e10743b8356 | regionOne | cinderv3 | volume | True | public | http://10.0.0.113:8776/v3/%(tenant_id)s |
| 7aea0836ca984a6fb4dd2c55318dd8af | regionOne | cinderv3 | volumev3 | True | public | http://10.0.0.113:8776/v3/%(tenant_id)s |
| c04ae1bae20b45e0b127749e75911907 | regionOne | cinderv3 | volume | True | internal | http://172.17.1.103:8776/v3/%(tenant_id)s |
| e097455080374c5fad4ab96725e8ec8c | regionOne | cinderv3 | volumev3 | True | admin | http://172.17.1.103:8776/v3/%(tenant_id)s |
~~~

Fresh RHOSP16.1 deployment
~~~
(overcloud) [stack@undercloud-0 ~]$ openstack endpoint list --service cinderv3
+----------------------------------+-----------+--------------+--------------+---------+-----------+-------------------------------------------+
| ID | Region | Service Name | Service Type | Enabled | Interface | URL |
+----------------------------------+-----------+--------------+--------------+---------+-----------+-------------------------------------------+
| 691ce1bdc9b343ebb8fee891bd2b9733 | regionOne | cinderv3 | volumev3 | True | admin | http://172.17.1.49:8776/v3/%(tenant_id)s |
| 905b33cf92454c7881c8c222b5c30059 | regionOne | cinderv3 | volumev3 | True | public | https://10.0.0.101:13776/v3/%(tenant_id)s |
| e133d65395874e5fb2dd0ae53abd268e | regionOne | cinderv3 | volumev3 | True | internal | http://172.17.1.49:8776/v3/%(tenant_id)s |
+----------------------------------+-----------+--------------+--------------+---------+-----------+-------------------------------------------+
~~~

RHOSP16.1 upgraded from RHOSP13
~~~
(overcloud) [stack@undercloud-0 ~]$ openstack endpoint list --service cinderv3
Multiple service matches found for 'cinderv3', use an ID to be more specific.
(overcloud) [stack@undercloud-0 ~]$ openstack endpoint list | grep cinderv3
| 596146ea98914c69ab77b023ff3ade3e | regionOne | cinderv3 | volume | True | public | http://10.0.0.145:8776/v3/%(tenant_id)s |
| 5e2429696b2143c8a9dbe7553fde9e1e | regionOne | cinderv3 | volume | True | admin | http://172.17.1.25:8776/v3/%(tenant_id)s |
| 8eeb1f1780ba46769cd1112ac1899a46 | regionOne | cinderv3 | volume | True | internal | http://172.17.1.25:8776/v3/%(tenant_id)s |
| 95dbaf739c3348cf8016ee50b53eea9a | regionOne | cinderv3 | volumev3 | True | public | http://10.0.0.145:8776/v3/%(tenant_id)s |
| a5ca4c12a3f04ac5b11bd1c7334fc094 | regionOne | cinderv3 | volumev3 | True | internal | http://172.17.1.25:8776/v3/%(tenant_id)s |
| d98b138eba27410097ce333d6fb5a7cb | regionOne | cinderv3 | volumev3 | True | admin | http://172.17.1.25:8776/v3/%(tenant_id)s |
~~~

Version-Release number of selected component (if applicable):
(overcloud) [stack@undercloud-0 ~]$ rpm -qa | grep tripleo | sort
ansible-role-tripleo-modify-image-1.2.1-0.20200527233426.bc21900.el8ost.noarch
ansible-tripleo-ipa-0.2.1-0.20200611104546.c22fc8d.el8ost.noarch
ansible-tripleo-ipsec-9.2.1-0.20200311073016.0c8693c.el8ost.noarch
openstack-tripleo-common-11.3.3-0.20200611110657.f7715be.el8ost.noarch
openstack-tripleo-common-containers-11.3.3-0.20200611110657.f7715be.el8ost.noarch
openstack-tripleo-heat-templates-11.3.2-0.20200616081539.396affd.el8ost.noarch
openstack-tripleo-image-elements-10.6.2-0.20200528043425.7dc0fa1.el8ost.noarch
openstack-tripleo-puppet-elements-11.2.2-0.20200527003426.226ce95.el8ost.noarch
openstack-tripleo-validations-11.3.2-0.20200611115253.08f469d.el8ost.noarch
puppet-tripleo-11.5.0-0.20200616033428.8ff1c6a.el8ost.noarch
python3-tripleoclient-12.3.2-0.20200615103427.6f877f6.el8ost.noarch
python3-tripleoclient-heat-installer-12.3.2-0.20200615103427.6f877f6.el8ost.noarch
python3-tripleo-common-11.3.3-0.20200611110657.f7715be.el8ost.noarch
tripleo-ansible-0.5.1-0.20200611113659.34b8fcc.el8ost.noarch
(overcloud) [stack@undercloud-0 ~]$ cat /etc/rhosp-release
Red Hat OpenStack Platform release 16.1.1 GA (Train)

How reproducible:
Always

Steps to Reproduce:
1. Upgrade a RHOSP13 deployment to RHOSP16.1
2. Run "openstack overcloud deploy" to update the overcloud stack

Actual results:
deploy fails at the "Check Keystone public endpoint status" task

Expected results:
deploy completes without failures

Additional info:
I deleted the cinder v1 ("volume" type) endpoints, but the deploy still fails.
~~~
(overcloud) [stack@undercloud-0 ~]$ openstack endpoint delete 596146ea98914c69ab77b023ff3ade3e
(overcloud) [stack@undercloud-0 ~]$ openstack endpoint delete 5e2429696b2143c8a9dbe7553fde9e1e
(overcloud) [stack@undercloud-0 ~]$ openstack endpoint delete 8eeb1f1780ba46769cd1112ac1899a46
(overcloud) [stack@undercloud-0 ~]$ openstack endpoint list | grep cinderv3
| 95dbaf739c3348cf8016ee50b53eea9a | regionOne | cinderv3 | volumev3 | True | public | http://10.0.0.145:8776/v3/%(tenant_id)s |
| a5ca4c12a3f04ac5b11bd1c7334fc094 | regionOne | cinderv3 | volumev3 | True | internal | http://172.17.1.25:8776/v3/%(tenant_id)s |
| d98b138eba27410097ce333d6fb5a7cb | regionOne | cinderv3 | volumev3 | True | admin | http://172.17.1.25:8776/v3/%(tenant_id)s |
~~~

~~~
TASK [tripleo-keystone-resources : Check Keystone public endpoint status] ******
Sunday 13 September 2020 12:45:24 +0000 (0:00:04.946) 0:25:56.754 ******
...
failed: [undercloud] (item={'started': 1, 'finished': 0, 'ansible_job_id': '737602304891.91428', 'results_file': '/root/.ansible_async/737602304891.91428', 'changed': True, 'failed': False, 'tripleo_keystone_resources_data': {'key': 'cinderv3', 'value': {'endpoints': {'admin': 'http://172.17.1.25:8776/v3/%(tenant_id)s', 'internal': 'http://172.17.1.25:8776/v3/%(tenant_id)s', 'public': 'http://10.0.0.145:8776/v3/%(tenant_id)s'}, 'region': 'regionOne', 'service': 'volumev3', 'users': {'cinderv3': {'password': 'NAR9UexZ7u28rCGRuEEArtt7v', 'roles': ['admin', 'service']}}}}, 'ansible_loop_var': 'tripleo_keystone_resources_data'}) => {"ansible_job_id": "737602304891.91428", "ansible_loop_var": "tripleo_keystone_resources_endpoint_async_result_item", "attempts": 1, "changed": false, "finished": 1, "msg": "Multiple matches found for cinderv3", "tripleo_keystone_resources_endpoint_async_result_item": {"ansible_job_id": "737602304891.91428", "ansible_loop_var": "tripleo_keystone_resources_data", "changed": true, "failed": false, "finished": 0, "results_file": "/root/.ansible_async/737602304891.91428", "started": 1, "tripleo_keystone_resources_data": {"key": "cinderv3", "value": {"endpoints": {"admin": "http://172.17.1.25:8776/v3/%(tenant_id)s", "internal": "http://172.17.1.25:8776/v3/%(tenant_id)s", "public": "http://10.0.0.145:8776/v3/%(tenant_id)s"}, "region": "regionOne", "service": "volumev3", "users": {"cinderv3": {"password": "NAR9UexZ7u28rCGRuEEArtt7v", "roles": ["admin", "service"]}}}}}}

NO MORE HOSTS LEFT *************************************************************

PLAY RECAP *********************************************************************
compute-0 : ok=254 changed=103 unreachable=0 failed=0 skipped=119 rescued=0 ignored=0
compute-1 : ok=250 changed=103 unreachable=0 failed=0 skipped=119 rescued=0 ignored=0
controller-0 : ok=315 changed=138 unreachable=0 failed=0 skipped=128 rescued=0 ignored=0
controller-1 : ok=304 changed=138 unreachable=0 failed=0 skipped=129 rescued=0 ignored=0
controller-2 : ok=304 changed=138 unreachable=0 failed=0 skipped=129 rescued=0 ignored=0
undercloud : ok=50 changed=16 unreachable=0 failed=1 skipped=44 rescued=0 ignored=0

Sunday 13 September 2020 12:45:27 +0000 (0:00:02.534) 0:25:59.288 ******
===============================================================================
Ansible failed, check log at /var/lib/mistral/overcloud/ansible.log.
Overcloud configuration failed.
~~~

I reran the deploy after deleting the cinderv3 service with the "volume" type, and this time the deployment completed without failures.
~~~
(overcloud) [stack@undercloud-0 ~]$ openstack service list
+----------------------------------+------------+----------------+
| ID | Name | Type |
+----------------------------------+------------+----------------+
| 0712f3e20dfc40fb8d9b60aba776dee8 | heat-cfn | cloudformation |
| 381864758ab54695a8ed42c7c25d91a2 | swift | object-store |
| 52413dad9d374cc9a071de8291664f7b | keystone | identity |
| 577b55c6e5394960b0b30497f97c012e | ceilometer | metering |
| 634bb6d13e2440369f5c86082e2db3b0 | cinderv2 | volumev2 |
| 8dff2d6971eb4e4ba8b346cae4aba181 | heat | orchestration |
| 9238a83ed4aa4cf9a599c0f762e84946 | aodh | alarming |
| a3d3f1f6992f4c1ca6f3271bfab1e71f | cinderv3 | volume |
| aa0853b342b34f7585f33307ab8f4c66 | panko | event |
| acd1aa6c6dd9475495dfafefdb69f106 | nova | compute |
| b18244980c9541aea9ceb0714af43b35 | placement | placement |
| c9183c0d5af148a89ddb79b8a7f5f54c | gnocchi | metric |
| cecc0e92b7984e58b4b8cfd984bb9431 | neutron | network |
| ec1b6b2da9f34e3181a9bc63689a5bb4 | cinderv3 | volumev3 |
| f88125bfa1ae4bbc8db65007f506416b | glance | image |
+----------------------------------+------------+----------------+
(overcloud) [stack@undercloud-0 ~]$ openstack service delete a3d3f1f6992f4c1ca6f3271bfab1e71f
(overcloud) [stack@undercloud-0 ~]$ openstack service list
+----------------------------------+------------+----------------+
| ID | Name | Type |
+----------------------------------+------------+----------------+
| 0712f3e20dfc40fb8d9b60aba776dee8 | heat-cfn | cloudformation |
| 381864758ab54695a8ed42c7c25d91a2 | swift | object-store |
| 52413dad9d374cc9a071de8291664f7b | keystone | identity |
| 577b55c6e5394960b0b30497f97c012e | ceilometer | metering |
| 634bb6d13e2440369f5c86082e2db3b0 | cinderv2 | volumev2 |
| 8dff2d6971eb4e4ba8b346cae4aba181 | heat | orchestration |
| 9238a83ed4aa4cf9a599c0f762e84946 | aodh | alarming |
| aa0853b342b34f7585f33307ab8f4c66 | panko | event |
| acd1aa6c6dd9475495dfafefdb69f106 | nova | compute |
| b18244980c9541aea9ceb0714af43b35 | placement | placement |
| c9183c0d5af148a89ddb79b8a7f5f54c | gnocchi | metric |
| cecc0e92b7984e58b4b8cfd984bb9431 | neutron | network |
| ec1b6b2da9f34e3181a9bc63689a5bb4 | cinderv3 | volumev3 |
| f88125bfa1ae4bbc8db65007f506416b | glance | image |
+----------------------------------+------------+----------------+
~~~
Takashi,

When the system is broken, can you confirm whether or not cinder is actually running? Is it running on the public interface behind TLS maybe?

Do you have any logs - or a system we can look at?
Hi Ade,

> When the system is broken, can you confirm whether or not cinder is actually running?
Unfortunately I didn't perform any volume operations at that time, but I confirmed that the cinder containers are all running without any failures.

> Is it running on the public interface behind TLS maybe?
No. As you can see in the endpoint list output, I didn't enable TLS for the public endpoints. I don't use TLS for the internal endpoints, either.

> Do you have any logs - or a system we can look at?
I still have the deployment where I tested the upgrade, but unfortunately most of the logs have been rotated and the issue was already fixed by removing the endpoints and service with the "volume" type. To reproduce the situation itself, I think we can manually create the endpoints and a service with the "volume" type in a fresh RHOSP16.1 deployment, the same way they are created in RHOSP13.
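To illustrate that reproduction idea, here is a minimal sketch of recreating the stale OSP13-style catalog entries on a fresh RHOSP16.1 overcloud. The region and endpoint URLs are copied from the listings above and are only illustrative; they would need to be adjusted to the target environment.

~~~
source ~/overcloudrc

# Register a second cinderv3 service with the legacy "volume" type, as OSP13 did.
openstack service create --name cinderv3 volume

# Add the three endpoints for it, reusing the existing cinder-api URLs
# (replace the VIPs with the ones of the environment being tested).
openstack endpoint create --region regionOne volume admin    'http://172.17.1.25:8776/v3/%(tenant_id)s'
openstack endpoint create --region regionOne volume internal 'http://172.17.1.25:8776/v3/%(tenant_id)s'
openstack endpoint create --region regionOne volume public   'http://10.0.0.145:8776/v3/%(tenant_id)s'
~~~

After this, a stack update should hit the same "Multiple matches found for cinderv3" failure, since two services named cinderv3 exist again.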
All of this is a side effect of something we had to introduce in OSP-13/queens. In order to work around a bug in tempest (the version pinned to test queens), we had to create a "volume" service using the cinderv3 endpoints (see bug #1472859, and patches [1] and [2]).

[1] https://review.opendev.org/644550
[2] https://review.opendev.org/649084

Now that OSP-16/train is using a version of tempest that doesn't assume the presence of cinder's V1 API (the "volume" service), we end up with a stale "volume" service with corresponding endpoints. As Takashi-san noted, the fix involves deleting the stale catalog entries.

I just became aware of this issue, and will start working on a fix so that this is handled automatically by the FFU process.
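Not the actual patch, but a minimal sketch of the kind of cleanup such an automated fix needs to perform, expressed here as plain CLI calls against the overcloud keystone:

~~~
# Remove the stale queens-era service registered with the "volume" type,
# if it is still present; its endpoints are deleted together with it.
if openstack service list -f value -c Type | grep -qx volume; then
    openstack service delete volume
fi
~~~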
*** Bug 1856906 has been marked as a duplicate of this bug. ***
*** Bug 1882110 has been marked as a duplicate of this bug. ***
*** Bug 1853281 has been marked as a duplicate of this bug. ***
(In reply to Takashi Kajinami from comment #1)
> I deleted cinder v1 api, but deploy still fails.
> ~~~
> (overcloud) [stack@undercloud-0 ~]$ openstack endpoint delete
> 596146ea98914c69ab77b023ff3ade3e
> (overcloud) [stack@undercloud-0 ~]$ openstack endpoint delete
> 5e2429696b2143c8a9dbe7553fde9e1e
> (overcloud) [stack@undercloud-0 ~]$ openstack endpoint delete
> 8eeb1f1780ba46769cd1112ac1899a46
> (overcloud) [stack@undercloud-0 ~]$ openstack endpoint list | grep cinderv3
> | 95dbaf739c3348cf8016ee50b53eea9a | regionOne | cinderv3 | volumev3
> | True | public | http://10.0.0.145:8776/v3/%(tenant_id)s |
> | a5ca4c12a3f04ac5b11bd1c7334fc094 | regionOne | cinderv3 | volumev3
> | True | internal | http://172.17.1.25:8776/v3/%(tenant_id)s |
> | d98b138eba27410097ce333d6fb5a7cb | regionOne | cinderv3 | volumev3
> | True | admin | http://172.17.1.25:8776/v3/%(tenant_id)s |
> ~~~

Does that mean we have no workaround? We have a migration procedure that requires manual intervention, and if there is a workaround we'd like to recommend it to anyone requesting FFU + migration.
(In reply to Jakub Libosvar from comment #9)
> (In reply to Takashi Kajinami from comment #1)
> > I deleted cinder v1 api, but deploy still fails.
> > ~~~
> > (overcloud) [stack@undercloud-0 ~]$ openstack endpoint delete
> > 596146ea98914c69ab77b023ff3ade3e
> > (overcloud) [stack@undercloud-0 ~]$ openstack endpoint delete
> > 5e2429696b2143c8a9dbe7553fde9e1e
> > (overcloud) [stack@undercloud-0 ~]$ openstack endpoint delete
> > 8eeb1f1780ba46769cd1112ac1899a46
> > (overcloud) [stack@undercloud-0 ~]$ openstack endpoint list | grep cinderv3
> > | 95dbaf739c3348cf8016ee50b53eea9a | regionOne | cinderv3 | volumev3
> > | True | public | http://10.0.0.145:8776/v3/%(tenant_id)s |
> > | a5ca4c12a3f04ac5b11bd1c7334fc094 | regionOne | cinderv3 | volumev3
> > | True | internal | http://172.17.1.25:8776/v3/%(tenant_id)s |
> > | d98b138eba27410097ce333d6fb5a7cb | regionOne | cinderv3 | volumev3
> > | True | admin | http://172.17.1.25:8776/v3/%(tenant_id)s |
> > ~~~
>
> Does it mean we have no workaround? We have a migration procedure that
> requires manual intervention and in case there is a workaround, we'd like to
> recommend it in case there is someone requesting FFU + migration.

There is an available workaround, which is to manually remove the following items from the overcloud keystone after the upgrade:
1. all endpoints for the "volume" service type
2. the service with the "volume" type

According to the patch Alan proposed, I guess we need to remove only (2) and (1) will be removed automatically, but I didn't confirm that removing only the volume service fixes the issue; I removed both, as described in my comment 1.
That's correct. In fact, while developing the patch I discovered it's sufficient to just delete the "volume" service (step 2). Its endpoints are automatically deleted.

Executing just this one command should be sufficient:

(overcloud) [stack@undercloud-0 ~]$ openstack service delete volume

Then run this and you should see the associated endpoints are gone:

[stack@rhos-undercloud ~]$ openstack endpoint list | grep cinder
| 435ef6e5b7064ac09d52f79012ace4c0 | regionOne | cinderv2 | volumev2 | True | public | http://192.168.24.14:8776/v2/%(tenant_id)s |
| 45dd4494cd5d4f4bb8558ef3aee15710 | regionOne | cinderv3 | volumev3 | True | public | http://192.168.24.14:8776/v3/%(tenant_id)s |
| 609d5234f6b74d719c916b497cbd3d0f | regionOne | cinderv2 | volumev2 | True | internal | http://192.168.24.14:8776/v2/%(tenant_id)s |
| 654a9466ba5e41fd8a6b84dc033bb2d3 | regionOne | cinderv2 | volumev2 | True | admin | http://192.168.24.14:8776/v2/%(tenant_id)s |
| a0385feccf254f84a798093b9ed6393a | regionOne | cinderv3 | volumev3 | True | internal | http://192.168.24.14:8776/v3/%(tenant_id)s |
| dbf3c261a7e748d097dc9119f75b475c | regionOne | cinderv3 | volumev3 | True | admin | http://192.168.24.14:8776/v3/%(tenant_id)s |
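The "openstack service delete volume" call above works because only one service carries the "volume" type. If resolution by name or type is ever ambiguous, the stale service can also be looked up and deleted by ID; a small sketch, where the two selected columns are printed as "ID Type" by the client:

~~~
# Find the ID of the service registered with the legacy "volume" type
# and delete it explicitly.
VOLUME_SVC_ID=$(openstack service list -f value -c ID -c Type | awk '$2 == "volume" {print $1}')
[ -n "$VOLUME_SVC_ID" ] && openstack service delete "$VOLUME_SVC_ID"
~~~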
Alan,

Just to confirm, the steps I should take to verify are:

1. Deploy OSP13, check that I start off with 3 versions of Cinder endpoints, similar to, say, this:

(overcloud) [stack@undercloud-0 ~]$ openstack endpoint list | grep cinder
| 4a8d29e364094703b7da86215bcb138a | regionOne | cinderv2 | volumev2 | True | internal | http://172.17.1.147:8776/v2/%(tenant_id)s |
| 4ee73c2d15a14561a1bd4e57f0925dc4 | regionOne | cinderv3 | volume | True | public | http://10.0.0.108:8776/v3/%(tenant_id)s |
| 63133b700dca466eae944271f7a3f825 | regionOne | cinderv3 | volume | True | internal | http://172.17.1.147:8776/v3/%(tenant_id)s |
| 689255e4cbb0424f9fed31bda12cbe6d | regionOne | cinderv3 | volume | True | admin | http://172.17.1.147:8776/v3/%(tenant_id)s |
| 68dc5070a9dd45ad9a37ae4c74523dd5 | regionOne | cinderv3 | volumev3 | True | internal | http://172.17.1.147:8776/v3/%(tenant_id)s |
| 943e91972d9943e190f4c5401f0eecbe | regionOne | cinderv2 | volumev2 | True | admin | http://172.17.1.147:8776/v2/%(tenant_id)s |
| d8cdcccff4e548e7bb8d9cae3e48c4fa | regionOne | cinderv2 | volumev2 | True | public | http://10.0.0.108:8776/v2/%(tenant_id)s |
| d9121c4b166a43159634c7e37073d309 | regionOne | cinderv3 | volumev3 | True | admin | http://172.17.1.147:8776/v3/%(tenant_id)s |
| ef746699f18f4f3eba6e4e62cf0e37eb | regionOne | cinderv3 | volumev3 | True | public | http://10.0.0.108:8776/v3/%(tenant_id)s |

2. Upgrade to OSP16.1

Here is where I'm a bit unsure, which is why I'm asking. After the upgrade I'm assuming I should still see the "volume" endpoints, correct? Thus I should manually issue "openstack service delete volume", right? If I then re-list the endpoints I shouldn't see any "volume" endpoints, as they were removed.

3. And my final verification step, the crucial one then: after the manual removal of the "volume" endpoints following the upgrade to 16.1, I should be able to complete, without errors, some sort of overcloud update on 16.1? Say, change the debug setting from false to true or vice versa.

If this is right I'll get working on it; if I misunderstood something or a step, please correct me.

Thanks
(In reply to Tzach Shefi from comment #24)
> Just to confirm, steps I should take to verify are:
>
> 1. Deploy OSP13, check that I start off with 3 versions of Cinder endpoints,
> similar to say this:

Correct. You should see 3 cinder services and 9 endpoints (3 for each).

> 2. Upgrade to OSP16.1
> Here is where I'm a bit unsure, why I'm asking.
> After the upgrade I'm assuming I should still get "volume" endpoints
> correct?
> Thus I should manually issue the "openstack service delete volume" right?
>
> If I now re-list endpoints I shouldn't get any "volume" endpoints as they
> were removed.

No. What you described is the original (bad) behavior and the workaround. With the fix, after upgrading to 16.1 the "volume" service and its three associated endpoints should no longer be present. The BZ fix removes them automatically.

> 3. And my final verification step, the crucial one than is,
> pending manual removal of "volume" endpoints after upgrading to 16.1
> I should be able to complete, without errors, some sort of overcloud update
> on 16.1?
> Say change debug's setting from false to true or vise versa.

Sort of. After confirming the "volume" service and its endpoints were removed, you should be able to perform any stack update (even one that doesn't change any parameters).

Here's an alternative sequence (a rough command sketch follows below):
1. Install 13
2. FFU to 16.1.2 <=== z2! Confirm the "volume" service and its endpoints are still present
3. Perform a stack update and confirm it fails (this reproduces the bug).
4. Upgrade the undercloud to 16.1.3
5. Repeat step 3, and this time it should succeed. Also confirm the "volume" service and its endpoints were removed.
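A rough sketch of the check around each stack update; overcloud_deploy.sh is a placeholder for whatever deploy command and environment files the verification job normally uses:

~~~
source ~/overcloudrc

# Before the fix (16.1.2): both greps show the stale "volume" entries and the update fails.
# After the fix (16.1.3): the update succeeds and both greps print nothing
# (volumev2/volumev3 do not match the ' volume ' pattern).
openstack service list  | grep ' volume '
openstack endpoint list | grep ' volume '

./overcloud_deploy.sh   # any stack update, even with unchanged parameters

openstack service list  | grep ' volume '
openstack endpoint list | grep ' volume '
~~~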
Thanks Alan,

During an attempted FFU verification I ran into a leapp bug: https://bugzilla.redhat.com/show_bug.cgi?id=1900667

Awaiting updates on the leapp bz; I also emailed the rhos-upgrade/QE dep.
Ignore bz1900667, that was my fault. Now stuck on a new upgrade bug: https://bugzilla.redhat.com/show_bug.cgi?id=1902628

Again emailed the Upgrades DFG, awaiting assistance.
Verified on:
openstack-tripleo-heat-templates-11.3.2-1.20200914170176.el8ost.noarch

Started with an OSP13 deployment, 13 -p 2020-11-13.1:

(overcloud) [stack@undercloud-0 ~]$ openstack endpoint list | grep cinder
| 1ba6d3dcd6904c669ce46a4e839fbd02 | regionOne | cinderv2 | volumev2 | True | admin | http://172.17.1.98:8776/v2/%(tenant_id)s |
| 1da848b4449d489fb36b1421e1e08f4a | regionOne | cinderv2 | volumev2 | True | public | http://10.0.0.107:8776/v2/%(tenant_id)s |
| 213e3e6f702444a393400a7bab2409e3 | regionOne | cinderv3 | volume | True | internal | http://172.17.1.98:8776/v3/%(tenant_id)s |
| 2f60909af1a64a78a22074663c0fc588 | regionOne | cinderv3 | volumev3 | True | public | http://10.0.0.107:8776/v3/%(tenant_id)s |
| 3815901813fe40fa874cff57702c1db0 | regionOne | cinderv3 | volume | True | public | http://10.0.0.107:8776/v3/%(tenant_id)s |
| 521895b698a6434a8cfbb2435917bba3 | regionOne | cinderv3 | volume | True | admin | http://172.17.1.98:8776/v3/%(tenant_id)s |
| 74a8a9208cba4c9badf1158e20fe3847 | regionOne | cinderv3 | volumev3 | True | internal | http://172.17.1.98:8776/v3/%(tenant_id)s |
| d4648559b5ea4014805620c4c66481c5 | regionOne | cinderv2 | volumev2 | True | internal | http://172.17.1.98:8776/v2/%(tenant_id)s |
| ed3689f6909a430d86c9d749ee0d634e | regionOne | cinderv3 | volumev3 | True | admin | http://172.17.1.98:8776/v3/%(tenant_id)s |

FFU to 16.1z3 passed without an issue this time; it turns out having Barbican deployed caused bz1902628. Anyway, after the FFU we still have the same list of cinder endpoints:

(overcloud) [stack@undercloud-0 ~]$ openstack endpoint list | grep cinder
| 1ba6d3dcd6904c669ce46a4e839fbd02 | regionOne | cinderv2 | volumev2 | True | admin | http://172.17.1.98:8776/v2/%(tenant_id)s |
| 1da848b4449d489fb36b1421e1e08f4a | regionOne | cinderv2 | volumev2 | True | public | http://10.0.0.107:8776/v2/%(tenant_id)s |
| 213e3e6f702444a393400a7bab2409e3 | regionOne | cinderv3 | volume | True | internal | http://172.17.1.98:8776/v3/%(tenant_id)s |
| 2f60909af1a64a78a22074663c0fc588 | regionOne | cinderv3 | volumev3 | True | public | http://10.0.0.107:8776/v3/%(tenant_id)s |
| 3815901813fe40fa874cff57702c1db0 | regionOne | cinderv3 | volume | True | public | http://10.0.0.107:8776/v3/%(tenant_id)s |
| 521895b698a6434a8cfbb2435917bba3 | regionOne | cinderv3 | volume | True | admin | http://172.17.1.98:8776/v3/%(tenant_id)s |
| 74a8a9208cba4c9badf1158e20fe3847 | regionOne | cinderv3 | volumev3 | True | internal | http://172.17.1.98:8776/v3/%(tenant_id)s |
| d4648559b5ea4014805620c4c66481c5 | regionOne | cinderv2 | volumev2 | True | internal | http://172.17.1.98:8776/v2/%(tenant_id)s |
| ed3689f6909a430d86c9d749ee0d634e | regionOne | cinderv3 | volumev3 | True | admin | http://172.17.1.98:8776/v3/%(tenant_id)s |

But as Alan pointed out in a chat session, his fix resides in a section of THT called "external_deploy_tasks", which isn't executed during the FFU process but rather during an overcloud update. So I cloned overcloud_upgrade_prepare.sh to overcloud_deploy16.1.sh and swapped "upgrade" with "deploy" in the overcloud command, as sketched below.
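A sketch of that clone-and-swap, assuming the prepare script invokes "openstack overcloud upgrade prepare" and that the rest of its arguments can be reused unchanged:

~~~
cp overcloud_upgrade_prepare.sh overcloud_deploy16.1.sh
# Turn the upgrade-prepare command into a plain stack update.
sed -i 's/overcloud upgrade prepare/overcloud deploy/' overcloud_deploy16.1.sh
./overcloud_deploy16.1.sh
~~~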
I ran it without any other changes, and the expected result was achieved:

+----------------------------------+-----------+--------------+----------------+---------+-----------+------------------------------------------------+
| ID | Region | Service Name | Service Type | Enabled | Interface | URL |
+----------------------------------+-----------+--------------+----------------+---------+-----------+------------------------------------------------+

(overcloud) [stack@undercloud-0 ~]$ openstack endpoint list | grep cinder
| 1ba6d3dcd6904c669ce46a4e839fbd02 | regionOne | cinderv2 | volumev2 | True | admin | http://172.17.1.98:8776/v2/%(tenant_id)s |
| 1da848b4449d489fb36b1421e1e08f4a | regionOne | cinderv2 | volumev2 | True | public | http://10.0.0.107:8776/v2/%(tenant_id)s |
| 2f60909af1a64a78a22074663c0fc588 | regionOne | cinderv3 | volumev3 | True | public | http://10.0.0.107:8776/v3/%(tenant_id)s |
| 74a8a9208cba4c9badf1158e20fe3847 | regionOne | cinderv3 | volumev3 | True | internal | http://172.17.1.98:8776/v3/%(tenant_id)s |
| d4648559b5ea4014805620c4c66481c5 | regionOne | cinderv2 | volumev2 | True | internal | http://172.17.1.98:8776/v2/%(tenant_id)s |
| ed3689f6909a430d86c9d749ee0d634e | regionOne | cinderv3 | volumev3 | True | admin | http://172.17.1.98:8776/v3/%(tenant_id)s |

This time, as expected, no endpoints for the Cinder v1 ("volume" type) service remain following an update of an upgraded OSP16.1z3 deployment.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Red Hat OpenStack Platform 16.1.3 bug fix and enhancement advisory), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2020:5413