Bug 1740325
| Summary: | [RFE] Provide tooling to remove Sahara prior to a 13-16 FFU | ||||||
|---|---|---|---|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | Gregory Charot <gcharot> | ||||
| Component: | openstack-tripleo-heat-templates | Assignee: | Giulio Fidente <gfidente> | ||||
| Status: | CLOSED ERRATA | QA Contact: | Luigi Toscano <ltoscano> | ||||
| Severity: | medium | Docs Contact: | Vlada Grosu <vgrosu> | ||||
| Priority: | medium | ||||||
| Version: | 16.0 (Train) | CC: | gfidente, hbrock, jfrancoa, jpretori, jslagle, kgilliga, lhh, ltoscano, mburns, mimccune, nlevinki, nwolf, shrjoshi, spower, tshefi, vgrosu | ||||
| Target Milestone: | Alpha | Keywords: | FutureFeature, TechPreview, TestOnly, Triaged | ||||
| Target Release: | 16.2 (Train on RHEL 8.4) | ||||||
| Hardware: | Unspecified | ||||||
| OS: | Unspecified | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | openstack-tripleo-heat-templates-11.3.2-1.20200914170169.el8ost openstack-tripleo-common-11.4.1-1.20200914165651.el8ost | Doc Type: | If docs needed, set a value | ||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | |||||||
| : | 2009693 | Environment: | |||||
| Last Closed: | 2021-09-15 07:07:46 UTC | Type: | Bug | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Embargoed: | |||||||
| Bug Depends On: | |||||||
| Bug Blocks: | 1791384, 2009693 | ||||||
| Attachments: |
Description
Gregory Charot
2019-08-12 16:40:12 UTC
The code in https://code.engineering.redhat.com/gerrit/#/c/194590/3/deployment/sahara/disable-sahara-engine.yaml is making the upgrade fail with:

```
TASK [remove cinder_backup init container on upgrade-scaleup to force re-init] ***
Wednesday 26 August 2020 13:48:39 -0400 (0:00:00.175) 0:04:33.517 ******
TASK [tripleo-container-rm : include_tasks] ************************************
Wednesday 26 August 2020 13:48:40 -0400 (0:00:00.280) 0:04:33.797 ******
fatal: [controller-0]: FAILED! => {"reason": "Could not find or access '/var/lib/mistral/16cba9f9-7fc0-40c5-8598-5f684958137f/tripleo_['podman']_container_rm.yml' on the Ansible Controller."}
fatal: [controller-1]: FAILED! => {"reason": "Could not find or access '/var/lib/mistral/16cba9f9-7fc0-40c5-8598-5f684958137f/tripleo_['podman']_container_rm.yml' on the Ansible Controller."}
PLAY RECAP *********************************************************************
controller-0 : ok=56 changed=15 unreachable=0 failed=1 skipped=32 rescued=0 ignored=2
controller-1 : ok=55 changed=16 unreachable=0 failed=1 skipped=32 rescued=0 ignored=2
Wednesday 26 August 2020 13:48:40 -0400 (0:00:00.220) 0:04:34.017 ******
===============================================================================
```

This appears to be caused by this block of code:

```yaml
- name: Disable openstack-sahara-engine
  when:
    - step|int == 1
  block:
    - name: Disable openstack-sahara-engine
      import_role:
        name: tripleo-container-stop
      vars:
        tripleo_containers_to_stop:
          - openstack-sahara-engine
      when:
        - sahara_engine_enabled|bool
  block:
    - name: Remove openstack-sahara-engine
      import_role:
        name: tripleo-container-rm
      vars:
        tripleo_containers_to_rm:
          - openstack-sahara-engine
        tripleo_container_cli:
          - podman
      when:
        - sahara_engine_enabled|bool
```

File: https://code.engineering.redhat.com/gerrit/#/c/194590/3/deployment/sahara/disable-sahara-engine.yaml

There is an error in this block, as it contains two other blocks.
When the Ansible code is rendered, this causes issues in the tripleo-container-rm role. Changing the code to:

```yaml
- name: Disable openstack-sahara-engine
  when:
    - step|int == 1
    - sahara_engine_enabled|bool
  block:
    - name: Disable openstack-sahara-engine
      import_role:
        name: tripleo-container-stop
      vars:
        tripleo_containers_to_stop:
          - openstack-sahara-engine
    - name: Remove openstack-sahara-engine
      import_role:
        name: tripleo-container-rm
      vars:
        tripleo_containers_to_rm:
          - openstack-sahara-engine
        tripleo_container_cli:
          - podman
```

and relaunching the upgrade step let the upgrade continue.

So, the cause of the failure was not the block syntax but the tripleo_container_cli parameter. It was being set as a list:
```yaml
tripleo_container_cli:
  - podman
```

whereas the parameter is meant to be a single value. That is why the error referenced /var/lib/mistral/16cba9f9-7fc0-40c5-8598-5f684958137f/tripleo_['podman']_container_rm.yml: the list value of tripleo_container_cli was rendered into the file name as ['podman'].
The solution is to set tripleo_container_cli in deployment/sahara/disable-sahara-engine.yaml and deployment/sahara/disable-sahara-api.yaml to a plain string:

```yaml
tripleo_container_cli: "podman"
```
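To illustrate why the list value breaks the include path, here is a minimal, hypothetical demo playbook; it is not part of tripleo-heat-templates. It assumes, based only on the path shown in the error message, that the tripleo-container-rm role builds its task-file name as `tripleo_{{ tripleo_container_cli }}_container_rm.yml`. The file name `render-demo.yml` and the variables `cli_as_list` and `cli_as_string` are illustrative.

```yaml
# render-demo.yml: hypothetical demo, not shipped with TripleO.
# Assumption: the role's include path is built as
# "tripleo_{{ tripleo_container_cli }}_container_rm.yml" (inferred from the error).
- name: Compare list and string values for tripleo_container_cli
  hosts: localhost
  gather_facts: false
  vars:
    # The value as originally set in disable-sahara-engine.yaml (a list).
    cli_as_list:
      - podman
    # The corrected value (a plain string).
    cli_as_string: "podman"
  tasks:
    - name: Render the include file name from the list value
      debug:
        msg: "tripleo_{{ cli_as_list }}_container_rm.yml"

    - name: Render the include file name from the string value
      debug:
        msg: "tripleo_{{ cli_as_string }}_container_rm.yml"

    # Jinja2's ~ operator converts both operands to strings before joining,
    # so the list keeps its brackets and quotes in the result.
    - name: Confirm that only the string value yields the expected file name
      assert:
        that:
          - ('tripleo_' ~ cli_as_string ~ '_container_rm.yml') == 'tripleo_podman_container_rm.yml'
          - ('tripleo_' ~ cli_as_list ~ '_container_rm.yml') != 'tripleo_podman_container_rm.yml'
```

Run it with `ansible-playbook -i localhost, -c local render-demo.yml`. The first task should print the same malformed name seen in the failure, tripleo_['podman']_container_rm.yml, while the second prints tripleo_podman_container_rm.yml, a name that a per-CLI task file could actually match.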
Created attachment 1712906
Related problem issue if Sahara isn't removed

Adding a related FYI: we should open a new bug about OSP13 with Sahara installed getting stuck in the FFU overcloud controller upgrade[0] if comment 8's change isn't implemented. We also need a doc BZ on "How to address Sahara's removal during FFU", or at least an FFU release note about this.

[0] If Sahara isn't removed on OSP13, this FFU step fails:

```
openstack overcloud upgrade run --stack overcloud --limit controller-0,controller-1 | tee oc-c1-upgrade-run.log
tail oc-c1-upgrade-run.log
```

```
TASK [tripleo-container-rm : include_tasks] ************************************
Wednesday 26 August 2020 13:56:04 -0400 (0:00:00.289) 0:01:06.323 ******
fatal: [controller-0]: FAILED! => {"reason": "Could not find or access '/var/lib/mistral/3dcc5a5c-046d-4765-92ad-bcd95c6e5cee/tripleo_['podman']_container_rm.yml' on the Ansible Controller."}
fatal: [controller-1]: FAILED! => {"reason": "Could not find or access '/var/lib/mistral/3dcc5a5c-046d-4765-92ad-bcd95c6e5cee/tripleo_['podman']_container_rm.yml' on the Ansible Controller."}

PLAY RECAP *********************************************************************
controller-0 : ok=56 changed=15 unreachable=0 failed=1 skipped=32 rescued=0 ignored=2
controller-1 : ok=55 changed=15 unreachable=0 failed=1 skipped=32 rescued=0 ignored=2

Wednesday 26 August 2020 13:56:04 -0400 (0:00:00.240) 0:01:06.563 ******
===============================================================================

Ansible failed, check log at /var/log/containers/mistral/package_update.log.
2020-08-26 13:56:05.145 529822 ERROR tripleoclient.v1.overcloud_upgrade.UpgradeRun [-] Exception occured while running the command: RuntimeError: Update failed with: Ansible failed, check log at /var/log/containers/mistral/package_update.log.
2020-08-26 13:56:05.145 529822 ERROR tripleoclient.v1.overcloud_upgrade.UpgradeRun Traceback (most recent call last):
2020-08-26 13:56:05.145 529822 ERROR tripleoclient.v1.overcloud_upgrade.UpgradeRun   File "/usr/lib/python3.6/site-packages/tripleoclient/command.py", line 32, in run
2020-08-26 13:56:05.145 529822 ERROR tripleoclient.v1.overcloud_upgrade.UpgradeRun     super(Command, self).run(parsed_args)
2020-08-26 13:56:05.145 529822 ERROR tripleoclient.v1.overcloud_upgrade.UpgradeRun   File "/usr/lib/python3.6/site-packages/osc_lib/command/command.py", line 41, in run
2020-08-26 13:56:05.145 529822 ERROR tripleoclient.v1.overcloud_upgrade.UpgradeRun     return super(Command, self).run(parsed_args)
2020-08-26 13:56:05.145 529822 ERROR tripleoclient.v1.overcloud_upgrade.UpgradeRun   File "/usr/lib/python3.6/site-packages/cliff/command.py", line 185, in run
2020-08-26 13:56:05.145 529822 ERROR tripleoclient.v1.overcloud_upgrade.UpgradeRun     return_code = self.take_action(parsed_args) or 0
2020-08-26 13:56:05.145 529822 ERROR tripleoclient.v1.overcloud_upgrade.UpgradeRun   File "/usr/lib/python3.6/site-packages/tripleoclient/v1/overcloud_upgrade.py", line 238, in take_action
2020-08-26 13:56:05.145 529822 ERROR tripleoclient.v1.overcloud_upgrade.UpgradeRun     priv_key=key)
2020-08-26 13:56:05.145 529822 ERROR tripleoclient.v1.overcloud_upgrade.UpgradeRun   File "/usr/lib/python3.6/site-packages/tripleoclient/utils.py", line 1245, in run_update_ansible_action
2020-08-26 13:56:05.145 529822 ERROR tripleoclient.v1.overcloud_upgrade.UpgradeRun     verbosity=verbosity, extra_vars=extra_vars)
2020-08-26 13:56:05.145 529822 ERROR tripleoclient.v1.overcloud_upgrade.UpgradeRun   File "/usr/lib/python3.6/site-packages/tripleoclient/workflows/package_update.py", line 127, in update_ansible
2020-08-26 13:56:05.145 529822 ERROR tripleoclient.v1.overcloud_upgrade.UpgradeRun     raise RuntimeError('Update failed with: {}'.format(payload['message']))
2020-08-26 13:56:05.145 529822 ERROR tripleoclient.v1.overcloud_upgrade.UpgradeRun RuntimeError: Update failed with: Ansible failed, check log at /var/log/containers/mistral/package_update.log.
2020-08-26 13:56:05.145 529822 ERROR tripleoclient.v1.overcloud_upgrade.UpgradeRun
2020-08-26 13:56:05.150 529822 ERROR openstack [-] Update failed with: Ansible failed, check log at /var/log/containers/mistral/package_update.log.: RuntimeError: Update failed with: Ansible failed, check log at /var/log/containers/mistral/package_update.log.
2020-08-26 13:56:05.150 529822 INFO osc_lib.shell [-] END return value: 1
```

(In reply to Tzach Shefi from comment #9)
> Created attachment 1712906
> Related problem issue if Sahara isn't removed
>
> Adding related FYI,
>
> We should report/open a new bug, about OSP13 with Sahara installed getting
> stuck in FFU's overcloud controller upgrade[0].
> If comment 8's change isn't implemented.

This bug is enough: the use of the special environment to remove Sahara, and the related settings, are tracked here.

>
> Also a doc bz per "How address Sahara's removal during FFU" or at least an
> FFU release note about this.

Right now users are prevented from upgrading when Sahara is installed, and that is expected at this stage, until this feature is implemented. Workaround: they can still update the deployment on 13 without Sahara before starting the upgrade to 16.1.
Another important detail: after fixing the parameter and completing the upgrade process, no Sahara containers remain, but the endpoint list still shows the Sahara endpoints:

```
(overcloud) [stack@undercloud-0 ~]$ openstack endpoint list | grep sahara
| 21c372a4d389400985d2efff57defdf3 | regionOne | sahara | data-processing | True | public | http://10.0.0.141:8386/v1.1/%(tenant_id)s |
| 84411bbc525e4c6fa518bf97e51f9e00 | regionOne | sahara | data-processing | True | admin | http://172.17.1.44:8386/v1.1/%(tenant_id)s |
| fc1dbb69f0874c0d9e9dac7f7f953ba7 | regionOne | sahara | data-processing | True | internal | http://172.17.1.44:8386/v1.1/%(tenant_id)s |
```

According to our records, this should be resolved by openstack-tripleo-heat-templates-11.3.2-0.20200616081539.396affd.el8ost. This build is available now.

According to our records, this should be resolved by openstack-tripleo-common-11.4.1-1.20200914165651.el8ost. This build is available now.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Red Hat OpenStack Platform (RHOSP) 16.2 enhancement advisory), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2021:3483