Description of problem:
We are trying to upgrade a RHOSP 10 HCI deployment to RHOSP 13 using the fast forward upgrade (FFU) process.
We are currently at the "Upgrading hyperconverged nodes" step [1].
Here is the failure observed while running the following command:
(undercloud) [stack@undercloud ~]$ openstack overcloud upgrade run --roles ComputeHCI | tee ~/logs/computeHCI
Waiting for messages on queue 'tripleo' with no timeout.
Started Mistral Workflow tripleo.package_update.v1.update_nodes. Execution ID: 811f7209-5fef-44b3-84ba-7aeff5deac5c
Using /var/lib/mistral/811f7209-5fef-44b3-84ba-7aeff5deac5c/ansible.cfg as config file
PLAY [overcloud] ***************************************************************
TASK [Gathering Facts] *********************************************************
Thursday 13 June 2019 00:08:35 -0400 (0:00:00.136) 0:00:00.136 *********
ok: [overcloud-computehci-1]
ok: [overcloud-computehci-0]
ok: [overcloud-computehci-2]
TASK [include] *****************************************************************
Thursday 13 June 2019 00:08:41 -0400 (0:00:06.490) 0:00:06.626 *********
included: /var/lib/mistral/811f7209-5fef-44b3-84ba-7aeff5deac5c/upgrade_steps_tasks.yaml for overcloud-computehci-2, overcloud-computehci-0, overcloud-computehci-1
included: /var/lib/mistral/811f7209-5fef-44b3-84ba-7aeff5deac5c/upgrade_steps_tasks.yaml for overcloud-computehci-2, overcloud-computehci-0, overcloud-computehci-1
included: /var/lib/mistral/811f7209-5fef-44b3-84ba-7aeff5deac5c/upgrade_steps_tasks.yaml for overcloud-computehci-2, overcloud-computehci-0, overcloud-computehci-1
included: /var/lib/mistral/811f7209-5fef-44b3-84ba-7aeff5deac5c/upgrade_steps_tasks.yaml for overcloud-computehci-2, overcloud-computehci-0, overcloud-computehci-1
included: /var/lib/mistral/811f7209-5fef-44b3-84ba-7aeff5deac5c/upgrade_steps_tasks.yaml for overcloud-computehci-2, overcloud-computehci-0, overcloud-computehci-1
included: /var/lib/mistral/811f7209-5fef-44b3-84ba-7aeff5deac5c/upgrade_steps_tasks.yaml for overcloud-computehci-2, overcloud-computehci-0, overcloud-computehci-1
TASK [include] *****************************************************************
Thursday 13 June 2019 00:08:42 -0400 (0:00:00.932) 0:00:07.559 *********
skipping: [overcloud-computehci-2] => {"changed": false, "skip_reason": "Conditional result was False"}
skipping: [overcloud-computehci-0] => {"changed": false, "skip_reason": "Conditional result was False"}
skipping: [overcloud-computehci-1] => {"changed": false, "skip_reason": "Conditional result was False"}
TASK [include] *****************************************************************
Thursday 13 June 2019 00:08:43 -0400 (0:00:00.463) 0:00:08.023 *********
included: /var/lib/mistral/811f7209-5fef-44b3-84ba-7aeff5deac5c/ComputeHCI/upgrade_tasks.yaml for overcloud-computehci-2, overcloud-computehci-0, overcloud-computehci-1
TASK [Check legacy Ceph hieradata] *********************************************
Thursday 13 June 2019 00:08:44 -0400 (0:00:01.021) 0:00:09.045 *********
fatal: [overcloud-computehci-2]: FAILED! => {"changed": true, "cmd": "test \"nil\" == \"$(hiera -c /etc/puppet/hiera.yaml ceph::profile::params::osds)\"", "delta": "0:00:00.118111", "end": "2019-06-13 05:28:15.337381", "msg": "non-zero return code", "rc": 1, "start": "2019-06-13 05:28:15.219270", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}
fatal: [overcloud-computehci-0]: FAILED! => {"changed": true, "cmd": "test \"nil\" == \"$(hiera -c /etc/puppet/hiera.yaml ceph::profile::params::osds)\"", "delta": "0:00:00.119234", "end": "2019-06-13 05:28:15.344743", "msg": "non-zero return code", "rc": 1, "start": "2019-06-13 05:28:15.225509", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}
fatal: [overcloud-computehci-1]: FAILED! => {"changed": true, "cmd": "test \"nil\" == \"$(hiera -c /etc/puppet/hiera.yaml ceph::profile::params::osds)\"", "delta": "0:00:00.124309", "end": "2019-06-13 05:28:15.473064", "msg": "non-zero return code", "rc": 1, "start": "2019-06-13 05:28:15.348755", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}
PLAY RECAP *********************************************************************
overcloud-computehci-0 : ok=8 changed=0 unreachable=0 failed=1
overcloud-computehci-1 : ok=8 changed=0 unreachable=0 failed=1
overcloud-computehci-2 : ok=8 changed=0 unreachable=0 failed=1
Thursday 13 June 2019 00:08:44 -0400 (0:00:00.813) 0:00:09.859 *********
Update failed with: Ansible failed, check log at /var/lib/mistral/811f7209-5fef-44b3-84ba-7aeff5deac5c/ansible.log.
===============================================================================
Ansible failed, check log at /var/lib/mistral/811f7209-5fef-44b3-84ba-7aeff5deac5c/ansible.log.
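For reference, the failing "Check legacy Ceph hieradata" task compares the hiera lookup result against the literal string "nil" (what hiera prints when a key is not set). A minimal sketch of that comparison, with a hard-coded sample value standing in for the hiera output (the leftover value shown is illustrative, not captured from our nodes):

```shell
# Sketch of the validation's logic; "out" stands in for the output of:
#   hiera -c /etc/puppet/hiera.yaml ceph::profile::params::osds
# On a clean node hiera prints "nil"; on our HCI nodes the key still
# resolves, so the test exits non-zero and Ansible marks the task fatal.
out='{"/dev/sdb"=>{}, "/dev/sdc"=>{}, "/dev/sdd"=>{}}'  # illustrative leftover value
if test "nil" = "$out"; then
  echo "validation passes: no legacy hieradata"
else
  echo "validation fails: legacy hieradata still present"
fi
```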
In our environment the ffwd-upgrade prepare and ffwd-upgrade run commands completed successfully, and the controller nodes have been upgraded to RHOSP 13.
Here is the ceph-ansible storage environment file we used with the ffwd-upgrade prepare command to update the plan:
(undercloud) [stack@undercloud ~]$ cat 13-templates/storage-environment.yaml
# ******************************************************************************
# This file will not enable the deployment of Ceph in future releases.
# Use ./ceph-ansible/ceph-ansible.yaml for this purpose instead.
# ******************************************************************************
## A Heat environment file which can be used to set up storage
## backends. Defaults to Ceph used as a backend for Cinder, Glance and
## Nova ephemeral storage.
resource_registry:
  OS::TripleO::Services::CephMgr: /usr/share/openstack-tripleo-heat-templates/docker/services/ceph-ansible/ceph-mgr.yaml
  OS::TripleO::Services::CephMon: /usr/share/openstack-tripleo-heat-templates/docker/services/ceph-ansible/ceph-mon.yaml
  OS::TripleO::Services::CephOSD: /usr/share/openstack-tripleo-heat-templates/docker/services/ceph-ansible/ceph-osd.yaml
  OS::TripleO::Services::CephClient: /usr/share/openstack-tripleo-heat-templates/docker/services/ceph-ansible/ceph-client.yaml

parameter_defaults:
  #### BACKEND SELECTION ####
  ## Whether to enable iscsi backend for Cinder.
  CinderEnableIscsiBackend: false
  ## Whether to enable rbd (Ceph) backend for Cinder.
  CinderEnableRbdBackend: true
  ## Cinder Backup backend can be either 'ceph', 'swift' or 'nfs'.
  CinderBackupBackend: ceph
  ## Whether to enable NFS backend for Cinder.
  # CinderEnableNfsBackend: false
  ## Whether to enable rbd (Ceph) backend for Nova ephemeral storage.
  NovaEnableRbdBackend: true
  ## Glance backend can be either 'rbd' (Ceph), 'swift' or 'file'.
  GlanceBackend: rbd
  ## Gnocchi backend can be either 'rbd' (Ceph), 'swift' or 'file'.
  GnocchiBackend: rbd
  ExtraConfig: {}
  CephPoolDefaultPgNum: 32
  CephAnsibleDisksConfig:
    devices:
      - /dev/sdb
      - /dev/sdc
      - /dev/sdd
Here is the template used for the original RHOSP 10 deployment:
(undercloud) [stack@undercloud ~]$ cat templates/storage-environment.yaml
## A Heat environment file which can be used to set up storage
## backends. Defaults to Ceph used as a backend for Cinder, Glance and
## Nova ephemeral storage.
resource_registry:
  OS::TripleO::Services::CephMon: /usr/share/openstack-tripleo-heat-templates/puppet/services/ceph-mon.yaml
  OS::TripleO::Services::CephOSD: /usr/share/openstack-tripleo-heat-templates/puppet/services/ceph-osd.yaml
  OS::TripleO::Services::CephClient: /usr/share/openstack-tripleo-heat-templates/puppet/services/ceph-client.yaml

parameter_defaults:
  CinderEnableIscsiBackend: false
  CinderEnableRbdBackend: true
  CinderBackupBackend: ceph
  NovaEnableRbdBackend: true
  GlanceBackend: rbd
  GnocchiBackend: rbd
  ExtraConfig:
    ceph::profile::params::osd_pool_default_pg_num: 32
    ceph::profile::params::osd_pool_default_pgp_num: 32
    ceph::profile::params::osds:
      '/dev/sdb': {}
      '/dev/sdc': {}
      '/dev/sdd': {}
So we have ExtraConfig: {} in place, which is supposed to clean the puppet-ceph hieradata from the HCI nodes. However, the hieradata is still present on those nodes, which causes the validation to fail when running openstack overcloud upgrade run --roles ComputeHCI.
As a simple workaround we re-ran the command with --skip-tags validation, which got past the check.
However, we still need to ensure the hieradata is actually cleaned before running openstack overcloud ceph-upgrade run in the next step, so that ceph-ansible does not run into issues while upgrading the Ceph cluster.
Please let us know if any further data is required; we have managed to reproduce this twice so far.
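For completeness, this is the sort of pre-flight check we would like to run on each ComputeHCI node before the ceph-upgrade step (a sketch only; it assumes the hiera CLI and /etc/puppet/hiera.yaml are present on the node, as they are for the failing validation task, and treats a failed lookup command as the key being unset):

```shell
#!/bin/sh
# Hypothetical pre-flight check: verify the legacy puppet-ceph key no
# longer resolves before running "openstack overcloud ceph-upgrade run".
# If the hiera lookup command itself fails, fall back to "nil" (unset).
key="ceph::profile::params::osds"
val="$(hiera -c /etc/puppet/hiera.yaml "$key" 2>/dev/null || echo nil)"
if [ "$val" = "nil" ]; then
  echo "OK: $key is no longer set"
else
  echo "FAIL: $key still resolves to: $val" >&2
  exit 1
fi
```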
[1] https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/13/html/fast_forward_upgrades/assembly-upgrading_the_overcloud#upgrading-hyperconverged-nodes
Version-Release number of selected component (if applicable):

How reproducible:
Reproduced twice so far.

Steps to Reproduce:
1. Deploy RHOSP 10 with HCI (ComputeHCI) nodes, with the OSD disks set via ceph::profile::params::osds in ExtraConfig (puppet-ceph).
2. Follow the FFU process: run ffwd-upgrade prepare with the ceph-ansible storage environment (ExtraConfig: {}) and ffwd-upgrade run; the controller nodes upgrade to RHOSP 13 successfully.
3. Run openstack overcloud upgrade run --roles ComputeHCI.

Actual results:
The "Check legacy Ceph hieradata" task fails on all ComputeHCI nodes because ceph::profile::params::osds still resolves via hiera.

Expected results:
The legacy puppet-ceph hieradata is removed from the HCI nodes (per ExtraConfig: {}) and the validation passes.

Additional info: