Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1720281

Summary: HCI + Fast Forward Upgrades: openstack overcloud upgrade run --roles ComputeHCI fails with hiera check failures for ceph::profile::params::osds
Product: Red Hat OpenStack
Reporter: Punit Kundal <pkundal>
Component: openstack-tripleo-heat-templates
Assignee: Francesco Pantano <fpantano>
Status: CLOSED DUPLICATE
QA Contact: Yogev Rabl <yrabl>
Severity: medium
Docs Contact:
Priority: medium
Version: 13.0 (Queens)
CC: aschultz, augol, fpantano, gfidente, jfrancoa, johfulto, kthakre, lbezdick, mburns, tenobreg
Target Milestone: ---
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2019-08-07 14:14:32 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Punit Kundal 2019-06-13 14:49:47 UTC
Description of problem:

We are trying to upgrade a RHOSP 10 HCI deployment to 13 using the FFU process.

We are currently at the step described in [1].

The following failure occurs when running the command:

(undercloud) [stack@undercloud ~]$ openstack overcloud upgrade run --roles ComputeHCI |  tee ~/logs/computeHCI
Waiting for messages on queue 'tripleo' with no timeout.
Started Mistral Workflow tripleo.package_update.v1.update_nodes. Execution ID: 811f7209-5fef-44b3-84ba-7aeff5deac5c
Using /var/lib/mistral/811f7209-5fef-44b3-84ba-7aeff5deac5c/ansible.cfg as config file

PLAY [overcloud] ***************************************************************

TASK [Gathering Facts] *********************************************************
Thursday 13 June 2019  00:08:35 -0400 (0:00:00.136)       0:00:00.136 ********* 
ok: [overcloud-computehci-1]
ok: [overcloud-computehci-0]
ok: [overcloud-computehci-2]

TASK [include] *****************************************************************
Thursday 13 June 2019  00:08:41 -0400 (0:00:06.490)       0:00:06.626 ********* 
included: /var/lib/mistral/811f7209-5fef-44b3-84ba-7aeff5deac5c/upgrade_steps_tasks.yaml for overcloud-computehci-2, overcloud-computehci-0, overcloud-computehci-1
included: /var/lib/mistral/811f7209-5fef-44b3-84ba-7aeff5deac5c/upgrade_steps_tasks.yaml for overcloud-computehci-2, overcloud-computehci-0, overcloud-computehci-1
included: /var/lib/mistral/811f7209-5fef-44b3-84ba-7aeff5deac5c/upgrade_steps_tasks.yaml for overcloud-computehci-2, overcloud-computehci-0, overcloud-computehci-1
included: /var/lib/mistral/811f7209-5fef-44b3-84ba-7aeff5deac5c/upgrade_steps_tasks.yaml for overcloud-computehci-2, overcloud-computehci-0, overcloud-computehci-1
included: /var/lib/mistral/811f7209-5fef-44b3-84ba-7aeff5deac5c/upgrade_steps_tasks.yaml for overcloud-computehci-2, overcloud-computehci-0, overcloud-computehci-1
included: /var/lib/mistral/811f7209-5fef-44b3-84ba-7aeff5deac5c/upgrade_steps_tasks.yaml for overcloud-computehci-2, overcloud-computehci-0, overcloud-computehci-1

TASK [include] *****************************************************************
Thursday 13 June 2019  00:08:42 -0400 (0:00:00.932)       0:00:07.559 ********* 
skipping: [overcloud-computehci-2] => {"changed": false, "skip_reason": "Conditional result was False"}
skipping: [overcloud-computehci-0] => {"changed": false, "skip_reason": "Conditional result was False"}
skipping: [overcloud-computehci-1] => {"changed": false, "skip_reason": "Conditional result was False"}

TASK [include] *****************************************************************
Thursday 13 June 2019  00:08:43 -0400 (0:00:00.463)       0:00:08.023 ********* 
included: /var/lib/mistral/811f7209-5fef-44b3-84ba-7aeff5deac5c/ComputeHCI/upgrade_tasks.yaml for overcloud-computehci-2, overcloud-computehci-0, overcloud-computehci-1

TASK [Check legacy Ceph hieradata] *********************************************
Thursday 13 June 2019  00:08:44 -0400 (0:00:01.021)       0:00:09.045 ********* 
fatal: [overcloud-computehci-2]: FAILED! => {"changed": true, "cmd": "test \"nil\" == \"$(hiera -c /etc/puppet/hiera.yaml ceph::profile::params::osds)\"", "delta": "0:00:00.118111", "end": "2019-06-13 05:28:15.337381", "msg": "non-zero return code", "rc": 1, "start": "2019-06-13 05:28:15.219270", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}
fatal: [overcloud-computehci-0]: FAILED! => {"changed": true, "cmd": "test \"nil\" == \"$(hiera -c /etc/puppet/hiera.yaml ceph::profile::params::osds)\"", "delta": "0:00:00.119234", "end": "2019-06-13 05:28:15.344743", "msg": "non-zero return code", "rc": 1, "start": "2019-06-13 05:28:15.225509", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}
fatal: [overcloud-computehci-1]: FAILED! => {"changed": true, "cmd": "test \"nil\" == \"$(hiera -c /etc/puppet/hiera.yaml ceph::profile::params::osds)\"", "delta": "0:00:00.124309", "end": "2019-06-13 05:28:15.473064", "msg": "non-zero return code", "rc": 1, "start": "2019-06-13 05:28:15.348755", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}

PLAY RECAP *********************************************************************
overcloud-computehci-0     : ok=8    changed=0    unreachable=0    failed=1   
overcloud-computehci-1     : ok=8    changed=0    unreachable=0    failed=1   
overcloud-computehci-2     : ok=8    changed=0    unreachable=0    failed=1   

Thursday 13 June 2019  00:08:44 -0400 (0:00:00.813)       0:00:09.859 *********
===============================================================================
Update failed with: Ansible failed, check log at /var/lib/mistral/811f7209-5fef-44b3-84ba-7aeff5deac5c/ansible.log.

Ansible failed, check log at /var/lib/mistral/811f7209-5fef-44b3-84ba-7aeff5deac5c/ansible.log.


The ffwd-upgrade prepare and ffwd-upgrade run commands completed successfully, and the controller nodes have been upgraded to RHOSP 13.

Here's the ceph-ansible template that we used with the ffwd-upgrade prepare command to update the plan:

(undercloud) [stack@undercloud ~]$ cat 13-templates/storage-environment.yaml 
# ******************************************************************************
# This file will not enable the deployment of Ceph in future releases.
# Use ./ceph-ansible/ceph-ansible.yaml for this purpose instead.
# ******************************************************************************
## A Heat environment file which can be used to set up storage
## backends. Defaults to Ceph used as a backend for Cinder, Glance and
## Nova ephemeral storage.
resource_registry:
  OS::TripleO::Services::CephMgr: /usr/share/openstack-tripleo-heat-templates/docker/services/ceph-ansible/ceph-mgr.yaml
  OS::TripleO::Services::CephMon: /usr/share/openstack-tripleo-heat-templates/docker/services/ceph-ansible/ceph-mon.yaml
  OS::TripleO::Services::CephOSD: /usr/share/openstack-tripleo-heat-templates/docker/services/ceph-ansible/ceph-osd.yaml
  OS::TripleO::Services::CephClient: /usr/share/openstack-tripleo-heat-templates/docker/services/ceph-ansible/ceph-client.yaml

parameter_defaults:
  #### BACKEND SELECTION ####

  ## Whether to enable iscsi backend for Cinder.
  CinderEnableIscsiBackend: false
  ## Whether to enable rbd (Ceph) backend for Cinder.
  CinderEnableRbdBackend: true
  ## Cinder Backup backend can be either 'ceph', 'swift' or 'nfs'.
  CinderBackupBackend: ceph
  ## Whether to enable NFS backend for Cinder.
  # CinderEnableNfsBackend: false
  ## Whether to enable rbd (Ceph) backend for Nova ephemeral storage.
  NovaEnableRbdBackend: true
  ## Glance backend can be either 'rbd' (Ceph), 'swift' or 'file'.
  GlanceBackend: rbd
  ## Gnocchi backend can be either 'rbd' (Ceph), 'swift' or 'file'.
  GnocchiBackend: rbd
  ExtraConfig: {}
  CephPoolDefaultPgNum: 32
  CephAnsibleDisksConfig:
    devices:
      - /dev/sdb
      - /dev/sdc
      - /dev/sdd


Here's the template used with the original deployment:

(undercloud) [stack@undercloud ~]$ cat templates/storage-environment.yaml 
## A Heat environment file which can be used to set up storage
## backends. Defaults to Ceph used as a backend for Cinder, Glance and
## Nova ephemeral storage.
resource_registry:
  OS::TripleO::Services::CephMon: /usr/share/openstack-tripleo-heat-templates/puppet/services/ceph-mon.yaml
  OS::TripleO::Services::CephOSD: /usr/share/openstack-tripleo-heat-templates/puppet/services/ceph-osd.yaml
  OS::TripleO::Services::CephClient: /usr/share/openstack-tripleo-heat-templates/puppet/services/ceph-client.yaml

parameter_defaults:
  CinderEnableIscsiBackend: false
  CinderEnableRbdBackend: true
  CinderBackupBackend: ceph
  NovaEnableRbdBackend: true
  GlanceBackend: rbd
  GnocchiBackend: rbd
  ExtraConfig:
    ceph::profile::params::osd_pool_default_pg_num: 32
    ceph::profile::params::osd_pool_default_pgp_num: 32
    ceph::profile::params::osds:
      '/dev/sdb': {}
      '/dev/sdc': {}
      '/dev/sdd': {}
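
Comparing the two templates, the device paths that were keys of the legacy ceph::profile::params::osds hash are the same ones listed under CephAnsibleDisksConfig devices in the new template. The following is only a minimal sketch of that mapping. The function name legacy_osds_to_devices is hypothetical (it is not part of TripleO), and it assumes entries written exactly like the ones above, with single-quoted /dev/ paths.

```shell
#!/bin/sh
# Hypothetical helper: reads legacy puppet-ceph OSD entries on stdin
# (lines such as "      '/dev/sdb': {}") and prints each device path as a
# YAML list item in the style of the devices list under CephAnsibleDisksConfig.
# Lines that do not look like a single-quoted /dev/ path are ignored.
legacy_osds_to_devices() {
    sed -n "s|^[[:space:]]*'\(/dev/[^']*\)':.*|      - \1|p"
}

# Example using the three entries from the original template.
legacy_osds_to_devices <<'EOF'
      '/dev/sdb': {}
      '/dev/sdc': {}
      '/dev/sdd': {}
EOF
```

Run as is, this prints one list item per device (/dev/sdb, /dev/sdc, /dev/sdd), matching the devices list in the 13-templates file.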


So we have: 

ExtraConfig: {}  << in place to clean the puppet-ceph hieradata from the HCI nodes 

However, the hieradata is still present on the HCI nodes, which causes the validation to fail when running openstack overcloud upgrade run --roles ComputeHCI.
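
The failing task in the log compares the literal string "nil" with the output of a hiera lookup for ceph::profile::params::osds. The same comparison can be repeated by hand on an HCI node, roughly as sketched below. The function name legacy_ceph_hieradata_absent is hypothetical; the hiera command line is the one shown in the task output above.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit 0) only when the lookup result passed
# as $1 is the literal string "nil", i.e. the key is not set in hieradata.
# This mirrors the `test "nil" == "$(hiera ...)"` command from the log.
legacy_ceph_hieradata_absent() {
    [ "nil" = "$1" ]
}

# Only query hiera where it is installed (for example on an overcloud node).
if command -v hiera >/dev/null 2>&1; then
    value="$(hiera -c /etc/puppet/hiera.yaml ceph::profile::params::osds)"
    if legacy_ceph_hieradata_absent "$value"; then
        echo "OK: ceph::profile::params::osds is not set"
    else
        echo "FAIL: legacy Ceph hieradata still present: $value"
    fi
fi
```

On a node where the legacy hash is still defined, the lookup returns the hash instead of "nil", so the comparison fails in the same way as the upgrade task.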

As a simple workaround, we tried passing --skip-tags validation.
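
For reference, the workaround invocation would look roughly like the command printed by the sketch below. This assumes the option is simply appended to the upgrade command shown earlier in this report; build_upgrade_cmd is a hypothetical helper that only prints the command and does not run it.

```shell
#!/bin/sh
# Hypothetical helper: prints the upgrade command for the given role with
# the tasks tagged "validation" skipped. Nothing is executed here.
build_upgrade_cmd() {
    role="$1"
    printf 'openstack overcloud upgrade run --roles %s --skip-tags validation\n' "$role"
}

build_upgrade_cmd ComputeHCI
```

Skipping the tag only bypasses the check, along with any other task carrying the same tag. It does not remove the legacy hieradata from the nodes.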

However, we need to ensure that the hieradata is cleaned before running openstack overcloud ceph-upgrade run in the next step, so that ceph-ansible does not run into issues while upgrading the Ceph cluster.

Please let us know if any data is required; we've managed to reproduce this twice so far.


[1] https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/13/html/fast_forward_upgrades/assembly-upgrading_the_overcloud#upgrading-hyperconverged-nodes


Comment 6 John Fulton 2019-08-07 14:14:32 UTC

*** This bug has been marked as a duplicate of bug 1738592 ***