Description of problem:
When performing an upgrade from OSP12 to OSP13 using the new Ansible-based subscription management service, the following error was obtained during the overcloud upgrade prepare step:

(undercloud) [stack@undercloud-0 ~]$ openstack stack failures list --long overcloud
overcloud.AllNodesDeploySteps.ComputeHostPrepDeployment.0:
  resource_type: OS::Heat::SoftwareDeployment
  physical_resource_id: f3456aa9-310b-48ae-b4b1-45cade0ca615
  status: UPDATE_FAILED
  status_reason: |
    Error: resources[0]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 2
  deploy_stdout: |
    PLAY [localhost] ***************************************************************
    TASK [Gathering Facts] *********************************************************
    ok: [localhost]
    TASK [create persistent logs directory] ****************************************
    ok: [localhost]
    TASK [ceilometer logs readme] **************************************************
    ok: [localhost]
    TASK [create persistent logs directory] ****************************************
    changed: [localhost] => (item=/var/log/containers/neutron)
    ...... continues .....
    TASK [make sure libvirt services are disabled] *********************************
    ok: [localhost] => (item=libvirtd.service)
    ok: [localhost] => (item=virtlogd.socket)
    TASK [NTP settings] ************************************************************
    ok: [localhost]
    TASK [Install ntpdate] *********************************************************
    skipping: [localhost]
    TASK [Ensure system is NTP time synced] ****************************************
    changed: [localhost]
    TASK [include_role] ************************************************************
    to retry, use: --limit @/var/lib/heat-config/heat-config-ansible/d37af0a8-9bfd-44fe-b489-392303d5c216_playbook.retry
    PLAY RECAP *********************************************************************
    localhost : ok=20 changed=6 unreachable=0 failed=0
  deploy_stderr: |
    [WARNING]: Consider using yum, dnf or zypper module rather than running rpm
    ERROR! the role 'redhat-subscription' was not found in /var/lib/heat-config/heat-config-ansible/roles:/etc/ansible/roles:/usr/share/ansible/roles:/var/lib/heat-config/heat-config-ansible
overcloud.AllNodesDeploySteps.ControllerHostPrepDeployment.0:
  resource_type: OS::Heat::SoftwareDeployment
  physical_resource_id: 8e6ae129-8afe-4d26-975c-7bdaeaa61077
  status: UPDATE_FAILED
  status_reason: |
    Error: resources[0]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 2
  deploy_stdout: |
    PLAY [localhost] ***************************************************************
    TASK [Gathering Facts] *********************************************************
    ok: [localhost]
    TASK [create persistent logs directory] ****************************************
    ok: [localhost] => (item=/var/log/containers/aodh)
    ok: [localhost] => (item=/var/log/containers/httpd/aodh-api)
    .......continues .....
    TASK [redis logs readme] *******************************************************
    changed: [localhost]
    TASK [include_role] ************************************************************
    to retry, use: --limit @/var/lib/heat-config/heat-config-ansible/ccb6e904-0fad-4ea1-9816-28c0a72c5986_playbook.retry
    PLAY RECAP *********************************************************************
    localhost : ok=50 changed=13 unreachable=0 failed=0
  deploy_stderr: |
    ERROR! the role 'redhat-subscription' was not found in /var/lib/heat-config/heat-config-ansible/roles:/etc/ansible/roles:/usr/share/ansible/roles:/var/lib/heat-config/heat-config-ansible

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Deploy OSP12 registered via old rhel-registration scripts process
2. Upgrade undercloud to OSP13 (set repos via subscription-manager)
3. Create rhsm.yaml env file as stated in official docs
4. Run overcloud prepare command:

(undercloud) [stack@undercloud-0 tripleo-w5h0Qc-config]$ cat ~/overcloud_prepare_osp13.sh
#!/bin/bash
openstack overcloud upgrade prepare \
    --timeout 100 \
    --templates /usr/share/openstack-tripleo-heat-templates \
    --stack overcloud \
    --libvirt-type kvm \
    --ntp-server clock.redhat.com \
    -e /home/stack/virt/config_lvm.yaml \
    -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
    -e /home/stack/virt/network/network-environment.yaml \
    -e /home/stack/virt/hostnames.yml \
    -e /home/stack/virt/extra_templates.yaml \
    -e /home/stack/virt/nodes_data.yaml \
    -e /home/stack/templates/rhsm.yml \
    -e /home/stack/templates/overcloud_images_osp13.yaml

Actual results:
Upgrade prepare command failed because the redhat-subscription role is not found on the overcloud nodes.

Expected results:
Upgrade prepare command finishes successfully.

Additional info:
From the analysis I could do, the role is present on the undercloud:

(undercloud) [stack@undercloud-0 tripleo-w5h0Qc-config]$ ls /usr/share/ansible/roles/
redhat-subscription tripleo-bootstrap tripleo-ipsec tripleo-ssh-known-hosts

But the host_prep_tasks are run on the target nodes (the Compute and Controller in this case), where the role is not present:

[heat-admin@controller-0 ~]$ sudo ls /usr/share/ansible/roles/

And the playbook fails.
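The failing lookup can be reproduced by hand on any node. The following is only a minimal sketch, assuming the search path is exactly the colon-separated list printed in the ERROR message; find_role is a hypothetical helper name and is not part of Ansible or TripleO:

```shell
#!/bin/bash
# Hypothetical helper: walk a colon-separated roles path (as printed in the
# "role ... was not found in ..." error) and print the first directory that
# contains the named role. Returns 1 if none of the directories has it.
find_role() {
    local role="$1" roles_path="$2" dir
    local IFS=':'
    for dir in $roles_path; do
        if [ -d "$dir/$role" ]; then
            printf '%s\n' "$dir/$role"
            return 0
        fi
    done
    return 1
}

# Search path copied from the deploy_stderr output in this report.
ROLES_PATH=/var/lib/heat-config/heat-config-ansible/roles:/etc/ansible/roles:/usr/share/ansible/roles:/var/lib/heat-config/heat-config-ansible

if path=$(find_role redhat-subscription "$ROLES_PATH"); then
    echo "found: $path"
else
    echo "redhat-subscription not found on ${HOSTNAME:-this host}"
fi
```

Given the ls output above, this would be expected to find the role on the undercloud and report it missing on controller-0.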
The role is deployed on the undercloud, not on the overcloud. The tasks are executed from the undercloud, so this isn't a bug.
Re-opening this bug as there is a real issue here. When using config-download, we execute the host_prep_tasks twice: once via the SoftwareDeployment Heat resource and a second time via the deploy steps playbook run from the undercloud:

1st time: https://github.com/openstack/tripleo-heat-templates/blob/stable/queens/common/deploy-steps.j2#L278
2nd time: https://github.com/openstack/tripleo-heat-templates/blob/stable/queens/common/deploy-steps.j2#L449

The main issue here is that the execution of ansible-role-redhat-subscription is included within those host_prep_tasks ( https://github.com/openstack/tripleo-heat-templates/blob/stable/queens/extraconfig/services/rhsm.yaml#L69 ), and the Heat resource is executed on the overcloud nodes themselves, where the role is not installed.

This issue was spotted by somebody from consulting who deployed a fresh OSP13 environment and tried to register the nodes in a follow-up step with this command:

openstack overcloud deploy --stack ocd97 --templates \
    -r /home/stack/templates/roles_data.yaml \
    -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
    -e /usr/share/openstack-tripleo-heat-templates/environments/services/octavia.yaml \
    -e /usr/share/openstack-tripleo-heat-templates/environments/services/sahara.yaml \
    -e /usr/share/openstack-tripleo-heat-templates/environments/ssl/tls-endpoints-public-dns.yaml \
    -e /usr/share/openstack-tripleo-heat-templates/environments/services/barbican.yaml \
    -e /usr/share/openstack-tripleo-heat-templates/environments/barbican-backend-simple-crypto.yaml \
    -e /usr/share/openstack-tripleo-heat-templates/environments/config-download-environment.yaml \
    --config-download \
    -e /home/stack/templates/registration.yaml \
    -e /home/stack/templates/barbican.yaml \
    -e /home/stack/templates/active-directory.yaml \
    -e /home/stack/templates/banner.yaml \
    -e /home/stack/templates/collectd.yaml \
    -e /home/stack/templates/enable-tls.yaml \
    -e /home/stack/templates/fencing.yaml \
    -e /home/stack/templates/inject-trust-anchor-hiera.yaml \
    -e /home/stack/templates/ips-from-pool-all.yaml \
    -e /home/stack/templates/logging.yaml \
    -e /home/stack/templates/monitoring.yaml \
    -e /home/stack/templates/network-environment.yaml \
    -e /home/stack/templates/ocd97_overcloud_images.yaml \
    -e /home/stack/templates/ocd97_node_info.yaml \
    -e /home/stack/templates/puppet-ceph-external.yaml \
    -e /home/stack/templates/scheduler_hints_env.yaml \
    -e /home/stack/templates/timezone.yaml 2>&1 | tee ~/ospdeploy/log/ocd97_deploy.log

The overcloud deploy operation failed with the following error:

2019-02-20 08:20:57,165 p=8058 u=mistral | TASK [Output for ControllerAllNodesDeployment] *********************************
2019-02-20 08:20:57,166 p=8058 u=mistral | Wednesday 20 February 2019 08:20:57 -0600 (0:00:01.131) 0:00:24.734 ****
2019-02-20 08:20:57,262 p=8058 u=mistral | ok: [ocd97-controller-2] => {
    "failed_when_result": false,
    "msg": [
        {
            "stderr": [
                "[2019-02-20 08:20:56,230] (heat-config) [DEBUG] Running /usr/libexec/heat-config/hooks/hiera < /var/lib/heat-config/deployed/b44b3644-f60d-4769-9dc0-6ef60be9f0a1.json",
                "[2019-02-20 08:20:56,424] (heat-config) [INFO] {\"deploy_stdout\": \"\", \"deploy_stderr\": \"\", \"deploy_status_code\": 0}",
                "[2019-02-20 08:20:56,424] (heat-config) [DEBUG] ",
                "[2019-02-20 08:20:56,424] (heat-config) [INFO] Completed /usr/libexec/heat-config/hooks/hiera",
                "[2019-02-20 08:20:56,424] (heat-config) [DEBUG] Running heat-config-notify /var/lib/heat-config/deployed/b44b3644-f60d-4769-9dc0-6ef60be9f0a1.json < /var/lib/heat-config/deployed/b44b3644-f60d-4769-9dc0-6ef60be9f0a1.notify.json",
                "[2019-02-20 08:20:57,034] (heat-config) [INFO] ",
                "[2019-02-20 08:20:57,035] (heat-config) [DEBUG] "
            ]
        },
        {
            "status_code": "0"
        }
    ]
}
2019-02-20 08:20:57,278 p=8058 u=mistral | ok: [ocd97-controller-0] => {
    "failed_when_result": false,
    "msg": [
        {
            "stderr": [
                "[2019-02-20 08:20:56,253] (heat-config) [DEBUG] Running /usr/libexec/heat-config/hooks/hiera < /var/lib/heat-config/deployed/a99622cb-3665-4cea-a599-7f11c5dd54df.json",
                "[2019-02-20 08:20:56,460] (heat-config) [INFO] {\"deploy_stdout\": \"\", \"deploy_stderr\": \"\", \"deploy_status_code\": 0}",
                "[2019-02-20 08:20:56,461] (heat-config) [DEBUG] ",
                "[2019-02-20 08:20:56,461] (heat-config) [INFO] Completed /usr/libexec/heat-config/hooks/hiera",
                "[2019-02-20 08:20:56,461] (heat-config) [DEBUG] Running heat-config-notify /var/lib/heat-config/deployed/a99622cb-3665-4cea-a599-7f11c5dd54df.json < /var/lib/heat-config/deployed/a99622cb-3665-4cea-a599-7f11c5dd54df.notify.json",
                "[2019-02-20 08:20:57,012] (heat-config) [INFO] ",
                "[2019-02-20 08:20:57,013] (heat-config) [DEBUG] "
            ]
        },
        {
            "status_code": "0"
        }
    ]
}
2019-02-20 08:20:57,297 p=8058 u=mistral | ok: [ocd97-controller-1] => {
    "failed_when_result": false,
    "msg": [
        {
            "stderr": [
                "[2019-02-20 08:20:56,273] (heat-config) [DEBUG] Running /usr/libexec/heat-config/hooks/hiera < /var/lib/heat-config/deployed/c492dc5a-e371-4451-86e8-35f712b213f2.json",
                "[2019-02-20 08:20:56,492] (heat-config) [INFO] {\"deploy_stdout\": \"\", \"deploy_stderr\": \"\", \"deploy_status_code\": 0}",
                "[2019-02-20 08:20:56,493] (heat-config) [DEBUG] ",
                "[2019-02-20 08:20:56,493] (heat-config) [INFO] Completed /usr/libexec/heat-config/hooks/hiera",
                "[2019-02-20 08:20:56,493] (heat-config) [DEBUG] Running heat-config-notify /var/lib/heat-config/deployed/c492dc5a-e371-4451-86e8-35f712b213f2.json < /var/lib/heat-config/deployed/c492dc5a-e371-4451-86e8-35f712b213f2.notify.json",
                "[2019-02-20 08:20:57,123] (heat-config) [INFO] ",
                "[2019-02-20 08:20:57,123] (heat-config) [DEBUG] "
            ]
        },
        {
            "status_code": "0"
        }
    ]
}
The best way to solve it, in my opinion, would be to find a way to tell deploy_steps whether or not we're using config-download, and add that to this condition: https://github.com/openstack/tripleo-heat-templates/blob/stable/queens/common/deploy-steps.j2#L275 However, I don't know the deployment internals well enough to implement it myself. Could anybody from DFG:DF give a hand?
Yes, that was the first thing I went to check (ansible-role-redhat-subscription is installed on the undercloud), as I also expected the role to be installed, since it is a requirement of tht. But we install tht on the undercloud node, not on the overcloud nodes, and as {{role.name}}HostPrepConfig is a SoftwareDeployment resource, it is executed on each overcloud node. So in the end we are trying to run the redhat-subscription role from the overcloud node itself, where it is not present.

2019-02-21 07:55:29Z [overcloud]: UPDATE_FAILED Resource UPDATE failed: Error: resources.AllNodesDeploySteps.resources.ControllerHostPrepDeployment.resources[0]: Deployment to server failed: deploy_status_code: Deployment exited with non-zero status code: 1
2019-02-21 07:55:32Z [overcloud-AllNodesDeploySteps-srm5xm3btoyx-ComputeHostPrepDeployment-c3nwmiz5uluc.0]: SIGNAL_IN_PROGRESS Signal: deployment b575150f-763e-4236-8b95-a68e3ec6d0cd failed (1)

Stack overcloud UPDATE_FAILED

overcloud.AllNodesDeploySteps.ComputeHostPrepDeployment.0:
  resource_type: OS::Heat::SoftwareDeployment
  physical_resource_id: b575150f-763e-4236-8b95-a68e3ec6d0cd
  status: UPDATE_FAILED
  status_reason: |
    Error: resources[0]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 1
  deploy_stdout: |
  deploy_stderr: |
    ERROR! the role 'redhat-subscription' was not found in /var/lib/heat-config/heat-config-ansible/roles:/etc/ansible/roles:/usr/share/ansible/roles:/var/lib/heat-config/heat-config-ansible
overcloud.AllNodesDeploySteps.ControllerHostPrepDeployment.0:
  resource_type: OS::Heat::SoftwareDeployment
  physical_resource_id: 25c0058e-f78d-416f-9fff-44d413e02573
  status: UPDATE_FAILED
  status_reason: |
    Error: resources[0]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 1
  deploy_stdout: |
  deploy_stderr: |
    ERROR! the role 'redhat-subscription' was not found in /var/lib/heat-config/heat-config-ansible/roles:/etc/ansible/roles:/usr/share/ansible/roles:/var/lib/heat-config/heat-config-ansible

For that reason we need to ensure that ansible-role-redhat-subscription gets installed on the overcloud nodes.
This error should be solved once the overcloud images are built with ansible-role-redhat-subscription included, which is done by this patch (already merged downstream): https://code.engineering.redhat.com/gerrit/gitweb?p=openstack-tripleo-puppet-elements.git;a=commitdiff;h=d4169671213706d97049c652ecc72571e08979db
According to our records, this should be resolved by openstack-tripleo-puppet-elements-8.0.1-2.el7ost. This build is available now.
Verified that openstack-tripleo-puppet-elements-8.0.1-2.el7ost is contained in the passed_phase1 compose.
It looks like this BZ1640167 wasn't fully fixed: the fix didn't take into account minor updates from a release older than the one where the fix landed. As OSP13 is EOL we can't submit a fix for it. However, the workaround is fairly simple, as the required package is in the OSP repositories:

# First, generate the inventory. From the undercloud, source the stackrc first
tripleo-ansible-inventory --stack <stack-name> --static-yaml-inventory inventory.yaml

# Run the installation of ansible-role-redhat-subscription in all the overcloud nodes
ansible overcloud -i inventory.yaml -m package -a "name=ansible-role-redhat-subscription state=present"

Then re-run the failing command.
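Before re-running, it may be worth confirming that the package actually landed on every node. This is only a sketch, assuming the same inventory.yaml generated by the workaround; verify_role_installed and the DRY_RUN variable are hypothetical names introduced here for illustration, not part of TripleO:

```shell
#!/bin/bash
# Sketch only: query the RPM database on every overcloud node through an
# Ansible ad-hoc command. verify_role_installed is a hypothetical helper.
verify_role_installed() {
    local inventory="${1:-inventory.yaml}"
    local cmd=(ansible overcloud -i "$inventory" -m command
               -a "rpm -q ansible-role-redhat-subscription")
    if [ -n "${DRY_RUN:-}" ]; then
        # DRY_RUN is an assumption of this sketch: print the command only.
        echo "${cmd[*]}"
    else
        "${cmd[@]}"
    fi
}

# Usage (from the undercloud, after sourcing stackrc):
#   verify_role_installed inventory.yaml
```

A node where rpm -q exits non-zero should show up as failed in the ad-hoc output, which points at the nodes that still need the package.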