Description of problem:
FFU: ceph upgrade fails and exits with 'queue_name'. It looks like the upgrade completed OK, but at the end it exits with the 'queue_name' output:

[...]
2018-04-30 20:39:53Z [overcloud]: UPDATE_COMPLETE Stack UPDATE completed successfully
Stack overcloud UPDATE_COMPLETE
Started Mistral Workflow tripleo.package_update.v1.get_config. Execution ID: 8ed4f0f1-00e5-4546-bb21-fe31f20894d6
Waiting for messages on queue 'tripleo' with no timeout.
Success
Ceph Upgrade on stack overcloud complete.
Cleaning up
Started Mistral Workflow tripleo.validations.v1.check_pre_deployment_validations. Execution ID: 298a2ae0-f017-4cd0-9508-43d0ad43a8b6
Waiting for messages on queue 'tripleo' with no timeout.
Removing the current plan files
Uploading new plan files
Started Mistral Workflow tripleo.plan_management.v1.update_deployment_plan. Execution ID: b013332f-199d-40eb-88ff-2e6ecc60f719
Plan updated.
Processing templates in the directory /tmp/tripleoclient-r28yp9/tripleo-heat-templates
Started Mistral Workflow tripleo.plan_management.v1.get_deprecated_parameters. Execution ID: 5476097f-99a3-4bc9-9ed8-6a26311b508d
WARNING: Following parameters are deprecated and still defined. Deprecated parameters will be removed soon!
  OvercloudControlFlavor
WARNING: Following parameters are defined but not used in plan. Could be possible that parameter is valid but currently not used.
  CephAnsiblePlaybook
  StorageNetCidr
  StorageMgmtNetCidr
  ControlPlaneDefaultRoute
  CephAnsiblePlaybookVerbosity
  StorageMgmtNetworkVlanID
  ExternalAllocationPools
  TenantNetCidr
  InternalApiNetworkVlanID
  EC2MetadataIp
  CephAnsibleDisksConfig
  InternalApiNetCidr
  ExternalInterfaceDefaultRoute
  StorageAllocationPools
  ExternalNetworkVlanID
  DnsServers
  StorageMgmtAllocationPools
  TenantNetworkVlanID
  StorageNetworkVlanID
  CinderBackupBackend
  CephPoolDefaultPgNum
  InternalApiAllocationPools
  ExternalNetCidr
  TenantAllocationPools
'queue_name'

Version-Release number of selected component (if applicable):
python-tripleoclient-9.2.1-3.el7ost.noarch
openstack-tripleo-heat-templates-8.0.2-4.el7ost.noarch

How reproducible:
100%

Steps to Reproduce:
1. openstack overcloud ffwd-upgrade prepare
2. openstack overcloud ffwd-upgrade run
3. openstack overcloud upgrade run --roles Controller --skip-tags validation
4. openstack overcloud upgrade run --roles Compute --skip-tags validation
5. openstack overcloud ffwd-upgrade converge
6. workaround bug 1573307
7. openstack overcloud ceph-upgrade run \
    --timeout 100 \
    --templates /usr/share/openstack-tripleo-heat-templates \
    --stack overcloud \
    -e /usr/share/openstack-tripleo-heat-templates/environments/storage-environment.yaml \
    -e /home/stack/virt/internal.yaml \
    -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
    -e /home/stack/virt/network/network-environment.yaml \
    -e /home/stack/virt/hostnames.yml \
    -e /home/stack/virt/debug.yaml \
    -e /home/stack/ffu_repos.yaml \
    -e /home/stack/cli_opts_params.yaml \
    -e /home/stack/ceph-ansible-env.yaml \
    --ceph-ansible-playbook '/usr/share/ceph-ansible/infrastructure-playbooks/switch-from-non-containerized-to-containerized-ceph-daemons.yml,/usr/share/ceph-ansible/infrastructure-playbooks/rolling_update.yml'

Actual results:
It looks like the ceph upgrade completed OK, but a post step fails and the client exits with the 'queue_name' output.

Expected results:
The upgrade exits in a clean manner.

Additional info:
It looks like a KeyError to me: something somewhere is trying to read 'queue_name' from a dict where it doesn't exist (maybe it's missing from a workflow?), and only the missing key name ends up in the error message. Running the command with --debug usually yields a more precise exception/trace when that happens, and should hopefully help with pinpointing the issue.
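To illustrate why the output is just 'queue_name': in Python, converting a KeyError to a string gives only the quoted key, so a client that prints str(exc) loses the exception type and location. The sketch below is not tripleoclient code; run_post_step, report_quiet and report_debug are hypothetical names used only to show the difference between printing the exception and printing the full traceback.

```python
import traceback


def run_post_step(workflow_input):
    # Hypothetical stand-in for a post-upgrade step that needs 'queue_name'.
    return workflow_input['queue_name']


def report_quiet(workflow_input):
    # Mimics a client that only prints the exception message.
    try:
        run_post_step(workflow_input)
    except Exception as exc:
        return str(exc)
    return ''


def report_debug(workflow_input):
    # Mimics a debug mode that prints the full traceback instead.
    try:
        run_post_step(workflow_input)
    except Exception:
        return traceback.format_exc()
    return ''


if __name__ == '__main__':
    params = {'container': 'overcloud'}  # no 'queue_name' key
    print(report_quiet(params))
    print(report_debug(params))
```

The quiet variant prints only the quoted key, while the traceback variant names the exception type and the line that raised it.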
I wonder if it might be this one? This calls ffwd_converge_nodes() with only 'clients' and 'containers':
https://github.com/openstack/python-tripleoclient/blob/master/tripleoclient/v1/overcloud_ceph_upgrade.py#L81
but then ffwd_converge_nodes() itself seems to expect a 'queue_name' argument to have been explicitly defined:
https://github.com/openstack/python-tripleoclient/blob/master/tripleoclient/workflows/package_update.py#L158
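The suspected mismatch can be sketched as follows. This is a simplified illustration under assumptions, not the actual tripleoclient source: converge_nodes_sketch is a hypothetical stand-in for a function that reads 'queue_name' from its keyword arguments unconditionally, and the two callers are hypothetical as well. Passing a generated queue name is only one possible remedy and is not necessarily what the eventual patch does.

```python
import uuid


def converge_nodes_sketch(clients, **workflow_input):
    # Hypothetical: the key is read unconditionally, so any caller that
    # omits it triggers KeyError('queue_name').
    queue_name = workflow_input['queue_name']
    return {
        'container': workflow_input.get('container'),
        'queue_name': queue_name,
    }


def call_without_queue(clients, container):
    # Mirrors the suspected caller: no queue_name is passed.
    return converge_nodes_sketch(clients, container=container)


def call_with_queue(clients, container):
    # One possible remedy (an assumption): pass a queue name explicitly.
    return converge_nodes_sketch(
        clients, container=container, queue_name=str(uuid.uuid4()))


if __name__ == '__main__':
    try:
        call_without_queue(None, 'overcloud')
    except KeyError as exc:
        print('failed with', exc)
    print(call_with_queue(None, 'overcloud'))
```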
lbezdick, can you please triage this? (From the triage call round robin.)
This bug is fixed by this other patch: https://review.openstack.org/#/c/566944/1
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2018:2086