Bug 1573327 - FFU: ceph upgrade fails and exits with 'queue_name'
Summary: FFU: ceph upgrade fails and exits with 'queue_name'
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: python-tripleoclient
Version: 13.0 (Queens)
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: rc
Target Release: 13.0 (Queens)
Assignee: Jose Luis Franco
QA Contact: Marius Cornea
URL:
Whiteboard:
Depends On:
Blocks: 1561169
Reported: 2018-04-30 20:47 UTC by Marius Cornea
Modified: 2018-06-27 13:55 UTC (History)

Fixed In Version: python-tripleoclient-9.2.1-8.el7ost
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-06-27 13:54:52 UTC
Target Upstream Version:



Links
System ID Priority Status Summary Last Updated
OpenStack gerrit 566944 None None None 2018-05-09 16:57:58 UTC
Red Hat Product Errata RHEA-2018:2086 None None None 2018-06-27 13:55:51 UTC

Description Marius Cornea 2018-04-30 20:47:13 UTC
Description of problem:
FFU: the Ceph upgrade fails and exits with 'queue_name'. The upgrade itself appears to complete successfully, but the client then exits after printing only 'queue_name':

[...]
2018-04-30 20:39:53Z [overcloud]: UPDATE_COMPLETE  Stack UPDATE completed successfully

 Stack overcloud UPDATE_COMPLETE 

Started Mistral Workflow tripleo.package_update.v1.get_config. Execution ID: 8ed4f0f1-00e5-4546-bb21-fe31f20894d6
Waiting for messages on queue 'tripleo' with no timeout.
Success
Ceph Upgrade on stack overcloud complete. Cleaning up
Started Mistral Workflow tripleo.validations.v1.check_pre_deployment_validations. Execution ID: 298a2ae0-f017-4cd0-9508-43d0ad43a8b6
Waiting for messages on queue 'tripleo' with no timeout.
Removing the current plan files
Uploading new plan files
Started Mistral Workflow tripleo.plan_management.v1.update_deployment_plan. Execution ID: b013332f-199d-40eb-88ff-2e6ecc60f719
Plan updated.
Processing templates in the directory /tmp/tripleoclient-r28yp9/tripleo-heat-templates
Started Mistral Workflow tripleo.plan_management.v1.get_deprecated_parameters. Execution ID: 5476097f-99a3-4bc9-9ed8-6a26311b508d
WARNING: Following parameters are deprecated and still defined. Deprecated parameters will be removed soon!
  OvercloudControlFlavor
WARNING: Following parameters are defined but not used in plan. Could be possible that parameter is valid but currently not used.
  CephAnsiblePlaybook
  StorageNetCidr
  StorageMgmtNetCidr
  ControlPlaneDefaultRoute
  CephAnsiblePlaybookVerbosity
  StorageMgmtNetworkVlanID
  ExternalAllocationPools
  TenantNetCidr
  InternalApiNetworkVlanID
  EC2MetadataIp
  CephAnsibleDisksConfig
  InternalApiNetCidr
  ExternalInterfaceDefaultRoute
  StorageAllocationPools
  ExternalNetworkVlanID
  DnsServers
  StorageMgmtAllocationPools
  TenantNetworkVlanID
  StorageNetworkVlanID
  CinderBackupBackend
  CephPoolDefaultPgNum
  InternalApiAllocationPools
  ExternalNetCidr
  TenantAllocationPools
'queue_name'


Version-Release number of selected component (if applicable):
python-tripleoclient-9.2.1-3.el7ost.noarch
openstack-tripleo-heat-templates-8.0.2-4.el7ost.noarch

How reproducible:
100%

Steps to Reproduce:
1. openstack overcloud ffwd-upgrade prepare 
2. openstack overcloud ffwd-upgrade run
3. openstack overcloud upgrade run --roles Controller --skip-tags validation
4. openstack overcloud upgrade run --roles Compute --skip-tags validation
5. openstack overcloud ffwd-upgrade converge
6. Apply the workaround for bug 1573307
7. openstack overcloud ceph-upgrade run \
--timeout 100 \
--templates /usr/share/openstack-tripleo-heat-templates \
--stack overcloud \
-e /usr/share/openstack-tripleo-heat-templates/environments/storage-environment.yaml \
-e /home/stack/virt/internal.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
-e /home/stack/virt/network/network-environment.yaml \
-e /home/stack/virt/hostnames.yml \
-e /home/stack/virt/debug.yaml \
-e /home/stack/ffu_repos.yaml \
-e /home/stack/cli_opts_params.yaml \
-e /home/stack/ceph-ansible-env.yaml \
--ceph-ansible-playbook '/usr/share/ceph-ansible/infrastructure-playbooks/switch-from-non-containerized-to-containerized-ceph-daemons.yml,/usr/share/ceph-ansible/infrastructure-playbooks/rolling_update.yml'

Actual results:
The Ceph upgrade appears to complete successfully, but a post-upgrade step fails and the client exits after printing only 'queue_name'.

Expected results:
The upgrade exits in a clean manner.

Additional info:

Comment 1 Julie Pichon 2018-05-02 13:40:53 UTC
It looks like a KeyError to me, something somewhere trying to read 'queue_name' in a dict where it doesn't exist (maybe it's missing from a workflow?), but only the unfound key name is returned in the error message. Running the command with --debug usually yields a more precise exception/trace when that happens, and hopefully should help with pinpointing the issue.
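
To illustrate why only the key name would appear: in Python, the string form of a KeyError is just the repr of the missing key, so a client that prints the exception without a traceback shows nothing else. A minimal sketch follows; the function and dict below are hypothetical and are not taken from python-tripleoclient.

```python
# Hypothetical helper that reads a key the caller may not have supplied.
def read_queue(workflow_input):
    return workflow_input['queue_name']  # raises KeyError if the key is absent

try:
    read_queue({'container': 'overcloud'})  # 'queue_name' deliberately omitted
    message = None
except KeyError as exc:
    # str() of a KeyError is the repr of the missing key, quotes included.
    message = str(exc)

print(message)
```

This would match the bare 'queue_name' line at the end of the log, which is why a --debug run with a full traceback is the more useful next step.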

Comment 2 Julie Pichon 2018-05-02 13:48:59 UTC
I wonder if it might be this one?

This calls ffwd_converge_nodes() with only 'clients' and 'containers':

https://github.com/openstack/python-tripleoclient/blob/master/tripleoclient/v1/overcloud_ceph_upgrade.py#L81

but then ffwd_converge_nodes() itself seems to expect a 'queue_name' argument to have been explicitly defined:

https://github.com/openstack/python-tripleoclient/blob/master/tripleoclient/workflows/package_update.py#L158
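
The suspected mismatch can be sketched roughly as below. This is an assumption-laden illustration, not the actual tripleoclient code: converge_sketch is a hypothetical stand-in for a helper that indexes its keyword arguments, and the queue name 'tripleo' is borrowed from the log output in the description.

```python
# Hypothetical stand-in for a workflow helper; NOT the real tripleoclient function.
def converge_sketch(clients, **workflow_input):
    # Indexing raises KeyError when the caller never passed queue_name.
    queue = workflow_input['queue_name']
    return "listening on {} for {}".format(queue, workflow_input['container'])

clients = object()  # placeholder for the client manager

# Call shaped like the suspected one: no queue_name supplied.
try:
    converge_sketch(clients, container='overcloud')
    failed_key = None
except KeyError as exc:
    failed_key = exc.args[0]

# One possible remedy (an assumption): pass queue_name explicitly.
result = converge_sketch(clients, container='overcloud', queue_name='tripleo')
print(failed_key, result)
```

Whether the eventual fix passes the argument at the call site or supplies a default inside the helper is not stated in this report.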

Comment 5 Marios Andreou 2018-05-07 12:44:53 UTC
lbezdick can you please triage this (from triage call round robin)

Comment 7 Jose Luis Franco 2018-05-09 16:57:59 UTC
This bug is fixed by this other patch: https://review.openstack.org/#/c/566944/1

Comment 13 errata-xmlrpc 2018-06-27 13:54:52 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:2086

