Bug 1573327

Summary: FFU: ceph upgrade fails and exits with 'queue_name'
Product: Red Hat OpenStack
Reporter: Marius Cornea <mcornea>
Component: python-tripleoclient
Assignee: Jose Luis Franco <jfrancoa>
Status: CLOSED ERRATA
QA Contact: Marius Cornea <mcornea>
Severity: urgent
Docs Contact:
Priority: urgent
Version: 13.0 (Queens)
CC: ccamacho, dbecker, hbrock, jfrancoa, jpichon, jslagle, lbezdick, mandreou, mbracho, mburns, morazi
Target Milestone: rc
Keywords: Triaged
Target Release: 13.0 (Queens)
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: python-tripleoclient-9.2.1-8.el7ost
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-06-27 13:54:52 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1561169    

Description Marius Cornea 2018-04-30 20:47:13 UTC
Description of problem:
FFU: the ceph upgrade fails and exits with 'queue_name'. The upgrade itself appears to complete successfully, but at the very end the client exits and prints only 'queue_name':

[...]
2018-04-30 20:39:53Z [overcloud]: UPDATE_COMPLETE  Stack UPDATE completed successfully

 Stack overcloud UPDATE_COMPLETE 

Started Mistral Workflow tripleo.package_update.v1.get_config. Execution ID: 8ed4f0f1-00e5-4546-bb21-fe31f20894d6
Waiting for messages on queue 'tripleo' with no timeout.
Success
Ceph Upgrade on stack overcloud complete. Cleaning up
Started Mistral Workflow tripleo.validations.v1.check_pre_deployment_validations. Execution ID: 298a2ae0-f017-4cd0-9508-43d0ad43a8b6
Waiting for messages on queue 'tripleo' with no timeout.
Removing the current plan files
Uploading new plan files
Started Mistral Workflow tripleo.plan_management.v1.update_deployment_plan. Execution ID: b013332f-199d-40eb-88ff-2e6ecc60f719
Plan updated.
Processing templates in the directory /tmp/tripleoclient-r28yp9/tripleo-heat-templates
Started Mistral Workflow tripleo.plan_management.v1.get_deprecated_parameters. Execution ID: 5476097f-99a3-4bc9-9ed8-6a26311b508d
WARNING: Following parameters are deprecated and still defined. Deprecated parameters will be removed soon!
  OvercloudControlFlavor
WARNING: Following parameters are defined but not used in plan. Could be possible that parameter is valid but currently not used.
  CephAnsiblePlaybook
  StorageNetCidr
  StorageMgmtNetCidr
  ControlPlaneDefaultRoute
  CephAnsiblePlaybookVerbosity
  StorageMgmtNetworkVlanID
  ExternalAllocationPools
  TenantNetCidr
  InternalApiNetworkVlanID
  EC2MetadataIp
  CephAnsibleDisksConfig
  InternalApiNetCidr
  ExternalInterfaceDefaultRoute
  StorageAllocationPools
  ExternalNetworkVlanID
  DnsServers
  StorageMgmtAllocationPools
  TenantNetworkVlanID
  StorageNetworkVlanID
  CinderBackupBackend
  CephPoolDefaultPgNum
  InternalApiAllocationPools
  ExternalNetCidr
  TenantAllocationPools
'queue_name'


Version-Release number of selected component (if applicable):
python-tripleoclient-9.2.1-3.el7ost.noarch
openstack-tripleo-heat-templates-8.0.2-4.el7ost.noarch

How reproducible:
100%

Steps to Reproduce:
1. openstack overcloud ffwd-upgrade prepare 
2. openstack overcloud ffwd-upgrade run
3. openstack overcloud upgrade run --roles Controller --skip-tags validation
4. openstack overcloud upgrade run --roles Compute --skip-tags validation
5. openstack overcloud ffwd-upgrade converge
6. Apply the workaround for bug 1573307
7. openstack overcloud ceph-upgrade run \
--timeout 100 \
--templates /usr/share/openstack-tripleo-heat-templates \
--stack overcloud \
-e /usr/share/openstack-tripleo-heat-templates/environments/storage-environment.yaml \
-e /home/stack/virt/internal.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
-e /home/stack/virt/network/network-environment.yaml \
-e /home/stack/virt/hostnames.yml \
-e /home/stack/virt/debug.yaml \
-e /home/stack/ffu_repos.yaml \
-e /home/stack/cli_opts_params.yaml \
-e /home/stack/ceph-ansible-env.yaml \
--ceph-ansible-playbook '/usr/share/ceph-ansible/infrastructure-playbooks/switch-from-non-containerized-to-containerized-ceph-daemons.yml,/usr/share/ceph-ansible/infrastructure-playbooks/rolling_update.yml'

Actual results:
The ceph upgrade appears to complete successfully, but a post-upgrade step fails and the client exits, printing only 'queue_name'.

Expected results:
The upgrade exits in a clean manner.

Additional info:

Comment 1 Julie Pichon 2018-05-02 13:40:53 UTC
It looks like a KeyError to me, something somewhere trying to read 'queue_name' in a dict where it doesn't exist (maybe it's missing from a workflow?), but only the unfound key name is returned in the error message. Running the command with --debug usually yields a more precise exception/trace when that happens, and hopefully should help with pinpointing the issue.
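
To illustrate why only the key name shows up: converting a Python KeyError to a string gives just the repr of the missing key, so a client that prints str(exc) on failure ends up printing 'queue_name' and nothing else. The sketch below is a minimal standalone reproduction, not tripleoclient code; read_queue_name, run_quietly, run_with_trace and the dict contents are hypothetical names made up for the example.

```python
import traceback


def read_queue_name(workflow_input):
    # Hypothetical helper: raises KeyError when 'queue_name' is absent.
    return workflow_input['queue_name']


def run_quietly(workflow_input):
    """Mimic a CLI that reports only str(exc) when something fails."""
    try:
        return read_queue_name(workflow_input)
    except KeyError as exc:
        # str() of a KeyError is just the repr of the missing key.
        return str(exc)


def run_with_trace(workflow_input):
    """Mimic a debug mode that reports the full traceback instead."""
    try:
        return read_queue_name(workflow_input)
    except KeyError:
        return traceback.format_exc()


# Hypothetical input that lacks the 'queue_name' key.
message = run_quietly({'container': 'overcloud'})
print(message)
print(run_with_trace({'container': 'overcloud'}))
```

The first print shows only the quoted key name, which matches the symptom in the description; the second shows the exception type and the frame that raised it, which is the kind of detail --debug is expected to surface.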

Comment 2 Julie Pichon 2018-05-02 13:48:59 UTC
I wonder if it might be this one?

This calls ffwd_converge_nodes() with only 'clients' and 'containers':

https://github.com/openstack/python-tripleoclient/blob/master/tripleoclient/v1/overcloud_ceph_upgrade.py#L81

but then ffwd_converge_nodes() itself seems to expect a 'queue_name' argument to have been explicitly defined:

https://github.com/openstack/python-tripleoclient/blob/master/tripleoclient/workflows/package_update.py#L158
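
The suspected mismatch can be sketched as follows. This is an assumption-laden illustration of the call shape described in this comment, not the actual tripleoclient source: converge_nodes_sketch stands in for the callee, the keyword names are hypothetical, and the queue name 'tripleo' is taken from the log in the description. It is also not necessarily how the eventual patch addresses the problem.

```python
def converge_nodes_sketch(clients, **workflow_input):
    # Hypothetical stand-in for the callee: it assumes the caller
    # always supplies 'queue_name' among the keyword arguments.
    queue_name = workflow_input['queue_name']
    return "listening on {}".format(queue_name)


def call_like_ceph_upgrade():
    # Mirrors the reported call shape: a clients handle plus a
    # container name, with no 'queue_name' keyword.
    return converge_nodes_sketch(object(), container='overcloud')


def call_with_queue_name():
    # Same call, but with the queue name passed explicitly.
    return converge_nodes_sketch(
        object(), container='overcloud', queue_name='tripleo')


try:
    call_like_ceph_upgrade()
    failure = None
except KeyError as exc:
    failure = str(exc)

print(failure)
print(call_with_queue_name())
```

If the real code follows this pattern, either the caller needs to pass queue_name or the callee needs a default for it.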

Comment 5 Marios Andreou 2018-05-07 12:44:53 UTC
lbezdick, can you please triage this? (Assigned from the triage call round robin.)

Comment 7 Jose Luis Franco 2018-05-09 16:57:59 UTC
This bug is fixed by this other patch: https://review.openstack.org/#/c/566944/1

Comment 13 errata-xmlrpc 2018-06-27 13:54:52 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:2086