Bug 1599409 - [OSP13] Upgrade converge failed: cinder-manage db sync returned 1 instead of one of
Summary: [OSP13] Upgrade converge failed: cinder-manage db sync returned 1 instead of...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: puppet-tripleo
Version: 13.0 (Queens)
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: z2
Target Release: 13.0 (Queens)
Assignee: Alan Bishop
QA Contact: Avi Avraham
URL:
Whiteboard:
Depends On:
Blocks: 1488066 1595315 1599410
Reported: 2018-07-09 17:15 UTC by Alan Bishop
Modified: 2018-12-24 11:40 UTC (History)

Fixed In Version: puppet-tripleo-8.3.4-2.el7ost
Doc Type: Bug Fix
Doc Text:
During a version upgrade, Cinder's database synchronization is now executed only on the bootstrap node. This prevents database synchronization and upgrade failures that occurred when database synchronization was executed on all Controller nodes.
Clone Of: 1595315
Clones: 1599410
Environment:
Last Closed: 2018-08-29 16:37:56 UTC
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Launchpad 1779112 0 None None None 2018-07-09 17:15:14 UTC
OpenStack gerrit 579732 0 None MERGED Run cinder's db sync only on bootstrap node 2020-08-24 09:48:01 UTC
Red Hat Product Errata RHBA-2018:2574 0 None None None 2018-08-29 16:38:58 UTC

Description Alan Bishop 2018-07-09 17:15:15 UTC
+++ This bug was initially created as a clone of Bug #1595315 +++

Description of problem:
-----------------------
Overcloud upgrade converge failed:

openstack overcloud deploy \
--timeout 100 \
--templates /usr/share/openstack-tripleo-heat-templates \
--stack overcloud \
--libvirt-type kvm \
--ntp-server clock.redhat.com \
--control-scale 3 \
--control-flavor controller \
--compute-scale 2 \
--compute-flavor compute \
--ceph-storage-scale 3 \
--ceph-storage-flavor ceph \
-e /usr/share/openstack-tripleo-heat-templates/environments/storage-environment.yaml \
-e /home/stack/virt/internal.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
-e /home/stack/virt/network/network-environment.yaml \
-e /home/stack/virt/enable-tls.yaml \
-e /home/stack/virt/inject-trust-anchor.yaml \
-e /home/stack/virt/hostnames.yml \
-e /home/stack/virt/debug.yaml \
-e /home/stack/virt/public_vip.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/tls-endpoints-public-ip.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/major-upgrade-pacemaker-converge.yaml

2018-06-26 14:22:56Z [AllNodesDeploySteps.ControllerDeployment_Step4.1]: CREATE_FAILED  Error: resources[1]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 6
2018-06-26 14:24:21Z [AllNodesDeploySteps.ControllerDeployment_Step4.0]: SIGNAL_IN_PROGRESS  Signal: deployment 86543e7e-a7d7-4755-a76f-097132ffb089 succeeded
2018-06-26 14:24:22Z [AllNodesDeploySteps.ControllerDeployment_Step4.0]: CREATE_COMPLETE  state changed
2018-06-26 14:24:22Z [AllNodesDeploySteps.ControllerDeployment_Step4]: CREATE_FAILED  Resource CREATE failed: Error: resources[2]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 6
2018-06-26 14:24:23Z [AllNodesDeploySteps.ControllerDeployment_Step4]: CREATE_FAILED  Error: resources.ControllerDeployment_Step4.resources[2]: Deployment to server failed: deploy_status_code: Deployment exited with non-zero status code: 6
2018-06-26 14:24:23Z [AllNodesDeploySteps]: CREATE_FAILED  Resource CREATE failed: Error: resources.ControllerDeployment_Step4.resources[2]: Deployment to server failed: deploy_status_code: Deployment exited with non-zero status code: 6
2018-06-26 14:24:24Z [AllNodesDeploySteps]: CREATE_FAILED  Error: resources.AllNodesDeploySteps.resources.ControllerDeployment_Step4.resources[2]: Deployment to server failed: deploy_status_code: Deployment exited with non-zero status code: 6
2018-06-26 14:24:24Z [overcloud]: UPDATE_FAILED  Error: resources.AllNodesDeploySteps.resources.ControllerDeployment_Step4.resources[2]: Deployment to server failed: deploy_status_code: Deployment exited with non-zero status code: 6

 Stack overcloud UPDATE_FAILED 
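
When triaging output like the event log above, it can help to pull out only the failed events and see which resource failed first. The following is a minimal sketch, assuming the events are available as plain text lines in the same shape as quoted here; the names `EVENT_RE`, `failed_events`, and `SAMPLE` are hypothetical helpers for illustration, not part of any OpenStack tool.

```python
import re

# Assumed line shape, taken from the event output quoted in this report:
#   <date> <time> [<resource path>]: <STATUS>  <reason>
EVENT_RE = re.compile(
    r"^(?P<ts>\S+ \S+) \[(?P<resource>[^\]]+)\]: (?P<status>[A-Z_]+)\s+(?P<reason>.*)$"
)


def failed_events(lines):
    """Return (timestamp, resource, reason) for every event whose status ends in _FAILED."""
    failures = []
    for line in lines:
        match = EVENT_RE.match(line.strip())
        if match and match.group("status").endswith("_FAILED"):
            failures.append(
                (match.group("ts"), match.group("resource"), match.group("reason"))
            )
    return failures


# Two lines copied from the log in this report.
SAMPLE = [
    "2018-06-26 14:22:56Z [AllNodesDeploySteps.ControllerDeployment_Step4.1]: "
    "CREATE_FAILED  Error: resources[1]: Deployment to server failed: "
    "deploy_status_code : Deployment exited with non-zero status code: 6",
    "2018-06-26 14:24:21Z [AllNodesDeploySteps.ControllerDeployment_Step4.0]: "
    "SIGNAL_IN_PROGRESS  Signal: deployment 86543e7e-a7d7-4755-a76f-097132ffb089 succeeded",
]

if __name__ == "__main__":
    for ts, resource, reason in failed_events(SAMPLE):
        print(ts, resource, reason, sep=" | ")
```

The earliest failed event is usually the most useful one, since the later CREATE_FAILED and UPDATE_FAILED lines only propagate the same error up the stack.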

openstack stack resource list --filter status=FAILED -n5 overcloud -f yaml
- physical_resource_id: 8c105c72-c8e0-4ca2-bac5-591516502e95
  resource_name: AllNodesDeploySteps
  resource_status: CREATE_FAILED
  resource_type: OS::TripleO::PostDeploySteps
  stack_name: overcloud
  updated_time: '2018-06-26T14:03:56Z'
- physical_resource_id: 80d73748-0b64-441c-b0d0-9a51d80fc5bb
  resource_name: ControllerDeployment_Step4
  resource_status: CREATE_FAILED
  resource_type: OS::Heat::StructuredDeploymentGroup
  stack_name: overcloud-AllNodesDeploySteps-epmolx5a24x3
  updated_time: '2018-06-26T14:03:57Z'
- physical_resource_id: eb7ee291-8d46-4b34-bdab-97e18440107d
  resource_name: '1'
  resource_status: CREATE_FAILED
  resource_type: OS::Heat::StructuredDeployment
  stack_name: overcloud-AllNodesDeploySteps-epmolx5a24x3-ControllerDeployment_Step4-iksgenqifsmp
  updated_time: '2018-06-26T14:15:40Z'
- physical_resource_id: 1e87c4c6-64a5-4e49-b59f-83907f2e0cf4
  resource_name: '2'
  resource_status: CREATE_FAILED
  resource_type: OS::Heat::StructuredDeployment
  stack_name: overcloud-AllNodesDeploySteps-epmolx5a24x3-ControllerDeployment_Step4-iksgenqifsmp
  updated_time: '2018-06-26T14:15:40Z'

On controller-1 and controller-2, the following error is present:
...
Error: /Stage[main]/Cinder::Db::Sync/Exec[cinder-manage db_sync]: cinder-manage  db sync returned 1 instead of one of [0]\u001b[0m\n", "deploy_status_code": 6}
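
The wording "returned 1 instead of one of [0]" means the command's exit status was compared against a list of accepted codes (Puppet's `exec` resource has a `returns` attribute for this) and did not match. The sketch below only imitates that check to make the message easier to read; `run_expecting` is a hypothetical helper, it is not how Puppet implements it, and the failing command is a stand-in rather than `cinder-manage`.

```python
import subprocess
import sys


def run_expecting(command, allowed_codes=(0,)):
    """Run a command and raise if its exit status is not in allowed_codes."""
    result = subprocess.run(command, capture_output=True, text=True)
    if result.returncode not in allowed_codes:
        raise RuntimeError(
            f"{' '.join(command)} returned {result.returncode} "
            f"instead of one of {list(allowed_codes)}"
        )
    return result


if __name__ == "__main__":
    # Stand-in for a command that fails with exit status 1.
    failing = [sys.executable, "-c", "import sys; sys.exit(1)"]
    try:
        run_expecting(failing)
    except RuntimeError as err:
        print(err)
```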


Version-Release number of selected component (if applicable):
-------------------------------------------------------------
openstack-tripleo-heat-templates-5.3.10-7.el7ost.noarch
puppet-tripleo-5.6.8-7.el7ost.noarch

python-cinder-9.1.4-33.el7ost.noarch
puppet-cinder-9.5.0-6.el7ost.noarch
python-cinderclient-1.9.0-6.el7ost.noarch
openstack-cinder-9.1.4-33.el7ost.noarch

Steps to Reproduce:
-------------------
1. Upgrade the undercloud (UC) to RHOS-10
2. Launch a VM with a floating IP on the overcloud (OC)
3. Set up the RHOS-10 repos on the OC
4. Start a ping test to the VM's floating IP
5. Run the 9->10 upgrade procedure

Actual results:
---------------
The upgrade fails at the converge step.

Additional info:
----------------
Virtual environment: 3 controllers + 2 computes + 3 Ceph nodes

Comment 1 Alan Bishop 2018-07-09 17:20:45 UTC
The upstream patch has merged into stable/queens and will arrive in the next RDO bulk import.
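
The idea behind the fix, as described in the Doc Text (run Cinder's database sync only on the bootstrap node), can be illustrated with a small sketch. This is not the puppet-tripleo code; `is_bootstrap_node` and `plan_db_sync` are hypothetical names, the short-hostname comparison is an assumption, and `controller-0` is assumed to be the bootstrap node purely for the example.

```python
def is_bootstrap_node(hostname, bootstrap_node):
    """Compare short hostnames case-insensitively (an assumed matching rule)."""
    return hostname.split(".")[0].lower() == bootstrap_node.split(".")[0].lower()


def plan_db_sync(hostnames, bootstrap_node):
    """Map each controller to whether it should run the database sync."""
    return {host: is_bootstrap_node(host, bootstrap_node) for host in hostnames}


if __name__ == "__main__":
    # Hypothetical three-controller deployment.
    controllers = ["controller-0", "controller-1", "controller-2"]
    for host, run_sync in plan_db_sync(controllers, "controller-0").items():
        action = "run db sync" if run_sync else "skip db sync"
        print(f"{host}: {action}")
```

With a gate like this, exactly one controller runs the sync, so the other controllers no longer attempt it at the same time.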

Comment 12 Tzach Shefi 2018-08-20 12:23:49 UTC
Verified on: 
puppet-tripleo-8.3.4-5.el7ost.noarch

This was a fast forward upgrade (FFU) from OSP10 to OSP13, completed within a single day.
Upgrade passed without any issue.

Comment 14 errata-xmlrpc 2018-08-29 16:37:56 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:2574

