Bug 1599410

Summary: [OSP12] Upgrade converge failed: cinder-manage db sync returned 1 instead of one of
Product: Red Hat OpenStack Reporter: Alan Bishop <abishop>
Component: puppet-tripleo Assignee: Alan Bishop <abishop>
Status: CLOSED ERRATA QA Contact: Avi Avraham <aavraham>
Severity: medium Docs Contact:
Priority: medium    
Version: 12.0 (Pike) CC: aavraham, abishop, augol, ccamacho, geguileo, jamsmith, jjoyce, jschluet, knylande, morazi, nlevinki, sathlang, slinaber, srevivo, tshefi, tvignaud, yprokule
Target Milestone: z3 Keywords: Triaged, ZStream
Target Release: 12.0 (Pike)   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: puppet-tripleo-7.4.12-5.el7ost Doc Type: Bug Fix
Doc Text:
During a version upgrade, Cinder's database synchronization is now executed only on the bootstrap node. This prevents database synchronization and upgrade failures that occurred when database synchronization was executed on all Controller nodes.
Story Points: ---
Clone Of: 1599409 Environment:
Last Closed: 2018-08-20 13:02:42 UTC Type: Bug
Bug Depends On: 1599409    
Bug Blocks: 1595315    
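
The Doc Text describes the fix as running Cinder's database synchronization only on the bootstrap node. The decision each controller has to make can be sketched in Python as below. This is not the puppet-tripleo code: the helper name `should_sync_db` is hypothetical, and the bootstrap node's hostname is assumed to reach every controller as a plain string (for example through hiera).

```python
from typing import Optional


def should_sync_db(hostname: str, bootstrap_node: Optional[str]) -> bool:
    """Return True only on the node that should run `cinder-manage db sync`.

    Hypothetical helper: `bootstrap_node` is assumed to be the bootstrap
    controller's hostname as distributed to every node (e.g. via hiera).
    """
    if not bootstrap_node:
        # No bootstrap node known: skip the sync rather than risk running it
        # on several controllers.
        return False
    # Compare case-insensitively; hostnames are not case-sensitive.
    return hostname.lower() == bootstrap_node.lower()


if __name__ == "__main__":
    controllers = ["controller-0", "controller-1", "controller-2"]
    bootstrap = "controller-0"  # assumption for the example
    for host in controllers:
        action = "runs db sync" if should_sync_db(host, bootstrap) else "skips db sync"
        print(host, action)
```

With the three controllers of this report's environment, only the assumed bootstrap node (controller-0 in the example) would run the migration, and controller-1 and controller-2, where the failure was reported, would skip it.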

Description Alan Bishop 2018-07-09 17:17:01 UTC
+++ This bug was initially created as a clone of Bug #1599409 +++

+++ This bug was initially created as a clone of Bug #1595315 +++

Description of problem:
-----------------------
Overcloud upgrade converge failed:

openstack overcloud deploy \
--timeout 100 \
--templates /usr/share/openstack-tripleo-heat-templates \
--stack overcloud \
--libvirt-type kvm \
--ntp-server clock.redhat.com \
--control-scale 3 \
--control-flavor controller \
--compute-scale 2 \
--compute-flavor compute \
--ceph-storage-scale 3 \
--ceph-storage-flavor ceph \
-e /usr/share/openstack-tripleo-heat-templates/environments/storage-environment.yaml \
-e /home/stack/virt/internal.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
-e /home/stack/virt/network/network-environment.yaml \
-e /home/stack/virt/enable-tls.yaml \
-e /home/stack/virt/inject-trust-anchor.yaml \
-e /home/stack/virt/hostnames.yml \
-e /home/stack/virt/debug.yaml \
-e /home/stack/virt/public_vip.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/tls-endpoints-public-ip.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/major-upgrade-pacemaker-converge.yaml

2018-06-26 14:22:56Z [AllNodesDeploySteps.ControllerDeployment_Step4.1]: CREATE_FAILED  Error: resources[1]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 6
2018-06-26 14:24:21Z [AllNodesDeploySteps.ControllerDeployment_Step4.0]: SIGNAL_IN_PROGRESS  Signal: deployment 86543e7e-a7d7-4755-a76f-097132ffb089 succeeded
2018-06-26 14:24:22Z [AllNodesDeploySteps.ControllerDeployment_Step4.0]: CREATE_COMPLETE  state changed
2018-06-26 14:24:22Z [AllNodesDeploySteps.ControllerDeployment_Step4]: CREATE_FAILED  Resource CREATE failed: Error: resources[2]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 6
2018-06-26 14:24:23Z [AllNodesDeploySteps.ControllerDeployment_Step4]: CREATE_FAILED  Error: resources.ControllerDeployment_Step4.resources[2]: Deployment to server failed: deploy_status_code: Deployment exited with non-zero status code: 6
2018-06-26 14:24:23Z [AllNodesDeploySteps]: CREATE_FAILED  Resource CREATE failed: Error: resources.ControllerDeployment_Step4.resources[2]: Deployment to server failed: deploy_status_code: Deployment exited with non-zero status code: 6
2018-06-26 14:24:24Z [AllNodesDeploySteps]: CREATE_FAILED  Error: resources.AllNodesDeploySteps.resources.ControllerDeployment_Step4.resources[2]: Deployment to server failed: deploy_status_code: Deployment exited with non-zero status code: 6
2018-06-26 14:24:24Z [overcloud]: UPDATE_FAILED  Error: resources.AllNodesDeploySteps.resources.ControllerDeployment_Step4.resources[2]: Deployment to server failed: deploy_status_code: Deployment exited with non-zero status code: 6

 Stack overcloud UPDATE_FAILED 
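
When reading Heat event output like the above, it helps to pull out which deployment-group members failed and with what exit code. A minimal Python sketch follows; the regular expressions are fitted to the event lines shown here and are an assumption, not a documented Heat output format.

```python
import re
from typing import Iterable, List, NamedTuple, Optional, Set

# Fitted to lines such as:
# 2018-06-26 14:22:56Z [AllNodesDeploySteps.ControllerDeployment_Step4.1]: CREATE_FAILED  Error: ...
EVENT_RE = re.compile(
    r"^(?P<ts>\S+ \S+Z) \[(?P<resource>[^\]]+)\]: (?P<status>[A-Z_]+)\s*(?P<reason>.*)$"
)
EXIT_CODE_RE = re.compile(r"non-zero status code: (\d+)")
# Member index of a deployment group as it appears in the error text.
INDEX_RE = re.compile(r"resources\[(\d+)\]")


class Event(NamedTuple):
    timestamp: str
    resource: str
    status: str
    reason: str
    exit_code: Optional[int]


def parse_events(lines: Iterable[str]) -> List[Event]:
    """Parse event lines; anything that does not look like an event is skipped."""
    events = []
    for line in lines:
        m = EVENT_RE.match(line.strip())
        if m is None:
            continue
        code = EXIT_CODE_RE.search(m.group("reason"))
        events.append(Event(
            timestamp=m.group("ts"),
            resource=m.group("resource"),
            status=m.group("status"),
            reason=m.group("reason"),
            exit_code=int(code.group(1)) if code else None,
        ))
    return events


def failed_indexes(events: Iterable[Event]) -> Set[int]:
    """Deployment-group member indexes named by any *_FAILED event."""
    found: Set[int] = set()
    for e in events:
        if e.status.endswith("_FAILED"):
            found.update(int(i) for i in INDEX_RE.findall(e.reason))
    return found


if __name__ == "__main__":
    sample = [
        "2018-06-26 14:22:56Z [AllNodesDeploySteps.ControllerDeployment_Step4.1]: CREATE_FAILED  "
        "Error: resources[1]: Deployment to server failed: deploy_status_code : "
        "Deployment exited with non-zero status code: 6",
        "2018-06-26 14:24:22Z [AllNodesDeploySteps.ControllerDeployment_Step4.0]: CREATE_COMPLETE  state changed",
    ]
    print(failed_indexes(parse_events(sample)))  # {1}
```

Applied to the full event list above, `failed_indexes` returns {1, 2} and every failed event carries exit code 6. If group index N corresponds to controller-N (an assumption about this deployment's naming), that matches the controllers reported with the Cinder db sync error.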

openstack stack resource list --filter status=FAILED -n5 overcloud -f yaml
- physical_resource_id: 8c105c72-c8e0-4ca2-bac5-591516502e95
  resource_name: AllNodesDeploySteps
  resource_status: CREATE_FAILED
  resource_type: OS::TripleO::PostDeploySteps
  stack_name: overcloud
  updated_time: '2018-06-26T14:03:56Z'
- physical_resource_id: 80d73748-0b64-441c-b0d0-9a51d80fc5bb
  resource_name: ControllerDeployment_Step4
  resource_status: CREATE_FAILED
  resource_type: OS::Heat::StructuredDeploymentGroup
  stack_name: overcloud-AllNodesDeploySteps-epmolx5a24x3
  updated_time: '2018-06-26T14:03:57Z'
- physical_resource_id: eb7ee291-8d46-4b34-bdab-97e18440107d
  resource_name: '1'
  resource_status: CREATE_FAILED
  resource_type: OS::Heat::StructuredDeployment
  stack_name: overcloud-AllNodesDeploySteps-epmolx5a24x3-ControllerDeployment_Step4-iksgenqifsmp
  updated_time: '2018-06-26T14:15:40Z'
- physical_resource_id: 1e87c4c6-64a5-4e49-b59f-83907f2e0cf4
  resource_name: '2'
  resource_status: CREATE_FAILED
  resource_type: OS::Heat::StructuredDeployment
  stack_name: overcloud-AllNodesDeploySteps-epmolx5a24x3-ControllerDeployment_Step4-iksgenqifsmp
  updated_time: '2018-06-26T14:15:40Z'
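
In the nested listing above, only the bottom-level `OS::Heat::StructuredDeployment` entries named '1' and '2' are per-server deployments; the group and post-deploy-steps entries show CREATE_FAILED because a child failed. A small Python sketch that isolates those entries is shown below, assuming the same listing is requested with `-f json` instead of `-f yaml` so it can be parsed with the standard library. The `physical_resource_id` of such an entry should be the software deployment ID that `openstack software deployment show` accepts for inspecting the deployment's output.

```python
import json
from typing import Dict, List

LEAF_TYPE = "OS::Heat::StructuredDeployment"


def leaf_failures(records: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Return failed per-server deployments, ordered by group index.

    Parent resources (the deployment group, the post-deploy steps) also show
    *_FAILED, but only because one of their children failed.
    """
    leaves = [
        r for r in records
        if r["resource_status"].endswith("_FAILED") and r["resource_type"] == LEAF_TYPE
    ]
    # Group members are named by their index ('0', '1', ...).
    return sorted(leaves, key=lambda r: int(r["resource_name"]))


if __name__ == "__main__":
    # Three of the records from the listing above, as JSON.
    listing = json.loads("""[
      {"resource_name": "ControllerDeployment_Step4", "resource_status": "CREATE_FAILED",
       "resource_type": "OS::Heat::StructuredDeploymentGroup",
       "physical_resource_id": "80d73748-0b64-441c-b0d0-9a51d80fc5bb"},
      {"resource_name": "1", "resource_status": "CREATE_FAILED",
       "resource_type": "OS::Heat::StructuredDeployment",
       "physical_resource_id": "eb7ee291-8d46-4b34-bdab-97e18440107d"},
      {"resource_name": "2", "resource_status": "CREATE_FAILED",
       "resource_type": "OS::Heat::StructuredDeployment",
       "physical_resource_id": "1e87c4c6-64a5-4e49-b59f-83907f2e0cf4"}
    ]""")
    for r in leaf_failures(listing):
        print(r["resource_name"], r["physical_resource_id"])
```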

The following error is present on controller-1 and controller-2:
...
Error: /Stage[main]/Cinder::Db::Sync/Exec[cinder-manage db_sync]: cinder-manage  db sync returned 1 instead of one of [0]\n", "deploy_status_code": 6}
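
The message comes from Puppet's `Exec` resource, which treats any exit code not listed in its `returns` parameter (default `0`) as a failure. That check can be mimicked in Python as below; `run_exec` and `ExecFailed` are hypothetical names, and only the shape of the error message is modeled on Puppet's.

```python
import subprocess
import sys
from typing import Sequence


class ExecFailed(RuntimeError):
    """Raised when a command exits with a code not listed in `returns`."""


def run_exec(command: Sequence[str], title: str, returns: Sequence[int] = (0,)) -> int:
    """Run `command`; fail unless its exit code is one of `returns`."""
    result = subprocess.run(list(command))
    if result.returncode not in returns:
        raise ExecFailed(
            f"{title} returned {result.returncode} instead of one of {list(returns)}"
        )
    return result.returncode


if __name__ == "__main__":
    try:
        # A command that exits 1, standing in for the failing db sync.
        run_exec([sys.executable, "-c", "import sys; sys.exit(1)"], "demo-command")
    except ExecFailed as err:
        print(err)  # demo-command returned 1 instead of one of [0]
```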


Version-Release number of selected component (if applicable):
-------------------------------------------------------------
openstack-tripleo-heat-templates-5.3.10-7.el7ost.noarch
puppet-tripleo-5.6.8-7.el7ost.noarch

python-cinder-9.1.4-33.el7ost.noarch
puppet-cinder-9.5.0-6.el7ost.noarch
python-cinderclient-1.9.0-6.el7ost.noarch
openstack-cinder-9.1.4-33.el7ost.noarch

Steps to Reproduce:
-------------------
1. Upgrade the undercloud (UC) to RHOS-10.
2. Launch a VM with a floating IP on the overcloud (OC).
3. Set up RHOS-10 repositories on the OC.
4. Start a ping test to the VM's floating IP.
5. Run the RHOS-9 to RHOS-10 upgrade procedure.

Actual results:
---------------
The upgrade fails at the converge step.

Additional info:
----------------
Virtual environment: 3 controllers + 2 computes + 3 Ceph storage nodes

Comment 5 errata-xmlrpc 2018-08-20 13:02:42 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:2331