Bug 1244810

Summary: Scale out from 1 compute to 3 in a BM setup with Ceph fails due to a virsh secret that wasn't created
Product: Red Hat OpenStack
Reporter: Udi Kalifon <ukalifon>
Component: python-rdomanager-oscplugin
Assignee: Brad P. Crochet <brad>
Status: CLOSED ERRATA
QA Contact: Udi Kalifon <ukalifon>
Severity: urgent
Priority: high
Version: Director
CC: brad, calfonso, gfidente, jslagle, mburns, mcornea, rhel-osp-director-maint, rrosa, sasha
Target Milestone: ga
Target Release: Director
Keywords: Triaged
Hardware: Unspecified
OS: Unspecified
Fixed In Version: python-rdomanager-oscplugin-0.0.8-42.el7ost
Doc Type: Bug Fix
Type: Bug
Last Closed: 2015-08-05 13:59:55 UTC
Bug Depends On: 1243274
Attachments: Failed resource after scale up

Description Udi Kalifon 2015-07-20 14:16:10 UTC
Created attachment 1053921 [details]
Failed resource after scale up

Description of problem:
I tried to scale up from 1 compute to 3 (on bare metal with puddle 2015-07-13) and the stack failed on the "ComputePuppetDeployment" resource of the first compute node (the one that already existed and could not be updated). The failure reason is not very informative: "Error: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 6".

Further debugging shows that on the failed compute node there was an attempt to run "virsh secret-set-value" even though the node's libvirt secret list is empty and the referenced secret UUID was never defined. It seems the Ceph fsid was regenerated during the scale-up when it should not have been.
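For illustration, a minimal diagnostic sketch (Python, run on the compute node; the UUID below is a placeholder, on the real node it comes from the fsid the stack update pushed down) of the check that exposes the problem, i.e. that the deployment tries to set the value of a libvirt secret that was never defined:

import subprocess

# Placeholder UUID; on the failed node this is the Ceph-derived secret UUID
# that the scale-up deployment expects to already exist in libvirt.
secret_uuid = "00000000-0000-0000-0000-000000000000"

# List the secrets libvirt currently knows about on this node.
known = subprocess.run(
    ["virsh", "secret-list"], capture_output=True, text=True, check=True
).stdout
print(known)

if secret_uuid not in known:
    # This is the state found on the failed compute node: the deployment
    # step runs "virsh secret-set-value" for a UUID that was never created
    # with "virsh secret-define", so the script exits with a non-zero status.
    print("secret %s is not defined; secret-set-value will fail" % secret_uuid)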

Additional info is attached to the bug. It shows the error from "heat deployment-show".


Version-Release number of selected component (if applicable):
python-rdomanager-oscplugin-0.0.8-32.el7ost.noarch


How reproducible:
100%


Steps to Reproduce:
1. Deploy with 3 controllers, 1 compute, and 1 ceph node. I deployed on bare metal, without network isolation, using tuskar.
2. Run the deployment command again to scale up to 3 computes.


Actual results:
Scale up fails.

Comment 3 Brad P. Crochet 2015-07-20 17:04:33 UTC
Believed to be fixed by: https://review.gerrithub.io/#/c/239994/

Comment 4 Giulio Fidente 2015-07-20 17:19:12 UTC
Brad, this is a different BZ; we need to make sure the params at [1] are not re-created when updating an existing deployment.

1. https://github.com/rdo-management/python-rdomanager-oscplugin/blob/master/rdomanager_oscplugin/v1/overcloud_deploy.py#L314-L316
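
To make the point concrete, here is a rough sketch of the kind of guard being asked for; the parameter keys and the helper below are illustrative assumptions, not the actual plugin code linked above. The idea is to generate the Ceph identifiers only for a fresh deployment and to reuse the existing stack's values on update:

import base64
import os
import uuid

def _placeholder_ceph_key():
    # Stand-in for a real Ceph keyring secret.
    return base64.b64encode(os.urandom(32)).decode('ascii')

def ceph_parameters(existing_params=None):
    """Return Ceph identifiers, preserving them across stack updates."""
    keys = ('CephClusterFSID', 'CephMonKey', 'CephAdminKey')
    if existing_params:
        # On a stack update, reuse the values the stack already carries so
        # the fsid/secret UUID stays in sync with the libvirt secret that
        # was defined on the nodes at initial deployment time.
        return {k: existing_params[k] for k in keys if k in existing_params}
    # Initial deployment: generating fresh identifiers is safe here.
    return {
        'CephClusterFSID': str(uuid.uuid4()),
        'CephMonKey': _placeholder_ceph_key(),
        'CephAdminKey': _placeholder_ceph_key(),
    }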

Comment 5 Brad P. Crochet 2015-07-20 17:32:19 UTC
The previous fix only partially addresses this. Here is the remainder:

https://review.gerrithub.io/240650

Comment 6 Mike Burns 2015-07-23 11:34:29 UTC
*** Bug 1246023 has been marked as a duplicate of this bug. ***

Comment 8 Udi Kalifon 2015-07-30 14:48:29 UTC
Verified in: python-rdomanager-oscplugin-0.0.8-43.el7ost.noarch

Comment 10 errata-xmlrpc 2015-08-05 13:59:55 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2015:1549