Bug 1244810 - Scale out from 1 compute to 3 in a BM setup with Ceph, fails due to a virsh secret that wasn't created
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: python-rdomanager-oscplugin
Version: Director
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: urgent
Target Milestone: ga
Target Release: Director
Assignee: Brad P. Crochet
QA Contact: Udi Kalifon
URL:
Whiteboard:
Duplicates: 1246023
Depends On: 1243274
Blocks:
 
Reported: 2015-07-20 14:16 UTC by Udi Kalifon
Modified: 2015-08-05 13:59 UTC
CC: 9 users

Fixed In Version: python-rdomanager-oscplugin-0.0.8-42.el7ost
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-08-05 13:59:55 UTC
Target Upstream Version:
Embargoed:


Attachments
Failed resource after scale up (11.29 KB, text/plain)
2015-07-20 14:16 UTC, Udi Kalifon


Links
System ID Private Priority Status Summary Last Updated
Gerrithub.io 239994 0 None None None Never
Gerrithub.io 240650 0 None None None Never
Red Hat Product Errata RHEA-2015:1549 0 normal SHIPPED_LIVE Red Hat Enterprise Linux OpenStack Platform director Release 2015-08-05 17:49:10 UTC

Description Udi Kalifon 2015-07-20 14:16:10 UTC
Created attachment 1053921
Failed resource after scale up

Description of problem:
I tried to scale up from 1 compute node to 3 (on bare metals, with puddle 2015-07-13), and the stack update failed on the "ComputePuppetDeployment" resource of the first compute node (the node that already existed could not be updated). The failure reason is not very informative: "Error: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 6".

Further debugging shows that on the compute node that failed, there was an attempt to run "virsh secret-set-value", but the libvirt secret list was empty and the secret UUID did not exist. It appears the Ceph fsid was regenerated during the scale-up when it should not have been.
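
For reference, the state can be inspected on the failing compute node with standard virsh commands. The UUID and key below are placeholders, not values taken from this deployment:

# List the secrets libvirt knows about; on the failing node this list was empty
virsh secret-list
# The call the deployment effectively makes, which fails because the secret
# was never created with "virsh secret-define" in the first place
virsh secret-set-value --secret <fsid-uuid> --base64 <ceph-client-key>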

Additional info is attached to the bug. It shows the error from "heat deployment-show".


Version-Release number of selected component (if applicable):
python-rdomanager-oscplugin-0.0.8-32.el7ost.noarch


How reproducible:
100%


Steps to Reproduce:
1. Deploy with 3 controllers, 1 compute, and 1 Ceph node. I deployed on bare metals, without network isolation, using Tuskar.
2. Run the deployment command again, scaling up to 3 computes (see the example commands below).
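
For illustration only, the two runs would look roughly like the following. The exact flags depend on the python-rdomanager-oscplugin version, so the plan name and scale flags here are assumptions, not the exact commands that were run:

# Initial deployment (Tuskar-based plan, no network isolation)
openstack overcloud deploy --plan overcloud --control-scale 3 --compute-scale 1 --ceph-storage-scale 1
# Scale-out: the same command again, changing only the compute count
openstack overcloud deploy --plan overcloud --control-scale 3 --compute-scale 3 --ceph-storage-scale 1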


Actual results:
Scale up fails.

Comment 3 Brad P. Crochet 2015-07-20 17:04:33 UTC
Believed to be fixed by: https://review.gerrithub.io/#/c/239994/

Comment 4 Giulio Fidente 2015-07-20 17:19:12 UTC
Brad, this is a different BZ; we need to make sure the params at [1] are not re-created when updating an existing deployment.

1. https://github.com/rdo-management/python-rdomanager-oscplugin/blob/master/rdomanager_oscplugin/v1/overcloud_deploy.py#L314-L316
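
One way to observe the regression from the CLI is to record the Ceph fsid parameter before the update and compare it after the scale-out; it should stay identical across stack updates. This is a rough check, assuming the parameter name CephClusterFSID from tripleo-heat-templates and a stack named "overcloud":

# The fsid is part of the stack parameters; it must not change on update
heat stack-show overcloud | grep -o '"CephClusterFSID": "[^"]*"'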

Comment 5 Brad P. Crochet 2015-07-20 17:32:19 UTC
The previous fix only partially addresses this. Here is the remainder:

https://review.gerrithub.io/240650

Comment 6 Mike Burns 2015-07-23 11:34:29 UTC
*** Bug 1246023 has been marked as a duplicate of this bug. ***

Comment 8 Udi Kalifon 2015-07-30 14:48:29 UTC
Verified in: python-rdomanager-oscplugin-0.0.8-43.el7ost.noarch

Comment 10 errata-xmlrpc 2015-08-05 13:59:55 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2015:1549

