Bug 1244810 - Scale out from 1 compute to 3 in a BM setup with Ceph, fails due to a virsh secret that wasn't created
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: python-rdomanager-oscplugin
Version: Director
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: urgent
Target Milestone: ga
Target Release: Director
Assignee: Brad P. Crochet
QA Contact: Udi Kalifon
URL:
Whiteboard:
Duplicates: 1246023
Depends On: 1243274
Blocks:
 
Reported: 2015-07-20 14:16 UTC by Udi Kalifon
Modified: 2015-08-05 13:59 UTC
CC: 9 users

Fixed In Version: python-rdomanager-oscplugin-0.0.8-42.el7ost
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-08-05 13:59:55 UTC
Target Upstream Version:
Embargoed:


Attachments
Failed resource after scale up (11.29 KB, text/plain)
2015-07-20 14:16 UTC, Udi Kalifon


Links
System ID Private Priority Status Summary Last Updated
Gerrithub.io 239994 0 None None None Never
Gerrithub.io 240650 0 None None None Never
Red Hat Product Errata RHEA-2015:1549 0 normal SHIPPED_LIVE Red Hat Enterprise Linux OpenStack Platform director Release 2015-08-05 17:49:10 UTC

Description Udi Kalifon 2015-07-20 14:16:10 UTC
Created attachment 1053921
Failed resource after scale up

Description of problem:
I tried to scale up from 1 compute node to 3 (on bare metals, with puddle 2015-07-13), and the stack update failed on the "ComputePuppetDeployment" resource of the first compute node (the node that already existed could not be updated). The failure reason is not very informative: "Error: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 6".

Further debugging shows that on the compute node that failed, there was an attempt to run "virsh secret-set-value", but the libvirt secret list was empty and the secret UUID did not exist. It appears the Ceph fsid was regenerated during the scale-up when it should not have been.
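
For reference, the state can be inspected on the failing compute node with standard virsh commands. The UUID and key below are placeholders, not values taken from this deployment:

# List the secrets libvirt knows about; on the failing node this list was empty
virsh secret-list
# The call the deployment effectively makes, which fails because the secret
# was never created with "virsh secret-define" in the first place
virsh secret-set-value --secret <fsid-uuid> --base64 <ceph-client-key>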

Additional info is attached to the bug. It shows the error from "heat deployment-show".


Version-Release number of selected component (if applicable):
python-rdomanager-oscplugin-0.0.8-32.el7ost.noarch


How reproducible:
100%


Steps to Reproduce:
1. Deploy with 3 controllers, 1 compute, and 1 Ceph node. I deployed on bare metals, without network isolation, using Tuskar.
2. Run the deployment command again, scaling up to 3 computes (see the example commands below).
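
For illustration only, the two runs would look roughly like the following. The exact flags depend on the python-rdomanager-oscplugin version, so the plan name and scale flags here are assumptions, not the exact commands that were run:

# Initial deployment (Tuskar-based plan, no network isolation)
openstack overcloud deploy --plan overcloud --control-scale 3 --compute-scale 1 --ceph-storage-scale 1
# Scale-out: the same command again, changing only the compute count
openstack overcloud deploy --plan overcloud --control-scale 3 --compute-scale 3 --ceph-storage-scale 1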


Actual results:
Scale up fails.

Comment 3 Brad P. Crochet 2015-07-20 17:04:33 UTC
Believed to be fixed by: https://review.gerrithub.io/#/c/239994/

Comment 4 Giulio Fidente 2015-07-20 17:19:12 UTC
Brad, this is a different BZ; we need to make sure the params at [1] are not re-created when updating an existing deployment.

1. https://github.com/rdo-management/python-rdomanager-oscplugin/blob/master/rdomanager_oscplugin/v1/overcloud_deploy.py#L314-L316
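
One way to observe the regression from the CLI is to record the Ceph fsid parameter before the update and compare it after the scale-out; it should stay identical across stack updates. This is a rough check, assuming the parameter name CephClusterFSID from tripleo-heat-templates and a stack named "overcloud":

# The fsid is part of the stack parameters; it must not change on update
heat stack-show overcloud | grep -o '"CephClusterFSID": "[^"]*"'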

Comment 5 Brad P. Crochet 2015-07-20 17:32:19 UTC
The previous fix only partially addresses this. Here is the remainder:

https://review.gerrithub.io/240650

Comment 6 Mike Burns 2015-07-23 11:34:29 UTC
*** Bug 1246023 has been marked as a duplicate of this bug. ***

Comment 8 Udi Kalifon 2015-07-30 14:48:29 UTC
Verified in: python-rdomanager-oscplugin-0.0.8-43.el7ost.noarch

Comment 10 errata-xmlrpc 2015-08-05 13:59:55 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2015:1549

