Bug 1244810

Summary: Scale out from 1 compute to 3 in a BM setup with Ceph fails due to a virsh secret that wasn't created
Product: Red Hat OpenStack
Reporter: Udi Kalifon <ukalifon>
Component: python-rdomanager-oscplugin
Assignee: Brad P. Crochet <brad>
Status: CLOSED ERRATA
QA Contact: Udi Kalifon <ukalifon>
Severity: urgent
Priority: high
Version: Director
CC: brad, calfonso, gfidente, jslagle, mburns, mcornea, rhel-osp-director-maint, rrosa, sasha
Target Milestone: ga
Target Release: Director
Keywords: Triaged
Hardware: Unspecified
OS: Unspecified
Fixed In Version: python-rdomanager-oscplugin-0.0.8-42.el7ost
Doc Type: Bug Fix
Type: Bug
Last Closed: 2015-08-05 13:59:55 UTC
Bug Depends On: 1243274
Attachments: Failed resource after scale up

Description Udi Kalifon 2015-07-20 14:16:10 UTC
Created attachment 1053921 [details]
Failed resource after scale up

Description of problem:
I tried to scale up from 1 compute to 3 (on bare metal with puddle 2015-07-13) and the stack failed on the "ComputePuppetDeployment" resource of the first compute node (the one that already existed and could not be updated). The failure reason is not very informative: "Error: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 6".

Further debugging shows that on the failed compute node there was an attempt to run "virsh secret-set-value" even though the node's libvirt secret list is empty and the referenced secret UUID was never defined. It seems the Ceph fsid was regenerated during the scale-up when it should not have been.
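For illustration, a minimal diagnostic sketch (Python, run on the compute node; the UUID below is a placeholder, on the real node it comes from the fsid the stack update pushed down) of the check that exposes the problem, i.e. that the deployment tries to set the value of a libvirt secret that was never defined:

import subprocess

# Placeholder UUID; on the failed node this is the Ceph-derived secret UUID
# that the scale-up deployment expects to already exist in libvirt.
secret_uuid = "00000000-0000-0000-0000-000000000000"

# List the secrets libvirt currently knows about on this node.
known = subprocess.run(
    ["virsh", "secret-list"], capture_output=True, text=True, check=True
).stdout
print(known)

if secret_uuid not in known:
    # This is the state found on the failed compute node: the deployment
    # step runs "virsh secret-set-value" for a UUID that was never created
    # with "virsh secret-define", so the script exits with a non-zero status.
    print("secret %s is not defined; secret-set-value will fail" % secret_uuid)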

Additional info is attached to the bug. It shows the error from "heat deployment-show".


Version-Release number of selected component (if applicable):
python-rdomanager-oscplugin-0.0.8-32.el7ost.noarch


How reproducible:
100%


Steps to Reproduce:
1. Deploy with 3 controllers, 1 compute, and 1 ceph node. I deployed on bare metal, without network isolation, using tuskar.
2. Run the deployment command again to scale up to 3 computes.


Actual results:
Scale up fails.

Comment 3 Brad P. Crochet 2015-07-20 17:04:33 UTC
Believed to be fixed by: https://review.gerrithub.io/#/c/239994/

Comment 4 Giulio Fidente 2015-07-20 17:19:12 UTC
Brad, this is a different BZ; we need to make sure the params at [1] are not re-created when updating an existing deployment.

1. https://github.com/rdo-management/python-rdomanager-oscplugin/blob/master/rdomanager_oscplugin/v1/overcloud_deploy.py#L314-L316
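
To make the point concrete, here is a rough sketch of the kind of guard being asked for; the parameter keys and the helper below are illustrative assumptions, not the actual plugin code linked above. The idea is to generate the Ceph identifiers only for a fresh deployment and to reuse the existing stack's values on update:

import base64
import os
import uuid

def _placeholder_ceph_key():
    # Stand-in for a real Ceph keyring secret.
    return base64.b64encode(os.urandom(32)).decode('ascii')

def ceph_parameters(existing_params=None):
    """Return Ceph identifiers, preserving them across stack updates."""
    keys = ('CephClusterFSID', 'CephMonKey', 'CephAdminKey')
    if existing_params:
        # On a stack update, reuse the values the stack already carries so
        # the fsid/secret UUID stays in sync with the libvirt secret that
        # was defined on the nodes at initial deployment time.
        return {k: existing_params[k] for k in keys if k in existing_params}
    # Initial deployment: generating fresh identifiers is safe here.
    return {
        'CephClusterFSID': str(uuid.uuid4()),
        'CephMonKey': _placeholder_ceph_key(),
        'CephAdminKey': _placeholder_ceph_key(),
    }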

Comment 5 Brad P. Crochet 2015-07-20 17:32:19 UTC
The previous fix only partially addresses this. Here is the remainder:

https://review.gerrithub.io/240650

Comment 6 Mike Burns 2015-07-23 11:34:29 UTC
*** Bug 1246023 has been marked as a duplicate of this bug. ***

Comment 8 Udi Kalifon 2015-07-30 14:48:29 UTC
Verified in: python-rdomanager-oscplugin-0.0.8-43.el7ost.noarch

Comment 10 errata-xmlrpc 2015-08-05 13:59:55 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2015:1549