Bug 1267598

Summary: nova: when 'nova resize' was attempted on a setup with two compute nodes, the instance switched to ERROR state.
Product: Red Hat OpenStack
Reporter: Mike Orazi <morazi>
Component: rhosp-director
Assignee: Ollie Walsh <owalsh>
Status: CLOSED CURRENTRELEASE
QA Contact: Archit Modi <amodi>
Severity: high
Priority: urgent
Version: 7.0 (Kilo)
CC: arkady_kanevsky, audra_cooper, christopher_dearborn, dallan, david_paterson, eglynn, jcoufal, jen, jguiditt, jhakimra, John_walsh, jschluet, j_t_williams, kimi.zhang, mbooth, mburns, nbarcet, nlevinki, ohochman, owalsh, rhel-osp-director-maint, rhos-maint, sasha, sclewis, sgordon, smerrow, sreichar, srevivo, stoner, svanders, tshefi, tytus.kurek
Target Milestone: z3
Keywords: AutomationBlocker, InstallerIntegration, Reopened, TestOnly, ZStream
Target Release: 7.0 (Kilo)
Hardware: x86_64
OS: Linux
Fixed In Version: openstack-nova-2015.1.4-45.el7ost
Doc Type: Bug Fix
Clone Of: 975014
Clones: 1292532
Last Closed: 2017-11-28 19:19:53 UTC
Type: Bug
Bug Depends On: 975014, 1472723    
Bug Blocks: 1028186, 1156010, 1198809, 1241501, 1243520, 1258302, 1292532, 1356451    

Comment 2 Mike Orazi 2015-09-30 14:03:33 UTC
This is to ensure that nova resize works as expected when installation is driven by osp-director.

Comment 3 Mike Burns 2015-10-02 16:18:57 UTC
What we'd like to validate is that live migration and instance resize work after a director deployment.

Comment 4 arkady kanevsky 2015-10-02 16:23:46 UTC
Great. Is this fixed in OSP?

Comment 5 Mike Orazi 2015-10-02 16:29:56 UTC
To clarify, we are requesting a retest of live migration and instance resize on director-based deployments. There is a distinct possibility this will bounce back to development with specific issues that still need to be addressed, but we want to re-validate the state of the functionality.

Comment 8 nlevinki 2015-12-16 09:26:49 UTC
I marked this ticket as a blocker: if we deploy 1000 or more compute nodes, the customer will need to access each node and add the cert, which is an unacceptable user experience.
The issue is that nova@compute is trying to do a passwordless ssh into nova@controller.  However, nova@compute doesn't have a cert registered with nova@controller, so the passwordless login fails.

Comment 10 Jaromir Coufal 2015-12-16 09:55:31 UTC
Given the timeframe, this is not a blocker for 7.2; we can clearly document it and make sure to fix it in OSP8.

Comment 13 arkady kanevsky 2015-12-17 14:35:55 UTC
Can you clarify which certs are needed, and where?
Do the controller nodes need certs for each Nova compute node, or the inverse?
Do all Nova nodes need certs for all other Nova nodes?
Does the undercloud need certs for all overcloud nodes?

Also, it looks like we will need to split this BZ into two:
one for the OSP7 documentation and one for the actual fix in OSP8/OSP8-d.

Comment 15 Mike Burns 2016-04-07 20:50:54 UTC
This bug did not make the OSP 8.0 release.  It is being deferred to OSP 10.

Comment 16 Karl Hastings 2016-07-22 17:57:42 UTC
Since this is deferred to OSP 10, I'm moving it to the delljs7.0 tracker.

Note that versions of this BZ exist for OSP 5 (975014), OSP 6 (1028186), and OSP 7 (1292532).

Should some of those BZs be closed, or should this BZ have been left for OSP 8 and cloned for OSP 10?

Comment 17 Stephen Gordon 2016-11-14 20:47:32 UTC
*** Bug 1221776 has been marked as a duplicate of this bug. ***

Comment 18 arkady kanevsky 2016-11-14 20:57:19 UTC
So does this BZ make it into OSP10 or not?
I am seeing a lot of pointers to various BZs that are either closed but not fixed or re-targeted to OSP11.

Comment 19 Stephen Gordon 2016-11-15 20:01:23 UTC
(In reply to arkady kanevsky from comment #18)
> So does this BZ make it into OSP10 or not?
> I am seeing a lot of pointers to various BZs that are either closed but not
> fixed or re-targeted to OSP11.

No, there are a number of issues in this area that the out-of-the-box (OOTB) configuration does not currently support. The current documented workaround is the same as for live migration:

https://access.redhat.com/documentation/en/red-hat-openstack-platform/9/single/director-installation-and-usage/#sect-Migrating_VMs_from_an_Overcloud_Compute_Node

Correcting this will require director/TripleO to handle this additional configuration. Some environments will not want it (not all operators accept the hosts having access to each other in this fashion), so it will need to be toggleable but default to on. We are exploring what this will look like in Ocata.

I've left a 10.0.z flag for now in the hope that it might be backportable, but this will depend on the resolution.

Comment 20 Sean Merrow 2017-03-15 17:06:57 UTC
Hi Steve, just checking in to see if there have been any further discussions on this since your November update in comment 19.

Comment 21 Stephen Gordon 2017-03-15 20:13:13 UTC
(In reply to Sean Merrow from comment #20)
> Hi Steve, just checking in to see if there have been any further discussions
> on this since your November update in comment 19.

We're still determining whether we can offer an OOTB solution in 12; any opportunity for a backport is obviously contingent on that. The priority is ensuring we have secure OOTB live migration; the cold migration setups (including resize) would need to be addressed after that. See also Bug 1404294.

Comment 22 Audra Cooper 2017-04-14 13:34:25 UTC
I tested nova resize after setting up SSH keys on the compute nodes (following the instructions at the link in Comment 19), and it was successful with OSP10.

Comment 23 Audra Cooper 2017-07-14 13:51:09 UTC
This now works OOTB in JS10.0.1.60 without the SSH key workaround.

Comment 25 Jon Schlueter 2017-11-15 01:58:53 UTC
According to our records, this should be resolved by openstack-nova-2015.1.4-46.el7ost.  This build is available now.