Bug 1318632 - nova resize/migrate/evacuate fails due to ssh across compute nodes not set up
Summary: nova resize/migrate/evacuate fails due to ssh across compute nodes not set up
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: rhosp-director
Version: 8.0 (Liberty)
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Angus Thomas
QA Contact: Yogev Rabl
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-03-17 12:12 UTC by Asaf Hirshberg
Modified: 2019-11-14 07:37 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-03-21 15:47:14 UTC
Target Upstream Version:


Attachments
full error in /var/log/nova/nova-compute (7.24 KB, text/plain)
2016-03-17 12:12 UTC, Asaf Hirshberg

Description Asaf Hirshberg 2016-03-17 12:12:36 UTC
Created attachment 1137364
full error in /var/log/nova/nova-compute

Description of problem:
When testing instance-ha and nova-evacuate on ospd8 (2016-03-11.1), I encountered the following error, which causes the operation to fail. nova resize/migrate/live-migration also failed.

2016-03-17 09:21:27.321 22849 ERROR oslo_messaging.rpc.dispatcher [req-592a33fc-e70f-49ee-9371-ac0764e8bae4 ed998e6b6296432983c92153ecf4ec96 072cb395c62b4cbbaf2b7e8e93461071 - - -] Exception during message handling: Resize error: not able to execute ssh command: Unexpected error while running command.
Command: ssh 192.0.2.17 mkdir -p /var/lib/nova/instances/50749f71-f281-4115-89a1-f1c1212f2caa
Exit code: 255
Stdout: u''
Stderr: u'Host key verification failed.\r\n'
2016-03-17 09:21:27.321 22849 ERROR oslo_messaging.rpc.dispatcher Traceback (most recent call last):
2016-03-17 09:21:27.321 22849 ERROR oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 142, in _dispatch_and_reply
2016-03-17 09:21:27.321 22849 ERROR oslo_messaging.rpc.dispatcher     executor_callback))
...
...
error_out_instance_on_exception
2016-03-17 09:21:27.321 22849 ERROR oslo_messaging.rpc.dispatcher     raise error.inner_exception
2016-03-17 09:21:27.321 22849 ERROR oslo_messaging.rpc.dispatcher ResizeError: Resize error: not able to execute ssh command: Unexpected error while running command.
2016-03-17 09:21:27.321 22849 ERROR oslo_messaging.rpc.dispatcher Command: ssh 192.0.2.17 mkdir -p /var/lib/nova/instances/50749f71-f281-4115-89a1-f1c1212f2caa
2016-03-17 09:21:27.321 22849 ERROR oslo_messaging.rpc.dispatcher Exit code: 255
2016-03-17 09:21:27.321 22849 ERROR oslo_messaging.rpc.dispatcher Stdout: u''
2016-03-17 09:21:27.321 22849 ERROR oslo_messaging.rpc.dispatcher Stderr: u'Host key verification failed.\r\n'
2016-03-17 09:21:27.321 22849 ERROR oslo_messaging.rpc.dispatcher
2016-03-17 09:21:40.122 22849 WARNING nova.compute.resource_tracker [req-9ca643e4-2f08-414c-8883-7fa537dac4ae - - - - -] [instance: df807fbe-13ed-42bb-a0cb-d4604acf5ac7] Instance not resizing, skipping migration.
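
To make the failure easier to spot in a long nova-compute log, the Command / Exit code / Stdout / Stderr group shown above can be pulled out with a few lines of Python. This is only a sketch: the field names are taken from the excerpt, while the function names `parse_ssh_failure` and `is_host_key_failure` are hypothetical helpers and not part of nova.

```python
import re

# Field labels exactly as they appear in the log excerpt above.
FIELD_RE = re.compile(r"(Command|Exit code|Stdout|Stderr): (.*)$")


def parse_ssh_failure(log_text):
    """Return the first Command/Exit code/Stdout/Stderr group in log_text.

    Hypothetical helper for triage, not a nova API.
    """
    fields = {}
    for line in log_text.splitlines():
        match = FIELD_RE.search(line)
        if match is None:
            continue
        key, value = match.group(1), match.group(2).strip()
        if key in fields:
            # The traceback repeats the same group; stop at the first copy.
            break
        fields[key] = value
    return fields


def is_host_key_failure(fields):
    """True when the parsed group looks like the failure in this report."""
    return (
        fields.get("Exit code") == "255"
        and "Host key verification failed" in fields.get("Stderr", "")
    )
```

Feeding the attached log text to `parse_ssh_failure` and then to `is_host_key_failure` separates this host-key problem from other resize errors.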


Version-Release number of selected component (if applicable):
openstack-puppet-modules-7.0.13-1.el7ost.noarch
puppet-3.6.2-3.el7.noarch
openstack-tripleo-puppet-elements-0.0.4-1.el7ost.noarch


How reproducible:
reproduced on multiple installations

Steps to Reproduce:
1. Deploy ospd8 with 2+ compute nodes.
2. Use resize/migrate/live-migration on one of the VMs.
3. less /var/log/nova/nova-compute on the compute node hosting the VM.
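
Before triggering a resize, the same ssh path can be probed directly from the source compute node. The sketch below is an assumption about how one might check it, not something nova or director provides: `build_ssh_probe` and `probe` are hypothetical names, the probe runs a harmless `true` on the target, and it should be run as the user that nova-compute runs as so that the same keys and known_hosts file are used.

```python
import subprocess


def build_ssh_probe(host):
    """Build a non-interactive ssh command that runs `true` on host."""
    return [
        "ssh",
        "-o", "BatchMode=yes",      # never prompt; fail instead
        "-o", "ConnectTimeout=5",   # do not hang on unreachable hosts
        host,
        "true",
    ]


def probe(host, run=subprocess.run):
    """Classify the ssh result as 'ok', 'host-key', or 'other'.

    `run` is injectable so the logic can be exercised without a network.
    """
    result = run(build_ssh_probe(host), capture_output=True, text=True)
    if result.returncode == 0:
        return "ok"
    if "Host key verification failed" in result.stderr:
        return "host-key"
    return "other"
```

For example, `probe("192.0.2.17")` on the node hosting the VM would be expected to return `"host-key"` in the environment described in this report.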


Additional info:
RHEL-OSP director 8.0 puddle - 2016-03-11.1

Comment 2 Yogev Rabl 2016-03-17 14:59:29 UTC
This bug occurs when Nova uses RBD as the back end of the instance's disks.

Comment 3 James Slagle 2016-03-18 12:22:50 UTC
Why is this a regression? Director never set up ssh keys between compute nodes.

Comment 4 Ofer Blaut 2016-03-21 06:40:34 UTC
Hi James

The regression is not about the keys.

In ospd 7.3 we had instance HA/resize working with internal/external ceph.

This is not working in ospd 8.0, which is why it is a regression.

Ofer

Comment 5 James Slagle 2016-03-21 12:49:59 UTC
(In reply to Ofer Blaut from comment #4)
> Hi James
> 
> The regression is not about the keys.
> 
> In ospd 7.3 we had instance HA/resize working with internal/external ceph.
> 
> This is not working in ospd 8.0, which is why it is a regression.
> 
> Ofer

Is this the same bug Asaf is reporting?

Or a different bug?

The traceback shows an error about ssh keys not being configured. What is the traceback/error when using internal/external ceph? If it's different, please file a new bug as it's a different issue.

Comment 8 Udi Shkalim 2016-03-21 15:47:14 UTC
The problem happened as part of the Instance-HA feature, which uses nova evacuate. A new parameter, no_shared_storage, was added; it was set to true, and that caused the evacuate to fail.

As we know, nova evacuation can be performed in two ways:
1. Without shared storage: uses the resize path to rebuild the instance from scratch.
2. With shared storage: does not use the resize command.

Evacuation does not need an ssh key exchange when using shared storage.
When we deployed with ceph as shared storage, we saw the resize error, which did not make sense to us, hence we opened the bug.
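
The reasoning above can be summarized as a small decision table. This is a toy model of the behavior described in this comment, not nova code: `evacuation_path` is a hypothetical name, and the only input taken from the report is the no_shared_storage flag.

```python
def evacuation_path(no_shared_storage):
    """Toy model: which evacuation path applies and whether ssh keys matter.

    Mirrors the two cases listed in this comment; not taken from nova.
    """
    if no_shared_storage:
        # Instance is rebuilt via the resize path, which runs ssh
        # between compute nodes and so needs keys exchanged.
        return {"path": "rebuild via resize", "needs_ssh_keys": True}
    # Disks live on shared storage (ceph in this deployment),
    # so no ssh between compute nodes is required.
    return {"path": "shared storage", "needs_ssh_keys": False}
```

With ceph deployed as shared storage, the expected case is `evacuation_path(False)`; the failure in this report corresponds to `evacuation_path(True)` being taken because no_shared_storage was set to true.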

