Bug 1473863 - live-migration with boot from volume + config_drive takes a very long time.
Status: NEW
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-nova
Version: 10.0 (Newton)
Hardware: x86_64 Linux
Priority: unspecified
Severity: high
Assigned To: Eoghan Glynn
QA Contact: Joe H. Rahme
Reported: 2017-07-21 18:01 EDT by Siggy Sigwald
Modified: 2017-08-11 09:45 EDT (History)
Type: Bug
Description Siggy Sigwald 2017-07-21 18:01:24 EDT
Description of problem:
Live migration with openstack server migrate --live takes a very long time for specific instances:
* boot from volume, no config_drive: less than 5 seconds
* boot from image, with config_drive: less than 5 seconds
* boot from volume, with config_drive: between 40 and 90 minutes

We migrated 4 virtual machines between compute nodes in an overcloud deployed with RHOSP 10.

One migration resulted in a very long wait for the VM to migrate. /var/log/nova/nova-compute.log shows that this single migration took more than 45 minutes, with no real activity until the end of the migration:
openstack server migrate --live  edtnabtfhv23.localdomain 416db9ca-e3ee-4062-8560-60823f9a0350

Migrating 3 other VMs between the same 2 physical servers took far less time. These three instances took mere seconds to complete:
openstack server migrate --block-migration --live  edtnabtfhv23.localdomain 657b8e74-2879-4177-8f5f-bd3cfb81a259
openstack server migrate --live  edtnabtfhv23.localdomain 256850f7-0efc-43bc-be69-fb3b380f071d
openstack server migrate --live  edtnabtfhv23.localdomain 09e5084f-e8a7-4c6a-aa6c-aca6d91b63e9

Virtual machines booted from volume without config drive take seconds to migrate.
Virtual machines booted from image, with or without config drive, take tens of seconds to migrate.
The virtual machine booted from volume with config drive took a very long time.

sosreports and testing details can be found here:
http://collab-shell.usersys.redhat.com/01891999/
Comment 1 Andreas Karis 2017-07-24 12:15:40 EDT
Hello,

Can we get some traction for this BZ?

Thanks,

Andreas
Comment 3 Matthew Booth 2017-07-28 07:50:04 EDT
I've had a look through the logs in the sosreports. Unfortunately I don't see any end-to-end migration logs. I've put the request ids below, along with instance uuid and the first and last log timestamp I see for that request:

req-f83c757a-5a4f-4440-824c-0e097432192b 657b8e74-2879-4177-8f5f-bd3cfb81a259

2017-07-17 21:49:59.548
2017-07-17 21:50:16.175

req-0f29dfcc-e98a-4801-ba52-333aab8e5a63 416db9ca-e3ee-4062-8560-60823f9a0350

2017-07-17 21:51:26.794
2017-07-17 22:37:12.449

req-4724b7f4-65e1-448d-b784-d0fd8b67fd5c 256850f7-0efc-43bc-be69-fb3b380f071d

2017-07-17 21:51:50.982
2017-07-17 22:37:56.528

req-953a63dc-73d2-429d-a5db-fa058bb826c6 09e5084f-e8a7-4c6a-aa6c-aca6d91b63e9

2017-07-17 21:51:57.979
2017-07-17 22:38:33.305

The first request seems to complete quite quickly. The other 3 all take a suspiciously similar amount of time, as if the delay has some common cause. Perhaps it was blocked on some error elsewhere?
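
For reference, the elapsed time of each request can be worked out directly from the first and last timestamps listed above. The following is only a minimal sketch: the request IDs and timestamps are copied from the list, while the names REQUESTS, elapsed_seconds and durations are illustrative and not part of nova.

```python
from datetime import datetime

# Timestamp layout used in the nova log lines quoted above.
FMT = "%Y-%m-%d %H:%M:%S.%f"

# request id -> (first log timestamp, last log timestamp), copied from the list above.
REQUESTS = {
    "req-f83c757a-5a4f-4440-824c-0e097432192b": ("2017-07-17 21:49:59.548", "2017-07-17 21:50:16.175"),
    "req-0f29dfcc-e98a-4801-ba52-333aab8e5a63": ("2017-07-17 21:51:26.794", "2017-07-17 22:37:12.449"),
    "req-4724b7f4-65e1-448d-b784-d0fd8b67fd5c": ("2017-07-17 21:51:50.982", "2017-07-17 22:37:56.528"),
    "req-953a63dc-73d2-429d-a5db-fa058bb826c6": ("2017-07-17 21:51:57.979", "2017-07-17 22:38:33.305"),
}


def elapsed_seconds(first, last):
    """Return the number of seconds between two log timestamps."""
    return (datetime.strptime(last, FMT) - datetime.strptime(first, FMT)).total_seconds()


durations = {req: elapsed_seconds(first, last) for req, (first, last) in REQUESTS.items()}

for req, secs in durations.items():
    print(f"{req}  {secs:9.3f} s  ({secs / 60:.1f} min)")
```

This works out to roughly 17 seconds for the first request and about 46 minutes for each of the other three.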

The only logs we have are nova-api.log from seemingly all controllers, and nova-compute.log from edtnabtfhv23.localdomain. All of the above logs are from requests where edtnabtfhv23.localdomain is the destination compute.

To try to get to the bottom of this we need:

* nova-api.log from all controllers
* nova-conductor.log from all controllers
* nova-compute.log from both source and destination computes

My best guess is that the cause of the delay is either on the source compute or in conductor, because I don't see it on the destination.
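
Once the additional logs are available, the same first/last timestamp summary could be produced per request ID with a small script. This is a rough sketch, not a nova tool: it assumes each log line starts with a "YYYY-MM-DD HH:MM:SS.mmm" timestamp and carries the request ID as "[req-<uuid> ...", and the sample lines, request IDs and the request_spans helper are all made up for illustration.

```python
import re
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"

# Assumed line layout: leading timestamp, then "[req-<uuid>" somewhere later in the line.
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) .*?\[(req-[0-9a-f-]{36})"
)


def request_spans(lines):
    """Map each request id to its (first, last) timestamp across the given lines."""
    spans = {}
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue  # skip continuation lines and lines without a request id
        ts = datetime.strptime(match.group(1), FMT)
        req = match.group(2)
        first, last = spans.get(req, (ts, ts))
        spans[req] = (min(first, ts), max(last, ts))
    return spans


# Hypothetical sample lines; the messages are placeholders, not real nova output.
SAMPLE = [
    "2017-07-17 21:00:00.000 1234 INFO nova.compute.manager [req-00000000-0000-0000-0000-000000000001 - - - - -] <placeholder>",
    "2017-07-17 21:00:03.250 1234 INFO nova.compute.manager [req-00000000-0000-0000-0000-000000000002 - - - - -] <placeholder>",
    "    a continuation line without timestamp or request id",
    "2017-07-17 21:00:10.500 1234 INFO nova.compute.manager [req-00000000-0000-0000-0000-000000000001 - - - - -] <placeholder>",
]

for req, (first, last) in request_spans(SAMPLE).items():
    print(req, first, last, (last - first).total_seconds())
```

In practice the lines from nova-api.log, nova-conductor.log and nova-compute.log on every node would be concatenated and passed to request_spans, so that a gap on the source compute or in conductor shows up next to the destination timestamps.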
Comment 4 Matthew Booth 2017-07-28 12:57:37 EDT
I can't reproduce this locally, btw. Migration of boot from volume with config drive completes normally here.
