Bug 915274
| Summary: | Attempting 'nova live-migration' to a non-existent host fails, and the instance remains stuck in the MIGRATING state | ||
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | Kashyap Chamarthy <kchamart> |
| Component: | openstack-nova | Assignee: | Nikola Dipanov <ndipanov> |
| Status: | CLOSED ERRATA | QA Contact: | Kashyap Chamarthy <kchamart> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 2.1 | CC: | eglynn, ndipanov |
| Target Milestone: | snapshot5 | Keywords: | Triaged |
| Target Release: | 2.1 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | openstack-nova-2012.2.3-5.el6ost | Doc Type: | Bug Fix |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2013-04-04 20:21:24 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Description
Kashyap Chamarthy
2013-02-25 11:14:55 UTC
Some more investigation:

-> List the running instances:
#=================================#
[tuser1@interceptor ~(keystone_admin)]$ nova list --all-tenants
+--------------------------------------+-----------+-----------+-------------------+
| ID                                   | Name      | Status    | Networks          |
+--------------------------------------+-----------+-----------+-------------------+
| 08d616a9-87a1-4c0d-b986-7d6aa5ed6780 | fedora-t1 | ACTIVE    | net1=10.65.207.50 |
| 3e487977-37e8-4f26-9443-d65ecbdf83c9 | fedora-t2 | MIGRATING | net1=10.65.207.51 |
| 48d9e518-a91f-48db-9d9b-965b243e7113 | fedora-t4 | ACTIVE    | net1=10.65.207.52 |
+--------------------------------------+-----------+-----------+-------------------+
#=================================#

-> Let's try to 'delete' the MIGRATING instance:
#=================================#
[tuser1@interceptor ~(keystone_admin)]$ nova delete 3e487977-37e8-4f26-9443-d65ecbdf83c9
#=================================#

-> List all the running instances. Note: the just-deleted instance is shown as 'ACTIVE':
#=================================#
[tuser1@interceptor ~(keystone_admin)]$ nova list --all-tenants
+--------------------------------------+-----------+--------+-------------------+
| ID                                   | Name      | Status | Networks          |
+--------------------------------------+-----------+--------+-------------------+
| 08d616a9-87a1-4c0d-b986-7d6aa5ed6780 | fedora-t1 | ACTIVE | net1=10.65.207.50 |
| 3e487977-37e8-4f26-9443-d65ecbdf83c9 | fedora-t2 | ACTIVE | net1=10.65.207.51 |
| 48d9e518-a91f-48db-9d9b-965b243e7113 | fedora-t4 | ACTIVE | net1=10.65.207.52 |
+--------------------------------------+-----------+--------+-------------------+
#=================================#

-> Let's try to ssh into the just-'active' instance:
#=================================#
[tuser1@interceptor ~(keystone_admin)]$ ssh -i oskey.priv root@10.65.207.51
ssh: connect to host 10.65.207.51 port 22: No route to host
#=================================#
Since it says "No route to host", let's use 'virt-cat' to find out if there's a new IP address for the guest:
#=================================#
[tuser1@interceptor ~(keystone_user1)]$ sudo virt-cat instance-0000000d /var/log/messages | grep 'dhclient.*bound to' | tail -5
Feb 25 07:21:18 localhost dhclient[636]: bound to 10.65.207.51 -- renewal in 55 seconds.
Feb 25 07:22:14 localhost dhclient[636]: bound to 10.65.207.51 -- renewal in 47 seconds.
Feb 25 07:23:01 localhost dhclient[636]: bound to 10.65.207.51 -- renewal in 43 seconds.
Feb 25 07:23:44 localhost dhclient[636]: bound to 10.65.207.51 -- renewal in 54 seconds.
Feb 25 07:24:38 localhost dhclient[636]: bound to 10.65.207.221 -- renewal in 33524 seconds.
#=================================#

So it turns out there's a new IP address (note the last line of the above output), but it isn't reflected in the "Networks" column when we list the running instances.

-> Let's see if we can ssh into the new IP address:
#=================================#
[tuser1@interceptor ~(keystone_user1)]$ ssh -i oskey.priv root@10.65.207.221
ssh: connect to host 10.65.207.221 port 22: No route to host
#=================================#
[tuser1@interceptor ~(keystone_user1)]$ ping 10.65.207.52
PING 10.65.207.52 (10.65.207.52) 56(84) bytes of data.
From 10.65.207.49 icmp_seq=1 Destination Host Unreachable
From 10.65.207.49 icmp_seq=2 Destination Host Unreachable
From 10.65.207.49 icmp_seq=3 Destination Host Unreachable
^C
--- 10.65.207.52 ping statistics ---
4 packets transmitted, 0 received, +3 errors, 100% packet loss, time 3844ms
pipe 3
#=================================#

So, from the above, the guest is essentially hosed, yet 'nova list' dubiously indicates the guest is ACTIVE.
Fix proposed upstream on the stable branch -- https://review.openstack.org/#/c/22873/ -- and it will be backported once accepted there.

Short: I still don't see the fix taking effect. The string "Compute service of <hostname> is unavailable at this time" does not appear in the logs (after enabling verbose/debug and restarting all nova services).
Long:
====
Verification info:
1] Ensure to have the right version:
#-------------#
$ rpm -q openstack-nova ; uname -r ; arch
openstack-nova-2012.2.3-7.el6ost.noarch
2.6.32-358.el6.x86_64
x86_64
#-------------#
2] Ensure the fix is in:
#-------------#
$ rpm -q openstack-nova --changelog | grep 915274
- Handle unavailable host for live migration #915274
#-------------#
3] Restart all openstack nova services
#-------------#
$ for j in $(for i in /etc/init.d/openstack-nova-* ; do $i status | grep running ; done | awk '{print $1}') ; do service $j restart ; done
#-------------#
4] Try live-migration to an invalid host
#-------------#
[(keystone_user1)]$ nova live-migration f16-t4 maelstrom.lab.eng.pnq.redhat.com
ERROR: Policy doesn't allow compute_extension:admin_actions:migrateLive to be performed. (HTTP 403) (Request-ID: req-b054d839-2f6a-4af0-840c-17b522818f16)
#-------------#
5] Set verbose/debug to "true" in nova.conf
5.1] Again restart all nova services (refer instruction 3] above)
6] Attempt to live migrate to an invalid host (as keystone_admin)
#-------------#
$ nova live-migration 639b3bf0-cb97-466c-9f8e-3cf369077e1f maelstrom.lab.eng.pnq.redhat.com
ERROR: Live migration of instance 639b3bf0-cb97-466c-9f8e-3cf369077e1f to host maelstrom.lab.eng.pnq.redhat.com failed (HTTP 400) (Request-ID: req-4f28ef34-3d82-4336-bbe7-db14d949f5cc)
#-------------#
Observations:
=============
a] The instance is still in migrating state:
#-------------#
$ nova list --all-tenants | grep f16-t4
| 639b3bf0-cb97-466c-9f8e-3cf369077e1f | f16-t4 | MIGRATING | net1=10.65.207.56 |
#-------------#
b] Grep for the string "unavailable", referenced from https://review.openstack.org/#/c/22873/1/nova/exception.py:
#-------------#
# tail -1000000 /var/log/nova/compute.log| grep -i "unavailable at this time"
[root@interceptor python-swiftc(keystone_admin)]#
#-------------#
Nothing is observed in the logs and the instance is still in "MIGRATING" state:
Am I missing anything trivial here?
Somehow, the fix didn't take effect on the previous machine.
VERIFIED on a new machine.
Version info:
#============================#
$ rpm -q openstack-nova-compute-2012.2.3-7.el6ost.noarch --changelog | grep -i 915274
- Handle unavailable host for live migration #915274
#============================#
Now, list the instances and then try to migrate using the UUID:
#============================#
[root@node-01 ~(keystone_admin)]$ nova list
+--------------------------------------+--------+--------+--------------------------+
| ID | Name | Status | Networks |
+--------------------------------------+--------+--------+--------------------------+
| 6fdcafa3-b86b-479e-b1cf-5e6859912e65 | server | ACTIVE | novanetwork=192.168.32.2 |
+--------------------------------------+--------+--------+--------------------------+
#============================#
[root@node-01 ~(keystone_admin)]$ nova live-migration 6fdcafa3-b86b-479e-b1cf-5e6859912e65 maelstrom.lab.eng.pnq.redhat.com
ERROR: Compute service of maelstrom.lab.eng.pnq.redhat.com is unavailable at this time.
Traceback (most recent call last):
File "/usr/lib/python2.6/site-packages/nova/openstack/common/rpc/amqp.py", line 276, in _process_data
rval = self.proxy.dispatch(ctxt, version, method, **args)
File "/usr/lib/python2.6/site-packages/nova/openstack/common/rpc/dispatcher.py", line 145, in dispatch
return getattr(proxyobj, method)(ctxt, **kwargs)
File "/usr/lib/python2.6/site-packages/nova/scheduler/manager.py", line 101, in live_migration
context, ex, request_spec)
File "/usr/lib64/python2.6/contextlib.py", line 23, in __exit__
self.gen.next()
File "/usr/lib/python2.6/site-packages/nova/scheduler/manager.py", line 91, in live_migration
block_migration, disk_over_commit)
File "/usr/lib/python2.6/site-packages/nova/scheduler/driver.py", line 232, in schedule_live_migration
self._live_migration_dest_check(context, instance, dest)
File "/usr/lib/python2.6/site-packages/nova/scheduler/driver.py", line 281, in _live_migration_dest_check
raise exception.ComputeServiceUnavailable(host=dest)
ComputeServiceUnavailable: Compute service of maelstrom.lab.eng.pnq.redhat.com is unavailable at this time.
#============================#
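The traceback above shows the fix working: `_live_migration_dest_check` in nova/scheduler/driver.py now raises `ComputeServiceUnavailable` when the destination compute service cannot be found, instead of silently leaving the instance in MIGRATING. A minimal sketch of that check, for illustration only (the function signature and the `services` mapping here are hypothetical, not nova's actual internals):

```python
class ComputeServiceUnavailable(Exception):
    """Mirrors the exception message seen in the traceback above."""
    def __init__(self, host):
        super(ComputeServiceUnavailable, self).__init__(
            "Compute service of %s is unavailable at this time." % host)
        self.host = host


def live_migration_dest_check(services, dest):
    """Refuse to schedule a live migration to an unavailable destination.

    services: hypothetical dict mapping hostname -> True if the compute
    service on that host is known and up.
    """
    if not services.get(dest, False):
        # Fail fast with a clear error rather than leaving the
        # instance stuck in the MIGRATING state.
        raise ComputeServiceUnavailable(host=dest)


# Usage: a non-existent destination now produces a clear error.
try:
    live_migration_dest_check({"node-01": True},
                              "maelstrom.lab.eng.pnq.redhat.com")
except ComputeServiceUnavailable as exc:
    print(exc)
```

The key behavioral change is that the check happens in the scheduler, before the migration is dispatched, so the instance's task state is rolled back rather than left dangling.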
#============================#
$ nova live-migration server maelstrom.lab.eng.pnq.redhat.com
ERROR: Compute service of maelstrom.lab.eng.pnq.redhat.com is unavailable at this time.
#============================#
-> Listing the nova instances shows the server as ACTIVE, as expected.
#============================#
$ nova list
+--------------------------------------+--------+--------+--------------------------+
| ID | Name | Status | Networks |
+--------------------------------------+--------+--------+--------------------------+
| 6fdcafa3-b86b-479e-b1cf-5e6859912e65 | server | ACTIVE | novanetwork=192.168.32.2 |
+--------------------------------------+--------+--------+--------------------------+
#============================#
The original bug is fixed.
For the above stack trace, I filed a separate bug -- https://bugzilla.redhat.com/show_bug.cgi?id=927167
Summary of the above comment: I tried live migration with both the UUID and the name of the nova instance. It now throws a valid exception that the remote host is not available, and the instance state remains ACTIVE.

====
$ ping 192.168.32.2
PING 192.168.32.2 (192.168.32.2) 56(84) bytes of data.
64 bytes from 192.168.32.2: icmp_seq=1 ttl=64 time=0.167 ms
64 bytes from 192.168.32.2: icmp_seq=2 ttl=64 time=1.15 ms
====

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2013-0709.html