Bug 913613 - Some Instances are shutoff after they're suspended externally to nova
Summary: Some Instances are shutoff after they're suspended externally to nova
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-nova
Version: 2.0 (Folsom)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: medium
Target Milestone: snapshot4
Target Release: 2.1
Assignee: Pádraig Brady
QA Contact: Kashyap Chamarthy
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2013-02-21 15:51 UTC by Pádraig Brady
Modified: 2022-07-09 06:09 UTC
CC List: 5 users

Fixed In Version: openstack-nova-2012.2.3-2.el6ost
Doc Type: Bug Fix
Doc Text:
Clone Of: 890512
Environment:
Last Closed: 2013-03-21 18:16:41 UTC
Target Upstream Version:
Embargoed:




Links
Launchpad bug 1097806 (last updated: never)
OpenStack gerrit change 20337: MERGED, "Fix state sync logic related to the PAUSED VM state" (2019-12-03 10:06:39 UTC)
Red Hat Issue Tracker OSP-16353 (2022-07-09 06:09:56 UTC)
Red Hat Product Errata RHSA-2013:0657: SHIPPED_LIVE, "Moderate: openstack-nova security, bug fix, and enhancement update" (2013-03-21 22:12:14 UTC)

Description Pádraig Brady 2013-02-21 15:51:50 UTC
This is similar to bug #890512.

To test, suspend the VM externally to nova (e.g. with virsh) and ensure that nova does not shut the instance off after a while.
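
For context, the hazard being tested: nova's periodic power-state sync compares the power state recorded in its database with the one reported by the hypervisor, and before this fix an externally paused guest could be treated as a discrepancy to "correct" by stopping the instance. A simplified, hypothetical sketch of that decision (illustrative only, not nova's actual code):
#-------------#
# Hypothetical sketch, not nova's actual code: the periodic sync
# compares nova's DB power state with the hypervisor's report.
PAUSED, RUNNING, SHUTDOWN = 'paused', 'running', 'shutdown'

def sync_power_state(db_state, hypervisor_state, stop_instance):
    if db_state == hypervisor_state:
        return  # states agree; nothing to reconcile
    if hypervisor_state == PAUSED:
        # Post-fix: an externally paused guest usually resumes once the
        # external instrumentation finishes, so only log the mismatch.
        print("WARNING: Instance is paused unexpectedly. Ignore.")
        return
    # Pre-fix, PAUSED could fall through to the reconciliation path
    # below, and the instance ended up shut off.
    stop_instance()
#-------------#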

Comment 3 Kashyap Chamarthy 2013-03-20 06:09:32 UTC
1] Version info:
#-------------#
$ cat /etc/redhat-release ; arch
Red Hat Enterprise Linux Server release 6.4 (Santiago)
x86_64
#-------------#


== Verification info: ==

2] Ensure the fix is in:
#-------------#
$ rpm -q openstack-nova --changelog | grep 913613
- Fix state sync logic related to the PAUSED VM state #913613
#-------------#

2.1] Check the patch referenced in Comment #1:
#-------------#
$ grep -i "Instance is paused" /usr/lib/python2.6/site-packages/nova/compute/manager.py -B2
                    # the VM state will go back to running after the external
                    # instrumentation is done. See bug 1097806 for details.
                    LOG.warn(_("Instance is paused unexpectedly. Ignore."),

#-------------#


3] Suspend a running guest externally using "virsh"

======
Notes: 

   - first run $ nova list
   - pick an instance
   - grep -i 639b3bf0-cb97-466c-9f8e-3cf369077e1f /etc/libvirt/qemu/*  
     - so that you get the libvirt domain name used by virsh
======
#-------------#
$ nova list | grep -i f16-t4
| 639b3bf0-cb97-466c-9f8e-3cf369077e1f | f16-t4        | ACTIVE | net1=ww.xx.yy.zz |
#-------------#
[root@interceptor ~(keystone_user1)]# grep -i 639b3bf0-cb97-466c-9f8e-3cf369077e1f /etc/libvirt/qemu/*
/etc/libvirt/qemu/instance-00000044.xml:  <uuid>639b3bf0-cb97-466c-9f8e-3cf369077e1f</uuid>
/etc/libvirt/qemu/instance-00000044.xml:      <entry name='uuid'>639b3bf0-cb97-466c-9f8e-3cf369077e1f</entry>
#-------------#
$ virsh suspend instance-00000044
#-------------#
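
The same suspend can be scripted through the libvirt Python bindings; a minimal sketch, assuming the default qemu:///system URI and the domain name found above:
#-------------#
import libvirt

# Connect to the local hypervisor (the same one virsh targets by default).
conn = libvirt.open('qemu:///system')
dom = conn.lookupByName('instance-00000044')  # domain name from the grep above

dom.suspend()    # equivalent to: virsh suspend instance-00000044
# ... leave the guest paused long enough for nova's periodic sync to run ...
# dom.resume()   # equivalent to: virsh resume, once verification is done
conn.close()
#-------------#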



4] Observe nova compute log -- /var/log/nova/compute.log
#-------------#
.
.
2013-03-20 11:27:39 2751 WARNING nova.compute.manager [-] Found 7 in the database and 8 on the hypervisor.
2013-03-20 11:27:40 2751 WARNING nova.compute.manager [-] [instance: 639b3bf0-cb97-466c-9f8e-3cf369077e1f] Instance is paused unexpectedly. Ignore.
2013-03-20 11:27:42 2751 AUDIT nova.compute.resource_tracker [-] Free ram (MB): 54168
2013-03-20 11:27:42 2751 AUDIT nova.compute.resource_tracker [-] Free disk (GB): 36
2013-03-20 11:27:42 2751 AUDIT nova.compute.resource_tracker [-] Free VCPUS: 41
2013-03-20 11:27:42 2751 INFO nova.compute.resource_tracker [-] Compute_service record updated for interceptor.lab.eng.pnq.redhat.com 
2013-03-20 11:27:45 2751 AUDIT nova.compute.resource_tracker [-] Free ram (MB): 54168
2013-03-20 11:27:45 2751 AUDIT nova.compute.resource_tracker [-] Free disk (GB): 36
.
.
#-------------#

You can see the message added by the patch: "Instance is paused unexpectedly. Ignore."

Comment 4 Kashyap Chamarthy 2013-03-20 06:13:54 UTC
Now list the instance with nova again; it is still reported as 'ACTIVE':
#-------------#
$ nova list | grep f16-t4
| 639b3bf0-cb97-466c-9f8e-3cf369077e1f | f16-t4        | ACTIVE | net1=ww.xx.yy.zz |
#-------------#

However:
#-------------#
$ ssh -i oskey3.priv root@ww.xx.yy.zz
ssh: connect to host ww.xx.yy.zz port 22: No route to host
#-------------#
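
The ssh failure is consistent with the guest being paused at the hypervisor level, which 'virsh domstate instance-00000044' would confirm. Scripted, the same check via the libvirt Python bindings (same assumed domain name and URI as above):
#-------------#
import libvirt

conn = libvirt.open('qemu:///system')
dom = conn.lookupByName('instance-00000044')
state, reason = dom.state()  # virDomainGetState: (state, reason) pair
# Expected to be VIR_DOMAIN_PAUSED while the external suspend is in effect.
print(state == libvirt.VIR_DOMAIN_PAUSED)
conn.close()
#-------------#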


So, is this reporting consistent? The 'ACTIVE' state above seems dubious, given that the guest is unreachable.



From the referenced patch, a fragment of the relevant 'elif' branch:
#----------------#
.
.
.
                elif vm_power_state == power_state.PAUSED:
                    # Note(maoy): a VM may get into the paused state not only
                    # because the user request via API calls, but also
                    # due to (temporary) external instrumentations.
                    # Before the virt layer can reliably report the reason,
                    # we simply ignore the state discrepancy. In many cases,
                    # the VM state will go back to running after the external
                    # instrumentation is done. See bug 1097806 for details.
                    LOG.warn(_("Instance is paused unexpectedly. Ignore."),
#----------------#
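
For context, a simplified paraphrase of how this branch sits in the surrounding sync method (a sketch under assumptions: the signature, helper names, and the stop path are illustrative, not verbatim nova code):
#----------------#
# Illustrative paraphrase only; names other than power_state.PAUSED and
# the quoted warning are assumptions, not verbatim nova code.
import logging

from nova.compute import power_state

LOG = logging.getLogger(__name__)
_ = lambda s: s  # gettext shim, for this sketch only

def _sync_instance_power_state(self, context, db_instance,
                               db_power_state, vm_power_state):
    if vm_power_state == db_power_state:
        return  # DB and hypervisor agree; nothing to reconcile

    if vm_power_state in (power_state.SHUTDOWN, power_state.CRASHED):
        # The guest really is gone: stop it through the API so the DB
        # record and allocated resources stay consistent.
        self.compute_api.stop(context, db_instance)
    elif vm_power_state == power_state.PAUSED:
        # The fix: leave an externally paused guest alone; it usually
        # returns to RUNNING once the external instrumentation is done
        # (see bug 1097806).
        LOG.warn(_("Instance is paused unexpectedly. Ignore."),
                 instance=db_instance)
#----------------#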

Comment 5 Pádraig Brady 2013-03-20 11:57:22 UTC
This is OK: from Nova's point of view the instance is active.
Are there warnings in the logs that the instance is paused?
If so, it's good to go, at least as per the upstream arguments.

Comment 6 Kashyap Chamarthy 2013-03-20 12:17:10 UTC
Yes, there is a warning, as noted in Comment #4:

 "Instance is paused unexpectedly. Ignore."

which is the message introduced in the referenced commit.

Relevant log fragment:
#-------------#
.

2013-03-20 11:27:40 2751 WARNING nova.compute.manager [-] [instance: 639b3bf0-cb97-466c-9f8e-3cf369077e1f] Instance is paused unexpectedly. Ignore.
.
.
#-------------#


Conclusion: Moving the bug to VERIFIED per the above comment; the fix is effective, as demonstrated in the nova compute log.

Comment 8 errata-xmlrpc 2013-03-21 18:16:41 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2013-0657.html

