Bug 913613

Summary: Some Instances are shutoff after they're suspended externally to nova
Product: Red Hat OpenStack
Reporter: Pádraig Brady <pbrady>
Component: openstack-nova
Assignee: Pádraig Brady <pbrady>
Status: CLOSED ERRATA
QA Contact: Kashyap Chamarthy <kchamart>
Severity: medium
Priority: high
Version: 2.0 (Folsom)
CC: afazekas, beagles, ndipanov, oblaut, pbrady
Target Milestone: snapshot4
Keywords: Triaged
Target Release: 2.1
Hardware: Unspecified
OS: Unspecified
Fixed In Version: openstack-nova-2012.2.3-2.el6ost
Doc Type: Bug Fix
Clone Of: 890512
Last Closed: 2013-03-21 18:16:41 UTC
Type: Bug

Description Pádraig Brady 2013-02-21 15:51:50 UTC
This is similar to bug #890512

To test, suspend the VM externally to nova and ensure that nova doesn't stop the VM after a while.

Comment 3 Kashyap Chamarthy 2013-03-20 06:09:32 UTC
1] Version info:
#-------------#
$ cat /etc/redhat-release ; arch
Red Hat Enterprise Linux Server release 6.4 (Santiago)
x86_64
#-------------#


== Verification info: ==

2] Ensure the fix is in:
#-------------#
$ rpm -q openstack-nova --changelog | grep 913613
- Fix state sync logic related to the PAUSED VM state #913613
#-------------#

2.1] Check the patch referenced in Comment #1
#-------------#
$ grep -i "Instance is paused" /usr/lib/python2.6/site-packages/nova/compute/manager.py -B2
                    # the VM state will go back to running after the external
                    # instrumentation is done. See bug 1097806 for details.
                    LOG.warn(_("Instance is paused unexpectedly. Ignore."),

#-------------#


3] Suspend a running guest externally using "virsh"

======
Notes: 

   - first run $ nova list
   - pick an instance and note its ID
   - grep -i 639b3bf0-cb97-466c-9f8e-3cf369077e1f /etc/libvirt/qemu/*
     - this shows the libvirt domain name (instance-XXXXXXXX) that virsh uses
======
#-------------#
$ nova list | grep -i f16-t4
| 639b3bf0-cb97-466c-9f8e-3cf369077e1f | f16-t4        | ACTIVE | net1=ww.xx.yy.zz |
#-------------#
[root@interceptor ~(keystone_user1)]# grep -i 639b3bf0-cb97-466c-9f8e-3cf369077e1f /etc/libvirt/qemu/*
/etc/libvirt/qemu/instance-00000044.xml:  <uuid>639b3bf0-cb97-466c-9f8e-3cf369077e1f</uuid>
/etc/libvirt/qemu/instance-00000044.xml:      <entry name='uuid'>639b3bf0-cb97-466c-9f8e-3cf369077e1f</entry>
#-------------#
$ virsh suspend instance-00000044
#-------------#
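
The UUID-to-domain lookup done with grep above can also be scripted. The following is only a minimal sketch: the function name `find_domain_names` is hypothetical, and it assumes (as the grep output above suggests) that each libvirt domain definition lives in a file named after the domain under /etc/libvirt/qemu.

```python
import glob
import os


def find_domain_names(instance_uuid, config_dir="/etc/libvirt/qemu"):
    """Return libvirt domain names whose XML definition mentions the UUID.

    Hypothetical helper: it mirrors `grep -i <uuid> /etc/libvirt/qemu/*`
    and assumes the XML file name (minus ".xml") is the domain name.
    """
    wanted = instance_uuid.lower()
    names = []
    for path in sorted(glob.glob(os.path.join(config_dir, "*.xml"))):
        with open(path, encoding="utf-8", errors="replace") as handle:
            if wanted in handle.read().lower():
                # Strip the directory and the ".xml" suffix.
                names.append(os.path.splitext(os.path.basename(path))[0])
    return names
```

The returned name is what would be passed to "virsh suspend" in this step.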



4] Observe nova compute log -- /var/log/nova/compute.log
#-------------#
.
.
2013-03-20 11:27:39 2751 WARNING nova.compute.manager [-] Found 7 in the database and 8 on the hypervisor.
2013-03-20 11:27:40 2751 WARNING nova.compute.manager [-] [instance: 639b3bf0-cb97-466c-9f8e-3cf369077e1f] Instance is paused unexpectedly. Ignore.
2013-03-20 11:27:42 2751 AUDIT nova.compute.resource_tracker [-] Free ram (MB): 54168
2013-03-20 11:27:42 2751 AUDIT nova.compute.resource_tracker [-] Free disk (GB): 36
2013-03-20 11:27:42 2751 AUDIT nova.compute.resource_tracker [-] Free VCPUS: 41
2013-03-20 11:27:42 2751 INFO nova.compute.resource_tracker [-] Compute_service record updated for interceptor.lab.eng.pnq.redhat.com 
2013-03-20 11:27:45 2751 AUDIT nova.compute.resource_tracker [-] Free ram (MB): 54168
2013-03-20 11:27:45 2751 AUDIT nova.compute.resource_tracker [-] Free disk (GB): 36
.
.
#-------------#

You can see the message from the patch --- "Instance is paused unexpectedly. Ignore."
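
For repeated verification runs, the log check above could be automated. This is a rough sketch, not part of nova: the pattern and the function name `paused_instances` are assumptions based only on the log line format shown in this comment.

```python
import re

# Pattern derived from the WARNING line quoted above; the exact log
# format may differ between nova versions and logging configurations.
PAUSED_WARNING = re.compile(
    r"WARNING nova\.compute\.manager .*"
    r"\[instance: (?P<uuid>[0-9a-f-]{36})\] "
    r"Instance is paused unexpectedly\. Ignore\."
)


def paused_instances(log_lines):
    """Return instance UUIDs, in first-seen order, that hit the warning."""
    seen = []
    for line in log_lines:
        match = PAUSED_WARNING.search(line)
        if match and match.group("uuid") not in seen:
            seen.append(match.group("uuid"))
    return seen
```

Passing an open handle on /var/log/nova/compute.log as `log_lines` would list every instance for which the periodic task logged the warning.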

Comment 4 Kashyap Chamarthy 2013-03-20 06:13:54 UTC
Listing the instance with nova again, it is still reported as 'ACTIVE':
#-------------#
$ nova list | grep f16-t4
| 639b3bf0-cb97-466c-9f8e-3cf369077e1f | f16-t4        | ACTIVE | net1=ww.xx.yy.zz |
#-------------#

However:
#-------------#
$ ssh -i oskey3.priv root@ww.xx.yy.zz
ssh: connect to host ww.xx.yy.zz port 22: No route to host
#-------------#


So, is nova reporting the state consistently? The 'ACTIVE' state above sounds dubious, given that the guest is unreachable.



From the referenced patch, a fragment of the relevant 'elif' control flow statement:
#----------------#
.
.
.
                elif vm_power_state == power_state.PAUSED:
                    # Note(maoy): a VM may get into the paused state not only
                    # because the user request via API calls, but also
                    # due to (temporary) external instrumentations.
                    # Before the virt layer can reliably report the reason,
                    # we simply ignore the state discrepancy. In many cases,
                    # the VM state will go back to running after the external
                    # instrumentation is done. See bug 1097806 for details.
                    LOG.warn(_("Instance is paused unexpectedly. Ignore."),
#----------------#
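
To make the effect of that branch concrete, here is a simplified, self-contained sketch of the decision it encodes. It is not the nova code: the constants and the function `sync_action` are stand-ins, and the "stop" branch for a shut-down guest is an assumption added only for contrast with the paused case.

```python
import logging

LOG = logging.getLogger(__name__)

# Stand-in constants for this sketch; nova defines its own values in
# its power_state and vm_states modules.
RUNNING, PAUSED, SHUTDOWN = "running", "paused", "shutdown"
ACTIVE = "active"


def sync_action(db_vm_state, hypervisor_power_state):
    """Decide what a periodic state sync should do for one instance.

    Returns "stop" when the instance should be stopped, or "ignore"
    when the discrepancy should be left alone.
    """
    if db_vm_state != ACTIVE:
        return "ignore"  # other database states are outside this sketch
    if hypervisor_power_state == SHUTDOWN:
        # Assumed behaviour: a guest that is really down gets stopped.
        return "stop"
    if hypervisor_power_state == PAUSED:
        # The behaviour described in the patch comment: warn, do nothing.
        LOG.warning("Instance is paused unexpectedly. Ignore.")
        return "ignore"
    return "ignore"
```

Under this reading, an externally paused guest keeps its 'ACTIVE' record in the database, which matches the "nova list" output above.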

Comment 5 Pádraig Brady 2013-03-20 11:57:22 UTC
This is OK, as from Nova's point of view it's active.
Are there warnings in the logs that the instance is paused?
If so it's good to go as per upstream arguments at least.

Comment 6 Kashyap Chamarthy 2013-03-20 12:17:10 UTC
Yes, there is a warning, as noted in Comment #4:

 "Instance is paused unexpectedly. Ignore."

which is the message added by the referenced commit.

Relevant log fragment:
#-------------#
.

2013-03-20 11:27:40 2751 WARNING nova.compute.manager [-] [instance: 639b3bf0-cb97-466c-9f8e-3cf369077e1f] Instance is paused unexpectedly. Ignore.
.
.
#-------------#


Conclusion: Turning the bug to VERIFIED per the above comment, as the fix is effective and its effect is demonstrated in the nova compute log file.

Comment 8 errata-xmlrpc 2013-03-21 18:16:41 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2013-0657.html