Bug 891422

Summary: Compute Node should check for VM state inconsistencies on service startup vs. waiting for 10 minutes
Product: Red Hat OpenStack
Component: openstack-nova
Version: 2.0 (Folsom)
Hardware: Unspecified
OS: Unspecified
Status: CLOSED ERRATA
Severity: low
Priority: medium
Target Milestone: snapshot2
Target Release: 2.1
Reporter: Perry Myers <pmyers>
Assignee: Nikola Dipanov <ndipanov>
QA Contact: Yaniv Kaul <ykaul>
CC: eglynn, jkt, ndipanov
Keywords: Triaged
Fixed In Version: openstack-nova-2012.2.2-9.el6ost
Doc Type: Bug Fix
Type: Bug
Last Closed: 2013-02-14 18:24:34 UTC

Description Perry Myers 2013-01-02 21:50:55 UTC
Description of problem:
The compute node service (nova-compute) has a periodic task that refreshes the state recorded in the database for any VMs whose actual state does not match. By default this task runs every 10 minutes, but it would be better to run it immediately on service start rather than waiting up to 10 minutes for the first refresh.
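For context, nova-compute fires its periodic tasks on a fixed tick, and the power-state sync only fires every Nth tick, which is where the 10-minute delay comes from. A minimal self-contained Python sketch of that pattern (illustration only, with assumed interval values; not nova's actual code):

    import time

    TICK_SECONDS = 60   # assumed: one periodic tick per minute
    SYNC_TICKS = 10     # assumed: sync fires on every 10th tick

    def sync_power_states():
        """Reconcile each VM's hypervisor power state with its DB record."""
        print("syncing VM power states with the database")

    def run_periodic_loop():
        # What this bug asks for: one immediate sync at startup...
        sync_power_states()
        ticks = 0
        while True:
            time.sleep(TICK_SECONDS)
            ticks += 1
            # ...instead of only syncing here, up to 10 minutes later.
            if ticks % SYNC_TICKS == 0:
                sync_power_states()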

Version-Release number of selected component (if applicable):
openstack-nova-compute-2012.2.1-2.el6ost.noarch

How reproducible:
Every time

Steps to Reproduce:
1. Start up an instance on a compute node
2. Shut down the compute node (which will kill both the CN service as well as the VM)
3. Start up the compute node
  
Actual results:
The compute node starts up, but the periodic task that syncs VM states to the database does not run until 10 minutes later.

Expected results:
The VM state should be refreshed immediately on service startup.

Additional info:
2013-01-02 16:33:50 16247 INFO nova.openstack.common.rpc.impl_qpid [-] Connected to AMQP server on 192.168.15.2:5672
2013-01-02 16:33:50 INFO nova.compute.manager [req-d954dd04-4831-4ed8-ba2e-7617b0c752e9 None None] Updating host status

...

2013-01-02 16:43:53 16247 WARNING nova.compute.manager [-] Found 3 in the database and 1 on the hypervisor.
2013-01-02 16:43:53 16247 WARNING nova.compute.manager [-] [instance: 21879e7d-9a44-48aa-b507-ea88690860bb] Instance shutdown by itself. Calling the stop API.
2013-01-02 16:43:54 16247 INFO nova.virt.libvirt.driver [-] [instance: 21879e7d-9a44-48aa-b507-ea88690860bb] Instance destroyed successfully.

Comment 1 Russell Bryant 2013-01-15 15:53:31 UTC
There is an option called "resume_guests_state_on_host_boot" that changes the behavior of this use case (see init_host() in nova/compute/manager.py). When enabled, it automatically restarts the instances that were supposed to be running when nova-compute starts up. Personally, that is what I would expect to happen by default. We could consider changing this to be on by default in RHOS, I suppose.
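For reference, a sketch of enabling that option on the compute node (stock file path and section; treat the exact placement as an assumption for your install):

    # /etc/nova/nova.conf
    [DEFAULT]
    # Restart guests that were running when the host went down.
    resume_guests_state_on_host_boot = True

The nova-compute service has to be restarted for the change to take effect.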

There are cases covered by the sync_power_states periodic task that are not covered by init_host().  It seems like those two methods need a bit of refactoring so that init_host can sync the power state of every instance as it traverses them.
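A rough sketch of the shape that refactoring could take (names like _sync_instance_power_state and the db/driver helpers are hypothetical stand-ins, not the patch that shipped):

    class ComputeManager(object):
        """Stripped-down stand-in for nova/compute/manager.py."""

        def __init__(self, db, driver, host):
            self.db = db          # database API stand-in
            self.driver = driver  # hypervisor driver stand-in
            self.host = host

        def init_host(self):
            # On startup, reconcile every instance assigned to this host
            # instead of waiting for the first periodic sync run.
            for instance in self.db.instance_get_all_by_host(self.host):
                vm_state = self.driver.get_power_state(instance)
                self._sync_instance_power_state(instance, vm_state)

        def _sync_instance_power_state(self, instance, vm_state):
            # Per-instance body factored out of the periodic task so that
            # both init_host() and the periodic sync can share it.
            if instance['power_state'] != vm_state:
                self.db.instance_update(instance['uuid'],
                                        {'power_state': vm_state})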

However, it does seem like doing sync_power_states() in init_host() would be good, too.

Comment 2 Russell Bryant 2013-01-15 15:57:00 UTC
Pretend the last sentence in my last comment isn't there ...

Comment 6 Yaniv Kaul 2013-01-29 18:43:15 UTC
Note to QE: with that parameter enabled, test both a service stop/start and a full host power cycle (a command-level sketch follows the list):
1. VMs that were running come back up on that host.
2. If you have opted to run a VM on a different host in the meantime, it does NOT also start on the original host.
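A command-level sketch of that verification (run as admin; exact host names are placeholders):

    # before the test: note which instances are ACTIVE
    nova list
    # cycle the host, or just the service:
    #   service openstack-nova-compute restart
    # after boot, on the compute node:
    virsh list --all    # guests should be running again
    nova list           # DB states should match the hypervisor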

Comment 8 Yaniv Kaul 2013-02-10 15:25:33 UTC
Verified on:
[root@cougar10 ~]# rpm -qa |grep nova
python-novaclient-2.10.0-2.el6ost.noarch
python-nova-2012.2.2-9.el6ost.noarch
openstack-nova-network-2012.2.2-9.el6ost.noarch
openstack-nova-common-2012.2.2-9.el6ost.noarch
openstack-nova-compute-2012.2.2-9.el6ost.noarch

Comment 10 errata-xmlrpc 2013-02-14 18:24:34 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-0260.html