This looks like it might be caused by a startup race among the services. More investigation is required to confirm.
After monitoring the interaction between the services, including libvirt, I discovered a race condition in the startup scripts between openstack and libvirt-guests. The "Failed to resume" error occurs because the libvirt-guests script is restarting the VMs at the same time as nova is starting them. It is a tricky interaction that isn't seen every time, because the openstack startup must already be past the initialization stage for any of the libvirt-guests attempts to run. Furthermore, for any of the libvirt-guests attempts to actually succeed, nova has to have set up the network interfaces, etc. I am testing a patch that makes nova *not* error out the VMs when this happens. Most likely the appropriate thing to do is to ignore/log the exception from the createDomainWithFlags() call at the site where it occurs. Other configuration will be necessary for the VM to be of any use. This is probably relevant to grizzly and upstream folsom as well.
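A minimal sketch of the "ignore/log the exception" idea, not the actual patch. Everything named here is hypothetical: DomainStartError stands in for whatever error libvirt raises, and start_domain is any callable wrapping the createDomainWithFlags() call site.

```python
import logging

LOG = logging.getLogger(__name__)


class DomainStartError(Exception):
    """Hypothetical stand-in for the error raised when a domain fails to start."""


def start_domain_tolerating_race(start_domain, instance_name):
    """Try to start a domain; log and carry on if the start call fails.

    start_domain is a zero-argument callable (an assumption for this sketch).
    Returns True if our call started the domain, False if the failure was
    logged and ignored instead of putting the instance into an error state.
    """
    try:
        start_domain()
        return True
    except DomainStartError as exc:
        # Something else (e.g. libvirt-guests) may have started the VM already.
        LOG.warning("Ignoring start failure for %s: %s", instance_name, exc)
        return False
```

A real change would probably also want to confirm that the domain is in fact running before ignoring the error, so genuine start failures are not hidden.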
As a side note, you can pretty much tell when this happens by running virsh list. If a VM that you expect not to be running is actually running, that is a pretty good indicator this is what has occurred.
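The virsh list check can be scripted roughly as follows. This is only a sketch: it assumes the usual three-column "Id Name State" layout of virsh list output, and the sample text and instance names are made up for illustration.

```python
def running_domains(virsh_list_output):
    """Return the names of domains reported as running in `virsh list` text."""
    names = []
    for line in virsh_list_output.splitlines():
        fields = line.split()
        # Data rows look like "<id> <name> <state>"; the header and the
        # dashed separator do not start with a numeric id and are skipped.
        if len(fields) >= 3 and fields[0].isdigit() and fields[-1] == "running":
            names.append(fields[1])
    return names


# Illustrative sample only, not output captured from a real host.
SAMPLE = """\
 Id    Name                           State
----------------------------------------------------
 1     instance-00000001              running
 2     instance-00000002              paused
"""

print(running_domains(SAMPLE))
```

Comparing that list against the instances nova reports as SHUTOFF or ERROR shows which VMs were started behind nova's back.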
Probably a more effective way to resolve this is to disable the automatic restart of persistent VMs by changing the value of the ON_BOOT variable in the libvirt-guests configuration to an empty string or similar. This avoids any other race conditions that might occur from libvirt kicking off VMs concurrently.
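A rough sketch of the ON_BOOT change as a text transformation. The function only rewrites configuration text passed to it; the file location (commonly /etc/sysconfig/libvirt-guests on RHEL-style hosts) and the value "ignore" are assumptions here and should be checked against the comments in the installed file.

```python
def set_on_boot(config_text, value):
    """Return config_text with a single active ON_BOOT=<value> assignment.

    Commented-out lines such as "#ON_BOOT=start" are left untouched; the
    first active assignment is replaced and later duplicates are dropped.
    If there is no active assignment, one is appended.
    """
    new_line = "ON_BOOT=" + value
    out = []
    replaced = False
    for line in config_text.splitlines():
        if line.strip().startswith("ON_BOOT="):
            if not replaced:
                out.append(new_line)
                replaced = True
        else:
            out.append(line)
    if not replaced:
        out.append(new_line)
    return "\n".join(out) + "\n"


# "ignore" is an assumed value; verify it against the installed config file.
print(set_on_boot("#ON_BOOT=start\nON_BOOT=start\n", "ignore"))
```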
Dan, it doesn't look to me like we have agreement between OpenStack and libvirt about what component is going to be starting VMs. Can you shed any light on how this is supposed to work?
The libvirt-guests script simply should be enabled on any OpenStack system. Nova must retain full control over startup.
And when I said 'enabled' there, I meant libvirt-guests should be DISABLED.
Adding an additional "depends on" bug, since with resume after host reboots = true the instances will end up in shutoff or error states.
What needs to happen here? I see it was assigned to packstack, but I am not clear on what needs to be done. Disable the libvirt-guests script?
Yes, that is correct. We need to alter the libvirt-guests file to prevent the start or restart actions from booting VMs or remove the script altogether.
Verified NVR: openstack-packstack-2012.2.3-0.12.dev495

Verification steps:
====================
1. Installed openstack via packstack (all-in-one topology).
2. Uploaded an image to glance.
3. Launched 4 instances.
4. Rebooted the server.
5. Re-connected and listed instances via nova

# nova list
+--------------------------------------+------+---------+--------------------------+
| ID                                   | Name | Status  | Networks                 |
+--------------------------------------+------+---------+--------------------------+
| 2565330b-056f-470e-b555-3849a9a9a3fe | test | SHUTOFF | novanetwork=192.168.32.4 |
| 5b94a2de-5c90-48c3-a3a8-4823dfc00403 | test | SHUTOFF | novanetwork=192.168.32.3 |
| ae80283d-98b4-4d85-969e-e93287b7d467 | test | SHUTOFF | novanetwork=192.168.32.5 |
| cb39ee9e-2f62-4cb7-9f8a-c7e57aa1c284 | test | SHUTOFF | novanetwork=192.168.32.2 |
+--------------------------------------+------+---------+--------------------------+

6. Rebooted (hard reboot) all instances and verified that their status changed back to ACTIVE

+--------------------------------------+------+---------+--------------------------+
| ID                                   | Name | Status  | Networks                 |
+--------------------------------------+------+---------+--------------------------+
| 2565330b-056f-470e-b555-3849a9a9a3fe | test | SHUTOFF | novanetwork=192.168.32.4 |
| 5b94a2de-5c90-48c3-a3a8-4823dfc00403 | test | SHUTOFF | novanetwork=192.168.32.3 |
| ae80283d-98b4-4d85-969e-e93287b7d467 | test | SHUTOFF | novanetwork=192.168.32.5 |
| cb39ee9e-2f62-4cb7-9f8a-c7e57aa1c284 | test | SHUTOFF | novanetwork=192.168.32.2 |
+--------------------------------------+------+---------+--------------------------+

7. Verified that there are no errors in either the nova or the libvirt logs

Additional Info:
================
answer-file I used:

CONFIG_GLANCE_INSTALL=y
CONFIG_CINDER_INSTALL=y
CONFIG_NOVA_INSTALL=y
CONFIG_HORIZON_INSTALL=y
CONFIG_SWIFT_INSTALL=y
CONFIG_CLIENT_INSTALL=y
CONFIG_NTP_SERVERS=
CONFIG_NAGIOS_INSTALL=y
CONFIG_SSH_KEY=/root/.ssh/id_rsa.pub
CONFIG_MYSQL_HOST=IP_Address
CONFIG_MYSQL_USER=root
CONFIG_MYSQL_PW=11b22a572e4e4dcd
CONFIG_QPID_HOST=IP_Address
CONFIG_KEYSTONE_HOST=IP_Address
CONFIG_KEYSTONE_DB_PW=194892c3b5964441
CONFIG_KEYSTONE_ADMIN_TOKEN=b078d5e8ef11425ba1eb2b8204000f57
CONFIG_KEYSTONE_ADMIN_PW=123456
CONFIG_GLANCE_HOST=IP_Address
CONFIG_GLANCE_DB_PW=9a2851280f594546
CONFIG_GLANCE_KS_PW=3037970a19c741b1
CONFIG_CINDER_HOST=IP_Address
CONFIG_CINDER_DB_PW=078eb967731c49fc
CONFIG_CINDER_KS_PW=05d43dda2d574c81
CONFIG_CINDER_VOLUMES_CREATE=y
CONFIG_CINDER_VOLUMES_SIZE=20G
CONFIG_NOVA_API_HOST=IP_Address
CONFIG_NOVA_CERT_HOST=IP_Address
CONFIG_NOVA_VNCPROXY_HOST=IP_Address
CONFIG_NOVA_COMPUTE_HOSTS=IP_Address
CONFIG_NOVA_COMPUTE_PRIVIF=eth1
CONFIG_NOVA_NETWORK_HOST=IP_Address
CONFIG_NOVA_DB_PW=4aba7dd07c2a46aa
CONFIG_NOVA_KS_PW=d6f04927d8fb4f23
CONFIG_NOVA_NETWORK_PUBIF=eth2
CONFIG_NOVA_NETWORK_PRIVIF=eth1
CONFIG_NOVA_NETWORK_FIXEDRANGE=192.168.32.0/22
CONFIG_NOVA_NETWORK_FLOATRANGE=10.3.4.0/22
CONFIG_NOVA_NETWORK_AUTOASSIGNFLOATINGIP=n
CONFIG_NOVA_SCHED_HOST=IP_Address
CONFIG_NOVA_SCHED_CPU_ALLOC_RATIO=16.0
CONFIG_NOVA_SCHED_RAM_ALLOC_RATIO=1.5
CONFIG_OSCLIENT_HOST=IP_Address
CONFIG_HORIZON_HOST=IP_Address
CONFIG_HORIZON_SSL=y
CONFIG_SSL_CERT=
CONFIG_SSL_KEY=
CONFIG_SWIFT_PROXY_HOSTS=IP_Address
CONFIG_SWIFT_KS_PW=ae9003c2cacd424a
CONFIG_SWIFT_STORAGE_HOSTS=IP_Address
CONFIG_SWIFT_STORAGE_ZONES=1
CONFIG_SWIFT_STORAGE_REPLICAS=1
CONFIG_SWIFT_STORAGE_FSTYPE=ext4
CONFIG_REPO=
CONFIG_RH_USER=
CONFIG_RH_PW=
CONFIG_RH_BETA_REPO=n
CONFIG_SATELLITE_URL=
CONFIG_SATELLITE_USER=
CONFIG_SATELLITE_PW=
CONFIG_SATELLITE_AKEY=
CONFIG_SATELLITE_CACERT=
CONFIG_SATELLITE_PROFILE=
CONFIG_SATELLITE_FLAGS=
CONFIG_SATELLITE_PROXY=
CONFIG_SATELLITE_PROXY_USER=
CONFIG_SATELLITE_PROXY_PW=
CONFIG_NAGIOS_HOST=IP_Address
CONFIG_NAGIOS_PW=123456
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2013-1082.html