While running Nova tests, (I havn't been able to pinpont exactly how), nova managed to get into a state where one of its instances didn't contain a disk file in the instance directory, causing the compute server to crash during startup. To reproduce this I manually deleted /var/lib/nova/instances/instance-0000003c/disk from an instance while the compute service was running. This exception appears in the logs, every minute after the file is deleted but compute keeps runing and I can still destroy and create instances 2012-03-09 12:44:11 ERROR nova.manager [-] Error during ComputeManager.update_available_resource: Unexpected error while running command. Command: qemu-img info /var/lib/nova/instances/instance-0000003c/disk Exit code: 1 Stdout: '' Stderr: "qemu-img: Could not open '/var/lib/nova/instances/instance-0000003c/disk': No such file or directory\n" (nova.manager): TRACE: Traceback (most recent call last): (nova.manager): TRACE: File "/usr/lib/python2.7/site-packages/nova/manager.py", line 155, in periodic_tasks (nova.manager): TRACE: task(self, context) (nova.manager): TRACE: File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 2293, in update_available_resource (nova.manager): TRACE: self.driver.update_available_resource(context, self.host) (nova.manager): TRACE: File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/connection.py", line 1786, in update_available_resource (nova.manager): TRACE: 'disk_available_least': self.get_disk_available_least()} (nova.manager): TRACE: File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/connection.py", line 2136, in get_disk_available_least (nova.manager): TRACE: disk_infos = utils.loads(self.get_instance_disk_info(i_name)) (nova.manager): TRACE: File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/connection.py", line 2095, in get_instance_disk_info (nova.manager): TRACE: out, err = utils.execute('qemu-img', 'info', path) (nova.manager): TRACE: File "/usr/lib/python2.7/site-packages/nova/utils.py", line 236, in execute (nova.manager): TRACE: cmd=' '.join(cmd)) (nova.manager): TRACE: ProcessExecutionError: Unexpected error while running command. (nova.manager): TRACE: Command: qemu-img info /var/lib/nova/instances/instance-0000003c/disk (nova.manager): TRACE: Exit code: 1 (nova.manager): TRACE: Stdout: '' (nova.manager): TRACE: Stderr: "qemu-img: Could not open '/var/lib/nova/instances/instance-0000003c/disk': No such file or directory\n" (nova.manager): TRACE: 2012-03-09 12:44:12 INFO nova.compute.manager [-] Updating host status Restarting the compute service, results in the error below and the service fails to start 2012-03-09 12:44:33 AUDIT nova.service [-] Starting compute node (version 2012.1-LOCALBRANCH:LOCALREVISION) 2012-03-09 12:44:33 INFO nova.rpc.common [req-29839fbd-6c46-476c-8065-5359f5fbbf3a None None] Connected to AMQP server on localhost:5672 2012-03-09 12:44:33 INFO nova.virt.libvirt.firewall [req-29839fbd-6c46-476c-8065-5359f5fbbf3a None None] [instance: 5c3bbebd-cd0b-4945-b5a2-071a8044fce1] Called setup_basic_filtering in nwfilter 2012-03-09 12:44:33 INFO nova.virt.libvirt.firewall [req-29839fbd-6c46-476c-8065-5359f5fbbf3a None None] [instance: 5c3bbebd-cd0b-4945-b5a2-071a8044fce1] Ensuring static filters 2012-03-09 12:44:34 INFO nova.virt.firewall [req-29839fbd-6c46-476c-8065-5359f5fbbf3a None None] Using cidr '0.0.0.0/0' 2012-03-09 12:44:34 INFO nova.virt.firewall [req-29839fbd-6c46-476c-8065-5359f5fbbf3a None None] Using fw_rules: ['-m state --state INVALID -j DROP', '-m state --state ESTABLISHED,RELATED -j ACCEPT', '-j $provider', u'-s 10.0.0.1 -p udp --sport 67 --dport 68 -j ACCEPT', u'-s 10.0.0.0/24 -j ACCEPT', '-j ACCEPT -p tcp --dport 22 -s 0.0.0.0/0'] 2012-03-09 12:44:40 INFO nova.virt.libvirt.firewall [req-29839fbd-6c46-476c-8065-5359f5fbbf3a None None] [instance: e98da2c2-14f8-41a5-ab64-d8f84185658e] Called setup_basic_filtering in nwfilter 2012-03-09 12:44:40 INFO nova.virt.libvirt.firewall [req-29839fbd-6c46-476c-8065-5359f5fbbf3a None None] [instance: e98da2c2-14f8-41a5-ab64-d8f84185658e] Ensuring static filters 2012-03-09 12:44:40 INFO nova.virt.firewall [req-29839fbd-6c46-476c-8065-5359f5fbbf3a None None] Using cidr '0.0.0.0/0' 2012-03-09 12:44:40 INFO nova.virt.firewall [req-29839fbd-6c46-476c-8065-5359f5fbbf3a None None] Using fw_rules: ['-m state --state INVALID -j DROP', '-m state --state ESTABLISHED,RELATED -j ACCEPT', '-j $provider', u'-s 10.0.0.1 -p udp --sport 67 --dport 68 -j ACCEPT', u'-s 10.0.0.0/24 -j ACCEPT', '-j ACCEPT -p tcp --dport 22 -s 0.0.0.0/0'] 2012-03-09 12:44:45 CRITICAL nova [-] Unexpected error while running command. Command: qemu-img info /var/lib/nova/instances/instance-0000003c/disk Exit code: 1 Stdout: '' Stderr: "qemu-img: Could not open '/var/lib/nova/instances/instance-0000003c/disk': No such file or directory\n" the difference seems to be that update_available_resource is called from "/usr/lib/python2.7/site-packages/nova/manager.py", line 155, in periodic_tasks while compute is running where the exception is caught but on startup its called from "/usr/lib/python2.7/site-packages/nova/service.py", line 174, in start where is isn't caught should both case be handled the same? using openstack-nova-2012.1-0.7.e4.fc17.noarch python-nova-2012.1-0.7.e4.fc17.noarch
Nice one Derek. Well the missing disk file could happen if the process was stopped just at that time (bug, reboot, power, ...) so nova should be more resilient in this situation. I'll look at making that file creation transactional, and also make the startup robust in this condition. cheers
The trigger for this issue may have been: https://review.openstack.org/#change,5296 Also possible are transient image creation failures addressed with: https://review.openstack.org/#change,5442 A direct avoidance of the startup issue is proposed with: https://review.openstack.org/5456 That logs about missing files. Note any other exceptions in this path will cause the service to fail. I wasn't sure whether to change that behavior.
https://bugs.launchpad.net/nova/+bug/956719 has the same symptom, but the cause may be different?
openstack-nova-2012.1-0.8.e4.fc17 has been submitted as an update for Fedora 17. https://admin.fedoraproject.org/updates/openstack-nova-2012.1-0.8.e4.fc17
Package openstack-nova-2012.1-0.8.e4.fc17: * should fix your issue, * was pushed to the Fedora 17 testing repository, * should be available at your local mirror within two days. Update it with: # su -c 'yum update --enablerepo=updates-testing openstack-nova-2012.1-0.8.e4.fc17' as soon as you are able to. Please go to the following url: https://admin.fedoraproject.org/updates/FEDORA-2012-3964/openstack-nova-2012.1-0.8.e4.fc17 then log in and leave karma (feedback).
openstack-nova-2012.1-0.8.rc1.fc17 has been submitted as an update for Fedora 17. https://admin.fedoraproject.org/updates/openstack-nova-2012.1-0.8.rc1.fc17
openstack-nova-2012.1-0.9.rc1.fc17 has been submitted as an update for Fedora 17. https://admin.fedoraproject.org/updates/openstack-nova-2012.1-0.9.rc1.fc17
openstack-nova-2012.1-0.10.rc1.fc17 has been submitted as an update for Fedora 17. https://admin.fedoraproject.org/updates/openstack-nova-2012.1-0.10.rc1.fc17
openstack-nova-2012.1-0.10.rc1.fc17 has been pushed to the Fedora 17 stable repository. If problems still persist, please make note of it in this bug report.