Bug 801791 - nova-compute fails to start on qemu-img error
nova-compute fails to start on qemu-img error
Status: CLOSED ERRATA
Product: Fedora
Classification: Fedora
Component: openstack-nova (Show other bugs)
17
Unspecified Linux
unspecified Severity unspecified
: ---
: ---
Assigned To: Pádraig Brady
Fedora Extras Quality Assurance
:
Depends On:
Blocks:
  Show dependency treegraph
 
Reported: 2012-03-09 09:23 EST by Derek Higgins
Modified: 2016-01-04 09:46 EST (History)
12 users (show)

See Also:
Fixed In Version: openstack-nova-2012.1-0.10.rc1.fc17
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2012-04-11 23:00:56 EDT
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)

  None (edit)
Description Derek Higgins 2012-03-09 09:23:51 EST
While running Nova tests, (I havn't been able to pinpont exactly how), nova managed to get into a state where one of its instances didn't contain a disk file in the instance directory, causing the compute server to crash during startup. To reproduce this I manually deleted /var/lib/nova/instances/instance-0000003c/disk from an instance while the compute service was running.

This exception appears in the logs, every minute after the file is deleted but compute keeps runing and I can still destroy and create instances

2012-03-09 12:44:11 ERROR nova.manager [-] Error during ComputeManager.update_available_resource: Unexpected error while running command.
Command: qemu-img info /var/lib/nova/instances/instance-0000003c/disk
Exit code: 1
Stdout: ''
Stderr: "qemu-img: Could not open '/var/lib/nova/instances/instance-0000003c/disk': No such file or directory\n"
(nova.manager): TRACE: Traceback (most recent call last):
(nova.manager): TRACE:   File "/usr/lib/python2.7/site-packages/nova/manager.py", line 155, in periodic_tasks
(nova.manager): TRACE:     task(self, context)
(nova.manager): TRACE:   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 2293, in update_available_resource
(nova.manager): TRACE:     self.driver.update_available_resource(context, self.host)
(nova.manager): TRACE:   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/connection.py", line 1786, in update_available_resource
(nova.manager): TRACE:     'disk_available_least': self.get_disk_available_least()}
(nova.manager): TRACE:   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/connection.py", line 2136, in get_disk_available_least
(nova.manager): TRACE:     disk_infos = utils.loads(self.get_instance_disk_info(i_name))
(nova.manager): TRACE:   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/connection.py", line 2095, in get_instance_disk_info
(nova.manager): TRACE:     out, err = utils.execute('qemu-img', 'info', path)
(nova.manager): TRACE:   File "/usr/lib/python2.7/site-packages/nova/utils.py", line 236, in execute
(nova.manager): TRACE:     cmd=' '.join(cmd))
(nova.manager): TRACE: ProcessExecutionError: Unexpected error while running command.
(nova.manager): TRACE: Command: qemu-img info /var/lib/nova/instances/instance-0000003c/disk
(nova.manager): TRACE: Exit code: 1
(nova.manager): TRACE: Stdout: ''
(nova.manager): TRACE: Stderr: "qemu-img: Could not open '/var/lib/nova/instances/instance-0000003c/disk': No such file or directory\n"
(nova.manager): TRACE:
2012-03-09 12:44:12 INFO nova.compute.manager [-] Updating host status




Restarting the compute service, results in the error below and the service fails to start


2012-03-09 12:44:33 AUDIT nova.service [-] Starting compute node (version 2012.1-LOCALBRANCH:LOCALREVISION)
2012-03-09 12:44:33 INFO nova.rpc.common [req-29839fbd-6c46-476c-8065-5359f5fbbf3a None None] Connected to AMQP server on localhost:5672

2012-03-09 12:44:33 INFO nova.virt.libvirt.firewall [req-29839fbd-6c46-476c-8065-5359f5fbbf3a None None] [instance: 5c3bbebd-cd0b-4945-b5a2-071a8044fce1] Called setup_basic_filtering in nwfilter
2012-03-09 12:44:33 INFO nova.virt.libvirt.firewall [req-29839fbd-6c46-476c-8065-5359f5fbbf3a None None] [instance: 5c3bbebd-cd0b-4945-b5a2-071a8044fce1] Ensuring static filters
2012-03-09 12:44:34 INFO nova.virt.firewall [req-29839fbd-6c46-476c-8065-5359f5fbbf3a None None] Using cidr '0.0.0.0/0'
2012-03-09 12:44:34 INFO nova.virt.firewall [req-29839fbd-6c46-476c-8065-5359f5fbbf3a None None] Using fw_rules: ['-m state --state INVALID -j DROP', '-m state --state ESTABLISHED,RELATED -j ACCEPT', '-j $provider', u'-s 10.0.0.1 -p udp --sport 67 --dport 68 -j ACCEPT', u'-s 10.0.0.0/24 -j ACCEPT', '-j ACCEPT -p tcp --dport 22 -s 0.0.0.0/0']

2012-03-09 12:44:40 INFO nova.virt.libvirt.firewall [req-29839fbd-6c46-476c-8065-5359f5fbbf3a None None] [instance: e98da2c2-14f8-41a5-ab64-d8f84185658e] Called setup_basic_filtering in nwfilter
2012-03-09 12:44:40 INFO nova.virt.libvirt.firewall [req-29839fbd-6c46-476c-8065-5359f5fbbf3a None None] [instance: e98da2c2-14f8-41a5-ab64-d8f84185658e] Ensuring static filters
2012-03-09 12:44:40 INFO nova.virt.firewall [req-29839fbd-6c46-476c-8065-5359f5fbbf3a None None] Using cidr '0.0.0.0/0'
2012-03-09 12:44:40 INFO nova.virt.firewall [req-29839fbd-6c46-476c-8065-5359f5fbbf3a None None] Using fw_rules: ['-m state --state INVALID -j DROP', '-m state --state ESTABLISHED,RELATED -j ACCEPT', '-j $provider', u'-s 10.0.0.1 -p udp --sport 67 --dport 68 -j ACCEPT', u'-s 10.0.0.0/24 -j ACCEPT', '-j ACCEPT -p tcp --dport 22 -s 0.0.0.0/0']

2012-03-09 12:44:45 CRITICAL nova [-] Unexpected error while running command.
Command: qemu-img info /var/lib/nova/instances/instance-0000003c/disk
Exit code: 1
Stdout: ''
Stderr: "qemu-img: Could not open '/var/lib/nova/instances/instance-0000003c/disk': No such file or directory\n"


the difference seems to be that update_available_resource is called from
"/usr/lib/python2.7/site-packages/nova/manager.py", line 155, in periodic_tasks
while compute is running where the exception is caught

but on startup its called from
"/usr/lib/python2.7/site-packages/nova/service.py", line 174, in start
where is isn't caught

should both case be handled the same?


using
openstack-nova-2012.1-0.7.e4.fc17.noarch
python-nova-2012.1-0.7.e4.fc17.noarch
Comment 1 Pádraig Brady 2012-03-09 09:41:19 EST
Nice one Derek.

Well the missing disk file could happen if the process was stopped just at that time (bug, reboot, power, ...) so nova should be more resilient in this situation.

I'll look at making that file creation transactional,
and also make the startup robust in this condition.

cheers
Comment 2 Pádraig Brady 2012-03-16 12:17:52 EDT
The trigger for this issue may have been:
https://review.openstack.org/#change,5296
Also possible are transient image creation failures addressed with:
https://review.openstack.org/#change,5442

A direct avoidance of the startup issue is proposed with:
https://review.openstack.org/5456
That logs about missing files.

Note any other exceptions in this path will
cause the service to fail. I wasn't sure whether
to change that behavior.
Comment 3 Pádraig Brady 2012-03-16 12:32:46 EDT
https://bugs.launchpad.net/nova/+bug/956719 has the same symptom,
but the cause may be different?
Comment 4 Fedora Update System 2012-03-16 19:55:04 EDT
openstack-nova-2012.1-0.8.e4.fc17 has been submitted as an update for Fedora 17.
https://admin.fedoraproject.org/updates/openstack-nova-2012.1-0.8.e4.fc17
Comment 5 Fedora Update System 2012-03-17 12:50:07 EDT
Package openstack-nova-2012.1-0.8.e4.fc17:
* should fix your issue,
* was pushed to the Fedora 17 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing openstack-nova-2012.1-0.8.e4.fc17'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2012-3964/openstack-nova-2012.1-0.8.e4.fc17
then log in and leave karma (feedback).
Comment 6 Fedora Update System 2012-03-20 12:07:38 EDT
openstack-nova-2012.1-0.8.rc1.fc17 has been submitted as an update for Fedora 17.
https://admin.fedoraproject.org/updates/openstack-nova-2012.1-0.8.rc1.fc17
Comment 7 Fedora Update System 2012-03-26 05:15:36 EDT
openstack-nova-2012.1-0.9.rc1.fc17 has been submitted as an update for Fedora 17.
https://admin.fedoraproject.org/updates/openstack-nova-2012.1-0.9.rc1.fc17
Comment 8 Fedora Update System 2012-03-29 15:01:44 EDT
openstack-nova-2012.1-0.10.rc1.fc17 has been submitted as an update for Fedora 17.
https://admin.fedoraproject.org/updates/openstack-nova-2012.1-0.10.rc1.fc17
Comment 9 Fedora Update System 2012-04-11 23:00:56 EDT
openstack-nova-2012.1-0.10.rc1.fc17 has been pushed to the Fedora 17 stable repository.  If problems still persist, please make note of it in this bug report.

Note You need to log in before you can comment on or make changes to this bug.