Description of problem:
The original glance image used to create an instance was deleted. Normally this is not a problem: the instance can still be booted because nova's ImageCache mechanism keeps a cached copy of the image on the compute node in /var/lib/nova/instances/_base/*. However, if that directory is lost (for example, due to a storage system failure), the cached image is gone and the original image has already been deleted from glance, so the instance can no longer boot.

Error seen when trying to boot the instance (/nova/nova-compute.log):

2016-02-18 11:21:56.770 3275 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/nova/image/api.py", line 182, in download
2016-02-18 11:21:56.770 3275 TRACE oslo_messaging.rpc.dispatcher     dst_path=dest_path)
2016-02-18 11:21:56.770 3275 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/nova/image/glance.py", line 351, in download
2016-02-18 11:21:56.770 3275 TRACE oslo_messaging.rpc.dispatcher     _reraise_translated_image_exception(image_id)
2016-02-18 11:21:56.770 3275 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/nova/image/glance.py", line 349, in download
2016-02-18 11:21:56.770 3275 TRACE oslo_messaging.rpc.dispatcher     image_chunks = self._client.call(context, 1, 'data', image_id)
2016-02-18 11:21:56.770 3275 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/nova/image/glance.py", line 218, in call
2016-02-18 11:21:56.770 3275 TRACE oslo_messaging.rpc.dispatcher     return getattr(client.images, method)(*args, **kwargs)
2016-02-18 11:21:56.770 3275 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/glanceclient/v1/images.py", line 143, in data
2016-02-18 11:21:56.770 3275 TRACE oslo_messaging.rpc.dispatcher     % urlparse.quote(str(image_id)))
2016-02-18 11:21:56.770 3275 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/glanceclient/common/http.py", line 262, in get
2016-02-18 11:21:56.770 3275 TRACE oslo_messaging.rpc.dispatcher     return self._request('GET', url, **kwargs)
2016-02-18 11:21:56.770 3275 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/glanceclient/common/http.py", line 230, in _request
2016-02-18 11:21:56.770 3275 TRACE oslo_messaging.rpc.dispatcher     raise exc.from_response(resp, resp.text)
2016-02-18 11:21:56.770 3275 TRACE oslo_messaging.rpc.dispatcher ImageNotFound: Image d7066af1-2e0c-4028-8a47-4e69f8a7a9b6 could not be found

Version-Release number of selected component (if applicable):
openstack-nova-api-2015.1.2-7.el7ost.noarch

How reproducible:
100%

Steps to Reproduce:
1. Boot an instance using a glance image.
2. Delete the glance image after the instance has booted.
3. Simulate a storage failure by deleting the cached image /var/lib/nova/instances/_base/<cache-image-uuid> (see the note below on how the cache filename is derived).
4. Reboot the instance and note that it fails to boot, with ImageNotFound errors in nova-compute.log.

Actual results:
The instance fails to boot.

Expected results:
The instance boots.

Additional info:
The following KCS article describes how to recover if you have a backup of the original glance image used to create the instance:
https://access.redhat.com/solutions/2172601
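Note on locating the cached base file: in the libvirt driver of this era, the file under _base is named after the SHA-1 hex digest of the glance image UUID (resized copies get a "_<size>" suffix), not the UUID itself. The snippet below is only a sketch of that naming convention for finding the file to delete in step 3; it is not a stable interface.

import hashlib
import os

def cached_base_path(image_id, instances_path='/var/lib/nova/instances'):
    # Assumes the cache filename is sha1(image UUID); resized copies
    # would carry an additional "_<size>" suffix.
    return os.path.join(instances_path, '_base',
                        hashlib.sha1(image_id.encode()).hexdigest())

# e.g. for the image in the traceback above:
print(cached_base_path('d7066af1-2e0c-4028-8a47-4e69f8a7a9b6'))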
// From engineering:
Reviewing the code, this looks like a race between the storage coming back online, each compute registering as a user of the shared storage, and finally an image cache update being run by each compute.

https://github.com/openstack/nova/blob/master/nova/virt/storage_users.py#L76

At present, if a given compute has not registered as a user of the instance store for 24 hours, it and any instances previously running on it are not considered by the next cache update. As a result, any images cached for instances on these hosts will be removed, since they no longer appear to be in use by any undeleted instances. Checking whether the storage was lost for 24 hours.
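For context, a simplified sketch of the register/expire behaviour described above. This is not nova's actual implementation (the real code in nova/virt/storage_users.py serialises access to the tracking file with a lock); the file name and format here are illustrative assumptions.

import json
import os
import time

TWENTY_FOUR_HOURS = 24 * 60 * 60

def register_storage_use(storage_path, hostname):
    # Each compute periodically records a heartbeat timestamp for itself
    # in a tracking file at the root of the shared instance store.
    tracking_file = os.path.join(storage_path, 'compute_nodes')
    nodes = {}
    if os.path.exists(tracking_file):
        with open(tracking_file) as f:
            nodes = json.load(f)
    nodes[hostname] = time.time()
    with open(tracking_file, 'w') as f:
        json.dump(nodes, f)

def get_storage_users(storage_path):
    # Only hosts seen within the last 24 hours count as users of the
    # store. A host (and its instances) that could not check in for
    # longer, e.g. during a >24h storage outage, drops off this list,
    # so the next cache update sees its base images as unused and
    # removes them.
    tracking_file = os.path.join(storage_path, 'compute_nodes')
    if not os.path.exists(tracking_file):
        return []
    with open(tracking_file) as f:
        nodes = json.load(f)
    cutoff = time.time() - TWENTY_FOUR_HOURS
    return [host for host, last_seen in nodes.items() if last_seen > cutoff]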
Update from customer: The storage was down for 24hrs.
(In reply to Jeremy from comment #2)
> Update from customer:
>
> The storage was down for 24hrs.

Thanks, that shows the working theory documented in c#1 is possibly valid, but I'd still like to reproduce it or confirm with logs from the customer.
This bug was accidentally moved from POST to MODIFIED via an error in automation; please see mmccune with any questions.
Closing with INSUFFICIENT_DATA: I've been unable to reproduce this, and the imagebackend is now being heavily refactored upstream. Happy to reopen if we see this again and have logs.
Closed without a fix, therefore QE won't automate.
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days