1311762 – Unable to resume a suspended instance

Summary: Unable to resume a suspended instance

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat OpenStack
Classification:	Red Hat
Component:	openstack-nova
Sub Component:
Version:	7.0 (Kilo)
Hardware:	x86_64
OS:	Linux
Priority:	high
Severity:	high
Target Milestone:	async
Target Release:	7.0 (Kilo)
Assignee:	Vladik Romanovsky
QA Contact:	nlevinki
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2016-02-24 22:43 UTC by nalmond
Modified:	2019-10-10 11:19 UTC
CC List:	14 users
Fixed In Version:	openstack-nova-2015.1.3-4.el7ost
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2016-03-24 13:55:13 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2016:0507	0	normal	SHIPPED_LIVE	openstack-nova bug fix advisory	2016-03-24 17:52:57 UTC

Description nalmond 2016-02-24 22:43:56 UTC

Description of problem:
Instance enters error state when resuming from suspend. The following error is seen in the nova-compute log:

libvirtError: Cannot access backing file '/var/lib/nova/instances/_base/swap_1024' of storage file '/var/lib/nova/instances/813878c9-47c0-4430-9640-bddda7fe5b10/disk.swap' (as uid:107, gid:107): No such file or directory

Confirmed that the file does not exist when this occurs. A partial workaround is to cycle the instance with nova stop/start before suspending it. The same problem also affects nova migrate in a different environment.
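
To check quickly whether a logged error is this failure, the message can be parsed and the backing file path tested on the compute node. The following is a minimal Python sketch, not part of nova. The function names and the regular expression are illustrative assumptions based only on the message text quoted above.

```python
import os
import re

# Pattern modeled on the libvirtError text quoted in this report.
# It is an assumption that other occurrences use the same wording.
_BACKING_RE = re.compile(
    r"Cannot access backing file '(?P<backing>[^']+)' "
    r"of storage file '(?P<disk>[^']+)'"
)


def parse_backing_error(message):
    """Return (backing_file, storage_file) from the message, or None."""
    match = _BACKING_RE.search(message)
    if match is None:
        return None
    return match.group("backing"), match.group("disk")


def missing_backing_file(message):
    """Return the backing file path if it is absent on this host.

    Returns None when the message does not match or the file exists.
    """
    parsed = parse_backing_error(message)
    if parsed is None:
        return None
    backing, _disk = parsed
    return None if os.path.exists(backing) else backing
```

Run against the relevant nova-compute log line, missing_backing_file() returns the path under _base that would have to exist before the instance can resume.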

How reproducible:
Every time an instance has been up for a while and is suspended or migrated.

Steps to Reproduce:
1. nova suspend <uuid>
2. nova resume <uuid>

Actual results:
instance enters error state

Expected results:
instance resumes successfully

Comment 3 David Hill 2016-02-29 17:13:39 UTC

Recreating the file manually and rebooting the compute node is a workaround. Would it be possible to get a hotfix for this? This is a severe bug from an operational point of view.

Comment 4 Vladik Romanovsky 2016-02-29 21:17:08 UTC

(In reply to David Hill from comment #3)
> Recreating the file manually and rebooting the compute node is a workaround.
> Would it be possible to get a hotfix for this? This is a severe bug from an
> operational point of view.

I've submitted a patch for review: https://code.engineering.redhat.com/gerrit/#/c/68594/

Comment 7 Vladik Romanovsky 2016-03-14 15:47:41 UTC

Hi David,

Were they resuming an instance that was created after the fix was applied or is it an instance that was created prior to the hot fix?

The patch fixes the block device mapping of an instance, which previously wasn't tracking the ephemeral and swap disks.
Instances that were created before the fix was applied will still have the old block device mapping.

Thanks,
Vladik
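
To illustrate the distinction drawn in the comment above, the sketch below reports which flavor-defined local disks have no entry in an instance's block device mapping. It is a hypothetical helper, not nova code. The dictionary keys (source_type, destination_type, guest_format, swap, ephemeral_gb) are assumptions modeled on Nova's block device mapping and flavor fields, and were not taken from this report or from the patch.

```python
def _is_local_blank(bdm):
    # Assumption: swap and ephemeral disks are "blank" sources stored locally.
    return (bdm.get("source_type") == "blank"
            and bdm.get("destination_type") == "local")


def is_swap(bdm):
    """True if the mapping entry describes a swap disk."""
    return _is_local_blank(bdm) and bdm.get("guest_format") == "swap"


def is_ephemeral(bdm):
    """True if the mapping entry describes an ephemeral disk."""
    return _is_local_blank(bdm) and bdm.get("guest_format") != "swap"


def untracked_disks(flavor, bdms):
    """List the disk kinds the flavor defines but the mapping lacks."""
    missing = []
    if flavor.get("ephemeral_gb", 0) > 0 and not any(is_ephemeral(b) for b in bdms):
        missing.append("ephemeral")
    if flavor.get("swap", 0) > 0 and not any(is_swap(b) for b in bdms):
        missing.append("swap")
    return missing
```

Under these assumptions, an instance created before the fix with a flavor that defines swap would yield ["swap"], while one created afterwards would yield an empty list.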

Comment 15 nlevinki 2016-03-22 10:42:20 UTC

rpm is in
openstack-nova-common-2015.1.3-7.el7ost.noarch

automation passed
https://rhos-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/view/RHOS/view/RHOS7/job/rhos-jenkins-rhos-7.0-puddle-rhel-7.2-3networkers-packstack-neutron-ml2-vxlan-rabbitmq-tempest-git-all/34/

Comment 17 errata-xmlrpc 2016-03-24 13:55:13 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-0507.html
