Bug 1401796

Summary:	Instance resize operation failed; now unable to bring up the instance
Product:	Red Hat OpenStack	Reporter:	VIKRANT <vaggarwa>
Component:	openstack-nova	Assignee:	Eoghan Glynn <eglynn>
Status:	CLOSED NOTABUG	QA Contact:	Prasanth Anbalagan <panbalag>
Severity:	urgent	Docs Contact:
Priority:	high
Version:	7.0 (Kilo)	CC:	berrange, dasmith, eglynn, kchamart, sbauza, sferdjao, sgordon, srevivo, vromanso
Target Milestone:	async
Target Release:	---
Hardware:	x86_64
OS:	Linux
Last Closed:	2017-02-09 21:07:05 UTC	Type:	Bug

Description VIKRANT 2016-12-06 06:30:08 UTC

Description of problem:

The instance resize operation failed because of a backend storage issue, and the instance went into ERROR state. The backend storage issue is now fixed, but the instance cannot be started.

The following call trace is logged when trying to start the instance:

~~~
2016-12-06 06:00:11.548 5710 TRACE oslo_messaging.rpc.dispatcher     six.reraise(self.type_, self.value, self.tb)
2016-12-06 06:00:11.548 5710 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 357, in decorated_function
2016-12-06 06:00:11.548 5710 TRACE oslo_messaging.rpc.dispatcher     return function(self, context, *args, **kwargs)
2016-12-06 06:00:11.548 5710 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 2868, in start_instance
2016-12-06 06:00:11.548 5710 TRACE oslo_messaging.rpc.dispatcher     self._power_on(context, instance)
2016-12-06 06:00:11.548 5710 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 2841, in _power_on
2016-12-06 06:00:11.548 5710 TRACE oslo_messaging.rpc.dispatcher     block_device_info)
2016-12-06 06:00:11.548 5710 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 2364, in power_on
2016-12-06 06:00:11.548 5710 TRACE oslo_messaging.rpc.dispatcher     self._hard_reboot(context, instance, network_info, block_device_info)
2016-12-06 06:00:11.548 5710 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 2241, in _hard_reboot
2016-12-06 06:00:11.548 5710 TRACE oslo_messaging.rpc.dispatcher     block_device_info)
2016-12-06 06:00:11.548 5710 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 6289, in _get_instance_disk_info
2016-12-06 06:00:11.548 5710 TRACE oslo_messaging.rpc.dispatcher     dk_size = int(os.path.getsize(path))
2016-12-06 06:00:11.548 5710 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib64/python2.7/genericpath.py", line 49, in getsize
2016-12-06 06:00:11.548 5710 TRACE oslo_messaging.rpc.dispatcher     return os.stat(filename).st_size
2016-12-06 06:00:11.548 5710 TRACE oslo_messaging.rpc.dispatcher OSError: [Errno 2] No such file or directory: '/var/lib/nova/instances/5ca9bddc-5230-4c14-8baf-f052e06195f0/disk'
~~~
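
The trace ends in `os.path.getsize()`, which calls `os.stat()` and raises `OSError` with errno 2 when the file is absent. A minimal sketch of that failure mode with a guarded lookup is below. The helper name `disk_size_or_none` is hypothetical and is not part of Nova; it only illustrates the error path.

```python
import errno
import os


def disk_size_or_none(path):
    """Return the size of `path` in bytes, or None if the file is missing.

    Hypothetical helper for illustration only; this is not Nova code.
    """
    try:
        return int(os.path.getsize(path))
    except OSError as exc:
        # errno.ENOENT is "No such file or directory" (Errno 2), the same
        # error that appears at the end of the trace.
        if exc.errno == errno.ENOENT:
            return None
        # Any other OS error (permissions, I/O) is re-raised unchanged.
        raise
```

Calling this against the instance's `disk` path on the affected compute node would be one way to confirm whether the file is really missing before retrying the start.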



Version-Release number of selected component (if applicable):
RHEL OSP 7

How reproducible:
Every time for the customer.

Steps to Reproduce:

1. A Nova instance (booted from image) was running on compute-0 with two Cinder volumes attached to it. Hitachi is the backend storage for Cinder.

2. Tried to resize the instance. The instance moved from compute-0 to compute-1 during the resize operation, but due to a backend storage permission issue it ended up in ERROR state on compute-1.

3. Fixed the backend storage issue, then reset the state of the instance using the following sequence of commands:

~~~
[stack@manager ~]$ nova reset-state --active 5ca9bddc-5230-4c14-8baf-f052e06195f0
[stack@manager ~]$ nova list --all-tenants | grep '5ca9bddc-5230-4c14-8baf-f052e06195f0'
| 5ca9bddc-5230-4c14-8baf-f052e06195f0 | GESS_DOCKER0          | 56a3086cce3349dd888b89cf2bba1451 | ACTIVE  | -          | Shutdown    | Ultimatix Dev - AppDB=xx.xx.xx.xx |
[stack@manager ~]$ nova stop 5ca9bddc-5230-4c14-8baf-f052e06195f0
Request to stop server 5ca9bddc-5230-4c14-8baf-f052e06195f0 has been accepted.
[stack@manager ~]$ nova list --all-tenants | grep '5ca9bddc-5230-4c14-8baf-f052e06195f0'
| 5ca9bddc-5230-4c14-8baf-f052e06195f0 | GESS_DOCKER0          | 56a3086cce3349dd888b89cf2bba1451 | SHUTOFF | -          | Shutdown    | Ultimatix Dev - AppDB=xx.xx.xx.xx |
[stack@manager ~]$ nova start 5ca9bddc-5230-4c14-8baf-f052e06195f0
Request to start server 5ca9bddc-5230-4c14-8baf-f052e06195f0 has been accepted.
[stack@manager ~]$ nova list --all-tenants | grep '5ca9bddc-5230-4c14-8baf-f052e06195f0'
| 5ca9bddc-5230-4c14-8baf-f052e06195f0 | GESS_DOCKER0          | 56a3086cce3349dd888b89cf2bba1451 | SHUTOFF | powering-on | Shutdown    | Ultimatix Dev - AppDB=xx.xx.xx.xx |

[stack@manager ~]$ nova list --all-tenants | grep '5ca9bddc-5230-4c14-8baf-f052e06195f0'
| 5ca9bddc-5230-4c14-8baf-f052e06195f0 | GESS_DOCKER0          | 56a3086cce3349dd888b89cf2bba1451 | SHUTOFF | -          | Shutdown    | Ultimatix Dev - AppDB=xx.xx.xx.xx |
~~~
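
The rows above are easier to compare once the status, task state, and power state columns are pulled out. The sketch below splits one `nova list --all-tenants` row on `|`; the function name and the dictionary keys are assumptions for illustration, and the column order is taken from the rows shown in this report.

```python
def parse_nova_list_row(line):
    """Split one table row of `nova list --all-tenants` into named fields.

    Assumes the column order seen in this report:
    ID, Name, Tenant ID, Status, Task State, Power State, Networks.
    """
    # Drop the outer pipes, then split the remaining cells and trim padding.
    cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
    keys = ["id", "name", "tenant_id", "status",
            "task_state", "power_state", "networks"]
    return dict(zip(keys, cells))
```

Applied to the first row, this would show status ACTIVE alongside power state Shutdown, which is the mismatch left behind by `nova reset-state --active`.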



Actual results:
The instance does not start.

Expected results:
We should be able to start the instance. 

Additional info:


On source compute node : 

~~~
[root@overcloud-compute-0 instances]# cd 5ca9bddc-5230-4c14-8baf-f052e06195f0/
[root@overcloud-compute-0 5ca9bddc-5230-4c14-8baf-f052e06195f0]# ll
total 26844944
-rw-r--r--. 1 nova nova 27489337344 Dec  3 06:04 disk
-rw-r--r--. 1 nova nova          79 Dec  3 06:04 disk.info
~~~

On destination compute node : 

The disk is not present in "/var/lib/nova/instances/5ca9bddc-5230-4c14-8baf-f052e06195f0", so when the instance is started Nova cannot locate the disk and logs the call trace shown above.

~~~
[root@overcloud-compute-1 ~]# cd /var/lib/nova/instances/5ca9bddc-5230-4c14-8baf-f052e06195f0
[root@overcloud-compute-1 5ca9bddc-5230-4c14-8baf-f052e06195f0]# ll
total 8
-rw-r--r--. 1 nova nova   79 Dec  6 05:57 disk.info
-rw-r--r--. 1 nova nova 3176 Dec  6 06:00 libvirt.xml

[root@overcloud-compute-1 5ca9bddc-5230-4c14-8baf-f052e06195f0]# cd /var/lib/nova/instances/5ca9bddc-5230-4c14-8baf-f052e06195f0_resize/
[root@overcloud-compute-1 5ca9bddc-5230-4c14-8baf-f052e06195f0_resize]# ll
total 25939124
-rw-rw----. 1 root root       68071 Dec  3 05:46 console.log
-rw-r--r--. 1 root root 26561216512 Dec  3 05:46 disk
-rw-r--r--. 1 nova nova          79 Jul 12 06:15 disk.info
-rw-r--r--. 1 nova nova        3188 Nov 18 10:29 libvirt.xml
~~~
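
The same check can be done mechanically. The sketch below looks for a `disk` file in the instance directory and in its `_resize` sibling and reports which of them has one. The function name and return shape are assumptions made for illustration, not anything Nova provides.

```python
import os


def locate_instance_disk(instances_root, instance_uuid):
    """Return the instance directories that contain a 'disk' file.

    Checks both <root>/<uuid> and <root>/<uuid>_resize, in that order.
    """
    candidates = [
        os.path.join(instances_root, instance_uuid),
        os.path.join(instances_root, instance_uuid + "_resize"),
    ]
    # Keep only the directories where a regular file named 'disk' exists.
    return [d for d in candidates
            if os.path.isfile(os.path.join(d, "disk"))]
```

Given the compute-1 listing above, a call with `/var/lib/nova/instances` and the instance UUID would be expected to return only the `_resize` directory.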


As I understand it, the instance should be in the confirmResize state so that we can confirm the resize. The directory "5ca9bddc-5230-4c14-8baf-f052e06195f0_resize" should then be removed automatically, and the disk should appear in the "/var/lib/nova/instances/5ca9bddc-5230-4c14-8baf-f052e06195f0" directory.

Comment 6 Sylvain Bauza 2016-12-06 10:31:30 UTC

Just a comment: I think there is some confusion between source and destination.
As resize renames the instance path only on the source node, we can assume that the initial source is compute-1 and the initial destination is compute-0.

That is confirmed by http://pastebin.test.redhat.com/436557, which shows that Nova still considers the instance to be on the source compute, as the exception occurred before its state was changed.

Consequently, I think that the resize has not made very invasive changes, and that we can probably try to resurrect the instance on the source host. For that, I would suggest the following steps:

#1 back up /var/lib/nova/instances/5ca9bddc-5230-4c14-8baf-f052e06195f0_resize/ (as it's a pet instance, I want to make sure we keep a copy of the data)

#2 on compute-1, rename /var/lib/nova/instances/5ca9bddc-5230-4c14-8baf-f052e06195f0 to something else (worth keeping the files for a possible revert) and rename /var/lib/nova/instances/5ca9bddc-5230-4c14-8baf-f052e06195f0_resize/ to /var/lib/nova/instances/5ca9bddc-5230-4c14-8baf-f052e06195f0/

#3 nova reset-state 5ca9bddc-5230-4c14-8baf-f052e06195f0

#4 nova reboot 5ca9bddc-5230-4c14-8baf-f052e06195f0

Comment 7 Sylvain Bauza 2016-12-06 10:32:59 UTC

Oops, I forgot to mention that the reset-state command has to use the --active flag.
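
For reference, the four steps with the --active correction can be written out as a command plan. The sketch below only builds the command strings and executes nothing. The backup location `/root/instance-backup` and the `_failed` suffix are placeholders I made up for illustration; pick whatever fits the environment.

```python
import os


def plan_recovery(instance_uuid,
                  instances_root="/var/lib/nova/instances",
                  backup_root="/root/instance-backup"):
    """Return the suggested recovery steps as shell command strings.

    Nothing is executed. `backup_root` and the '_failed' suffix are
    placeholder choices, not values taken from this bug.
    """
    inst = os.path.join(instances_root, instance_uuid)
    resize = inst + "_resize"
    backup = os.path.join(backup_root, instance_uuid + "_resize")
    return [
        # 1: back up the _resize directory, which still holds the disk.
        "cp -a {0} {1}".format(resize, backup),
        # 2: move the incomplete directory aside, then restore the original.
        "mv {0} {0}_failed".format(inst),
        "mv {0} {1}".format(resize, inst),
        # 3: reset the instance state (needs the --active flag).
        "nova reset-state --active {0}".format(instance_uuid),
        # 4: reboot the instance.
        "nova reboot {0}".format(instance_uuid),
    ]
```

Printing the returned list gives a checklist to review before running anything on compute-1.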

Comment 11 awaugama 2017-08-30 17:55:14 UTC

WONTFIX/NOTABUG, therefore QE won't automate.