Description of problem:

An instance resize operation failed because of a backend storage issue, and the instance went into ERROR state. The backend storage issue is now fixed, but the instance cannot be started. The following call trace is logged when trying to start the instance.

~~~
2016-12-06 06:00:11.548 5710 TRACE oslo_messaging.rpc.dispatcher six.reraise(self.type_, self.value, self.tb)
2016-12-06 06:00:11.548 5710 TRACE oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 357, in decorated_function
2016-12-06 06:00:11.548 5710 TRACE oslo_messaging.rpc.dispatcher return function(self, context, *args, **kwargs)
2016-12-06 06:00:11.548 5710 TRACE oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 2868, in start_instance
2016-12-06 06:00:11.548 5710 TRACE oslo_messaging.rpc.dispatcher self._power_on(context, instance)
2016-12-06 06:00:11.548 5710 TRACE oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 2841, in _power_on
2016-12-06 06:00:11.548 5710 TRACE oslo_messaging.rpc.dispatcher block_device_info)
2016-12-06 06:00:11.548 5710 TRACE oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 2364, in power_on
2016-12-06 06:00:11.548 5710 TRACE oslo_messaging.rpc.dispatcher self._hard_reboot(context, instance, network_info, block_device_info)
2016-12-06 06:00:11.548 5710 TRACE oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 2241, in _hard_reboot
2016-12-06 06:00:11.548 5710 TRACE oslo_messaging.rpc.dispatcher block_device_info)
2016-12-06 06:00:11.548 5710 TRACE oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 6289, in _get_instance_disk_info
2016-12-06 06:00:11.548 5710 TRACE oslo_messaging.rpc.dispatcher dk_size = int(os.path.getsize(path))
2016-12-06 06:00:11.548 5710 TRACE oslo_messaging.rpc.dispatcher File "/usr/lib64/python2.7/genericpath.py", line 49, in getsize
2016-12-06 06:00:11.548 5710 TRACE oslo_messaging.rpc.dispatcher return os.stat(filename).st_size
2016-12-06 06:00:11.548 5710 TRACE oslo_messaging.rpc.dispatcher OSError: [Errno 2] No such file or directory: '/var/lib/nova/instances/5ca9bddc-5230-4c14-8baf-f052e06195f0/disk'
~~~

Version-Release number of selected component (if applicable):
RHEL OSP 7

How reproducible:
Every time for the customer.

Steps to Reproduce:
1. A Nova [boot from image] instance was running on compute-0 with two cinder volumes attached to it. Hitachi is the backend storage for cinder.
2. Tried to resize the instance. The instance moved from compute-0 to compute-1 during the resize operation, but due to a backend storage permission issue it ended up in ERROR state on compute-1.
3. Fixed the backend storage issue, then reset the state of the instance with the following sequence of commands:

~~~
[stack@manager ~]$ nova reset-state --active 5ca9bddc-5230-4c14-8baf-f052e06195f0
[stack@manager ~]$ nova list --all-tenants | grep '5ca9bddc-5230-4c14-8baf-f052e06195f0'
| 5ca9bddc-5230-4c14-8baf-f052e06195f0 | GESS_DOCKER0 | 56a3086cce3349dd888b89cf2bba1451 | ACTIVE | - | Shutdown | Ultimatix Dev - AppDB=xx.xx.xx.xx |
[stack@manager ~]$ nova stop 5ca9bddc-5230-4c14-8baf-f052e06195f0
Request to stop server 5ca9bddc-5230-4c14-8baf-f052e06195f0 has been accepted.
[stack@manager ~]$ nova list --all-tenants | grep '5ca9bddc-5230-4c14-8baf-f052e06195f0'
| 5ca9bddc-5230-4c14-8baf-f052e06195f0 | GESS_DOCKER0 | 56a3086cce3349dd888b89cf2bba1451 | SHUTOFF | - | Shutdown | Ultimatix Dev - AppDB=xx.xx.xx.xx |
[stack@manager ~]$ nova start 5ca9bddc-5230-4c14-8baf-f052e06195f0
Request to start server 5ca9bddc-5230-4c14-8baf-f052e06195f0 has been accepted.
[stack@manager ~]$ nova list --all-tenants | grep '5ca9bddc-5230-4c14-8baf-f052e06195f0'
| 5ca9bddc-5230-4c14-8baf-f052e06195f0 | GESS_DOCKER0 | 56a3086cce3349dd888b89cf2bba1451 | SHUTOFF | powering-on | Shutdown | Ultimatix Dev - AppDB=xx.xx.xx.xx |
[stack@manager ~]$ nova list --all-tenants | grep '5ca9bddc-5230-4c14-8baf-f052e06195f0'
| 5ca9bddc-5230-4c14-8baf-f052e06195f0 | GESS_DOCKER0 | 56a3086cce3349dd888b89cf2bba1451 | SHUTOFF | - | Shutdown | Ultimatix Dev - AppDB=xx.xx.xx.xx |
~~~

Actual results:
The instance does not start.

Expected results:
We should be able to start the instance.

Additional info:

On the source compute node:

~~~
[root@overcloud-compute-0 instances]# cd 5ca9bddc-5230-4c14-8baf-f052e06195f0/
[root@overcloud-compute-0 5ca9bddc-5230-4c14-8baf-f052e06195f0]# ll
total 26844944
-rw-r--r--. 1 nova nova 27489337344 Dec 3 06:04 disk
-rw-r--r--. 1 nova nova 79 Dec 3 06:04 disk.info
~~~

On the destination compute node:

The disk is not present in "/var/lib/nova/instances/5ca9bddc-5230-4c14-8baf-f052e06195f0", so Nova cannot locate it when starting the instance and logs the call trace shown above.

~~~
[root@overcloud-compute-1 ~]# cd /var/lib/nova/instances/5ca9bddc-5230-4c14-8baf-f052e06195f0
[root@overcloud-compute-1 5ca9bddc-5230-4c14-8baf-f052e06195f0]# ll
total 8
-rw-r--r--. 1 nova nova 79 Dec 6 05:57 disk.info
-rw-r--r--. 1 nova nova 3176 Dec 6 06:00 libvirt.xml
[root@overcloud-compute-1 5ca9bddc-5230-4c14-8baf-f052e06195f0]# cd /var/lib/nova/instances/5ca9bddc-5230-4c14-8baf-f052e06195f0_resize/
[root@overcloud-compute-1 5ca9bddc-5230-4c14-8baf-f052e06195f0_resize]# ll
total 25939124
-rw-rw----. 1 root root 68071 Dec 3 05:46 console.log
-rw-r--r--. 1 root root 26561216512 Dec 3 05:46 disk
-rw-r--r--. 1 nova nova 79 Jul 12 06:15 disk.info
-rw-r--r--. 1 nova nova 3188 Nov 18 10:29 libvirt.xml
~~~

AFAIK, the instance should be in the confirmResize state so that we can confirm the resize. The directory "5ca9bddc-5230-4c14-8baf-f052e06195f0_resize" should then be removed automatically, and the disk should appear in the "/var/lib/nova/instances/5ca9bddc-5230-4c14-8baf-f052e06195f0" directory.
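
The manual `ll` checks in the description can be repeated on any compute node with a small script. This is only a minimal sketch: the function names `check_disk` and `check_instance_disks` are made up for illustration, and the script assumes the default instances path seen in this report. It reports whether the `disk` file that `os.path.getsize()` failed on exists in the instance directory and in its `_resize` sibling.

```shell
#!/bin/sh
# Sketch: report where the instance's root disk file actually lives.
# INSTANCE_ID and INSTANCES_DIR default to the values from this report;
# override them through the environment for other instances.
INSTANCE_ID="${INSTANCE_ID:-5ca9bddc-5230-4c14-8baf-f052e06195f0}"
INSTANCES_DIR="${INSTANCES_DIR:-/var/lib/nova/instances}"

# check_disk DIR: print whether DIR/disk exists as a regular file.
check_disk() {
    if [ -f "$1/disk" ]; then
        echo "present: $1/disk"
    else
        echo "MISSING: $1/disk"
    fi
}

# Check both the live instance directory and the _resize directory
# that the resize operation leaves behind.
check_instance_disks() {
    check_disk "$INSTANCES_DIR/$INSTANCE_ID"
    check_disk "$INSTANCES_DIR/${INSTANCE_ID}_resize"
}

check_instance_disks
```

On compute-1 in the state described above, this would be expected to flag the disk as missing in the instance directory and present in the `_resize` directory.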
Just a comment: I think there is some confusion between source and destination. Because resize renames the instance path only on the source node, we can assume that the initial source is compute-1 and the initial destination is compute-0. This is confirmed by http://pastebin.test.redhat.com/436557, which shows Nova still considering the instance to be on the source compute, because the exception occurred before its state was changed.

Consequently, I think the resize has not made overly invasive changes, and we can probably try to resurrect the instance on the source host. For that, I would suggest the following steps:

#1 Back up /var/lib/nova/instances/5ca9bddc-5230-4c14-8baf-f052e06195f0_resize/ (as it's a pet instance, I want to make sure the data is kept somewhere).
#2 On compute-1, rename /var/lib/nova/instances/5ca9bddc-5230-4c14-8baf-f052e06195f0 to something else (worth keeping the files for a possible revert), then rename /var/lib/nova/instances/5ca9bddc-5230-4c14-8baf-f052e06195f0_resize/ to /var/lib/nova/instances/5ca9bddc-5230-4c14-8baf-f052e06195f0/
#3 nova reset-state 5ca9bddc-5230-4c14-8baf-f052e06195f0
#4 nova reboot 5ca9bddc-5230-4c14-8baf-f052e06195f0
Oops, I forgot to mention that the reset-state command has to use the --active flag.
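
For reference, the four suggested steps, including the --active flag, could be scripted roughly as follows. This is a hedged sketch, not a tested procedure: `BACKUP_DIR`, the `.failed-resize` suffix, the `DRY_RUN` switch and the function names are all hypothetical choices made for illustration. The directory steps are meant to run as root on compute-1, and the nova commands from a host where the nova client credentials are loaded (the stack user on the manager node in this report). By default the script only prints the commands it would run.

```shell
#!/bin/sh
# Sketch of recovery steps #1-#4 from the comment above.
# Hypothetical names: BACKUP_DIR, the ".failed-resize" suffix, DRY_RUN.
INSTANCE_ID="${INSTANCE_ID:-5ca9bddc-5230-4c14-8baf-f052e06195f0}"
INSTANCES_DIR="${INSTANCES_DIR:-/var/lib/nova/instances}"
BACKUP_DIR="${BACKUP_DIR:-/root/instance-backup}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print commands, 0 = execute them

# run CMD...: print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Steps #1 and #2: run as root on the compute node holding the _resize directory.
# The && chain stops at the first failure, so nothing is renamed
# unless the backup copy succeeded.
fix_instance_dirs() {
    inst="$INSTANCES_DIR/$INSTANCE_ID"
    run mkdir -p "$BACKUP_DIR" &&
    run cp -a "${inst}_resize" "$BACKUP_DIR/" &&
    run mv "$inst" "${inst}.failed-resize" &&
    run mv "${inst}_resize" "$inst"
}

# Steps #3 and #4: run where the nova client credentials are available.
reset_and_reboot() {
    run nova reset-state --active "$INSTANCE_ID" &&
    run nova reboot "$INSTANCE_ID"
}

fix_instance_dirs
reset_and_reboot
```

Setting `DRY_RUN=0` executes the commands instead of printing them. Reviewing the dry-run output first is worthwhile, since the renames affect the only remaining copy of the instance disk on that node.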
WONTFIX/NOTABUG therefore QE Won't automate