I am trying to perform "ping-pong" host evacuation: ungracefully take down host A, evacuate host A to host B, then ungracefully take down host B and evacuate back to host A. The second evacuation fails with the following error:

[Errno 13] Permission denied: '/var/lib/nova/instances/e17a325c-f0da-4f2f-aad3-4b1c098f295f/console.log'

Traceback:

2016-10-17 11:36:56.023 3433 ERROR nova.compute.manager [instance: a7ba9743-b425-4e47-aeb0-d545e66fffe9] Traceback (most recent call last):
2016-10-17 11:36:56.023 3433 ERROR nova.compute.manager [instance: a7ba9743-b425-4e47-aeb0-d545e66fffe9]   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 6566, in _error_out_instance_on_exception
2016-10-17 11:36:56.023 3433 ERROR nova.compute.manager [instance: a7ba9743-b425-4e47-aeb0-d545e66fffe9]     yield
2016-10-17 11:36:56.023 3433 ERROR nova.compute.manager [instance: a7ba9743-b425-4e47-aeb0-d545e66fffe9]   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 2687, in rebuild_instance
2016-10-17 11:36:56.023 3433 ERROR nova.compute.manager [instance: a7ba9743-b425-4e47-aeb0-d545e66fffe9]     bdms, recreate, on_shared_storage, preserve_ephemeral)
2016-10-17 11:36:56.023 3433 ERROR nova.compute.manager [instance: a7ba9743-b425-4e47-aeb0-d545e66fffe9]   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 2731, in _do_rebuild_instance_with_claim
2016-10-17 11:36:56.023 3433 ERROR nova.compute.manager [instance: a7ba9743-b425-4e47-aeb0-d545e66fffe9]     self._do_rebuild_instance(*args, **kwargs)
2016-10-17 11:36:56.023 3433 ERROR nova.compute.manager [instance: a7ba9743-b425-4e47-aeb0-d545e66fffe9]   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 2846, in _do_rebuild_instance
2016-10-17 11:36:56.023 3433 ERROR nova.compute.manager [instance: a7ba9743-b425-4e47-aeb0-d545e66fffe9]     self._rebuild_default_impl(**kwargs)
2016-10-17 11:36:56.023 3433 ERROR nova.compute.manager [instance: a7ba9743-b425-4e47-aeb0-d545e66fffe9]   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 2612, in _rebuild_default_impl
2016-10-17 11:36:56.023 3433 ERROR nova.compute.manager [instance: a7ba9743-b425-4e47-aeb0-d545e66fffe9]     block_device_info=new_block_device_info)
2016-10-17 11:36:56.023 3433 ERROR nova.compute.manager [instance: a7ba9743-b425-4e47-aeb0-d545e66fffe9]   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 2574, in spawn
2016-10-17 11:36:56.023 3433 ERROR nova.compute.manager [instance: a7ba9743-b425-4e47-aeb0-d545e66fffe9]     self._ensure_console_log_for_instance(instance)
2016-10-17 11:36:56.023 3433 ERROR nova.compute.manager [instance: a7ba9743-b425-4e47-aeb0-d545e66fffe9]   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 2835, in _ensure_console_log_for_instance
2016-10-17 11:36:56.023 3433 ERROR nova.compute.manager [instance: a7ba9743-b425-4e47-aeb0-d545e66fffe9]     libvirt_utils.file_open(console_file, 'a').close()
2016-10-17 11:36:56.023 3433 ERROR nova.compute.manager [instance: a7ba9743-b425-4e47-aeb0-d545e66fffe9]   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/utils.py", line 313, in file_open
2016-10-17 11:36:56.023 3433 ERROR nova.compute.manager [instance: a7ba9743-b425-4e47-aeb0-d545e66fffe9]     return open(*args, **kwargs)
2016-10-17 11:36:56.023 3433 ERROR nova.compute.manager [instance: a7ba9743-b425-4e47-aeb0-d545e66fffe9] IOError: [Errno 13] Permission denied: '/var/lib/nova/instances/a7ba9743-b425-4e47-aeb0-d545e66fffe9/console.log'

The problem is that after the first evacuation the console.log file is left on host A with qemu:qemu ownership, set by libvirt (host A was taken down ungracefully). When Nova, running as the nova user, tries to open that file in append mode during the second evacuation from host B back to host A, it fails due to permissions.
Nova's attempt to open the console.log file was introduced in: https://github.com/openstack/nova/commit/ec6ed24cb844dcdf834d283d496c9b920ff1db83

Since default installations usually do not set dynamic_ownership=0 and qemu is not started as the same user as Nova, I would consider this a regression.

Steps to reproduce:
1. Boot a VM on host A
2. Disrupt host A and trigger evacuation to host B
3. Wait for host A to come back online
4. Disrupt host B and trigger evacuation to host A

Expected result: the second host evacuation succeeds.
Actual result: Nova does not have permission to open the console.log file and the evacuation fails.

Nova evacuation is used by Instance HA, so repeated evacuations triggered by Instance HA can fail. The Newton release was used.
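For context, the eventual upstream fix amounts to tolerating the permission error when the console file already exists. A minimal sketch of that pattern, not Nova's actual code (the function name `ensure_console_log` is illustrative, and which errno values Nova actually tolerates is discussed later in this bug):

```python
import errno
import logging

LOG = logging.getLogger(__name__)


def ensure_console_log(console_file):
    """Touch the console log, tolerating a pre-existing file we cannot open.

    If the file is owned by another user (e.g. qemu:qemu after an ungraceful
    host failure), opening it for append raises a permission error; since
    the file already exists, it is safe to ignore that and carry on.
    """
    try:
        # Open in append mode so an existing log is never truncated.
        open(console_file, 'a').close()
    except IOError as ex:
        if ex.errno not in (errno.EPERM, errno.EACCES):
            raise
        LOG.debug('Console file already exists: %s.', console_file)
```

With this pattern, the second evacuation no longer aborts just because the leftover console.log on the target host is unwritable by the nova user.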
I noticed that a patch was proposed to master from the Launchpad bug: https://review.openstack.org/#/c/392643/
Hi Stephen. I see you wrote the upstream Ocata patch from c#1: https://review.openstack.org/#/c/392643/ Any plans on backporting it to downstream Newton? It's just a simple exception that should do the trick, right? Any considerations that we're missing here?
Hey Stephen. Could you please comment on this at least (we had the customer perform a small test for us):

1) Take note of the user and group ownership of the console.log file.

[root@cpt4 ~]# ls -l /var/lib/nova/instances/982a36e1-a913-456e-a6f5-cf2930a32d3a/console.log
-rw-r--r--. 1 qemu qemu 19229 Mar 23 17:02 /var/lib/nova/instances/982a36e1-a913-456e-a6f5-cf2930a32d3a/console.log

2) Evacuate the instance from Compute A to Compute B.

The instance was evacuated from cpt4 to cpt7.

3) Once Compute A is running again, check the user and group ownership of the console.log file once more (they should be the same; if not, let me know).

[root@cpt4 ~]# ls -l /var/lib/nova/instances/982a36e1-a913-456e-a6f5-cf2930a32d3a/console.log
-rw-r--r--. 1 root root 19229 Mar 23 17:02 /var/lib/nova/instances/982a36e1-a913-456e-a6f5-cf2930a32d3a/console.log
[root@cpt7 ~]# ls -l /var/lib/nova/instances/982a36e1-a913-456e-a6f5-cf2930a32d3a/console.log
-rw-r--r--. 1 qemu qemu 19157 Mar 23 17:10 /var/lib/nova/instances/982a36e1-a913-456e-a6f5-cf2930a32d3a/console.log

4) Attempt to evacuate the instance from Compute B to Compute A (it should fail).

The evacuation fails: [Errno 13] Permission denied: '/var/lib/nova/instances/982a36e1-a913-456e-a6f5-cf2930a32d3a/console.log'

5) Change the user and group ownership of the console.log file on Compute A from qemu:qemu to nova:nova.

I changed the user and group ownership of console.log on cpt4 from root:root to qemu:qemu and the evacuation failed; then I changed it from root:root to nova:nova and the evacuation succeeded.

6) Attempt to evacuate the instance from Compute B to Compute A once again (it should succeed).

The evacuation succeeded with console.log ownership nova:nova.
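The ownership check in steps 1 and 3 can be scripted instead of eyeballing `ls -l` output. A small sketch (the helper name `owner_of` and the path are illustrative):

```python
import grp
import os
import pwd


def owner_of(path):
    """Return 'user:group' ownership of a file, like the ls -l columns."""
    st = os.stat(path)
    user = pwd.getpwuid(st.st_uid).pw_name
    group = grp.getgrgid(st.st_gid).gr_name
    return '%s:%s' % (user, group)
```

For example, `owner_of('/var/lib/nova/instances/<uuid>/console.log')` would return 'qemu:qemu' in the failing case above and 'nova:nova' after the manual chown workaround.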
(In reply to Irina Petrova from comment #2)
> Hi Stephen. I see you wrote the upstream Ocata patch from c#1:
> https://review.openstack.org/#/c/392643/
>
> Any plans on backporting it to downstream Newton? It's just a simple
> exception that should do the trick, right? Any considerations that we're
> missing here?

Apologies for the delay - I had been waiting for the backported fix to merge into OSP 10 [1] and should have updated the bug accordingly. The downstream-only backport change has since been abandoned, as the patch has been backported upstream [2]. This means it should be included as part of a rebase in the near future. I'm changing this to POST while we wait for an internal build.

[1] https://code.engineering.redhat.com/gerrit/#/c/102808/
[2] https://review.openstack.org/#/c/454593/
*** Bug 1422154 has been marked as a duplicate of this bug. ***
Verified as a medium severity bug. Feel free to reopen the ticket if you find any issue with it.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2017:2652
I just encountered the exact same error in an environment with openstack-nova-14.0.8-2.el7ost.noarch. I took the sosreports from all the nodes here [1]. The problem happened while performing the exact same test Marian described in the bug description.

[1] http://file.rdu.redhat.com/~rscarazz/BZ1386420/
(In reply to Raoul Scarazzini from comment #40) > I just encountered the same exact error in an environment with > openstack-nova-14.0.8-2.el7ost.noarch. > I took the sosreports from all the nodes here [1]. Problem happened doing > the same exact test Marian wrote on the bug description. > > [1] http://file.rdu.redhat.com/~rscarazz/BZ1386420/ Please do not reopen bugs that are closed errata. You can either clone into a new bug or file a clean new bug. The release process doesn't allow re-using bugs once they're closed errata.
I'm the one who suggested reopening the ticket if there was an issue (in comment #37). I was under the impression that it was the proper workflow. My bad. Sorry for the confusion.
Cloned here: https://bugzilla.redhat.com/show_bug.cgi?id=1491767 Lesson learned for the future.
*** Bug 1441368 has been marked as a duplicate of this bug. ***
In which version of the python-nova package is the fix delivered? I have python-nova-14.0.8-2 installed and its driver.py checks for EACCES, yet the upstream code checks for EPERM. Which is correct?

Upstream code:

    except IOError as ex:
        if ex.errno != errno.EPERM:
            raise
        LOG.debug('Console file already exists: %s.', console_file)
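For reference, EPERM and EACCES are distinct errno values, and the "Permission denied" in the traceback is errno 13, i.e. EACCES, not EPERM. A quick check using only the standard library (values and messages shown are the Linux ones):

```python
import errno
import os

# EPERM (1): 'Operation not permitted' - the operation itself requires
# privileges the caller lacks (e.g. chown to another user as non-root).
# EACCES (13): 'Permission denied' - file permission bits or ownership
# forbid the access, which is exactly the console.log case in this bug.
print(errno.EPERM, os.strerror(errno.EPERM))
print(errno.EACCES, os.strerror(errno.EACCES))
```

So a check for EPERM alone would not swallow the IOError raised here, which is why the distinction between the two constants matters for the fix.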
This was resolved in upstream commit 3072b0afbc1, which has been pulled in as downstream commit 9d299ae50ea. I guess we need to wait for the next point release?