Bug 1386420 - Nova fails to open console.log file at repeated host evacuation
Summary: Nova fails to open console.log file at repeated host evacuation
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-nova
Version: 10.0 (Newton)
Hardware: Unspecified
OS: Unspecified
unspecified
medium
Target Milestone: z4
: 10.0 (Newton)
Assignee: Stephen Finucane
QA Contact: Joe H. Rahme
URL:
Whiteboard:
: 1422154 1441368 (view as bug list)
Depends On:
Blocks: 1335596 1356451 1491767
TreeView+ depends on / blocked
 
Reported: 2016-10-18 22:01 UTC by Marian Krcmarik
Modified: 2020-12-21 19:39 UTC (History)
27 users (show)

Fixed In Version: openstack-nova-14.0.7-9.el7ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1491767 (view as bug list)
Environment:
Last Closed: 2017-09-14 13:21:58 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Launchpad 1634282 0 None None None 2016-10-18 22:01:26 UTC
OpenStack gerrit 392643 0 None MERGED Ignore IOError when creating 'console.log' 2020-12-17 05:12:21 UTC
OpenStack gerrit 454593 0 None MERGED Ignore IOError when creating 'console.log' 2020-12-17 05:12:20 UTC
Red Hat Product Errata RHBA-2017:2652 0 normal SHIPPED_LIVE openstack-nova bug fix advisory 2017-09-06 20:54:28 UTC

Description Marian Krcmarik 2016-10-18 22:01:27 UTC
I am trying to perform "ping-pong" host evacuation - take down ungracefully host A, evacuate host A to host B, take down ungracefully host B and evacuate to host A. The second evacuation fails at following error:
[Errno 13] Permission denied: '/var/lib/nova/instances/e17a325c-f0da-4f2f-aad3-4b1c098f295f/console.log'

Traceback:
2016-10-17 11:36:56.023 3433 ERROR nova.compute.manager [instance: a7ba9743-b425-4e47-aeb0-d545e66fffe9] Traceback (most recent call last):
2016-10-17 11:36:56.023 3433 ERROR nova.compute.manager [instance: a7ba9743-b425-4e47-aeb0-d545e66fffe9] File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 6566, in _error_out_instance_on_exception
2016-10-17 11:36:56.023 3433 ERROR nova.compute.manager [instance: a7ba9743-b425-4e47-aeb0-d545e66fffe9] yield
2016-10-17 11:36:56.023 3433 ERROR nova.compute.manager [instance: a7ba9743-b425-4e47-aeb0-d545e66fffe9] File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 2687, in rebuild_instance
2016-10-17 11:36:56.023 3433 ERROR nova.compute.manager [instance: a7ba9743-b425-4e47-aeb0-d545e66fffe9] bdms, recreate, on_shared_storage, preserve_ephemeral)
2016-10-17 11:36:56.023 3433 ERROR nova.compute.manager [instance: a7ba9743-b425-4e47-aeb0-d545e66fffe9] File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 2731, in _do_rebuild_instance_with_claim
2016-10-17 11:36:56.023 3433 ERROR nova.compute.manager [instance: a7ba9743-b425-4e47-aeb0-d545e66fffe9] self._do_rebuild_instance(*args, **kwargs)
2016-10-17 11:36:56.023 3433 ERROR nova.compute.manager [instance: a7ba9743-b425-4e47-aeb0-d545e66fffe9] File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 2846, in _do_rebuild_instance
2016-10-17 11:36:56.023 3433 ERROR nova.compute.manager [instance: a7ba9743-b425-4e47-aeb0-d545e66fffe9] self._rebuild_default_impl(**kwargs)
2016-10-17 11:36:56.023 3433 ERROR nova.compute.manager [instance: a7ba9743-b425-4e47-aeb0-d545e66fffe9] File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 2612, in _rebuild_default_impl
2016-10-17 11:36:56.023 3433 ERROR nova.compute.manager [instance: a7ba9743-b425-4e47-aeb0-d545e66fffe9] block_device_info=new_block_device_info)
2016-10-17 11:36:56.023 3433 ERROR nova.compute.manager [instance: a7ba9743-b425-4e47-aeb0-d545e66fffe9] File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 2574, in spawn
2016-10-17 11:36:56.023 3433 ERROR nova.compute.manager [instance: a7ba9743-b425-4e47-aeb0-d545e66fffe9] self._ensure_console_log_for_instance(instance)
2016-10-17 11:36:56.023 3433 ERROR nova.compute.manager [instance: a7ba9743-b425-4e47-aeb0-d545e66fffe9] File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 2835, in _ensure_console_log_for_instance
2016-10-17 11:36:56.023 3433 ERROR nova.compute.manager [instance: a7ba9743-b425-4e47-aeb0-d545e66fffe9] libvirt_utils.file_open(console_file, 'a').close()
2016-10-17 11:36:56.023 3433 ERROR nova.compute.manager [instance: a7ba9743-b425-4e47-aeb0-d545e66fffe9] File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/utils.py", line 313, in file_open
2016-10-17 11:36:56.023 3433 ERROR nova.compute.manager [instance: a7ba9743-b425-4e47-aeb0-d545e66fffe9] return open(*args, **kwargs)
2016-10-17 11:36:56.023 3433 ERROR nova.compute.manager [instance: a7ba9743-b425-4e47-aeb0-d545e66fffe9] IOError: [Errno 13] Permission denied: '/var/lib/nova/instances/a7ba9743-b425-4e47-aeb0-d545e66fffe9/console.log'

The problem is that the console file is left on host A with qemu:qemu ownership set by libvirt after first evacuation (the host A was taken down ungracefully) and once nova tries to open that file with append permission under nova user during second evacution from host B to host A It fails due to permissions. This try of nova to open the console.log file was introduce in:
https://github.com/openstack/nova/commit/ec6ed24cb844dcdf834d283d496c9b920ff1db83

Since I believe that in default installations usually dynamic_ownership=0 is not set and qemu is not started by the same user as Nova I would consider this as a regression.

Steps to reproduce:
1. Boot a VM on host A
2. Disrupt host A and trigger evacuation to host B
3. Wait for host A to be online again
4. Disrupt host B and trigger evacuation to host A

Expected result:
Successful 2nd host evacuation

Actual result:
Nova does not have permission to open console.log file and fails the evacuation.

The nova evacuation is used by Instance HA and at repeated evacuation triggered by Instance HA can fail.
Newton release used

Comment 1 Jon Schlueter 2017-01-06 16:43:32 UTC
noticed that a patch was proposed to master from the launchpad bug

https://review.openstack.org/#/c/392643/

Comment 2 Irina Petrova 2017-03-23 14:19:33 UTC
Hi Stephen. I see you wrote the upstream Ocata patch from c#1:
https://review.openstack.org/#/c/392643/

Any plans on backporting it to downstream Newton? It's just a simple exception that should do the trick, right? Any considerations that we're missing here?

Comment 4 Irina Petrova 2017-05-05 12:10:47 UTC
Hey Stephen.

Could please comment on this at least (we had the customer perform a small test for us):

1) take note of the user and group ownership of the console.log file.
     [root@cpt4 ~]# ls -l /var/lib/nova/instances/982a36e1-a913-456e-a6f5-cf2930a32d3a/console.log 
     -rw-r--r--. 1 qemu qemu 19229 Mar 23 17:02 /var/lib/nova/instances/982a36e1-a913-456e-a6f5-cf2930a32d3a/console.log

2) evacuate the instance from Compute A to Compute B.
     Instance evacuated from cpt4 to cpt7

3) once Compute A running again, check the user and group ownership of the console.log file once more (they should be the same, if not let me know).
     [root@cpt4 ~]# ls -l /var/lib/nova/instances/982a36e1-a913-456e-a6f5-cf2930a32d3a/console.log
     -rw-r--r--. 1 root root 19229 Mar 23 17:02 /var/lib/nova/instances/982a36e1-a913-456e-a6f5-cf2930a32d3a/console.log

     [root@cpt7 ~]# ls -l /var/lib/nova/instances/982a36e1-a913-456e-a6f5-cf2930a32d3a/console.log
     -rw-r--r--. 1 qemu qemu 19157 Mar 23 17:10 /var/lib/nova/instances/982a36e1-a913-456e-a6f5-cf2930a32d3a/console.log

4) attempt to evacuate the instance from Compute B to Compute A (it should fail).
     the evacuation fails
     [Errno 13] Permission denied: '/var/lib/nova/instances/982a36e1-a913-456e-a6f5-cf2930a32d3a/console.log

5) change the user and group ownership of the console.log file on Compute A from qemu:qemu to nova:nova.
     I changed user and group ownership of the console.log on cpt4 from root:root to qemu:qemu and the evacuation fails
     then I changed user and group ownership of the console.log on cpt4 from root:root to nova:nova and the evacuation succeded

6) attempt to evacuate the instance from Compute B to Compute A once again (it should succeed).
     The evacuation succeded with console.log ownership nova:nova

Comment 5 Stephen Finucane 2017-05-05 13:18:54 UTC
(In reply to Irina Petrova from comment #2)
> Hi Stephen. I see you wrote the upstream Ocata patch from c#1:
> https://review.openstack.org/#/c/392643/
> 
> Any plans on backporting it to downstream Newton? It's just a simple
> exception that should do the trick, right? Any considerations that we're
> missing here?

Apologies for the delay - I'd waiting for the backported fix to OSP 10 to merge [1] and should have updated as such.

The downstream-only backport change has since been abandoned as the patch has been backported upstream [2]. This means it should be included as part of a rebase in the near future.

I'm changing this to POST while we wait for an internal build.

[1] https://code.engineering.redhat.com/gerrit/#/c/102808/
[2] https://review.openstack.org/#/c/454593/

Comment 17 Stephen Finucane 2017-05-30 09:01:01 UTC
*** Bug 1422154 has been marked as a duplicate of this bug. ***

Comment 37 Joe H. Rahme 2017-08-14 14:59:53 UTC
Verified as a medium severity bug. Feel free to reopen the ticket if you find any issue with it.

Comment 39 errata-xmlrpc 2017-09-06 17:04:57 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:2652

Comment 40 Raoul Scarazzini 2017-09-12 18:08:39 UTC
I just encountered the same exact error in an environment with openstack-nova-14.0.8-2.el7ost.noarch.
I took the sosreports from all the nodes here [1]. Problem happened doing the same exact test Marian wrote on the bug description.

[1] http://file.rdu.redhat.com/~rscarazz/BZ1386420/

Comment 41 Mike Burns 2017-09-14 13:21:58 UTC
(In reply to Raoul Scarazzini from comment #40)
> I just encountered the same exact error in an environment with
> openstack-nova-14.0.8-2.el7ost.noarch.
> I took the sosreports from all the nodes here [1]. Problem happened doing
> the same exact test Marian wrote on the bug description.
> 
> [1] http://file.rdu.redhat.com/~rscarazz/BZ1386420/

Please do not reopen bugs that are closed errata.  You can either clone into a new bug or file a clean new bug.  The release process doesn't allow re-using bugs once they're closed errata.

Comment 42 Joe H. Rahme 2017-09-14 13:33:55 UTC
I'm the one who suggested reopening the ticket if there was an issue (in comment #37). I was under the impression that it was the proper workflow. My bad.

Sorry for the confusion.

Comment 43 Joe H. Rahme 2017-09-14 13:34:20 UTC
I'm the one who suggested reopening the ticket if there was an issue (in comment #37). I was under the impression that it was the proper workflow. My bad.

Sorry for the confusion.

Comment 45 Raoul Scarazzini 2017-09-14 15:32:51 UTC
Cloned here: https://bugzilla.redhat.com/show_bug.cgi?id=1491767

Lesson learned for the future.

Comment 46 Sylvain Bauza 2017-10-25 15:58:21 UTC
*** Bug 1441368 has been marked as a duplicate of this bug. ***

Comment 47 John Williams 2017-10-25 18:50:42 UTC
Which version of python-nova package is the fix delivered?  I have python-nova-14.0.8-2 installed and the driver.py file checking for EACCESS, yet the upstream code checks for EPERM?  Which is correct? 

Upstream code: 
        except IOError as ex:
            if ex.errno != errno.EPERM:
                raise
            LOG.debug('Console file already exists: %s.', console_file)

Comment 48 Stephen Finucane 2017-11-10 02:48:45 UTC
This was resolved in upstream commit 3072b0afbc1, which has been pulled in as downstream commit 9d299ae50ea. I guess we need to wait for the next point release?


Note You need to log in before you can comment on or make changes to this bug.