Bug 1660951 - race condition with privsep utime results in instance create failure
Summary: race condition with privsep utime results in instance create failure
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-nova
Version: 13.0 (Queens)
Hardware: Unspecified
OS: Unspecified
medium
medium
Target Milestone: rc
: 15.0 (Stein)
Assignee: Lee Yarwood
QA Contact: nova-maint
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2018-12-19 16:45 UTC by Martin Schuppert
Modified: 2019-09-26 10:47 UTC (History)
9 users (show)

Fixed In Version: openstack-nova-19.0.0-0.20190403210352.0fd77f3.el8ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-09-21 11:19:41 UTC
Target Upstream Version:


Attachments (Terms of Use)
compute log (542.93 KB, application/octet-stream)
2018-12-19 16:49 UTC, Martin Schuppert
no flags Details


Links
System ID Private Priority Status Summary Last Updated
Launchpad 1809123 0 None None None 2018-12-19 18:33:45 UTC
OpenStack gerrit 625741 0 None None None 2018-12-19 16:45:59 UTC
OpenStack gerrit 641315 0 None None None 2019-04-04 16:30:30 UTC
Red Hat Product Errata RHEA-2019:2811 0 None None None 2019-09-21 11:20:11 UTC

Description Martin Schuppert 2018-12-19 16:45:59 UTC
Description of problem:

With /var/lib/nova/instances on shared nfs and creation of multiple instances in a single scheduling request (so far only reproduced when adding load like this in an idle env) results in error because of update utime [1]

$ openstack server create --flavor m1.small --image cirros-raw --nic net-id=735432ee-d871-4140-b304-225001906713 --min 8 --max 8 test-multi


2018-12-18 15:23:02.767 1 ERROR nova.compute.manager [req-7c24626a-9e3c-4116-b3e1-885f996e8910 db775be1757f440bb6666e4f289998db 2e899a18bdab4434bfcaff52ff111192 - default default] [instance: 293393f2-e23f-40d4-bea7-6f4dbc634f3e] Instance failed to spawn: OSError: [Errno 13] Permission denied
2018-12-18 15:23:02.767 1 ERROR nova.compute.manager [instance: 293393f2-e23f-40d4-bea7-6f4dbc634f3e] Traceback (most recent call last):
2018-12-18 15:23:02.767 1 ERROR nova.compute.manager [instance: 293393f2-e23f-40d4-bea7-6f4dbc634f3e]   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 2239, in _build_resources
2018-12-18 15:23:02.767 1 ERROR nova.compute.manager [instance: 293393f2-e23f-40d4-bea7-6f4dbc634f3e]     yield resources
2018-12-18 15:23:02.767 1 ERROR nova.compute.manager [instance: 293393f2-e23f-40d4-bea7-6f4dbc634f3e]   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 2019, in _build_and_run_instance
2018-12-18 15:23:02.767 1 ERROR nova.compute.manager [instance: 293393f2-e23f-40d4-bea7-6f4dbc634f3e]     block_device_info=block_device_info)
2018-12-18 15:23:02.767 1 ERROR nova.compute.manager [instance: 293393f2-e23f-40d4-bea7-6f4dbc634f3e]   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 3089, in spawn
2018-12-18 15:23:02.767 1 ERROR nova.compute.manager [instance: 293393f2-e23f-40d4-bea7-6f4dbc634f3e]     block_device_info=block_device_info)
2018-12-18 15:23:02.767 1 ERROR nova.compute.manager [instance: 293393f2-e23f-40d4-bea7-6f4dbc634f3e]   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 3467, in _create_image
2018-12-18 15:23:02.767 1 ERROR nova.compute.manager [instance: 293393f2-e23f-40d4-bea7-6f4dbc634f3e]     fallback_from_host)
2018-12-18 15:23:02.767 1 ERROR nova.compute.manager [instance: 293393f2-e23f-40d4-bea7-6f4dbc634f3e]   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 3558, in _create_and_inject_local_root
2018-12-18 15:23:02.767 1 ERROR nova.compute.manager [instance: 293393f2-e23f-40d4-bea7-6f4dbc634f3e]     instance, size, fallback_from_host)
2018-12-18 15:23:02.767 1 ERROR nova.compute.manager [instance: 293393f2-e23f-40d4-bea7-6f4dbc634f3e]   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 7613, in _try_fetch_image_cache
2018-12-18 15:23:02.767 1 ERROR nova.compute.manager [instance: 293393f2-e23f-40d4-bea7-6f4dbc634f3e]     size=size)
2018-12-18 15:23:02.767 1 ERROR nova.compute.manager [instance: 293393f2-e23f-40d4-bea7-6f4dbc634f3e]   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/imagebackend.py", line 243, in cache
2018-12-18 15:23:02.767 1 ERROR nova.compute.manager [instance: 293393f2-e23f-40d4-bea7-6f4dbc634f3e]     *args, **kwargs)
2018-12-18 15:23:02.767 1 ERROR nova.compute.manager [instance: 293393f2-e23f-40d4-bea7-6f4dbc634f3e]   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/imagebackend.py", line 601, in create_image
2018-12-18 15:23:02.767 1 ERROR nova.compute.manager [instance: 293393f2-e23f-40d4-bea7-6f4dbc634f3e]     nova.privsep.path.utime(base)
2018-12-18 15:23:02.767 1 ERROR nova.compute.manager [instance: 293393f2-e23f-40d4-bea7-6f4dbc634f3e]   File "/usr/lib/python2.7/site-packages/oslo_privsep/priv_context.py", line 207, in _wrap
2018-12-18 15:23:02.767 1 ERROR nova.compute.manager [instance: 293393f2-e23f-40d4-bea7-6f4dbc634f3e]     return self.channel.remote_call(name, args, kwargs)
2018-12-18 15:23:02.767 1 ERROR nova.compute.manager [instance: 293393f2-e23f-40d4-bea7-6f4dbc634f3e]   File "/usr/lib/python2.7/site-packages/oslo_privsep/daemon.py", line 202, in remote_call
2018-12-18 15:23:02.767 1 ERROR nova.compute.manager [instance: 293393f2-e23f-40d4-bea7-6f4dbc634f3e]     raise exc_type(*result[2])
2018-12-18 15:23:02.767 1 ERROR nova.compute.manager [instance: 293393f2-e23f-40d4-bea7-6f4dbc634f3e] OSError: [Errno 13] Permission denied
2018-12-18 15:23:02.767 1 ERROR nova.compute.manager [instance: 293393f2-e23f-40d4-bea7-6f4dbc634f3e] 


Version-Release number of selected component (if applicable):
tested on OSP13, but code is same on OSP14

How reproducible:


Steps to Reproduce:
1. deploy OSP13 env
2. setup nfs server (e.g. on undercloud)
3. configure nova to enable nfs for /var/lib/nova/instances
    NovaNfsEnabled: True
    NovaNfsShare: '192.168.24.1:/var/nfs/nova'
4. create multiple instances, like:
openstack server create --flavor m1.small --image cirros-raw --nic net-id=735432ee-d871-4140-b304-225001906713 --min 8 --max 8 test-multi

Actual results:
some instances fail with above mentioned error

Expected results:
update utime should not result in instance create to fail. 

Additional info:
While debugging this with Tim Rozet we tested a patch [2] with which the issue could not be reproduced.

[1] https://github.com/openstack/nova/blob/master/nova/privsep/path.py#L70-L78
[2] https://review.openstack.org/#/c/625741/

Comment 2 Martin Schuppert 2018-12-19 16:49:32 UTC
Created attachment 1515664 [details]
compute log

Comment 12 errata-xmlrpc 2019-09-21 11:19:41 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2019:2811


Note You need to log in before you can comment on or make changes to this bug.