Bug 1608913 - [OSP12] starting nova_compute docker on a new compute node makes guest disks on nfs share read-only
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 12.0 (Pike)
Hardware: x86_64
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: zstream
Target Release: 12.0 (Pike)
Assignee: Martin Schuppert
QA Contact: Archit Modi
URL:
Whiteboard:
Depends On: 1594261
Blocks:
Reported: 2018-07-26 13:44 UTC by Martin Schuppert
Modified: 2021-12-10 16:54 UTC

Fixed In Version: openstack-tripleo-heat-templates-7.0.12-12.el7ost
Doc Type: Bug Fix
Doc Text:
Previously, with shared storage for `/var/lib/nova/instances`, restarting nova_compute on any compute node changed the owner and group of the console.log and virtual ephemeral disks of the instance. As a result, instances lost access to their virtual ephemeral disks. With this update, the scripts that modify the ownership of the instance files in `/var/lib/nova/instances` no longer cause loss of access to the instance files during a restart of nova_compute.
Clone Of: 1594261
Environment:
Last Closed: 2018-12-05 18:52:48 UTC
Target Upstream Version:
Embargoed:



Links
System                  ID              Private  Priority  Status  Summary  Last Updated
Launchpad               1778465         0        None      None    None     2018-07-26 13:44:53 UTC
Red Hat Issue Tracker   OSP-11499       0        None      None    None     2021-12-10 16:54:25 UTC
Red Hat Product Errata  RHBA-2018:3789  0        None      None    None     2018-12-05 18:53:41 UTC

Comment 2 PetarJ 2018-10-17 21:22:34 UTC
Hello,

I have a question regarding this bug and the one it was cloned from: https://bugzilla.redhat.com/show_bug.cgi?id=1594261

Could you please clarify its state? This report says it is fixed in version
openstack-tripleo-heat-templates-7.0.12-12.el7ost (which is not available in the rhel-7-server-openstack-12-rpms repo),
while in the other bug report the target release has been moved to Rocky and the version is now: openstack-tripleo-heat-templates-9.0.0-0.20180919080946.0rc1.0rc1.el7os

Is RPM version 7.0.12-12 available somewhere, and does it fix the whole recursive /var/lib/nova ownership issue?

best,
p

Comment 3 Martin Schuppert 2018-10-18 06:45:35 UTC
(In reply to PetarJ from comment #2)

openstack-tripleo-heat-templates-7.0.12-12.el7ost is not released yet. [1] is the clone for OSP13, which is already released. In urgent cases, please file a support case at access.redhat.com and we can provide a hotfix via the support case.

What do you mean by the whole issue? The fix corrects the wrong owner on the instance disk files, so there is no longer any loss of access to the instance files during a restart of nova_compute.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1603538

Comment 4 PetarJ 2018-10-18 12:44:01 UTC
Hi Martin,
Thanks for the quick response. We will open a support case regarding the 7.0.12-12 version.

Sorry for the imprecision regarding the "whole issue". What I meant is that, besides the /var/lib/nova/instances folder being chowned, the cinder NetApp backend NFS mount also gets recursively chowned.

Before:

[[staging]root@overcloud-ciscocompute-1 ~]# ls -al /var/lib/nova/mnt/
total 4
drwxr-xr-x.  3  42436  42436   46 Oct 17 19:45 .
drwxr-xr-x. 10  42436  42436  121 Aug 16 00:41 ..
drwxrwxr-x.  2 cinder cinder 4096 Oct 17 21:17 48c91206eaf59857e41341396e408d44

After docker restart nova_compute:

[[staging]root@overcloud-ciscocompute-1 ~]# ls -alR /var/lib/nova/mnt/
/var/lib/nova/mnt/:
total 4
drwxr-xr-x.  3 42436 42436   46 Oct 17 19:45 .
drwxr-xr-x. 10 42436 42436  121 Aug 16 00:41 ..
drwxrwxr-x.  2 nova  nova  4096 Oct 17 21:17 48c91206eaf59857e41341396e408d44

Comment 5 PetarJ 2018-10-18 15:08:50 UTC
More precisely, this affects not only ephemeral disks but also other NFS mounts.

Example:

OSP12 deployed with the cinder-backend-netapp.yaml backend enabled.

Spin up an instance with the boot volume served from NetApp, then run a test dd inside the instance while doing a docker restart of nova_compute:

Prior to the restart:

[[staging]root@overcloud-controller-0 ~]# ls -al /var/lib/cinder/mnt/48c91206eaf59857e41341396e408d44/
total 3937136
drwxrwxr-x. 2 cinder cinder        4096 Oct 18 16:49 .
drwxr-xr-x. 4 cinder cinder          86 Oct 18 16:30 ..
-rw-rw----. 1 cinder cinder 10737418240 Oct 18 16:49 img-cache-b8626cce-1a27-42a3-a1b1-125d0c9a270d
-rw-rw----. 1 qemu   qemu   10737418240 Oct 18 16:54 volume-37c73099-5f42-4cef-82ca-692bbf354b68

Running docker restart nova_compute changes the ownership:

[[staging]root@overcloud-controller-0 ~]# ls -al /var/lib/cinder/mnt/48c91206eaf59857e41341396e408d44/
total 5327996
drwxrwxr-x. 2 nova   nova          4096 Oct 18 16:49 .
drwxr-xr-x. 4 cinder cinder          86 Oct 18 16:30 ..
-rw-rw----. 1 nova   nova   10737418240 Oct 18 16:49 img-cache-b8626cce-1a27-42a3-a1b1-125d0c9a270d
-rw-rw----. 1 nova   nova   10737418240 Oct 18 16:54 volume-37c73099-5f42-4cef-82ca-692bbf354b68

Inside the instance, at the moment the volumes change ownership:

-bash: /bin/sleep: Input/output error
-bash: /bin/dd: Input/output error
-bash: /bin/sleep: Input/output error

[root@nap-test ~]# ls -al
-bash: /bin/ls: Input/output error

Running:

openstack server reboot nap-test

restores the volume ownership, and the instance runs OK from then on:

[[staging]root@overcloud-controller-0 ~]# ls -al /var/lib/cinder/mnt/48c91206eaf59857e41341396e408d44/
total 5328032
drwxrwxr-x. 2 nova   nova          4096 Oct 18 16:49 .
drwxr-xr-x. 4 cinder cinder          86 Oct 18 16:30 ..
-rw-rw----. 1 nova   nova   10737418240 Oct 18 16:49 img-cache-b8626cce-1a27-42a3-a1b1-125d0c9a270d
-rw-rw----. 1 qemu   qemu   10737418240 Oct 18 17:05 volume-37c73099-5f42-4cef-82ca-692bbf354b68
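
The ownership change shown in the listings above can also be detected programmatically instead of comparing `ls` output by eye. The following is a minimal sketch, not part of the original report: it records the uid/gid of every entry under a directory and reports which entries differ between two snapshots. The function names `snapshot_ownership` and `diff_ownership` are hypothetical.

```python
import os


def snapshot_ownership(root):
    """Return {path: (uid, gid)} for root and everything below it."""
    owners = {}
    # lstat reports a symlink itself rather than its target
    st = os.lstat(root)
    owners[root] = (st.st_uid, st.st_gid)
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except FileNotFoundError:
                continue  # entry vanished between listing and stat
            owners[path] = (st.st_uid, st.st_gid)
    return owners


def diff_ownership(before, after):
    """Return {path: (old_owner, new_owner)} for paths whose owner changed."""
    return {
        path: (before[path], after[path])
        for path in before
        if path in after and before[path] != after[path]
    }
```

Taking one snapshot of the mount directory before `docker restart nova_compute` and one after, then passing both to `diff_ownership`, would list each affected entry with its old and new numeric ids.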

Comment 10 Martin Schuppert 2018-10-23 10:04:59 UTC
(In reply to PetarJ from comment #5)

With the fix from this BZ, the owner of attached NFS cinder volumes does not change.
The owner is still qemu after a restart of the nova_compute container:

[root@compute-1 mnt]# pwd
/var/lib/nova/mnt

[root@compute-1 mnt]# ll -R
.: 
total 0
drwxrwxrwx. 2 42436 42436 108 Oct 23 09:34 b4e49454a0d6fed499c0980f2e484733
   
./b4e49454a0d6fed499c0980f2e484733:
total 0

-rw-rw-rw-. 1 qemu qemu 1073741824 Oct 23 09:34 volume-08d882e4-0465-4b9c-9cf7-c9f44a804b79
-rw-rw-rw-. 1 qemu qemu 1073741824 Oct 23 09:34 volume-7c96d587-b10a-46be-8637-60446942a846

Note: tested with the default NFS cinder backend, not NetApp, but from the compute node's point of view this is the same.

Please let us know if you see issues with attached cinder volumes.
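
To illustrate why the volumes can stay owned by qemu, here is a minimal sketch of a selective ownership change. It assumes a policy of only re-owning paths that currently belong to a known previous uid/gid and leaving everything else (for example qemu-owned disk files) untouched. This thread does not show the script shipped with the fix, so treat this as an illustration of the idea rather than the real implementation. The function name `plan_chown` is hypothetical, and the function only returns the planned changes; it does not change any ownership itself.

```python
import os


def plan_chown(root, old_uid, old_gid, new_uid, new_gid):
    """List (path, uid, gid) changes for entries owned by old_uid or old_gid.

    Entries owned by any other user or group (e.g. qemu) are skipped, so a
    running guest keeps access to its disk files.
    """
    plan = []
    paths = [root]
    for dirpath, dirnames, filenames in os.walk(root):
        paths.extend(os.path.join(dirpath, n) for n in dirnames + filenames)
    for path in paths:
        st = os.lstat(path)
        # Only swap ids that match the previous owner; keep all others as-is
        uid = new_uid if st.st_uid == old_uid else st.st_uid
        gid = new_gid if st.st_gid == old_gid else st.st_gid
        if (uid, gid) != (st.st_uid, st.st_gid):
            plan.append((path, uid, gid))
    return plan
```

A caller could review the returned list and then apply each entry with `os.lchown(path, uid, gid)`. Keeping planning separate from applying makes it easy to check that no qemu-owned file is in the list.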

Comment 11 PetarJ 2018-11-09 11:19:15 UTC

We can confirm that the bugfix RPM openstack-tripleo-heat-templates-7.0.12-12.el7ost
also fixes this issue for the NetApp backend (as you said, it behaves the same).
Thank you all for the support in fixing this :)

[[dev]root@overcloud-compute-0 ~]# virsh dumpxml instance-00000017|grep mnt
      <source file='/var/lib/nova/mnt/f2c2277183706b16e4a0618b7d88140e/volume-b18081cc-5e15-4e29-9a9d-b84a1dafa2ff'/>

[[dev]root@overcloud-compute-0 ~]# ls -al /var/lib/nova/mnt/f2c2277183706b16e4a0618b7d88140e/volume-b18081cc-5e15-4e29-9a9d-b84a1dafa2ff
-rw-rw-rw-. 1 qemu qemu 10737418240 Nov  9 11:59 /var/lib/nova/mnt/f2c2277183706b16e4a0618b7d88140e/volume-b18081cc-5e15-4e29-9a9d-b84a1dafa2ff

[[dev]root@overcloud-compute-0 ~]# docker restart nova_compute
nova_compute

[[dev]root@overcloud-compute-0 ~]# ls -al /var/lib/nova/mnt/f2c2277183706b16e4a0618b7d88140e/volume-b18081cc-5e15-4e29-9a9d-b84a1dafa2ff
-rw-rw-rw-. 1 qemu qemu 10737418240 Nov  9 11:59 /var/lib/nova/mnt/f2c2277183706b16e4a0618b7d88140e/volume-b18081cc-5e15-4e29-9a9d-b84a1dafa2ff
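
The check above (find the volume path in the domain XML, then look at its owner) can be sketched in code. This is a minimal sketch and not something from the original thread: `disk_source_files` is a hypothetical helper that takes the text printed by `virsh dumpxml` and returns the `file` attribute of each disk `<source>` element under a given prefix. The sample XML is trimmed to the one element shown above and is not a complete libvirt domain definition.

```python
import xml.etree.ElementTree as ET


def disk_source_files(domain_xml, prefix="/var/lib/nova/mnt/"):
    """Return the file paths of disk <source> elements that start with prefix."""
    root = ET.fromstring(domain_xml.strip())
    paths = []
    # .//disk/source matches <source> children of any <disk> element
    for source in root.findall(".//disk/source"):
        path = source.get("file")
        if path and path.startswith(prefix):
            paths.append(path)
    return paths


SAMPLE_XML = """
<domain type='kvm'>
  <devices>
    <disk type='file' device='disk'>
      <source file='/var/lib/nova/mnt/f2c2277183706b16e4a0618b7d88140e/volume-b18081cc-5e15-4e29-9a9d-b84a1dafa2ff'/>
    </disk>
  </devices>
</domain>
"""

print(disk_source_files(SAMPLE_XML))
```

Each returned path could then be passed to `os.stat` before and after `docker restart nova_compute` to compare `st_uid` and `st_gid`.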

Comment 12 Martin Schuppert 2018-11-09 15:09:45 UTC
(In reply to PetarJ from comment #11)

Thanks a lot for the feedback and confirmation that the issue is fixed!

Comment 21 errata-xmlrpc 2018-12-05 18:52:48 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:3789

