Hello,

I have a question regarding this bug and its clone:
https://bugzilla.redhat.com/show_bug.cgi?id=1594261

Could you please clarify its state? This report says it is fixed in version
openstack-tripleo-heat-templates-7.0.12-12.el7ost (which is not available in the rhel-7-server-openstack-12-rpms repo), while in the other bug report the target release has been moved to Rocky and the version is now
openstack-tripleo-heat-templates-9.0.0-0.20180919080946.0rc1.0rc1.el7os

Is the rpm version 7.0.12-12 available somewhere, and does it fix the whole /var/lib/nova recurse issue?

best,
p
(In reply to PetarJ from comment #2)
> Hello,
>
> i have a question regarding this bug and its clone:
> https://bugzilla.redhat.com/show_bug.cgi?id=1594261
>
> Could you please clarify its state because here it says it is fixed in
> version:
> openstack-tripleo-heat-templates-7.0.12-12.el7ost (which is not available in
> rhel-7-server-openstack-12-rpms repo)
>
> ,while in the other bug report target release has been moved to rocky and
> version is now:
> openstack-tripleo-heat-templates-9.0.0-0.20180919080946.0rc1.0rc1.el7os
>
> Is the rpm version 7.0.12-12 available somewhere and does it fix the whole
> /var/lib/nova recurse issue?

openstack-tripleo-heat-templates-7.0.12-12.el7ost is not released yet. [1] is the clone for OSP13, which is already released. In urgent cases, please file a support case at access.redhat.com and we can provide a hotfix via the support case.

What do you mean by the whole issue? The fix addresses the wrong owner on the instance disk files, and access to the instance files is no longer lost during a restart of nova compute.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1603538
Hi Martin,

thanks for the quick response. We will open a support case regarding the 7.0.12-12 version.

Sorry for the imprecision regarding the "whole issue". Besides the /var/lib/nova/instances folder being chowned, the cinder netapp backend nfs mount also gets recursively chowned; this is what I meant.

Before:

[[staging]root@overcloud-ciscocompute-1 ~]# ls -al /var/lib/nova/mnt/
total 4
drwxr-xr-x. 3 42436 42436 46 Oct 17 19:45 .
drwxr-xr-x. 10 42436 42436 121 Aug 16 00:41 ..
drwxrwxr-x. 2 cinder cinder 4096 Oct 17 21:17 48c91206eaf59857e41341396e408d44

After docker restart nova_compute:

[[staging]root@overcloud-ciscocompute-1 ~]# ls -alR /var/lib/nova/mnt/
/var/lib/nova/mnt/:
total 4
drwxr-xr-x. 3 42436 42436 46 Oct 17 19:45 .
drwxr-xr-x. 10 42436 42436 121 Aug 16 00:41 ..
drwxrwxr-x. 2 nova nova 4096 Oct 17 21:17 48c91206eaf59857e41341396e408d44
More precisely, this affects not only ephemerals, but also other nfs mounts.

Example:

osp12 deployed with the cinder-backend-netapp.yaml backend enabled.

Spin up an instance with the boot volume served from netapp, then run a test dd inside the instance while doing a docker restart of nova_compute.

Prior to the restart:

[[staging]root@overcloud-controller-0 ~]# ls -al /var/lib/cinder/mnt/48c91206eaf59857e41341396e408d44/
total 3937136
drwxrwxr-x. 2 cinder cinder 4096 Oct 18 16:49 .
drwxr-xr-x. 4 cinder cinder 86 Oct 18 16:30 ..
-rw-rw----. 1 cinder cinder 10737418240 Oct 18 16:49 img-cache-b8626cce-1a27-42a3-a1b1-125d0c9a270d
-rw-rw----. 1 qemu qemu 10737418240 Oct 18 16:54 volume-37c73099-5f42-4cef-82ca-692bbf354b68

Doing a docker restart nova_compute changes the ownership:

[[staging]root@overcloud-controller-0 ~]# ls -al /var/lib/cinder/mnt/48c91206eaf59857e41341396e408d44/
total 5327996
drwxrwxr-x. 2 nova nova 4096 Oct 18 16:49 .
drwxr-xr-x. 4 cinder cinder 86 Oct 18 16:30 ..
-rw-rw----. 1 nova nova 10737418240 Oct 18 16:49 img-cache-b8626cce-1a27-42a3-a1b1-125d0c9a270d
-rw-rw----. 1 nova nova 10737418240 Oct 18 16:54 volume-37c73099-5f42-4cef-82ca-692bbf354b68

Inside the instance, the moment the volumes change ownership:

-bash: /bin/sleep: Input/output error
-bash: /bin/dd: Input/output error
-bash: /bin/sleep: Input/output error

[root@nap-test ~]# ls -al
-bash: /bin/ls: Input/output error

Doing:

openstack server reboot nap-test

restores the volume ownership, and the instance runs fine from then on:

[[staging]root@overcloud-controller-0 ~]# ls -al /var/lib/cinder/mnt/48c91206eaf59857e41341396e408d44/
total 5328032
drwxrwxr-x. 2 nova nova 4096 Oct 18 16:49 .
drwxr-xr-x. 4 cinder cinder 86 Oct 18 16:30 ..
-rw-rw----. 1 nova nova 10737418240 Oct 18 16:49 img-cache-b8626cce-1a27-42a3-a1b1-125d0c9a270d
-rw-rw----. 1 qemu qemu 10737418240 Oct 18 17:05 volume-37c73099-5f42-4cef-82ca-692bbf354b68
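The before/after comparison in the listings above can also be done programmatically. The following is a minimal sketch, not part of the original report: the function names `snapshot_ownership` and `ownership_changes` are hypothetical, and the idea is simply to record owner and group for everything under a mount directory, restart the container, record again, and print what changed.

```python
import grp
import os
import pwd


def _name(lookup, ident):
    """Resolve a numeric id to a name; fall back to the number (e.g. 42436)."""
    try:
        return lookup(ident)[0]
    except KeyError:
        return str(ident)


def snapshot_ownership(root):
    """Map each directory and file under root to its (user, group).

    Uses lstat, so symlinks are reported themselves and not followed.
    """
    owners = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for path in [dirpath] + [os.path.join(dirpath, f) for f in filenames]:
            st = os.lstat(path)
            owners[path] = (
                _name(pwd.getpwuid, st.st_uid),
                _name(grp.getgrgid, st.st_gid),
            )
    return owners


def ownership_changes(before, after):
    """Return {path: (old_owner, new_owner)} for paths present in both
    snapshots whose (user, group) pair differs."""
    return {
        path: (before[path], after[path])
        for path in sorted(before.keys() & after.keys())
        if before[path] != after[path]
    }
```

A possible use: call `snapshot_ownership("/var/lib/nova/mnt")` on the compute node, run `docker restart nova_compute`, take a second snapshot, and pass both to `ownership_changes`. On an affected system one would expect the volume files to show up as changed from qemu to nova.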
(In reply to PetarJ from comment #5)
> More precise, this affects not only ephemerals, but also other nfs mounts.

With the fix from this BZ, the owner of attached nfs cinder volumes does not change. The user is still qemu after a restart of the nova_compute container:

[root@compute-1 mnt]# pwd
/var/lib/nova/mnt
[root@compute-1 mnt]# ll -R
.:
total 0
drwxrwxrwx. 2 42436 42436 108 Oct 23 09:34 b4e49454a0d6fed499c0980f2e484733

./b4e49454a0d6fed499c0980f2e484733:
total 0
-rw-rw-rw-. 1 qemu qemu 1073741824 Oct 23 09:34 volume-08d882e4-0465-4b9c-9cf7-c9f44a804b79
-rw-rw-rw-. 1 qemu qemu 1073741824 Oct 23 09:34 volume-7c96d587-b10a-46be-8637-60446942a846

Note: tested with the default nfs cinder backend, not netapp, but from the compute point of view this is the same.

Please let us know if you see issues with attached cinder volumes.
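This thread does not show how the fix works internally. Purely as an illustration of the difference between a blanket recursive chown and a selective one, here is a minimal sketch under that assumption: ownership is changed only for entries that currently belong to one specific old uid, so files owned by qemu or cinder are never touched. The function names and the uid-only matching rule are hypothetical and are not taken from the actual templates.

```python
import os


def entries_owned_by(root, uid):
    """Return sorted paths under root (root included) whose owner uid equals uid."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for path in [dirpath] + [os.path.join(dirpath, f) for f in filenames]:
            # lstat: inspect the entry itself, do not follow symlinks
            if os.lstat(path).st_uid == uid:
                matches.append(path)
    return sorted(matches)


def chown_selectively(root, old_uid, new_uid, new_gid, dry_run=True):
    """Chown only the entries owned by old_uid; leave everything else alone.

    Returns the list of paths that were (or, with dry_run, would be) changed.
    """
    targets = entries_owned_by(root, old_uid)
    if not dry_run:
        for path in targets:
            os.lchown(path, new_uid, new_gid)
    return targets
```

With `dry_run=True` the function only reports what it would change, which makes it safe to try against a directory such as /var/lib/nova before applying anything.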
We can confirm that the bugfix rpm openstack-tripleo-heat-templates-7.0.12-12.el7ost fixes this issue for the netapp backend as well (as you've said, same thing).
Thank you all for the support in fixing this :)

[[dev]root@overcloud-compute-0 ~]# virsh dumpxml instance-00000017|grep mnt
<source file='/var/lib/nova/mnt/f2c2277183706b16e4a0618b7d88140e/volume-b18081cc-5e15-4e29-9a9d-b84a1dafa2ff'/>

[[dev]root@overcloud-compute-0 ~]# ls -al /var/lib/nova/mnt/f2c2277183706b16e4a0618b7d88140e/volume-b18081cc-5e15-4e29-9a9d-b84a1dafa2ff
-rw-rw-rw-. 1 qemu qemu 10737418240 Nov 9 11:59 /var/lib/nova/mnt/f2c2277183706b16e4a0618b7d88140e/volume-b18081cc-5e15-4e29-9a9d-b84a1dafa2ff

[[dev]root@overcloud-compute-0 ~]# docker restart nova_compute
nova_compute

[[dev]root@overcloud-compute-0 ~]# ls -al /var/lib/nova/mnt/f2c2277183706b16e4a0618b7d88140e/volume-b18081cc-5e15-4e29-9a9d-b84a1dafa2ff
-rw-rw-rw-. 1 qemu qemu 10737418240 Nov 9 11:59 /var/lib/nova/mnt/f2c2277183706b16e4a0618b7d88140e/volume-b18081cc-5e15-4e29-9a9d-b84a1dafa2ff
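The first step of the verification above, finding the volume files an instance uses, can be done without grep by parsing the domain XML. This is a minimal sketch, assuming the XML has the usual libvirt shape with `<source file='...'/>` elements; the function name `disk_source_files` and the default prefix are hypothetical choices for illustration.

```python
import xml.etree.ElementTree as ET


def disk_source_files(domain_xml, prefix="/var/lib/nova/mnt/"):
    """Return the file paths of all <source file=...> elements in the
    domain XML whose path starts with prefix, in document order."""
    root = ET.fromstring(domain_xml)
    return [
        src.get("file")
        for src in root.iter("source")
        if src.get("file", "").startswith(prefix)
    ]
```

The input would be the text printed by `virsh dumpxml <instance>`; each returned path can then be checked with `ls -al` before and after the container restart.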
(In reply to PetarJ from comment #11)
> We can confirm that the bugfix rpm
> openstack-tripleo-heat-templates-7.0.12-12.el7ost
> fixes this issue also for netapp backend (as you've said, same thing).
> Thank you all for the support in fixing this :)

Thanks a lot for the feedback and confirmation that the issue is fixed!
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:3789