Description of problem:
Updating openstack-nova-common on the compute nodes breaks nova permissions in the containers: the /var/lib/nova/instances directory gets chowned back to UID:GID 162 (the host nova user from the RPM) instead of 42436 (the nova user inside the kolla-based containers).

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Update openstack-nova-compute on the computes

Actual results:
Failure to attach volumes to new VMs

Expected results:
Success

Additional info:
From the "hypervisor" we should see 42436:42436 here instead of 162: [root@overcloud-compute-0 nova]# ls -lanr total 8 drwxr-xr-x. 2 162 162 6 Oct 24 19:40 tmp drwxr-xr-x. 2 162 162 6 Oct 24 19:40 networks drwxr-xr-x. 2 162 162 6 Oct 24 19:40 keys drwxr-xr-x. 3 162 162 40 Oct 24 19:40 instances drwxr-xr-x. 2 162 162 6 Oct 24 19:40 buckets -rw-------. 1 42436 42436 103 Dec 5 21:01 .bash_history drwxr-xr-x. 94 0 0 4096 Dec 3 20:25 .. drwxr-xr-x. 7 162 162 98 Dec 5 21:11 . It looks like upgrading openstack-nova-common is the cause of this behavior.
When hitting this bug, VM creation fails with the following scheduling error message:

["Build of instance 05aa1a94-248b-4a04-9d86-7bb9fc1857cb was re-scheduled: [Errno 13] Permission denied: '/var/lib/nova/instances/05aa1a94-248b-4a04-9d86-7bb9fc1857cb'\n"]
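To confirm a compute is affected, the same traceback should show up in the nova-compute log on the host (the path below assumes the default containerized OSP 13 layout):

[root@overcloud-compute-0 ~]# grep "Permission denied: '/var/lib/nova/instances" /var/log/containers/nova/nova-compute.log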
We should probably remove that package from the computes to avoid this becoming a big problem for customers updating to the latest packages. It looks like a legacy requirement in the way we build the overcloud images; we could simply remove the package from the image and make sure it is uninstalled via puppet/ansible.
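A minimal manual sketch of that cleanup, assuming the host-side nova packages can simply be removed on computes where nova only runs in containers (python2-novaclient and puppet-nova are left in place, matching the verified state below):

[root@overcloud-compute-0 ~]# yum -y remove 'openstack-nova-*' python-nova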
Verification steps:

* Deploy osp13z4
* On the compute node, nova packages are installed:

[heat-admin@compute-0 ~]$ rpm -qa | grep nova
openstack-nova-conductor-17.0.7-5.el7ost.noarch
python-nova-17.0.7-5.el7ost.noarch
openstack-nova-scheduler-17.0.7-5.el7ost.noarch
openstack-nova-migration-17.0.7-5.el7ost.noarch
openstack-nova-common-17.0.7-5.el7ost.noarch
openstack-nova-console-17.0.7-5.el7ost.noarch
openstack-nova-placement-api-17.0.7-5.el7ost.noarch
openstack-nova-novncproxy-17.0.7-5.el7ost.noarch
python2-novaclient-10.1.0-1.el7ost.noarch
puppet-nova-12.4.0-14.el7ost.noarch
openstack-nova-compute-17.0.7-5.el7ost.noarch
openstack-nova-api-17.0.7-5.el7ost.noarch

* In the nova_compute container, file ownership is as follows:

[heat-admin@compute-0 ~]$ sudo docker exec nova_compute ls -lrn /var/lib/nova
total 0
drwxr-xr-x. 2 42436 42436  6 Dec 21 10:37 tmp
drwxr-xr-x. 2 42436 42436  6 Dec 21 10:37 networks
drwxr-xr-x. 2 42436 42436  6 Dec 21 10:37 keys
drwxr-xr-x. 5 42436 42436 97 Feb 27 08:20 instances
drwxr-xr-x. 2 42436 42436  6 Dec 21 10:37 buckets

* Update to 13z5
* nova packages are removed from the compute node:

[heat-admin@compute-0 ~]$ rpm -qa | grep nova
python2-novaclient-10.1.0-1.el7ost.noarch
puppet-nova-12.4.0-14.el7ost.noarch

* File ownership is correct inside the container:

[heat-admin@compute-0 ~]$ sudo docker exec nova_compute ls -lrn /var/lib/nova
total 0
drwxr-xr-x. 6 42436 42436 141 Feb 28 13:00 instances
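As an extra sanity check (the image, flavor, and network names below are placeholders), booting a new instance on the updated compute should now succeed instead of being re-scheduled with EACCES:

[stack@undercloud ~]$ openstack server create --image cirros --flavor m1.tiny --network private test-vm
[stack@undercloud ~]$ openstack server show test-vm -f value -c status

The reported status should reach ACTIVE.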
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:0448