Description of problem:
Controller replacement (controller removal) fails because /var/log/containers/nova/nova-manage.log is owned by root:root.

Version-Release number of selected component (if applicable):
OSP14 2019-02-27.1

How reproducible:
Always

Steps to Reproduce:
Via automation, run:
https://rhos-qe-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/view/DFG/view/network/view/octavia/job/DFG-network-octavia-14_director-rhel-virthost-3cont_2comp-ipv4-vxlan-controller_replacement-normal/

Manually:
1. Deploy an HA OSP14.
2. Try to remove one controller: rerun overcloud_deploy.sh with an added:
   -e /home/stack/remove-controller.yaml \

   cat /home/stack/remove-controller.yaml
   parameters:
     ControllerRemovalPolicies: [{'resource_list': ['0']}]

Actual results:
Overcloud controller removal fails with:
http://pastebin.test.redhat.com/731244

The controller-1 nova_api container fails to start because (from the deployment log file):
"IOError: [Errno 13] Permission denied: '/var/log/nova/nova-manage.log'"

Expected results:
Controller removal succeeds, finishes without errors, and all overcloud agents are up and operational.
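For illustration, a minimal sketch of the rerun in step 2, assuming overcloud_deploy.sh wraps "openstack overcloud deploy"; the other environment files are site-specific and shown here only as a placeholder:

  # Rerun the original deploy command with the removal policy appended;
  # every other -e file must match the original deployment exactly.
  openstack overcloud deploy --templates \
    -e <environment files from the original deployment> \
    -e /home/stack/remove-controller.yaml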
sosreports and the stack user's home directory are at:
http://rhos-release.virt.bos.redhat.com/log/pkomarov_sosreports/BZ1685506/
As can be seen below, the rest of nova's container logs are owned by the kolla nova user (uid 42436), as they should be, but nova-manage.log is owned by root:

[root@controller-1 ~]# ls -l /var/log/containers/nova
total 33072
-rw-r--r--. 1 42436 42436  6120842 Mar  5 10:41 nova-api.log
-rw-r--r--. 1 42436 42436 10828856 Mar  5 09:00 nova-api.log.1
[...]
-rw-r--r--. 1 root  root         0 Mar  4 17:23 nova-manage.log
-rw-r--r--. 1 42436 42436        0 Mar  5 00:01 nova-metadata-api.log
-rw-r--r--. 1 42436 42436   761548 Mar  5 00:01 nova-metadata-api.log.1

[stack@undercloud-0 ~]$ ansible controller-1 -mshell -b -a'ls -l /var/log/containers/nova|grep manage'
controller-1 | SUCCESS | rc=0 >>
-rw-r--r--. 1 root root 0 Mar 4 17:23 nova-manage.log

[stack@undercloud-0 ~]$ ansible controller-2 -mshell -b -a'ls -l /var/log/containers/nova|grep manage'
controller-2 | SUCCESS | rc=0 >>
-rw-r--r--. 1 42436 42436 0 Mar 5 00:01 nova-manage.log
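A possible manual workaround, assuming uid/gid 42436 is the kolla nova user on this deployment (as the other log files above suggest), is to chown the file back before retrying the stack update:

  [stack@undercloud-0 ~]$ ansible controller-1 -mshell -b \
    -a'chown 42436:42436 /var/log/containers/nova/nova-manage.log'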
Hello Pini,

Reproduced on my env too:
https://rhos-ci-staging-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/DFG-df-controller_replacement-14-virthost-3cont_3comp_3ceph-yes_UC_SSL-yes_OC_SSL-ceph-ipv4-vxlan-replace_controller-RHELOSP-31864/

OSP14 puddle: 2019-02-27.1
As the sosreports are missing the system logs, I tried to reproduce the issue with 2019-02-27.1, but I don't see the wrong permissions on the nova-manage log.

After deploy, the only nova-manage log on controller-0:

[root@controller-0 ~]# ls -la /var/log/containers/nova/ |grep manage
-rw-r--r--. 1 42436 42436      0 Mar 15 00:00 nova-manage.log
-rw-r--r--. 1 42436 42436 274848 Mar 15 00:00 nova-manage.log.1

After replacement:

(undercloud) [stack@undercloud-0 ~]$ nova list
+--------------------------------------+--------------+--------+------------+-------------+------------------------+
| ID                                   | Name         | Status | Task State | Power State | Networks               |
+--------------------------------------+--------------+--------+------------+-------------+------------------------+
| 0ab774a5-233f-46e9-a429-0949b969d6db | compute-0    | ACTIVE | -          | Running     | ctlplane=192.168.24.10 |
| 51344592-2b97-455f-a936-b7826eae7b30 | controller-0 | ACTIVE | -          | Running     | ctlplane=192.168.24.8  |
| 0c5fe3a4-8a66-4f45-b85b-0855b02f277f | controller-2 | ACTIVE | -          | Running     | ctlplane=192.168.24.21 |
| 4a046882-093e-4e00-8bed-46fcf6e72603 | controller-3 | ACTIVE | -          | Running     | ctlplane=192.168.24.18 |
+--------------------------------------+--------------+--------+------------+-------------+------------------------+

[root@controller-0 ~]# ls -la /var/log/containers/nova/ |grep manage
-rw-r--r--. 1 42436 42436      0 Mar 15 00:00 nova-manage.log
-rw-r--r--. 1 42436 42436 274848 Mar 15 00:00 nova-manage.log.1

Does that job run any nova-manage commands as root outside the tripleo workflow? If one initially got triggered on controller-1 as root, the nova-manage log would get created owned by root (as in the description), and then the reported issue can happen. In any case, we'll submit a patch to chown the logs in case something gets triggered as root manually.
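For context, a minimal sketch of what such a chown fix could look like, assuming it lands as a shell step run on the host before the nova containers start (the placement is hypothetical; the actual tripleo-heat-templates patch may differ):

  # Hypothetical host-prep snippet: ensure all nova log files are owned by
  # the kolla nova uid/gid (42436), even if nova-manage previously ran as root.
  chown -R 42436:42436 /var/log/containers/nova/ || true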
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0878
Reopening, as the fix (openstack-tripleo-heat-templates-9.3.1-0.20190314162764.d0a6cb1.el7ost) is not included in the errata.
We cannot reopen BZs which are mapped to an errata; created bug 1707816 to track the release.