Description Martin Schuppert
2019-03-20 09:02:52 UTC
+++ This bug was initially created as a clone of Bug #1685506 +++
Description of problem:
Controller replacement fails (at the controller-removal step) because /var/log/containers/nova/nova-manage.log is owned by root:root
Version-Release number of selected component (if applicable):
OSP14 2019-02-27.1
How reproducible:
always
Steps to Reproduce:
Via automation:
run : https://rhos-qe-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/view/DFG/view/network/view/octavia/job/DFG-network-octavia-14_director-rhel-virthost-3cont_2comp-ipv4-vxlan-controller_replacement-normal/
Manually:
1. Deploy an HA OSP14.
2. Try to remove one controller:
rerun overcloud_deploy.sh with an added:
-e /home/stack/remove-controller.yaml \
cat /home/stack/remove-controller.yaml
parameters:
  ControllerRemovalPolicies:
    [{'resource_list': ['0']}]
Actual results:
Overcloud controller removal fails with :
http://pastebin.test.redhat.com/731244
Controller-1 nova_api container fails to start because :
"IOError: [Errno 13] Permission denied: '/var/log/nova/nova-manage.log'",
(in deployment log file)
Expected results:
Controller removal succeeds, finishes without errors, and all overcloud
agents are up and operational.
--- Additional comment from on 2019-03-05 11:05:18 UTC ---
sos reports and stack home are at :
http://rhos-release.virt.bos.redhat.com/log/pkomarov_sosreports/BZ1685506/
--- Additional comment from on 2019-03-05 11:07:24 UTC ---
As can be seen below, the rest of nova's container logs are owned by the Kolla user (uid 42436), as they should be,
but nova-manage.log is owned by root:
[root@controller-1 ~]# ls -l /var/log/containers/nova
total 33072
-rw-r--r--. 1 42436 42436 6120842 Mar 5 10:41 nova-api.log
-rw-r--r--. 1 42436 42436 10828856 Mar 5 09:00 nova-api.log.1
[...]
-rw-r--r--. 1 root root 0 Mar 4 17:23 nova-manage.log
-rw-r--r--. 1 42436 42436 0 Mar 5 00:01 nova-metadata-api.log
-rw-r--r--. 1 42436 42436 761548 Mar 5 00:01 nova-metadata-api.log.1
[stack@undercloud-0 ~]$ ansible controller-1 -mshell -b -a'ls -l /var/log/containers/nova|grep manage'
controller-1 | SUCCESS | rc=0 >>
-rw-r--r--. 1 root root 0 Mar 4 17:23 nova-manage.log
[stack@undercloud-0 ~]$ ansible controller-2 -mshell -b -a'ls -l /var/log/containers/nova|grep manage'
controller-2 | SUCCESS | rc=0 >>
-rw-r--r--. 1 42436 42436 0 Mar 5 00:01 nova-manage.log
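The ownership check in the listings above could also be scripted. The following is only a minimal sketch, assuming it runs on the controller host with read access to the log directory. The function name find_misowned is hypothetical; 42436 is the uid/gid seen on the other nova logs above.

```python
from pathlib import Path

# uid/gid that own the other nova logs in the listings above.
NOVA_UID = 42436
NOVA_GID = 42436


def find_misowned(log_dir, uid=NOVA_UID, gid=NOVA_GID):
    """Return (path, uid, gid) for each regular file not owned by uid:gid."""
    misowned = []
    for entry in sorted(Path(log_dir).iterdir()):
        if not entry.is_file():
            continue  # skip subdirectories and other non-regular entries
        st = entry.stat()
        if st.st_uid != uid or st.st_gid != gid:
            misowned.append((str(entry), st.st_uid, st.st_gid))
    return misowned
```

Calling find_misowned("/var/log/containers/nova") on controller-1 would be expected to report nova-manage.log, given the root:root ownership shown above.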
--- Additional comment from Artem Hrechanychenko on 2019-03-06 15:00:15 UTC ---
Hello Pini,
reproduced on my env too - https://rhos-ci-staging-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/DFG-df-controller_replacement-14-virthost-3cont_3comp_3ceph-yes_UC_SSL-yes_OC_SSL-ceph-ipv4-vxlan-replace_controller-RHELOSP-31864/
OSP14 puddle - 2019-02-27.1
--- Additional comment from Martin Schuppert on 2019-03-15 11:01:31 UTC ---
As the sosreports are missing the system logs, I tried to reproduce the issue with 2019-02-27.1, but I don't see the wrong permissions on the nova-manage log.
After deploy:
The only nova-manage log on controller-0:
[root@controller-0 ~]# ls -la /var/log/containers/nova/ |grep manage
-rw-r--r--. 1 42436 42436 0 Mar 15 00:00 nova-manage.log
-rw-r--r--. 1 42436 42436 274848 Mar 15 00:00 nova-manage.log.1
After replacement:
(undercloud) [stack@undercloud-0 ~]$ nova list
+--------------------------------------+--------------+--------+------------+-------------+------------------------+
| ID                                   | Name         | Status | Task State | Power State | Networks               |
+--------------------------------------+--------------+--------+------------+-------------+------------------------+
| 0ab774a5-233f-46e9-a429-0949b969d6db | compute-0    | ACTIVE | -          | Running     | ctlplane=192.168.24.10 |
| 51344592-2b97-455f-a936-b7826eae7b30 | controller-0 | ACTIVE | -          | Running     | ctlplane=192.168.24.8  |
| 0c5fe3a4-8a66-4f45-b85b-0855b02f277f | controller-2 | ACTIVE | -          | Running     | ctlplane=192.168.24.21 |
| 4a046882-093e-4e00-8bed-46fcf6e72603 | controller-3 | ACTIVE | -          | Running     | ctlplane=192.168.24.18 |
+--------------------------------------+--------------+--------+------------+-------------+------------------------+
[root@controller-0 ~]# ls -la /var/log/containers/nova/ |grep manage
-rw-r--r--. 1 42436 42436 0 Mar 15 00:00 nova-manage.log
-rw-r--r--. 1 42436 42436 274848 Mar 15 00:00 nova-manage.log.1
Does that job run any nova-manage commands as root outside the tripleo workflow? If one was initially triggered on Controller-1 as root, the nova-manage log would be created owned by root, as shown in the description, and the reported issue could then occur.
In any case, we'll submit a patch to chown the logs in case something gets triggered manually as root.
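The patch itself is not shown in this report. Purely as an illustration of what chowning the logs amounts to, here is a minimal sketch, assuming root privileges on the host and the 42436 uid/gid from the listings above. The function fix_log_ownership is hypothetical and is not the openstack-tripleo-heat-templates change.

```python
import os
from pathlib import Path


def fix_log_ownership(log_dir, uid=42436, gid=42436, dry_run=False):
    """Chown regular files in log_dir to uid:gid; return the paths touched."""
    changed = []
    for entry in sorted(Path(log_dir).iterdir()):
        # Only handle plain files; leave symlinks and directories alone.
        if entry.is_symlink() or not entry.is_file():
            continue
        st = entry.stat()
        if (st.st_uid, st.st_gid) == (uid, gid):
            continue  # already owned correctly
        if not dry_run:
            os.chown(entry, uid, gid)  # needs root when changing the owner
        changed.append(str(entry))
    return changed
```

With dry_run=True the function only reports which files it would change, which is a safer first step on a production controller.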
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2019:0939
Comment 14 Martin Schuppert
2019-05-02 22:07:20 UTC
Reopening, as the fix (openstack-tripleo-heat-templates-8.3.1-16.el7ost) is not included in the erratum.
Comment 15 Martin Schuppert
2019-05-08 13:13:38 UTC
We cannot reopen BZs that are mapped to an erratum; created bug 1707817 to track the release.