Bug 1690787 - [OSP13] Controller-replacement fails (controller-removal) because : /var/log/containers/nova/nova-manage.log is owned by root:root
Summary: [OSP13] Controller-replacement fails (controller-removal) because : /var/log/...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 14.0 (Rocky)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: z7
Target Release: 13.0 (Queens)
Assignee: Martin Schuppert
QA Contact: Joe H. Rahme
URL:
Whiteboard:
Depends On: 1685506 1707816
Blocks: 1690784 1707817
Reported: 2019-03-20 09:02 UTC by Martin Schuppert
Modified: 2019-05-08 13:13 UTC (History)
CC List: 11 users

Fixed In Version: openstack-tripleo-heat-templates-8.3.1-16.el7ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1685506
Clones: 1707817
Environment:
Last Closed: 2019-05-08 13:13:38 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Launchpad | 1820590 | 0 | None | None | None | 2019-03-20 09:02:52 UTC
OpenStack gerrit | 644548 | 0 | None | MERGED | Run chown for nova log files on every run to fix wrong permissions | 2020-06-05 10:11:20 UTC
Red Hat Product Errata | RHBA-2019:0939 | 0 | None | None | None | 2019-04-30 17:28:03 UTC

Description Martin Schuppert 2019-03-20 09:02:52 UTC
+++ This bug was initially created as a clone of Bug #1685506 +++

Description of problem:
Controller-replacement fails (controller-removal) because : /var/log/containers/nova/nova-manage.log is owned by root:root

Version-Release number of selected component (if applicable):
OSP14 2019-02-27.1

How reproducible:
always

Steps to Reproduce:

Via automation:
Run: https://rhos-qe-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/view/DFG/view/network/view/octavia/job/DFG-network-octavia-14_director-rhel-virthost-3cont_2comp-ipv4-vxlan-controller_replacement-normal/

Manually:
1. Deploy an HA OSP14 overcloud.
2. Try to remove one controller by rerunning overcloud_deploy.sh with this added argument:
-e /home/stack/remove-controller.yaml \

cat /home/stack/remove-controller.yaml
parameters:
  ControllerRemovalPolicies:
    [{'resource_list': ['0']}]


Actual results:
Overcloud controller removal fails with:
http://pastebin.test.redhat.com/731244
The nova_api container on controller-1 fails to start because of the following error (from the deployment log file):

  "IOError: [Errno 13] Permission denied: '/var/log/nova/nova-manage.log'",

Expected results:
Controller removal succeeds, finishes without errors, and all overcloud
agents are up and operational.

--- Additional comment from  on 2019-03-05 11:05:18 UTC ---

sos reports and stack home are at : 
http://rhos-release.virt.bos.redhat.com/log/pkomarov_sosreports/BZ1685506/

--- Additional comment from  on 2019-03-05 11:07:24 UTC ---

As can be seen below, the rest of the nova container logs are owned by the Kolla user (UID 42436), as they should be,
but nova-manage.log is owned by root:

[root@controller-1 ~]# ls -l /var/log/containers/nova
total 33072
-rw-r--r--. 1 42436 42436  6120842 Mar  5 10:41 nova-api.log
-rw-r--r--. 1 42436 42436 10828856 Mar  5 09:00 nova-api.log.1
[...]
-rw-r--r--. 1 root  root         0 Mar  4 17:23 nova-manage.log
-rw-r--r--. 1 42436 42436        0 Mar  5 00:01 nova-metadata-api.log
-rw-r--r--. 1 42436 42436   761548 Mar  5 00:01 nova-metadata-api.log.1


[stack@undercloud-0 ~]$ ansible controller-1 -mshell -b -a'ls -l /var/log/containers/nova|grep manage'

controller-1 | SUCCESS | rc=0 >>
-rw-r--r--. 1 root  root         0 Mar  4 17:23 nova-manage.log

[stack@undercloud-0 ~]$ ansible controller-2 -mshell -b -a'ls -l /var/log/containers/nova|grep manage'

controller-2 | SUCCESS | rc=0 >>
-rw-r--r--. 1 42436 42436        0 Mar  5 00:01 nova-manage.log
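
The manual check above can be generalized. The following is a minimal sketch, not part of the original report: find_wrong_owner is a hypothetical helper name, and the default UID 42436 is simply the owner seen in the listings above. It prints every regular file under a log directory whose owner differs from the expected numeric UID.

```shell
# Hypothetical helper (an assumption, not from the bug report):
# list regular files under a directory that are NOT owned by the
# expected numeric UID. Defaults to 42436, the UID seen in the
# listings above.
find_wrong_owner() (
    log_dir="$1"
    expected_uid="${2:-42436}"
    # find's -user test accepts a numeric UID; '!' negates the match.
    find "$log_dir" -type f ! -user "$expected_uid" -print
)

# Example invocation on a controller (run as root or via sudo):
#   find_wrong_owner /var/log/containers/nova 42436
```

An empty result would mean every log file in the directory has the expected owner.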

--- Additional comment from Artem Hrechanychenko on 2019-03-06 15:00:15 UTC ---

Hello Pini,
Reproduced in my environment too: https://rhos-ci-staging-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/DFG-df-controller_replacement-14-virthost-3cont_3comp_3ceph-yes_UC_SSL-yes_OC_SSL-ceph-ipv4-vxlan-replace_controller-RHELOSP-31864/

OSP14 puddle - 2019-02-27.1

--- Additional comment from Martin Schuppert on 2019-03-15 11:01:31 UTC ---

As the sosreports are missing the system logs, I tried to reproduce the issue with 2019-02-27.1, but I don't see the wrong permissions on the nova-manage log.

After deploy:

The only nova-manage log on controller-0:

[root@controller-0 ~]#  ls -la /var/log/containers/nova/ |grep manage
-rw-r--r--.  1 42436 42436        0 Mar 15 00:00 nova-manage.log
-rw-r--r--.  1 42436 42436   274848 Mar 15 00:00 nova-manage.log.1

After replacement:

(undercloud) [stack@undercloud-0 ~]$ nova list
+--------------------------------------+--------------+--------+------------+-------------+------------------------+
| ID                                   | Name         | Status | Task State | Power State | Networks               |
+--------------------------------------+--------------+--------+------------+-------------+------------------------+
| 0ab774a5-233f-46e9-a429-0949b969d6db | compute-0    | ACTIVE | -          | Running     | ctlplane=192.168.24.10 |
| 51344592-2b97-455f-a936-b7826eae7b30 | controller-0 | ACTIVE | -          | Running     | ctlplane=192.168.24.8  |
| 0c5fe3a4-8a66-4f45-b85b-0855b02f277f | controller-2 | ACTIVE | -          | Running     | ctlplane=192.168.24.21 |
| 4a046882-093e-4e00-8bed-46fcf6e72603 | controller-3 | ACTIVE | -          | Running     | ctlplane=192.168.24.18 |
+--------------------------------------+--------------+--------+------------+-------------+------------------------+

[root@controller-0 ~]# ls -la /var/log/containers/nova/ |grep manage
-rw-r--r--.  1 42436 42436        0 Mar 15 00:00 nova-manage.log
-rw-r--r--.  1 42436 42436   274848 Mar 15 00:00 nova-manage.log.1

Does that job run any nova-manage commands as root outside the TripleO workflow? If one was initially triggered as root on controller-1, nova-manage.log would be created with root ownership, as shown in the description, and the reported issue could then occur.

In any case, we'll submit a patch to chown the logs in case something gets triggered manually as root.
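
As an illustration of the idea only (this is not the merged patch, whose contents are not shown in this report), resetting the ownership of the log directory on every run could be sketched as below. fix_log_ownership is a hypothetical name, and the 42436:42436 default is taken from the listings earlier in this bug.

```shell
# Sketch under stated assumptions, NOT the actual tripleo-heat-templates
# change: recursively reset ownership of a log directory.
# Changing ownership to another user requires root.
fix_log_ownership() (
    log_dir="$1"
    owner="${2:-42436:42436}"   # UID:GID seen in the listings above
    chown -R "$owner" "$log_dir"
)

# Example invocation on a controller (run as root or via sudo):
#   fix_log_ownership /var/log/containers/nova 42436:42436
```

Running the chown unconditionally keeps the step idempotent: files that already have the right owner are left unchanged, while a root-owned nova-manage.log is corrected.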

Comment 13 errata-xmlrpc 2019-04-30 17:27:39 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0939

Comment 14 Martin Schuppert 2019-05-02 22:07:20 UTC
Reopening, as the fix openstack-tripleo-heat-templates-8.3.1-16.el7ost is not included in the errata.

Comment 15 Martin Schuppert 2019-05-08 13:13:38 UTC
We cannot reopen BZs that are mapped to an erratum; created 1707817 to track the release.

