Bug 1640371 - openstack overcloud update fails due to bad permissions on cinder-manage.log -- Permission denied: '/var/log/cinder/cinder-manage.log'
Summary: openstack overcloud update fails due to bad permissions on cinder-manage.log ...
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 10.0 (Newton)
Hardware: Unspecified
OS: Unspecified
low
low
Target Milestone: z12
: 10.0 (Newton)
Assignee: Sofer Athlan-Guyot
QA Contact: Gurenko Alex
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2018-10-17 22:23 UTC by Andreas Karis
Modified: 2021-12-10 18:08 UTC (History)
18 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1595315
Environment:
Last Closed: 2019-07-02 09:28:20 UTC
Target Upstream Version:
Embargoed:
tshefi: automate_bug-


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker OSP-11714 0 None Closed RHEL EUS Errata Documentation 2022-06-16 06:26:55 UTC
Red Hat Issue Tracker UPG-4873 0 None None None 2021-12-10 18:08:42 UTC

Comment 2 Andreas Karis 2018-10-17 22:34:28 UTC
This environment was deployed as OSP 10.
i)  We ran a controller replacement procedure https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/10/html/director_installation_and_usage/sect-scaling_the_overcloud#sect-Replacing_Controller_Nodes  (SFDC 02199264)
ii) We then fixed an issue in the roles .yaml file and redeployed (SFDC 02192627)

https://bugzilla.redhat.com/show_bug.cgi?id=1595315#c11
~~~
 Alan Bishop 2018-06-28 08:34:46 EDT

Further analysis reveals this problem will only occur in an artificial
environment in which the upgrade to OSP-10 occurs immediately after the
initial deployment of OSP-9. If the OSP-9 deployment runs for a day, cinder's
db_purge cron job will run, and that logs to cinder-manage.log. This has the
side effect of creating the cinder-manage.log file with the correct 'cinder'
ownership prior to the OSP-10 upgrade. So, when upgrading to OSP-10, the
log file will already have been created with the proper ownership that will
allow the OSP-10 db_sync operation to succeed.

With this in mind, I'm lowering the BZ priority and severity.

To fully address the problem, puppet-tripleo needs to be fixed to stop
running db_sync on non-bootstrap nodes. I've filed a bug upstream, and will
soon submit a patch to fix it.
~~~

Wouldn't the correct solution be to make sure that the file permissions of  /var/log/cinder/cinder-manage.log file are cinder:cinder?
E.g., add to the puppet manifest an instruction that sets the permissions of files in /var/log/cinder to cinder:cinder if this condition is not met.

Thanks,

Andreas

Comment 3 Andreas Karis 2018-10-17 22:37:28 UTC
Actually we ran cinder-manage manually. Here's how to reproduce this:

sudo [heat-admin@overcloud-controller-0 ~]$ sudo -i
[root@overcloud-controller-0 ~]# 
[root@overcloud-controller-0 ~]# 
[root@overcloud-controller-0 ~]# 
[root@overcloud-controller-0 ~]# 
[root@overcloud-controller-0 ~]# ls /var/log/cinder/
api.log               scheduler.log              volume.log              volume.log-20180922.gz
api.log-20180908.gz   scheduler.log-20180911.gz  volume.log-20180916.gz
cinder-manage.log     scheduler.log-20180916.gz  volume.log-20180918.gz
cinder-rowsflush.log  scheduler.log-20180921.gz  volume.log-20180920.gz
[root@overcloud-controller-0 ~]# ls /var/log/cinder/ -alh
total 8.0M
drwxr-x---.  2 cinder root   4.0K Sep 22 03:35 .
drwxr-xr-x. 41 root   root   4.0K Oct 14 03:35 ..
-rw-r--r--.  1 cinder cinder 244K Oct 16 18:44 api.log
-rw-r--r--.  1 cinder cinder 841K Sep  7 22:02 api.log-20180908.gz
-rw-r--r--.  1 cinder cinder 417K Oct 17 00:01 cinder-manage.log
-rw-r--r--.  1 cinder cinder 416K Oct 17 00:01 cinder-rowsflush.log
-rw-r--r--.  1 cinder cinder 1.7M Oct 16 18:44 scheduler.log
-rw-r--r--.  1 cinder cinder 749K Sep 11 03:17 scheduler.log-20180911.gz
-rw-r--r--.  1 cinder cinder 860K Sep 16 03:46 scheduler.log-20180916.gz
-rw-r--r--.  1 cinder cinder 854K Sep 21 03:29 scheduler.log-20180921.gz
-rw-r--r--.  1 cinder cinder  14K Oct 16 18:44 volume.log
-rw-r--r--.  1 cinder cinder 526K Sep 16 03:46 volume.log-20180916.gz
-rw-r--r--.  1 cinder cinder 524K Sep 18 03:45 volume.log-20180918.gz
-rw-r--r--.  1 cinder cinder 523K Sep 20 03:20 volume.log-20180920.gz
-rw-r--r--.  1 cinder cinder 446K Sep 21 20:46 volume.log-20180922.gz
[root@overcloud-controller-0 ~]# cinder-manage  db sync
Option "logdir" from group "DEFAULT" is deprecated. Use option "log-dir" from group "DEFAULT".
[root@overcloud-controller-0 ~]# ls /var/log/cinder/ -alh
total 8.0M
drwxr-x---.  2 cinder root   4.0K Sep 22 03:35 .
drwxr-xr-x. 41 root   root   4.0K Oct 14 03:35 ..
-rw-r--r--.  1 cinder cinder 244K Oct 16 18:44 api.log
-rw-r--r--.  1 cinder cinder 841K Sep  7 22:02 api.log-20180908.gz
-rw-r--r--.  1 cinder cinder 417K Oct 17 00:01 cinder-manage.log
-rw-r--r--.  1 cinder cinder 416K Oct 17 00:01 cinder-rowsflush.log
-rw-r--r--.  1 cinder cinder 1.7M Oct 16 18:44 scheduler.log
-rw-r--r--.  1 cinder cinder 749K Sep 11 03:17 scheduler.log-20180911.gz
-rw-r--r--.  1 cinder cinder 860K Sep 16 03:46 scheduler.log-20180916.gz
-rw-r--r--.  1 cinder cinder 854K Sep 21 03:29 scheduler.log-20180921.gz
-rw-r--r--.  1 cinder cinder  14K Oct 16 18:44 volume.log
-rw-r--r--.  1 cinder cinder 526K Sep 16 03:46 volume.log-20180916.gz
-rw-r--r--.  1 cinder cinder 524K Sep 18 03:45 volume.log-20180918.gz
-rw-r--r--.  1 cinder cinder 523K Sep 20 03:20 volume.log-20180920.gz
-rw-r--r--.  1 cinder cinder 446K Sep 21 20:46 volume.log-20180922.gz
[root@overcloud-controller-0 ~]# rm -f /var/log/cinder/cinder-manage.log 
[root@overcloud-controller-0 ~]# cinder-manage  db sync
Option "logdir" from group "DEFAULT" is deprecated. Use option "log-dir" from group "DEFAULT".
[root@overcloud-controller-0 ~]# ls /var/log/cinder/ -alh
total 7.6M
drwxr-x---.  2 cinder root   4.0K Oct 17 22:36 .
drwxr-xr-x. 41 root   root   4.0K Oct 14 03:35 ..
-rw-r--r--.  1 cinder cinder 244K Oct 16 18:44 api.log
-rw-r--r--.  1 cinder cinder 841K Sep  7 22:02 api.log-20180908.gz
-rw-r--r--.  1 root   root      0 Oct 17 22:36 cinder-manage.log
-rw-r--r--.  1 cinder cinder 416K Oct 17 00:01 cinder-rowsflush.log
-rw-r--r--.  1 cinder cinder 1.7M Oct 16 18:44 scheduler.log
-rw-r--r--.  1 cinder cinder 749K Sep 11 03:17 scheduler.log-20180911.gz
-rw-r--r--.  1 cinder cinder 860K Sep 16 03:46 scheduler.log-20180916.gz
-rw-r--r--.  1 cinder cinder 854K Sep 21 03:29 scheduler.log-20180921.gz
-rw-r--r--.  1 cinder cinder  14K Oct 16 18:44 volume.log
-rw-r--r--.  1 cinder cinder 526K Sep 16 03:46 volume.log-20180916.gz
-rw-r--r--.  1 cinder cinder 524K Sep 18 03:45 volume.log-20180918.gz
-rw-r--r--.  1 cinder cinder 523K Sep 20 03:20 volume.log-20180920.gz
-rw-r--r--.  1 cinder cinder 446K Sep 21 20:46 volume.log-20180922.gz
[root@overcloud-controller-0 ~]#

Comment 6 Alan Bishop 2018-10-19 19:58:53 UTC
Removing triaged keyword until I understand what's happening.

Comment 7 Alan Bishop 2018-10-22 15:51:47 UTC
I finally tracked down where (and how) cinder-manage is invoked in a way that
causes the cinder-manage.log file to be owned by root, which in turn causes
puppet's db sync to fail.

There are three places [1], [2] and [3] where cinder-manage is run as root. The
command always opens its cinder-manage.log file, and if the file doesn't
already exist then it will be created with root as the owner. I believe [1] is
the source of cinder-manage.log files with 0 size.

[1] https://github.com/openstack/tripleo-heat-templates/blob/newton-eol/extraconfig/tasks/major_upgrade_controller_pacemaker_2.sh#L69
[2] https://github.com/openstack/tripleo-heat-templates/blob/newton-eol/extraconfig/tasks/major_upgrade_controller_pacemaker_2.sh#L72
[3] https://github.com/openstack/tripleo-heat-templates/blob/newton-eol/extraconfig/tasks/major_upgrade_controller_pacemaker_3.sh#L59

One thing that will help is if these where invoked using "sudo -u cinder," but
there's a deeper problem. All of the above instances are supposed to only be
run on the bootstrap node, and the general pattern uses this logic:

  if [[ -n $(is_bootstrap_node) ]]; then...

The "-n" tests whether the following string has a non-zero length.
Unfortunately, the is_boostrap_function emits a debut message [4], and
so the "-n" test will always be True. In other words, the code always thinks
its running on the bootstrap node.

[4] https://github.com/openstack/tripleo-heat-templates/blob/newton-eol/extraconfig/tasks/pacemaker_common_functions.sh#L37

I'd like DFG:Upgrades to take a look at this. Although the pacemaker upgrade
scripts are not used in later releases, the is_bootstrap_node issue identified
is present in upstream master.

Comment 8 Andreas Karis 2018-10-22 16:00:25 UTC
So in this particular case, my problem was different. I was troubleshooting, and cinder-manage.log was not yet created for some reason. I think that *my troubleshooting* created that file when I ran `cinder-manage db sync` with user root. Perhaps it would be useful to make sure that the owner/group/permissions on the log files are correct with puppet before calling cinder-manage.

Comment 9 Sofer Athlan-Guyot 2018-10-29 13:51:46 UTC
Hi Andreas,

so I'm wondering why we don't close the bug as it seems to be due to the fact that the command was run as root user and you already have the correct package (one version more than the one provided there https://bugzilla.redhat.com/show_bug.cgi?id=1595315)

Is there still something to be done here?

Comment 10 Andreas Karis 2018-10-29 14:12:36 UTC
Because I think that Director needs to be smart enough to deal with somebody running `cinder-manage db sync` as root. Feel free to close it if you don't agree, but my take is that given the sequence:

*) log file doesn't exist
*) admin logs in, runs `cinder-manage db sync`
*) Director is executed

This should still work and not fail.

Comment 11 Alan Bishop 2018-10-29 15:10:58 UTC
I don't think there's much that any service (or the Director) can do in the face of an admin running commands as root. My concern is when the Director itself does that, which it appears to be doing at [1] 

[1] https://github.com/openstack/tripleo-heat-templates/blob/newton-eol/extraconfig/tasks/major_upgrade_controller_pacemaker_2.sh#L69

If the cinder-manage.log file is created for the first time, then it will be owned by root, which is a problem. Normally, the log file will have already been created by the cinder user long before an upgrade takes place. But, this should happen *only* on the boostrap node.

This leads to my biggest concern, and that is the debug_log at [2] will the is_bootstrap_node behave as if it's True on non-bootstrap nodes.

[2] https://github.com/openstack/tripleo-heat-templates/blob/newton-eol/extraconfig/tasks/pacemaker_common_functions.sh#L37

Comment 12 Andreas Karis 2018-10-29 15:27:59 UTC
Director can simply chown the log file, can't it?

I had exactly the opposite discussion on another ticket where Director (or rather puppet) unconditionally restarted iptables.service when it was stopped (by an admin). Leading to a complete flush of iptables rules and an outage. 

When I said that I didn't agree with that behavior of the product (I still don't think that DIrector should simply restart iptables.service when it's down), the argument as I understood it was that Director expects a state and brings the system into that state, no matter what. 

Following that logic, Director should in this case chown the log file to the correct user if it's owned by root. In the end, I'm indifferent to this. It's highly unlikely that customers run into this issue, and if somebody hits it, this bug here exists to document it, so I'm good with whatever you decide to do :-)

Comment 13 Alan Bishop 2018-10-29 15:59:10 UTC
(In reply to Andreas Karis from comment #12)
> Director can simply chown the log file, can't it?

Sure it could, but this is one of thousands of potential conditions that might arise when something runs as root when it really shouldn't. Yes, a chown could address this particular instance, but where does it all end?

I'm trying to get to the root cause here, which I think it a combination of these factors:

1) The Director's upgrade scripts need to take care when running things as root
2) The is_bootstrap_node function may be allowing things to run on non-bootstrap nodes

Comment 14 Sofer Athlan-Guyot 2018-10-31 14:22:08 UTC
Hi,

lot of things there.

>> Director can simply chown the log file, can't it?
>
> Sure it could, but this is one of thousands of potential conditions that might arise when
> something runs as root when it really shouldn't. Yes, a chown could address this particular 
> instance, but where does it all end?

I tend to agree with Alan here.  Furthermore when https://review.openstack.org/#/c/579234/1/manifests/profile/base/cinder/api.pp will be available to osp10 (z9 with version >= puppet-tripleo-5.6.8-13.el7ost), it will reduce the likelihood of this happening as the db sync will run only on the bootstrap node.

Furthermore we can fix that cheaply during the upgrade.  See below.

So now form point 1. and 2. in Alan's recap of the situation.

> 1) The Director's upgrade scripts need to take care when running things as root

Agree, at least we have to make sure that we're not the one doing this and it's a cheap change.  Let's harden the situation there.

We can start with a patch in osp10 and then make sure it's ok upstream in newer version.

We could as well add a lit


>> This leads to my biggest concern, and that is the debug_log at [2] will the >> is_bootstrap_node behave as if it's True on non-bootstrap nodes.
>
> 2) The is_bootstrap_node function may be allowing things to run on non-bootstrap nodes

Well, this I don't get.

is_bootstrap_node is always checked for non-empty output

   -n $(is_bootstrap_node)

the function is_bootstrap_node won't emit anything on non bootstrap node, log_debug or not, as the call to log debug is within the "if/fi":

    function is_bootstrap_node {
      if [ "$(hiera -c /etc/puppet/hiera.yaml bootstrap_nodeid | tr '[:upper:]' '[:lower:]')" = "$(facter hostname | tr '[:upper:]' '[:lower:]')" ]; then
# -> something on the stdout
        log_debug "Node is bootstrap"
        echo "true"
      fi
# -> nothing on the stdout
    }

So not sure what I'm missing here?

In the meatime I'll propose a patch to prevent osp10 upgrade script from running cinder-manage as root and make sure the file has enough permission

Comment 16 Alan Bishop 2018-11-16 16:00:08 UTC
(In reply to Sofer Athlan-Guyot from comment #14)
> <snip>
> Well, this I don't get.
> 
> is_bootstrap_node is always checked for non-empty output
> 
>    -n $(is_bootstrap_node)
> 
> the function is_bootstrap_node won't emit anything on non bootstrap node,
> log_debug or not, as the call to log debug is within the "if/fi":
> 
>     function is_bootstrap_node {
>       if [ "$(hiera -c /etc/puppet/hiera.yaml bootstrap_nodeid | tr
> '[:upper:]' '[:lower:]')" = "$(facter hostname | tr '[:upper:]'
> '[:lower:]')" ]; then
> # -> something on the stdout
>         log_debug "Node is bootstrap"
>         echo "true"
>       fi
> # -> nothing on the stdout
>     }
> 
> So not sure what I'm missing here?

You're missing nothing, and I'm feeling foolish.

You are absolutely correct that the debug message only appears when the node *is* a bootstrap node, and it will not appear on non-bootstrap nodes. Sorry for the confusion.


Note You need to log in before you can comment on or make changes to this bug.