Bug 2219765

Summary: {cinder_api, glance_api, heat_api, nova_api, keystone}_cron container don't work
Product: Red Hat OpenStack Reporter: Keigo Noha <knoha>
Component: openstack-tripleo-common Assignee: Takashi Kajinami <tkajinam>
Status: CLOSED ERRATA QA Contact: David Rosenfeld <drosenfe>
Severity: urgent Docs Contact:
Priority: urgent    
Version: 17.0 (Wallaby) CC: bshephar, drosenfe, dwilde, jamsmith, lsvaty, mburns, pgrist, rheslop, slinaber, tkajinam
Target Milestone: ga Keywords: Regression, Triaged
Target Release: 17.1   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: openstack-tripleo-common-15.4.1-1.20230518211054.el9ost openstack-dependencies-container-17.1.0-82 Doc Type: Bug Fix
Doc Text:
Before this update, the `pam_loginuid` module was enabled in some containers. This prevented `crond` from executing some tasks, such as `db purge`, inside those containers. Now, `pam_loginuid` is removed and the containerized `crond` process runs all periodic tasks.
Story Points: ---
Clone Of: Environment:
Last Closed: 2023-08-16 01:15:51 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Keigo Noha 2023-07-05 09:11:31 UTC
Description of problem:
{cinder_api, glance_api, heat_api, nova_api, keystone}_cron container don't work.

When crond tries to execute the commands in /var/spool/cron/<service name>, it is blocked by pam_loginuid.so.

- /var/log/cron
Jul  5 01:01:01 controller-0 crond[115705]: (keystone) PAM ERROR (Cannot make/remove an entry for the specified session)
Jul  5 01:01:01 controller-0 crond[115705]: (keystone) FAILED to open PAM security session (Cannot make/remove an entry for the specified session)

- /var/log/secure
Jul  5 01:01:01 controller-0 crond[115705]: pam_loginuid(crond:session): Error writing /proc/self/loginuid: Operation not permitted
Jul  5 01:01:01 controller-0 crond[115705]: pam_loginuid(crond:session): set_loginuid failed
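
A minimal sketch for spotting this failure in a cron log, assuming the message format shown above. The function name `count_cron_pam_failures` is hypothetical and not part of any existing tool; it only counts the "FAILED to open PAM security session" lines emitted by crond.

```shell
# Hypothetical helper: count crond PAM session failures in a log file.
# $1 is the path to a cron log, for example /var/log/cron.
count_cron_pam_failures() {
    # Count lines such as:
    #   crond[115705]: (keystone) FAILED to open PAM security session (...)
    # "n+0" makes awk print 0 when nothing matched.
    awk '/crond\[[0-9]+\]: \(.*\) FAILED to open PAM security session/ { n++ }
         END { print n+0 }' "$1"
}
```

Running `count_cron_pam_failures /var/log/cron` on an affected node should print a non-zero count once at least one scheduled job has been attempted.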

By default, /etc/pam.d/crond has the following entries.
~~~
auth       include    system-auth
account    required   pam_access.so
account    include    system-auth
session    required   pam_loginuid.so
session    include    system-auth
~~~

pam_loginuid.so is marked as required. So, if crond fails the verification by pam_loginuid.so, the command execution by crond fails.
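
To check whether a given PAM file still carries the blocking entry, something like the following could be used. This is a sketch only: the function name `pam_requires_loginuid` is hypothetical, and the pattern assumes the entry is written exactly as in the default file above (commented-out lines are ignored).

```shell
# Hypothetical helper: exit 0 when the PAM file at $1 has an active
# "session required pam_loginuid.so" entry, non-zero otherwise.
pam_requires_loginuid() {
    grep -Eq '^session[[:space:]]+required[[:space:]]+pam_loginuid\.so([[:space:]]|$)' "$1"
}
```

For example, `pam_requires_loginuid /etc/pam.d/crond && echo "pam_loginuid is still required"` could be run inside a cron container.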

From my preliminary research, it is caused by a missing configuration change.

In T-H-T, every cron container except logrotate is configured as below:
~~~
        /var/lib/kolla/config_files/cinder_api_cron.json:
          command: /usr/sbin/crond -n
          config_files:
            - source: "/var/lib/kolla/config_files/src/*"
              dest: "/"
              merge: true
              preserve_properties: true
          permissions:
            - path: /var/log/cinder
              owner: cinder:cinder
              recurse: true
~~~

The man page of crond states that the -n option requires a modification to PAM: pam_loginuid.so must be removed from /etc/pam.d/crond.

~~~
       -n     Tells the daemon to run in the foreground.  This can be
              useful when starting it out of init. With this option is
              needed to change pam setting.  /etc/pam.d/crond must not
              enable pam_loginuid.so module.
~~~

On the container side, the container lacks the required capability, AUDIT_CONTROL.
Because of that, the kernel rejects the write of a UID to /proc/self/loginuid.
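
As a rough way to confirm the missing capability from inside a container, the effective capability mask in /proc can be tested for the AUDIT_CONTROL bit. This is a sketch under the assumption that the mask is read from the `CapEff` field of /proc/<pid>/status; the function name `has_cap_audit_control` is hypothetical. CAP_AUDIT_CONTROL is capability number 30 in the kernel's capability list.

```shell
# Hypothetical helper: exit 0 when bit 30 (CAP_AUDIT_CONTROL) is set.
# $1 is a hexadecimal capability mask without the 0x prefix, in the form
# printed in the CapEff or CapBnd fields of /proc/<pid>/status.
has_cap_audit_control() {
    [ $(( (0x$1 >> 30) & 1 )) -eq 1 ]
}
```

A possible invocation is `has_cap_audit_control "$(awk '/^CapEff:/ {print $2}' /proc/self/status)"`, run inside the cron container.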

Version-Release number of selected component (if applicable):
OSP17.0 and OSP17.1 Beta


How reproducible:
Every time crond runs a command in /var/spool/cron/<service name>

Steps to Reproduce:
1. Deploy overcloud
2. Wait for one or two hours.
3. Confirm the error logs in /var/log/cron and /var/log/secure.

Actual results:
All cron containers except logrotate fail to execute the specified commands.

Expected results:
Commands in cron container can be executed.

Additional info:
I think there are two possible solutions for this issue.

1. Remove the pam_loginuid.so module from /etc/pam.d/crond in the container.
2. Add AUDIT_CONTROL to the cron containers. (This is now being tested in my lab.)

Comment 1 Keigo Noha 2023-07-05 09:15:42 UTC
I confirmed that adding AUDIT_CONTROL to the cron container avoids the issue.

The curious thing is that the PAM error is still recorded in /var/log/cron and /var/log/secure.
We may need to dig deeper into the issue.

Comment 3 Takashi Kajinami 2023-07-06 01:34:16 UTC
Re-reading the report, I noticed the pam entry is specific to crond, so probably
we can remove the line during image build, like we did for systemd-auth here.

https://github.com/openstack/tripleo-common/blob/master/container-images/tcib/base/base.yaml#L40

Comment 4 Takashi Kajinami 2023-07-06 05:37:05 UTC
I checked some logs from upstream c8s jobs in wallaby and found the same error in the secure log.
Unfortunately, the cron log is not captured in CI, so I could not determine whether cron is working there.

I also checked the capabilities enabled in the cinder_api_cron container, and AUDIT_CONTROL is not listed
under EffectiveCaps or BoundingCaps in either c8s or c9s.

Comment 6 Takashi Kajinami 2023-07-06 15:02:23 UTC
If we agree removing pam_loginuid.so is the correct solution then we can try https://review.opendev.org/c/openstack/tripleo-common/+/887748 .

Comment 7 Keigo Noha 2023-07-07 01:08:41 UTC
Currently, I'm testing with pam_loginuid.so removed from /etc/pam.d/crond in the affected containers and with the default capabilities. Once I get the result, I'll update this bugzilla.

Comment 8 Keigo Noha 2023-07-07 07:31:04 UTC
Removing pam_loginuid.so seems to be a good solution for this issue.
I confirmed that keystone-trustflush.log is created. I'm waiting for other components' logs.
I'll update this bugzilla next Monday with the results.

Comment 9 Keigo Noha 2023-07-10 00:40:17 UTC
Hi,

I confirmed that removing pam_loginuid.so from /etc/pam.d/crond inside containers fixed the issue.
What I confirmed is:

1. Deploy overcloud and modify /etc/pam.d/crond on all cron containers which run crond -n.
2. Leave the system for a day.
3. Confirm the following errors are not recorded in /var/log/cron and /var/log/secure:
~~~
- /var/log/cron
Jul  5 01:01:01 controller-0 crond[115705]: (keystone) PAM ERROR (Cannot make/remove an entry for the specified session)
Jul  5 01:01:01 controller-0 crond[115705]: (keystone) FAILED to open PAM security session (Cannot make/remove an entry for the specified session)

- /var/log/secure
Jul  5 01:01:01 controller-0 crond[115705]: pam_loginuid(crond:session): Error writing /proc/self/loginuid: Operation not permitted
Jul  5 01:01:01 controller-0 crond[115705]: pam_loginuid(crond:session): set_loginuid failed
~~~
4. Confirm the logs generated by cronjob exist under /var/log/container/{glance,nova,cinder,keystone}

Comment 17 David Rosenfeld 2023-07-20 13:07:35 UTC
Still seeing the error messages in RHOS-17.1-RHEL-9-20230719.n.1:

[stack@undercloud-0 ~]$ sudo more /var/log/cron | grep PAM
Jul 19 21:01:01 undercloud-0 crond[177]: (keystone) PAM ERROR (Cannot make/remove an entry for the specified session)
Jul 19 21:01:01 undercloud-0 crond[177]: (keystone) FAILED to open PAM security session (Cannot make/remove an entry for the specified session)

[stack@undercloud-0 ~]$ sudo more /var/log/secure | grep pam_loginuid
Jul 19 21:01:01 undercloud-0 crond[177]: pam_loginuid(crond:session): Error writing /proc/self/loginuid: Operation not permitted
Jul 19 21:01:01 undercloud-0 crond[177]: pam_loginuid(crond:session): set_loginuid failed



I believe that even though the fix has been included on the host, it is not included in the containers:

[stack@undercloud-0 ~]$ yum list installed | grep openstack-tripleo-common
openstack-tripleo-common.noarch                 15.4.1-1.20230518211054.el9ost           @rhelosp-17.1                     
openstack-tripleo-common-containers.noarch      15.4.1-1.20230518211054.el9ost           @rhelosp-17.1  

These are image tags:

 sudo podman images 
REPOSITORY                                                                                   TAG               IMAGE ID      CREATED       SIZE
localhost/podman-pause                                                                       4.4.1-1686828714  8797b77f4ffe  16 hours ago  810 kB
undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp17-openstack-neutron-dhcp-agent         17.1_20230718.2   2c83e0db2c7d  41 hours ago  865 MB
undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp17-openstack-neutron-l3-agent           17.1_20230718.2   e1dc7eee8881  41 hours ago  865 MB
localhost/tripleo/openstack-heat-engine                                                      ephemeral         4e894c16a5ba  41 hours ago  732 MB
undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp17-openstack-heat-engine                17.1_20230718.2   4e894c16a5ba  41 hours ago  732 MB
undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp17-openstack-ironic-inspector           17.1_20230718.2   ab2786581459  41 hours ago  608 MB
undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp17-openstack-ironic-neutron-agent       17.1_20230718.2   36b2d9934f4e  41 hours ago  778 MB
undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp17-openstack-heat-api                   17.1_20230718.2   6ec3d906fca6  41 hours ago  732 MB
localhost/tripleo/openstack-heat-api                                                         ephemeral         6ec3d906fca6  41 hours ago  732 MB
undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp17-openstack-ironic-api                 17.1_20230718.2   f6606f9039d2  41 hours ago  615 MB
undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp17-openstack-ironic-pxe                 17.1_20230718.2   15fd867992ed  41 hours ago  717 MB
undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp17-openstack-ironic-conductor           17.1_20230718.2   1f45cc4df0b2  41 hours ago  663 MB
undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp17-openstack-heat-all                   17.1_20230718.2   91295e72e2f5  41 hours ago  732 MB
localhost/tripleo/openstack-heat-all                                                         ephemeral         91295e72e2f5  41 hours ago  732 MB
undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp17-openstack-neutron-server             17.1_20230718.2   d7e95e515152  42 hours ago  806 MB
undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp17-openstack-neutron-openvswitch-agent  17.1_20230718.2   0230e81393e5  42 hours ago  778 MB
undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp17-openstack-iscsid                     17.1_20230718.2   8b13890b519a  42 hours ago  521 MB
undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp17-openstack-keystone                   17.1_20230718.2   4332f08ba9cb  42 hours ago  598 MB
undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp17-openstack-rabbitmq                   17.1_20230718.2   df9932aa9a9d  43 hours ago  546 MB
undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp17-openstack-haproxy                    17.1_20230718.2   916dfcfac72c  43 hours ago  491 MB
undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp17-openstack-mariadb                    17.1_20230718.2   9d1e911ad27c  43 hours ago  623 MB
undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp17-openstack-memcached                  17.1_20230718.2   6b56a8b0dbe9  43 hours ago  408 MB
undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp17-openstack-cron                       17.1_20230718.2   0cea06f2eaf0  43 hours ago  381 MB

Comment 18 Takashi Kajinami 2023-07-20 13:14:48 UTC
Looking at the PAM file in the keystone_cron container, the record causing the problem was not removed:
```
[stack@undercloud-0 ~]$ sudo podman exec -it 1d13d3f388be /bin/bash
[root@undercloud-0 /]# cat /etc/pam.d/crond
#
# The PAM configuration file for the cron daemon
#
#
# Although no PAM authentication is called, auth modules
# are used for credential setting
auth       include    system-auth
account    required   pam_access.so
account    include    system-auth
session    required   pam_loginuid.so
session    include    system-auth
[root@undercloud-0 /]#
```

But if I run the sed command manually (I noticed the parentheses are inconsistent, but I don't think that causes the problem), it removes the line:
```
[root@undercloud-0 /]# sed -ri '/^session(\s)+required(\s+)pam_loginuid.so$/d' /etc/pam.d/crond
[root@undercloud-0 /]# cat /etc/pam.d/crond
#
# The PAM configuration file for the cron daemon
#
#
# Although no PAM authentication is called, auth modules
# are used for credential setting
auth       include    system-auth
account    required   pam_access.so
account    include    system-auth
session    include    system-auth
```
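
For reference, a variant of the expression with consistent grouping could look like the sketch below. It is not the change that was merged for this bug. The function name `strip_pam_loginuid` is hypothetical, it uses POSIX character classes instead of `\s`, and it prints the filtered file to standard output rather than editing it in place.

```shell
# Hypothetical helper: print the PAM file at $1 without the
# "session required pam_loginuid.so" entry. The input file is not modified.
strip_pam_loginuid() {
    sed -E '/^session[[:space:]]+required[[:space:]]+pam_loginuid\.so[[:space:]]*$/d' "$1"
}
```

The output can be compared with the original file first (for example with `diff`) before it is written back to /etc/pam.d/crond.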

I checked the upstream images, and there the record with pam_loginuid.so was removed:

```
[tkajinam@mylaptop ~]$ podman run -it quay.io/tripleowallabycentos9/openstack-keystone:current-tripleo /bin/bash
[root@1bffcdb6f9ec /]# cat /etc/pam.d/crond
#
# The PAM configuration file for the cron daemon
#
#
# Although no PAM authentication is called, auth modules
# are used for credential setting
auth       include    system-auth
account    required   pam_access.so
account    include    system-auth
session    include    system-auth
```

So it seems the change in the Dockerfile was not picked up during the image build. I'll ask RelDel to update the Dockerfile
and rebuild the images.

Comment 30 errata-xmlrpc 2023-08-16 01:15:51 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Release of components for Red Hat OpenStack Platform 17.1 (Wallaby)), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2023:4577