Description of problem:
The {cinder_api, glance_api, heat_api, nova_api, keystone}_cron containers don't work. When crond tries to execute the commands in /var/spool/cron/<service name>, it is blocked by pam_loginuid.so.

- /var/log/cron
Jul 5 01:01:01 controller-0 crond[115705]: (keystone) PAM ERROR (Cannot make/remove an entry for the specified session)
Jul 5 01:01:01 controller-0 crond[115705]: (keystone) FAILED to open PAM security session (Cannot make/remove an entry for the specified session)

- /var/log/secure
Jul 5 01:01:01 controller-0 crond[115705]: pam_loginuid(crond:session): Error writing /proc/self/loginuid: Operation not permitted
Jul 5 01:01:01 controller-0 crond[115705]: pam_loginuid(crond:session): set_loginuid failed

By default, /etc/pam.d/crond has the following entries.
~~~
auth       include    system-auth
account    required   pam_access.so
account    include    system-auth
session    required   pam_loginuid.so
session    include    system-auth
~~~
pam_loginuid.so is marked as required, so if the pam_loginuid.so check fails, crond does not execute the command.

From my preliminary research, this is caused by a missing configuration change. In T-H-T, every cron container except logrotate is started as below:
~~~
/var/lib/kolla/config_files/cinder_api_cron.json:
  command: /usr/sbin/crond -n
  config_files:
  - source: "/var/lib/kolla/config_files/src/*"
    dest: "/"
    merge: true
    preserve_properties: true
  permissions:
  - path: /var/log/cinder
    owner: cinder:cinder
    recurse: true
~~~
The crond man page states that the -n option requires a change to the PAM configuration: pam_loginuid.so must be removed from /etc/pam.d/crond.
~~~
-n     Tells the daemon to run in the foreground. This can be useful when starting it out of init. With this option is needed to change pam setting. /etc/pam.d/crond must not enable pam_loginuid.so module.
~~~
On the container side, the container doesn't have the required capability, AUDIT_CONTROL. Because of that, the kernel rejects writing a uid to /proc/self/loginuid.

Version-Release number of selected component (if applicable):
OSP17.0 and OSP17.1 Beta

How reproducible:
Every time crond runs a command in /var/spool/cron/<service name>

Steps to Reproduce:
1. Deploy overcloud
2. Wait for one or two hours.
3. Confirm the error logs in /var/log/cron and /var/log/secure.

Actual results:
All cron containers except logrotate fail to execute the specified commands.

Expected results:
Commands in the cron containers can be executed.

Additional info:
I think there are two solutions for this issue.
1. Remove the pam_loginuid.so module from /etc/pam.d/crond in the container.
2. Add AUDIT_CONTROL to the cron containers (this is now being tested in my lab).
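
A quick way to tell whether a given container is exposed to this is to check whether its crond PAM file still carries the required pam_loginuid.so session entry. The following is only a minimal sketch of such a check; the function name `pam_requires_loginuid` is made up for illustration and is not part of any TripleO tooling.

```shell
# Hypothetical helper: return 0 if the PAM file given as $1 still has an
# active (uncommented) "session required pam_loginuid.so" entry, 1 otherwise.
pam_requires_loginuid() {
    grep -Eq '^[[:space:]]*session[[:space:]]+required[[:space:]]+pam_loginuid\.so([[:space:]]|$)' "$1"
}

# Example use inside a cron container (assumes the default file location):
#   pam_requires_loginuid /etc/pam.d/crond && echo "crond -n will hit the PAM error"
```

The check only reads the file, so it should be safe to run against a live container before deciding which of the two solutions to apply.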
I confirmed that adding AUDIT_CONTROL to the cron container avoids the issue. The curious thing is that the PAM error is still recorded in /var/log/cron and /var/log/secure. We may need to dig deeper into the issue.
Re-reading the report, I noticed the PAM entry is specific to crond, so we can probably remove the line during the image build, as we did for systemd-auth here: https://github.com/openstack/tripleo-common/blob/master/container-images/tcib/base/base.yaml#L40
I checked some logs from upstream c8s jobs in wallaby and found the same error in the secure log. Unfortunately the cron log is not captured in CI, so I could not determine whether cron is working there. I also checked the capabilities enabled in the cinder_api_cron container, and AUDIT_CONTROL does not appear under EffectiveCaps/BoundingCaps in either c8s or c9s.
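
The same capability check can also be done by hand from inside a container. As a rough sketch: CAP_AUDIT_CONTROL is capability number 30 on Linux, so it corresponds to bit 30 of the hexadecimal CapEff/CapBnd masks in /proc/<pid>/status. The helper name `has_cap` and the mask in the usage comment are illustrative only and were not taken from the affected containers.

```shell
# Hypothetical helper: return 0 if capability bit number $2 is set in the
# hexadecimal capability mask $1 (as printed on the CapEff:/CapBnd: lines
# of /proc/<pid>/status), 1 otherwise.
has_cap() {
    [ $(( (0x$1 >> $2) & 1 )) -eq 1 ]
}

# Example use (CAP_AUDIT_CONTROL is bit 30):
#   mask=$(awk '/^CapEff:/ {print $2}' /proc/self/status)
#   has_cap "$mask" 30 && echo "AUDIT_CONTROL present" || echo "AUDIT_CONTROL missing"
```

If bit 30 is missing from the effective set, the write to /proc/self/loginuid is expected to fail in the way shown in /var/log/secure.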
If we agree removing pam_loginuid.so is the correct solution then we can try https://review.opendev.org/c/openstack/tripleo-common/+/887748 .
Currently, I'm testing with pam_loginuid.so removed from /etc/pam.d/crond in the affected containers and with the default capabilities. Once I get the result, I'll update this bugzilla.
Removing pam_loginuid.so seems to be a good solution for this issue. I confirmed that keystone-trustflush.log is created. I'm waiting for the other components' logs. I'll update this bugzilla next Monday with the results.
Hi, I confirmed that removing pam_loginuid.so from /etc/pam.d/crond inside the containers fixed the issue. What I did to confirm:
1. Deploy the overcloud and modify /etc/pam.d/crond on all cron containers which run crond -n.
2. Leave the system for a day.
3. Confirm the following errors are not recorded in /var/log/cron and /var/log/secure:
~~~
- /var/log/cron
Jul 5 01:01:01 controller-0 crond[115705]: (keystone) PAM ERROR (Cannot make/remove an entry for the specified session)
Jul 5 01:01:01 controller-0 crond[115705]: (keystone) FAILED to open PAM security session (Cannot make/remove an entry for the specified session)
- /var/log/secure
Jul 5 01:01:01 controller-0 crond[115705]: pam_loginuid(crond:session): Error writing /proc/self/loginuid: Operation not permitted
Jul 5 01:01:01 controller-0 crond[115705]: pam_loginuid(crond:session): set_loginuid failed
~~~
4. Confirm the logs generated by the cron jobs exist under /var/log/container/{glance,nova,cinder,keystone}
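
Step 3 can be scripted instead of checked by eye. This is only a sketch, assuming the failure always shows up with the message texts quoted in this report; the function name `count_pam_failures` is hypothetical.

```shell
# Hypothetical helper: print the number of lines in log file $1 that match
# the crond/pam_loginuid failure messages quoted in this bug report.
# "|| true" keeps the exit status at 0 when grep finds no matching lines.
count_pam_failures() {
    grep -Ec 'PAM ERROR \(Cannot make/remove an entry|FAILED to open PAM security session|pam_loginuid\(crond:session\): (Error writing /proc/self/loginuid|set_loginuid failed)' "$1" || true
}

# Example use on a node (run as root):
#   count_pam_failures /var/log/cron
#   count_pam_failures /var/log/secure
```

A count of 0 in both files after a day of runtime would match the result described above; note that older entries from before the change stay in the logs until they are rotated.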
Still seeing the error messages in RHOS-17.1-RHEL-9-20230719.n.1:

[stack@undercloud-0 ~]$ sudo more /var/log/cron | grep PAM
Jul 19 21:01:01 undercloud-0 crond[177]: (keystone) PAM ERROR (Cannot make/remove an entry for the specified session)
Jul 19 21:01:01 undercloud-0 crond[177]: (keystone) FAILED to open PAM security session (Cannot make/remove an entry for the specified session)
[stack@undercloud-0 ~]$ sudo more /var/log/secure | grep pam_loginuid
Jul 19 21:01:01 undercloud-0 crond[177]: pam_loginuid(crond:session): Error writing /proc/self/loginuid: Operation not permitted
Jul 19 21:01:01 undercloud-0 crond[177]: pam_loginuid(crond:session): set_loginuid failed

I believe that even though the fix has been included on the host, it is not included in the containers:

[stack@undercloud-0 ~]$ yum list installed | grep openstack-tripleo-common
openstack-tripleo-common.noarch  15.4.1-1.20230518211054.el9ost  @rhelosp-17.1
openstack-tripleo-common-containers.noarch  15.4.1-1.20230518211054.el9ost  @rhelosp-17.1

These are the image tags:

sudo podman images
REPOSITORY  TAG  IMAGE ID  CREATED  SIZE
localhost/podman-pause  4.4.1-1686828714  8797b77f4ffe  16 hours ago  810 kB
undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp17-openstack-neutron-dhcp-agent  17.1_20230718.2  2c83e0db2c7d  41 hours ago  865 MB
undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp17-openstack-neutron-l3-agent  17.1_20230718.2  e1dc7eee8881  41 hours ago  865 MB
localhost/tripleo/openstack-heat-engine  ephemeral  4e894c16a5ba  41 hours ago  732 MB
undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp17-openstack-heat-engine  17.1_20230718.2  4e894c16a5ba  41 hours ago  732 MB
undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp17-openstack-ironic-inspector  17.1_20230718.2  ab2786581459  41 hours ago  608 MB
undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp17-openstack-ironic-neutron-agent  17.1_20230718.2  36b2d9934f4e  41 hours ago  778 MB
undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp17-openstack-heat-api  17.1_20230718.2  6ec3d906fca6  41 hours ago  732 MB
localhost/tripleo/openstack-heat-api  ephemeral  6ec3d906fca6  41 hours ago  732 MB
undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp17-openstack-ironic-api  17.1_20230718.2  f6606f9039d2  41 hours ago  615 MB
undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp17-openstack-ironic-pxe  17.1_20230718.2  15fd867992ed  41 hours ago  717 MB
undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp17-openstack-ironic-conductor  17.1_20230718.2  1f45cc4df0b2  41 hours ago  663 MB
undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp17-openstack-heat-all  17.1_20230718.2  91295e72e2f5  41 hours ago  732 MB
localhost/tripleo/openstack-heat-all  ephemeral  91295e72e2f5  41 hours ago  732 MB
undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp17-openstack-neutron-server  17.1_20230718.2  d7e95e515152  42 hours ago  806 MB
undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp17-openstack-neutron-openvswitch-agent  17.1_20230718.2  0230e81393e5  42 hours ago  778 MB
undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp17-openstack-iscsid  17.1_20230718.2  8b13890b519a  42 hours ago  521 MB
undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp17-openstack-keystone  17.1_20230718.2  4332f08ba9cb  42 hours ago  598 MB
undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp17-openstack-rabbitmq  17.1_20230718.2  df9932aa9a9d  43 hours ago  546 MB
undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp17-openstack-haproxy  17.1_20230718.2  916dfcfac72c  43 hours ago  491 MB
undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp17-openstack-mariadb  17.1_20230718.2  9d1e911ad27c  43 hours ago  623 MB
undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp17-openstack-memcached  17.1_20230718.2  6b56a8b0dbe9  43 hours ago  408 MB
undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp17-openstack-cron  17.1_20230718.2  0cea06f2eaf0  43 hours ago  381 MB
So looking at the PAM file in the keystone_cron container, the record causing the problem was not removed:
```
[stack@undercloud-0 ~]$ sudo podman exec -it 1d13d3f388be /bin/bash
[root@undercloud-0 /]# cat /etc/pam.d/crond
#
# The PAM configuration file for the cron daemon
#
#
# Although no PAM authentication is called, auth modules
# are used for credential setting
auth       include    system-auth
account    required   pam_access.so
account    include    system-auth
session    required   pam_loginuid.so
session    include    system-auth
[root@undercloud-0 /]#
```
But when I ran the command by hand, it removed the line (I noticed the inconsistent parentheses in the pattern, but I don't think they cause the problem).
```
[root@undercloud-0 /]# sed -ri '/^session(\s)+required(\s+)pam_loginuid.so$/d' /etc/pam.d/crond
[root@undercloud-0 /]# cat /etc/pam.d/crond
#
# The PAM configuration file for the cron daemon
#
#
# Although no PAM authentication is called, auth modules
# are used for credential setting
auth       include    system-auth
account    required   pam_access.so
account    include    system-auth
session    include    system-auth
```
I checked the upstream images, and there the record with pam_loginuid.so has been removed:
```
[tkajinam@mylaptop ~]$ podman run -it quay.io/tripleowallabycentos9/openstack-keystone:current-tripleo /bin/bash
[root@1bffcdb6f9ec /]# cat /etc/pam.d/crond
#
# The PAM configuration file for the cron daemon
#
#
# Although no PAM authentication is called, auth modules
# are used for credential setting
auth       include    system-auth
account    required   pam_access.so
account    include    system-auth
session    include    system-auth
```
So it seems the change in the Dockerfile was not pulled in during the image build. I'll ask RelDel to update the Dockerfile and rebuild the images.
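
For trying the removal without touching the live file, the edit can be previewed on standard output first. This is a sketch only, not the exact command used in the image build: it escapes the dot in the module name, uses POSIX character classes with `sed -E` instead of `\s`, and tolerates trailing whitespace. The function name `strip_pam_loginuid` is hypothetical.

```shell
# Hypothetical helper: print the PAM file given as $1 with any
# "session required pam_loginuid.so" line removed. The file itself is
# not modified; redirect the output to apply the change.
strip_pam_loginuid() {
    sed -E '/^session[[:space:]]+required[[:space:]]+pam_loginuid\.so[[:space:]]*$/d' "$1"
}

# Example use inside a cron container (assumes the default file location):
#   strip_pam_loginuid /etc/pam.d/crond | diff /etc/pam.d/crond -
```

Piping the result into diff shows exactly which line would go away before anything is written back.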
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Release of components for Red Hat OpenStack Platform 17.1 (Wallaby)), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2023:4577