Description of problem:
-----------------------
RHCS 4.3z1 installation fails on RHEL 8.7 at TASK [ceph-mgr : wait for all mgr to be up]. The mgr service fails to start because of the error below:

```
Jan 19 03:12:25 ceph50 systemd[1]: Started Ceph Manager.
Jan 19 03:12:25 ceph50 ceph-mgr-ceph50[677680]: find: '/var/lib/ceph/mgr/ceph-ceph50/keyring': Permission denied
Jan 19 03:12:25 ceph50 ceph-mgr-ceph50[677680]: chown: cannot access '/var/lib/ceph/mgr/ceph-ceph50/keyring': Permission denied
Jan 19 03:12:26 ceph50 systemd[1]: ceph-mgr: Main process exited, code=exited, status=1/FAILURE
Jan 19 03:12:26 ceph50 systemd[1]: ceph-mgr: Failed with result 'exit-code'.
Jan 19 03:12:36 ceph50 systemd[1]: ceph-mgr: Service RestartSec=10s expired, scheduling restart.
Jan 19 03:12:36 ceph50 systemd[1]: ceph-mgr: Scheduled restart job, restart counter is at 5198.
Jan 19 03:12:36 ceph50 systemd[1]: Stopped Ceph Manager.
```

Version-Release number of selected component (if applicable):
--------------------------------------------------------------
RHCS : 4.3z1
ceph-ansible : 4.0.70.18-1.el8cp.noarch
RHEL : 8.7
Podman : 4.2

How reproducible:
-----------------
Every time

Steps to Reproduce:
-------------------
Deploy a fresh ceph cluster on 4.3z1 on top of RHEL 8.7.

Actual results:
---------------
The ceph-mgr service fails to start because of an SELinux label issue, and hence the deployment fails.

Expected results:
-----------------
The ceph-mgr service should start and the deployment should succeed.
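For anyone triaging a similar failure, a minimal sketch of the checks that match the symptoms above (the unit name `ceph-mgr@<short hostname>` is an assumption based on how ceph-ansible containerized deployments typically name the service; adjust if yours differs):

```
# Confirm the service is stuck in a restart loop:
systemctl status ceph-mgr@$(hostname -s)
journalctl -u ceph-mgr@$(hostname -s) -n 50

# Check the SELinux label on the keyring the container cannot read;
# a label of var_lib_t (rather than container_file_t) would explain
# the "Permission denied" from inside the container:
ls -lZ /var/lib/ceph/mgr/ceph-$(hostname -s)/keyring

# Look for matching AVC denials:
ausearch -m avc -ts recent
```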
Hi, we tested this deployment of RHCS 4.3z1 with RHEL 8.7 and hit the same issue as reported in this BZ.

[root@ceph-mobisht-4-3z1-aobcda-node1-installer ceph]# rpm -qa | grep ansible
ceph-ansible-4.0.70.18-1.el8cp.noarch
ansible-2.9.27-1.el8ae.noarch

[root@ceph-mobisht-4-3z1-aobcda-node1-installer ceph]# rpm -qa | grep podman
podman-4.2.0-6.module+el8.7.0+17498+a7f63b89.x86_64
podman-catatonit-4.2.0-6.module+el8.7.0+17498+a7f63b89.x86_64

[root@ceph-mobisht-4-3z1-aobcda-node1-installer ceph]# cat /etc/redhat-release
Red Hat Enterprise Linux release 8.7 (Ootpa)

[root@ceph-mobisht-4-3z1-aobcda-node1-installer ceph]# podman version
Client:       Podman Engine
Version:      4.2.0
API Version:  4.2.0
Go Version:   go1.18.4
Built:        Mon Dec 12 06:41:56 2022
OS/Arch:      linux/amd64

Deployment failed with the error below:
============
Feb 02 07:22:06 ceph-mobisht-4-3z1-aobcda-node1-installer ceph-mgr-ceph-mobisht-4-3z1-aobcda-node1-installer[20023]: find: '/var/lib/ceph/mgr/ceph-ceph-mobisht-4-3z1-aobcda-node1-installer/keyring': Permission >
Feb 02 07:22:06 ceph-mobisht-4-3z1-aobcda-node1-installer ceph-mgr-ceph-mobisht-4-3z1-aobcda-node1-installer[20023]: chown: cannot access '/var/lib/ceph/mgr/ceph-ceph-mobisht-4-3z1-aobcda-node1-installer/keyrin>
Feb 02 07:22:06 ceph-mobisht-4-3z1-aobcda-node1-installer systemd[1]: ceph-mgr: Main process exited, code=exited, status=1/FAILURE
Feb 02 07:22:06 ceph-mobisht-4-3z1-aobcda-node1-installer systemd[1]: ceph-mgr: Failed with result 'exit-code'.
Feb 02 07:22:16 ceph-mobisht-4-3z1-aobcda-node1-installer systemd[1]: ceph-mgr: Service RestartSec=10s expired, scheduling restart.
Feb 02 07:22:16 ceph-mobisht-4-3z1-aobcda-node1-installer systemd[1]: ceph-mgr: Scheduled restart job, restart counter is at 3.
Feb 02 07:22:16 ceph-mobisht-4-3z1-aobcda-node1-installer systemd[1]: Stopped Ceph Manager.
Feb 02 07:22:16 ceph-mobisht-4-3z1-aobcda-node1-installer systemd[1]: Starting Ceph Manager...
Feb 02 07:22:16 ceph-mobisht-4-3z1-aobcda-node1-installer podman[20266]: Error: no container with name or ID "ceph-mgr-ceph-mobisht-4-3z1-aobcda-node1-installer" found: no such container
Feb 02 07:22:16 ceph-mobisht-4-3z1-aobcda-node1-installer podman[20277]: Error: no container with name or ID "ceph-mgr-ceph-mobisht-4-3z1-aobcda-node1-installer" found: no such container
Feb 02 07:22:16 ceph-mobisht-4-3z1-aobcda-node1-installer podman[20286]:
Feb 02 07:22:16 ceph-mobisht-4-3z1-aobcda-node1-installer podman[20286]: a4fd9d117ca54e869ecc5cb4b9a42290d4a6d51ba38348bee186dba16edc3c08
Feb 02 07:22:16 ceph-mobisht-4-3z1-aobcda-node1-installer systemd[1]: Started Ceph Manager.
Feb 02 07:22:17 ceph-mobisht-4-3z1-aobcda-node1-installer ceph-mgr-ceph-mobisht-4-3z1-aobcda-node1-installer[20296]: find: '/var/lib/ceph/mgr/ceph-ceph-mobisht-4-3z1-aobcda-node1-installer/keyring': Permission >
Feb 02 07:22:17 ceph-mobisht-4-3z1-aobcda-node1-installer ceph-mgr-ceph-mobisht-4-3z1-aobcda-node1-installer[20296]: chown: cannot access '/var/lib/ceph/mgr/ceph-ceph-mobisht-4-3z1-aobcda-node1-installer/keyrin>
Feb 02 07:22:17 ceph-mobisht-4-3z1-aobcda-node1-installer systemd[1]: ceph-mgr: Main process exited, code=exited, status=1/FAILURE
Feb 02 07:22:17 ceph-mobisht-4-3z1-aobcda-node1-installer systemd[1]: ceph-mgr: Failed with result 'exit-code'.
Feb 02 07:22:27 ceph-mobisht-4-3z1-aobcda-node1-installer systemd[1]: ceph-mgr: Service RestartSec=10s expired, scheduling restart.
Feb 02 07:22:27 ceph-mobisht-4-3z1-aobcda-node1-installer systemd[1]: ceph-mgr: Scheduled restart job, restart counter is at 4.
=========================
[root@ceph-mobisht-4-3z1-aobcda-node1-installer ceph]# ll -lZ /var/lib/ceph/mgr/ceph-ceph-mobisht-4-3z1-aobcda-node1-installer/keyring
-rw-------. 1 167 167 system_u:object_r:var_lib_t:s0 172 Feb  2 07:21 /var/lib/ceph/mgr/ceph-ceph-mobisht-4-3z1-aobcda-node1-installer/keyring
================================
ls -lZd /var/lib/ceph

The only change in this area that I am aware of is that if the top-level directory is already labeled correctly from an SELinux point of view, podman will no longer relabel the contents of the directory. Meaning that if /var/lib/ceph is labeled container_file_t:s0 and files were mv'd into the directory (keeping their old labels), we could have an issue.
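A hypothetical illustration of the state being described, using the paths from this BZ (the keyring label matches the `ll -lZ` output posted above; the top-level label is the scenario being hypothesized, not confirmed output):

```
ls -lZd /var/lib/ceph
# drwxr-x---. 12 ceph ceph system_u:object_r:container_file_t:s0 ... /var/lib/ceph
#   ^ top level already container_file_t, so podman skips relabeling the contents

ls -lZ /var/lib/ceph/mgr/ceph-ceph50/keyring
# -rw-------. 1 167 167 system_u:object_r:var_lib_t:s0 ... keyring
#   ^ a file mv'd into the tree keeps its old var_lib_t label,
#     which a container_t process is not allowed to read
```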
Why aren't both sides using the :z? What AVCs are you seeing? And what are the labels shown by:

ls -lZd /var/lib/ceph /etc/ceph /var/run/ceph /var/log/ceph
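For context, a sketch of what the :z volume option does (the image name and mount are illustrative, not the actual ceph-ansible invocation):

```
# With :z, podman relabels the host directory container_file_t:s0
# (shared among containers) before bind-mounting it, so a confined
# container_t process can read it. Without :z, files keep labels such
# as var_lib_t and reads are denied by SELinux.
podman run --rm \
  -v /var/lib/ceph:/var/lib/ceph:z \
  registry.example.com/rhceph/rhceph-4-rhel8 ls -lZ /var/lib/ceph
```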
When I analyze the AVCs, audit2allow indicates that these rules are dontaudited in the current SELinux policy.

~ $ audit2allow -i /tmp/t

#============= container_t ==============
#!!!! This avc has a dontaudit rule in the current policy
allow container_t var_lib_t:dir read;

#============= init_t ==============
#!!!! This avc has a dontaudit rule in the current policy
allow init_t initrc_t:process siginh;

#!!!! This avc has a dontaudit rule in the current policy
allow init_t unconfined_service_t:process siginh;

Which indicates to me that either someone ran

# semodule -DB

to turn off dontaudit rules, or container-selinux failed to be installed properly.
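Two quick checks that follow from this observation (standard commands, no BZ-specific assumptions):

```
# Verify container-selinux is installed and its policy module is loaded:
rpm -q container-selinux
semodule -l | grep container

# If dontaudit rules were disabled earlier with `semodule -DB`,
# rebuild the policy with them enabled again:
semodule -B
```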
This allow rule:

allow container_t var_lib_t:dir read;

comes from a container trying to read /var/lib/ceph/mgr/ceph-ceph50/, I would guess. So this container is not being run as spc_t, meaning SELinux container separation has not been disabled. Looks like the container either needs to be run with `--security-opt label=disabled` if run by Podman, or with `SecurityOpt Label type spc_t` if run via OpenShift. Or the content needs to be relabeled container_file_t using the :z option.
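The three alternatives, sketched as podman invocations with an illustrative image name (the real fix would go into the ceph-ansible systemd unit templates, not an ad-hoc run):

```
# 1) Disable SELinux labeling for this container entirely:
podman run --security-opt label=disabled \
  -v /var/lib/ceph:/var/lib/ceph \
  registry.example.com/rhceph/rhceph-4-rhel8

# 2) Run as the "super privileged container" type (the podman equivalent
#    of the OpenShift SecurityOpt mentioned above):
podman run --security-opt label=type:spc_t \
  -v /var/lib/ceph:/var/lib/ceph \
  registry.example.com/rhceph/rhceph-4-rhel8

# 3) Relabel the volume content container_file_t at mount time:
podman run -v /var/lib/ceph:/var/lib/ceph:z \
  registry.example.com/rhceph/rhceph-4-rhel8
```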
One possible issue is a change in podman: it now checks whether the top-level directory of a volume has the correct label on it, and if so, podman will no longer walk the entire directory tree to relabel the files/directories under the top level. Perhaps for some reason you have this setup. You could just do a

chcon -Rt container_file_t PATHTO/SRCVOLUME

and then everything should work correctly, or just change the top-level directory to not be container_file_t, and then the :z will relabel everything.
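The two workarounds above, applied to the path from this BZ (assuming /var/lib/ceph is the volume whose top level already carries container_file_t while its contents do not):

```
# Option A: recursively relabel the whole tree yourself:
chcon -Rt container_file_t /var/lib/ceph

# Option B: reset the top level to its policy default (var_lib_t on the
# reporter's system, per the ls -lZ output above), so it is no longer
# container_file_t and podman's :z handling relabels the full tree on
# the next container start:
restorecon -v /var/lib/ceph
```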