Bug 2162781
| Summary: | [cee/sd][ceph-ansible] RHCS 4.3z1 installation fails on RHEL 8.7 at TASK [ceph-mgr : wait for all mgr to be up] | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 8 | Reporter: | Lijo Stephen Thomas <lithomas> |
| Component: | podman | Assignee: | Daniel Walsh <dwalsh> |
| Status: | CLOSED MIGRATED | QA Contact: | atomic-bugs <atomic-bugs> |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | 8.7 | CC: | anrao, bbaude, ceph-eng-bugs, cephqe-warriors, dornelas, dwalsh, gjose, gmeno, jligon, jnovy, lsm5, mboddu, mheon, pthomas, tonay, tpetr, tsweeney, umohnani, vereddy, vumrao |
| Target Milestone: | rc | Keywords: | MigratedToJIRA |
| Target Release: | 8.7 | Flags: | pm-rhel: mirror+ |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | | |
| : | 2169767 (view as bug list) | Environment: | |
| Last Closed: | 2023-09-11 19:09:22 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 2169767, 2222003 | | |
Description
Lijo Stephen Thomas
2023-01-20 21:01:32 UTC
Hi, we tested this deployment of RHCS 4.3z1 with RHEL 8.7 and are hitting the same issue as reported in this BZ.

```
[root@ceph-mobisht-4-3z1-aobcda-node1-installer ceph]# rpm -qa | grep ansible
ceph-ansible-4.0.70.18-1.el8cp.noarch
ansible-2.9.27-1.el8ae.noarch

[root@ceph-mobisht-4-3z1-aobcda-node1-installer ceph]# rpm -qa | grep podman
podman-4.2.0-6.module+el8.7.0+17498+a7f63b89.x86_64
podman-catatonit-4.2.0-6.module+el8.7.0+17498+a7f63b89.x86_64

[root@ceph-mobisht-4-3z1-aobcda-node1-installer ceph]# cat /etc/redhat-release
Red Hat Enterprise Linux release 8.7 (Ootpa)

[root@ceph-mobisht-4-3z1-aobcda-node1-installer ceph]# podman version
Client:       Podman Engine
Version:      4.2.0
API Version:  4.2.0
Go Version:   go1.18.4
Built:        Mon Dec 12 06:41:56 2022
OS/Arch:      linux/amd64
```

Deployment failed with the error below:

```
Feb 02 07:22:06 ceph-mobisht-4-3z1-aobcda-node1-installer ceph-mgr-ceph-mobisht-4-3z1-aobcda-node1-installer[20023]: find: '/var/lib/ceph/mgr/ceph-ceph-mobisht-4-3z1-aobcda-node1-installer/keyring': Permission >
Feb 02 07:22:06 ceph-mobisht-4-3z1-aobcda-node1-installer ceph-mgr-ceph-mobisht-4-3z1-aobcda-node1-installer[20023]: chown: cannot access '/var/lib/ceph/mgr/ceph-ceph-mobisht-4-3z1-aobcda-node1-installer/keyrin>
Feb 02 07:22:06 ceph-mobisht-4-3z1-aobcda-node1-installer systemd[1]: ceph-mgr: Main process exited, code=exited, status=1/FAILURE
Feb 02 07:22:06 ceph-mobisht-4-3z1-aobcda-node1-installer systemd[1]: ceph-mgr: Failed with result 'exit-code'.
Feb 02 07:22:16 ceph-mobisht-4-3z1-aobcda-node1-installer systemd[1]: ceph-mgr: Service RestartSec=10s expired, scheduling restart.
Feb 02 07:22:16 ceph-mobisht-4-3z1-aobcda-node1-installer systemd[1]: ceph-mgr: Scheduled restart job, restart counter is at 3.
Feb 02 07:22:16 ceph-mobisht-4-3z1-aobcda-node1-installer systemd[1]: Stopped Ceph Manager.
Feb 02 07:22:16 ceph-mobisht-4-3z1-aobcda-node1-installer systemd[1]: Starting Ceph Manager...
Feb 02 07:22:16 ceph-mobisht-4-3z1-aobcda-node1-installer podman[20266]: Error: no container with name or ID "ceph-mgr-ceph-mobisht-4-3z1-aobcda-node1-installer" found: no such container
Feb 02 07:22:16 ceph-mobisht-4-3z1-aobcda-node1-installer podman[20277]: Error: no container with name or ID "ceph-mgr-ceph-mobisht-4-3z1-aobcda-node1-installer" found: no such container
Feb 02 07:22:16 ceph-mobisht-4-3z1-aobcda-node1-installer podman[20286]:
Feb 02 07:22:16 ceph-mobisht-4-3z1-aobcda-node1-installer podman[20286]: a4fd9d117ca54e869ecc5cb4b9a42290d4a6d51ba38348bee186dba16edc3c08
Feb 02 07:22:16 ceph-mobisht-4-3z1-aobcda-node1-installer systemd[1]: Started Ceph Manager.
Feb 02 07:22:17 ceph-mobisht-4-3z1-aobcda-node1-installer ceph-mgr-ceph-mobisht-4-3z1-aobcda-node1-installer[20296]: find: '/var/lib/ceph/mgr/ceph-ceph-mobisht-4-3z1-aobcda-node1-installer/keyring': Permission >
Feb 02 07:22:17 ceph-mobisht-4-3z1-aobcda-node1-installer ceph-mgr-ceph-mobisht-4-3z1-aobcda-node1-installer[20296]: chown: cannot access '/var/lib/ceph/mgr/ceph-ceph-mobisht-4-3z1-aobcda-node1-installer/keyrin>
Feb 02 07:22:17 ceph-mobisht-4-3z1-aobcda-node1-installer systemd[1]: ceph-mgr: Main process exited, code=exited, status=1/FAILURE
Feb 02 07:22:17 ceph-mobisht-4-3z1-aobcda-node1-installer systemd[1]: ceph-mgr: Failed with result 'exit-code'.
Feb 02 07:22:27 ceph-mobisht-4-3z1-aobcda-node1-installer systemd[1]: ceph-mgr: Service RestartSec=10s expired, scheduling restart.
Feb 02 07:22:27 ceph-mobisht-4-3z1-aobcda-node1-installer systemd[1]: ceph-mgr: Scheduled restart job, restart counter is at 4.
```
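For reference, a minimal diagnostic sketch (not part of the original report) for collecting the SELinux information discussed in the rest of this bug; it assumes the audit daemon is running and that `ausearch`/`audit2allow` from the audit and policycoreutils packages are installed:

```bash
# Check the SELinux labels on the host directories that ceph-ansible bind-mounts
# into the ceph-mgr container. A plain var_lib_t label is not readable by a
# confined container_t process unless the content is relabeled container_file_t
# or the mount uses :z/:Z.
ls -lZd /var/lib/ceph /etc/ceph /var/run/ceph /var/log/ceph
ls -lZ /var/lib/ceph/mgr/

# Collect recent AVC denials and translate them into the policy rules they would
# require. Note: the denials discussed below are dontaudited by default, so they
# only appear in the audit log after `semodule -DB` disables dontaudit rules.
ausearch -m avc -ts recent | audit2allow
# Re-enable dontaudit rules afterwards:
semodule -B
```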
```
[root@ceph-mobisht-4-3z1-aobcda-node1-installer ceph]# ll -lZ /var/lib/ceph/mgr/ceph-ceph-mobisht-4-3z1-aobcda-node1-installer/keyring
-rw-------. 1 167 167 system_u:object_r:var_lib_t:s0 172 Feb 2 07:21 /var/lib/ceph/mgr/ceph-ceph-mobisht-4-3z1-aobcda-node1-installer/keyring
```

What is the output of `ls -lZd /var/lib/ceph`?

The only change in this area that I am aware of is that, if the top-level directory is labeled correctly from an SELinux point of view, podman will no longer relabel the contents of the directory. Meaning if you mv'd files into the directory we could have an issue, if /var/lib/ceph is labeled container_file_t:s0.

Why aren't both sides using the :z? What AVCs are you seeing? What is the label shown by `ls -lZd /var/lib/ceph /etc/ceph /var/run/ceph /var/log/ceph`?

When I analyze the AVCs, audit2allow indicates that these rules are dontaudited in the current SELinux policy:

```
~ $ audit2allow -i /tmp/t

#============= container_t ==============
#!!!! This avc has a dontaudit rule in the current policy
allow container_t var_lib_t:dir read;

#============= init_t ==============
#!!!! This avc has a dontaudit rule in the current policy
allow init_t initrc_t:process siginh;
#!!!! This avc has a dontaudit rule in the current policy
allow init_t unconfined_service_t:process siginh;
```

Which indicates to me that either someone ran `semodule -DB` (turning off dontaudit rules), or container-selinux failed to be installed properly.

The rule `allow container_t var_lib_t:dir read;` comes from a container trying to read /var/lib/ceph/mgr/ceph-ceph50/, I would guess. So this container is not being run as spc_t, which means SELinux container separation is still enforced.

Looks like the container either needs to be run with `--security-opt label=disable` if run by Podman, or with `SecurityOpt Label type spc_t` if run via OpenShift, or the content needs to be relabeled container_file_t using the :z option.

One possible issue is a change in podman: it now checks whether the top-level directory of a volume already has the correct label on it, and if it does, podman will no longer walk the entire directory tree to relabel files/directories under the top level. Perhaps for some reason you have this setup. You could just do a `chcon -Rt container_file_t PATHTO/SRCVOLUME` and then everything should work correctly, or just change the top-level directory to not be container_file_t, and then the :z will relabel everything.
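A minimal sketch of the three options above, assuming the mgr data is bind-mounted from /var/lib/ceph; the container name and CEPH_IMAGE are placeholders for illustration, since the real podman invocation is generated by ceph-ansible in the ceph-mgr systemd unit:

```bash
# Option 1: explicitly relabel the whole volume source tree. This also covers the
# case where the top-level directory is already container_file_t and podman's
# :z/:Z relabel therefore skips walking the children.
chcon -Rt container_file_t /var/lib/ceph

# Option 2: rely on mount-time relabeling with :z (shared) or :Z (private).
podman run -d --name ceph-mgr-test \
  -v /var/lib/ceph:/var/lib/ceph:z \
  CEPH_IMAGE

# Option 3: disable SELinux label separation for this container only.
podman run -d --name ceph-mgr-test \
  --security-opt label=disable \
  -v /var/lib/ceph:/var/lib/ceph \
  CEPH_IMAGE
```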
Issue migration from Bugzilla to Jira is in process at this time. This will be the last message in Jira copied from the Bugzilla bug.

This BZ has been automatically migrated to the issues.redhat.com Red Hat Issue Tracker. All future work related to this report will be managed there.

Due to differences in account names between systems, some fields were not replicated. Be sure to add yourself to the Jira issue's "Watchers" field to continue receiving updates, and add others to the "Need Info From" field to continue requesting information.

To find the migrated issue, look in the "Links" section for a direct link to the new issue location. The issue key will have an icon of 2 footprints next to it and begin with "RHEL-" followed by an integer. You can also find this issue by visiting https://issues.redhat.com/issues/?jql= and searching the "Bugzilla Bug" field for this BZ's number, e.g. a search like: "Bugzilla Bug" = 1234567

In the event you have trouble locating or viewing this issue, you can file an issue by sending mail to rh-issues. You can also visit https://access.redhat.com/articles/7032570 for general account information.