Description of problem:
-----------------------
RHCS 4.3z1 installation fails on RHEL 8.7 at TASK [ceph-mgr : wait for all mgr to be up]. The mgr service fails to start because of the error below:

```
Jan 19 03:12:25 ceph50 systemd[1]: Started Ceph Manager.
Jan 19 03:12:25 ceph50 ceph-mgr-ceph50[677680]: find: '/var/lib/ceph/mgr/ceph-ceph50/keyring': Permission denied
Jan 19 03:12:25 ceph50 ceph-mgr-ceph50[677680]: chown: cannot access '/var/lib/ceph/mgr/ceph-ceph50/keyring': Permission denied
Jan 19 03:12:26 ceph50 systemd[1]: ceph-mgr: Main process exited, code=exited, status=1/FAILURE
Jan 19 03:12:26 ceph50 systemd[1]: ceph-mgr: Failed with result 'exit-code'.
Jan 19 03:12:36 ceph50 systemd[1]: ceph-mgr: Service RestartSec=10s expired, scheduling restart.
Jan 19 03:12:36 ceph50 systemd[1]: ceph-mgr: Scheduled restart job, restart counter is at 5198.
Jan 19 03:12:36 ceph50 systemd[1]: Stopped Ceph Manager.
```

Version-Release number of selected component (if applicable):
--------------------------------------------------------------
RHCS : 4.3z1
ceph-ansible : 4.0.70.18-1.el8cp.noarch
RHEL : 8.7
Podman : 4.2

How reproducible:
-----------------
Every time

Steps to Reproduce:
-------------------
Deploy a fresh ceph cluster on 4.3z1 on top of RHEL 8.7.

Actual results:
---------------
The ceph-mgr service fails to start because of an SELinux label issue, and hence the deployment fails.

Expected results:
-----------------
The ceph-mgr service should start and the deployment should succeed.
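For anyone triaging a similar failure, a minimal sketch of the checks that match the symptoms above (the unit name `ceph-mgr@<short hostname>` is an assumption based on how ceph-ansible containerized deployments typically name the service; adjust if yours differs):

```
# Confirm the service is stuck in a restart loop:
systemctl status ceph-mgr@$(hostname -s)
journalctl -u ceph-mgr@$(hostname -s) -n 50

# Check the SELinux label on the keyring the container cannot read;
# a label of var_lib_t (rather than container_file_t) would explain
# the "Permission denied" from inside the container:
ls -lZ /var/lib/ceph/mgr/ceph-$(hostname -s)/keyring

# Look for matching AVC denials:
ausearch -m avc -ts recent
```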
Hi, we tested this deployment of RHCS 4.3z1 with RHEL 8.7 and hit the same issue as reported in this BZ.

[root@ceph-mobisht-4-3z1-aobcda-node1-installer ceph]# rpm -qa | grep ansible
ceph-ansible-4.0.70.18-1.el8cp.noarch
ansible-2.9.27-1.el8ae.noarch

[root@ceph-mobisht-4-3z1-aobcda-node1-installer ceph]# rpm -qa | grep podman
podman-4.2.0-6.module+el8.7.0+17498+a7f63b89.x86_64
podman-catatonit-4.2.0-6.module+el8.7.0+17498+a7f63b89.x86_64

[root@ceph-mobisht-4-3z1-aobcda-node1-installer ceph]# cat /etc/redhat-release
Red Hat Enterprise Linux release 8.7 (Ootpa)

[root@ceph-mobisht-4-3z1-aobcda-node1-installer ceph]# podman version
Client:       Podman Engine
Version:      4.2.0
API Version:  4.2.0
Go Version:   go1.18.4
Built:        Mon Dec 12 06:41:56 2022
OS/Arch:      linux/amd64

Deployment failed with the error below:
============
Feb 02 07:22:06 ceph-mobisht-4-3z1-aobcda-node1-installer ceph-mgr-ceph-mobisht-4-3z1-aobcda-node1-installer[20023]: find: '/var/lib/ceph/mgr/ceph-ceph-mobisht-4-3z1-aobcda-node1-installer/keyring': Permission >
Feb 02 07:22:06 ceph-mobisht-4-3z1-aobcda-node1-installer ceph-mgr-ceph-mobisht-4-3z1-aobcda-node1-installer[20023]: chown: cannot access '/var/lib/ceph/mgr/ceph-ceph-mobisht-4-3z1-aobcda-node1-installer/keyrin>
Feb 02 07:22:06 ceph-mobisht-4-3z1-aobcda-node1-installer systemd[1]: ceph-mgr: Main process exited, code=exited, status=1/FAILURE
Feb 02 07:22:06 ceph-mobisht-4-3z1-aobcda-node1-installer systemd[1]: ceph-mgr: Failed with result 'exit-code'.
Feb 02 07:22:16 ceph-mobisht-4-3z1-aobcda-node1-installer systemd[1]: ceph-mgr: Service RestartSec=10s expired, scheduling restart.
Feb 02 07:22:16 ceph-mobisht-4-3z1-aobcda-node1-installer systemd[1]: ceph-mgr: Scheduled restart job, restart counter is at 3.
Feb 02 07:22:16 ceph-mobisht-4-3z1-aobcda-node1-installer systemd[1]: Stopped Ceph Manager.
Feb 02 07:22:16 ceph-mobisht-4-3z1-aobcda-node1-installer systemd[1]: Starting Ceph Manager...
Feb 02 07:22:16 ceph-mobisht-4-3z1-aobcda-node1-installer podman[20266]: Error: no container with name or ID "ceph-mgr-ceph-mobisht-4-3z1-aobcda-node1-installer" found: no such container
Feb 02 07:22:16 ceph-mobisht-4-3z1-aobcda-node1-installer podman[20277]: Error: no container with name or ID "ceph-mgr-ceph-mobisht-4-3z1-aobcda-node1-installer" found: no such container
Feb 02 07:22:16 ceph-mobisht-4-3z1-aobcda-node1-installer podman[20286]:
Feb 02 07:22:16 ceph-mobisht-4-3z1-aobcda-node1-installer podman[20286]: a4fd9d117ca54e869ecc5cb4b9a42290d4a6d51ba38348bee186dba16edc3c08
Feb 02 07:22:16 ceph-mobisht-4-3z1-aobcda-node1-installer systemd[1]: Started Ceph Manager.
Feb 02 07:22:17 ceph-mobisht-4-3z1-aobcda-node1-installer ceph-mgr-ceph-mobisht-4-3z1-aobcda-node1-installer[20296]: find: '/var/lib/ceph/mgr/ceph-ceph-mobisht-4-3z1-aobcda-node1-installer/keyring': Permission >
Feb 02 07:22:17 ceph-mobisht-4-3z1-aobcda-node1-installer ceph-mgr-ceph-mobisht-4-3z1-aobcda-node1-installer[20296]: chown: cannot access '/var/lib/ceph/mgr/ceph-ceph-mobisht-4-3z1-aobcda-node1-installer/keyrin>
Feb 02 07:22:17 ceph-mobisht-4-3z1-aobcda-node1-installer systemd[1]: ceph-mgr: Main process exited, code=exited, status=1/FAILURE
Feb 02 07:22:17 ceph-mobisht-4-3z1-aobcda-node1-installer systemd[1]: ceph-mgr: Failed with result 'exit-code'.
Feb 02 07:22:27 ceph-mobisht-4-3z1-aobcda-node1-installer systemd[1]: ceph-mgr: Service RestartSec=10s expired, scheduling restart.
Feb 02 07:22:27 ceph-mobisht-4-3z1-aobcda-node1-installer systemd[1]: ceph-mgr: Scheduled restart job, restart counter is at 4.
=========================
[root@ceph-mobisht-4-3z1-aobcda-node1-installer ceph]# ll -lZ /var/lib/ceph/mgr/ceph-ceph-mobisht-4-3z1-aobcda-node1-installer/keyring
-rw-------. 1 167 167 system_u:object_r:var_lib_t:s0 172 Feb  2 07:21 /var/lib/ceph/mgr/ceph-ceph-mobisht-4-3z1-aobcda-node1-installer/keyring
================================
ls -lZd /var/lib/ceph

The only change in this area that I am aware of is that if the top-level directory is already labeled correctly from an SELinux point of view, podman will no longer relabel the contents of the directory. Meaning that if /var/lib/ceph is labeled container_file_t:s0 and files were mv'd into the directory (keeping their old labels), we could have an issue.
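A hypothetical illustration of the state being described, using the paths from this BZ (the keyring label matches the `ll -lZ` output posted above; the top-level label is the scenario being hypothesized, not confirmed output):

```
ls -lZd /var/lib/ceph
# drwxr-x---. 12 ceph ceph system_u:object_r:container_file_t:s0 ... /var/lib/ceph
#   ^ top level already container_file_t, so podman skips relabeling the contents

ls -lZ /var/lib/ceph/mgr/ceph-ceph50/keyring
# -rw-------. 1 167 167 system_u:object_r:var_lib_t:s0 ... keyring
#   ^ a file mv'd into the tree keeps its old var_lib_t label,
#     which a container_t process is not allowed to read
```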
Why aren't both sides using the :z? What AVCs are you seeing? And what are the labels shown by:

ls -lZd /var/lib/ceph /etc/ceph /var/run/ceph /var/log/ceph
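For context, a sketch of what the :z volume option does (the image name and mount are illustrative, not the actual ceph-ansible invocation):

```
# With :z, podman relabels the host directory container_file_t:s0
# (shared among containers) before bind-mounting it, so a confined
# container_t process can read it. Without :z, files keep labels such
# as var_lib_t and reads are denied by SELinux.
podman run --rm \
  -v /var/lib/ceph:/var/lib/ceph:z \
  registry.example.com/rhceph/rhceph-4-rhel8 ls -lZ /var/lib/ceph
```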
When I analyze the AVCs, audit2allow indicates that these rules are dontaudited in the current SELinux policy.

~ $ audit2allow -i /tmp/t

#============= container_t ==============
#!!!! This avc has a dontaudit rule in the current policy
allow container_t var_lib_t:dir read;

#============= init_t ==============
#!!!! This avc has a dontaudit rule in the current policy
allow init_t initrc_t:process siginh;

#!!!! This avc has a dontaudit rule in the current policy
allow init_t unconfined_service_t:process siginh;

Which indicates to me that either someone ran

# semodule -DB

to turn off dontaudit rules, or container-selinux failed to be installed properly.
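Two quick checks that follow from this observation (standard commands, no BZ-specific assumptions):

```
# Verify container-selinux is installed and its policy module is loaded:
rpm -q container-selinux
semodule -l | grep container

# If dontaudit rules were disabled earlier with `semodule -DB`,
# rebuild the policy with them enabled again:
semodule -B
```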
This allow rule:

allow container_t var_lib_t:dir read;

comes from a container trying to read /var/lib/ceph/mgr/ceph-ceph50/, I would guess. So this container is not being run as spc_t, meaning SELinux container separation has not been disabled. Looks like the container either needs to be run with `--security-opt label=disabled` if run by Podman, or with `SecurityOpt Label type spc_t` if run via OpenShift. Or the content needs to be relabeled container_file_t using the :z option.
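The three alternatives, sketched as podman invocations with an illustrative image name (the real fix would go into the ceph-ansible systemd unit templates, not an ad-hoc run):

```
# 1) Disable SELinux labeling for this container entirely:
podman run --security-opt label=disabled \
  -v /var/lib/ceph:/var/lib/ceph \
  registry.example.com/rhceph/rhceph-4-rhel8

# 2) Run as the "super privileged container" type (the podman equivalent
#    of the OpenShift SecurityOpt mentioned above):
podman run --security-opt label=type:spc_t \
  -v /var/lib/ceph:/var/lib/ceph \
  registry.example.com/rhceph/rhceph-4-rhel8

# 3) Relabel the volume content container_file_t at mount time:
podman run -v /var/lib/ceph:/var/lib/ceph:z \
  registry.example.com/rhceph/rhceph-4-rhel8
```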
One possible issue is a change in podman: it now checks whether the top-level directory of a volume has the correct label on it, and if so, podman will no longer walk the entire directory tree to relabel the files/directories under the top level. Perhaps for some reason you have this setup. You could just do a

chcon -Rt container_file_t PATHTO/SRCVOLUME

and then everything should work correctly, or just change the top-level directory to not be container_file_t, and then the :z will relabel everything.
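The two workarounds above, applied to the path from this BZ (assuming /var/lib/ceph is the volume whose top level already carries container_file_t while its contents do not):

```
# Option A: recursively relabel the whole tree yourself:
chcon -Rt container_file_t /var/lib/ceph

# Option B: reset the top level to its policy default (var_lib_t on the
# reporter's system, per the ls -lZ output above), so it is no longer
# container_file_t and podman's :z handling relabels the full tree on
# the next container start:
restorecon -v /var/lib/ceph
```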