Description of problem:

This behaviour is due to the postinstall/postuninstall scripts of the ceph-selinux package and is less than ideal in a hyperconverged scenario, where MONs must be upgraded before OSDs can be restarted.

$ rpm -qp --scripts ceph-selinux-10.2.7-41.el7cp.x86_64.rpm | head -40
postinstall scriptlet (using /bin/sh):
# backup file_contexts before update
. /etc/selinux/config
FILE_CONTEXT=/etc/selinux/${SELINUXTYPE}/contexts/files/file_contexts
cp ${FILE_CONTEXT} ${FILE_CONTEXT}.pre

# Install the policy
/usr/sbin/semodule -i /usr/share/selinux/packages/ceph.pp

# Load the policy if SELinux is enabled
if ! /usr/sbin/selinuxenabled; then
    # Do not relabel if selinux is not enabled
    exit 0
fi

if diff ${FILE_CONTEXT} ${FILE_CONTEXT}.pre > /dev/null 2>&1; then
    # Do not relabel if file contexts did not change
    exit 0
fi

# Check whether the daemons are running
/usr/bin/systemctl status ceph.target > /dev/null 2>&1
STATUS=$?

# Stop the daemons if they were running
if test $STATUS -eq 0; then
    /usr/bin/systemctl stop ceph.target > /dev/null 2>&1
fi

# Now, relabel the files
/usr/sbin/fixfiles -C ${FILE_CONTEXT}.pre restore 2> /dev/null
rm -f ${FILE_CONTEXT}.pre

# The fixfiles command won't fix label for /var/run/ceph
/usr/sbin/restorecon -R /var/run/ceph > /dev/null 2>&1

# Start the daemons iff they were running before
if test $STATUS -eq 0; then
    /usr/bin/systemctl start ceph.target > /dev/null 2>&1 || :
fi
exit 0

Actual results:
All daemons are restarted.

Expected results:
Daemons are not restarted.

Additional info:
See upstream tracker http://tracker.ceph.com/issues/21672

Steps to Reproduce:

# cp /etc/selinux/targeted/contexts/files/file_contexts /tmp/file_contexts.pre
# sestatus
SELinux status:                 enabled
SELinuxfs mount:                /sys/fs/selinux
SELinux root directory:         /etc/selinux
Loaded policy name:             targeted
Current mode:                   permissive
Mode from config file:          permissive
Policy MLS status:              enabled
Policy deny_unknown status:     allowed
Max kernel policy version:      28
# rpm -q ceph-mon ceph-osd
ceph-mon-10.2.10-0.el7.x86_64
ceph-osd-10.2.10-0.el7.x86_64
# ps auwwx | grep ceph-
ceph  1038  0.0  2.9  357024 30232 ?     Ssl  22:34  0:00 /usr/bin/ceph-mon -f --cluster ceph --id MON1 --setuser ceph --setgroup ceph
ceph  2329  0.1  4.8  885768 49584 ?     Ssl  22:34  0:00 /usr/bin/ceph-osd -f --cluster ceph --id 0 --setuser ceph --setgroup ceph
ceph  2610  0.1  3.6  879044 37536 ?     Ssl  22:34  0:00 /usr/bin/ceph-osd -f --cluster ceph --id 2 --setuser ceph --setgroup ceph
root  2832  0.0  0.0  112648   976 pts/0 R+   22:43  0:00 grep --color=auto ceph-
# yum -y update ceph-mon
# ps auwwx | grep ceph-
ceph  1038  0.0  3.1  364208 32268 ?     Ssl  22:34  0:00 /usr/bin/ceph-mon -f --cluster ceph --id MON1 --setuser ceph --setgroup ceph
ceph  3932  2.0  3.4  771688 34680 ?     Ssl  22:49  0:00 /usr/bin/ceph-osd -f --cluster ceph --id 2 --setuser ceph --setgroup ceph
ceph  3934  3.0  3.9  777120 39932 ?     Ssl  22:49  0:00 /usr/bin/ceph-osd -f --cluster ceph --id 0 --setuser ceph --setgroup ceph
root  4056  0.0  0.0  112664   968 pts/0 R+   22:49  0:00 grep --color=auto ceph-
# rpm -q ceph-mon ceph-osd
ceph-mon-12.2.1-0.el7.x86_64
ceph-osd-12.2.1-0.el7.x86_64
# diff /etc/selinux/targeted/contexts/files/file_contexts /tmp/file_contexts.pre
3979d3978
< /usr/bin/ceph-mgr -- system_u:object_r:ceph_exec_t:s0

It's not clear why the MON service wasn't restarted above; it should have been, as the 'journalctl -x' output shows ceph.target being stopped and then started. Possibly a problem with my manual configuration.

I believe the relevant SELinux change in this case is commit 8f6a526f9a36ff847755cba68b6b78b37e8e99cb, but any change in the file contexts will cause this. This was all accomplished with upstream packages, but it simulates the upgrade from RHCS 2 -> RHCS 3 to at least some extent.
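The relabel (and hence the restart) is gated on the file-context comparison in the scriptlet. A minimal sketch of that gate, using throwaway temp files as stand-ins for the real file_contexts rather than touching SELinux at all:

```shell
#!/bin/sh
# Stand-in reproduction of the scriptlet's "did file contexts change?" gate.
# contexts_changed OLD NEW -> exit 0 if the files differ (relabel + restart
# would happen), nonzero otherwise.
contexts_changed() {
    ! diff "$1" "$2" > /dev/null 2>&1
}

old=$(mktemp)
new=$(mktemp)
printf '/usr/bin/ceph-mon -- system_u:object_r:ceph_exec_t:s0\n' > "$old"
cp "$old" "$new"

# Identical contexts: the scriptlet exits early, daemons are left alone.
contexts_changed "$old" "$new" || echo "no change: skip relabel"

# Simulate a new line such as the ceph-mgr entry in the diff above.
printf '/usr/bin/ceph-mgr -- system_u:object_r:ceph_exec_t:s0\n' >> "$new"
contexts_changed "$old" "$new" && echo "changed: relabel, daemons restarted"

rm -f "$old" "$new"
```

Any added, removed, or modified context line makes the diff non-empty, so even a one-line policy change (like the ceph-mgr entry) is enough to trigger the stop/relabel/start cycle.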
Note that the postuninstall script also has the potential to restart the daemons.
The daemon restart is by design. The daemons need to be down when they are being relabelled. Otherwise, they might just keep on writing improperly labelled files. However, we should document this and add a note that if you are running a hyperconverged scenario you should stop the daemons before the upgrade (so this won't really show up as an issue...) and start them after the upgrade.
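The documented workflow could be sketched roughly as below. This is an illustration, not the official upgrade procedure: the unit names and package ordering are assumptions, and the `run` wrapper only prints each command so the sketch is safe to execute as-is (drop the echo to actually run it).

```shell
#!/bin/sh
# Sketch (assumed unit/package names) of a hyperconverged node upgrade that
# avoids the %post-triggered restart. 'run' just echoes; replace with "$@" to
# execute for real.
run() { echo "+ $*" ; }

# 1. Stop all Ceph daemons first, so the ceph-selinux %post relabel finds
#    ceph.target inactive and does not stop/start anything itself.
run systemctl stop ceph.target

# 2. Upgrade and restart MONs before OSDs (required for major upgrades).
run yum -y update ceph-mon
run systemctl start ceph-mon.target

# 3. Then upgrade and restart OSDs, one node at a time.
run yum -y update ceph-osd
run systemctl start ceph-osd.target
```

The key point is step 1: because the scriptlet only restarts daemons it found running, stopping them beforehand keeps the restart order entirely under the administrator's control.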
@Anjana: Why did you move this to 3.1? Are we not documenting that the admins should stop the daemons before the upgrade to 3.0 and start them after the upgrade to 3.0? Because if not, we definitely should.
An example of where this can bite is here - https://bugzilla.redhat.com/show_bug.cgi?id=1609459 (which also explains how the mons can fail to restart). It also causes all OSDs to go down if 'yum update' is naïvely run simultaneously on all OSD nodes, which TripleO sometimes does. Note that I only hit this because running 'rhos-release 13' automatically subscribes the node to the Ceph 3 repos, and I didn't realize that wasn't supposed to happen as part of the OSP fast-forward upgrade workflow. It does seem like this is an awfully easy issue to hit.
I suppose this might be fairly easy to hit if you do not follow the upgrade instructions when upgrading Ceph. AFAIK, we only require MONs to be restarted before OSDs in the case of major upgrades, and I certainly hope we document that you should stop the daemons before doing a major upgrade (iirc, we do). If the daemons are stopped during the upgrade, they won't be restarted by the %post script that runs the relabelling, so you won't run into issues like these.
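That behaviour follows from how the scriptlet records the daemon state: the stop and start branches both key off the exit code of `systemctl status ceph.target` taken before the relabel. A sketch of that gating with `systemctl` replaced by a stub (so it can run without systemd; the real scriptlet obviously calls the real binary):

```shell
#!/bin/sh
# Stub: pretend ceph.target is active iff CEPH_TARGET_ACTIVE=yes.
systemctl() { [ "$CEPH_TARGET_ACTIVE" = yes ] ; }

# Same shape as the %post logic: remember the pre-relabel state, and only
# stop/start if the target was active at that point.
relabel_gate() {
    systemctl status ceph.target > /dev/null 2>&1
    STATUS=$?
    [ $STATUS -eq 0 ] && echo "stop ceph.target"
    echo "relabel"
    [ $STATUS -eq 0 ] && echo "start ceph.target"
    return 0
}

CEPH_TARGET_ACTIVE=yes
relabel_gate    # daemons running: stop, relabel, start
CEPH_TARGET_ACTIVE=no
relabel_gate    # stopped beforehand: relabel only, daemons stay down
```

So an administrator who stops ceph.target before `yum update` gets only the relabel, and the daemons come back up when (and in the order) the administrator chooses.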