Bug 1416575 - [RFE] Implement fix files (restorecon/chown) function in ceph-disk
Summary: [RFE] Implement fix files (restorecon/chown) function in ceph-disk
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat
Component: Build
Version: 2.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: 2.3
Assignee: Boris Ranto
QA Contact: Tejas
URL:
Whiteboard:
Depends On:
Blocks: 1437916
 
Reported: 2017-01-25 21:10 UTC by David Galloway
Modified: 2017-06-19 13:29 UTC
CC List: 8 users

Fixed In Version: RHEL: ceph-10.2.7-4.el7cp Ubuntu: ceph_10.2.7-5redhat1xenial
Doc Type: Enhancement
Doc Text:
.The process of enabling SELinux on a Ceph Storage Cluster has been improved
A new subcommand has been added to the ceph-disk utility that can help make the process of enabling SELinux on a Ceph Storage Cluster faster. Previously, the standard way of SELinux labeling did not take into account the fact that OSDs usually reside on different disks. This caused the labeling process to be slow. This new subcommand is designed to speed up the process by labeling the Ceph files in parallel, per OSD.
Clone Of:
Environment:
Last Closed: 2017-06-19 13:29:05 UTC


Attachments


Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2017:1497 normal SHIPPED_LIVE Red Hat Ceph Storage 2.3 bug fix and enhancement update 2017-06-19 17:24:11 UTC
Ceph Project Bug Tracker 19543 None None None 2017-04-06 23:01:44 UTC
Red Hat Bugzilla 1455532 None CLOSED Backport https://github.com/ceph/ceph/pull/14871 2019-07-30 16:28:37 UTC

Internal Links: 1455532

Description David Galloway 2017-01-25 21:10:45 UTC
This is sort of related to tracker.ceph.com/issues/9927.

I'm in the process of upgrading a RHCS cluster from 1.3 to 2 and am 3 hours (and counting) into an SELinux relabel of /var/lib/ceph on 6 TB worth of OSD data.

If rebooting an OSD node is part of the upgrade process anyway [1], would it make more sense for ceph-selinux to touch /.autorelabel instead of running '/sbin/restorecon -i -f - -R -e /sys -e /proc -e /dev -e /run -e /mnt -e /var/tmp -e /home -e /tmp -e /dev' during the yum transaction?

A couple reasons I can think of:
- It'd be clearer to the user what's happening.  The docs don't mention this will happen *during* the 'yum update', so the user just sees that ceph-selinux has been installing for hours.
- Less of a chance of the yum transaction getting interrupted.
- While I can appreciate excluding non-Ceph portions of the relabel from the restorecon, if you're using the host as an OSD node, that's probably where most of your storage is so the rest of the filesystem shouldn't be much more overhead.
- If the user's running in Permissive mode and really doesn't want to relabel, they can remove /.autorelabel before rebooting.

You could even append the paths currently excluded to /etc/selinux/fixfiles_exclude_dirs and have Ceph remove them after rebooting.
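
A minimal sketch of that idea (my own illustration, assuming fixfiles reads one excluded path per line from /etc/selinux/fixfiles_exclude_dirs; the paths simply mirror the ones excluded by the restorecon call above):

# for d in /sys /proc /dev /run /mnt /var/tmp /home /tmp; do echo "$d" >> /etc/selinux/fixfiles_exclude_dirs; done
# touch /.autorelabel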

[1] https://access.redhat.com/documentation/en/red-hat-ceph-storage/2/paged/installation-guide-for-red-hat-enterprise-linux/chapter-5-upgrading-ceph-storage-cluster

Comment 2 Boris Ranto 2017-01-26 10:02:27 UTC
This is quite tricky. If we did relabel on reboot then you would have zero remote access to the machine (afaik, the relabelling is done before sshd is started) so remotely you would have absolutely no idea what is happening.

btw: This should be achievable by a documentation change. We may want to document it as an alternative update path? You could achieve this if you enabled SELinux after you updated the packages. If SELinux is disabled during package update then the policy is still being installed but the files are not being labelled. If you enable SELinux after the package update and touch /.autorelabel it will trigger the full relabel on reboot.

Alternatively, I was thinking we could add an info message during the update to let users know we are relabelling the files and that it may take a long time to complete.

Comment 3 David Galloway 2017-01-26 16:36:06 UTC
(In reply to Boris Ranto from comment #2)
> This is quite tricky. If we did relabel on reboot then you would have zero
> remote access to the machine (afaik, the relabelling is done before sshd is
> started) so remotely you would have absolutely no idea what is happening.

Aside from OOB management, but yeah I'm not sure that's much better from a user perspective.  I only suggested it because the docs already say to touch /.autorelabel so relabeling during the yum transaction is redundant.

> 
> btw: This should be achievable by a documentation change. We may want to
> document it as an alternative update path? You could achieve this if you
> enabled SELinux after you updated the packages. If SELinux is disabled
> during package update then the policy is still being installed but the files
> are not being labelled. If you enable SELinux after the package update and
> touch /.autorelabel it will trigger the full relabel on reboot.
> 
> Alternatively, I was thinking we could add an info message during the update
> to let users know we are relabelling the files and that it may take a long
> time to complete.

It'd be an improvement, although from a nervous sysadmin standpoint I'm still stuck in a yum transaction I now wish I'd started in a screen session.  (The first OSD node I updated took 7 hours for 6 TB of OSD data)

This might be unnecessarily complex but what about having the restorecon done after the next reboot?  The OSDs fail to start if the mountpoints aren't labeled correctly anyway, right?  Could the OSD daemon check if the mountpoints are labeled correctly and start a restorecon before bringing up the OSD?
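
As an illustration of what such a check could look like (my sketch only, not something the daemon does today; restorecon -n -v is a dry run that only prints files whose labels would change, and ceph-0 is just an example OSD):

# restorecon -n -v -R /var/lib/ceph/osd/ceph-0

Any output here would mean the mountpoint still needs relabelling.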

Comment 4 Boris Ranto 2017-01-30 16:14:03 UTC
(In reply to David Galloway from comment #3)
> Aside from OOB management, but yeah I'm not sure that's much better from a
> user perspective.  I only suggested it because the docs already say to touch
> /.autorelabel so relabeling during the yum transaction is redundant.
> 

Hmm, it sounds like we might want to revisit the docs on this matter.

> 
> It'd be an improvement, although from a nervous sysadmin standpoint I'm
> still stuck in a yum transaction I now wish I'd started in a screen session.
> (The first OSD node I updated took 7 hours for 6 TB of OSD data)
> 
> This might be unnecessarily complex but what about having the restorecon
> done after the next reboot?  The OSDs fail to start if the mountpoints
> aren't labeled correctly anyway, right?  Could the OSD daemon check if the
> mountpoints are labeled correctly and start a restorecon before bringing up
> the OSD?

After the next reboot? Well, that is tricky and would very likely be racy -- i.e. if the daemons are not labelled properly when they are being started (daemons/binaries are labelled in SELinux as well), they will start just fine and work mostly OK.

What we might actually want to do here is some sort of ceph-disk/ansible command/job that would "fix the files" -- enable SELinux, maybe even change the file uid/gid, do the relabel and all of it automatically. It could run the fixfiles/restorecon manually, start/stop daemons as needed and could be run at any point after the update.

This way we could separate the update process from the relabelling and it would be done per request, not ~forced.

Comment 5 David Galloway 2017-01-30 16:49:56 UTC
Sounds reasonable to me.  May also be wise to do a separate process for each OSD vs just a recursive chown or restorecon on /var/lib/ceph/osd.  On our downstream cluster, each OSD is a separate disk and parallel operations significantly speed up the upgrade process.

From my perspective during a recent RHCS 1.3 -> 2 upgrade, it would have been nice to have the chown done for me and restart the OSD daemons when done.  If the relabel and file ownership could be done simultaneously, we're saving hours of potential downtime.

Comment 6 Boris Ranto 2017-01-30 17:25:26 UTC
Yep, having this implemented in ceph-disk [1] would also allow us to be much more sophisticated and do the parallel restorecon/chown (not sure about doing it at once but some well-designed find command could probably do that for us), etc.

We would still be putting mons/osds/... down during the data format upgrade but the admins would have more control over whether and when they want to do it, etc.

There is also the added bonus of having this feature -- we can easily point people to the ceph-disk commands if they are having obvious issues with permissions/avc denials.

I don't think we can make this in time for 2.2 so I'm adding an RFE keyword and re-targeting for 2.3.

[1] I think it would be best to implement it in ceph-disk so that we can do things differently per different ceph versions

Comment 7 Boris Ranto 2017-01-31 11:30:20 UTC
I've been playing with this a bit and I think the following should help us speed things up (unless the find, restorecon or chown overhead is too high). Can anyone try running (both as root) something like this:

# time find /var/lib/ceph/osd/ -exec chown ceph:ceph '{}' + -exec restorecon '{}' +

and compare the running time to the regular old way (doing this)

# time ( chown -R ceph:ceph /var/lib/ceph/osd/; restorecon -R /var/lib/ceph/osd/ )

on a reasonably big (may I add 'test') cluster? (a single OSD run should be plenty)
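
For reference, a per-OSD parallel variant of the same idea (just a sketch that combines the find form above with an xargs -P fan-out; the -P8 level and the OSD placeholder name are arbitrary choices):

# time find /var/lib/ceph/osd -maxdepth 1 -mindepth 1 -print | xargs -P8 -I OSD find OSD -exec chown ceph:ceph '{}' + -exec restorecon '{}' +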

Comment 8 Boris Ranto 2017-02-08 12:20:27 UTC
Upstream PR:

https://github.com/ceph/ceph/pull/13310

It should cover all the points we were discussing here -- it runs in parallel and can do restorecon and chown at the ~same time, thus taking advantage of caching.

I hope I did not forget any directories in the PR/did not include too much.

Comment 9 Boris Ranto 2017-02-16 11:43:35 UTC
FYI: This will need a documentation change as well. We will need to enable SELinux (Permissive mode) _after_ the upgrade (but before starting the daemons), reboot the machine making sure the daemons are not running, then run restorecon/fixfiles telling it to avoid touching /var/lib/ceph, and only then run the ceph-disk fix command and start the daemons.
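
A rough sketch of that sequence, as I would picture it (the config edit, the -e exclusion and the ceph.target unit are my assumptions here; only the ceph-disk fix subcommand itself is what this bug adds):

# sed -i 's/^SELINUX=.*/SELINUX=permissive/' /etc/selinux/config
# reboot
(after the reboot, with the ceph daemons still stopped)
# restorecon -R -e /var/lib/ceph /
# ceph-disk fix --all
# systemctl start ceph.target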

Comment 10 Tejas 2017-04-04 11:05:06 UTC
Hi Boris,

  We already have a new method to do the chown on osd disks in parallel, which is in our install doc:
find /var/lib/ceph/osd -maxdepth 1 -mindepth 1 -print | xargs -P12 -n1 chown -R ceph:ceph

So just to clarify, you are saying that chown and "touch /.autorelabel" can be done in parallel?

Thanks,
Tejas

Comment 11 Boris Ranto 2017-04-04 11:35:06 UTC
Hi Tejas,

the upstream patch in its current form has two primary modes:

In the '--all' mode it will begin by relabelling the base system (everything except ceph files, this should usually take only a couple of minutes or so) and only then will it try to chown and relabel the ceph data -- it will do both operations in parallel per osd but sequentially per file to take advantage of caching. This should be the fastest option (especially for large clusters) but I would like to see the speed of these commands tested somewhere.

Then you can choose the sub-parts that you want to fix, i.e. --system to fix the base system SELinux labels, --permissions to fix permissions and --selinux to fix SELinux labels for ceph data. This will again begin by relabelling the base system and it will still do the operations in parallel per osd but it will fix the permissions first (in parallel) and only then it will begin fixing SELinux labels for ceph data (again in parallel) so you won't be able to take advantage of caching.

Neither option does "touch /.autorelabel"; that step would only tell the system to relabel on next boot. The fix subcommand runs the commands that do an actual relabel (a reboot is required after that, ofc).

The ceph nodes still need to be down (it checks for that) during these operations.
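
For reference, the two invocation styles described above would look something like this (the subcommand and flag names are the ones mentioned in this comment; the exact combination in the second line is only illustrative):

# ceph-disk fix --all
# ceph-disk fix --system --permissions --selinux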

Comment 12 Christina Meno 2017-04-04 18:07:52 UTC
Patch not yet merged upstream,

Comment 13 Harish NV Rao 2017-04-05 07:20:19 UTC
(In reply to Gregory Meno from comment #12)
> Patch not yet merged upstream,

will it be moved out of 2.3?

Comment 14 Boris Ranto 2017-04-05 13:29:11 UTC
The patch was already merged upstream. Now, we should think about getting it to jewel -- this should not be too difficult as it is a new command and mostly only adds new functionality. I've created the following upstream PRs for kraken and jewel back-ports:

https://github.com/ceph/ceph/pull/14345
https://github.com/ceph/ceph/pull/14346

Comment 20 Tejas 2017-05-19 11:43:46 UTC
Hi Boris,

   Could you please let us know how to verify the changes made to ceph-disk in this BZ?

Thanks,
Tejas

Comment 21 Boris Ranto 2017-05-19 12:28:38 UTC
Hi Tejas,

the new subcommand (fix) was implemented. You can check that it works by changing the ownership/SELinux labels on some files in /var/lib/ceph; running the fix subcommand should set them back (fyi: this can take a long time on a big cluster). There are several modes, as described in the 'ceph-disk fix --help' output.
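
A minimal verification sketch along those lines (my example only: it assumes an OSD data directory such as /var/lib/ceph/osd/ceph-0 exists on the node, uses root:root and var_t as deliberately wrong owner/label values, and assumes the daemons are stopped as noted in comment 11):

# chown root:root /var/lib/ceph/osd/ceph-0/fsid
# chcon -t var_t /var/lib/ceph/osd/ceph-0/fsid
# ceph-disk fix --permissions --selinux
# ls -lZ /var/lib/ceph/osd/ceph-0/fsid

The last command should show ceph:ceph ownership and the original SELinux label again.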

Let me know if you need any more details.

Regards,
Boris

Comment 35 Federico Lucifredi 2017-05-26 23:34:25 UTC
After discussing with Ken and trying to find a compromise, the reasonable solution seems to be to address David's original bug without resolving the additional item that is not merged upstream yet.

This requires Boris to find an alternative way for QE to test this. 

The mentioned bug (1455532) is already =>2.4. Removing "depends".

Boris is on point for what is appropriate as far as doc updates go; he knows best.

Comment 36 Harish NV Rao 2017-05-29 06:33:05 UTC
@Boris, please let us know at the earliest the relevant scenarios and doc updates that need to be tested for this BZ.

Comment 42 errata-xmlrpc 2017-06-19 13:29:05 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:1497

