Red Hat Bugzilla – Bug 1262374
docker-selinux: Package upgrade taking a long time and using large amounts of resources
Last modified: 2016-04-01 00:00:15 EDT
+++ This bug was initially created as a clone of Bug #1251458 +++
Description of problem:
While upgrading the docker-selinux package, a `restorecon` command is run that takes several minutes and consumes a large amount of CPU and I/O.
I looked at the command line but didn't manage to save it. I did notice that it included /var/lib/docker. That has the potential to take a very long time if lots of images exist, and I suspect it might be problematic if symbolic/hard links are followed outside of a container environment.
Version-Release number of selected component (if applicable):
Hard to tell, but probably easy to determine, since the command seems to be part of the package's upgrade script.
Steps to Reproduce:
1. Upgrade the docker-selinux package using dnf
2. Watch it take forever
Actual results:
Package update took 10+ minutes on a desktop system with a fast SSD. It would probably take much longer if /var/lib/docker were on a slow HDD, or on a laptop.
Expected results:
Package upgrade does not perform blocking, resource- and time-consuming tasks unnecessarily. I also hope it won't happen every time the package is upgraded, which would be very inconvenient.
--- Additional comment from Daniel Walsh on 2015-08-07 07:57:00 EDT ---
Yes, there is not much we can do about it. If the labeling changes we need to fix the labeling, and if there are a huge number of files it will take some time.
It should only happen this once. If you did a reinstall of docker-selinux it should happen much quicker.
--- Additional comment from Daniel Miranda on 2015-08-07 09:15:15 EDT ---
What about a service that runs before the docker service, checks whether a relabelling is needed, and does it instead? That seems much saner than doing this with no warning or choice during package upgrades. It seems even crazier on servers if it means a possible service disruption due to resource exhaustion, with no way for an administrator to predict it.
--- Additional comment from Daniel Walsh on 2015-08-08 05:31:33 EDT ---
This has been the way that SELinux has been updating for more than 10 years.
There should be no resource consumption, since basically it is walking /var/lib/docker and changing context, with a fairly stable tool restorecon/setfiles.
The loading/compiling which takes a lot of memory and CPU is greatly improved in Fedora 23 which should improve the situation.
How many files do you have under /var/lib/docker?
You can also eliminate the relabel of /var/lib/docker by adding /var/lib/docker to /etc/selinux/fixfiles_exclude_dirs.
--- Additional comment from Daniel Miranda on 2015-08-25 20:32:58 EDT ---
I just got another update and I can confirm there was actually a lot of resource consumption. I/O was heavily saturated and about 80% CPU was used. Changing the contexts itself obviously isn't free: it causes a bunch of disk writes and cache thrashing. I understand the need, but it's honestly still quite inconvenient.
My /var/lib/docker folder has 1728485 files totalling 31GB, and this time the relabelling took about 3min.
I don't have a F23 installation right now to test, but it's nice to know work is happening on making things speedier. For now though, I'll probably have to exclude /var/lib/docker from the relabelling and do it myself occasionally.
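For anyone wanting to answer the "how many files" question on their own system, here is a minimal sketch. The commands used for the figures above are not given in this report, so this is only one way to get comparable numbers; `count_files` is an illustrative helper name.

```shell
#!/bin/sh
# count_files is an illustrative helper name, not part of any package.
# Prints the number of regular files below a directory, staying on one filesystem.
count_files() {
    # -xdev stops find from descending into other mounted filesystems.
    find "$1" -xdev -type f 2>/dev/null | wc -l | tr -d ' '
}

# Example (needs read access to the tree, typically root):
#   count_files /var/lib/docker
#   du -sxh /var/lib/docker
```

The count covers regular files only. Directories and symlinks also carry labels, so the number of objects a recursive relabel visits will be somewhat higher.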
--- Additional comment from Daniel Walsh on 2015-08-26 07:34:05 EDT ---
This should be a one-time occurrence, not something that happens regularly. The 80% CPU might be the recompiling of policy.
--- Additional comment from Daniel Miranda on 2015-08-26 12:50:57 EDT ---
As I mentioned, it happened for a second time in less than a month. If it wasn't supposed to happen I can attempt to provide information to help you debug it.
--- Additional comment from Daniel Miranda on 2015-09-11 08:03:08 EDT ---
Just for the record: creating a /etc/selinux/fixfiles_exclude_dirs with a single line of /var/lib/docker had no effect on the relabelling. It happened *again* and is taking 10 minutes *again*.
--- Additional comment from Daniel Walsh on 2015-09-11 08:09:19 EDT ---
Is this happening on a selinux-policy-targeted update?
--- Additional comment from Daniel Miranda on 2015-09-11 08:49:18 EDT ---
Apologies, but I don't think I understood your question (or I can't answer it). What I see is a very long delay while DNF is upgrading docker-selinux. This time when I saw it in the pending upgrade list, I edited /etc/selinux/fixfiles_exclude_dirs, but the upgrade still seemingly did the relabel, as it took a good amount of time.
Thanks for taking the time to help me.
--- Additional comment from Daniel Walsh on 2015-09-11 09:07:34 EDT ---
There are two things that can cause the delay. Every selinux-policy update currently takes over a minute while the policy compiles. This is an SELinux policy compiler problem that we have faced for years; it is finally fixed in Rawhide, but the fix will not be backported to older versions.
You can see this delay by executing:
# semodule -B
The second delay is looking for changes in the file_context files between one version of policy and the next. If there is a change, the rpm will run as minimal a recursive fixfiles/restorecon as possible on the difference. This usually should not take that long, unless it ends up doing something like restorecon -R /var or restorecon -R /home.
In this case it walks those parts of the file system and fixes the labels.
fixfiles is supposed to skip any directories listed in the /etc/selinux/fixfiles_exclude_dirs file.
I will attach a patch script that would show what it will exclude.
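The attachment is not reproduced in this report. As a rough illustration of the kind of check described, the sketch below reads an exclude file and prints the directories that would be skipped. It is a hypothetical reimplementation and not the real fixfiles code; the function name `list_excluded_dirs` and the exact filtering rules (ignore blank lines, comments, relative paths and non-directories) are assumptions.

```shell
#!/bin/sh
# Hypothetical sketch, NOT the real fixfiles code: print the directories
# listed in an exclude file that a relabel would skip.
list_excluded_dirs() {
    exclude_file=${1:-/etc/selinux/fixfiles_exclude_dirs}
    # No readable exclude file means nothing is excluded.
    [ -r "$exclude_file" ] || return 0
    while IFS= read -r dir; do
        case $dir in
            ''|'#'*) continue ;;   # blank line or comment
            /*) ;;                 # absolute path: keep checking
            *) continue ;;         # relative path: ignore
        esac
        [ -d "$dir" ] || continue  # entry must be an existing directory
        echo "skipping the directory $dir"
    done < "$exclude_file"
}

# Example:
#   list_excluded_dirs /etc/selinux/fixfiles_exclude_dirs
```

Note that an exclusion like this only helps when the relabel goes through fixfiles; a direct restorecon call does not consult the file.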
--- Additional comment from Daniel Walsh on 2015-09-11 09:08:40 EDT ---
This code is cut from the current fixfiles on my machine (Fedora Rawhide).
--- Additional comment from Daniel Miranda on 2015-09-11 09:27:39 EDT ---
Running `semodule -B` manually takes 13.4 seconds, while the package upgrade takes 10+ minutes. That doesn't look like the issue to me.
You say that doing a restorecon on /var would be bad, but I think even getting into /var/lib/docker would already cause a long delay, since it's quite large on my system (currently 20GB).
Running your script (replacing logit with echo) shows:
$ sh fixfiles.sh
skipping the directory /var/lib/docker
But looking at the docker spec file, I don't see it using the fixfiles script at all. It calls restorecon manually:
Maybe that's the issue, and why the exclusion had no effect.
--- Additional comment from Daniel Walsh on 2015-09-11 09:37:21 EDT ---
OK, I thought you were reporting this on a selinux-policy update.
If the docker-selinux package is running restorecon -R /var/lib/docker on every update, that is indeed wrong.
--- Additional comment from Daniel Walsh on 2015-09-11 09:50:38 EDT ---
Lokesh, please patch docker.spec to only run restorecon on /var/lib/docker on initial install. Otherwise, on a large /var/lib/docker, this will go very slowly.
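The actual docker.spec patch is not shown in this report. A minimal sketch of the usual RPM idiom for "initial install only" follows, assuming the logic lives in a %post scriptlet, where RPM passes the number of installed instances of the package as $1 (1 on first install, 2 or more on upgrade). The helper name `relabel_on_first_install` is hypothetical.

```shell
#!/bin/sh
# Sketch of a %post-style guard; relabel_on_first_install is a hypothetical helper.
# In a %post scriptlet, $1 is 1 on initial install and >= 2 on upgrade.
relabel_on_first_install() {
    count=$1
    dir=$2
    if [ "$count" -eq 1 ]; then
        echo "relabel $dir"
        # Only touch labels when restorecon is available and the directory exists.
        if command -v restorecon >/dev/null 2>&1 && [ -d "$dir" ]; then
            restorecon -R "$dir"
        fi
    else
        # Upgrade: leave existing labels alone to avoid walking a large tree.
        echo "skip $dir"
    fi
}

# Inside a scriptlet this would be called roughly as:
#   relabel_on_first_install "$1" /var/lib/docker
```

With a guard like this, an upgrade skips the recursive walk entirely, at the cost of not picking up label changes in later policy versions unless the administrator relabels manually.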
Fixed in docker-1.9
In docker-1.9.1-15.el7.x86_64, the new patch is in docker.spec.
As I have limited resources to test the installation speed, I only checked the code.
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.