| Summary: | autorelabel is too slow by a factor of 5 or 6 | ||
|---|---|---|---|
| Product: | [Fedora] Fedora | Reporter: | John Reiser <jreiser> |
| Component: | systemd | Assignee: | systemd-maint |
| Status: | CLOSED ERRATA | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | rawhide | CC: | dwalsh, hhoyer, johannbg, metherid, mschmidt, notting, plautrba, systemd-maint |
| Target Milestone: | --- | Keywords: | Reopened |
| Target Release: | --- | ||
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | systemd-44-1.fc17 | Doc Type: | Bug Fix |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2012-03-21 19:10:32 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
|
Description
John Reiser
2012-01-14 18:17:57 UTC
Autorelabel is doing the equivalent of find /, along with reading the XAttr on every file, then checking if the Xattr matches the default, and writing a new XAttr if it does not. I do not know of a better way to speed this up. Could some heuristic caching help? Particularly for ext2/ext3/ext4?
The "find /" takes less than one minute [see data at end of this comment.]
Although in theory there might be no correlation in the labels for "consecutive" files, in practice nearly all the regular files (S_IFREG) in a given directory have the same label, especially if they have the same UID, GID, and .st_mode. So if there is a way to tell that the _existing_ labels are the same for large subsets of files in a directory (perhaps by considering filesystem type, particularly ext2/ext3/ext4) then only one check against the new label suffices.
Also, is there a way to "invert" the label checking? For example, to a large degree e2fsck processes blocks of meta information in disk order, and not according to directory hierarchy ("find" order). This is much faster because it reduces random seeks a lot.
-----
# time (cd /; find $(ls | grep -v proc) | wc)
225646 225723 13025709
real 0m40.773s
user 0m0.901s
sys 0m11.128s
#
-----
How does (cd /; find $(ls | grep -v proc) | restorecon -f - do The thing is we need paths to compare against regular expressions, in order to check labels. I guess if we had a quicker way to get all the paths on a system then running find, we could pipe them to restorecon -F - and check. But it is still going to read the XATTR and check it versus the regular expression. If different it will write the XATTR. Just under 9 minutes for the restorecon, with high CPU utilization almost all the time (gnome-system-monitor also running [I'll try again without this]), and harddrive LED flickering moderately but also often dark for 3 to 5 seconds at a stretch: # cd /; find $(ls | grep -v proc) | time restorecon -f - find: `home/jreiser/.gvfs': Permission denied ### This is expected, I think. 271.51user 147.54system 8:54.19elapsed 78%CPU (0avgtext+0avgdata 106728maxresident)k 19066inputs+0outputs (0major+26875minor)pagefaults 0swaps # rpm -q selinux-policy-targeted kernel selinux-policy-targeted-3.10.0-75.fc17.noarch kernel-3.2.1-5.fc17.x86_64 After several runs of restorecon and/or autorelabel, the good news is that autorelabel of this filesystem now takes under 8 minutes. The bad news is that one of the autorelabel runs took 2 hours and 12 minutes, and the reasons why are not known. Also, there may have been interactions involving systemd and/or plymouth. I noticed that the restorecon of Comment 5 was about 7 minutes CPU but 9 minutes elapsed. gnome-system-monitor was running (GNOME3 desktop, 8400GS graphics, 1 processor @2GHz), so next I would try restorecon without gnome-system-monitor. In fact, a reboot seemed like a good idea; and because I was shutting down anyway, I decided to run autorelabel: # touch /.autorelabel; reboot The desktop closed, systemd drivel scrolled by on text console (64x160 characters on 1280x1024 monitor), and autorelabel began. There are 225,000 files; at one asterisk '*' per 1024 files, that implies 220 '*' characters, or about 1.4 full lines. The first half line went by OK, but then things became slower and slower. By the time the line reached the right edge of the screen (160 '*'), the elapsed time was 1 hour 46 minutes, and it was taking more than one minute between successive '*'. I had to depart at this point, but I left it running. When I came back 45 minutes later, a reboot had taken place. By looking at the timestamps in /var/log/messages, I determined that the time from the 'reboot' command to the start of the log for the newly-booted kernel was 2 hours 13 minutes. Except for systemd overhead at the start of the reboot command, and BIOS time before the boot of Linux, this was all during the running of autorelabel. I rebooted again to single-user mode, and also removed 'quiet' and 'rhgb' from the kernel boot command arguments. In single-user mode, the restorecon took: 263.27user 143.68system 7:04.27elapsed 95%CPU (105764maxresident)k 38554inputs + 0outputs (0major + 26634minor)pagefaults 0swaps so gnome-system-monitor must have accounted for the 2 minute difference between 9 minutes elapsed under GNOME3 desktop when running the restorecon with gnome-system-monitor, and 7 minutes elapsed under single-user text console without gnome-system-monitor. Still in single-user mode, I invoked autorelabel: # touch /.autorelabel; reboot (Again, I changed the kernel boot command arguments by adding 'single', and removing 'rhgb' and 'quiet'.) When invoked from graphical desktop, then the autorelabel runs before I see the BIOS. But this time when autorelabel was "invoked" from single-user mode before reboot, then I saw the BIOS first, and the autorelabel ran after the BIOS, at the time when I would have gotten a single-user shell from the new boot. This autorelabel finished in just under 8 minutes. The 1.4 text lines of asterisks "******..." appeared as expected, slowing only somewhat towards the right end of the first line. Astounded at the reduction in elapsed time, I repeated the "touch /.autorelabel; reboot" twice more, each time booting to single-user mode without 'quiet' and without 'rhgb'. The autorelabels finished in 8 minutes each. Although gratified that autorelabel is now working much better, I cannot explain what happened. I am not sure what you have changed? Are you suggesting we change some defaults to make this run faster for everyone? The sequential measuring of the performance of restorecon and autorelabel on this one filesystem resulted in some "permanent" change to the observed performance of autorelabel. I ran restorecon a couple times, and autorelabel several times, and rebooted several times into single-user mode (and without plymouth). For one of the runs, autorelabel was dramatically slower than the first time (which was already much too slow). After that, including three runs in a row, autorelabel was dramatically faster, and close to the time required for restorecon. The two slow runs of autorelabel were requested while running GNOME3 Desktop, then an explicit "reboot" command. The graphical desktop shuts down, systemd spews some stuff onto the VGA text console, autorelabel runs, the BIOS gains control, GRUB2 boots a new instantiation of the kernel. The three fast runs of autorelabel were requested from single-user non-graphical VGA text console mode. There is minimal systemd spew, the BIOS gains control, GRUB2 boots a new instantiation of the kernel, autorelabel runs instead of the single-user shell. Note that autorelabel runs at a different time: after the BIOS instead of before the BIOS. For some boots I disable plymouth by removing 'rghb' from the kernel boot command line. The plymouth shutdown sequence often hangs near 90% complete or so, with no apparent progress for 30 seconds or more. Ctrl-Alt-Del triggers more systemd spew onto VGA console, but the BIOS gains control only when I press the hardware Reset button. When autorelabel runs _before_ the BIOS, then it is impossible to see what actually is running. Network is down, other VTn consoles are down, etc. So some process could be hanging around and interfering with autorelabel, but there is no way to get any information out of the system (except by watching the slowdown of the progress bar of asterisks.) Harald pointed out that this might be related to syslog. Since the restorecon is sending a syslog message for each file that it fixes a label on, the slow down might be caused by interaction with the systemd-journald? Exactly. http://lists.fedoraproject.org/pipermail/devel/2012-January/161767.html Lennart made the logging non-blocking in upstream commit: http://cgit.freedesktop.org/systemd/systemd/commit/?id=632117b71fdd2d1b9da7418d00ec21c18d60dedd This will make the problem go away, though I don't think it is the ideal solution. I have also removed the syslog messages in a full relabel, since I don't believe anyone every looked at them. systemd-44-1.fc17 has been submitted as an update for Fedora 17. https://admin.fedoraproject.org/updates/systemd-44-1.fc17 Package systemd-44-1.fc17: * should fix your issue, * was pushed to the Fedora 17 testing repository, * should be available at your local mirror within two days. Update it with: # su -c 'yum update --enablerepo=updates-testing systemd-44-1.fc17' as soon as you are able to. Please go to the following url: https://admin.fedoraproject.org/updates/FEDORA-2012-3918/systemd-44-1.fc17 then log in and leave karma (feedback). systemd-44-1.fc17 has been pushed to the Fedora 17 stable repository. If problems still persist, please make note of it in this bug report. |