Hide Forgot
Description of problem: Autorelabel takes a factor of 5 or 6 longer to run than is reasonable. "fsck -f" on a 20GB ext4 (25% used) takes a few minutes, but autorelabel takes half an hour (2GHz CPU, 3GB DDR2 SDRAM, UDMA100 drive.) During most of autorelabel, the LED which indicates activity on the drive flashes once per few seconds, instead of several times per second. The slowness is particularly unpleasant because relabel is required after using Rescue mode. Version-Release number of selected component (if applicable): selinux-policy-targeted-3.10.0-74.2.fc17.noarch How reproducible: every time Steps to Reproduce: 1. # touch /.autorelabel 2. # reboot 3. Actual results: autorelabel takes several times as long as "fsck -f" Expected results: autorelabel takes about as long as fsck. Additional info: 20GB ext4 filesystem, "tune2fs -l" says: Filesystem magic number: 0xEF53 Filesystem revision #: 1 (dynamic) Filesystem features: has_journal ext_attr resize_inode dir_index filetype needs_recovery extent flex_bg sparse_super large_file huge_file uninit_bg dir_nlink extra_isize Filesystem flags: signed_directory_hash Default mount options: (none) Filesystem state: clean Errors behavior: Continue Filesystem OS type: Linux Inode count: 1310720 Block count: 5242880 Reserved block count: 262144 Free blocks: 3872470 Free inodes: 1159481 First block: 0 Block size: 4096 Fragment size: 4096 Reserved GDT blocks: 1022 Blocks per group: 32768 Fragments per group: 32768 Inodes per group: 8192 Inode blocks per group: 512
Autorelabel is doing the equivalent of find /, along with reading the XAttr on every file, then checking if the Xattr matches the default, and writing a new XAttr if it does not. I do not know of a better way to speed this up.
Could some heuristic caching help? Particularly for ext2/ext3/ext4? The "find /" takes less than one minute [see data at end of this comment.] Although in theory there might be no correlation in the labels for "consecutive" files, in practice nearly all the regular files (S_IFREG) in a given directory have the same label, especially if they have the same UID, GID, and .st_mode. So if there is a way to tell that the _existing_ labels are the same for large subsets of files in a directory (perhaps by considering filesystem type, particularly ext2/ext3/ext4) then only one check against the new label suffices. Also, is there a way to "invert" the label checking? For example, to a large degree e2fsck processes blocks of meta information in disk order, and not according to directory hierarchy ("find" order). This is much faster because it reduces random seeks a lot. ----- # time (cd /; find $(ls | grep -v proc) | wc) 225646 225723 13025709 real 0m40.773s user 0m0.901s sys 0m11.128s # -----
How does (cd /; find $(ls | grep -v proc) | restorecon -f - do
The thing is we need paths to compare against regular expressions, in order to check labels. I guess if we had a quicker way to get all the paths on a system then running find, we could pipe them to restorecon -F - and check. But it is still going to read the XATTR and check it versus the regular expression. If different it will write the XATTR.
Just under 9 minutes for the restorecon, with high CPU utilization almost all the time (gnome-system-monitor also running [I'll try again without this]), and harddrive LED flickering moderately but also often dark for 3 to 5 seconds at a stretch: # cd /; find $(ls | grep -v proc) | time restorecon -f - find: `home/jreiser/.gvfs': Permission denied ### This is expected, I think. 271.51user 147.54system 8:54.19elapsed 78%CPU (0avgtext+0avgdata 106728maxresident)k 19066inputs+0outputs (0major+26875minor)pagefaults 0swaps # rpm -q selinux-policy-targeted kernel selinux-policy-targeted-3.10.0-75.fc17.noarch kernel-3.2.1-5.fc17.x86_64
After several runs of restorecon and/or autorelabel, the good news is that autorelabel of this filesystem now takes under 8 minutes. The bad news is that one of the autorelabel runs took 2 hours and 12 minutes, and the reasons why are not known. Also, there may have been interactions involving systemd and/or plymouth. I noticed that the restorecon of Comment 5 was about 7 minutes CPU but 9 minutes elapsed. gnome-system-monitor was running (GNOME3 desktop, 8400GS graphics, 1 processor @2GHz), so next I would try restorecon without gnome-system-monitor. In fact, a reboot seemed like a good idea; and because I was shutting down anyway, I decided to run autorelabel: # touch /.autorelabel; reboot The desktop closed, systemd drivel scrolled by on text console (64x160 characters on 1280x1024 monitor), and autorelabel began. There are 225,000 files; at one asterisk '*' per 1024 files, that implies 220 '*' characters, or about 1.4 full lines. The first half line went by OK, but then things became slower and slower. By the time the line reached the right edge of the screen (160 '*'), the elapsed time was 1 hour 46 minutes, and it was taking more than one minute between successive '*'. I had to depart at this point, but I left it running. When I came back 45 minutes later, a reboot had taken place. By looking at the timestamps in /var/log/messages, I determined that the time from the 'reboot' command to the start of the log for the newly-booted kernel was 2 hours 13 minutes. Except for systemd overhead at the start of the reboot command, and BIOS time before the boot of Linux, this was all during the running of autorelabel. I rebooted again to single-user mode, and also removed 'quiet' and 'rhgb' from the kernel boot command arguments. In single-user mode, the restorecon took: 263.27user 143.68system 7:04.27elapsed 95%CPU (105764maxresident)k 38554inputs + 0outputs (0major + 26634minor)pagefaults 0swaps so gnome-system-monitor must have accounted for the 2 minute difference between 9 minutes elapsed under GNOME3 desktop when running the restorecon with gnome-system-monitor, and 7 minutes elapsed under single-user text console without gnome-system-monitor. Still in single-user mode, I invoked autorelabel: # touch /.autorelabel; reboot (Again, I changed the kernel boot command arguments by adding 'single', and removing 'rhgb' and 'quiet'.) When invoked from graphical desktop, then the autorelabel runs before I see the BIOS. But this time when autorelabel was "invoked" from single-user mode before reboot, then I saw the BIOS first, and the autorelabel ran after the BIOS, at the time when I would have gotten a single-user shell from the new boot. This autorelabel finished in just under 8 minutes. The 1.4 text lines of asterisks "******..." appeared as expected, slowing only somewhat towards the right end of the first line. Astounded at the reduction in elapsed time, I repeated the "touch /.autorelabel; reboot" twice more, each time booting to single-user mode without 'quiet' and without 'rhgb'. The autorelabels finished in 8 minutes each. Although gratified that autorelabel is now working much better, I cannot explain what happened.
I am not sure what you have changed? Are you suggesting we change some defaults to make this run faster for everyone?
The sequential measuring of the performance of restorecon and autorelabel on this one filesystem resulted in some "permanent" change to the observed performance of autorelabel. I ran restorecon a couple times, and autorelabel several times, and rebooted several times into single-user mode (and without plymouth). For one of the runs, autorelabel was dramatically slower than the first time (which was already much too slow). After that, including three runs in a row, autorelabel was dramatically faster, and close to the time required for restorecon. The two slow runs of autorelabel were requested while running GNOME3 Desktop, then an explicit "reboot" command. The graphical desktop shuts down, systemd spews some stuff onto the VGA text console, autorelabel runs, the BIOS gains control, GRUB2 boots a new instantiation of the kernel. The three fast runs of autorelabel were requested from single-user non-graphical VGA text console mode. There is minimal systemd spew, the BIOS gains control, GRUB2 boots a new instantiation of the kernel, autorelabel runs instead of the single-user shell. Note that autorelabel runs at a different time: after the BIOS instead of before the BIOS. For some boots I disable plymouth by removing 'rghb' from the kernel boot command line. The plymouth shutdown sequence often hangs near 90% complete or so, with no apparent progress for 30 seconds or more. Ctrl-Alt-Del triggers more systemd spew onto VGA console, but the BIOS gains control only when I press the hardware Reset button. When autorelabel runs _before_ the BIOS, then it is impossible to see what actually is running. Network is down, other VTn consoles are down, etc. So some process could be hanging around and interfering with autorelabel, but there is no way to get any information out of the system (except by watching the slowdown of the progress bar of asterisks.)
Harald pointed out that this might be related to syslog. Since the restorecon is sending a syslog message for each file that it fixes a label on, the slow down might be caused by interaction with the systemd-journald?
Exactly. http://lists.fedoraproject.org/pipermail/devel/2012-January/161767.html Lennart made the logging non-blocking in upstream commit: http://cgit.freedesktop.org/systemd/systemd/commit/?id=632117b71fdd2d1b9da7418d00ec21c18d60dedd This will make the problem go away, though I don't think it is the ideal solution.
I have also removed the syslog messages in a full relabel, since I don't believe anyone every looked at them.
systemd-44-1.fc17 has been submitted as an update for Fedora 17. https://admin.fedoraproject.org/updates/systemd-44-1.fc17
Package systemd-44-1.fc17: * should fix your issue, * was pushed to the Fedora 17 testing repository, * should be available at your local mirror within two days. Update it with: # su -c 'yum update --enablerepo=updates-testing systemd-44-1.fc17' as soon as you are able to. Please go to the following url: https://admin.fedoraproject.org/updates/FEDORA-2012-3918/systemd-44-1.fc17 then log in and leave karma (feedback).
systemd-44-1.fc17 has been pushed to the Fedora 17 stable repository. If problems still persist, please make note of it in this bug report.