Description of problem:
Fedora 32 install took in excess of 48 hours to complete installation!

Version-Release number of selected component (if applicable):
F32 Workstation upgrade

How reproducible:
Unknown

Steps to Reproduce:
1. sudo dnf system-upgrade reboot
2.
3.

Actual results:
After reboot, dnf updated several thousand packages rapidly, then stalled tremendously at 30% complete on one of the SELinux packages.

Expected results:
Reasonably rapid install, or else an explanatory message.

Additional info:
One of the selinux* packages launched a 'restorecon' across the file systems during installation. When it encountered a backup Btrfs (compressed) filesystem containing ~400GB of small backup files, the performance of restorecon was abysmal. I was able to observe the installation progress (via the escape key), and to log in as root on a virtual terminal and observe the problem with strace.
Correction: the time between reboots was 45.783 hours, from utmpdump /var/log/wtmp:

[2] [00000] [~~ ] [reboot ] [~ ] [5.6.11-200.fc31.x86_64] [0.0.0.0 ] [2020-05-16T06:40:31,782810+00:00]
[2] [00000] [~~ ] [reboot ] [~ ] [5.6.12-300.fc32.x86_64] [0.0.0.0 ] [2020-05-18T04:37:44,783445+00:00]

df -h of the problematic file system:

/dev/sdb1       699G  391G  306G  57% /backup

Mount parameters:

/dev/sdb1 on /backup type btrfs (rw,noatime,seclabel,compress=lzo,space_cache,subvolid=257,subvol=/backup)
Switching the component to btrfs. Folks, some versions of selinux-policy resulted in a complete relabeling of all filesystems. This is tracked in bz#1832327. It looks, though, like in the case of btrfs it can take up to 2 days, which is too much time. Is this a known issue? Is it expected? Can there be such a big performance penalty?
(In reply to Steve Alexander from comment #1)
> /dev/sdb1 on /backup type btrfs
> (rw,noatime,seclabel,compress=lzo,space_cache,subvolid=257,subvol=/backup)

This looks reasonable.

(In reply to Zdenek Pytela from comment #2)
> Folks, some versions of selinux-policy had as a result complete relabeling
> of all filesystems. This is tracked in bz#1832327.
>
> It looks though in case of btrfs it can take up to 2 days which is too much
> time. Is this a know issue? Is it expected? Can there be such a big
> performance penalty?

I'm using Btrfs and haven't seen it, so it must be an edge case. Needs more information and a reproducer. I'd suggest doing sysrq+t while it's happening, and then we can ask a btrfs kernel developer what's going on.

Is restorecon using sync() at all while this is happening? That would slow any file system down, but might slow btrfs down more.

Once there's more information, we'll know whether the component should be set to kernel (and assigned fedora-kernel-btrfs) or to policycoreutils (restorecon).
(In reply to Steve Alexander from comment #0)
> When it encountered a backup BTRFS (compressed)
> filesystem containing ~400GB of small backup files, the performance of
> restorecon was abysmal.

How many files?

ls -R -1 | wc -l

Btrfs background info: the selinux label associated with a file is stored in the same leaf as its filename, unix permissions, owner, timestamps, etc. There's no special extra thing that needs to be read or written just for selinux labels. Getting file info means reading a 16KiB node (likely cached) and a 16KiB leaf that contains metadata on ~40 files. I'd expect the same read, modify, write behavior from chmod, chown, and restorecon on btrfs.

Does the slowness happen on reads, modifies, or writes? Does the slowness happen when doing a chmod or chown, as with restorecon?
More Btrfs background info: with the compression flag set, each file ends up with many EXTENT_DATA items. The file is logically broken up into 128KiB extents, which is the maximum compressed extent size on Btrfs. A file that's 1MiB uncompressed will normally have one extent, and thus one EXTENT_DATA item in a leaf; if compression is enabled, this becomes up to 8 EXTENT_DATA items, even if those extents are contiguous on disk. Therefore many leaves will contain only EXTENT_DATA items. All of these leaves need to be read, because any of them could contain the item we're looking for: the XATTR_ITEM holding the selinux label. So compression has a numerically significant effect on metadata read amplification. Writes aren't hit as badly, because only the changed leaves are modified and written, and due to COW they are collected into a contiguous write.

But I don't know whether this read amplification on Btrfs alone can explain the many-orders-of-magnitude longer time of restorecon. I have some compressed and some uncompressed Btrfs file systems and don't see this effect. On one single-spinning-disk system of mine I never run restorecon or chcon, but I recently did a recursive chmod and chown on about 500K files and each command took about 30 seconds, on spinning rust. I'm still wondering how chmod and chown compare to both restorecon and chcon on the same file system (on the same directory, even).
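The 1MiB-to-8-items arithmetic above can be sketched as a quick back-of-envelope model (this is not btrfs code, just the counting argument; the 128KiB figure is btrfs's compressed-extent cap as described above):

```python
# Back-of-envelope model of the EXTENT_DATA amplification described above.
# Simplifying assumption: an uncompressed file is one contiguous extent,
# while a compressed file is split into 128 KiB pieces, one item each.
COMPRESSED_EXTENT_SIZE = 128 * 1024  # btrfs compressed-extent cap

def extent_data_items(file_size: int, compressed: bool) -> int:
    if file_size == 0:
        return 0
    if not compressed:
        return 1  # simplification: assume one contiguous extent
    # ceiling division: one EXTENT_DATA item per 128 KiB piece
    return -(-file_size // COMPRESSED_EXTENT_SIZE)

print(extent_data_items(1024 * 1024, compressed=False))  # -> 1
print(extent_data_items(1024 * 1024, compressed=True))   # -> 8
```

Multiplied across hundreds of thousands of small backup files, that is the metadata read amplification in question.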
# cat /etc/fstab
UUID= btrfs noatime,discard=async,compress=zstd:1,subvol=root 0 0

Thus the root subvolume is mounted at /, and root/var is the identical directory as /var.

# btrfs sub create test
# cp -a --reflink root/var/ test/
# ls -R -1 test/ | wc -l
114205

Therefore test/ is an independent file tree, but has the same contents as root/var.

# time chown -R chris: test/
real	0m0.482s
user	0m0.025s
sys	0m0.424s

# time chown -R root: test/
real	0m0.495s
user	0m0.034s
sys	0m0.436s

# time chcon -R 'system_u:object_r:virt_content_t:s0' test/
real	0m1.142s
user	0m0.094s
sys	0m1.037s

# time restorecon -R -v test/
real	0m2.552s
user	0m2.253s
sys	0m0.290s

# time restorecon -R -v root/var/
real	0m2.482s
user	0m2.199s
sys	0m0.276s

# time restorecon -R -v /var
real	0m38.370s
user	0m37.727s
sys	0m0.548s

restorecon's behavior changes quite a lot just because the target is specifically /var, even though the contents are the same as root/var and test/. It never uses less than 91% of CPU but never more than 100%, so it's single-threaded and busy. There are no out-of-the-ordinary btrfs kernel threads.

The fatrace difference is confusing: it shows twice as many events (lines) for restorecon on test/ as for /var, and yet the latency for /var is 15x that of test/.

perf top -p pointed at the restorecon /var PID:

Overhead  Shared Object         Symbol
 48.89%   libpcre2-8.so.0.9.0   [.] pcre2_match_8
 14.47%   libpcre2-8.so.0.9.0   [.] 0x00000000000683d8
 13.16%   libpcre2-8.so.0.9.0   [.] 0x00000000000683d4
  3.89%   libpthread-2.31.so    [.] __pthread_mutex_unlock_usercnt
  3.86%   libpthread-2.31.so    [.] __pthread_mutex_lock
  2.87%   libpcre2-8.so.0.9.0   [.] 0x00000000000683da
  1.33%   libselinux.so.1       [.] 0x0000000000011a0c

*shrug*

strace output to file for both test/ and /var:

$ cat strace-var.txt | wc -l
249644
$ grep lgetxattr strace-var.txt | wc -l
106340
$ grep lstat strace-var.txt | wc -l
106370
$ cat strace-test.txt | wc -l
143297
$ grep lgetxattr strace-test.txt | wc -l
1
$ grep lstat strace-test.txt | wc -l
106363

It looks like lgetxattr() is super expensive compared to lstat()? But why only on btrfs? Or is it also expensive on ext4/XFS but somehow better optimized for them?
I did three clean installs of Fedora-Workstation-Live-x86_64-32-1.6.iso as follows:

btrfs, defaults:
time restorecon -R -v /usr
real	1m6.380s
user	0m59.957s
sys	0m2.615s

btrfs, compress=zstd:1,space_cache=v2:
time restorecon -R -v /usr
real	1m4.654s
user	0m59.880s
sys	0m2.396s

default (LVM/ext4):
time restorecon -R -v /usr
real	1m8.278s
user	1m0.691s
sys	0m2.280s

At the moment, I'm not convinced that this bug is Btrfs related at all.
Adding the selinux-kernel alias; maybe they have some ideas on why lgetxattr() is so expensive.
I just ran into this on a different system, and it's related to c2: other than lgetxattr() generally being expensive, in the btrfs case it's possible to have snapshots (possibly many of them, as in my case), and somehow these end up being scanned by restorecon. That's just going to take forever, because each snapshot is seen as a unique file tree. So, for the original poster: does this btrfs backup contain many snapshots in a visible search path?
(In reply to Chris Murphy from comment #6)
> restorecon's behavior changes quite a lot just because it's specifically
> /var even though the contents are the same as root/var and test/ and it
> never uses less than 91% of CPU but never more than 100%. So it's single
> threaded and busy. There are no out of the ordinary btrfs kernel threads.
>
> fatrace difference is confusing. It shows twice as many events (lines) for
> restorecon test/ as for /var. And yet the latency for /var is 15x that of
> test/

I'm pretty sure the difference between test/ and /var is the number of .fc lines (i.e. regular expressions) applicable to each path. Notice in the time output that most of the time is spent in userspace, not the kernel. The output of perf top confirms that most of the time is spent matching regexes. Looking only at the "sys" field (i.e. the kernel overhead), there is no significant difference (and those numbers are quite small).
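That explanation is easy to reproduce in miniature: matching every path against a pile of .fc-style regexes dominates any per-file syscall work. A sketch with made-up patterns; a real file_contexts ships thousands of them, and libselinux does smarter stem-based pruning (plus compiled pcre2), so this naive linear scan overstates the cost per lookup but shows where the time goes:

```python
import re
import time

# Hypothetical .fc-style patterns; a real file_contexts has thousands.
PATTERNS = [re.compile(p) for p in (
    r"/var/log(/.*)?",
    r"/var/lib/rpm(/.*)?",
    r"/var/spool/mail(/.*)?",
    r"/var/www(/.*)?/cgi-bin(/.*)?",
) * 250]  # 1000 patterns total

def label_lookup(path):
    """Naive model of restorecon's matching: try every pattern in order."""
    for pat in PATTERNS:
        if pat.fullmatch(path):
            return pat.pattern
    return None

paths = [f"/var/cache/dir{i}/file{i}" for i in range(1000)]
start = time.perf_counter()
misses = sum(label_lookup(p) is None for p in paths)
print(f"{misses} unmatched paths, "
      f"{time.perf_counter() - start:.3f}s of pure regex work, zero syscalls")
```

That is pure userspace CPU, which lines up with the time output (user-dominated) and the pcre2_match_8 samples in perf top.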
(In reply to Ondrej Mosnacek 🌽 from comment #10)
> Notice in the time output that most of the time is spent in userspace, not
> the kernel. The output of perf top confirms that most of the time is spent
> matching regexes.

Thanks for addressing this so clearly. In that case, what do you think the optimization opportunity is?

It sounds to me like the biggest opportunity, at least with dnf updates, is having restorecon applied only to the system root directories:

/boot
/etc
/usr
/var

It's just not tenable to run restorecon automatically by default on /mnt, /media, /run, /srv, or even /home, because of the potential for either large or many file systems (including containers and/or snapshots). I don't necessarily want those areas labeled anyway, and it shouldn't require an opt-out.
And as for /var: it should exclude /var/lib/docker/ (and equivalents). Whether the strategy for /var is an exclude list or an include list, I don't have a preference.
/var/lib/containers/storage/* should be excluded as well.
This message is a reminder that Fedora 32 is nearing its end of life. Fedora will stop maintaining and issuing updates for Fedora 32 on 2021-05-25. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as EOL if it remains open with a Fedora 'version' of '32'.

Package Maintainer: if you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version.

Thank you for reporting this issue, and we are sorry that we were not able to fix it before Fedora 32 reached end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged to change the 'version' to a later Fedora version before this bug is closed, as described in the policy above.

Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.
Fedora 32 changed to end-of-life (EOL) status on 2021-05-25. Fedora 32 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result, we are closing this bug.

If you can reproduce this bug against a currently maintained version of Fedora, please feel free to reopen it against that version. If you are unable to reopen this bug, please file a new report against the current release.

If you experience problems, please add a comment to this bug.

Thank you for reporting this bug, and we are sorry it could not be fixed.