Bug 1836756 - F32 installation extremely slow
Summary: F32 installation extremely slow
Keywords:
Status: CLOSED EOL
Alias: None
Product: Fedora
Classification: Fedora
Component: policycoreutils
Version: 32
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Petr Lautrbach
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2020-05-18 04:30 UTC by Stephen J Alexander
Modified: 2021-05-25 17:21 UTC (History)
13 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-05-25 17:21:19 UTC
Type: Bug
Embargoed:



Description Stephen J Alexander 2020-05-18 04:30:29 UTC
Description of problem:
The Fedora 32 upgrade took in excess of 48 hours to complete!

Version-Release number of selected component (if applicable):
F32 Workstation upgrade

How reproducible:
Unknown

Steps to Reproduce:
1.  sudo dnf system-upgrade reboot

Actual results:
After the reboot, dnf updated several thousand packages rapidly, then stalled for a very long time at 30% complete on one of the SELinux packages.

Expected results:
Reasonably rapid install or else an explanatory message.

Additional info:
One of the selinux* packages launched a 'restorecon' across the file systems during installation.  When it encountered a backup Btrfs (compressed) filesystem containing ~400GB of small backup files, the performance of restorecon was abysmal.

I was able to observe the installation progress (via escape key).  I was able to login as root via a virtual terminal and observe the problem with strace.

Comment 1 Stephen J Alexander 2020-05-18 05:38:20 UTC
Correction: the time between reboots was 45.783 hours, per utmpdump /var/log/wtmp:
>>  [2] [00000] [~~  ] [reboot  ] [~           ] [5.6.11-200.fc31.x86_64] [0.0.0.0        ] [2020-05-16T06:40:31,782810+00:00]
>>  [2] [00000] [~~  ] [reboot  ] [~           ] [5.6.12-300.fc32.x86_64] [0.0.0.0        ] [2020-05-18T04:37:44,783445+00:00]
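The elapsed time between the two reboots can be recomputed from the wtmp timestamps quoted above using GNU date; a minimal sketch (seconds truncated, so the result differs slightly from the 45.783 figure, which presumably came from fuller wtmp detail):

```shell
# Convert the two reboot timestamps from utmpdump to epoch seconds (GNU date)
t1=$(date -u -d "2020-05-16 06:40:31" +%s)
t2=$(date -u -d "2020-05-18 04:37:44" +%s)
# Elapsed time between the two reboots, in hours
awk -v s=$((t2 - t1)) 'BEGIN { printf "%.2f hours\n", s / 3600 }'   # prints "45.95 hours"
```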

df -h of the problematic file system:
/dev/sdb1       699G  391G  306G  57% /backup
Mount parameters:
/dev/sdb1 on /backup type btrfs (rw,noatime,seclabel,compress=lzo,space_cache,subvolid=257,subvol=/backup)

Comment 2 Zdenek Pytela 2020-05-27 08:41:38 UTC
Switching the component to btrfs.

Folks, some versions of selinux-policy triggered a complete relabeling of all filesystems as a side effect. This is tracked in bz#1832327.

It looks, though, as if in the case of btrfs it can take up to 2 days, which is too much time. Is this a known issue? Is it expected? Can there be such a big performance penalty?

Comment 3 Chris Murphy 2020-05-27 17:09:15 UTC
(In reply to Steve Alexander from comment #1)
> /dev/sdb1 on /backup type btrfs
> (rw,noatime,seclabel,compress=lzo,space_cache,subvolid=257,subvol=/backup)

This looks reasonable.

(In reply to Zdenek Pytela from comment #2)
> Folks, some versions of selinux-policy had as a result complete relabeling
> of all filesystems. This is tracked in bz#1832327.
> 
> It looks though in case of btrfs it can take up to 2 days which is too much
> time. Is this a know issue? Is it expected? Can there be such a big
> performance penalty?

I'm using Btrfs and haven't seen it, so it must be an edge case. It needs more information and a reproducer. I'd suggest doing sysrq+t while it's happening; then we can ask a btrfs kernel developer what's going on. Is restorecon using sync() at all while this is happening? That'd slow any file system down, but might slow btrfs down more.

Once there's more information, we'll know whether the component should be set to kernel and assigned fedora-kernel-btrfs, or set to policycoreutils (restorecon).

Comment 4 Chris Murphy 2020-05-27 17:44:40 UTC
(In reply to Steve Alexander from comment #0)
> When it encountered a backup BTRFS (compressed)
> filesystem containing ~400GB of small backup files, the performance of
> restorecon was abysmal.

How many files? ls -R -1 | wc -l
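As an aside, `ls -R -1 | wc -l` also counts directory-header and blank lines, so `find` gives an exact file count; a minimal sketch in a throwaway directory:

```shell
# Build a tiny tree: one subdirectory, three regular files
d=$(mktemp -d)
mkdir "$d/sub"
touch "$d/a" "$d/b" "$d/sub/c"
# Count regular files only (directories and blank lines excluded)
find "$d" -type f | wc -l   # prints 3
rm -r "$d"
```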

Btrfs background info: the selinux label associated with a file is stored in the same leaf as its filename, unix permissions, owner, timestamp, etc. There's nothing extra that needs to be read or written just for selinux labels. Getting file info means reading a 16KiB node (likely cached) and a 16KiB leaf that contains metadata on ~40 files. I expect the same read, modify, write behavior from chmod, chown, and restorecon on btrfs.

Does the slowness happen on reads, modify, or writes? Does the slowness happen when doing a chmod or chown, as with restorecon?

Comment 5 Chris Murphy 2020-05-27 18:32:35 UTC
More Btrfs background info: with the compression flag set, there are many EXTENT_DATA items per file. The file is logically broken up into 128KiB extents, which is the compression size on Btrfs. A file that's 1MiB uncompressed will normally have one extent and thus one EXTENT_DATA item in a leaf; with compression enabled this becomes up to 8 EXTENT_DATA items, even if those extents are contiguous on disk. Therefore many leaves will contain only EXTENT_DATA items. All these leaves need to be read, because any of them could contain the item we're looking for, XATTR_ITEM, which holds the selinux label. So compression has a numerically significant read-amplification effect on metadata. Writes aren't hit as badly, because only the changed leaves are modified and then written, and due to COW they are collected into a contiguous write.

But I don't know whether this read amplification on Btrfs alone can explain restorecon taking many orders of magnitude longer. I have some compressed and some uncompressed Btrfs file systems and don't see this effect. On one single-spinning-disk system of mine I never run restorecon or chcon, but I recently did a recursive chmod and chown on about 500K files and each command took about 30 seconds, on spinning rust.

I'm still wondering how chmod and chown compare to both restorecon and chcon, on the same file system (on the same directory even).
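The extent arithmetic in comment 5 can be sketched directly: since compressed Btrfs extents are at most 128KiB, a file's worst-case EXTENT_DATA item count is just its size divided by 128KiB, rounded up (running `filefrag -v` on a real file shows the actual extents):

```shell
# Worst-case number of EXTENT_DATA items for a compressed file:
# one item per 128KiB (131072-byte) compressed extent
size=$((1 * 1024 * 1024))      # a 1MiB file
extent=$((128 * 1024))         # Btrfs max compressed extent size
echo $(( (size + extent - 1) / extent ))   # prints 8
```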

Comment 6 Chris Murphy 2020-05-27 22:00:16 UTC
# cat /etc/fstab
UUID=                    btrfs   noatime,discard=async,compress=zstd:1,subvol=root     0 0

Thus the root subvolume is mounted at /, and root/var is the same directory as /var.

# btrfs sub create test
# cp -a --reflink root/var/ test/
# ls -R -1 test/ | wc -l
114205

Therefore test/ is an independent file tree, but has the same contents as root/var.


# time chown -R chris: test/
real	0m0.482s
user	0m0.025s
sys	0m0.424s
# time chown -R root: test/
real	0m0.495s
user	0m0.034s
sys	0m0.436s
# time chcon -R 'system_u:object_r:virt_content_t:s0' test/
real	0m1.142s
user	0m0.094s
sys	0m1.037s
# time restorecon -R -v test/
real	0m2.552s
user	0m2.253s
sys	0m0.290s
# time restorecon -R -v root/var/
real	0m2.482s
user	0m2.199s
sys	0m0.276s
# time restorecon -R -v /var
real	0m38.370s
user	0m37.727s
sys	0m0.548s
# 


restorecon's behavior changes quite a lot just because the path is specifically /var, even though the contents are the same as root/var and test/. It never uses less than 91% CPU and never more than 100%, so it's single threaded and busy. There are no out-of-the-ordinary btrfs kernel threads.

The fatrace difference is confusing: it shows twice as many events (lines) for restorecon on test/ as for /var, and yet the latency for /var is 15x that of test/.

perf top -p pointed to the restorecon /var PID:
Samples
Overhead  Shared Object        Symbol
  48.89%  libpcre2-8.so.0.9.0  [.] pcre2_match_8
  14.47%  libpcre2-8.so.0.9.0  [.] 0x00000000000683d8
  13.16%  libpcre2-8.so.0.9.0  [.] 0x00000000000683d4
   3.89%  libpthread-2.31.so   [.] __pthread_mutex_unlock_usercnt
   3.86%  libpthread-2.31.so   [.] __pthread_mutex_lock
   2.87%  libpcre2-8.so.0.9.0  [.] 0x00000000000683da
   1.33%  libselinux.so.1      [.] 0x0000000000011a0c

*shrug*

strace output to file for both test/ and /var

$ cat strace-var.txt | wc -l
249644
$ grep lgetxattr strace-var.txt | wc -l
106340
$ grep lstat strace-var.txt | wc -l
106370

$ cat strace-test.txt | wc -l
143297
$ grep lgetxattr strace-test.txt | wc -l
1
$ grep lstat strace-test.txt | wc -l
106363


Looks like lgetxattr() is super expensive compared to lstat? But why only on btrfs? Or is it also expensive on ext4/XFS but somehow is optimized for them better?
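The grep counts above can be reproduced on any saved strace log; a minimal sketch using a synthetic stand-in log (the real ones were captured with something along the lines of `strace -f -o strace-var.txt restorecon -R -n /var`, exact flags assumed):

```shell
# Synthetic stand-in for a real strace log (illustration only)
cat > strace-demo.txt <<'EOF'
lstat("/var/a", {st_mode=S_IFREG|0644, ...}) = 0
lgetxattr("/var/a", "security.selinux", ...) = 37
lstat("/var/b", {st_mode=S_IFREG|0644, ...}) = 0
EOF
# Count syscalls of interest, as in the analysis above
grep -c lgetxattr strace-demo.txt   # prints 1
grep -c lstat strace-demo.txt       # prints 2
rm strace-demo.txt
```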

Comment 7 Chris Murphy 2020-05-28 01:48:11 UTC
I did three clean installs of Fedora-Workstation-Live-x86_64-32-1.6.iso as follows:

btrfs defaults
time restorecon -R -v /usr
real	1m6.380s
user	0m59.957s
sys	0m2.615s

btrfs compress=zstd:1,space_cache=v2
time restorecon -R -v /usr
real	1m4.654s
user	0m59.880s
sys	0m2.396s

default (LVM/ext4)
time restorecon -R -v /usr
real	1m8.278s
user	1m0.691s
sys	0m2.280s

At the moment, I'm not convinced that this bug is Btrfs related at all.

Comment 8 Chris Murphy 2020-06-02 19:36:06 UTC
Adding selinux-kernel alias, maybe there are some ideas on lgetxattr() being so expensive.

Comment 9 Chris Murphy 2020-06-14 01:43:17 UTC
I just ran into this on a different system, and it's related to comment 2. So besides lgetxattr() generally being expensive, in the btrfs case it's possible to have snapshots (possibly many of them, as in my case), and somehow these end up being scanned by restorecon. That's just going to take forever, because each snapshot is seen as a unique file tree. So for the original poster: does this btrfs backup contain many snapshots in a visible search path?

Comment 10 Ondrej Mosnacek 2020-06-18 08:06:14 UTC
(In reply to Chris Murphy from comment #6)
> restorecon's behavior changes quite a lot just because it's specifically
> /var even though the contents are the same as root/var and test/ and it
> never uses less than 91% of CPU but never more than 100%. So it's single
> threaded and busy. There are no out of the ordinary btrfs kernel threads.
> 
> fatrace difference is confusing. It shows twice as many events (lines) for
> restorecon test/ as for /var. And yet the latency for /var is 15x that of
> test/

I'm pretty sure the difference between test/ vs. /var is the amount of .fc lines (i.e. regular expressions) applicable for each path. Notice in the time output that most of the time is spent in userspace, not the kernel. The output of perf top confirms that most of the time is spent matching regexes. When you only look at the "sys" field (i.e. the kernel overhead), there is not a significant difference (and those numbers are quite small).
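The per-path regex load comment 10 describes lives in the policy's file_contexts file (on Fedora, /etc/selinux/targeted/contexts/files/file_contexts); a minimal sketch with a synthetic three-line snippet, using patterns anchored under a prefix as a rough proxy for the regex load there:

```shell
# Synthetic file_contexts snippet (illustration only; real entries differ)
cat > fc-demo.txt <<'EOF'
/var/log(/.*)?	system_u:object_r:var_log_t:s0
/var/spool(/.*)?	system_u:object_r:var_spool_t:s0
/usr/bin(/.*)?	system_u:object_r:bin_t:s0
EOF
# Rough proxy for regex load under /var: patterns anchored there
grep -c '^/var' fc-demo.txt   # prints 2
rm fc-demo.txt
```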

Comment 11 Chris Murphy 2020-06-18 16:25:00 UTC
(In reply to Ondrej Mosnacek 🌽 from comment #10)
> Notice in the
> time output that most of the time is spent in userspace, not the kernel. The
> output of perf top confirms that most of the time is spent matching regexes.

Thanks for addressing this so clearly. In that case, what do you think the optimization opportunity is?

It sounds to me like the biggest opportunity, at least with dnf updates, is having restorecon applied only to system root directories: /boot /etc /usr /var

It's just not tenable to run restorecon automatically by default on /mnt /media /run /srv or even /home, because of the potential for either large or many file systems (including containers and/or snapshots). I don't necessarily want those areas labeled anyway, and it shouldn't require an opt-out.

Comment 12 Chris Murphy 2020-06-18 16:29:40 UTC
And as for /var: it should exclude /var/lib/docker/ (and equivalents). Whether the strategy for /var is an exclude list or an include list, I don't have a preference.

Comment 13 Daniel Walsh 2020-06-19 12:59:47 UTC
/var/lib/containers/storage/* should be excluded as well.
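restorecon's documented -e flag already excludes a directory from a recursive relabel, so the suggestions in comments 12 and 13 could be sketched as below (the command is echoed rather than executed, since actually running it needs root on an SELinux system, and the exclude paths here are the proposal, not settled behavior):

```shell
# Directories the comments above suggest skipping during a relabel
excludes="-e /var/lib/docker -e /var/lib/containers/storage"
# Echo the full command line instead of running it
echo restorecon -R $excludes /var
# prints: restorecon -R -e /var/lib/docker -e /var/lib/containers/storage /var
```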

Comment 14 Fedora Program Management 2021-04-29 16:53:51 UTC
This message is a reminder that Fedora 32 is nearing its end of life.
Fedora will stop maintaining and issuing updates for Fedora 32 on 2021-05-25.
It is Fedora's policy to close all bug reports from releases that are no longer
maintained. At that time this bug will be closed as EOL if it remains open with a
Fedora 'version' of '32'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue, and we are sorry that we were not 
able to fix it before Fedora 32 went end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged to change the 'version' to a later Fedora 
version before this bug is closed, as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 15 Ben Cotton 2021-05-25 17:21:19 UTC
Fedora 32 changed to end-of-life (EOL) status on 2021-05-25. Fedora 32 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.

