Bug 1832327 - restorecon stuck in dnf updating to selinux-policy-targeted-3.14.5-38
Keywords:
Status: CLOSED EOL
Alias: None
Product: Fedora
Classification: Fedora
Component: policycoreutils
Version: 35
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Assignee: Petr Lautrbach
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Duplicates: 1841430
Depends On:
Blocks:
Reported: 2020-05-06 14:32 UTC by James
Modified: 2022-12-13 15:14 UTC (History)
CC List: 15 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-12-13 15:14:57 UTC
Type: Bug
Embargoed:



Description James 2020-05-06 14:32:22 UTC
Description of problem:
When updating to selinux-policy-targeted-3.14.5-38, restorecon appears to get stuck in an infinite loop at 100% CPU during the scriptlets. I let one run for 10 minutes before manually killing restorecon, whereupon dnf completed. I then rebooted and did an autorelabel, which worked fine.

Version-Release number of selected component (if applicable):
policycoreutils-3.0-2.fc32.x86_64

How reproducible:
Observed on 3 independent F32 machines doing this update.

Actual results:
Update gets stuck.

Expected results:
restorecon terminates without error.

Additional info:
Never seen this before with a selinux-policy update.

Comment 1 Petr Lautrbach 2020-05-06 15:19:52 UTC
I have tried to reproduce it on our f32 images and update from selinux-policy-targeted 3.14.5-31.fc32 to 3.14.5-38.fc32.noarch went well.

Do you have /var/log/dnf.log /var/log/dnf.rpm.log? Could you share that? Also the process list when it's stuck would be helpful.

Comment 2 James 2020-05-06 15:39:23 UTC
It may be that it's taking an incredibly long time -- I normally only see a pause of a minute or two in semodule. On another box it did eventually finish but took quite a while, longer than, say, a relabel would for that machine. If I reinstall selinux-policy-targeted, then restorecon is not around long enough to register in top.

I'll find another box to update and see what info I can pull.

Comment 3 Villy Kruse 2020-05-07 18:51:21 UTC
It is triggered by the contents of the file file_contexts; the line 

/sbin/arping   --      system_u:object_r:netutils_exec_t:s0

changed to

/s?bin/arping  --      system_u:object_r:netutils_exec_t:s0

The chain of sed commands reduces this pattern to a bare wildcard, so restorecon ends up running over the entire system:

/s?bin/arping ==> /{s,}bin/arping ==> /*
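
The collapse shown in the chain above can be reproduced with a minimal Python sketch. It is not the actual fixfiles code: the function name to_glob and the first substitution are assumptions modeled on the arrows above, and the second substitution mirrors the 's|\{.*|*|g' sed expression quoted later in this thread (comment 10).

```python
import re


def to_glob(spec: str) -> str:
    """Illustrative regex-to-glob reduction (hypothetical helper)."""
    # Step 1 (assumed): an optional single character "x?" becomes
    # the brace alternative "{x,}".
    step1 = re.sub(r"([/A-Za-z0-9])\?", r"{\1,}", spec)
    # Step 2 (mirrors the sed rule quoted in comment 10): everything
    # from the first "{" to the end of the line is replaced by "*".
    return re.sub(r"\{.*", "*", step1)


print(to_glob("/s?bin/arping"))  # /*
print(to_glob("/sbin/arping"))   # /sbin/arping
```

Because the brace group appears right after the leading slash, nothing of the path survives the second step, which is why the resulting pattern covers the whole file system.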

Comment 4 Petr Lautrbach 2020-05-12 06:48:25 UTC
Thanks for the investigation. We need to improve the logic in fixfiles. In the mean time and as a workaround it would make sense to use 2 lines for arping:

/bin/arping   --      system_u:object_r:netutils_exec_t:s0
/sbin/arping   --      system_u:object_r:netutils_exec_t:s0

Comment 5 Villy Kruse 2020-05-12 09:39:48 UTC
(In reply to Petr Lautrbach from comment #4)
> Thanks for the investigation. We need to improve the logic in fixfiles. In
> the mean time and as a workaround it would make sense to use 2 lines for
> arping:
> 
> /bin/arping   --      system_u:object_r:netutils_exec_t:s0
> /sbin/arping   --      system_u:object_r:netutils_exec_t:s0

Actually, you should leave the line alone.

If you touch it, I would expect to run into the same issue again, this time by removing the /s?bin/arping line.  Only lines which differ between the old and the new file are ever considered.

As it is, the real file on Fedora is /usr/bin/arping, so the change was quite unnecessary on Fedora.  Perhaps not on RHEL or other systems with SELinux.

Comment 6 Petr Lautrbach 2020-05-12 10:08:32 UTC
You are right, it doesn't make sense to trigger restorecon with /* again.

Comment 7 Petr Lautrbach 2020-05-12 10:24:08 UTC
Actually, the original version was '/sbin/arping' and the new version would just add '/bin/arping' (or vice versa). This means that for users who haven't updated yet it would not trigger the full relabel, so it makes sense to change it in order to prevent this issue for others.

Comment 8 Zdenek Pytela 2020-05-12 10:31:33 UTC
Making any kind of change to the path specification will affect all users with selinux-policy-3.14.5-38.fc32 installed.
Not making a change will affect everybody else.
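
The trade-off discussed in comments 5 to 8 can be illustrated with a rough Python sketch. It assumes, per comment 5, that only entries differing between the old and the new file_contexts (in either direction) are passed on for relabeling. The helper changed_specs and the three sample lists are hypothetical stand-ins, not the real fixfiles logic.

```python
def changed_specs(old_lines, new_lines):
    """Path specs of entries present in only one of the two versions."""
    def entries(lines):
        # Normalize whitespace so that only real differences count.
        return {tuple(line.split()) for line in lines if line.startswith("/")}

    differing = entries(old_lines) ^ entries(new_lines)
    return sorted({entry[0] for entry in differing})


CONTEXT = "--      system_u:object_r:netutils_exec_t:s0"
ORIGINAL = ["/sbin/arping   " + CONTEXT]            # before 3.14.5-38
SHIPPED = ["/s?bin/arping  " + CONTEXT]             # as in 3.14.5-38
WORKAROUND = ["/bin/arping   " + CONTEXT,
              "/sbin/arping   " + CONTEXT]          # proposal from comment 4

# Users still on the original policy: only a literal path changes.
print(changed_specs(ORIGINAL, WORKAROUND))  # ['/bin/arping']
# Users already on 3.14.5-38: the regex line shows up as removed.
print(changed_specs(SHIPPED, WORKAROUND))
# ['/bin/arping', '/s?bin/arping', '/sbin/arping']
```

In the second case the /s?bin/arping entry is again part of the difference, so it would be reduced to a wildcard once more, which matches the concern raised in comment 5.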

Comment 9 James 2020-05-13 09:52:17 UTC
[Dummy comment to clear NEEDINFO as this has been diagnosed.]

Comment 10 Villy Kruse 2020-05-13 10:23:38 UTC
(In reply to Villy Kruse from comment #3)
> It is triggered by the contents of the file file_contexts; the line 
> 
> /sbin/arping   --      system_u:object_r:netutils_exec_t:s0
> 
> changed to
> 
> /s?bin/arping  --      system_u:object_r:netutils_exec_t:s0
> 
> Through the chain of sed commands the result is that the restorecon would
> run for the entire system.
> 
> /s?bin/arping ==> /{s,}bin/arping ==> /*

As restorecon understands the /{s,}bin/arping pattern, it might help to skip the last translation step.

That is:

Remove the line

       -e 's|\{.*|*|g' \

in the fixfiles program.

This is not perfect as there are some nasty regex patterns found in the file_contexts file.

It would be nice if the restorecon program were taught to understand regular expressions as found in the file_contexts file.  Then it would not be necessary to convert the regex patterns to glob patterns in fixfiles.

By the way, the man page for restorecon doesn't mention the glob patterns.
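
For illustration, here is a minimal Python sketch of what a brace pattern such as /{s,}bin/arping stands for once expanded. expand_braces is a hypothetical helper that handles only simple, non-nested groups. It says nothing about how restorecon itself processes such patterns.

```python
import re
from itertools import product


def expand_braces(pattern: str) -> list[str]:
    """Expand non-nested {a,b} groups into all literal alternatives."""
    # re.split with a capture group alternates literal text (even
    # indexes) with the contents of each brace group (odd indexes).
    parts = re.split(r"\{([^{}]*)\}", pattern)
    choices = [part.split(",") if i % 2 else [part]
               for i, part in enumerate(parts)]
    return ["".join(combo) for combo in product(*choices)]


print(expand_braces("/{s,}bin/arping"))  # ['/sbin/arping', '/bin/arping']
```

Keeping the brace form would therefore narrow the relabel to two concrete paths instead of the whole tree, provided the consumer of the pattern expands braces as this comment assumes.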

Comment 11 Zdenek Pytela 2020-05-29 07:15:04 UTC
*** Bug 1841430 has been marked as a duplicate of this bug. ***

Comment 12 Christian Klomp 2020-06-05 21:44:07 UTC
Not sure if it is related, but I just upgraded to selinux-policy-targeted-3.14.4-52.fc31.noarch and it made the dnf upgrade hang.
In my case it looks like it was due to btrfs snapshots at /.btrfs/ because the file system went ENOSPC and restorecon went into a loop relabelling the files from the snapshots.
I strongly believe excessive labelling caused the out-of-space issue because there should have been plenty of space free, though I cannot see how relabelling ~20 rootfs snapshots could lead to an increase of ~100GiB of used space (or is it just metadata blocks getting de-CoWed?).

Perhaps paths can be excluded (apparently /etc/selinux/fixfiles_exclude_dirs is a thing for fixfiles, though I have not found an equivalent for restorecon), and if so it could be considered user error, but such a problem has not happened in the 10-ish years this install has been running.

Comment 13 nicolas.vieville 2020-06-12 07:32:20 UTC
Hello,

It seems that Fedora 31 is also impacted.
DNF has been stuck for the past 2 hours, with regular access to the hard drive (yes,
old laptops don't necessarily have an SSD, and even 10-year-old ones
work flawlessly with mechanical hard disk drives and Fedora 31), in a scriptlet
from selinux-policy-targeted-3.14.4-53.fc31.noarch running this command:

/sbin/restorecon -e /sys -e /proc -e /dev -e /run -e /mnt -e /var/tmp -e /var/lib/BackupPC -e /home -e /tmp -e /dev -i -R -f -

I am wondering if you could suggest a manual solution so that we can wait for 
the final fix in future selinux packages without having to waste too much time 
waiting for each update (maintenance of more than 50 workstations for example).

I remain available for any questions on this subject.

Cordially,


-- 
NVieville

Comment 14 Chris Murphy 2020-06-14 01:21:49 UTC
I'm still seeing this today. dnf is hung up on selinux-policy-3.14.5-40.fc32.noarch, and an strace of restorecon shows it's scanning totally unrelated volumes mounted in /srv. As it happens, these contain a bunch of git repositories and btrfs snapshots, so it's scanning ~60TB of data, which is a bit aggravating.

The manual solution is to make certain you don't have any extra things mounted when the dnf update runs.

Comment 15 Christian Klomp 2020-06-14 09:54:00 UTC
Installing selinux-policy-targeted-3.14.4-53.fc31.noarch again took a long time, but luckily this time it didn't blow up my rootfs (due to btrfs snapshots?).
The process is also scanning/relabelling /media, which contains TBs of data, which to me seems quite unnecessary to do during dnf upgrades (though I'm not sure whether it actually scans all the files there, from a cursory peek at strace it seems quicker than the rootfs).
It appears not possible to exclude these volumes via the fixfiles_exclude_dirs method (which does work for volumes mounted elsewhere, maybe I need to exclude /media).


# /etc/selinux/fixfiles_exclude_dirs
/.btrfs             # rootfs
/boot/.btrfs        # rootfs
/home/.btrfs
/media/data/.btrfs

# restorecon command during upgrade
/sbin/restorecon -e /.btrfs -e /boot/.btrfs -e /home/.btrfs -e /sys -e /proc -e /dev -e /run -e /mnt -e /var/tmp -e /home -e /tmp -e /dev -i -R -f -


Besides lack of control over the process, it probably is not ideal to perform such lengthy tasks during dnf upgrades without any indication of what is going on.
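
The relationship between the exclude file and the -e options in the command above can be sketched roughly in Python. This is an assumption about the general idea only: exclude_args is a hypothetical helper, and the parsing rules (strip "#" comments, keep absolute paths) are guesses rather than the documented behaviour of fixfiles.

```python
def exclude_args(exclude_file_text: str) -> list[str]:
    """Turn exclude-file lines into restorecon-style -e arguments."""
    args = []
    for raw in exclude_file_text.splitlines():
        # Assumed rule: anything after "#" is a comment.
        path = raw.split("#", 1)[0].strip()
        # Assumed rule: only absolute paths are kept.
        if path.startswith("/"):
            args += ["-e", path]
    return args


SAMPLE = """\
# /etc/selinux/fixfiles_exclude_dirs
/.btrfs             # rootfs
/boot/.btrfs        # rootfs
/home/.btrfs
/media/data/.btrfs
"""

print(" ".join(exclude_args(SAMPLE)))
# -e /.btrfs -e /boot/.btrfs -e /home/.btrfs -e /media/data/.btrfs
```

Note that the command reported in this comment has no -e /media/data/.btrfs entry. The sketch does not model whatever caused that entry to be dropped.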

Comment 16 Villy Kruse 2020-06-14 13:31:48 UTC
(In reply to Christian Klomp from comment #15)

> ...

> It appears not possible to exclude these volumes via the
> fixfiles_exclude_dirs method (which does work for volumes mounted elsewhere,
> maybe I need to exclude /media).

/media used to contain mount points created by udisks.  Now udisks is using the /run/media directory instead,
so /media should only contain your own files and mount points.

Excluding /media seems to be the right thing to do.

> 
> 
> # /etc/selinux/fixfiles_exclude_dirs
> /.btrfs             # rootfs
> /boot/.btrfs        # rootfs
> /home/.btrfs
> /media/data/.btrfs
> 

I suggest you create a new bug especially for the .btrfs issues.

Comment 17 Robert O'Callahan 2020-06-17 23:58:45 UTC
I am getting this upgrading to selinux-policy-targeted-3.14.4-53.fc31.noarch.

Looks like it's grinding through docker subvolumes under "/var/lib/docker/btrfs/subvolumes".

Comment 18 Robert O'Callahan 2020-06-18 00:11:12 UTC
From past experience this will take half an hour or more. If there is no way to automatically avoid all this relabeling work, at least please make setfiles multithreaded so upgrades don't take as long.

Comment 19 Chris Murphy 2020-06-18 03:07:29 UTC
I think whether btrfs subvolumes, reflinks on btrfs or xfs (via overlayfs), or ext4 copyup - the number of inodes that need to be scanned by restorecon is the same. I don't think it's file system specific unless there's some optimization happening to skip over certain directories?

The summary of this comment: doing a restorecon on identical content on ext4/lvm and btrfs, no difference in performance.
https://bugzilla.redhat.com/show_bug.cgi?id=1836756#c7

The summary of this comment: lgetxattr() used by restorecon is very expensive (order of magnitude) compared to doing chown, chmod, chcon. It's suspiciously bad.
https://bugzilla.redhat.com/show_bug.cgi?id=1836756#c6

Comment 20 James 2021-03-10 01:14:23 UTC
I've not seen anything like this in a while... as the original reporter, is everyone OK for it to be closed as fixed?

Comment 21 Chris Murphy 2021-03-10 02:09:57 UTC
Seems to be fixed. And also I notice there's an -x option to keep it on one file system. At least the rsync -x option of the same description will not cross subvolume boundaries (whether bind mounted or not) since subvolumes get reported by stat as being separate devices (I think this is st_dev).
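
The st_dev check mentioned above can be sketched in Python as a directory walk that refuses to descend into anything reported as a different device. This only illustrates the general technique. walk_one_filesystem is a hypothetical helper and is not how restorecon or rsync implement their -x options.

```python
import os


def walk_one_filesystem(root):
    """Yield paths under root, skipping directories on other devices."""
    root_dev = os.stat(root).st_dev
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk does not descend into directories
        # whose st_dev differs from that of the starting point.
        dirnames[:] = [
            name for name in dirnames
            if os.lstat(os.path.join(dirpath, name)).st_dev == root_dev
        ]
        for name in dirnames + filenames:
            yield os.path.join(dirpath, name)
```

With a walk like this, a separately mounted volume under /srv or /media would be skipped at its mount point, provided it is reported with a different st_dev than the starting directory.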

Comment 22 Petr Lautrbach 2021-03-10 08:32:52 UTC
Unfortunately it's not fixed; it just doesn't manifest currently. `fixfiles -C ...` would still cause a full relabel with a change like the one described in https://bugzilla.redhat.com/show_bug.cgi?id=1832327#c3
And selinux-policy.spec still uses `fixfiles -C`.

Comment 23 Ben Cotton 2021-08-10 13:39:15 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 35 development cycle.
Changing version to 35.

Comment 24 Ben Cotton 2022-11-29 16:48:29 UTC
This message is a reminder that Fedora Linux 35 is nearing its end of life.
Fedora will stop maintaining and issuing updates for Fedora Linux 35 on 2022-12-13.
It is Fedora's policy to close all bug reports from releases that are no longer
maintained. At that time this bug will be closed as EOL if it remains open with a
'version' of '35'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, change the 'version' 
to a later Fedora Linux version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora Linux 35 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora Linux, you are encouraged to change the 'version' to a later version
prior to this bug being closed.

Comment 25 Ben Cotton 2022-12-13 15:14:57 UTC
Fedora Linux 35 entered end-of-life (EOL) status on 2022-12-13.

Fedora Linux 35 is no longer maintained, which means that it
will not receive any further security or bug fix updates. As a result we
are closing this bug.

If you can reproduce this bug against a currently maintained version of Fedora Linux
please feel free to reopen this bug against that version. Note that the version
field may be hidden. Click the "Show advanced fields" button if you do not see
the version field.

If you are unable to reopen this bug, please file a new report against an
active release.

Thank you for reporting this bug and we are sorry it could not be fixed.

