Description of problem:
This is not a bug report in the traditional sense, but a bugzilla was the best way I could think of to track the issue.

As you may know, sssd has the ability to read the SELinux login context from an IPA server and set it. To that end, SSSD uses libsemanage calls. And as we've seen e.g. in https://bugzilla.redhat.com/show_bug.cgi?id=1654537#c9, setting the login context can be an expensive operation.

What I'm trying to find out is whether we can either optimize setting the SELinux label from the SSSD side (e.g. by inspecting whether the label needs to be set at all) or whether libsemanage can be made faster.

I'm going to attach a C program that contains the same operations sssd does. On my test VM I can see that setting the label takes between 3-4 seconds, but in #1654537 Adam observed even 25+ second timeouts.

Looking at ltrace, much of the time is spent in semanage_begin_transaction and semanage_commit. strace also shows that many files are copied and renamed.

ltrace output:

ltrace -c ./a.out admin unconfined_u s0-s0:c0.c1023
libsemanage.add_user: user override not in password file
% time     seconds  usecs/call     calls      function
------ ----------- ----------- --------- --------------------
 68.68    2.510231     2510231         1 semanage_commit
 24.94    0.911523      911523         1 semanage_begin_transaction
  6.17    0.225404      225404         1 semanage_seuser_modify_local
  0.11    0.003856        3856         1 exit_group
  0.03    0.001217        1217         1 semanage_handle_create
  0.02    0.000883         883         1 semanage_seuser_key_free
  0.02    0.000605         605         1 semanage_seuser_exists
  0.01    0.000237         237         1 semanage_is_managed
  0.00    0.000182         182         1 semanage_access_check
  0.00    0.000117         117         1 semanage_connect
  0.00    0.000092          92         1 semanage_seuser_create
  0.00    0.000092          92         1 semanage_seuser_key_create
  0.00    0.000090          90         1 is_selinux_enabled
  0.00    0.000070          70         1 semanage_seuser_free
  0.00    0.000070          70         1 semanage_seuser_set_name
  0.00    0.000068          68         1 semanage_seuser_set_sename
  0.00    0.000066          66         1 semanage_seuser_set_mlsrange
  0.00    0.000058          58         1 semanage_disconnect
  0.00    0.000055          55         1 semanage_is_connected
  0.00    0.000047          47         1 semanage_handle_destroy
------ ----------- ----------- --------- --------------------
100.00    3.654963                    20 total

About optimizations -- what we did before was that we only called the sss_set_seuser() function if the user's login context differed from the default. But this broke the case where a user has a non-standard homedir, because semanage then had no idea that certain files are to be labeled as home_t.

I was wondering if maybe we could check whether the homedir labels are already set for this user's homedir and, if yes, avoid calling semanage at all (a rough sketch of that idea follows below). Is there even an API that would allow that? Is that a good idea? Or are there any obvious issues with what we are doing?

Version-Release number of selected component (if applicable):
libsemanage-2.8-2.fc28.x86_64

How reproducible:
Depends on the disk speed, I guess.

Steps to Reproduce:
1. set the
2.
3.

Actual results:

Expected results:

Additional info:
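To make the "check the homedir labels first" idea concrete, here is a rough sketch of what I have in mind. This is only my assumption of how such a check could look, not something SSSD does today: it uses the libselinux getfilecon(3) and selabel_lookup(3) calls, and it only inspects the top-level home directory, so it may well not be sufficient for the non-standard homedir case.

/* Hypothetical check: does the homedir already carry the label that the
 * file-context policy expects?  If yes, the expensive semanage
 * transaction could perhaps be skipped.  Illustration only. */
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <selinux/selinux.h>
#include <selinux/label.h>

static int homedir_label_matches(const char *homedir)
{
    char *actual = NULL;
    char *expected = NULL;
    struct selabel_handle *hnd;
    int match = -1;

    /* Label currently set on the directory */
    if (getfilecon(homedir, &actual) < 0)
        return -1;

    /* Label the file-context policy would assign to this path */
    hnd = selabel_open(SELABEL_CTX_FILE, NULL, 0);
    if (hnd == NULL) {
        freecon(actual);
        return -1;
    }

    if (selabel_lookup(hnd, &expected, homedir, S_IFDIR) == 0) {
        match = (strcmp(actual, expected) == 0);
        freecon(expected);
    }

    selabel_close(hnd);
    freecon(actual);
    return match;    /* 1 = matches, 0 = differs, -1 = error */
}

int main(int argc, char *argv[])
{
    if (argc < 2) {
        fprintf(stderr, "usage: %s /home/user\n", argv[0]);
        return 1;
    }
    printf("label matches policy: %d\n", homedir_label_matches(argv[1]));
    return 0;
}

If something along these lines turned out to be reliable, sssd could skip the whole semanage transaction when the homedir already looks right, but I'm not sure it catches the case where the policy's file-context rules themselves still need updating.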
Created attachment 1515374 [details] a program that illustrates what sssd does
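For reference, this is roughly what the attached program does, reconstructed from the call sequence visible in the ltrace output above. The real attachment may differ in details, and most error handling is omitted here:

/* Approximation of the attached test program: map a Linux login to an
 * SELinux user/MLS range the way sssd's sss_set_seuser() path does. */
#include <stdio.h>
#include <selinux/selinux.h>
#include <semanage/semanage.h>

int main(int argc, char *argv[])
{
    semanage_handle_t *sh;
    semanage_seuser_key_t *key = NULL;
    semanage_seuser_t *su = NULL;
    int exists = 0;

    if (argc < 4) {
        fprintf(stderr, "usage: %s login seuser mls_range\n", argv[0]);
        return 1;
    }
    const char *login = argv[1];   /* e.g. "admin" */
    const char *seuser = argv[2];  /* e.g. "unconfined_u" */
    const char *mls = argv[3];     /* e.g. "s0-s0:c0.c1023" */

    if (!is_selinux_enabled())
        return 1;

    sh = semanage_handle_create();
    if (sh == NULL || semanage_is_managed(sh) != 1)
        return 1;
    if (semanage_access_check(sh) < SEMANAGE_CAN_WRITE)
        return 1;
    if (semanage_connect(sh) < 0)
        return 1;

    semanage_seuser_key_create(sh, login, &key);
    semanage_seuser_exists(sh, key, &exists);

    /* The reproducer just fills in a fresh record; the real code would
     * query and update the existing one if 'exists' is true. */
    semanage_seuser_create(sh, &su);
    semanage_seuser_set_name(sh, su, login);
    semanage_seuser_set_sename(sh, su, seuser);
    semanage_seuser_set_mlsrange(sh, su, mls);

    /* This is where almost all of the time goes. */
    semanage_begin_transaction(sh);
    semanage_seuser_modify_local(sh, key, su);
    semanage_commit(sh);

    semanage_seuser_key_free(key);
    semanage_seuser_free(su);
    semanage_disconnect(sh);
    semanage_handle_destroy(sh);
    return 0;
}

Per the ltrace numbers above, practically all of the wall-clock time is in the transaction itself (semanage_begin_transaction, semanage_seuser_modify_local, semanage_commit).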
"On my test VM I can see that setting the label takes between 3-4 seconds, but in #1654537 Adam observed even 25+ second time outs." It bounces around a bit and may be related to how busy the box running the test is (it can be running 10 VMs at once...or 30, with the new box) and whether the Rawhide kernel has debugging enabled at the time, I think. I just noticed yesterday it seems to be taking over 60 seconds on aarch64 at least sometimes, as the failure is still showing up on aarch64 even though I gave the test a 60 second timeout now.
This bug appears to have been reported against 'rawhide' during the Fedora 31 development cycle. Changing version to '31'.
So far, I wasn't able to reproduce the super-slow times... I guess the disk must be really slow on the OpenQA aarch64 machines :/ The OpenQA links in BZ 1644919 are dead and I couldn't find any aarch64 runs in OpenQA... Is there some way I could get more information about the machine(s) where the operation was particularly slow?

As for the technical side of speeding it up - I don't see any easy way to avoid copying all the files. The copying is needed so that the transaction can be committed atomically by just doing a rename() (a simplified, made-up sketch of that pattern is below). When I analyzed the transaction functions using perf, I noticed that the policydb_read()/policydb_destroy() functions also take up a significant part of the time (but this was on a fast x86_64 machine), so you *might* get some speedup with the 3.1 SELinux userspace, which should be released and imported into Rawhide soon.

If the copying/removing of files turns out to be the main bottleneck, I guess we can try some tricks with hardlinks, but I don't know yet how effective/feasible/complex that would be...
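For illustration, the commit pattern I mean is roughly the following. This is heavily simplified; the paths and steps are invented for the example and are not the actual libsemanage store layout or code:

/* Made-up illustration of a copy-then-swap commit (not the real
 * libsemanage implementation).  The expensive part is duplicating the
 * whole store; the swap at the end is just rename() calls, each of
 * which is atomic. */
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    /* 1. Duplicate the current store into a private sandbox -- this is
     *    where all the file copying (and the I/O cost) comes from. */
    if (system("cp -a /tmp/demo-store/active /tmp/demo-store/pending") != 0)
        return 1;

    /* 2. ... apply the requested change and rebuild the policy inside
     *    the sandbox ... */

    /* 3. Swap the sandbox into place.  rename() is atomic, so readers
     *    see either the old store or the new one, never a mix. */
    if (rename("/tmp/demo-store/active", "/tmp/demo-store/previous") != 0)
        return 1;
    if (rename("/tmp/demo-store/pending", "/tmp/demo-store/active") != 0)
        return 1;

    return 0;
}

So the atomicity comes cheap from the final rename()s; it's the up-front duplication of the store that hurts on slow disks.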
This message is a reminder that Fedora 31 is nearing its end of life. Fedora will stop maintaining and issuing updates for Fedora 31 on 2020-11-24. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as EOL if it remains open with a Fedora 'version' of '31'.

Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not able to fix it before Fedora 31 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged to change the 'version' to a later Fedora version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.
*** Bug 1644919 has been marked as a duplicate of this bug. ***
So, update on openQA status here: I just checked the past test results and it looks like this got a lot faster in February and March. It last outright *failed*, best as I can tell, around Feb 20, and we last got a soft failure because login took more than 30 seconds on March 21. Since then I don't think that's happened. I don't know what changed to speed it up, but something has, anyway. It's not a going concern for me any more. I'll leave it up to you folks to decide whether to close the bug.
(In reply to Adam Williamson from comment #8) > So, update on openQA status here: I just checked the past test results and > it looks like this got a lot faster in February and March. It last outright > *failed*, best as I can tell, around Feb 20, and we last got a soft failure > because login took more than 30 seconds on March 21. Since then I don't > think that's happened. That's good to hear :) > I don't know what changed to speed it up, but something has, anyway. It's > not a going concern for me any more. I'll leave it up to you folks to decide > whether to close the bug. Out of curiosity, what Fedora version were those tests done on? Nightly Rawhide from that time? I'd like to see if there were any changes in our packages in that time that might have done it.
Yeah, I was looking at Rawhide nightlies. I had a look at the compose reports but didn't see anything terribly obvious.
Fedora 31 changed to end-of-life (EOL) status on 2020-11-24. Fedora 31 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version. If you are unable to reopen this bug, please file a new report against the current release. If you experience problems, please add a comment to this bug.

Thank you for reporting this bug and we are sorry it could not be fixed.