Description of problem:
This is not a bug report in the traditional sense, but a bugzilla was the best way I could think of to track the issue.

As you may know, sssd has the ability to read the SELinux login context from an IPA server and set it. To that end, SSSD uses libsemanage calls. And as we've seen e.g. in https://bugzilla.redhat.com/show_bug.cgi?id=1654537#c9, setting the login context can be an expensive operation.

What I'm trying to find out is whether we can either optimize setting the SELinux label from the SSSD side (e.g. by inspecting whether the label needs to be set at all) or whether libsemanage can be made faster.

I'm going to attach a C program that contains the same operations sssd does. On my test VM I can see that setting the label takes between 3-4 seconds, but in #1654537 Adam observed even 25+ second timeouts.

Looking at ltrace, much of the time is spent in semanage_begin_transaction and semanage_commit. strace also shows that many files are copied and renamed.

ltrace output:

ltrace -c ./a.out admin unconfined_u s0-s0:c0.c1023
libsemanage.add_user: user override not in password file
% time     seconds  usecs/call     calls      function
------ ----------- ----------- --------- --------------------
 68.68    2.510231     2510231         1 semanage_commit
 24.94    0.911523      911523         1 semanage_begin_transaction
  6.17    0.225404      225404         1 semanage_seuser_modify_local
  0.11    0.003856        3856         1 exit_group
  0.03    0.001217        1217         1 semanage_handle_create
  0.02    0.000883         883         1 semanage_seuser_key_free
  0.02    0.000605         605         1 semanage_seuser_exists
  0.01    0.000237         237         1 semanage_is_managed
  0.00    0.000182         182         1 semanage_access_check
  0.00    0.000117         117         1 semanage_connect
  0.00    0.000092          92         1 semanage_seuser_create
  0.00    0.000092          92         1 semanage_seuser_key_create
  0.00    0.000090          90         1 is_selinux_enabled
  0.00    0.000070          70         1 semanage_seuser_free
  0.00    0.000070          70         1 semanage_seuser_set_name
  0.00    0.000068          68         1 semanage_seuser_set_sename
  0.00    0.000066          66         1 semanage_seuser_set_mlsrange
  0.00    0.000058          58         1 semanage_disconnect
  0.00    0.000055          55         1 semanage_is_connected
  0.00    0.000047          47         1 semanage_handle_destroy
------ ----------- ----------- --------- --------------------
100.00    3.654963                    20 total

About optimizations -- what we did before was that we only called the sss_set_seuser() function if the user's login context differed from the default. But this broke the case where a user has a non-standard homedir, because semanage then had no idea that certain files are to be labeled as home_t.

I was wondering if maybe we could check whether the homedir labels are already set for this user's homedir and, if yes, avoid calling semanage at all (a rough sketch of that idea follows below). Is there even an API that would allow that? Is that a good idea? Or are there any obvious issues with what we are doing?

Version-Release number of selected component (if applicable):
libsemanage-2.8-2.fc28.x86_64

How reproducible:
Depends on the disk speed, I guess.

Steps to Reproduce:
1. set the
2.
3.

Actual results:

Expected results:

Additional info:
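To make the "check the homedir labels first" idea concrete, here is a rough sketch of what I have in mind. This is only my assumption of how such a check could look, not something SSSD does today: it uses the libselinux getfilecon(3) and selabel_lookup(3) calls, and it only inspects the top-level home directory, so it may well not be sufficient for the non-standard homedir case.

/* Hypothetical check: does the homedir already carry the label that the
 * file-context policy expects?  If yes, the expensive semanage
 * transaction could perhaps be skipped.  Illustration only. */
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <selinux/selinux.h>
#include <selinux/label.h>

static int homedir_label_matches(const char *homedir)
{
    char *actual = NULL;
    char *expected = NULL;
    struct selabel_handle *hnd;
    int match = -1;

    /* Label currently set on the directory */
    if (getfilecon(homedir, &actual) < 0)
        return -1;

    /* Label the file-context policy would assign to this path */
    hnd = selabel_open(SELABEL_CTX_FILE, NULL, 0);
    if (hnd == NULL) {
        freecon(actual);
        return -1;
    }

    if (selabel_lookup(hnd, &expected, homedir, S_IFDIR) == 0) {
        match = (strcmp(actual, expected) == 0);
        freecon(expected);
    }

    selabel_close(hnd);
    freecon(actual);
    return match;    /* 1 = matches, 0 = differs, -1 = error */
}

int main(int argc, char *argv[])
{
    if (argc < 2) {
        fprintf(stderr, "usage: %s /home/user\n", argv[0]);
        return 1;
    }
    printf("label matches policy: %d\n", homedir_label_matches(argv[1]));
    return 0;
}

If something along these lines turned out to be reliable, sssd could skip the whole semanage transaction when the homedir already looks right, but I'm not sure it catches the case where the policy's file-context rules themselves still need updating.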
Created attachment 1515374 [details] a program that illustrates what sssd does
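For reference, this is roughly what the attached program does, reconstructed from the call sequence visible in the ltrace output above. The real attachment may differ in details, and most error handling is omitted here:

/* Approximation of the attached test program: map a Linux login to an
 * SELinux user/MLS range the way sssd's sss_set_seuser() path does. */
#include <stdio.h>
#include <selinux/selinux.h>
#include <semanage/semanage.h>

int main(int argc, char *argv[])
{
    semanage_handle_t *sh;
    semanage_seuser_key_t *key = NULL;
    semanage_seuser_t *su = NULL;
    int exists = 0;

    if (argc < 4) {
        fprintf(stderr, "usage: %s login seuser mls_range\n", argv[0]);
        return 1;
    }
    const char *login = argv[1];   /* e.g. "admin" */
    const char *seuser = argv[2];  /* e.g. "unconfined_u" */
    const char *mls = argv[3];     /* e.g. "s0-s0:c0.c1023" */

    if (!is_selinux_enabled())
        return 1;

    sh = semanage_handle_create();
    if (sh == NULL || semanage_is_managed(sh) != 1)
        return 1;
    if (semanage_access_check(sh) < SEMANAGE_CAN_WRITE)
        return 1;
    if (semanage_connect(sh) < 0)
        return 1;

    semanage_seuser_key_create(sh, login, &key);
    semanage_seuser_exists(sh, key, &exists);

    /* The reproducer just fills in a fresh record; the real code would
     * query and update the existing one if 'exists' is true. */
    semanage_seuser_create(sh, &su);
    semanage_seuser_set_name(sh, su, login);
    semanage_seuser_set_sename(sh, su, seuser);
    semanage_seuser_set_mlsrange(sh, su, mls);

    /* This is where almost all of the time goes. */
    semanage_begin_transaction(sh);
    semanage_seuser_modify_local(sh, key, su);
    semanage_commit(sh);

    semanage_seuser_key_free(key);
    semanage_seuser_free(su);
    semanage_disconnect(sh);
    semanage_handle_destroy(sh);
    return 0;
}

Per the ltrace numbers above, practically all of the wall-clock time is in the transaction itself (semanage_begin_transaction, semanage_seuser_modify_local, semanage_commit).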
"On my test VM I can see that setting the label takes between 3-4 seconds, but in #1654537 Adam observed even 25+ second time outs." It bounces around a bit and may be related to how busy the box running the test is (it can be running 10 VMs at once...or 30, with the new box) and whether the Rawhide kernel has debugging enabled at the time, I think. I just noticed yesterday it seems to be taking over 60 seconds on aarch64 at least sometimes, as the failure is still showing up on aarch64 even though I gave the test a 60 second timeout now.
This bug appears to have been reported against 'rawhide' during the Fedora 31 development cycle. Changing version to '31'.
So far, I wasn't able to reproduce the super-slow times... I guess the disk must be really slow on the OpenQA aarch64 machines :/ The OpenQA links in BZ 1644919 are dead and I couldn't find any aarch64 runs in OpenQA... Is there some way I could get more information about the machine(s) where the operation was particularly slow?

As for the technical side of speeding it up - I don't see any easy way to avoid copying all the files. The copying is needed so that the transaction can be committed atomically by just doing a rename() (a simplified, made-up sketch of that pattern is below). When I analyzed the transaction functions using perf, I noticed that the policydb_read()/policydb_destroy() functions also take up a significant part of the time (but this was on a fast x86_64 machine), so you *might* get some speedup with the 3.1 SELinux userspace, which should be released and imported into Rawhide soon.

If the copying/removing of files turns out to be the main bottleneck, I guess we can try some tricks with hardlinks, but I don't know yet how effective/feasible/complex that would be...
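For illustration, the commit pattern I mean is roughly the following. This is heavily simplified; the paths and steps are invented for the example and are not the actual libsemanage store layout or code:

/* Made-up illustration of a copy-then-swap commit (not the real
 * libsemanage implementation).  The expensive part is duplicating the
 * whole store; the swap at the end is just rename() calls, each of
 * which is atomic. */
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    /* 1. Duplicate the current store into a private sandbox -- this is
     *    where all the file copying (and the I/O cost) comes from. */
    if (system("cp -a /tmp/demo-store/active /tmp/demo-store/pending") != 0)
        return 1;

    /* 2. ... apply the requested change and rebuild the policy inside
     *    the sandbox ... */

    /* 3. Swap the sandbox into place.  rename() is atomic, so readers
     *    see either the old store or the new one, never a mix. */
    if (rename("/tmp/demo-store/active", "/tmp/demo-store/previous") != 0)
        return 1;
    if (rename("/tmp/demo-store/pending", "/tmp/demo-store/active") != 0)
        return 1;

    return 0;
}

So the atomicity comes cheap from the final rename()s; it's the up-front duplication of the store that hurts on slow disks.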
This message is a reminder that Fedora 31 is nearing its end of life. Fedora will stop maintaining and issuing updates for Fedora 31 on 2020-11-24. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as EOL if it remains open with a Fedora 'version' of '31'.

Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not able to fix it before Fedora 31 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged to change the 'version' to a later Fedora version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.
*** Bug 1644919 has been marked as a duplicate of this bug. ***
So, update on openQA status here: I just checked the past test results and it looks like this got a lot faster in February and March. It last outright *failed*, best as I can tell, around Feb 20, and we last got a soft failure because login took more than 30 seconds on March 21. Since then I don't think that's happened. I don't know what changed to speed it up, but something has, anyway. It's not a going concern for me any more. I'll leave it up to you folks to decide whether to close the bug.
(In reply to Adam Williamson from comment #8) > So, update on openQA status here: I just checked the past test results and > it looks like this got a lot faster in February and March. It last outright > *failed*, best as I can tell, around Feb 20, and we last got a soft failure > because login took more than 30 seconds on March 21. Since then I don't > think that's happened. That's good to hear :) > I don't know what changed to speed it up, but something has, anyway. It's > not a going concern for me any more. I'll leave it up to you folks to decide > whether to close the bug. Out of curiosity, what Fedora version were those tests done on? Nightly Rawhide from that time? I'd like to see if there were any changes in our packages in that time that might have done it.
Yeah, I was looking at Rawhide nightlies. I had a look at the compose reports but didn't see anything terribly obvious.
Fedora 31 changed to end-of-life (EOL) status on 2020-11-24. Fedora 31 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version. If you are unable to reopen this bug, please file a new report against the current release. If you experience problems, please add a comment to this bug.

Thank you for reporting this bug and we are sorry it could not be fixed.