Bug 2088481
Summary: | selinux_child: Cannot begin SELinux transaction | | |
---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Rob Crittenden <rcritten> |
Component: | sssd | Assignee: | Alexey Tikhonov <atikhono> |
Status: | CLOSED WONTFIX | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
Severity: | unspecified | Docs Contact: | |
Priority: | unspecified | | |
Version: | 34 | CC: | abokovoy, aboscatt, atikhono, jhrozek, lslebodn, luk.claes, mzidek, pbrezina, sbose, ssorce, sssd-maintainers |
Target Milestone: | --- | | |
Target Release: | --- | | |
Hardware: | Unspecified | | |
OS: | Unspecified | | |
Whiteboard: | sync-to-jira | | |
Fixed In Version: | | Doc Type: | If docs needed, set a value |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2022-06-02 13:51:14 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | | | |
Description
Rob Crittenden
2022-05-19 14:24:32 UTC
Failing function is `semanage_begin_transaction()`:

```
/* Attempt to obtain a transaction lock on the manager. If another
 * process has the lock then this function may block, depending upon
 * the timeout value in the handle.
 *
 * Note that if the semanage_handle has not yet obtained a transaction
 * lock whenever a writer function is called, there will be an
 * implicit call to this function. */
extern int semanage_begin_transaction(semanage_handle_t *);
```

From a quick glance I don't find an API to specify this timeout explicitly. But https://github.com/SELinuxProject/selinux/blob/0a8c177dacdc1df96ea11bb8aa75e16c4fa82285/libsemanage/src/handle.c#L101:

```
sh->timeout = SEMANAGE_COMMIT_READ_WAIT;
```

and https://github.com/SELinuxProject/selinux/blob/0a8c177dacdc1df96ea11bb8aa75e16c4fa82285/libsemanage/src/handle.c#L40:

```
#define SEMANAGE_COMMIT_READ_WAIT 5
```

This matches the timestamps in the log, which are 5 seconds apart:

```
* (2022-05-19 13:41:56): [selinux_child[11668]] [seuser_needs_update] (0x0400): The SELinux user does need an update
* (2022-05-19 13:42:01): [selinux_child[11668]] [libsemanage] (0x0020): Could not get direct transaction lock at /var/lib/selinux/targeted/semanage.trans.LOCK.
```

I'm not sure a retry would help. If a retry would succeed immediately, this would mean a bug in `libsemanage`, imo.

Btw, Rob, if your goal is load testing of a *server*, you could consider setting `selinux_provider = none` on the client. It will reduce the amount of fetched data, but I think the overall request rate will increase.

Thanks, Sumit suggested the same. I'm trying at first with vanilla client and server installs and I'll go from there. Modifying this on all clients is trivial with Ansible or other config management tools, so users that don't do SELinux assignments would be fine with that, but I'm focusing on the defaults right now.
Hi Rob,

We discussed this BZ during our weekly call a couple of minutes ago and Sumit gave a perfect background overview of what is happening. It seems to be a side effect of the load/performance test setup, which is expected in this kind of situation.

With that being said, I will go ahead and close it as NOTABUG. Please re-open if you disagree, or if we find a flaw in a real-world scenario that should indeed be addressed.

Kindly

JFTR: I disagree with the resolution.
- limits aren't documented
- there is no clear message to either the user or the admin explaining what has happened
- this creates an artificial bottleneck and degrades overall performance limits for no good reason

There is clearly room for improvement. Perhaps the scenario is not important enough to justify such an improvement, so the resolution could be "WONTFIX". But "NOTABUG" sounds like denial of any issue, and I think this is wrong.

As discussed with Alexey in private, I am changing the resolution from NOTABUG to WONTFIX. The plan is to, at least, take a look into this during CY22Q4 or later (CY23Q1), to validate what can be done to improve the user experience (the load is low, and we have had similar cases opened in the past).

Rob, if you have new information by then, please let us know, so we can take into account the new results from the benchmark you are working on.

Last but not least, thanks, Alexey, for pointing this out, explaining it differently, and bringing another point of view. I really appreciate it.

Kindly

It's up to you I guess, but if enough people (and it doesn't require many) log into an sssd-managed system simultaneously, it's very possible for one or more to fail because of contention on the SELinux lock.
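For readers following the thread, the failure mode discussed here (a caller waits for the transaction lock and gives up once a fixed timeout expires) can be illustrated with a minimal sketch. This is not the `libsemanage` implementation: `acquire_lock_with_timeout()` and `release_lock()` are hypothetical helpers, and the use of `flock()` with one-second polling is an assumption made only to keep the example small.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/* Hypothetical helper, NOT part of libsemanage: try to take an exclusive
 * lock on `path`, polling once per second for at most `timeout_sec`
 * seconds. Returns a file descriptor holding the lock, or -1 on failure. */
static int acquire_lock_with_timeout(const char *path, int timeout_sec)
{
    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return -1;

    for (int waited = 0; ; waited++) {
        if (flock(fd, LOCK_EX | LOCK_NB) == 0)
            return fd;          /* lock acquired */
        if (errno != EWOULDBLOCK || waited >= timeout_sec)
            break;              /* hard error, or timeout expired */
        sleep(1);               /* someone else holds the lock; wait */
    }

    close(fd);
    return -1;
}

/* Hypothetical helper: release a lock taken by the function above. */
static void release_lock(int fd)
{
    flock(fd, LOCK_UN);
    close(fd);
}
```

With a timeout of 5, a caller queued behind a holder that keeps the lock for longer than five seconds gets `-1`. That is analogous to the "Could not get direct transaction lock" message in the log, and it shows why several simultaneous logins can be enough to make one of them fail.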