Bug 2149091 - Update to sssd-2.7.3-4.el8_7.1.x86_64 resulted in "Request to sssd failed. Device or resource busy"
Keywords:
Status: CLOSED ERRATA
Alias: None
Deadline: 2023-02-13
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: sssd
Version: 8.7
Hardware: x86_64
OS: Linux
Priority: medium
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Alexey Tikhonov
QA Contact: Anuj Borah
URL:
Whiteboard: sync-to-jira
Depends On:
Blocks: 2172240
 
Reported: 2022-11-28 19:30 UTC by Danish Shaikh
Modified: 2023-05-16 11:12 UTC
CC List: 16 users

Fixed In Version: sssd-2.8.2-2.el8
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 2172240 (view as bug list)
Environment:
Last Closed: 2023-05-16 09:08:11 UTC
Type: Bug
Target Upstream Version:
Embargoed:


Attachments
Debug logs (157.43 KB, text/plain)
2022-11-28 19:30 UTC, Danish Shaikh


Links
System ID Private Priority Status Summary Last Updated
Github SSSD sssd pull 6560 0 None open SSS_CLIENT: fix a number of issues 2023-02-08 18:48:15 UTC
Red Hat Issue Tracker RHELPLAN-140724 0 None None None 2022-11-28 19:34:25 UTC
Red Hat Issue Tracker SSSD-5551 0 None None None 2023-01-26 12:21:03 UTC
Red Hat Product Errata RHBA-2023:2986 0 None None None 2023-05-16 09:08:18 UTC

Description Danish Shaikh 2022-11-28 19:30:53 UTC
Created attachment 1928106 [details]
Debug logs

Description of problem:

>>> Authentication via application (dotnet) fails with the below error on sssd version sssd-2.7.3-4.el8_7.1.x86_64 

--------------------
Nov 22 09:54:51 prqaws1 dotnet[3141635]: pam_sss(pfmenu:auth): Request to sssd failed. Device or resource busy
--------------------

Workaround:

>>> Downgrading the rpm using yum has fixed the issue.

The error disappears on version sssd-2.6.2-4.el8_6.1.x86_64
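
For reference, the downgrade can be done along these lines (the exact package spec is an assumption; adjust to the builds available in your repositories):

--------------------
# Downgrade all installed sssd subpackages to the known-good build
yum downgrade sssd\*-2.6.2-4.el8_6.1
# Restart the daemon so the responders match the downgraded client libraries
systemctl restart sssd
--------------------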


Version-Release number of selected component (if applicable):
sssd-2.7.3-4.el8_7.1.x86_64



Additional info:


From case: 03368328 (closed)

>>> Authentication via application (dotnet) fails with the below error but ssh works fine

--------------------
Nov 22 09:54:51 prqaws1 dotnet[3141635]: pam_sss(pfmenu:auth): Request to sssd failed. Device or resource busy
--------------------

Workaround:

>>> Downgrading the rpm using yum has fixed the issue.

The error disappears on version sssd-2.6.2-4.el8_6.1.x86_64

Version-Release number of selected component (if applicable):
sssd-2.7.3-4.el8_7.1.x86_64


From case: 03410140 

>>> Authentication via application (mysqld) fails with the below error, but ssh works fine.

--------------------
Jan 12 13:40:07 vdlcatssdbda03 mysqld: pam_sss(mysqld:account): Request to sssd failed. Device or resource busy
Jan 12 13:40:07 vdlcatssdbda03 mysqld: pam_unix(mysqld:auth): authentication failure; logname= uid=74421 euid=74421 tty= ruser=catsdv06usr rhost=10.152.42.21  user=catsdv06usr

…
Jan 13 15:55:31 vdlcatssdbda03 sshd[4011105]: pam_sss(sshd:auth): authentication success; logname= uid=0 euid=0 tty=ssh ruser= rhost=10.65.14.210 user=lh017882
Jan 13 15:55:32 vdlcatssdbda03 sshd[4011105]: Accepted password for lh017882 from 10.65.14.210 port 63379 ssh2
.
.
Jan 13 16:16:46 vdlcatssdbda03 mysqld: pam_unix(mysqld:auth): authentication failure; logname= uid=74421 euid=74421 tty= ruser=catsuat06usr rhost=10.152.74.27  user=catsuat06usr
--------------------

From case: 03367178

Whether authentication fails via an application (httpd) while ssh works fine is not confirmed yet. I have asked the customer to confirm whether downgrading sssd helps.

----
Jan  3 16:42:05 1000u308081 httpd[988133]: pam_sss(nagios:auth): authentication success; logname= uid=48 euid=48 tty= ruser= rhost=172.20.81.205 user=aa-bgadjev
Jan  3 16:42:06 1000u308081 httpd[988133]: pam_sss(nagios:auth): authentication success; logname= uid=48 euid=48 tty= ruser= rhost=172.20.81.205 user=aa-bgadjev
Jan  3 16:42:06 1000u308081 httpd[988133]: pam_sss(nagios:auth): authentication success; logname= uid=48 euid=48 tty= ruser= rhost=172.20.81.205 user=aa-bgadjev
Jan  3 16:42:06 1000u308081 httpd[988133]: pam_sss(nagios:account): Request to sssd failed. Device or resource busy
----

Comment 17 christophe.viallon@telespazio.com 2023-01-25 17:34:12 UTC
Confirmed for me, exact same issue. In our case this happens under a "high" (well, very relative) authentication request rate, using pam_sss for authentication against sssd (in fact Apache Basic authentication, redirected to PAM and then to sssd).
As already said, I confirm that it is solved by downgrading to sssd 2.6.2-4. As of 25 January 2023, this bug does not seem to be fixed by any sssd update. Very annoying.
No problems seen with sudo or ssh access.
I have sometimes noted other errors, such as "Bad address" in place of "Device or resource busy", in /var/log/secure or "journalctl -u httpd".
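
As a rough sketch, the setup described above (Apache Basic authentication handed to PAM via mod_authnz_pam, with pam_sss talking to sssd) would look something like this; the file paths, the Location block, and the PAM service name "nagios" are illustrative assumptions, not taken from our configuration:

----
# /etc/httpd/conf.d/pam_auth.conf (illustrative sketch)
LoadModule authnz_pam_module modules/mod_authnz_pam.so
<Location "/protected">
    AuthType Basic
    AuthName "PAM protected area"
    AuthBasicProvider PAM
    AuthPAMService nagios
    Require valid-user
</Location>

# /etc/pam.d/nagios (illustrative sketch)
auth    required    pam_sss.so
account required    pam_sss.so
----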

I raised debug_level to 10 in the [pam] section of sssd and captured the following logs, which seem related to the already mentioned "pam_sss(pfmenu:auth): Request to sssd failed. Device or resource busy" messages in /var/log/secure and the journalctl output.
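
For anyone reproducing this, the logging change amounts to the following (standard sssd.conf location assumed), followed by a restart of sssd; the pam responder then writes to /var/log/sssd/sssd_pam.log:

----
# /etc/sssd/sssd.conf
[pam]
debug_level = 10
----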

My versions (not working):
sssd 2.7.3-4
sssd-pam: same (included in sssd)
mod_authnz_pam 1.1.0.7
pam-1.3.1-22.el8.x86_64

My old (still working) versions:
sssd-2.6.2-4.el8_6.1.x86_64
pam-1.3.1-16.el8.x86_64


Logs (debug_level = 10) for the sssd_pam responder:

(2023-01-20 20:34:18): [pam] [ldb] (0x10000): [CID#61] Added timed event "ldb_kv_callback": 0x5638973115f0
(2023-01-20 20:34:18): [pam] [ldb] (0x10000): [CID#61] Added timed event "ldb_kv_timeout": 0x5638972f7710
(2023-01-20 20:34:18): [pam] [ldb] (0x10000): [CID#61] Running timer event 0x5638973115f0 "ldb_kv_callback"
(2023-01-20 20:34:18): [pam] [ldb] (0x10000): [CID#61] Destroying timer event 0x5638972f7710 "ldb_kv_timeout"
(2023-01-20 20:34:18): [pam] [ldb] (0x10000): [CID#61] Destroying timer event 0x5638973115f0 "ldb_kv_callback"
(2023-01-20 20:34:18): [pam] [filter_responses_env] (0x4000): [CID#61] Found PAM ENV filter for variable [KRB5CCNAME] and service [sudo].
(2023-01-20 20:34:18): [pam] [filter_responses_env] (0x4000): [CID#61] Found PAM ENV filter for variable [KRB5CCNAME] and service [sudo-i].
(2023-01-20 20:34:18): [pam] [pam_reply] (0x0200): [CID#61] blen: 67
(2023-01-20 20:34:18): [pam] [pam_reply] (0x0200): [CID#61] Returning [0]: Succès to the client
----
(2023-01-20 20:34:18): [pam] [client_recv] (0x0400): [CID#61] Failed to read request, aborting client!
----
(2023-01-20 20:34:18): [pam] [client_close_fn] (0x2000): [CID#61] Terminated client [0x563897338210][24]
(2023-01-20 20:34:18): [pam] [get_client_cred] (0x4000): Client [0x563897347050][24] creds: euid[48] egid[48] pid[10914] cmd_line['/usr/sbin/httpd'].
(2023-01-20 20:34:18): [pam] [setup_client_idle_timer] (0x4000): Idle timer re-set for client [0x563897347050][24]
(2023-01-20 20:34:18): [pam] [accept_fd_handler] (0x0400): [CID#62] Client [cmd /usr/sbin/httpd][uid 48][0x563897347050][24] connected!
(2023-01-20 20:34:18): [pam] [sss_cmd_get_version] (0x0200): [CID#62] Received client version [3].
(2023-01-20 20:34:18): [pam] [sss_cmd_get_version] (0x0200): [CID#62] Offered version [3].
(2023-01-20 20:34:18): [pam] [pam_cmd_preauth] (0x0100): [CID#62] entering pam_cmd_preauth
...

I hope this helps.

Comment 21 christophe.viallon@telespazio.com 2023-01-31 10:01:12 UTC
Hi, just for information: we tried to downgrade sssd to the latest 2.6.4 version without success (more than 170 packages needed, and it causes a lot of incompatibility problems).
Any date for the fix? We are stuck in our project development because of this bug, which makes pam/sssd unusable!

Please raise its priority if possible.

Comment 22 Alexey Tikhonov 2023-01-31 14:27:31 UTC
(In reply to christophe.viallon from comment #21)
>
> Any date for the fix?

At the moment I have no other ideas besides building "intermediate" versions of SSSD between 2.6.2-4.el8_6.1 and sssd-2.7.3-4.el8_7.1, in the hope of catching the patch that introduced the issue.


> Please raise its priority if possible

Would you be willing to test a number of test builds?

Comment 28 Alexey Tikhonov 2023-02-03 11:13:19 UTC
Might be alleviated by https://github.com/SSSD/sssd/pull/6560

Comment 29 christophe.viallon@telespazio.com 2023-02-06 06:41:45 UTC
(In reply to Alexey Tikhonov from comment #22)
> (In reply to christophe.viallon from comment #21)
> >
> > Any date for the fix?
> 
> At the moment I have no other ideas besides building "intermediate" versions
> of SSSD between 2.6.2-4.el8_6.1 and sssd-2.7.3-4.el8_7.1, in the hope of
> catching the patch that introduced the issue.
> 
> 
> > Please raise its priority if possible
> 
> Would you be willing to test a number of test builds?

Sorry for the delay; this is probably quite late. I would like to help with this, but I will be unable to find the necessary time.
I saw that you have made some progress, it seems, anyway.

Comment 32 Alexey Tikhonov 2023-02-08 15:17:13 UTC
Inspecting strace of `httpd`:

ID=1665912 starts multiple threads, including 1665914, 1665915, 1665916

* 1665914:
    ...
    22:13:05 stat("/var/lib/sss/pipes/pam", {st_mode=S_IFSOCK|0666, st_size=0, ...}) = 0
    22:13:05 getpid()                       = 1665871
    22:13:05 fstat(-1, 0x7f3797608420)      = -1 EBADF (Bad file descriptor)
    22:13:05 getpid()                       = 1665871
    22:13:05 socket(AF_UNIX, SOCK_STREAM, 0) = 18
    22:13:05 fcntl(18, F_GETFL)             = 0x2 (flags O_RDWR)
    22:13:05 fcntl(18, F_SETFL, O_RDWR|O_NONBLOCK) = 0
    22:13:05 fcntl(18, F_GETFD)             = 0
    22:13:05 fcntl(18, F_SETFD, FD_CLOEXEC) = 0
    22:13:05 connect(18, {sa_family=AF_UNIX, sun_path="/var/lib/sss/pipes/pam"}, 110) = 0
    22:13:05 fstat(18, {st_mode=S_IFSOCK|0777, st_size=0, ...}) = 0
    22:13:05 poll([{fd=18, events=POLLOUT}], 1, 300000) = 1 ([{fd=18, revents=POLLOUT}])
    22:13:05 sendto(18, "...", 16, MSG_NOSIGNAL, NULL, 0) = 16
    ...
    22:13:05 close(18)                      = 0                 <-- probably pam_end()

* 1665915: the same as in 1665914 and also at 22:13:05

* 1665916:
    ...
    22:13:05 stat("/var/lib/sss/pipes/pam", {st_mode=S_IFSOCK|0666, st_size=0, ...}) = 0
    22:13:05 getpid()                       = 1665871
    22:13:05 poll([{fd=18, events=POLLIN|POLLOUT}], 1, 300000) = 1 ([{fd=18, revents=POLLOUT}])   <-- sss_cli_check_socket()
    22:13:05 getsockopt(18, SOL_SOCKET, SO_PEERCRED, {pid=3600209, uid=0, gid=0}, [12]) = 0       <-- check_server_cred(), so this is sss_pam_make_request()
    22:13:05 poll([{fd=18, events=POLLOUT}], 1, 300000) = 1 ([{fd=18, revents=POLLOUT}])
    22:13:05 sendto(18, "...", 16, MSG_NOSIGNAL, NULL, 0) = 16
    22:13:05 poll([{fd=18, events=POLLOUT}], 1, 300000) = 1 ([{fd=18, revents=POLLOUT}])
    22:13:05 sendto(18, "..."..., 121, MSG_NOSIGNAL, NULL, 0) = 121
    22:13:05 poll([{fd=18, events=POLLIN}], 1, 300000) = 1 ([{fd=18, revents=POLLIN}])
    22:13:05 read(18, "...", 16) = 16
    22:13:05 poll([{fd=18, events=POLLIN}], 1, 300000) = 1 ([{fd=18, revents=POLLIN}])
    22:13:05 read(18, "..."..., 80) = 80
    ...
    22:13:05 stat("/var/lib/sss/pipes/pam", {st_mode=S_IFSOCK|0666, st_size=0, ...}) = 0
    22:13:05 getpid()                       = 1665871
    22:13:05 poll([{fd=18, events=POLLIN|POLLOUT}], 1, 300000) = 1 ([{fd=18, revents=POLLOUT}])   <-- sss_cli_check_socket()
    22:13:05 getsockopt(18, SOL_SOCKET, SO_PEERCRED, {pid=3600209, uid=0, gid=0}, [12]) = 0       <-- check_server_cred()
    22:13:05 poll([{fd=18, events=POLLOUT}], 1, 300000) = 1 ([{fd=18, revents=POLLOUT}])
    22:13:05 sendto(18, "...", 16, MSG_NOSIGNAL, NULL, 0) = 16
    22:13:05 poll([{fd=18, events=POLLOUT}], 1, 300000) = 1 ([{fd=18, revents=POLLOUT}])
    22:13:05 sendto(18, "..."..., 88, MSG_NOSIGNAL, NULL, 0) = 88
    22:13:05 poll([{fd=18, events=POLLIN}], 1, 300000) = 1 ([{fd=18, revents=POLLNVAL}])          <-- POLLNVAL: Invalid request: fd not open
    22:13:05 futex(0x7f37946030c0, FUTEX_WAKE_PRIVATE, 1) = 0
    22:13:05 sendto(19, "<83>Feb  7 22:13:05 httpd: pam_s"..., 99, MSG_NOSIGNAL, NULL, 0) = 99
    22:13:05 socket(AF_NETLINK, SOCK_RAW, NETLINK_AUDIT) = 18

So thread 1665916 clearly re-uses the socket fd opened by either 1665914 or 1665915, and that fd then got closed mid PAM transaction.
Most probably the bug was introduced in https://github.com/SSSD/sssd/commit/c7918bef57df57fa8c9226fadf7e29e272d1fbff , where the locking sss_pam_close_fd() was replaced with the lockless sss_cli_close_socket().
Besides, I think the sss_client common code should *not* close the socket in case of POLLNVAL (otherwise there is a risk of closing a "foreign" socket if the fd was re-used by a different thread).
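
To make the failure mode concrete, a minimal sketch of the race described above (illustrative code, not the actual sssd client; only the shape of the bug is intended to match):

    #include <poll.h>
    #include <pthread.h>
    #include <unistd.h>

    /* One socket fd shared by every thread of the process, as in the
     * pre-fix client on platforms without the lockfree feature.  The
     * request path serializes on sss_cli_mtx, but close() does not. */
    static int sss_cli_sd = -1;
    static pthread_mutex_t sss_cli_mtx = PTHREAD_MUTEX_INITIALIZER;

    /* Thread A ends its PAM transaction (pam_end()).  BUG: close() is
     * not taken under sss_cli_mtx, so it can land in the middle of
     * another thread's request/response cycle. */
    static void end_transaction(void)
    {
        close(sss_cli_sd);
        sss_cli_sd = -1;
    }

    /* Thread B, mid-transaction on the same fd number.  If A's close()
     * races in, poll() reports POLLNVAL (fd not open).  And if the
     * kernel has meanwhile handed the same fd number to a third
     * thread's connect(), closing it "again" from here destroys that
     * foreign socket. */
    static int wait_readable(void)
    {
        struct pollfd pfd = { .fd = sss_cli_sd, .events = POLLIN };

        if (poll(&pfd, 1, 300000) == 1 && (pfd.revents & POLLNVAL)) {
            close(sss_cli_sd);   /* second bug: must not close on POLLNVAL */
            return -1;
        }
        return 0;
    }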

Comment 34 Alexey Tikhonov 2023-02-08 21:09:51 UTC
(In reply to Alexey Tikhonov from comment #32)
>
> Most probably the bug was introduced in https://github.com/SSSD/sssd/commit/c7918bef57df57fa8c9226fadf7e29e272d1fbff ,
> where the locking sss_pam_close_fd() was replaced with the lockless sss_cli_close_socket().

JFTR: this ^^ commit didn't introduce the bug per se (but rather paved the road to it).
At that point the "lockfree client" feature was expected to be built unconditionally: every thread would always operate on its own socket, and thus a lock before close() wasn't needed.
But later, feature support was made conditional on specific functionality in glibc, and so the feature build was disabled on some platforms, including RHEL 8. Restoring the lock for this case was missed.
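
For illustration, the build-time split described here has roughly this shape (the macro name is hypothetical, not sssd's):

    /* When the lockfree client is built in, every thread gets its own
     * socket, so no lock is needed around close().  When it is not
     * built (as on RHEL 8), all threads share one fd and close() needs
     * the same mutex as the rest of the transaction. */
    #ifdef HAVE_LOCKFREE_CLIENT
    static __thread int sss_cli_sd = -1;   /* per-thread socket */
    #else
    static int sss_cli_sd = -1;            /* shared socket, mutex required */
    #endif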

Comment 36 Alexey Tikhonov 2023-02-13 12:03:15 UTC
Pushed PR: https://github.com/SSSD/sssd/pull/6560

* `master`
    * bf3f73ea0ee123fe4e7c4bdd2287ac5a5e6d9082 - PAM_SSS: close(sss_cli_sd) should also be protected with mutex. Otherwise a thread calling pam_end() can close socket mid pam transaction in another thread.
    * ef93284b5a1f196425d9a61e8e24de8972240eb3 - SSS_CLIENT: if poll() returns POLLNVAL then socket is already closed (or wasn't open) so it shouldn't be closed again. Otherwise there is a risk to close "foreign" socket opened in another thread.
    * 0b8638d8de435384562f17d041655887b73523cd - SSS_CLIENT: fix error codes returned by common read/write/check helpers.

* `sssd-2-8`
    * 1fd7a5ecb46a02a29ebf42039575b5344307bfbb - PAM_SSS: close(sss_cli_sd) should also be protected with mutex. Otherwise a thread calling pam_end() can close socket mid pam transaction in another thread.
    * a40b25a3af29706c058ce5a02dd0ba294dbb6874 - SSS_CLIENT: if poll() returns POLLNVAL then socket is already closed (or wasn't open) so it shouldn't be closed again. Otherwise there is a risk to close "foreign" socket opened in another thread.
    * f3333b9dbeda33a9344b458accaa4ff372adb660 - SSS_CLIENT: fix error codes returned by common read/write/check helpers.
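
In outline, the first two fixes amount to the following pattern (a sketch under the same illustrative names as in comment 32's sketch, not the actual sssd patches; see the PR for the real code):

    #include <errno.h>
    #include <poll.h>
    #include <pthread.h>
    #include <unistd.h>

    static int sss_cli_sd = -1;
    static pthread_mutex_t sss_cli_mtx = PTHREAD_MUTEX_INITIALIZER;

    /* Fix 1: close() runs under the same mutex that serializes the
     * request/response cycle, so pam_end() in one thread can no longer
     * close the socket in the middle of another thread's transaction. */
    static void sss_pam_close_fd(void)
    {
        pthread_mutex_lock(&sss_cli_mtx);
        if (sss_cli_sd != -1) {
            close(sss_cli_sd);
            sss_cli_sd = -1;
        }
        pthread_mutex_unlock(&sss_cli_mtx);
    }

    /* Fix 2: on POLLNVAL the fd is already closed (or was never open),
     * so only forget it; calling close() here could hit a foreign fd
     * that the kernel has since handed to another thread. */
    static int handle_poll_result(struct pollfd *pfd)
    {
        if (pfd->revents & POLLNVAL) {
            sss_cli_sd = -1;     /* do NOT close() */
            return EPIPE;        /* report a sane error to the caller */
        }
        return 0;
    }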

Comment 51 errata-xmlrpc 2023-05-16 09:08:11 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (sssd bug fix and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:2986

