Bug 1752730 - crash in _dbus_list_unlink() (sssd_nss segfault libdbus)
Summary: crash in _dbus_list_unlink() (sssd_nss segfault libdbus)
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: sssd
Version: 8.4
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: 8.0
Assignee: SSSD Maintainers
QA Contact: sssd-qe
URL:
Whiteboard:
Duplicates: 1734248 1783169 (view as bug list)
Depends On:
Blocks:
 
Reported: 2019-09-17 06:11 UTC by adam winberg
Modified: 2023-09-18 00:17 UTC
CC List: 23 users

Fixed In Version: sssd-2.6.2-4.el8_6
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-05-12 16:07:56 UTC
Type: Bug
Target Upstream Version:
Embargoed:
Flags: bthekkep: needinfo-


Attachments


Links
  System:       Red Hat Issue Tracker
  ID:           SSSD-4160
  Private:      0
  Priority:     None
  Status:       None
  Summary:      None
  Last Updated: 2022-02-03 13:21:50 UTC

Description adam winberg 2019-09-17 06:11:10 UTC
Description of problem:
I have a script that generates access configuration (pam_access config, sssd 'ad_access_filter' config, sudoers config) based on a list of AD groups in a file. The script uses 'getent' and 'id' to iterate over the users in the groups and create configuration based on certain criteria.

This script is run every hour. On several occasions, running this script has caused sssd_nss to segfault:

Sep 16 17:45:10 lxserv2015 kernel: sssd_nss[990]: segfault at 26 ip 00007f87907c68bc sp 00007fffdc8ecaa8 error 4 in libdbus-1.so.3.19.7[7f8790793000+52000]


This has happened multiple times on multiple servers. Right now it keeps happening on one server which has several heavily populated groups as input to my script (on most servers there is only one group with 2-10 members as input). So I think the script somehow manages to stress sssd to the point of crashing.

I have debug logs but would rather not post them or my script in a public bug due to GDPR and such. I will happily share them directly with the maintainers though.


Version-Release number of selected component (if applicable):
sssd-2.0.0-43.el8_0.3.x86_64

How reproducible:
Most of the time the script works as it should. I haven't been able to pinpoint under which circumstances it causes sssd to segfault, which makes it hard to reproduce.


Additional info:

Comment 1 Pavel Březina 2019-09-17 07:48:36 UTC
Hi, please send the logs and the script to my email: pbrezina. Thank you.

Comment 2 adam winberg 2019-09-17 07:55:36 UTC
Have sent the logs and the script

Comment 3 Alexey Tikhonov 2019-09-17 08:29:51 UTC
This might be similar to:
  * https://pagure.io/SSSD/sssd/issue/2245
or
  * https://pagure.io/SSSD/sssd/issue/2660

Comment 4 Alexey Tikhonov 2019-09-17 08:30:47 UTC
and https://bugzilla.redhat.com/show_bug.cgi?id=1731577

Comment 5 adam winberg 2019-09-18 13:46:57 UTC
Just to clarify: we do not use enumeration. 

To try to force the segfault I ran my scheduled script every 10 minutes instead of once per hour. This had the opposite effect: I did not get any crash while doing this. Resetting the interval to once per hour produces crashes several times per day. Maybe some expiration in the sssd cache causes this?

Comment 6 Pavel Březina 2019-10-03 09:04:50 UTC
Thank you for the coredump.

#0  _dbus_list_unlink (list=0x558f17440000, link=link@entry=0x0) at ../../dbus/dbus-list.c:502
#1  0x00007f26c24a17cd in _dbus_list_remove_link (list=<optimized out>, link=0x0) at ../../dbus/dbus-list.c:530
#2  0x00007f26c2490115 in _dbus_message_remove_counter (message=0x558f1743ff90, counter=0x558f17268cc0) at ../../dbus/dbus-message.c:384
#3  0x00007f26c2483af4 in _dbus_connection_message_sent_unlocked (connection=<optimized out>, message=<optimized out>) at ../../dbus/dbus-connection.c:664
#4  0x00007f26c249e558 in do_writing (transport=0x558f172697b0) at ../../dbus/dbus-transport-socket.c:726
#5  0x00007f26c249e7be in socket_do_iteration (transport=0x558f172697b0, flags=1, timeout_milliseconds=-1) at ../../dbus/dbus-transport-socket.c:1131
#6  0x00007f26c249d491 in _dbus_transport_do_iteration (transport=0x558f172697b0, flags=<optimized out>, timeout_milliseconds=<optimized out>) at ../../dbus/dbus-transport.c:1016
#7  0x00007f26c24852ec in _dbus_connection_do_iteration_unlocked (connection=0x558f17268ac0, pending=<optimized out>, flags=1, timeout_milliseconds=-1) at ../../dbus/dbus-connection.c:1227
#8  0x00007f26c24853d7 in _dbus_connection_send_preallocated_unlocked_no_update (connection=connection@entry=0x558f17268ac0, preallocated=0x0, message=message@entry=0x558f172c4780, 
    client_serial=client_serial@entry=0x0) at ../../dbus/dbus-connection.c:2057
#9  0x00007f26c24866cc in _dbus_connection_send_preallocated_and_unlock (client_serial=0x0, message=0x558f172c4780, preallocated=<optimized out>, connection=0x558f17268ac0)
    at ../../dbus/dbus-connection.c:2114
#10 _dbus_connection_send_and_unlock (connection=0x558f17268ac0, message=message@entry=0x558f172c4780, client_serial=client_serial@entry=0x0) at ../../dbus/dbus-connection.c:2114
#11 0x00007f26c2486758 in dbus_connection_send (connection=<optimized out>, message=message@entry=0x558f172c4780, serial=serial@entry=0x0) at ../../dbus/dbus-connection.c:3326
#12 0x00007f26c2d04c79 in sbus_reply (conn=<optimized out>, reply=0x558f172c4780) at src/sbus/connection/sbus_send.c:216
#13 0x00007f26c2d13477 in sbus_issue_request_done (subreq=0x0) at src/sbus/router/sbus_router_handler.c:150
#14 0x00007f26c2d104b7 in sbus_request_notify_success (table=<optimized out>, key=<optimized out>, req=0x558f172a9340, messages_fn=0x7f26c2d10020 <sbus_request_messages>, reply=0x558f172c4780)
    at src/sbus/request/sbus_request.c:289
#15 0x00007f26c28e1bd9 in tevent_common_invoke_timer_handler () from /lib64/libtevent.so.0
#16 0x00007f26c28e1d7e in tevent_common_loop_timer_delay () from /lib64/libtevent.so.0
#17 0x00007f26c28e2f2b in epoll_event_loop_once () from /lib64/libtevent.so.0
#18 0x00007f26c28e11bb in std_event_loop_once () from /lib64/libtevent.so.0
#19 0x00007f26c28dc395 in _tevent_loop_once () from /lib64/libtevent.so.0
#20 0x00007f26c28dc63b in tevent_common_loop_wait () from /lib64/libtevent.so.0
#21 0x00007f26c28e114b in std_event_loop_wait () from /lib64/libtevent.so.0
#22 0x00007f26c5977a07 in server_loop (main_ctx=0x558f1724fef0) at src/util/server.c:724
#23 0x0000558f169dfe61 in main (argc=6, argv=<optimized out>) at src/responder/nss/nsssrv.c:485

Apparently, link is NULL and trying to access it results in SIGSEGV.

(gdb) f 0
#0  _dbus_list_unlink (list=0x558f17440000, link=link@entry=0x0) at ../../dbus/dbus-list.c:502
502	  if (link->next == link)
(gdb) l
497	 */
498	void
499	_dbus_list_unlink (DBusList **list,
500	                   DBusList  *link)
501	{
502	  if (link->next == link)
503	    {
504	      /* one-element list */
505	      *list = NULL;
506	    }

However, here is an assertion that says that link cannot be NULL:
(gdb) f 2
#2  0x00007f26c2490115 in _dbus_message_remove_counter (message=0x558f1743ff90, counter=0x558f17268cc0) at ../../dbus/dbus-message.c:384
384	  _dbus_list_remove_link (&message->counters, link);
(gdb) l
379	
380	  link = _dbus_list_find_last (&message->counters,
381	                               counter);
382	  _dbus_assert (link != NULL);
383	
384	  _dbus_list_remove_link (&message->counters, link);

David, do you know what could cause such backtrace?

Comment 7 David King 2019-10-04 12:09:02 UTC
(In reply to Pavel Březina from comment #6)
> David, do you know what could cause such backtrace?

Assertions are disabled in the dbus package. Assertions in dbus code represent situations which should not occur, so this is either a bug in libdbus somewhere, or in the code in sssd that calls into libdbus.
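
As a rough illustration only (this is not dbus source; the standard assert()/NDEBUG pair stands in for _dbus_assert() being compiled out of the released dbus package), a disabled assertion simply lets a NULL pointer reach the very next dereference, which matches frame #0 of the backtrace:

/* assert_demo.c - hedged sketch of a compiled-out assertion.
 *   gcc assert_demo.c && ./a.out           -> aborts at the assertion
 *   gcc -DNDEBUG assert_demo.c && ./a.out  -> segfaults on the dereference
 */
#include <assert.h>
#include <stdio.h>

struct link {
    struct link *next;
};

int main(void)
{
    struct link *link = NULL;           /* what _dbus_list_find_last() returned here */

    assert(link != NULL);               /* compiled to nothing when NDEBUG is set */

    /* Mirrors 'if (link->next == link)' in _dbus_list_unlink(): the first
     * dereference of the NULL link is where the SIGSEGV happens. */
    printf("%p\n", (void *)link->next);
    return 0;
}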

Comment 8 adam winberg 2019-10-05 08:24:38 UTC
This is the bash script that causes the segfault:

for member in $(getent group $group | awk -F':' '{print$4}' | tr "," " ");do 
  if id -nGz "$member" 2>/dev/null | grep -qzxF "specialgroupg";then
    specialgroupg_users+=" $member"
  fi
done


where '$group' is an AD group. It is specifically the 'id' command that causes the segfault, and it seems to happen only if $group has more than 10 members or so. The script containing the snippet above is run once per hour.

I have now replaced the 'id' command with an ldapsearch command and that solves the problem - i.e. no more sssd_nss crashes.

Comment 9 Pavel Březina 2019-10-14 10:24:42 UTC
I did not manage to reproduce this crash. The crash seems to always occur in one place; it runs successfully dozens of times, but then suddenly the same code path crashes without any obvious reason.

The logs say that there are multiple out-of-band requests to refresh user information, which are triggered by a midpoint refresh. Some of these requests call the sssd.nss.MemoryCache.UpdateInitgroups method in the nss responder. This invokes `nss_memorycache_update_initgroups`, which always finishes successfully. Then SSSD tries to send an empty success message as a reply to the incoming message. This works dozens of times, then it crashes randomly.

Because simple googling for '_dbus_list_unlink' reveals multiple bugs with the same dbus backtraces across different programs and distributions, none of which have ever been solved, I am switching the component to dbus for further investigation.

David, please tell me if I or Adam can provide some more information. If there is any data that can be obtained from SSSD, I can build a scratch build for Adam.

Comment 10 Pavel Březina 2019-11-05 10:36:37 UTC
*** Bug 1734248 has been marked as a duplicate of this bug. ***

Comment 19 Ding-Yi Chen 2021-09-17 05:22:06 UTC
NULL is a valid return value, as it means no value found.


/**
 * Finds a value in the list. Returns the last link
 * with value equal to the given data pointer.
 * This is a linear-time operation.
 * Returns #NULL if no value found that matches.
 *
 * @param list address of the list head.
 * @param data the value to find.
 * @returns the link if found
 */
DBusList*
_dbus_list_find_last (DBusList **list,
                      void      *data)
{
  DBusList *link;

  link = _dbus_list_get_last_link (list);

  while (link != NULL)
    {
      if (link->data == data)
        return link;
      
      link = _dbus_list_get_prev_link (list, link);
    }

  return NULL;
}

Given that _dbus_assert is disabled in released builds, how should _dbus_message_remove_counter behave when _dbus_list_find_last returns NULL?

Should it just return, or only skip the line after _dbus_list_find_last?
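
For illustration only (a self-contained toy list, not the dbus code or a proposed patch), the "just return" option would look roughly like this: the remove path bails out when the search comes back empty instead of unlinking a NULL link:

/* list_demo.c - hedged sketch of the "just return" option on a toy list. */
#include <stdio.h>
#include <stdlib.h>

/* Toy doubly linked list shaped like DBusList. */
struct node {
    struct node *prev;
    struct node *next;
    void *data;
};

/* Linear search; returns NULL when no node carries 'data',
 * analogous to _dbus_list_find_last(). */
static struct node *find_last(struct node *head, void *data)
{
    struct node *found = NULL;
    for (struct node *n = head; n != NULL; n = n->next)
        if (n->data == data)
            found = n;
    return found;
}

/* The "just return" option: treat a missing entry as a no-op
 * instead of dereferencing a NULL link. */
static void remove_value(struct node **head, void *data)
{
    struct node *link = find_last(*head, data);
    if (link == NULL)
        return;                          /* nothing to unlink, no crash */
    if (link->prev != NULL)
        link->prev->next = link->next;
    else
        *head = link->next;
    if (link->next != NULL)
        link->next->prev = link->prev;
    free(link);
}

int main(void)
{
    struct node *head = NULL;
    int counter = 0;
    remove_value(&head, &counter);       /* value was never added: harmless no-op */
    puts("no crash");
    return 0;
}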

Comment 30 David King 2021-11-24 10:22:53 UTC
Reassigning as per comment #21.

Comment 37 Alexey Tikhonov 2022-01-04 18:56:05 UTC
*** Bug 1783169 has been marked as a duplicate of this bug. ***

Comment 40 Red Hat Bugzilla 2023-09-18 00:17:26 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days

