Bug 1752730
| Summary: | crash in _dbus_list_unlink() (sssd_nss segfault libdbus) | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 8 | Reporter: | adam winberg <adam.winberg> |
| Component: | sssd | Assignee: | SSSD Maintainers <sssd-maint> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | sssd-qe <sssd-qe> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | ||
| Version: | 8.4 | CC: | aboscatt, atikhono, bthekkep, dchen, ddas, dking, grajaiya, hkhot, jhrozek, jmiracol, lslebodn, mupadhye, mzidek, nmadhesh, pbrezina, pkulkarn, ravpatil, rmarigny, sbarcomb, sgoveas, stanislav.moravec, swachira, tscherf |
| Target Milestone: | rc | Keywords: | Triaged |
| Target Release: | 8.0 | Flags: | bthekkep: needinfo-, pm-rhel: mirror+ |
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | sssd-2.6.2-4.el8_6 | Doc Type: | If docs needed, set a value |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2022-05-12 16:07:56 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Description
adam winberg
2019-09-17 06:11:10 UTC
Hi, please send the logs and the script to my email: pbrezina. Thank you.
I have sent the logs and the script.
This might be similar to:
* https://pagure.io/SSSD/sssd/issue/2245 or
* https://pagure.io/SSSD/sssd/issue/2660
Just to clarify: we do not use enumeration.
To try to force the segfault, I ran my scheduled script every 10 minutes instead of once per hour. This had the opposite effect: I did not get any crashes while doing this. Resetting the interval to once per hour produces crashes several times a day. Maybe some expiration in the sssd cache causes this?
Thank you for the coredump.
#0 _dbus_list_unlink (list=0x558f17440000, link=link@entry=0x0) at ../../dbus/dbus-list.c:502
#1 0x00007f26c24a17cd in _dbus_list_remove_link (list=<optimized out>, link=0x0) at ../../dbus/dbus-list.c:530
#2 0x00007f26c2490115 in _dbus_message_remove_counter (message=0x558f1743ff90, counter=0x558f17268cc0) at ../../dbus/dbus-message.c:384
#3 0x00007f26c2483af4 in _dbus_connection_message_sent_unlocked (connection=<optimized out>, message=<optimized out>) at ../../dbus/dbus-connection.c:664
#4 0x00007f26c249e558 in do_writing (transport=0x558f172697b0) at ../../dbus/dbus-transport-socket.c:726
#5 0x00007f26c249e7be in socket_do_iteration (transport=0x558f172697b0, flags=1, timeout_milliseconds=-1) at ../../dbus/dbus-transport-socket.c:1131
#6 0x00007f26c249d491 in _dbus_transport_do_iteration (transport=0x558f172697b0, flags=<optimized out>, timeout_milliseconds=<optimized out>) at ../../dbus/dbus-transport.c:1016
#7 0x00007f26c24852ec in _dbus_connection_do_iteration_unlocked (connection=0x558f17268ac0, pending=<optimized out>, flags=1, timeout_milliseconds=-1) at ../../dbus/dbus-connection.c:1227
#8 0x00007f26c24853d7 in _dbus_connection_send_preallocated_unlocked_no_update (connection=connection@entry=0x558f17268ac0, preallocated=0x0, message=message@entry=0x558f172c4780,
client_serial=client_serial@entry=0x0) at ../../dbus/dbus-connection.c:2057
#9 0x00007f26c24866cc in _dbus_connection_send_preallocated_and_unlock (client_serial=0x0, message=0x558f172c4780, preallocated=<optimized out>, connection=0x558f17268ac0)
at ../../dbus/dbus-connection.c:2114
#10 _dbus_connection_send_and_unlock (connection=0x558f17268ac0, message=message@entry=0x558f172c4780, client_serial=client_serial@entry=0x0) at ../../dbus/dbus-connection.c:2114
#11 0x00007f26c2486758 in dbus_connection_send (connection=<optimized out>, message=message@entry=0x558f172c4780, serial=serial@entry=0x0) at ../../dbus/dbus-connection.c:3326
#12 0x00007f26c2d04c79 in sbus_reply (conn=<optimized out>, reply=0x558f172c4780) at src/sbus/connection/sbus_send.c:216
#13 0x00007f26c2d13477 in sbus_issue_request_done (subreq=0x0) at src/sbus/router/sbus_router_handler.c:150
#14 0x00007f26c2d104b7 in sbus_request_notify_success (table=<optimized out>, key=<optimized out>, req=0x558f172a9340, messages_fn=0x7f26c2d10020 <sbus_request_messages>, reply=0x558f172c4780)
at src/sbus/request/sbus_request.c:289
#15 0x00007f26c28e1bd9 in tevent_common_invoke_timer_handler () from /lib64/libtevent.so.0
#16 0x00007f26c28e1d7e in tevent_common_loop_timer_delay () from /lib64/libtevent.so.0
#17 0x00007f26c28e2f2b in epoll_event_loop_once () from /lib64/libtevent.so.0
#18 0x00007f26c28e11bb in std_event_loop_once () from /lib64/libtevent.so.0
#19 0x00007f26c28dc395 in _tevent_loop_once () from /lib64/libtevent.so.0
#20 0x00007f26c28dc63b in tevent_common_loop_wait () from /lib64/libtevent.so.0
#21 0x00007f26c28e114b in std_event_loop_wait () from /lib64/libtevent.so.0
#22 0x00007f26c5977a07 in server_loop (main_ctx=0x558f1724fef0) at src/util/server.c:724
#23 0x0000558f169dfe61 in main (argc=6, argv=<optimized out>) at src/responder/nss/nsssrv.c:485
Apparently, link is NULL and trying to access it results in SIGSEGV.
(gdb) f 0
#0 _dbus_list_unlink (list=0x558f17440000, link=link@entry=0x0) at ../../dbus/dbus-list.c:502
502 if (link->next == link)
(gdb) l
497 */
498 void
499 _dbus_list_unlink (DBusList **list,
500 DBusList *link)
501 {
502 if (link->next == link)
503 {
504 /* one-element list */
505 *list = NULL;
506 }
However, here is an assertion that says that link cannot be NULL:
(gdb) f 2
#2 0x00007f26c2490115 in _dbus_message_remove_counter (message=0x558f1743ff90, counter=0x558f17268cc0) at ../../dbus/dbus-message.c:384
384 _dbus_list_remove_link (&message->counters, link);
(gdb) l
379
380 link = _dbus_list_find_last (&message->counters,
381 counter);
382 _dbus_assert (link != NULL);
383
384 _dbus_list_remove_link (&message->counters, link);
David, do you know what could cause such a backtrace?
(In reply to Pavel Březina from comment #6)
> David, do you know what could cause such backtrace?
Assertions are disabled in the dbus package. Assertions in dbus code represent situations which should not occur, so this is either a bug somewhere in libdbus or a bug in the sssd code that calls into libdbus.
This is the bash script that causes the segfault:
for member in $(getent group $group | awk -F':' '{print$4}' | tr "," " ");do
if id -nGz "$member" 2>/dev/null | grep -qzxF "specialgroupg";then
specialgroupg_users+=" $member"
fi
done
where '$group' is an AD group. It is explicitly the 'id' command that causes the segfault and it seems to happen only if $group has more than 10 members or so. The script containing the snippet above is run once/hour.
I have now replaced the 'id' command with an ldapsearch command and that solves the problem - i.e. no more sssd_nss crashes.
I did not manage to reproduce this crash. It always seems to occur in the same place: the code path runs successfully dozens of times and then suddenly crashes for no obvious reason. The logs say that there are multiple out-of-band requests to refresh user information, which are triggered by a midpoint refresh. Some of these requests call the sssd.nss.MemoryCache.UpdateInitgroups method in the nss responder. This invokes `nss_memorycache_update_initgroups`, which always finishes successfully. SSSD then tries to send an empty success message as a reply to the incoming message. This works dozens of times, then it crashes randomly.
Because a simple Google search for '_dbus_list_unlink' reveals multiple bugs with the same dbus backtrace, across different programs and distributions, that have never been solved, I am switching the component to dbus for further investigation. David, please tell me if Adam or I can provide more information. If there is any data that can be obtained from SSSD, I can build a scratch build for Adam.
*** Bug 1734248 has been marked as a duplicate of this bug. ***
NULL is a valid return value, as it means no value found.
/**
* Finds a value in the list. Returns the last link
* with value equal to the given data pointer.
* This is a linear-time operation.
* Returns #NULL if no value found that matches.
*
* @param list address of the list head.
* @param data the value to find.
* @returns the link if found
*/
DBusList*
_dbus_list_find_last (DBusList **list,
void *data)
{
DBusList *link;
link = _dbus_list_get_last_link (list);
while (link != NULL)
{
if (link->data == data)
return link;
link = _dbus_list_get_prev_link (list, link);
}
return NULL;
}
Given that _dbus_assert is disabled in the released package,
how should _dbus_message_remove_counter behave when _dbus_list_find_last returns NULL?
Should it just return, or skip the line _dbus_list_find_last?
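To make the "just return" option concrete, here is a minimal stand-alone sketch. It does not use the real libdbus types or functions: `Node`, `list_append_link`, `list_find_last`, `list_unlink` and `list_remove_last_checked` are hypothetical names that only mimic the circular list behaviour shown in the listings above. It is not a proposed patch, and a guard like this would only avoid the NULL dereference; it would not explain why the counter was missing from the message in the first place.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for DBusList: a circular doubly linked list. */
typedef struct Node {
    struct Node *prev;
    struct Node *next;
    void *data;
} Node;

/* Append a caller-owned node to the tail of the circular list. */
void list_append_link(Node **list, Node *link, void *data)
{
    link->data = data;
    if (*list == NULL) {
        link->prev = link;
        link->next = link;
        *list = link;
    } else {
        Node *head = *list;
        link->next = head;
        link->prev = head->prev;
        head->prev->next = link;
        head->prev = link;
    }
}

/* Return the last link whose data matches, or NULL if none matches. */
Node *list_find_last(Node **list, void *data)
{
    Node *link;

    if (*list == NULL)
        return NULL;

    link = (*list)->prev; /* tail of the circular list */
    for (;;) {
        if (link->data == data)
            return link;
        if (link == *list) /* walked back to the head: not found */
            return NULL;
        link = link->prev;
    }
}

/* Unlink a node. The caller must guarantee that link is not NULL. */
void list_unlink(Node **list, Node *link)
{
    if (link->next == link) {
        /* one-element list */
        *list = NULL;
    } else {
        link->prev->next = link->next;
        link->next->prev = link->prev;
        if (*list == link)
            *list = link->next;
    }
    link->next = NULL;
    link->prev = NULL;
}

/* Guarded removal: returns 1 if a link was removed, 0 if data was not
 * in the list. The NULL check replaces the compiled-out assertion. */
int list_remove_last_checked(Node **list, void *data)
{
    Node *link = list_find_last(list, data);

    if (link == NULL)
        return 0; /* nothing to remove, so do not touch the list */

    list_unlink(list, link);
    return 1;
}
```

With this shape, a caller that asks to remove a value that is not in the list gets 0 back instead of a crash, and can decide whether to log it or treat it as an error.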
Reassigning as per comment #21.
*** Bug 1783169 has been marked as a duplicate of this bug. ***
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days.