Bug 906398

Summary: sssd_be crashes sometimes
Product: Red Hat Enterprise Linux 6
Reporter: Kaushik Banerjee <kbanerje>
Component: sssd
Assignee: Jakub Hrozek <jhrozek>
Status: CLOSED ERRATA
QA Contact: Kaushik Banerjee <kbanerje>
Severity: high
Priority: urgent
Version: 6.4
CC: chhudson, dpal, grajaiya, jgalipea, jwest, lslebodn, mkosek, nkarandi, pbrezina, tlavigne
Target Milestone: rc
Keywords: Regression, ZStream
Target Release: 6.5
Hardware: Unspecified
OS: Unspecified
Fixed In Version: sssd-1.9.2-89.el6
Doc Type: Bug Fix
Doc Text:
Cause: The group-processing code called a get_attribute function that, when asked for a nonexistent attribute, allocated an empty attribute and reallocated the existing attribute array. The reallocation could invalidate pointers that already pointed into that array.
Consequence: When a group contained no members at all, the array was reallocated and the existing pointers were invalidated, resulting in a crash.
Fix: A different get_attribute variant is now used, one that returns ENOENT instead of creating an empty attribute.
Result: Requesting an empty group no longer crashes SSSD.
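
The Cause and Fix above can be illustrated with a minimal sketch. It is not the sssd code: the toy_* types and functions are hypothetical stand-ins, and the only behaviour taken from this report is that one lookup flavour appends an empty attribute (reallocating the array) while the other returns ENOENT and leaves the array alone.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-ins for an attribute array; NOT the real sssd types. */
struct toy_el {
    const char *name;
    unsigned int num_values;
};

struct toy_attrs {
    struct toy_el *a;   /* heap array; realloc() may move it */
    size_t num;
};

/* Look up `name`. With alloc == true a missing attribute is appended as an
 * empty element, which may move attrs->a and invalidate older pointers.
 * With alloc == false a missing attribute yields ENOENT and the array is
 * left untouched. */
static int toy_get_el(struct toy_attrs *attrs, const char *name,
                      bool alloc, struct toy_el **out)
{
    assert(attrs != NULL && name != NULL && out != NULL);

    for (size_t i = 0; i < attrs->num; i++) {
        if (strcmp(attrs->a[i].name, name) == 0) {
            *out = &attrs->a[i];
            return 0;
        }
    }
    if (!alloc) {
        return ENOENT;
    }

    struct toy_el *grown = realloc(attrs->a,
                                   (attrs->num + 1) * sizeof(*grown));
    if (grown == NULL) {
        return ENOMEM;
    }
    attrs->a = grown;
    attrs->a[attrs->num].name = name;
    attrs->a[attrs->num].num_values = 0;
    *out = &attrs->a[attrs->num];
    attrs->num++;
    return 0;
}

/* Count the members of a group without ever growing the array. */
static int toy_count_members(struct toy_attrs *attrs, unsigned int *count)
{
    struct toy_el *el = NULL;
    int ret = toy_get_el(attrs, "member", false, &el);

    if (ret == ENOENT) {
        *count = 0;     /* empty group: nothing to process */
        return 0;
    }
    if (ret != 0) {
        return ret;
    }
    *count = el->num_values;
    return 0;
}
```

With the non-allocating lookup, an empty group is reported as having zero members and any pointer the caller already holds into the array stays valid.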
Last Closed: 2013-11-21 22:14:06 UTC
Type: Bug
Bug Blocks: 956136
Attachments:
Description                Flags
Backtrace of the crash     none
gzipped coredump           none
gzipped sssd_domain.log    none

Description Kaushik Banerjee 2013-01-31 15:08:32 UTC
Created attachment 690999
Backtrace of the crash

Description of problem:
sssd_be crashes intermittently.

Version-Release number of selected component (if applicable):
1.9.2-82

How reproducible:
Cannot reproduce on demand. The crash shows up intermittently in the sssd automation runs.

Steps to Reproduce:
1. None known. Judging by the timing of the crash, "enumerate=true" and "ldap_schema=rfc2307bis" were set in sssd.conf when sssd_be crashed.
  
Actual results:
sssd_be crashes. Will attach the backtrace and coredump.

Expected results:


Additional info:

Comment 1 Kaushik Banerjee 2013-01-31 15:10:20 UTC
Created attachment 691000
gzipped coredump

Comment 4 Jakub Hrozek 2013-01-31 15:43:53 UTC
Upstream ticket:
https://fedorahosted.org/sssd/ticket/1799

Comment 5 Kaushik Banerjee 2013-02-24 09:03:43 UTC
Created attachment 701952
gzipped sssd_domain.log

I managed to capture the domain logs from the time the crash occurred. Hope this helps.

Comment 7 Kaushik Banerjee 2013-04-12 08:58:49 UTC
In the domain logs I attached in comment #5, the number of group members is reported as a negative value, and sssd_be crashes right after the following log lines:

(Fri Feb 22 14:13:59 2013) [sssd[be[LDAP]]] [sdap_process_ghost_members] (0x0400): The group has 0 members
(Fri Feb 22 14:13:59 2013) [sssd[be[LDAP]]] [sdap_process_ghost_members] (0x0400): Group has -401273744 members
(Fri Feb 22 14:13:59 2013) [sssd[be[LDAP]]] [sdap_save_group] (0x0400): Storing info for group Group1

Comment 8 Jakub Hrozek 2013-04-12 09:37:19 UTC
I think I finally reproduced it locally under valgrind:

==12658== Invalid read of size 8
==12658==    at 0x12BF81A2: sdap_process_ghost_members (sdap_async_groups.c:366)
==12658==    by 0x12BFAB8A: sdap_save_group (sdap_async_groups.c:592)
==12658==    by 0x12BFC21C: sdap_save_groups (sdap_async_groups.c:782)
==12658==    by 0x12C01FED: sdap_get_groups_process (sdap_async_groups.c:1687)
==12658==    by 0x12BEBFAA: sdap_get_generic_done (sdap_async.c:1558)
==12658==    by 0x12BEB780: sdap_get_generic_ext_done (sdap_async.c:1449)
==12658==    by 0x12BE4677: sdap_process_message (sdap_async.c:366)
==12658==    by 0x12BE3BBF: sdap_process_result (sdap_async.c:209)
==12658==    by 0x12BE3211: sdap_ldap_next_result (sdap_async.c:159)
==12658==    by 0x54DDD3F: tevent_common_loop_timer_delay (in /usr/lib64/libtevent.so.0.9.17)
==12658==    by 0x54DD3EB: ??? (in /usr/lib64/libtevent.so.0.9.17)
==12658==    by 0x54DA05F: _tevent_loop_once (in /usr/lib64/libtevent.so.0.9.17)
==12658==  Address 0x1560bb38 is 312 bytes inside a block of size 320 free'd
==12658==    at 0x4C2AA2E: realloc (vg_replace_malloc.c:662)
==12658==    by 0x56EB10E: _talloc_realloc (in /usr/lib64/libtalloc.so.2.0.8)
==12658==    by 0x5264EB6: sysdb_attrs_get_el_ext (sysdb.c:319)
==12658==    by 0x5265004: sysdb_attrs_get_el (sysdb.c:347)
==12658==    by 0x12BF77FD: sdap_process_ghost_members (sdap_async_groups.c:326)
==12658==    by 0x12BFAB8A: sdap_save_group (sdap_async_groups.c:592)
==12658==    by 0x12BFC21C: sdap_save_groups (sdap_async_groups.c:782)
==12658==    by 0x12C01FED: sdap_get_groups_process (sdap_async_groups.c:1687)
==12658==    by 0x12BEBFAA: sdap_get_generic_done (sdap_async.c:1558)
==12658==    by 0x12BEB780: sdap_get_generic_ext_done (sdap_async.c:1449)
==12658==    by 0x12BE4677: sdap_process_message (sdap_async.c:366)
==12658==    by 0x12BE3BBF: sdap_process_result (sdap_async.c:209)
==12658== 
==12658== Invalid read of size 4
==12658==    at 0x12BF81B8: sdap_process_ghost_members (sdap_async_groups.c:367)
==12658==    by 0x12BFAB8A: sdap_save_group (sdap_async_groups.c:592)
==12658==    by 0x12BFC21C: sdap_save_groups (sdap_async_groups.c:782)
==12658==    by 0x12C01FED: sdap_get_groups_process (sdap_async_groups.c:1687)
==12658==    by 0x12BEBFAA: sdap_get_generic_done (sdap_async.c:1558)
==12658==    by 0x12BEB780: sdap_get_generic_ext_done (sdap_async.c:1449)
==12658==    by 0x12BE4677: sdap_process_message (sdap_async.c:366)
==12658==    by 0x12BE3BBF: sdap_process_result (sdap_async.c:209)
==12658==    by 0x12BE3211: sdap_ldap_next_result (sdap_async.c:159)
==12658==    by 0x54DDD3F: tevent_common_loop_timer_delay (in /usr/lib64/libtevent.so.0.9.17)
==12658==    by 0x54DD3EB: ??? (in /usr/lib64/libtevent.so.0.9.17)
==12658==    by 0x54DA05F: _tevent_loop_once (in /usr/lib64/libtevent.so.0.9.17)
==12658==  Address 0x1560bb30 is 304 bytes inside a block of size 320 free'd
==12658==    at 0x4C2AA2E: realloc (vg_replace_malloc.c:662)
==12658==    by 0x56EB10E: _talloc_realloc (in /usr/lib64/libtalloc.so.2.0.8)
==12658==    by 0x5264EB6: sysdb_attrs_get_el_ext (sysdb.c:319)
==12658==    by 0x5265004: sysdb_attrs_get_el (sysdb.c:347)
==12658==    by 0x12BF77FD: sdap_process_ghost_members (sdap_async_groups.c:326)
==12658==    by 0x12BFAB8A: sdap_save_group (sdap_async_groups.c:592)
==12658==    by 0x12BFC21C: sdap_save_groups (sdap_async_groups.c:782)
==12658==    by 0x12C01FED: sdap_get_groups_process (sdap_async_groups.c:1687)
==12658==    by 0x12BEBFAA: sdap_get_generic_done (sdap_async.c:1558)
==12658==    by 0x12BEB780: sdap_get_generic_ext_done (sdap_async.c:1449)
==12658==    by 0x12BE4677: sdap_process_message (sdap_async.c:366)
==12658==    by 0x12BE3BBF: sdap_process_result (sdap_async.c:209)
==12658==
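
In the trace above, the reads at sdap_async_groups.c:366 and :367 touch a block that was already released by the realloc() reached from line 326 through sysdb_attrs_get_el. A minimal sketch of that hazard and one generic way around it follows. The toy_vec type and its helpers are hypothetical and are not taken from sssd: the idea is only that a position survives a realloc() while a raw pointer may not, so the pointer is re-derived after the growing call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical growable array, used only for illustration. */
struct toy_vec {
    int *items;
    size_t len;
};

/* Append a value. realloc() may move v->items, so any int* taken before
 * this call must be treated as invalid once it returns. */
static int toy_push(struct toy_vec *v, int value)
{
    int *grown = realloc(v->items, (v->len + 1) * sizeof(*grown));
    if (grown == NULL) {
        return -1;
    }
    grown[v->len] = value;
    v->items = grown;
    v->len++;
    return 0;
}

/* Keep the index across the growing call, then read through the current
 * base pointer instead of through a pointer saved before the call. */
static int toy_read_after_growth(struct toy_vec *v, size_t idx,
                                 int extra, int *out)
{
    assert(idx < v->len);

    if (toy_push(v, extra) != 0) {
        return -1;
    }
    *out = v->items[idx];   /* re-derived from the possibly moved array */
    return 0;
}
```

Reading through a pointer saved before toy_push() would be the same class of access that valgrind flags here as "Invalid read ... inside a block ... free'd".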

Comment 12 Kaushik Banerjee 2013-04-17 14:27:29 UTC
Jakub, since you were able to reproduce and fix the issue, can you share the reproducer steps with us?

Comment 13 Jakub Hrozek 2013-04-17 14:34:05 UTC
(In reply to comment #12)
> Jakub, since you were able to reproduce and fix the issue, can you share the
> reproducer steps with us?

Basically, I set shorter enum_cache_timeout and ldap_enumeration_refresh_timeout values to force enumeration to run more frequently. I think having an empty group on my LDAP server also played a role.
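
For reference, the settings named in this bug would sit in sssd.conf roughly as shown below. This is a fragment, not a complete configuration. The domain name LDAP is taken from the log prefix in comment 7, and the 10-second values are placeholders chosen for illustration, not the values actually used.

```ini
[nss]
# Placeholder value: keep enumeration results cached only briefly
enum_cache_timeout = 10

[domain/LDAP]
# Both settings below were active when the crash was observed
enumerate = true
ldap_schema = rfc2307bis
# Placeholder value: refresh the enumerated records more often
ldap_enumeration_refresh_timeout = 10
```

An LDAP group with no members at all would also need to exist on the server, since that appears to be what triggers the reallocation.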

Comment 22 Kaushik Banerjee 2013-09-03 16:05:58 UTC
The crash is no longer seen in automation runs with build 1.9.2-123.

Verified SanityOnly

Comment 23 errata-xmlrpc 2013-11-21 22:14:06 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1680.html