Bug 1783190 - [abrt] [faf] sssd: raise(): /usr/libexec/sssd/sssd_autofs killed by 6
Summary: [abrt] [faf] sssd: raise(): /usr/libexec/sssd/sssd_autofs killed by 6
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: sssd
Version: 8.2
Hardware: Unspecified
OS: Unspecified
unspecified
unspecified
Target Milestone: rc
: 8.2
Assignee: Alexey Tikhonov
QA Contact: sssd-qe
URL: http://faf.lab.eng.brq.redhat.com/faf...
Whiteboard: sync-to-jira
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2019-12-13 09:20 UTC by Steeve Goveas
Modified: 2020-05-02 19:15 UTC (History)
8 users (show)

Fixed In Version: sssd-2.2.3-9.el8
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-04-28 16:56:29 UTC
Type: Bug
Target Upstream Version:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Github SSSD sssd issues 5096 0 None closed util/sss_ptr_hash.c: potential double free in `sss_ptr_hash_delete_cb()` 2020-08-17 17:38:25 UTC
Red Hat Product Errata RHBA-2020:1863 0 None None None 2020-04-28 16:56:40 UTC

Description Steeve Goveas 2019-12-13 09:20:56 UTC
This bug has been created based on an anonymous crash report requested by the package maintainer.

Report URL: http://faf.lab.eng.brq.redhat.com/faf/reports/bthash/fa78a2abc202ce1ff059042775915deb61d9e069/

time:           Thu 12 Dec 2019 07:53:30 AM EST
package:        sssd-common-2.2.3-3.el8
reason:         sssd_autofs killed by SIGABRT
crash_function: talloc_abort
cmdline:        /usr/libexec/sssd/sssd_autofs --uid 0 --gid 0 --logger=files
executable:     /usr/libexec/sssd/sssd_autofs
component:      sssd
uid:            0
username:       root
hostname:       ci-vm-10-0-154-155.hosted.upshift.rdu2.redhat.com
os_release:     Red Hat Enterprise Linux release 8.2 Beta (Ootpa)
architecture:   x86_64
pwd:            /
kernel:         4.18.0-161.el8.x86_64
abrt_version:   2.10.9

Reports:
uReport: BTHASH=fa78a2abc202ce1ff059042775915deb61d9e069
ABRT Server: URL=http://faf.lab.eng.brq.redhat.com/faf/reports/bthash/fa78a2abc202ce1ff059042775915deb61d9e069
ABRT Server: URL=http://faf.lab.eng.brq.redhat.com/faf/reports/14853/
CI Job: https://idm-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/sgoveas-dev-pipeline/job/automount82/6/

Full Backtrace:
could not find '.gnu_debugaltlink' file for /var/cache/abrt-di/usr/lib/debug/usr/libexec/sssd/sssd_autofs-2.2.3-3.el8.x86_64.debug
[New LWP 29128]
Error while reading shared library symbols for /usr/lib64/sssd/libsss_util.so:
could not find '.gnu_debugaltlink' file for /var/cache/abrt-di/usr/lib/debug/usr/lib64/sssd/libsss_util.so-2.2.3-3.el8.x86_64.debug
Error while reading shared library symbols for /lib64/libbasicobjects.so.0:
could not find '.gnu_debugaltlink' file for /var/cache/abrt-di/usr/lib/debug/usr/lib64/libbasicobjects.so.0.1.0-0.6.1-39.el8.x86_64.debug
Error while reading shared library symbols for /lib64/libref_array.so.1:
could not find '.gnu_debugaltlink' file for /var/cache/abrt-di/usr/lib/debug/usr/lib64/libref_array.so.1.2.1-0.6.1-39.el8.x86_64.debug
Error while reading shared library symbols for /lib64/libcollection.so.4:
could not find '.gnu_debugaltlink' file for /var/cache/abrt-di/usr/lib/debug/usr/lib64/libcollection.so.4.1.1-0.6.1-39.el8.x86_64.debug
Error while reading shared library symbols for /usr/lib64/sssd/libsss_cert.so:
could not find '.gnu_debugaltlink' file for /var/cache/abrt-di/usr/lib/debug/usr/lib64/sssd/libsss_cert.so-2.2.3-3.el8.x86_64.debug
Error while reading shared library symbols for /usr/lib64/sssd/libsss_crypt.so:
could not find '.gnu_debugaltlink' file for /var/cache/abrt-di/usr/lib/debug/usr/lib64/sssd/libsss_crypt.so-2.2.3-3.el8.x86_64.debug
Error while reading shared library symbols for /usr/lib64/sssd/libsss_child.so:
could not find '.gnu_debugaltlink' file for /var/cache/abrt-di/usr/lib/debug/usr/lib64/sssd/libsss_child.so-2.2.3-3.el8.x86_64.debug
Error while reading shared library symbols for /lib64/libsss_certmap.so.0:
could not find '.gnu_debugaltlink' file for /var/cache/abrt-di/usr/lib/debug/usr/lib64/libsss_certmap.so.0.1.0-2.2.3-3.el8.x86_64.debug
Error while reading shared library symbols for /usr/lib64/sssd/libsss_debug.so:
could not find '.gnu_debugaltlink' file for /var/cache/abrt-di/usr/lib/debug/usr/lib64/sssd/libsss_debug.so-2.2.3-3.el8.x86_64.debug
Error while reading shared library symbols for /lib64/libsystemd.so.0:
could not find '.gnu_debugaltlink' file for /var/cache/abrt-di/usr/lib/debug/usr/lib64/libsystemd.so.0.23.0-239-21.el8.x86_64.debug
Error while reading shared library symbols for /usr/lib64/sssd/libsss_iface.so:
could not find '.gnu_debugaltlink' file for /var/cache/abrt-di/usr/lib/debug/usr/lib64/sssd/libsss_iface.so-2.2.3-3.el8.x86_64.debug
Error while reading shared library symbols for /usr/lib64/sssd/libsss_sbus.so:
could not find '.gnu_debugaltlink' file for /var/cache/abrt-di/usr/lib/debug/usr/lib64/sssd/libsss_sbus.so-2.2.3-3.el8.x86_64.debug
Error while reading shared library symbols for /lib64/libdhash.so.1:
could not find '.gnu_debugaltlink' file for /var/cache/abrt-di/usr/lib/debug/usr/lib64/libdhash.so.1.1.0-0.6.1-39.el8.x86_64.debug
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Error while reading shared library symbols for /lib64/libpath_utils.so.1:
could not find '.gnu_debugaltlink' file for /var/cache/abrt-di/usr/lib/debug/usr/lib64/libpath_utils.so.1.0.1-0.6.1-39.el8.x86_64.debug
Error while reading shared library symbols for /lib64/liblz4.so.1:
could not find '.gnu_debugaltlink' file for /var/cache/abrt-di/usr/lib/debug/usr/lib64/liblz4.so.1.8.1-1.8.1.2-4.el8.x86_64.debug
Error while reading shared library symbols for /lib64/libmount.so.1:
could not find '.gnu_debugaltlink' file for /var/cache/abrt-di/usr/lib/debug/usr/lib64/libmount.so.1.1.0-2.32.1-17.el8.x86_64.debug
Error while reading shared library symbols for /lib64/libgcc_s.so.1:
could not find '.gnu_debugaltlink' file for /var/cache/abrt-di/usr/lib/debug/lib64/libgcc_s-8-20191121.so.1-8.3.1-5.el8.x86_64.debug
Error while reading shared library symbols for /lib64/libblkid.so.1:
could not find '.gnu_debugaltlink' file for /var/cache/abrt-di/usr/lib/debug/usr/lib64/libblkid.so.1.1.0-2.32.1-17.el8.x86_64.debug
Error while reading shared library symbols for /lib64/libuuid.so.1:
could not find '.gnu_debugaltlink' file for /var/cache/abrt-di/usr/lib/debug/usr/lib64/libuuid.so.1.3.0-2.32.1-17.el8.x86_64.debug
Error while reading shared library symbols for /lib64/libsasl2.so.3:
could not find '.gnu_debugaltlink' file for /var/cache/abrt-di/usr/lib/debug/usr/lib64/libsasl2.so.3.0.0-2.1.27-1.el8.x86_64.debug
Error while reading shared library symbols for /lib64/libk5crypto.so.3:
could not find '.gnu_debugaltlink' file for /var/cache/abrt-di/usr/lib/debug/usr/lib64/libk5crypto.so.3.1-1.17-17.el8.x86_64.debug
Error while reading shared library symbols for /lib64/libcom_err.so.2:
could not find '.gnu_debugaltlink' file for /var/cache/abrt-di/usr/lib/debug/usr/lib64/libcom_err.so.2.1-1.44.6-3.el8.x86_64.debug
Error while reading shared library symbols for /lib64/libkrb5support.so.0:
could not find '.gnu_debugaltlink' file for /var/cache/abrt-di/usr/lib/debug/usr/lib64/libkrb5support.so.0.1-1.17-17.el8.x86_64.debug
Error while reading shared library symbols for /lib64/libkeyutils.so.1:
could not find '.gnu_debugaltlink' file for /var/cache/abrt-di/usr/lib/debug/usr/lib64/libkeyutils.so.1.6-1.5.10-6.el8.x86_64.debug
Error while reading shared library symbols for /usr/lib64/ldb/modules/ldb/memberof.so:
could not find '.gnu_debugaltlink' file for /var/cache/abrt-di/usr/lib/debug/usr/lib64/ldb/modules/ldb/memberof.so-2.2.3-3.el8.x86_64.debug
Core was generated by `/usr/libexec/sssd/sssd_autofs --uid 0 --gid 0 --logger=files'.
Program terminated with signal SIGABRT, Aborted.
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
50	../sysdeps/unix/sysv/linux/raise.c: No such file or directory.

Thread 1 (Thread 0x7f8ac8c50b40 (LWP 29128)):
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
       set = {__val = {6786, 94046636117600, 140234044167343, 94046636110576, 2333820375184795732, 3689919478405018161, 94042607072058, 674131205115182592, 94046636078016, 94046636109776, 140234044167343, 2162335232, 1, 304394, 94046635903744, 140234005728470}}
       pid = <optimized out>
       tid = <optimized out>
       ret = <optimized out>
#1  0x00007f8ac4aa9b25 in __GI_abort () at abort.c:79
       save_stage = 1
       act = {__sigaction_handler = {sa_handler = 0xf0571c20, sa_sigaction = 0xf0571c20}, sa_mask = {__val = {94046635761680, 94046635761778, 5, 0, 94046636102928, 94046636102832, 0, 0, 0, 0, 674131205115182592, 1576155200, 40, 94046636102256, 140233991290053, 0}}, sa_flags = -262802240, sa_restorer = 0x5588f055f4c0}
       sigs = {__val = {32, 0 <repeats 15 times>}}
#2  0x00007f8ac53b9de0 in talloc_abort (reason=0x7f8ac53c7a88 "Bad talloc magic value - unknown value") at ../../talloc.c:505
No locals.
#3  0x00007f8ac53b9d91 in talloc_abort_unknown_value () at ../../talloc.c:534
No locals.
#4  talloc_chunk_from_ptr (ptr=0x5588f0567350) at ../../talloc.c:534
       pp = 0x5588f0567350 ""
       tc = 0x5588f05672f0
       pp = <optimized out>
       tc = <optimized out>
#5  _talloc_free (ptr=0x5588f0567350, location=0x7f8ac86354af "src/util/sss_ptr_hash.c:163") at ../../talloc.c:1767
       tc = <optimized out>
#6  0x00007f8ac861a5ea in ?? () from /usr/lib64/sssd/libsss_util.so
No symbol table info available.
#7  0x0000000000000000 in ?? ()
No symbol table info available.
From                To                  Syms Read   Shared Object Library
0x00007f8ac8848090  0x00007f8ac8848e89  Yes         /lib64/libdl.so.2


Crash is seen in ldap_provider/automount test

2019-12-12T12:51:29 [ci-vm-10-0-154-155.h] ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

2019-12-12T12:51:29 [ci-vm-10-0-154-155.h] ::   Set entry cache autofs timeout value

2019-12-12T12:51:29 [ci-vm-10-0-154-155.h] ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

2019-12-12T12:51:40 [ci-vm-10-0-153-171.h] *** Current Time: Thu Dec 12 07:51:40 2019 Localwatchdog at: Fri Dec 13 07:47:39 2019

2019-12-12T12:51:49 [ci-vm-10-0-154-155.h] Redirecting to /bin/systemctl restart autofs.service

2019-12-12T12:51:50 [ci-vm-10-0-154-155.h] :: [ 07:51:50 ] :: [  BEGIN   ] :: Running 'cd /share2/key1;cd -'

2019-12-12T12:51:51 [ci-vm-10-0-154-155.h] /mnt/tests/gitlab.cee.redhat.com/sgoveas/sssd-qe/-/archive/automount82/sssd-qe-automount82.tar.gz/client/ldap_provider/automount

2019-12-12T12:51:51 [ci-vm-10-0-154-155.h] :: [ 07:51:50 ] :: [   PASS   ] :: Command 'cd /share2/key1;cd -' (Expected 0, got 0)

2019-12-12T12:51:51 [ci-vm-10-0-154-155.h] :: [ 07:51:50 ] :: [  BEGIN   ] :: Running 'mount | grep /share2/key1 | grep ci-vm-10-0-153-171.hosted.upshift.rdu2.redhat.com:/export/shared2/key1'

2019-12-12T12:51:51 [ci-vm-10-0-154-155.h] ci-vm-10-0-153-171.hosted.upshift.rdu2.redhat.com:/export/shared2/key1 on /share2/key1 type nfs4 (rw,relatime,vers=4.2,rsize=524288,wsize=524288,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=10.0.154.155,local_lock=none,addr=10.0.153.171)

2019-12-12T12:51:51 [ci-vm-10-0-154-155.h] :: [ 07:51:50 ] :: [   PASS   ] :: Command 'mount | grep /share2/key1 | grep ci-vm-10-0-153-171.hosted.upshift.rdu2.redhat.com:/export/shared2/key1' (Expected 0, got 0)

2019-12-12T12:51:51 [ci-vm-10-0-154-155.h] :: [ 07:51:50 ] :: [   PASS   ] :: File '/var/log/sssd/sssd_LDAP.log' should contain 'Searching for automount map entries with base \[ou=mount,dc=example,dc=com\]' 

2019-12-12T12:51:51 [ci-vm-10-0-154-155.h] :: [ 07:51:50 ] :: [   LOG    ] :: Sleeping for 12 seconds

2019-12-12T12:52:03 [ci-vm-10-0-154-155.h] :: [ 07:52:02 ] :: [  BEGIN   ] :: Running 'cd /share2/key1;cd -'

2019-12-12T12:52:03 [ci-vm-10-0-154-155.h] /mnt/tests/gitlab.cee.redhat.com/sgoveas/sssd-qe/-/archive/automount82/sssd-qe-automount82.tar.gz/client/ldap_provider/automount

2019-12-12T12:52:03 [ci-vm-10-0-154-155.h] :: [ 07:52:02 ] :: [   PASS   ] :: Command 'cd /share2/key1;cd -' (Expected 0, got 0)

2019-12-12T12:52:03 [ci-vm-10-0-154-155.h] :: [ 07:52:02 ] :: [  BEGIN   ] :: Running 'mount | grep /share2/key1 | grep ci-vm-10-0-153-171.hosted.upshift.rdu2.redhat.com:/export/shared2/key1'

2019-12-12T12:52:03 [ci-vm-10-0-154-155.h] ci-vm-10-0-153-171.hosted.upshift.rdu2.redhat.com:/export/shared2/key1 on /share2/key1 type nfs4 (rw,relatime,vers=4.2,rsize=524288,wsize=524288,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=10.0.154.155,local_lock=none,addr=10.0.153.171)

2019-12-12T12:52:03 [ci-vm-10-0-154-155.h] :: [ 07:52:02 ] :: [   PASS   ] :: Command 'mount | grep /share2/key1 | grep ci-vm-10-0-153-171.hosted.upshift.rdu2.redhat.com:/export/shared2/key1' (Expected 0, got 0)

2019-12-12T12:52:03 [ci-vm-10-0-154-155.h] :: [ 07:52:02 ] :: [   PASS   ] :: File '/var/log/sssd/sssd_LDAP.log' should not contain 'Searching for automount map entries with base \[ou=mount,dc=example,dc=com\]' 

2019-12-12T12:52:03 [ci-vm-10-0-154-155.h] :: [ 07:52:02 ] :: [   LOG    ] :: Sleeping for 110 seconds

2019-12-12T12:52:15 [ci-vm-10-0-154-155.h] *** Current Time: Thu Dec 12 07:52:15 2019 Localwatchdog at: Fri Dec 13 07:48:15 2019

2019-12-12T12:52:40 [ci-vm-10-0-153-171.h] *** Current Time: Thu Dec 12 07:52:40 2019 Localwatchdog at: Fri Dec 13 07:47:39 2019

2019-12-12T12:53:15 [ci-vm-10-0-154-155.h] *** Current Time: Thu Dec 12 07:53:15 2019 Localwatchdog at: Fri Dec 13 07:48:15 2019

2019-12-12T12:53:40 [ci-vm-10-0-153-171.h] *** Current Time: Thu Dec 12 07:53:40 2019 Localwatchdog at: Fri Dec 13 07:47:39 2019

2019-12-12T12:53:53 [ci-vm-10-0-154-155.h] :: [ 07:53:52 ] :: [  BEGIN   ] :: Running 'cd /share2/key1;cd -'

2019-12-12T12:53:53 [ci-vm-10-0-154-155.h] /mnt/tests/gitlab.cee.redhat.com/sgoveas/sssd-qe/-/archive/automount82/sssd-qe-automount82.tar.gz/client/ldap_provider/automount

2019-12-12T12:53:53 [ci-vm-10-0-154-155.h] :: [ 07:53:53 ] :: [   PASS   ] :: Command 'cd /share2/key1;cd -' (Expected 0, got 0)

2019-12-12T12:53:53 [ci-vm-10-0-154-155.h] :: [ 07:53:53 ] :: [  BEGIN   ] :: Running 'mount | grep /share2/key1 | grep ci-vm-10-0-153-171.hosted.upshift.rdu2.redhat.com:/export/shared2/key1'

2019-12-12T12:53:53 [ci-vm-10-0-154-155.h] ci-vm-10-0-153-171.hosted.upshift.rdu2.redhat.com:/export/shared2/key1 on /share2/key1 type nfs4 (rw,relatime,vers=4.2,rsize=524288,wsize=524288,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=10.0.154.155,local_lock=none,addr=10.0.153.171)

2019-12-12T12:53:53 [ci-vm-10-0-154-155.h] :: [ 07:53:53 ] :: [   PASS   ] :: Command 'mount | grep /share2/key1 | grep ci-vm-10-0-153-171.hosted.upshift.rdu2.redhat.com:/export/shared2/key1' (Expected 0, got 0)

2019-12-12T12:53:54 [ci-vm-10-0-154-155.h] :: [ 07:53:53 ] :: [   FAIL   ] :: File '/var/log/sssd/sssd_LDAP.log' should contain 'Searching for automount map entries with base \[ou=mount,dc=example,dc=com\]' 

2019-12-12T12:53:54 [ci-vm-10-0-154-155.h] :: [ 07:53:53 ] :: [   LOG    ] :: sleep 86400

Comment 1 Alexey Tikhonov 2019-12-17 18:51:10 UTC
...
#3  0x00007fad87794d91 in talloc_abort_unknown_value () at ../../talloc.c:534
#4  talloc_chunk_from_ptr (ptr=0x55977819ea00) at ../../talloc.c:534
#5  _talloc_free (ptr=0x55977819ea00, location=0x7fad8aa104af "src/util/sss_ptr_hash.c:163") at ../../talloc.c:1767
#6  0x00007fad8a9f55ea in sss_ptr_hash_delete_cb (item=0x559778196f70, deltype=HASH_ENTRY_DESTROY, pvt=<optimized out>)
    at src/util/sss_ptr_hash.c:163
#7  0x00007fad87bbcd7d in hash_delete (table=table@entry=0x55977818f4c0, key=key@entry=0x7ffce9a9b9d0) at dhash/dhash.c:1077
#8  0x00007fad8a9f5d46 in sss_ptr_hash_delete (free_value=false, key=0x55977819e980 "auto.master", table=0x55977818f4c0)
    at src/util/sss_ptr_hash.c:348
#9  sss_ptr_hash_delete (table=0x55977818f4c0, key=0x55977819e980 "auto.master", free_value=<optimized out>)
    at src/util/sss_ptr_hash.c:323
#10 0x00007fad879b33b9 in tevent_common_invoke_timer_handler (te=te@entry=0x5597781989d0, current_time=..., 
    removed=removed@entry=0x0) at ../../tevent_timed.c:370
#11 0x00007fad879b355e in tevent_common_loop_timer_delay (ev=0x559778164b00) at ../../tevent_timed.c:442
#12 0x00007fad879b4a99 in epoll_event_loop (tvalp=0x7ffce9a9bab0, epoll_ev=0x559778164d90) at ../../tevent_epoll.c:667
...


(gdb) frame 7
#7  0x00007fad87bbcd7d in hash_delete (table=table@entry=0x55977818f4c0, key=key@entry=0x7ffce9a9b9d0) at dhash/dhash.c:1077
1077	        hdelete_callback(table, HASH_ENTRY_DESTROY, &element->entry);
(gdb) p *key
$3 = {type = HASH_KEY_STRING, {str = 0x55977819e980 "auto.master"...}}
(gdb) p *((struct talloc_chunk *)((char *)element->entry.value.ptr - 96))
$1 = {flags = 2014977200, next = 0x0, prev = 0x0, parent = 0x0, child = 0x0, refs = 0x0, destructor = 0x0, 
  name = 0x7fad8aa104af "src/util/sss_ptr_hash.c:163", size = 16...}

[where 96 == TC_HDR_SIZE == TC_ALIGN16(sizeof(struct talloc_chunk))]

=> libdhash tries to delete entry with key=="auto.master" which was already freed with `sss_ptr_hash_delete_cb()`
so when it get to `sss_ptr_hash_delete_cb()` again in frame #6
#6  0x00007fad8a9f55ea in sss_ptr_hash_delete_cb (item=0x559778196f70, deltype=HASH_ENTRY_DESTROY, pvt=<optimized out>)
    at src/util/sss_ptr_hash.c:163
163	    talloc_free(value);
  --  it is double free => raise


(gdb) frame 11
#11 0x00007fad879b355e in tevent_common_loop_timer_delay (ev=0x559778164b00) at ../../tevent_timed.c:442
442		ret = tevent_common_invoke_timer_handler(te, current_time, NULL);
(gdb) p te
$4 = (struct tevent_timer *) 0x5597781989d0
(gdb) p ev->timer_events 
$5 = (struct tevent_timer *) 0x55977817ee40

but https://git.samba.org/samba.git/?p=samba.git;a=blob;f=lib/tevent/tevent_timed.c#l409 :
struct tevent_timer *te = ev->timer_events;

And the moment it is not clear to me how could it happen that te != ev->timer_events

(gdb) p *te
$6 = {prev = 0x0, next = 0x0, event_ctx = 0x0, wrapper = 0x0, busy = true, destroyed = true, next_event = {tv_sec = 1575917185, 
    tv_usec = 379978}, handler = 0x559777be26e0 <autofs_enumctx_lifetime_timeout>, private_data = 0x559778195510, 
  handler_name = 0x559777bf9e38 "autofs_enumctx_lifetime_timeout", location = 0x559777bf9e08 "src/responder/autofs/autofssrv_cmd.c:167"...}
(gdb) p *((struct autofs_enum_ctx *)(te->private_data))
$10 = {result = 0x55977819fe30, found = true, ready = true, key = 0x55977819e980 "auto.master", table = 0x55977818f4c0, notify_list = 0x0}
(gdb) p *((struct talloc_chunk *)((char *)te - 96))
$7 = {flags = 6406304, next = 0x0, prev = 0x0, parent = 0x0, child = 0x0, refs = 0x0, 
  destructor = 0x7fad879b2c60 <tevent_common_timed_destructor>, name = 0x7fad879b5e33 "struct tevent_timer", size = 96, limit = 0x0, pool = 0x0}

 => this `te` was already freed and most probably executed before

What should actually be executed here is:
(gdb) p *ev->timer_events 
$8 = {prev = 0x559778198520, next = 0x559778198520, event_ctx = 0x559778164b00, wrapper = 0x0, busy = false, destroyed = false, next_event = {
    tv_sec = 1575917185, tv_usec = 406319}, handler = 0x559777be26e0 <autofs_enumctx_lifetime_timeout>, private_data = 0x559778186800, 
  handler_name = 0x559777bf9e38 "autofs_enumctx_lifetime_timeout", location = 0x559777bf9e08 "src/responder/autofs/autofssrv_cmd.c:167",...}
(gdb) p *((struct autofs_enum_ctx *)(ev->timer_events->private_data))
$9 = {result = 0x5597781a64c0, found = true, ready = true, key = 0x5597781a1530 "auto.direct", table = 0x55977818f4c0, notify_list = 0x0}


And also strange thing is:

(gdb) frame
#7  0x00007fad87bbcd7d in hash_delete (table=table@entry=0x55977818f4c0, key=key@entry=0x7ffce9a9b9d0) at dhash/dhash.c:1077
1077	        hdelete_callback(table, HASH_ENTRY_DESTROY, &element->entry);
(gdb) p *key
$61 = {type = HASH_KEY_STRING, {str = 0x55977819e980 "auto.master", c_str = 0x55977819e980 "auto.master", ul = 94109043386752}}
(gdb) p *table
$53 = {p = 0, maxp = 4, entry_count = 1, bucket_count = 4, segment_count = 1, min_load_factor = 1, max_load_factor = 5, directory_size = 4, 
  directory_size_shift = 2, segment_size = 4, segment_size_shift = 2, delete_callback = 0x7fad8a9f5550 <sss_ptr_hash_delete_cb>..., statistics = {hash_accesses = 12, hash_collisions = 0, table_expansions = 0, table_contractions = 0}}

  -- entry_count = 1

And this single entry is:
(gdb) p *table->directory[0][1]
$49 = {entry = {key = {type = HASH_KEY_STRING, {str = 0x5597781a0d00 "auto.direct"...}}, 
       value = {type = HASH_VALUE_PTR, {ptr = 0x5597781a14b0...}}}, next = 0x0}

how is it possible
(gdb) p *element
$56 = {entry = {key = {type = HASH_KEY_STRING, {str = 0x55977818bf70 "auto.master"...}}, 
       value = {type = HASH_VALUE_PTR, {ptr = 0x55977819ea00...}}}, next = 0x0}



So at this moment I have two questions:
(1) how does it happen libtevent tries to execute already destroyed timer event instead of event contained in tevent_context?
(2) even so, how does libdhash find entry with key=="auto.master" if it was already removed from the hash table?


Steeve, do you have a reproducer?

Comment 2 Alexey Tikhonov 2019-12-18 16:01:22 UTC
(In reply to Alexey Tikhonov from comment #1)
> (gdb) p *te
> $6 = {prev = 0x0, next = 0x0, event_ctx = 0x0, wrapper = 0x0, busy = true,
> destroyed = true, next_event = {tv_sec = 1575917185, tv_usec = 379978}, handler = 0x559777be26e0
> <autofs_enumctx_lifetime_timeout>, private_data = 0x559778195510, 
> handler_name = 0x559777bf9e38 "autofs_enumctx_lifetime_timeout", location
> = 0x559777bf9e08 "src/responder/autofs/autofssrv_cmd.c:167"...}

> (gdb) p *((struct talloc_chunk *)((char *)te - 96))
> $7 = {flags = 6406304, next = 0x0, prev = 0x0, parent = 0x0, child = 0x0,
> refs = 0x0, destructor = 0x7fad879b2c60 <tevent_common_timed_destructor>, name =
> 0x7fad879b5e33 "struct tevent_timer", size = 96, limit = 0x0, pool = 0x0}
> 
>  => this `te` was already freed and most probably executed before

Actually event was destroyed (destroyed=true) but not yet freed (name in talloc_chunk header still has type name) - d-tor failed due to busy=true.

Comment 5 Alexey Tikhonov 2020-01-06 18:39:39 UTC
(In reply to Alexey Tikhonov from comment #1)

> And the moment it is not clear to me how could it happen that te !=
> ev->timer_events

This is actually clear:
`tevent_common_invoke_timer_handler()` calls `DLIST_REMOVE(te->event_ctx->timer_events, te)`
so `te` that is being processed was removed from queue (located in the heap) already.

> (1) how does it happen libtevent tries to execute already destroyed timer
> event instead of event contained in tevent_context?

So this statement is just wrong.

Comment 6 Alexey Tikhonov 2020-01-06 20:01:17 UTC
(In reply to Alexey Tikhonov from comment #1)

> (2) even so, how does libdhash find entry with key=="auto.master" if it was
> already removed from the hash table?

This is something I still do not understand:

#7  0x00007fad87bbcd7d in hash_delete (table=table@entry=0x55977818f4c0, key=key@entry=0x7ffce9a9b9d0) at dhash/dhash.c:1077
1077	        hdelete_callback(table, HASH_ENTRY_DESTROY, &element->entry);

sss_ptr_hash_delete_cb(element->entry) -> 
     autofs_maps_delete_cb() ->
          talloc_unlink(autofs_ctx->maps, enum_ctx = element->entry->value.ptr->ptr)  --  enum_ctx doesn't have talloc d-tor
     _talloc_free(element->entry->value.ptr) -> ... abort
--  this execution chain looks like it doesn't touch `element` and `element->entry.key.str` but:

(gdb) p element->entry.key.str 
$78 = 0x55977818bf70 "auto.master"
(gdb) p *((struct talloc_chunk *)((char *)element->entry.key.str - 96))
$79 = {flags = 2014964112, next = 0x0, prev = 0x0, parent = 0x0, child = 0x0, refs = 0x0, destructor = 0x0, name = 0x7fad8aa0a7a0 "src/util/util.c:379", size = 12, limit = 0x0, pool = 0x0}

and

(gdb) p *((struct talloc_chunk *)((char *)element - 96))
$75 = {flags = 3927370867, next = 0x0, prev = 0x0, parent = 0x0, child = 0x0, refs = 0x0, destructor = 0x0, name = 0x7fad8aa0a7a0 "src/util/util.c:379", size = 40, limit = 0x0, pool = 0x0}

https://pagure.io/SSSD/sssd/blob/master/f/src/util/util.c#_379 : `talloc_free(ptr)`  --  those elements are freed in the heap.
And hash `table` contained in the heap on the momemnt of crash doesn't contain entry with "auto.master" as well.

So looks like element/element->entry.key/element->entry.value are deleted and removed from the table after `hash_delete()->lookup()` found it in hash table but before `hdelete_callback()` returned...

Comment 7 Alexey Tikhonov 2020-01-07 12:27:43 UTC
(In reply to Alexey Tikhonov from comment #6)
> So looks like element/element->entry.key/element->entry.value are deleted
> and removed from the table after `hash_delete()->lookup()` found it in hash
> table but before `hdelete_callback()` returned...

Exactly.

Thanks to Pavel's hint that `sss_ptr_hash_spy` is linked to memory context of actually stored data, I figured out what is happening:

hash_delete() -> hdelete_callback [sss_ptr_hash_delete_cb] -> data->callback [autofs_maps_delete_cb] -> talloc_unlink(autofs_ctx->maps, enum_ctx) -> spy is freed -> sss_ptr_hash_spy_destructor() -> sss_ptr_hash_delete() -> hash_delete() called recursively and frees element, its key and value -> stack unwinds back to sss_ptr_hash_delete_cb::talloc_free(value)  --  double free

I guess this is quite old bug in `sss_ptr_hash` that was only revealed by recent autofs changes.

It is also present in sssd-1-16 so I think this bz is worth cloning for RHEL7.

Comment 9 Alexey Tikhonov 2020-01-11 15:25:30 UTC
(In reply to Alexey Tikhonov from comment #7)
> I guess this is quite old bug in `sss_ptr_hash` that was only revealed by
> recent autofs changes.

No. Actually bug was intoduced recently in f95db37aa8486304d0569d12a876b1c74ee1b0d1 and affects only 2.x branch.

> It is also present in sssd-1-16 so I think this bz is worth cloning for
> RHEL7.

So it doesn't affect RHEL7.

Comment 12 Pavel Březina 2020-01-14 10:08:02 UTC
* `master`
    * 26e33b1984cce3549df170f58f8221201ad54cfd - util/sss_ptr_hash: fixed double free in sss_ptr_hash_delete_cb()

Comment 16 errata-xmlrpc 2020-04-28 16:56:29 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:1863


Note You need to log in before you can comment on or make changes to this bug.