Bug 1783190
| Summary: | [abrt] [faf] sssd: raise(): /usr/libexec/sssd/sssd_autofs killed by 6 | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 8 | Reporter: | Steeve Goveas <sgoveas> |
| Component: | sssd | Assignee: | Alexey Tikhonov <atikhono> |
| Status: | CLOSED ERRATA | QA Contact: | sssd-qe <sssd-qe> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 8.2 | CC: | atikhono, grajaiya, jhrozek, lslebodn, mniranja, mzidek, pbrezina, tscherf |
| Target Milestone: | rc | Flags: | pm-rhel:
mirror+
|
| Target Release: | 8.2 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| URL: | http://faf.lab.eng.brq.redhat.com/faf/reports/bthash/fa78a2abc202ce1ff059042775915deb61d9e069/ | ||
| Whiteboard: | sync-to-jira | ||
| Fixed In Version: | sssd-2.2.3-9.el8 | Doc Type: | If docs needed, set a value |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2020-04-28 16:56:29 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
|
Description
Steeve Goveas
2019-12-13 09:20:56 UTC
...
#3 0x00007fad87794d91 in talloc_abort_unknown_value () at ../../talloc.c:534
#4 talloc_chunk_from_ptr (ptr=0x55977819ea00) at ../../talloc.c:534
#5 _talloc_free (ptr=0x55977819ea00, location=0x7fad8aa104af "src/util/sss_ptr_hash.c:163") at ../../talloc.c:1767
#6 0x00007fad8a9f55ea in sss_ptr_hash_delete_cb (item=0x559778196f70, deltype=HASH_ENTRY_DESTROY, pvt=<optimized out>)
at src/util/sss_ptr_hash.c:163
#7 0x00007fad87bbcd7d in hash_delete (table=table@entry=0x55977818f4c0, key=key@entry=0x7ffce9a9b9d0) at dhash/dhash.c:1077
#8 0x00007fad8a9f5d46 in sss_ptr_hash_delete (free_value=false, key=0x55977819e980 "auto.master", table=0x55977818f4c0)
at src/util/sss_ptr_hash.c:348
#9 sss_ptr_hash_delete (table=0x55977818f4c0, key=0x55977819e980 "auto.master", free_value=<optimized out>)
at src/util/sss_ptr_hash.c:323
#10 0x00007fad879b33b9 in tevent_common_invoke_timer_handler (te=te@entry=0x5597781989d0, current_time=...,
removed=removed@entry=0x0) at ../../tevent_timed.c:370
#11 0x00007fad879b355e in tevent_common_loop_timer_delay (ev=0x559778164b00) at ../../tevent_timed.c:442
#12 0x00007fad879b4a99 in epoll_event_loop (tvalp=0x7ffce9a9bab0, epoll_ev=0x559778164d90) at ../../tevent_epoll.c:667
...
(gdb) frame 7
#7 0x00007fad87bbcd7d in hash_delete (table=table@entry=0x55977818f4c0, key=key@entry=0x7ffce9a9b9d0) at dhash/dhash.c:1077
1077 hdelete_callback(table, HASH_ENTRY_DESTROY, &element->entry);
(gdb) p *key
$3 = {type = HASH_KEY_STRING, {str = 0x55977819e980 "auto.master"...}}
(gdb) p *((struct talloc_chunk *)((char *)element->entry.value.ptr - 96))
$1 = {flags = 2014977200, next = 0x0, prev = 0x0, parent = 0x0, child = 0x0, refs = 0x0, destructor = 0x0,
name = 0x7fad8aa104af "src/util/sss_ptr_hash.c:163", size = 16...}
[where 96 == TC_HDR_SIZE == TC_ALIGN16(sizeof(struct talloc_chunk))]
=> libdhash tries to delete entry with key=="auto.master" which was already freed with `sss_ptr_hash_delete_cb()`
so when it get to `sss_ptr_hash_delete_cb()` again in frame #6
#6 0x00007fad8a9f55ea in sss_ptr_hash_delete_cb (item=0x559778196f70, deltype=HASH_ENTRY_DESTROY, pvt=<optimized out>)
at src/util/sss_ptr_hash.c:163
163 talloc_free(value);
-- it is double free => raise
(gdb) frame 11
#11 0x00007fad879b355e in tevent_common_loop_timer_delay (ev=0x559778164b00) at ../../tevent_timed.c:442
442 ret = tevent_common_invoke_timer_handler(te, current_time, NULL);
(gdb) p te
$4 = (struct tevent_timer *) 0x5597781989d0
(gdb) p ev->timer_events
$5 = (struct tevent_timer *) 0x55977817ee40
but https://git.samba.org/samba.git/?p=samba.git;a=blob;f=lib/tevent/tevent_timed.c#l409 :
struct tevent_timer *te = ev->timer_events;
And the moment it is not clear to me how could it happen that te != ev->timer_events
(gdb) p *te
$6 = {prev = 0x0, next = 0x0, event_ctx = 0x0, wrapper = 0x0, busy = true, destroyed = true, next_event = {tv_sec = 1575917185,
tv_usec = 379978}, handler = 0x559777be26e0 <autofs_enumctx_lifetime_timeout>, private_data = 0x559778195510,
handler_name = 0x559777bf9e38 "autofs_enumctx_lifetime_timeout", location = 0x559777bf9e08 "src/responder/autofs/autofssrv_cmd.c:167"...}
(gdb) p *((struct autofs_enum_ctx *)(te->private_data))
$10 = {result = 0x55977819fe30, found = true, ready = true, key = 0x55977819e980 "auto.master", table = 0x55977818f4c0, notify_list = 0x0}
(gdb) p *((struct talloc_chunk *)((char *)te - 96))
$7 = {flags = 6406304, next = 0x0, prev = 0x0, parent = 0x0, child = 0x0, refs = 0x0,
destructor = 0x7fad879b2c60 <tevent_common_timed_destructor>, name = 0x7fad879b5e33 "struct tevent_timer", size = 96, limit = 0x0, pool = 0x0}
=> this `te` was already freed and most probably executed before
What should actually be executed here is:
(gdb) p *ev->timer_events
$8 = {prev = 0x559778198520, next = 0x559778198520, event_ctx = 0x559778164b00, wrapper = 0x0, busy = false, destroyed = false, next_event = {
tv_sec = 1575917185, tv_usec = 406319}, handler = 0x559777be26e0 <autofs_enumctx_lifetime_timeout>, private_data = 0x559778186800,
handler_name = 0x559777bf9e38 "autofs_enumctx_lifetime_timeout", location = 0x559777bf9e08 "src/responder/autofs/autofssrv_cmd.c:167",...}
(gdb) p *((struct autofs_enum_ctx *)(ev->timer_events->private_data))
$9 = {result = 0x5597781a64c0, found = true, ready = true, key = 0x5597781a1530 "auto.direct", table = 0x55977818f4c0, notify_list = 0x0}
And also strange thing is:
(gdb) frame
#7 0x00007fad87bbcd7d in hash_delete (table=table@entry=0x55977818f4c0, key=key@entry=0x7ffce9a9b9d0) at dhash/dhash.c:1077
1077 hdelete_callback(table, HASH_ENTRY_DESTROY, &element->entry);
(gdb) p *key
$61 = {type = HASH_KEY_STRING, {str = 0x55977819e980 "auto.master", c_str = 0x55977819e980 "auto.master", ul = 94109043386752}}
(gdb) p *table
$53 = {p = 0, maxp = 4, entry_count = 1, bucket_count = 4, segment_count = 1, min_load_factor = 1, max_load_factor = 5, directory_size = 4,
directory_size_shift = 2, segment_size = 4, segment_size_shift = 2, delete_callback = 0x7fad8a9f5550 <sss_ptr_hash_delete_cb>..., statistics = {hash_accesses = 12, hash_collisions = 0, table_expansions = 0, table_contractions = 0}}
-- entry_count = 1
And this single entry is:
(gdb) p *table->directory[0][1]
$49 = {entry = {key = {type = HASH_KEY_STRING, {str = 0x5597781a0d00 "auto.direct"...}},
value = {type = HASH_VALUE_PTR, {ptr = 0x5597781a14b0...}}}, next = 0x0}
how is it possible
(gdb) p *element
$56 = {entry = {key = {type = HASH_KEY_STRING, {str = 0x55977818bf70 "auto.master"...}},
value = {type = HASH_VALUE_PTR, {ptr = 0x55977819ea00...}}}, next = 0x0}
So at this moment I have two questions:
(1) how does it happen libtevent tries to execute already destroyed timer event instead of event contained in tevent_context?
(2) even so, how does libdhash find entry with key=="auto.master" if it was already removed from the hash table?
Steeve, do you have a reproducer?
(In reply to Alexey Tikhonov from comment #1) > (gdb) p *te > $6 = {prev = 0x0, next = 0x0, event_ctx = 0x0, wrapper = 0x0, busy = true, > destroyed = true, next_event = {tv_sec = 1575917185, tv_usec = 379978}, handler = 0x559777be26e0 > <autofs_enumctx_lifetime_timeout>, private_data = 0x559778195510, > handler_name = 0x559777bf9e38 "autofs_enumctx_lifetime_timeout", location > = 0x559777bf9e08 "src/responder/autofs/autofssrv_cmd.c:167"...} > (gdb) p *((struct talloc_chunk *)((char *)te - 96)) > $7 = {flags = 6406304, next = 0x0, prev = 0x0, parent = 0x0, child = 0x0, > refs = 0x0, destructor = 0x7fad879b2c60 <tevent_common_timed_destructor>, name = > 0x7fad879b5e33 "struct tevent_timer", size = 96, limit = 0x0, pool = 0x0} > > => this `te` was already freed and most probably executed before Actually event was destroyed (destroyed=true) but not yet freed (name in talloc_chunk header still has type name) - d-tor failed due to busy=true. (In reply to Alexey Tikhonov from comment #1) > And the moment it is not clear to me how could it happen that te != > ev->timer_events This is actually clear: `tevent_common_invoke_timer_handler()` calls `DLIST_REMOVE(te->event_ctx->timer_events, te)` so `te` that is being processed was removed from queue (located in the heap) already. > (1) how does it happen libtevent tries to execute already destroyed timer > event instead of event contained in tevent_context? So this statement is just wrong. (In reply to Alexey Tikhonov from comment #1) > (2) even so, how does libdhash find entry with key=="auto.master" if it was > already removed from the hash table? This is something I still do not understand: #7 0x00007fad87bbcd7d in hash_delete (table=table@entry=0x55977818f4c0, key=key@entry=0x7ffce9a9b9d0) at dhash/dhash.c:1077 1077 hdelete_callback(table, HASH_ENTRY_DESTROY, &element->entry); sss_ptr_hash_delete_cb(element->entry) -> autofs_maps_delete_cb() -> talloc_unlink(autofs_ctx->maps, enum_ctx = element->entry->value.ptr->ptr) -- enum_ctx doesn't have talloc d-tor _talloc_free(element->entry->value.ptr) -> ... abort -- this execution chain looks like it doesn't touch `element` and `element->entry.key.str` but: (gdb) p element->entry.key.str $78 = 0x55977818bf70 "auto.master" (gdb) p *((struct talloc_chunk *)((char *)element->entry.key.str - 96)) $79 = {flags = 2014964112, next = 0x0, prev = 0x0, parent = 0x0, child = 0x0, refs = 0x0, destructor = 0x0, name = 0x7fad8aa0a7a0 "src/util/util.c:379", size = 12, limit = 0x0, pool = 0x0} and (gdb) p *((struct talloc_chunk *)((char *)element - 96)) $75 = {flags = 3927370867, next = 0x0, prev = 0x0, parent = 0x0, child = 0x0, refs = 0x0, destructor = 0x0, name = 0x7fad8aa0a7a0 "src/util/util.c:379", size = 40, limit = 0x0, pool = 0x0} https://pagure.io/SSSD/sssd/blob/master/f/src/util/util.c#_379 : `talloc_free(ptr)` -- those elements are freed in the heap. And hash `table` contained in the heap on the momemnt of crash doesn't contain entry with "auto.master" as well. So looks like element/element->entry.key/element->entry.value are deleted and removed from the table after `hash_delete()->lookup()` found it in hash table but before `hdelete_callback()` returned... (In reply to Alexey Tikhonov from comment #6) > So looks like element/element->entry.key/element->entry.value are deleted > and removed from the table after `hash_delete()->lookup()` found it in hash > table but before `hdelete_callback()` returned... Exactly. Thanks to Pavel's hint that `sss_ptr_hash_spy` is linked to memory context of actually stored data, I figured out what is happening: hash_delete() -> hdelete_callback [sss_ptr_hash_delete_cb] -> data->callback [autofs_maps_delete_cb] -> talloc_unlink(autofs_ctx->maps, enum_ctx) -> spy is freed -> sss_ptr_hash_spy_destructor() -> sss_ptr_hash_delete() -> hash_delete() called recursively and frees element, its key and value -> stack unwinds back to sss_ptr_hash_delete_cb::talloc_free(value) -- double free I guess this is quite old bug in `sss_ptr_hash` that was only revealed by recent autofs changes. It is also present in sssd-1-16 so I think this bz is worth cloning for RHEL7. (In reply to Alexey Tikhonov from comment #7) > I guess this is quite old bug in `sss_ptr_hash` that was only revealed by > recent autofs changes. No. Actually bug was intoduced recently in f95db37aa8486304d0569d12a876b1c74ee1b0d1 and affects only 2.x branch. > It is also present in sssd-1-16 so I think this bz is worth cloning for > RHEL7. So it doesn't affect RHEL7. * `master`
* 26e33b1984cce3549df170f58f8221201ad54cfd - util/sss_ptr_hash: fixed double free in sss_ptr_hash_delete_cb()
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:1863 |