Bug 1446302 - crash in sssd-kcm due to a race-condition between two concurrent requests
Summary: crash in sssd-kcm due to a race-condition between two concurrent requests
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: sssd
Version: 7.3
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assignee: SSSD Maintainers
QA Contact: Niranjan Mallapadi Raghavender
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2017-04-27 15:43 UTC by Jakub Hrozek
Modified: 2020-05-02 18:39 UTC
CC List: 11 users

Fixed In Version: sssd-1.15.2-35.el7
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-08-01 09:06:23 UTC
Target Upstream Version:
mniranja: needinfo-


Attachments (Terms of Use)
sssd_kcm core_backtrace (6.09 KB, text/plain)
2017-06-26 01:50 UTC, Niranjan Mallapadi Raghavender
sssd_secrets core_backtrace (4.83 KB, text/plain)
2017-06-26 01:53 UTC, Niranjan Mallapadi Raghavender
Valgrind Output (114.51 KB, text/plain)
2017-06-27 07:57 UTC, Niranjan Mallapadi Raghavender
valgrind use after free (41.98 KB, text/plain)
2017-06-27 09:19 UTC, Lukas Slebodnik


Links
System ID Priority Status Summary Last Updated
Github SSSD sssd issues 4402 None closed crash in sssd-kcm due to a race-condition between two concurrent requests 2020-06-01 07:39:16 UTC
Red Hat Product Errata RHEA-2017:2294 normal SHIPPED_LIVE sssd bug fix and enhancement update 2017-08-01 12:39:55 UTC

Description Jakub Hrozek 2017-04-27 15:43:03 UTC
This bug is created as a clone of upstream ticket:
https://pagure.io/SSSD/sssd/issue/3372

It seems to be a use-after-free.

```
sh$ kinit lslebodn@EXAMPLE.COM
kinit: Credentials cache I/O operation failed while getting default ccache
```
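
For context: the client only talks to sssd-kcm when the credential cache type is KCM. One way to end up there, besides exporting KRB5CCNAME=KCM: as in the reproducers below, is via krb5.conf. A minimal sketch, assuming the config-snippet path shipped by sssd-kcm:

```
# /etc/krb5.conf.d/kcm_default_ccache  (assumed path, shipped by the sssd-kcm package)
[libdefaults]
    default_ccache_name = KCM:
```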

valgrind error:
```
==15846== 1 errors in context 1 of 1:
==15846== Invalid read of size 4
==15846==    at 0x5BFB6B9: talloc_chunk_from_ptr (talloc.c:452)
==15846==    by 0x5BFB6B9: __talloc_get_name (talloc.c:1486)
==15846==    by 0x5BFB6B9: talloc_check_name (talloc.c:1509)
==15846==    by 0x120F58: kcm_op_queue_add (kcmsrv_op_queue.c:136)
==15846==    by 0x120F58: kcm_op_queue_send (kcmsrv_op_queue.c:223)
==15846==    by 0x12072F: kcm_cmd_send (kcmsrv_ops.c:162)
==15846==    by 0x1115B1: kcm_cmd_dispatch (kcmsrv_cmd.c:364)
==15846==    by 0x1115B1: kcm_recv (kcmsrv_cmd.c:512)
==15846==    by 0x1115B1: kcm_fd_handler (kcmsrv_cmd.c:600)
==15846==    by 0x59F1A4F: epoll_event_loop (tevent_epoll.c:728)
==15846==    by 0x59F1A4F: epoll_event_loop_once (tevent_epoll.c:930)
==15846==    by 0x59EFEC6: std_event_loop_once (tevent_standard.c:114)
==15846==    by 0x59EBCAC: _tevent_loop_once (tevent.c:721)
==15846==    by 0x59EBECA: tevent_common_loop_wait (tevent.c:844)
==15846==    by 0x59EFE66: std_event_loop_wait (tevent_standard.c:145)
==15846==    by 0x7C2D0F2: server_loop (server.c:718)
==15846==    by 0x110856: main (kcm.c:313)
==15846==  Address 0xdccbea0 is 544 bytes inside a block of size 773 free'd
==15846==    at 0x4C2FCC8: free (vg_replace_malloc.c:530)
==15846==    by 0x5C024B3: _tc_free_poolmem (talloc.c:1000)
==15846==    by 0x5C024B3: _tc_free_internal (talloc.c:1141)
==15846==    by 0x5BFAAA7: _tc_free_children_internal (talloc.c:1593)
==15846==    by 0x5BFAAA7: _tc_free_internal (talloc.c:1104)
==15846==    by 0x5BFAAA7: _talloc_free_internal (talloc.c:1174)
==15846==    by 0x5BFAAA7: _talloc_free (talloc.c:1716)
==15846==    by 0x59ECFC0: tevent_req_received (tevent_req.c:255)
==15846==    by 0x59ECFD8: tevent_req_destructor (tevent_req.c:107)
==15846==    by 0x5BFAF30: _tc_free_internal (talloc.c:1078)
==15846==    by 0x5BFAF30: _talloc_free_internal (talloc.c:1174)
==15846==    by 0x5BFAF30: _talloc_free (talloc.c:1716)
==15846==    by 0x111711: kcm_cmd_request_done (kcmsrv_cmd.c:391)
==15846==    by 0x11E846: kcm_op_get_cache_uuid_list_done (kcmsrv_ops.c:1303)
==15846==    by 0x11B1F7: ccdb_sec_list_done (kcmsrv_ccache_secrets.c:1147)
==15846==    by 0x119DB0: sec_list_done (kcmsrv_ccache_secrets.c:224)
==15846==    by 0x12495A: tcurl_request_done (tev_curl.c:746)
==15846==    by 0x12495A: handle_curlmsg_done (tev_curl.c:234)
==15846==    by 0x12495A: process_curl_activity.isra.0 (tev_curl.c:245)
==15846==    by 0x124E4B: tcurlsock_input_available (tev_curl.c:288)
==15846==  Block was alloc'd at
==15846==    at 0x4C2EB1B: malloc (vg_replace_malloc.c:299)
==15846==    by 0x5BFE03B: __talloc_with_prefix (talloc.c:698)
==15846==    by 0x5BFE03B: _talloc_pool (talloc.c:752)
==15846==    by 0x5BFE03B: _talloc_pooled_object (talloc.c:820)
==15846==    by 0x59ECCAA: _tevent_req_create (tevent_req.c:73)
==15846==    by 0x120C43: kcm_op_queue_send (kcmsrv_op_queue.c:206)
==15846==    by 0x12072F: kcm_cmd_send (kcmsrv_ops.c:162)
==15846==    by 0x1115B1: kcm_cmd_dispatch (kcmsrv_cmd.c:364)
==15846==    by 0x1115B1: kcm_recv (kcmsrv_cmd.c:512)
==15846==    by 0x1115B1: kcm_fd_handler (kcmsrv_cmd.c:600)
==15846==    by 0x59F1A4F: epoll_event_loop (tevent_epoll.c:728)
==15846==    by 0x59F1A4F: epoll_event_loop_once (tevent_epoll.c:930)
==15846==    by 0x59EFEC6: std_event_loop_once (tevent_standard.c:114)
==15846==    by 0x59EBCAC: _tevent_loop_once (tevent.c:721)
==15846==    by 0x59EBECA: tevent_common_loop_wait (tevent.c:844)
==15846==    by 0x59EFE66: std_event_loop_wait (tevent_standard.c:145)
==15846==    by 0x7C2D0F2: server_loop (server.c:718)
```

Part of sssd-kcm.log
```
(Tue Apr 18 08:30:48:243213 2017) [sssd[kcm]] [handle_socket] (0x2000): Activity on curl socket 14 socket data 0xdcd93f0
(Tue Apr 18 08:30:48:246461 2017) [sssd[kcm]] [handle_curlmsg_done] (0x0400): Handled http://localhost/kcm/persistent/1000/ccache/
(Tue Apr 18 08:30:48:247437 2017) [sssd[kcm]] [tcurl_request_done] (0x0400): TCURL request finished [0]: Success
(Tue Apr 18 08:30:48:249128 2017) [sssd[kcm]] [schedule_fd_processing] (0x2000): timeout_ms: -1
(Tue Apr 18 08:30:48:249732 2017) [sssd[kcm]] [check_curl_timeouts] (0x4000): Still tracking 0 outstanding requests
(Tue Apr 18 08:30:48:253592 2017) [sssd[kcm]] [sec_list_done] (0x2000): Found 0 items
(Tue Apr 18 08:30:48:262271 2017) [sssd[kcm]] [sec_list_done] (0x2000): list done
(Tue Apr 18 08:30:48:263827 2017) [sssd[kcm]] [ccdb_sec_list_done] (0x2000): Found 2 ccaches
(Tue Apr 18 08:30:48:266425 2017) [sssd[kcm]] [ccdb_sec_list_done] (0x2000): Listing all caches done
(Tue Apr 18 08:30:48:269666 2017) [sssd[kcm]] [kcm_debug_uuid] (0x4000): UUID: 23cb4886-fa25-41e1-b065-327cc6f565c4
(Tue Apr 18 08:30:48:270029 2017) [sssd[kcm]] [kcm_debug_uuid] (0x4000): UUID: 901c7634-98ee-4e56-bd6c-84d6548d0f9f
(Tue Apr 18 08:30:48:270963 2017) [sssd[kcm]] [kcm_cmd_done] (0x0400): KCM operation GET_CACHE_UUID_LIST returned [0]: Success
(Tue Apr 18 08:30:48:272724 2017) [sssd[kcm]] [kcm_op_get_default_ccache_send] (0x1000): Getting client's default ccache
(Tue Apr 18 08:30:48:273393 2017) [sssd[kcm]] [ccdb_sec_get_default_send] (0x2000): Getting the default ccache
(Tue Apr 18 08:30:48:273723 2017) [sssd[kcm]] [tcurl_request_send] (0x0400): Sending TCURL request for http://localhost/kcm/persistent/1000/default, at socket /var/run/secrets.socket
(Tue Apr 18 08:30:48:273828 2017) [sssd[kcm]] [schedule_fd_processing] (0x2000): timeout_ms: 0
(Tue Apr 18 08:30:48:274587 2017) [sssd[kcm]] [kcm_send_reply] (0x2000): Sending a reply
(Tue Apr 18 08:30:48:274802 2017) [sssd[kcm]] [kcm_output_construct] (0x1000): Sending a reply with 36 bytes of payload
(Tue Apr 18 08:30:48:280941 2017) [sssd[kcm]] [handle_socket] (0x2000): Activity on curl socket 14 socket data (nil)
(Tue Apr 18 08:30:48:281043 2017) [sssd[kcm]] [schedule_fd_processing] (0x2000): timeout_ms: 0
(Tue Apr 18 08:30:48:281121 2017) [sssd[kcm]] [check_curl_timeouts] (0x4000): Still tracking 1 outstanding requests
(Tue Apr 18 08:30:48:281227 2017) [sssd[kcm]] [schedule_fd_processing] (0x2000): timeout_ms: 4995
(Tue Apr 18 08:30:48:281319 2017) [sssd[kcm]] [check_curl_timeouts] (0x4000): Still tracking 1 outstanding requests
(Tue Apr 18 08:30:48:282299 2017) [sssd[kcm]] [kcm_send] (0x2000): All data sent!
(Tue Apr 18 08:30:48:282717 2017) [sssd[kcm]] [kcm_input_parse] (0x1000): Received message with length 4
(Tue Apr 18 08:30:48:282789 2017) [sssd[kcm]] [kcm_get_opt] (0x2000): The client requested operation 20
(Tue Apr 18 08:30:48:282870 2017) [sssd[kcm]] [kcm_cmd_send] (0x0400): KCM operation GET_DEFAULT_CACHE
(Tue Apr 18 08:30:48:282935 2017) [sssd[kcm]] [kcm_cmd_send] (0x1000): 0 bytes on KCM input
(Tue Apr 18 08:30:48:283032 2017) [sssd[kcm]] [kcm_op_queue_send] (0x0200): Adding request by 1000 to the wait queue
(Tue Apr 18 08:30:48:284040 2017) [sssd[kcm]] [talloc_log_fn] (0x0010): Bad talloc magic value - wrong talloc version used/mixed
```

Comment 2 Jakub Hrozek 2017-05-24 14:08:29 UTC
 * master:
    * fb51bb68e62de7bb8542f5d224994eb7143040a6
    * 274489b092bba5fc81cb0f803843d56b267c5aaf

Comment 4 Jakub Hrozek 2017-05-26 11:24:36 UTC
The easiest way to reproduce is to acquire a ticket with kinit:
echo -n password | kinit foo@EXAMPLE.COM

And then run multiple parallel klist commands:
for i in $(seq 1 100); do klist -A; done

Before the patch, one of the klist invocations triggers a segfault in sssd_kcm.
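
A minimal sketch that wraps this reproducer into one script; the realm, user and password are placeholders, not values from this bug:

```
#!/bin/bash
# Reproducer sketch from comment 4 -- adjust principal and password to your realm.
export KRB5CCNAME=KCM:

echo -n password | kinit foo@EXAMPLE.COM

# Hammer sssd-kcm with back-to-back requests; before the fix one of the
# klist invocations made sssd_kcm segfault.
for i in $(seq 1 100); do
    klist -A
done
```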

Comment 5 Niranjan Mallapadi Raghavender 2017-06-05 00:13:04 UTC
[root@idm1 sssd]# date;ps -ef | grep sssd_kcm
Mon Jun  5 05:40:17 IST 2017
root     21430 13287  0 05:40 pts/1    00:00:00 grep --color=auto sssd_kcm

06/05/2017 05:10:55  06/05/2017 06:10:53  krbtgt/CENTAUR.TEST@CENTAUR.TEST
06/05/2017 05:10:55  06/05/2017 06:10:53  krbtgt/CENTAUR.TEST@CENTAUR.TEST
06/05/2017 05:10:55  06/05/2017 06:10:53  krbtgt/CENTAUR.TEST@CENTAUR.TEST
[... identical krbtgt/CENTAUR.TEST@CENTAUR.TEST line repeated for the remaining klist invocations ...]



[root@idm1 sssd]# date;ps -ef | grep sssd_kcm
Mon Jun  5 05:41:34 IST 2017
root     21456     1  1 05:40 ?        00:00:00 /usr/libexec/sssd/sssd_kcm --uid 0 --gid 0 --debug-to-files
root     21728 13287  0 05:41 pts/1    00:00:00 grep --color=auto sssd_kcm
[root@idm1 sssd]#

Comment 6 Lukas Slebodnik 2017-06-05 07:25:03 UTC
"date;ps -ef | grep sssd_kcm" is not a sufficient reproducer.

Comment 7 Lukas Slebodnik 2017-06-05 07:25:55 UTC
You can find more details in upstream ticket.

Comment 8 Niranjan Mallapadi Raghavender 2017-06-23 14:19:15 UTC
versions:
========
sssd-common-pac-1.15.2-50.el7.x86_64
sssd-proxy-1.15.2-50.el7.x86_64
sssd-common-1.15.2-50.el7.x86_64
sssd-ad-1.15.2-50.el7.x86_64
python-sssdconfig-1.15.2-50.el7.noarch
sssd-client-1.15.2-50.el7.x86_64
sssd-krb5-common-1.15.2-50.el7.x86_64
sssd-krb5-1.15.2-50.el7.x86_64
sssd-1.15.2-50.el7.x86_64
sssd-ldap-1.15.2-50.el7.x86_64
sssd-ipa-1.15.2-50.el7.x86_64
sssd-kcm-1.15.2-50.el7.x86_64


sssd.conf
[sssd]
config_file_version = 2
services = nss, pam
domains = centaur.test

[domain/centaur.test]
ad_domain = centaur.test
krb5_realm = CENTAUR.TEST
cache_credentials = True
id_provider = ad
krb5_store_password_if_offline = True
default_shell = /bin/bash
ldap_id_mapping = True
use_fully_qualified_names = True
fallback_homedir = /home/%u@%d
ldap_id_mapping = True
access_provider = ad
enumerate = True
debug_level = 6

[pam]
debug_level = 9

[ssh]

[sudo]

[kcm]
debug_level = 9

Ran the tests as follows:

root      root      2282  2070  0 Jun22 ?        00:00:00 tail -f /var/log/sssd/systemctl.log
root      7043     1  0 16:56 ?        00:00:00 /usr/sbin/sssd -i -f
root      7046  7043  0 16:56 ?        00:00:00 /usr/libexec/sssd/sssd_nss --uid 0 --gid 0 --debug-to-files
root      7047  7043  0 16:56 ?        00:00:00 /usr/libexec/sssd/sssd_pam --uid 0 --gid 0 --debug-to-files
root     14851  7043  0 17:56 ?        00:00:21 /usr/libexec/sssd/sssd_be --domain centaur.test --uid 0 --gid 0 --debug-to-files
root     17756     1  0 19:45 ?        00:00:00 /usr/libexec/sssd/sssd_kcm --uid 0 --gid 0 --debug-to-files
root     17762  4301  0 19:45 pts/0    00:00:00 grep --color=auto sssd


1. Create 3 users in AD (foobar1, foobar2, foobar3 with password Secret123)

Open 3 terminals and run the snippets below in parallel (a combined single-script sketch follows the three snippets):

#!/bin/bash
export KRB5CCNAME=KCM:
date
echo Secret123 | kinit foobar1
date;for i in $(seq 1 100); do klist -A; done

On terminal 2

export KRB5CCNAME=KCM:
date
echo Secret123 | kinit foobar2
date;for i in $(seq 1 100); do klist -A; done

On terminal 3

export KRB5CCNAME=KCM:
date
echo Secret123 | kinit foobar3
date;for i in $(seq 1 100); do klist -A; done
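
A minimal sketch that drives the three per-terminal workloads above from a single script by backgrounding them; the user names and password are the test values from this comment, nothing here is SSSD-specific:

```
#!/bin/bash
# Run the three kinit/klist workloads concurrently instead of in three terminals.
run_workload() {
    local user="$1"
    export KRB5CCNAME=KCM:
    date
    echo Secret123 | kinit "$user"
    date
    for i in $(seq 1 100); do
        klist -A
    done
}

run_workload foobar1 &
run_workload foobar2 &
run_workload foobar3 &
wait   # all three jobs hit the same sssd-kcm instance in parallel
```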


After running the above in parallel, I did not see any crash in sssd_kcm.

[root@ipaqavmd sssd]# date ;ps -ef | grep sssd
Fri Jun 23 10:18:13 EDT 2017
root     30734     1  0 09:18 ?        00:00:00 /usr/sbin/sssd -i -f
root     30735 30734  0 09:18 ?        00:00:00 /usr/libexec/sssd/sssd_be --domain juno.test --uid 0 --gid 0 --debug-to-files
root     30736 30734  0 09:18 ?        00:00:00 /usr/libexec/sssd/sssd_nss --uid 0 --gid 0 --debug-to-files
root     30737 30734  0 09:18 ?        00:00:00 /usr/libexec/sssd/sssd_pam --uid 0 --gid 0 --debug-to-files
root     31325     1 11 10:16 ?        00:00:11 /usr/libexec/sssd/sssd_kcm --uid 0 --gid 0 --debug-to-files
root     31605     1 17 10:17 ?        00:00:04 /usr/libexec/sssd/sssd_secrets --uid 0 --gid 0 --debug-to-files
root     31659 30900  0 10:18 pts/2    00:00:00 grep --color=auto sssd
[root@ipaqavmd sssd]# 

I did not see any crash of sssd_kcm, and it kept responding.

Comment 11 Niranjan Mallapadi Raghavender 2017-06-23 22:52:30 UTC

After running the tests multiple times I see this: sssd-secrets crashes, and it looks like sssd-kcm gets restarted while the sssd-secrets service is being restarted.


Initially, the PID of sssd_kcm was 3504:

root      3452     1  0 18:30 ?        00:00:00 /usr/sbin/sssd -i -f
root      3453  3452  0 18:30 ?        00:00:00 /usr/libexec/sssd/sssd_be --domain juno.test --uid 0 --gid 0 --debug-to-files
root      3476  3452  0 18:30 ?        00:00:00 /usr/libexec/sssd/sssd_nss --uid 0 --gid 0 --debug-to-files
root      3477  3452  0 18:30 ?        00:00:00 /usr/libexec/sssd/sssd_pam --uid 0 --gid 0 --debug-to-files
root      3504     1 10 18:30 ?        00:00:02 /usr/libexec/sssd/sssd_kcm --uid 0 --gid 0 --debug-to-files
root      3589     1  0 18:30 ?        00:00:00 /usr/libexec/sssd/sssd_secrets --uid 0 --gid 0 --debug-to-files
root      3594  3454  0 18:30 pts/5    00:00:00 grep --color=auto sssd


After running kinit from multiple terminals simultaneously, I see the following:


Jun 23 18:37:46 ipaqavmd.idmqe.lab.eng.bos.redhat.com sssd[secrets][8549]: Starting up
Jun 23 18:37:46 ipaqavmd.idmqe.lab.eng.bos.redhat.com abrt-server[8548]: Duplicate: core backtrace
Jun 23 18:37:46 ipaqavmd.idmqe.lab.eng.bos.redhat.com abrt-server[8548]: DUP_OF_DIR: /var/spool/abrt/ccpp-2017-06-23-18:33:28-7001
Jun 23 18:37:46 ipaqavmd.idmqe.lab.eng.bos.redhat.com abrt-server[8548]: Deleting problem directory ccpp-2017-06-23-18:37:45-8406 (dup of ccpp-2017-06-23-18:33:28-7001)
Jun 23 18:37:46 ipaqavmd.idmqe.lab.eng.bos.redhat.com dbus[634]: [system] Activating service name='org.freedesktop.problems' (using servicehelper)
Jun 23 18:37:46 ipaqavmd.idmqe.lab.eng.bos.redhat.com dbus-daemon[634]: dbus[634]: [system] Activating service name='org.freedesktop.problems' (using servicehelper)
Jun 23 18:37:46 ipaqavmd.idmqe.lab.eng.bos.redhat.com dbus[634]: [system] Successfully activated service 'org.freedesktop.problems'
Jun 23 18:37:46 ipaqavmd.idmqe.lab.eng.bos.redhat.com dbus-daemon[634]: dbus[634]: [system] Successfully activated service 'org.freedesktop.problems'
Jun 23 18:38:01 ipaqavmd.idmqe.lab.eng.bos.redhat.com sssd_secrets[8549]: Assertion failure: 0 == rv, at ../../.././nspr/pr/src/pthreads/ptthread.c:1010
Jun 23 18:38:01 ipaqavmd.idmqe.lab.eng.bos.redhat.com abrt-hook-ccpp[8636]: Process 8549 (sssd_secrets) of user 0 killed by SIGABRT - ignoring (repeated crash)
Jun 23 18:38:01 ipaqavmd.idmqe.lab.eng.bos.redhat.com systemd[1]: sssd-secrets.service: main process exited, code=dumped, status=6/ABRT
Jun 23 18:38:01 ipaqavmd.idmqe.lab.eng.bos.redhat.com systemd[1]: Unit sssd-secrets.service entered failed state.
Jun 23 18:38:01 ipaqavmd.idmqe.lab.eng.bos.redhat.com systemd[1]: sssd-secrets.service failed.
Jun 23 18:38:01 ipaqavmd.idmqe.lab.eng.bos.redhat.com systemd[1]: Started SSSD Secrets Service responder.
-- Subject: Unit sssd-secrets.service has finished start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
-- 
-- Unit sssd-secrets.service has finished starting up.
-- 
-- The start-up result is done.
Jun 23 18:38:01 ipaqavmd.idmqe.lab.eng.bos.redhat.com systemd[1]: Starting SSSD Secrets Service responder...
-- Subject: Unit sssd-secrets.service has begun start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
-- 
-- Unit sssd-secrets.service has begun starting up.
Jun 23 18:38:01 ipaqavmd.idmqe.lab.eng.bos.redhat.com sssd[secrets][8639]: Starting up
Jun 23 18:38:09 ipaqavmd.idmqe.lab.eng.bos.redhat.com abrt-hook-ccpp[8661]: Process 3504 (sssd_kcm) of user 0 killed by SIGABRT - dumping core
Jun 23 18:38:09 ipaqavmd.idmqe.lab.eng.bos.redhat.com systemd[1]: sssd-kcm.service: main process exited, code=dumped, status=6/ABRT
Jun 23 18:38:09 ipaqavmd.idmqe.lab.eng.bos.redhat.com systemd[1]: Unit sssd-kcm.service entered failed state.
Jun 23 18:38:09 ipaqavmd.idmqe.lab.eng.bos.redhat.com systemd[1]: sssd-kcm.service failed.
Jun 23 18:38:09 ipaqavmd.idmqe.lab.eng.bos.redhat.com systemd[1]: Started SSSD Kerberos Cache Manager.
-- Subject: Unit sssd-kcm.service has finished start-up


I see that BZ https://bugzilla.redhat.com/show_bug.cgi?id=1460689 is filed for the secrets responder crashing. Is this crash similar to BZ 1460689?

Comment 12 Niranjan Mallapadi Raghavender 2017-06-26 01:49:57 UTC
I see 2 crashes, sssd-kcm and sssd-secrets. Attaching the core backtraces.

Comment 13 Niranjan Mallapadi Raghavender 2017-06-26 01:50:51 UTC
Created attachment 1291805 [details]
sssd_kcm core_backtrace

Comment 14 Niranjan Mallapadi Raghavender 2017-06-26 01:53:40 UTC
Created attachment 1291806 [details]
sssd_secrets core_backtrace

Comment 16 Jakub Hrozek 2017-06-26 07:34:05 UTC
(In reply to Niranjan Mallapadi Raghavender from comment #13)
> Created attachment 1291805 [details]
> sssd_kcm core_backtrace

This is a different backtrace than the original report:
#0  0x00007ff6179621f7 in raise () from /lib64/libc.so.6
No symbol table info available.
#1  0x00007ff6179638e8 in abort () from /lib64/libc.so.6
No symbol table info available.
#2  0x00007ff6180fbcfc in talloc_abort (reason=0x7ff6181047d8 "Bad talloc magic value - wrong talloc version used/mixed") at ../talloc.c:426
No locals.
#3  0x00007ff6180ff702 in talloc_abort_magic (magic=<optimized out>) at ../talloc.c:434
No locals.
#4  talloc_chunk_from_ptr (ptr=0x55da430952a0) at ../talloc.c:454
        pp = 0x55da430952a0 "`\344\nA\332U"
        tc = 0x55da43095240
#5  __talloc_get_name (ptr=0x55da430952a0) at ../talloc.c:1486
No locals.
#6  _talloc_get_type_abort (ptr=0x55da430952a0, name=name@entry=0x55da410d8675 "struct tevent_req", location=location@entry=0x55da410dd5a0 "src/responder/kcm/kcmsrv_ops.c:178") at ../talloc.c:1543
No locals.
#7  0x000055da410b9858 in kcm_cmd_queue_done (subreq=0x55da430958f0) at src/responder/kcm/kcmsrv_ops.c:178
        req = <optimized out>
        state = <optimized out>
        ret = <optimized out>
        __FUNCTION__ = "kcm_cmd_queue_done"
#8  0x000055da410be367 in kcm_op_queue_entry_destructor (entry=entry@entry=0x55da430991e0) at src/responder/kcm/kcmsrv_op_queue.c:146
        next_entry = <optimized out>
        imm = <optimized out>
#9  0x00007ff6180fb2a0 in _tc_free_internal (location=0x7ff6183139b3 "../tevent_req.c:255", tc=0x55da43099180) at ../talloc.c:1078
        d = <optimized out>
        ptr_to_free = <optimized out>
        ptr = 0x55da430991e0
#10 _tc_free_children_internal (location=0x7ff6183139b3 "../tevent_req.c:255", ptr=0x55da430c50c0, tc=0x55da430c5060) at ../talloc.c:1593
        child = 0x55da430991e0
        new_parent = <optimized out>
#11 _tc_free_internal (tc=0x55da430c5060, location=0x7ff6183139b3 "../tevent_req.c:255") at ../talloc.c:1104

This might be some use-after-free. Are you able to reproduce this at will with valgrind as well? (Just running valgrind /usr/libexec/sssd/sssd_kcm --uid 0 --gid 0 --debug-to-stderr instead of the service might do.)

If this is reproducible at will, with or without valgrind, then I think this bug should fail QE.
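
A sketch of running sssd_kcm directly under valgrind as suggested above; it assumes the socket-activated service is stopped first, and the valgrind flags are generic memcheck options rather than an SSSD recommendation:

```
systemctl stop sssd-kcm.socket sssd-kcm.service

valgrind --tool=memcheck --leak-check=full --track-origins=yes \
         --num-callers=30 --log-file=/tmp/valgrind_kcm.log \
         /usr/libexec/sssd/sssd_kcm --uid 0 --gid 0 --debug-to-stderr
```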

Comment 17 Jakub Hrozek 2017-06-26 07:34:57 UTC
(In reply to Niranjan Mallapadi Raghavender from comment #14)
> Created attachment 1291806 [details]
> sssd_secrets core_backtrace

This is https://bugzilla.redhat.com/show_bug.cgi?id=1460689

Comment 19 Niranjan Mallapadi Raghavender 2017-06-27 07:57:10 UTC
Created attachment 1292204 [details]
Valgrind Output

Comment 20 Niranjan Mallapadi Raghavender 2017-06-27 07:57:57 UTC
Ran the scripts kcm1, kcm2, kcm3 in parallel from different terminals. I have attached the valgrind output.

Comment 21 Lukas Slebodnik 2017-06-27 08:02:51 UTC
(In reply to Niranjan Mallapadi Raghavender from comment #19)
> Created attachment 1292204 [details]
> Valgrind Output

That valgrind output shows just leaks caused by improper shutdown.
There is no use-after-free.

I'll try to run kcm* scripts in parallel.

Comment 22 Jakub Hrozek 2017-06-27 08:40:10 UTC
(In reply to Lukas Slebodnik from comment #21)
> (In reply to Niranjan Mallapadi Raghavender from comment #19)
> > Created attachment 1292204 [details]
> > Valgrind Output
> 
> That valgrind output shows just leaks caused by improper shutdown.
> There is not any use-after free.
> 
> I'll try to run kcm* scripts in parallel.

It might be a good idea to patch sssd-secrets with github PR #225, otherwise sssd-secrets will crash quite soon.

(I tried running the scripts in parallel yesterday and only saw sssd-secrets crashes, not any sssd-kcm crash..)

Comment 23 Lukas Slebodnik 2017-06-27 08:43:27 UTC
(In reply to Jakub Hrozek from comment #22)
> (In reply to Lukas Slebodnik from comment #21)
> > (In reply to Niranjan Mallapadi Raghavender from comment #19)
> > > Created attachment 1292204 [details]
> > > Valgrind Output
> > 
> > That valgrind output shows just leaks caused by improper shutdown.
> > There is not any use-after free.
> > 
> > I'll try to run kcm* scripts in parallel.
> 
> It might be a good idea to patch sssd-secrets with github PR #225, otherwise
> sssd-secrets will crash quite soon.
> 
> (I tried running the scripts in parallel yesterday and only saw sssd-secrets
> crashes, not any sssd-kcm crash..)

But that might be part of the reason why sssd-kcm crashed :-)

Comment 24 Lukas Slebodnik 2017-06-27 08:45:19 UTC
(In reply to Jakub Hrozek from comment #22)
> (In reply to Lukas Slebodnik from comment #21)
> > (In reply to Niranjan Mallapadi Raghavender from comment #19)
> > > Created attachment 1292204 [details]
> > > Valgrind Output
> > 
> > That valgrind output shows just leaks caused by improper shutdown.
> > There is not any use-after free.
> > 
> > I'll try to run kcm* scripts in parallel.
> 
> It might be a good idea to patch sssd-secrets with github PR #225, otherwise
> sssd-secrets will crash quite soon.
> 
> (I tried running the scripts in parallel yesterday and only saw sssd-secrets
> crashes, not any sssd-kcm crash..)

But that might be part of the reason why sssd-kcm crashed :-)

I somehow managed to reproduce the crash without valgrind.

sh# abrt-cli ls
id 141e0bf99a5fe6cc79eb66a2384d241496e59b42
reason:         sssd_kcm killed by SIGSEGV
time:           Tue 27 Jun 2017 04:30:44 AM EDT
cmdline:        /usr/libexec/sssd/sssd_kcm --uid 0 --gid 0 --debug-to-files
package:        sssd-kcm-1.15.2-50.el7
uid:            0 (root)
count:          1
Directory:      /var/spool/abrt/ccpp-2017-06-27-04:30:44-16530
Run 'abrt-cli report /var/spool/abrt/ccpp-2017-06-27-04:30:44-16530' for creating a case in Red Hat Customer Portal

And the related part of the sssd-kcm log file:
(Tue Jun 27 04:30:41:055538 2017) [sssd[kcm]] [handle_socket] (0x2000): Activity on curl socket 13 socket data 0x55e575847fc0
(Tue Jun 27 04:30:41:055557 2017) [sssd[kcm]] [handle_curlmsg_done] (0x0400): Handled http://localhost/kcm/persistent/0/ccache/
(Tue Jun 27 04:30:41:055580 2017) [sssd[kcm]] [tcurl_request_done] (0x0400): TCURL request finished [0]: Success
(Tue Jun 27 04:30:41:055597 2017) [sssd[kcm]] [schedule_fd_processing] (0x2000): timeout_ms: -1
(Tue Jun 27 04:30:41:055618 2017) [sssd[kcm]] [check_curl_timeouts] (0x4000): Still tracking 0 outstanding requests
(Tue Jun 27 04:30:41:055643 2017) [sssd[kcm]] [sec_list_done] (0x2000): Found 0 items
(Tue Jun 27 04:30:41:055669 2017) [sssd[kcm]] [sec_list_done] (0x2000): list done
(Tue Jun 27 04:30:41:055687 2017) [sssd[kcm]] [sec_get_send] (0x2000): Retrieving ccache 306b717b-899a-459d-96ce-30e9d20f0263-0:6652
(Tue Jun 27 04:30:41:055727 2017) [sssd[kcm]] [tcurl_request_send] (0x0400): Sending TCURL request for http://localhost/kcm/persistent/0/ccache/306b717b-899a-459d-96ce-30e9d20f0263-0:6652, at socket /var/run/secrets.socket
(Tue Jun 27 04:30:41:055748 2017) [sssd[kcm]] [schedule_fd_processing] (0x2000): timeout_ms: 1
(Tue Jun 27 04:30:41:056919 2017) [sssd[kcm]] [handle_socket] (0x2000): Activity on curl socket 13 socket data (nil)
(Tue Jun 27 04:30:41:056955 2017) [sssd[kcm]] [schedule_fd_processing] (0x2000): timeout_ms: 5000
(Tue Jun 27 04:30:41:056972 2017) [sssd[kcm]] [check_curl_timeouts] (0x4000): Still tracking 1 outstanding requests
(Tue Jun 27 04:30:43:971912 2017) [sssd[kcm]] [responder_idle_handler] (0x2000): Terminating idle responder [0x55e57582d6a0]
(Tue Jun 27 04:30:43:972009 2017) [sssd[kcm]] [kcm_responder_ctx_destructor] (0x0400): Responder is being shut down
(Tue Jun 27 04:30:43:972065 2017) [sssd[kcm]] [client_close_fn] (0x2000): Terminated client [0x55e5758725d0][17]
(Tue Jun 27 04:30:43:972131 2017) [sssd[kcm]] [client_close_fn] (0x2000): Terminated client [0x55e575872240][16]
(Tue Jun 27 04:30:43:972154 2017) [sssd[kcm]] [tcurl_request_destructor] (0x0080): Terminating TCURL request...
(Tue Jun 27 04:30:43:972235 2017) [sssd[kcm]] [handle_socket] (0x2000): Activity on curl socket 13 socket data 0x55e575847b20
(Tue Jun 27 04:30:43:972256 2017) [sssd[kcm]] [schedule_fd_processing] (0x2000): timeout_ms: -1
(Tue Jun 27 04:30:43:972273 2017) [sssd[kcm]] [check_curl_timeouts] (0x4000): Still tracking 0 outstanding requests
(Tue Jun 27 04:30:44:195682 2017) [sssd[kcm]] [server_setup] (0x0400): CONFDB: /var/lib/sss/db/config.ldb


I will try to reproduce with valgrind.

Comment 25 Lukas Slebodnik 2017-06-27 09:16:54 UTC
I do not think that the sssd-kcm crash is related, because there is a 30-second delay between them.

And maybe the crash in sssd-secrets really is related.

 Jun 27 04:30:25 host.example.com sssd[secrets][26160]: Starting up
Jun 27 04:30:41 host.example.com sssd_secrets[26160]: Assertion failure: 0 == rv, at ../../.././nspr/pr/src/pthreads/ptthread.c:1010
Jun 27 04:30:41 host.example.com abrt-hook-ccpp[26803]: Process 26160 (sssd_secrets) of user 0 killed by SIGABRT - ignoring (repeated crash)
Jun 27 04:30:41 host.example.com systemd[1]: sssd-secrets.service: main process exited, code=dumped, status=6/ABRT
Jun 27 04:30:41 host.example.com systemd[1]: Unit sssd-secrets.service entered failed state.
Jun 27 04:30:41 host.example.com systemd[1]: sssd-secrets.service failed.
Jun 27 04:30:41 host.example.com systemd[1]: Started SSSD Secrets Service responder.
Jun 27 04:30:41 host.example.com systemd[1]: Starting SSSD Secrets Service responder...
Jun 27 04:30:41 host.example.com sssd[secrets][26805]: Starting up
Jun 27 04:30:43 host.example.com kernel: traps: sssd_kcm[16530] general protection ip:7f1d8797fa76 sp:7ffc2757b0a0 error:0 in libtalloc.so.2.1.9[7f1d8797c000+f000]
Jun 27 04:30:44 host.example.com abrt-hook-ccpp[26806]: Process 16530 (sssd_kcm) of user 0 killed by SIGSEGV - dumping core
Jun 27 04:30:44 host.example.com systemd[1]: sssd-kcm.service: main process exited, code=dumped, status=11/SEGV
Jun 27 04:30:44 host.example.com systemd[1]: Unit sssd-kcm.service entered failed state.
Jun 27 04:30:44 host.example.com systemd[1]: sssd-kcm.service failed.


(Tue Jun 27 05:09:17:541498 2017) [sssd[kcm]] [handle_socket] (0x2000): Activity on curl socket 15 socket data 0xdbce670
(Tue Jun 27 05:09:17:541710 2017) [sssd[kcm]] [handle_curlmsg_done] (0x0400): Handled http://localhost/kcm/persistent/0/ccache/28654582-0eb6-48d0-b4f3-55c9845e395c-0:6652
(Tue Jun 27 05:09:17:541974 2017) [sssd[kcm]] [tcurl_request_done] (0x0400): TCURL request finished [0]: Success
(Tue Jun 27 05:09:17:542143 2017) [sssd[kcm]] [schedule_fd_processing] (0x2000): timeout_ms: -1
(Tue Jun 27 05:09:17:542303 2017) [sssd[kcm]] [check_curl_timeouts] (0x4000): Still tracking 0 outstanding requests
(Tue Jun 27 05:09:17:546962 2017) [sssd[kcm]] [sec_get_done] (0x2000): GET done
(Tue Jun 27 05:09:17:547220 2017) [sssd[kcm]] [ccdb_sec_getbyname_done] (0x2000): Got ccache by UUID
(Tue Jun 27 05:09:17:547577 2017) [sssd[kcm]] [kcm_cmd_done] (0x0400): KCM operation GET_PRINCIPAL returned [0]: Success
(Tue Jun 27 05:09:17:547818 2017) [sssd[kcm]] [kcm_op_get_kdc_offset_send] (0x1000): Requested offset for principal 0:6652
(Tue Jun 27 05:09:17:548045 2017) [sssd[kcm]] [ccdb_sec_getbyname_send] (0x2000): Getting ccache by name
(Tue Jun 27 05:09:17:548218 2017) [sssd[kcm]] [sec_list_send] (0x2000): Listing all ccaches in the secrets store
(Tue Jun 27 05:09:17:548513 2017) [sssd[kcm]] [tcurl_request_send] (0x0400): Sending TCURL request for http://localhost/kcm/persistent/0/ccache/, at socket /var/run/secrets.socket
(Tue Jun 27 05:09:17:548719 2017) [sssd[kcm]] [schedule_fd_processing] (0x2000): timeout_ms: 1
(Tue Jun 27 05:09:17:548999 2017) [sssd[kcm]] [kcm_send_reply] (0x2000): Sending a reply
(Tue Jun 27 05:09:17:549166 2017) [sssd[kcm]] [kcm_output_construct] (0x1000): Sending a reply with 36 bytes of payload
(Tue Jun 27 05:09:17:549415 2017) [sssd[kcm]] [kcm_send] (0x2000): All data sent!
(Tue Jun 27 05:09:17:550251 2017) [sssd[kcm]] [responder_idle_handler] (0x2000): Terminating idle responder [0xd9e31d0]
(Tue Jun 27 05:09:17:550879 2017) [sssd[kcm]] [kcm_responder_ctx_destructor] (0x0400): Responder is being shut down
(Tue Jun 27 05:09:17:551630 2017) [sssd[kcm]] [client_close_fn] (0x2000): Terminated client [0x163c9af0][18]
(Tue Jun 27 05:09:17:552104 2017) [sssd[kcm]] [tcurl_request_destructor] (0x0080): Terminating TCURL request...
(Tue Jun 27 05:09:17:552602 2017) [sssd[kcm]] [schedule_fd_processing] (0x2000): timeout_ms: -1
(Tue Jun 27 05:09:17:552799 2017) [sssd[kcm]] [check_curl_timeouts] (0x4000): Still tracking 0 outstanding requests
(Tue Jun 27 05:09:26:512198 2017) [sssd[kcm]] [server_setup] (0x0400): CONFDB: /var/lib/sss/db/config.ldb

I will attach the valgrind log in another comment.

Comment 26 Lukas Slebodnik 2017-06-27 09:19:23 UTC
Created attachment 1292228 [details]
valgrind use after free

Comment 27 Jakub Hrozek 2017-06-27 12:27:09 UTC
Thank you for the analysis. 

What I find strange is that the free that triggered the use-after-free was caused by:
==00:00:07:37.734 30982==    by 0x12B301: responder_idle_handler (responder_common.c:378)
==00:00:07:37.734 30982==    by 0x997BC96: tevent_common_loop_timer_delay (tevent_timed.c:369)

I don't know if this is caused by the sssd-secrets crash or not. But I'm quite confident that this kcm crash is not related to the original report of a concurrency issue in the kcm_op_queue code. In my opinion, this bug should be marked as VERIFIED and another one should be opened for the new issue.

Comment 28 Niranjan Mallapadi Raghavender 2017-06-27 12:51:02 UTC
Based on the above comments, marking this bug as verified.

Comment 29 errata-xmlrpc 2017-08-01 09:06:23 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:2294

Comment 30 Mike G 2017-09-28 11:59:35 UTC
I just updated to the patches specified here and am still facing this same issue. The patched version does not resolve the service crash.

Thanks

Comment 31 Jakub Hrozek 2017-09-28 13:15:02 UTC
(In reply to Mike G from comment #30)
> I just updated to the patches specified here and am still facing this same
> issue. The patched version does not resolve the service crash.
> 
> Thanks

Please open a new bugzilla and attach the core files and the debug logs.

Comment 32 Lukas Slebodnik 2017-10-02 11:29:13 UTC
(In reply to Mike G from comment #30)
> I just updated to the patches specified here and am still facing this same
> issue. The patched version does not resolve the service crash.
> 
You probably hit a different bug with similar symptoms: https://bugzilla.redhat.com/show_bug.cgi?id=1465457

