Bug 1446302 - crash in sssd-kcm due to a race-condition between two concurrent requests
Summary: crash in sssd-kcm due to a race-condition between two concurrent requests
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: sssd
Version: 7.3
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assignee: SSSD Maintainers
QA Contact: Niranjan Mallapadi Raghavender
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2017-04-27 15:43 UTC by Jakub Hrozek
Modified: 2020-05-02 18:39 UTC
CC List: 11 users

Fixed In Version: sssd-1.15.2-35.el7
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-08-01 09:06:23 UTC
Target Upstream Version:
mniranja: needinfo-


Attachments (Terms of Use)
sssd_kcm core_backtrace (6.09 KB, text/plain)
2017-06-26 01:50 UTC, Niranjan Mallapadi Raghavender
sssd_secrets core_backtrace (4.83 KB, text/plain)
2017-06-26 01:53 UTC, Niranjan Mallapadi Raghavender
Valgrind Output (114.51 KB, text/plain)
2017-06-27 07:57 UTC, Niranjan Mallapadi Raghavender
valgrind use after free (41.98 KB, text/plain)
2017-06-27 09:19 UTC, Lukas Slebodnik


Links
System ID Priority Status Summary Last Updated
Github SSSD sssd issues 4402 None closed crash in sssd-kcm due to a race-condition between two concurrent requests 2020-06-01 07:39:16 UTC
Red Hat Product Errata RHEA-2017:2294 normal SHIPPED_LIVE sssd bug fix and enhancement update 2017-08-01 12:39:55 UTC

Description Jakub Hrozek 2017-04-27 15:43:03 UTC
This bug is created as a clone of upstream ticket:
https://pagure.io/SSSD/sssd/issue/3372

It seems to be a use-after-free.

```
sh$ kinit lslebodn@EXAMPLE.COM
kinit: Credentials cache I/O operation failed while getting default ccache
```
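
For context: the client only talks to sssd-kcm when the credential cache type is KCM. One way to end up there, besides exporting KRB5CCNAME=KCM: as in the reproducers below, is via krb5.conf. A minimal sketch, assuming the config-snippet path shipped by sssd-kcm:

```
# /etc/krb5.conf.d/kcm_default_ccache  (assumed path, shipped by the sssd-kcm package)
[libdefaults]
    default_ccache_name = KCM:
```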

valgrind error:
```
==15846== 1 errors in context 1 of 1:
==15846== Invalid read of size 4
==15846==    at 0x5BFB6B9: talloc_chunk_from_ptr (talloc.c:452)
==15846==    by 0x5BFB6B9: __talloc_get_name (talloc.c:1486)
==15846==    by 0x5BFB6B9: talloc_check_name (talloc.c:1509)
==15846==    by 0x120F58: kcm_op_queue_add (kcmsrv_op_queue.c:136)
==15846==    by 0x120F58: kcm_op_queue_send (kcmsrv_op_queue.c:223)
==15846==    by 0x12072F: kcm_cmd_send (kcmsrv_ops.c:162)
==15846==    by 0x1115B1: kcm_cmd_dispatch (kcmsrv_cmd.c:364)
==15846==    by 0x1115B1: kcm_recv (kcmsrv_cmd.c:512)
==15846==    by 0x1115B1: kcm_fd_handler (kcmsrv_cmd.c:600)
==15846==    by 0x59F1A4F: epoll_event_loop (tevent_epoll.c:728)
==15846==    by 0x59F1A4F: epoll_event_loop_once (tevent_epoll.c:930)
==15846==    by 0x59EFEC6: std_event_loop_once (tevent_standard.c:114)
==15846==    by 0x59EBCAC: _tevent_loop_once (tevent.c:721)
==15846==    by 0x59EBECA: tevent_common_loop_wait (tevent.c:844)
==15846==    by 0x59EFE66: std_event_loop_wait (tevent_standard.c:145)
==15846==    by 0x7C2D0F2: server_loop (server.c:718)
==15846==    by 0x110856: main (kcm.c:313)
==15846==  Address 0xdccbea0 is 544 bytes inside a block of size 773 free'd
==15846==    at 0x4C2FCC8: free (vg_replace_malloc.c:530)
==15846==    by 0x5C024B3: _tc_free_poolmem (talloc.c:1000)
==15846==    by 0x5C024B3: _tc_free_internal (talloc.c:1141)
==15846==    by 0x5BFAAA7: _tc_free_children_internal (talloc.c:1593)
==15846==    by 0x5BFAAA7: _tc_free_internal (talloc.c:1104)
==15846==    by 0x5BFAAA7: _talloc_free_internal (talloc.c:1174)
==15846==    by 0x5BFAAA7: _talloc_free (talloc.c:1716)
==15846==    by 0x59ECFC0: tevent_req_received (tevent_req.c:255)
==15846==    by 0x59ECFD8: tevent_req_destructor (tevent_req.c:107)
==15846==    by 0x5BFAF30: _tc_free_internal (talloc.c:1078)
==15846==    by 0x5BFAF30: _talloc_free_internal (talloc.c:1174)
==15846==    by 0x5BFAF30: _talloc_free (talloc.c:1716)
==15846==    by 0x111711: kcm_cmd_request_done (kcmsrv_cmd.c:391)
==15846==    by 0x11E846: kcm_op_get_cache_uuid_list_done (kcmsrv_ops.c:1303)
==15846==    by 0x11B1F7: ccdb_sec_list_done (kcmsrv_ccache_secrets.c:1147)
==15846==    by 0x119DB0: sec_list_done (kcmsrv_ccache_secrets.c:224)
==15846==    by 0x12495A: tcurl_request_done (tev_curl.c:746)
==15846==    by 0x12495A: handle_curlmsg_done (tev_curl.c:234)
==15846==    by 0x12495A: process_curl_activity.isra.0 (tev_curl.c:245)
==15846==    by 0x124E4B: tcurlsock_input_available (tev_curl.c:288)
==15846==  Block was alloc'd at
==15846==    at 0x4C2EB1B: malloc (vg_replace_malloc.c:299)
==15846==    by 0x5BFE03B: __talloc_with_prefix (talloc.c:698)
==15846==    by 0x5BFE03B: _talloc_pool (talloc.c:752)
==15846==    by 0x5BFE03B: _talloc_pooled_object (talloc.c:820)
==15846==    by 0x59ECCAA: _tevent_req_create (tevent_req.c:73)
==15846==    by 0x120C43: kcm_op_queue_send (kcmsrv_op_queue.c:206)
==15846==    by 0x12072F: kcm_cmd_send (kcmsrv_ops.c:162)
==15846==    by 0x1115B1: kcm_cmd_dispatch (kcmsrv_cmd.c:364)
==15846==    by 0x1115B1: kcm_recv (kcmsrv_cmd.c:512)
==15846==    by 0x1115B1: kcm_fd_handler (kcmsrv_cmd.c:600)
==15846==    by 0x59F1A4F: epoll_event_loop (tevent_epoll.c:728)
==15846==    by 0x59F1A4F: epoll_event_loop_once (tevent_epoll.c:930)
==15846==    by 0x59EFEC6: std_event_loop_once (tevent_standard.c:114)
==15846==    by 0x59EBCAC: _tevent_loop_once (tevent.c:721)
==15846==    by 0x59EBECA: tevent_common_loop_wait (tevent.c:844)
==15846==    by 0x59EFE66: std_event_loop_wait (tevent_standard.c:145)
==15846==    by 0x7C2D0F2: server_loop (server.c:718)
```

Part of sssd-kcm.log
```
(Tue Apr 18 08:30:48:243213 2017) [sssd[kcm]] [handle_socket] (0x2000): Activity on curl socket 14 socket data 0xdcd93f0
(Tue Apr 18 08:30:48:246461 2017) [sssd[kcm]] [handle_curlmsg_done] (0x0400): Handled http://localhost/kcm/persistent/1000/ccache/
(Tue Apr 18 08:30:48:247437 2017) [sssd[kcm]] [tcurl_request_done] (0x0400): TCURL request finished [0]: Success
(Tue Apr 18 08:30:48:249128 2017) [sssd[kcm]] [schedule_fd_processing] (0x2000): timeout_ms: -1
(Tue Apr 18 08:30:48:249732 2017) [sssd[kcm]] [check_curl_timeouts] (0x4000): Still tracking 0 outstanding requests
(Tue Apr 18 08:30:48:253592 2017) [sssd[kcm]] [sec_list_done] (0x2000): Found 0 items
(Tue Apr 18 08:30:48:262271 2017) [sssd[kcm]] [sec_list_done] (0x2000): list done
(Tue Apr 18 08:30:48:263827 2017) [sssd[kcm]] [ccdb_sec_list_done] (0x2000): Found 2 ccaches
(Tue Apr 18 08:30:48:266425 2017) [sssd[kcm]] [ccdb_sec_list_done] (0x2000): Listing all caches done
(Tue Apr 18 08:30:48:269666 2017) [sssd[kcm]] [kcm_debug_uuid] (0x4000): UUID: 23cb4886-fa25-41e1-b065-327cc6f565c4
(Tue Apr 18 08:30:48:270029 2017) [sssd[kcm]] [kcm_debug_uuid] (0x4000): UUID: 901c7634-98ee-4e56-bd6c-84d6548d0f9f
(Tue Apr 18 08:30:48:270963 2017) [sssd[kcm]] [kcm_cmd_done] (0x0400): KCM operation GET_CACHE_UUID_LIST returned [0]: Success
(Tue Apr 18 08:30:48:272724 2017) [sssd[kcm]] [kcm_op_get_default_ccache_send] (0x1000): Getting client's default ccache
(Tue Apr 18 08:30:48:273393 2017) [sssd[kcm]] [ccdb_sec_get_default_send] (0x2000): Getting the default ccache
(Tue Apr 18 08:30:48:273723 2017) [sssd[kcm]] [tcurl_request_send] (0x0400): Sending TCURL request for http://localhost/kcm/persistent/1000/default, at socket /var/run/secrets.socket
(Tue Apr 18 08:30:48:273828 2017) [sssd[kcm]] [schedule_fd_processing] (0x2000): timeout_ms: 0
(Tue Apr 18 08:30:48:274587 2017) [sssd[kcm]] [kcm_send_reply] (0x2000): Sending a reply
(Tue Apr 18 08:30:48:274802 2017) [sssd[kcm]] [kcm_output_construct] (0x1000): Sending a reply with 36 bytes of payload
(Tue Apr 18 08:30:48:280941 2017) [sssd[kcm]] [handle_socket] (0x2000): Activity on curl socket 14 socket data (nil)
(Tue Apr 18 08:30:48:281043 2017) [sssd[kcm]] [schedule_fd_processing] (0x2000): timeout_ms: 0
(Tue Apr 18 08:30:48:281121 2017) [sssd[kcm]] [check_curl_timeouts] (0x4000): Still tracking 1 outstanding requests
(Tue Apr 18 08:30:48:281227 2017) [sssd[kcm]] [schedule_fd_processing] (0x2000): timeout_ms: 4995
(Tue Apr 18 08:30:48:281319 2017) [sssd[kcm]] [check_curl_timeouts] (0x4000): Still tracking 1 outstanding requests
(Tue Apr 18 08:30:48:282299 2017) [sssd[kcm]] [kcm_send] (0x2000): All data sent!
(Tue Apr 18 08:30:48:282717 2017) [sssd[kcm]] [kcm_input_parse] (0x1000): Received message with length 4
(Tue Apr 18 08:30:48:282789 2017) [sssd[kcm]] [kcm_get_opt] (0x2000): The client requested operation 20
(Tue Apr 18 08:30:48:282870 2017) [sssd[kcm]] [kcm_cmd_send] (0x0400): KCM operation GET_DEFAULT_CACHE
(Tue Apr 18 08:30:48:282935 2017) [sssd[kcm]] [kcm_cmd_send] (0x1000): 0 bytes on KCM input
(Tue Apr 18 08:30:48:283032 2017) [sssd[kcm]] [kcm_op_queue_send] (0x0200): Adding request by 1000 to the wait queue
(Tue Apr 18 08:30:48:284040 2017) [sssd[kcm]] [talloc_log_fn] (0x0010): Bad talloc magic value - wrong talloc version used/mixed
```

Comment 2 Jakub Hrozek 2017-05-24 14:08:29 UTC
 * master:
    * fb51bb68e62de7bb8542f5d224994eb7143040a6
    * 274489b092bba5fc81cb0f803843d56b267c5aaf

Comment 4 Jakub Hrozek 2017-05-26 11:24:36 UTC
The easiest way to reproduce is to acquire a ticket with kinit:
echo -n password | kinit foo@EXAMPLE.COM

And then run multiple parallel klist commands:
for i in $(seq 1 100); do klist -A; done

Before the patch, one of the klist invocations triggers a segfault in sssd_kcm.
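
A minimal sketch that wraps this reproducer into one script; the realm, user and password are placeholders, not values from this bug:

```
#!/bin/bash
# Reproducer sketch from comment 4 -- adjust principal and password to your realm.
export KRB5CCNAME=KCM:

echo -n password | kinit foo@EXAMPLE.COM

# Hammer sssd-kcm with back-to-back requests; before the fix one of the
# klist invocations made sssd_kcm segfault.
for i in $(seq 1 100); do
    klist -A
done
```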

Comment 5 Niranjan Mallapadi Raghavender 2017-06-05 00:13:04 UTC
[root@idm1 sssd]# date;ps -ef | grep sssd_kcm
Mon Jun  5 05:40:17 IST 2017
root     21430 13287  0 05:40 pts/1    00:00:00 grep --color=auto sssd_kcm

06/05/2017 05:10:55  06/05/2017 06:10:53  krbtgt/CENTAUR.TEST@CENTAUR.TEST
06/05/2017 05:10:55  06/05/2017 06:10:53  krbtgt/CENTAUR.TEST@CENTAUR.TEST
06/05/2017 05:10:55  06/05/2017 06:10:53  krbtgt/CENTAUR.TEST@CENTAUR.TEST
[... identical krbtgt/CENTAUR.TEST@CENTAUR.TEST line repeated for the remaining klist invocations ...]



[root@idm1 sssd]# date;ps -ef | grep sssd_kcm
Mon Jun  5 05:41:34 IST 2017
root     21456     1  1 05:40 ?        00:00:00 /usr/libexec/sssd/sssd_kcm --uid 0 --gid 0 --debug-to-files
root     21728 13287  0 05:41 pts/1    00:00:00 grep --color=auto sssd_kcm
[root@idm1 sssd]#

Comment 6 Lukas Slebodnik 2017-06-05 07:25:03 UTC
"date;ps -ef | grep sssd_kcm" is not a sufficient reproducer.

Comment 7 Lukas Slebodnik 2017-06-05 07:25:55 UTC
You can find more details in upstream ticket.

Comment 8 Niranjan Mallapadi Raghavender 2017-06-23 14:19:15 UTC
versions:
========
sssd-common-pac-1.15.2-50.el7.x86_64
sssd-proxy-1.15.2-50.el7.x86_64
sssd-common-1.15.2-50.el7.x86_64
sssd-ad-1.15.2-50.el7.x86_64
python-sssdconfig-1.15.2-50.el7.noarch
sssd-client-1.15.2-50.el7.x86_64
sssd-krb5-common-1.15.2-50.el7.x86_64
sssd-krb5-1.15.2-50.el7.x86_64
sssd-1.15.2-50.el7.x86_64
sssd-ldap-1.15.2-50.el7.x86_64
sssd-ipa-1.15.2-50.el7.x86_64
sssd-kcm-1.15.2-50.el7.x86_64


sssd.conf
[sssd]
config_file_version = 2
services = nss, pam
domains = centaur.test

[domain/centaur.test]
ad_domain = centaur.test
krb5_realm = CENTAUR.TEST
cache_credentials = True
id_provider = ad
krb5_store_password_if_offline = True
default_shell = /bin/bash
ldap_id_mapping = True
use_fully_qualified_names = True
fallback_homedir = /home/%u@%d
ldap_id_mapping = True
access_provider = ad
enumerate = True
debug_level = 6

[pam]
debug_level = 9

[ssh]

[sudo]

[kcm]
debug_level = 9

Ran the tests as follows:

root      root      2282  2070  0 Jun22 ?        00:00:00 tail -f /var/log/sssd/systemctl.log
root      7043     1  0 16:56 ?        00:00:00 /usr/sbin/sssd -i -f
root      7046  7043  0 16:56 ?        00:00:00 /usr/libexec/sssd/sssd_nss --uid 0 --gid 0 --debug-to-files
root      7047  7043  0 16:56 ?        00:00:00 /usr/libexec/sssd/sssd_pam --uid 0 --gid 0 --debug-to-files
root     14851  7043  0 17:56 ?        00:00:21 /usr/libexec/sssd/sssd_be --domain centaur.test --uid 0 --gid 0 --debug-to-files
root     17756     1  0 19:45 ?        00:00:00 /usr/libexec/sssd/sssd_kcm --uid 0 --gid 0 --debug-to-files
root     17762  4301  0 19:45 pts/0    00:00:00 grep --color=auto sssd


1. Create 3 users in AD (foobar1, foobar2, foobar3 with password Secret123)

Open 3 terminals and run the snippets below in parallel (a combined single-script sketch follows the three snippets):

#!/bin/bash
export KRB5CCNAME=KCM:
date
echo Secret123 | kinit foobar1
date;for i in $(seq 1 100); do klist -A; done

On terminal 2

export KRB5CCNAME=KCM:
date
echo Secret123 | kinit foobar2
date;for i in $(seq 1 100); do klist -A; done

On terminal 3

export KRB5CCNAME=KCM:
date
echo Secret123 | kinit foobar3
date;for i in $(seq 1 100); do klist -A; done
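
A minimal sketch that drives the three per-terminal workloads above from a single script by backgrounding them; the user names and password are the test values from this comment, nothing here is SSSD-specific:

```
#!/bin/bash
# Run the three kinit/klist workloads concurrently instead of in three terminals.
run_workload() {
    local user="$1"
    export KRB5CCNAME=KCM:
    date
    echo Secret123 | kinit "$user"
    date
    for i in $(seq 1 100); do
        klist -A
    done
}

run_workload foobar1 &
run_workload foobar2 &
run_workload foobar3 &
wait   # all three jobs hit the same sssd-kcm instance in parallel
```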


After running the above in parallel, I did not see any crash in sssd_kcm.

[root@ipaqavmd sssd]# date ;ps -ef | grep sssd
Fri Jun 23 10:18:13 EDT 2017
root     30734     1  0 09:18 ?        00:00:00 /usr/sbin/sssd -i -f
root     30735 30734  0 09:18 ?        00:00:00 /usr/libexec/sssd/sssd_be --domain juno.test --uid 0 --gid 0 --debug-to-files
root     30736 30734  0 09:18 ?        00:00:00 /usr/libexec/sssd/sssd_nss --uid 0 --gid 0 --debug-to-files
root     30737 30734  0 09:18 ?        00:00:00 /usr/libexec/sssd/sssd_pam --uid 0 --gid 0 --debug-to-files
root     31325     1 11 10:16 ?        00:00:11 /usr/libexec/sssd/sssd_kcm --uid 0 --gid 0 --debug-to-files
root     31605     1 17 10:17 ?        00:00:04 /usr/libexec/sssd/sssd_secrets --uid 0 --gid 0 --debug-to-files
root     31659 30900  0 10:18 pts/2    00:00:00 grep --color=auto sssd
[root@ipaqavmd sssd]# 

I did not see any crash of sssd_kcm, and it kept responding.

Comment 11 Niranjan Mallapadi Raghavender 2017-06-23 22:52:30 UTC

After running the tests multiple times I see this: sssd-secrets crashes, and it looks like sssd-kcm gets restarted while the sssd-secrets service is being restarted.


Initially, the PID of sssd_kcm was 3504:

root      3452     1  0 18:30 ?        00:00:00 /usr/sbin/sssd -i -f
root      3453  3452  0 18:30 ?        00:00:00 /usr/libexec/sssd/sssd_be --domain juno.test --uid 0 --gid 0 --debug-to-files
root      3476  3452  0 18:30 ?        00:00:00 /usr/libexec/sssd/sssd_nss --uid 0 --gid 0 --debug-to-files
root      3477  3452  0 18:30 ?        00:00:00 /usr/libexec/sssd/sssd_pam --uid 0 --gid 0 --debug-to-files
root      3504     1 10 18:30 ?        00:00:02 /usr/libexec/sssd/sssd_kcm --uid 0 --gid 0 --debug-to-files
root      3589     1  0 18:30 ?        00:00:00 /usr/libexec/sssd/sssd_secrets --uid 0 --gid 0 --debug-to-files
root      3594  3454  0 18:30 pts/5    00:00:00 grep --color=auto sssd


After running kinit from multiple terminals simultaneously, I see the following:


Jun 23 18:37:46 ipaqavmd.idmqe.lab.eng.bos.redhat.com sssd[secrets][8549]: Starting up
Jun 23 18:37:46 ipaqavmd.idmqe.lab.eng.bos.redhat.com abrt-server[8548]: Duplicate: core backtrace
Jun 23 18:37:46 ipaqavmd.idmqe.lab.eng.bos.redhat.com abrt-server[8548]: DUP_OF_DIR: /var/spool/abrt/ccpp-2017-06-23-18:33:28-7001
Jun 23 18:37:46 ipaqavmd.idmqe.lab.eng.bos.redhat.com abrt-server[8548]: Deleting problem directory ccpp-2017-06-23-18:37:45-8406 (dup of ccpp-2017-06-23-18:33:28-7001)
Jun 23 18:37:46 ipaqavmd.idmqe.lab.eng.bos.redhat.com dbus[634]: [system] Activating service name='org.freedesktop.problems' (using servicehelper)
Jun 23 18:37:46 ipaqavmd.idmqe.lab.eng.bos.redhat.com dbus-daemon[634]: dbus[634]: [system] Activating service name='org.freedesktop.problems' (using servicehelper)
Jun 23 18:37:46 ipaqavmd.idmqe.lab.eng.bos.redhat.com dbus[634]: [system] Successfully activated service 'org.freedesktop.problems'
Jun 23 18:37:46 ipaqavmd.idmqe.lab.eng.bos.redhat.com dbus-daemon[634]: dbus[634]: [system] Successfully activated service 'org.freedesktop.problems'
Jun 23 18:38:01 ipaqavmd.idmqe.lab.eng.bos.redhat.com sssd_secrets[8549]: Assertion failure: 0 == rv, at ../../.././nspr/pr/src/pthreads/ptthread.c:1010
Jun 23 18:38:01 ipaqavmd.idmqe.lab.eng.bos.redhat.com abrt-hook-ccpp[8636]: Process 8549 (sssd_secrets) of user 0 killed by SIGABRT - ignoring (repeated crash)
Jun 23 18:38:01 ipaqavmd.idmqe.lab.eng.bos.redhat.com systemd[1]: sssd-secrets.service: main process exited, code=dumped, status=6/ABRT
Jun 23 18:38:01 ipaqavmd.idmqe.lab.eng.bos.redhat.com systemd[1]: Unit sssd-secrets.service entered failed state.
Jun 23 18:38:01 ipaqavmd.idmqe.lab.eng.bos.redhat.com systemd[1]: sssd-secrets.service failed.
Jun 23 18:38:01 ipaqavmd.idmqe.lab.eng.bos.redhat.com systemd[1]: Started SSSD Secrets Service responder.
-- Subject: Unit sssd-secrets.service has finished start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
-- 
-- Unit sssd-secrets.service has finished starting up.
-- 
-- The start-up result is done.
Jun 23 18:38:01 ipaqavmd.idmqe.lab.eng.bos.redhat.com systemd[1]: Starting SSSD Secrets Service responder...
-- Subject: Unit sssd-secrets.service has begun start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
-- 
-- Unit sssd-secrets.service has begun starting up.
Jun 23 18:38:01 ipaqavmd.idmqe.lab.eng.bos.redhat.com sssd[secrets][8639]: Starting up
Jun 23 18:38:09 ipaqavmd.idmqe.lab.eng.bos.redhat.com abrt-hook-ccpp[8661]: Process 3504 (sssd_kcm) of user 0 killed by SIGABRT - dumping core
Jun 23 18:38:09 ipaqavmd.idmqe.lab.eng.bos.redhat.com systemd[1]: sssd-kcm.service: main process exited, code=dumped, status=6/ABRT
Jun 23 18:38:09 ipaqavmd.idmqe.lab.eng.bos.redhat.com systemd[1]: Unit sssd-kcm.service entered failed state.
Jun 23 18:38:09 ipaqavmd.idmqe.lab.eng.bos.redhat.com systemd[1]: sssd-kcm.service failed.
Jun 23 18:38:09 ipaqavmd.idmqe.lab.eng.bos.redhat.com systemd[1]: Started SSSD Kerberos Cache Manager.
-- Subject: Unit sssd-kcm.service has finished start-up


I see that BZ https://bugzilla.redhat.com/show_bug.cgi?id=1460689 is filed for the secrets responder crashing. Is this crash similar to BZ 1460689?

Comment 12 Niranjan Mallapadi Raghavender 2017-06-26 01:49:57 UTC
I see 2 crashes, sssd-kcm and sssd-secrets. Attaching the core backtraces.

Comment 13 Niranjan Mallapadi Raghavender 2017-06-26 01:50:51 UTC
Created attachment 1291805 [details]
sssd_kcm core_backtrace

Comment 14 Niranjan Mallapadi Raghavender 2017-06-26 01:53:40 UTC
Created attachment 1291806 [details]
sssd_secrets core_backtrace

Comment 16 Jakub Hrozek 2017-06-26 07:34:05 UTC
(In reply to Niranjan Mallapadi Raghavender from comment #13)
> Created attachment 1291805 [details]
> sssd_kcm core_backtrace

This is a different backtrace than the original report:
#0  0x00007ff6179621f7 in raise () from /lib64/libc.so.6
No symbol table info available.
#1  0x00007ff6179638e8 in abort () from /lib64/libc.so.6
No symbol table info available.
#2  0x00007ff6180fbcfc in talloc_abort (reason=0x7ff6181047d8 "Bad talloc magic value - wrong talloc version used/mixed") at ../talloc.c:426
No locals.
#3  0x00007ff6180ff702 in talloc_abort_magic (magic=<optimized out>) at ../talloc.c:434
No locals.
#4  talloc_chunk_from_ptr (ptr=0x55da430952a0) at ../talloc.c:454
        pp = 0x55da430952a0 "`\344\nA\332U"
        tc = 0x55da43095240
#5  __talloc_get_name (ptr=0x55da430952a0) at ../talloc.c:1486
No locals.
#6  _talloc_get_type_abort (ptr=0x55da430952a0, name=name@entry=0x55da410d8675 "struct tevent_req", location=location@entry=0x55da410dd5a0 "src/responder/kcm/kcmsrv_ops.c:178") at ../talloc.c:1543
No locals.
#7  0x000055da410b9858 in kcm_cmd_queue_done (subreq=0x55da430958f0) at src/responder/kcm/kcmsrv_ops.c:178
        req = <optimized out>
        state = <optimized out>
        ret = <optimized out>
        __FUNCTION__ = "kcm_cmd_queue_done"
#8  0x000055da410be367 in kcm_op_queue_entry_destructor (entry=entry@entry=0x55da430991e0) at src/responder/kcm/kcmsrv_op_queue.c:146
        next_entry = <optimized out>
        imm = <optimized out>
#9  0x00007ff6180fb2a0 in _tc_free_internal (location=0x7ff6183139b3 "../tevent_req.c:255", tc=0x55da43099180) at ../talloc.c:1078
        d = <optimized out>
        ptr_to_free = <optimized out>
        ptr = 0x55da430991e0
#10 _tc_free_children_internal (location=0x7ff6183139b3 "../tevent_req.c:255", ptr=0x55da430c50c0, tc=0x55da430c5060) at ../talloc.c:1593
        child = 0x55da430991e0
        new_parent = <optimized out>
#11 _tc_free_internal (tc=0x55da430c5060, location=0x7ff6183139b3 "../tevent_req.c:255") at ../talloc.c:1104

This might be some use-after-free. Are you able to reproduce this at will with valgrind as well? (Just running valgrind /usr/libexec/sssd/sssd_kcm --uid 0 --gid 0 --debug-to-stderr instead of the service might do.)

If this is reproducible at will, with or without valgrind, then I think this bug should fail QE.
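
A sketch of running sssd_kcm directly under valgrind as suggested above; it assumes the socket-activated service is stopped first, and the valgrind flags are generic memcheck options rather than an SSSD recommendation:

```
systemctl stop sssd-kcm.socket sssd-kcm.service

valgrind --tool=memcheck --leak-check=full --track-origins=yes \
         --num-callers=30 --log-file=/tmp/valgrind_kcm.log \
         /usr/libexec/sssd/sssd_kcm --uid 0 --gid 0 --debug-to-stderr
```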

Comment 17 Jakub Hrozek 2017-06-26 07:34:57 UTC
(In reply to Niranjan Mallapadi Raghavender from comment #14)
> Created attachment 1291806 [details]
> sssd_secrets core_backtrace

This is https://bugzilla.redhat.com/show_bug.cgi?id=1460689

Comment 19 Niranjan Mallapadi Raghavender 2017-06-27 07:57:10 UTC
Created attachment 1292204 [details]
Valgrind Output

Comment 20 Niranjan Mallapadi Raghavender 2017-06-27 07:57:57 UTC
Ran the scripts kcm1, kcm2, kcm3 in parallel from different terminals. I have attached the valgrind output.

Comment 21 Lukas Slebodnik 2017-06-27 08:02:51 UTC
(In reply to Niranjan Mallapadi Raghavender from comment #19)
> Created attachment 1292204 [details]
> Valgrind Output

That valgrind output shows just leaks caused by improper shutdown.
There is no use-after-free.

I'll try to run kcm* scripts in parallel.

Comment 22 Jakub Hrozek 2017-06-27 08:40:10 UTC
(In reply to Lukas Slebodnik from comment #21)
> (In reply to Niranjan Mallapadi Raghavender from comment #19)
> > Created attachment 1292204 [details]
> > Valgrind Output
> 
> That valgrind output shows just leaks caused by improper shutdown.
> There is not any use-after free.
> 
> I'll try to run kcm* scripts in parallel.

It might be a good idea to patch sssd-secrets with github PR #225, otherwise sssd-secrets will crash quite soon.

(I tried running the scripts in parallel yesterday and only saw sssd-secrets crashes, not any sssd-kcm crash..)

Comment 23 Lukas Slebodnik 2017-06-27 08:43:27 UTC
(In reply to Jakub Hrozek from comment #22)
> (In reply to Lukas Slebodnik from comment #21)
> > (In reply to Niranjan Mallapadi Raghavender from comment #19)
> > > Created attachment 1292204 [details]
> > > Valgrind Output
> > 
> > That valgrind output shows just leaks caused by improper shutdown.
> > There is not any use-after free.
> > 
> > I'll try to run kcm* scripts in parallel.
> 
> It might be a good idea to patch sssd-secrets with github PR #225, otherwise
> sssd-secrets will crash quite soon.
> 
> (I tried running the scripts in parallel yesterday and only saw sssd-secrets
> crashes, not any sssd-kcm crash..)

But that might be part of the reason why sssd-kcm crashed :-)

Comment 24 Lukas Slebodnik 2017-06-27 08:45:19 UTC
(In reply to Jakub Hrozek from comment #22)
> (In reply to Lukas Slebodnik from comment #21)
> > (In reply to Niranjan Mallapadi Raghavender from comment #19)
> > > Created attachment 1292204 [details]
> > > Valgrind Output
> > 
> > That valgrind output shows just leaks caused by improper shutdown.
> > There is not any use-after free.
> > 
> > I'll try to run kcm* scripts in parallel.
> 
> It might be a good idea to patch sssd-secrets with github PR #225, otherwise
> sssd-secrets will crash quite soon.
> 
> (I tried running the scripts in parallel yesterday and only saw sssd-secrets
> crashes, not any sssd-kcm crash..)

But that might be part of the reason why sssd-kcm crashed :-)

I somehow managed to reproduce the crash without valgrind.

sh# abrt-cli ls
id 141e0bf99a5fe6cc79eb66a2384d241496e59b42
reason:         sssd_kcm killed by SIGSEGV
time:           Tue 27 Jun 2017 04:30:44 AM EDT
cmdline:        /usr/libexec/sssd/sssd_kcm --uid 0 --gid 0 --debug-to-files
package:        sssd-kcm-1.15.2-50.el7
uid:            0 (root)
count:          1
Directory:      /var/spool/abrt/ccpp-2017-06-27-04:30:44-16530
Run 'abrt-cli report /var/spool/abrt/ccpp-2017-06-27-04:30:44-16530' for creating a case in Red Hat Customer Portal

And the related part of the sssd-kcm log file:
(Tue Jun 27 04:30:41:055538 2017) [sssd[kcm]] [handle_socket] (0x2000): Activity on curl socket 13 socket data 0x55e575847fc0
(Tue Jun 27 04:30:41:055557 2017) [sssd[kcm]] [handle_curlmsg_done] (0x0400): Handled http://localhost/kcm/persistent/0/ccache/
(Tue Jun 27 04:30:41:055580 2017) [sssd[kcm]] [tcurl_request_done] (0x0400): TCURL request finished [0]: Success
(Tue Jun 27 04:30:41:055597 2017) [sssd[kcm]] [schedule_fd_processing] (0x2000): timeout_ms: -1
(Tue Jun 27 04:30:41:055618 2017) [sssd[kcm]] [check_curl_timeouts] (0x4000): Still tracking 0 outstanding requests
(Tue Jun 27 04:30:41:055643 2017) [sssd[kcm]] [sec_list_done] (0x2000): Found 0 items
(Tue Jun 27 04:30:41:055669 2017) [sssd[kcm]] [sec_list_done] (0x2000): list done
(Tue Jun 27 04:30:41:055687 2017) [sssd[kcm]] [sec_get_send] (0x2000): Retrieving ccache 306b717b-899a-459d-96ce-30e9d20f0263-0:6652
(Tue Jun 27 04:30:41:055727 2017) [sssd[kcm]] [tcurl_request_send] (0x0400): Sending TCURL request for http://localhost/kcm/persistent/0/ccache/306b717b-899a-459d-96ce-30e9d20f0263-0:6652, at socket /var/run/secrets.socket
(Tue Jun 27 04:30:41:055748 2017) [sssd[kcm]] [schedule_fd_processing] (0x2000): timeout_ms: 1
(Tue Jun 27 04:30:41:056919 2017) [sssd[kcm]] [handle_socket] (0x2000): Activity on curl socket 13 socket data (nil)
(Tue Jun 27 04:30:41:056955 2017) [sssd[kcm]] [schedule_fd_processing] (0x2000): timeout_ms: 5000
(Tue Jun 27 04:30:41:056972 2017) [sssd[kcm]] [check_curl_timeouts] (0x4000): Still tracking 1 outstanding requests
(Tue Jun 27 04:30:43:971912 2017) [sssd[kcm]] [responder_idle_handler] (0x2000): Terminating idle responder [0x55e57582d6a0]
(Tue Jun 27 04:30:43:972009 2017) [sssd[kcm]] [kcm_responder_ctx_destructor] (0x0400): Responder is being shut down
(Tue Jun 27 04:30:43:972065 2017) [sssd[kcm]] [client_close_fn] (0x2000): Terminated client [0x55e5758725d0][17]
(Tue Jun 27 04:30:43:972131 2017) [sssd[kcm]] [client_close_fn] (0x2000): Terminated client [0x55e575872240][16]
(Tue Jun 27 04:30:43:972154 2017) [sssd[kcm]] [tcurl_request_destructor] (0x0080): Terminating TCURL request...
(Tue Jun 27 04:30:43:972235 2017) [sssd[kcm]] [handle_socket] (0x2000): Activity on curl socket 13 socket data 0x55e575847b20
(Tue Jun 27 04:30:43:972256 2017) [sssd[kcm]] [schedule_fd_processing] (0x2000): timeout_ms: -1
(Tue Jun 27 04:30:43:972273 2017) [sssd[kcm]] [check_curl_timeouts] (0x4000): Still tracking 0 outstanding requests
(Tue Jun 27 04:30:44:195682 2017) [sssd[kcm]] [server_setup] (0x0400): CONFDB: /var/lib/sss/db/config.ldb


I will try to reproduce with valgrind.

Comment 25 Lukas Slebodnik 2017-06-27 09:16:54 UTC
I do not think that the sssd-kcm crash is related, because there is a 30-second delay between them.

And maybe the crash in sssd-secrets really is related.

 Jun 27 04:30:25 host.example.com sssd[secrets][26160]: Starting up
Jun 27 04:30:41 host.example.com sssd_secrets[26160]: Assertion failure: 0 == rv, at ../../.././nspr/pr/src/pthreads/ptthread.c:1010
Jun 27 04:30:41 host.example.com abrt-hook-ccpp[26803]: Process 26160 (sssd_secrets) of user 0 killed by SIGABRT - ignoring (repeated crash)
Jun 27 04:30:41 host.example.com systemd[1]: sssd-secrets.service: main process exited, code=dumped, status=6/ABRT
Jun 27 04:30:41 host.example.com systemd[1]: Unit sssd-secrets.service entered failed state.
Jun 27 04:30:41 host.example.com systemd[1]: sssd-secrets.service failed.
Jun 27 04:30:41 host.example.com systemd[1]: Started SSSD Secrets Service responder.
Jun 27 04:30:41 host.example.com systemd[1]: Starting SSSD Secrets Service responder...
Jun 27 04:30:41 host.example.com sssd[secrets][26805]: Starting up
Jun 27 04:30:43 host.example.com kernel: traps: sssd_kcm[16530] general protection ip:7f1d8797fa76 sp:7ffc2757b0a0 error:0 in libtalloc.so.2.1.9[7f1d8797c000+f000]
Jun 27 04:30:44 host.example.com abrt-hook-ccpp[26806]: Process 16530 (sssd_kcm) of user 0 killed by SIGSEGV - dumping core
Jun 27 04:30:44 host.example.com systemd[1]: sssd-kcm.service: main process exited, code=dumped, status=11/SEGV
Jun 27 04:30:44 host.example.com systemd[1]: Unit sssd-kcm.service entered failed state.
Jun 27 04:30:44 host.example.com systemd[1]: sssd-kcm.service failed.


(Tue Jun 27 05:09:17:541498 2017) [sssd[kcm]] [handle_socket] (0x2000): Activity on curl socket 15 socket data 0xdbce670
(Tue Jun 27 05:09:17:541710 2017) [sssd[kcm]] [handle_curlmsg_done] (0x0400): Handled http://localhost/kcm/persistent/0/ccache/28654582-0eb6-48d0-b4f3-55c9845e395c-0:6652
(Tue Jun 27 05:09:17:541974 2017) [sssd[kcm]] [tcurl_request_done] (0x0400): TCURL request finished [0]: Success
(Tue Jun 27 05:09:17:542143 2017) [sssd[kcm]] [schedule_fd_processing] (0x2000): timeout_ms: -1
(Tue Jun 27 05:09:17:542303 2017) [sssd[kcm]] [check_curl_timeouts] (0x4000): Still tracking 0 outstanding requests
(Tue Jun 27 05:09:17:546962 2017) [sssd[kcm]] [sec_get_done] (0x2000): GET done
(Tue Jun 27 05:09:17:547220 2017) [sssd[kcm]] [ccdb_sec_getbyname_done] (0x2000): Got ccache by UUID
(Tue Jun 27 05:09:17:547577 2017) [sssd[kcm]] [kcm_cmd_done] (0x0400): KCM operation GET_PRINCIPAL returned [0]: Success
(Tue Jun 27 05:09:17:547818 2017) [sssd[kcm]] [kcm_op_get_kdc_offset_send] (0x1000): Requested offset for principal 0:6652
(Tue Jun 27 05:09:17:548045 2017) [sssd[kcm]] [ccdb_sec_getbyname_send] (0x2000): Getting ccache by name
(Tue Jun 27 05:09:17:548218 2017) [sssd[kcm]] [sec_list_send] (0x2000): Listing all ccaches in the secrets store
(Tue Jun 27 05:09:17:548513 2017) [sssd[kcm]] [tcurl_request_send] (0x0400): Sending TCURL request for http://localhost/kcm/persistent/0/ccache/, at socket /var/run/secrets.socket
(Tue Jun 27 05:09:17:548719 2017) [sssd[kcm]] [schedule_fd_processing] (0x2000): timeout_ms: 1
(Tue Jun 27 05:09:17:548999 2017) [sssd[kcm]] [kcm_send_reply] (0x2000): Sending a reply
(Tue Jun 27 05:09:17:549166 2017) [sssd[kcm]] [kcm_output_construct] (0x1000): Sending a reply with 36 bytes of payload
(Tue Jun 27 05:09:17:549415 2017) [sssd[kcm]] [kcm_send] (0x2000): All data sent!
(Tue Jun 27 05:09:17:550251 2017) [sssd[kcm]] [responder_idle_handler] (0x2000): Terminating idle responder [0xd9e31d0]
(Tue Jun 27 05:09:17:550879 2017) [sssd[kcm]] [kcm_responder_ctx_destructor] (0x0400): Responder is being shut down
(Tue Jun 27 05:09:17:551630 2017) [sssd[kcm]] [client_close_fn] (0x2000): Terminated client [0x163c9af0][18]
(Tue Jun 27 05:09:17:552104 2017) [sssd[kcm]] [tcurl_request_destructor] (0x0080): Terminating TCURL request...
(Tue Jun 27 05:09:17:552602 2017) [sssd[kcm]] [schedule_fd_processing] (0x2000): timeout_ms: -1
(Tue Jun 27 05:09:17:552799 2017) [sssd[kcm]] [check_curl_timeouts] (0x4000): Still tracking 0 outstanding requests
(Tue Jun 27 05:09:26:512198 2017) [sssd[kcm]] [server_setup] (0x0400): CONFDB: /var/lib/sss/db/config.ldb

I will attach the valgrind log in another comment.

Comment 26 Lukas Slebodnik 2017-06-27 09:19:23 UTC
Created attachment 1292228 [details]
valgrind use after free

Comment 27 Jakub Hrozek 2017-06-27 12:27:09 UTC
Thank you for the analysis. 

What I find strange is that the free that triggered the use-after-free was caused by:
==00:00:07:37.734 30982==    by 0x12B301: responder_idle_handler (responder_common.c:378)
==00:00:07:37.734 30982==    by 0x997BC96: tevent_common_loop_timer_delay (tevent_timed.c:369)

I don't know if this is caused by the sssd-secrets crash or not. But I'm quite confident that this kcm crash is not related to the original report of a concurrency issue in the kcm_op_queue code. In my opinion, this bug should be marked as VERIFIED and another one should be opened for the new issue.

Comment 28 Niranjan Mallapadi Raghavender 2017-06-27 12:51:02 UTC
Based on the above comments, marking this bug as verified.

Comment 29 errata-xmlrpc 2017-08-01 09:06:23 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:2294

Comment 30 Mike G 2017-09-28 11:59:35 UTC
I just updated to the patches specified here and am still facing this same issue. The patched version does not resolve the service crash.

Thanks

Comment 31 Jakub Hrozek 2017-09-28 13:15:02 UTC
(In reply to Mike G from comment #30)
> I just updated to the patches specified here and am still facing this same
> issue. The patched version does not resolve the service crash.
> 
> Thanks

Please open a new bugzilla and attach the core files and the debug logs.

Comment 32 Lukas Slebodnik 2017-10-02 11:29:13 UTC
(In reply to Mike G from comment #30)
> I just updated to the patches specified here and am still facing this same
> issue. The patched version does not resolve the service crash.
> 
You probably hit a different bug with similar symptoms: https://bugzilla.redhat.com/show_bug.cgi?id=1465457

