Bug 1446302
| Field | Value |
|---|---|
| Summary | crash in sssd-kcm due to a race-condition between two concurrent requests |
| Product | Red Hat Enterprise Linux 7 |
| Component | sssd |
| Version | 7.3 |
| Status | CLOSED ERRATA |
| Severity | unspecified |
| Priority | unspecified |
| Reporter | Jakub Hrozek <jhrozek> |
| Assignee | SSSD Maintainers <sssd-maint> |
| QA Contact | Niranjan Mallapadi Raghavender <mniranja> |
| CC | grajaiya, jhrozek, lslebodn, mgomon, mkosek, mniranja, mzidek, nsoman, pbrezina, sgoveas, tscherf |
| Target Milestone | rc |
| Target Release | --- |
| Flags | mniranja: needinfo- |
| Fixed In Version | sssd-1.15.2-35.el7 |
| Hardware | Unspecified |
| OS | Unspecified |
| Last Closed | 2017-08-01 09:06:23 UTC |
Description: Jakub Hrozek, 2017-04-27 15:43:03 UTC
* master:
  * fb51bb68e62de7bb8542f5d224994eb7143040a6
  * 274489b092bba5fc81cb0f803843d56b267c5aaf
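For anyone tracking the fix downstream, a quick way to see which upstream tags already contain these commits. This is a sketch that assumes a local clone of the upstream repository at github.com/SSSD/sssd:

```bash
git clone https://github.com/SSSD/sssd.git
cd sssd

# List release tags that already include each fix commit.
git tag --contains fb51bb68e62de7bb8542f5d224994eb7143040a6
git tag --contains 274489b092bba5fc81cb0f803843d56b267c5aaf
```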
The easiest way to reproduce is to acquire a ticket with kinit:

```
echo -n password | kinit foo
```

and then run multiple klist commands:

```
for i in $(seq 1 100); do klist -A; done
```

Before the patch, one of the klists would trigger a segfault in sssd_kcm.

```
[root@idm1 sssd]# date; ps -ef | grep sssd_kcm
Mon Jun 5 05:40:17 IST 2017
root     21430 13287  0 05:40 pts/1    00:00:00 grep --color=auto sssd_kcm
```

```
krbtgt/CENTAUR.TEST 06/05/2017 05:10:55 06/05/2017 06:10:53
krbtgt/CENTAUR.TEST 06/05/2017 05:10:55 06/05/2017 06:10:53
krbtgt/CENTAUR.TEST 06/05/2017 05:10:55 06/05/2017 06:10:53
[... the same krbtgt/CENTAUR.TEST entry repeated for each of the ~100 klist invocations ...]
```

```
[root@idm1 sssd]# date; ps -ef | grep sssd_kcm
Mon Jun 5 05:41:34 IST 2017
root     21456     1  1 05:40 ?        00:00:00 /usr/libexec/sssd/sssd_kcm --uid 0 --gid 0 --debug-to-files
root     21728 13287  0 05:41 pts/1    00:00:00 grep --color=auto sssd_kcm
[root@idm1 sssd]#
```

Running "date; ps -ef | grep sssd_kcm" is not a sufficient reproducer. You can find more details in the upstream ticket.
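Note that the loop above runs the klist invocations one after another; the crash window is the overlap between concurrent requests, so a variant that actually backgrounds the clients stresses the request queue harder. A sketch, using the same placeholder principal foo and password as above:

```bash
#!/bin/bash
# Acquire a ticket in the KCM cache, then fire off concurrent klists.
export KRB5CCNAME=KCM:
echo -n password | kinit foo

for i in $(seq 1 100); do
    klist -A >/dev/null 2>&1 &     # background each client
done
wait

# With an unpatched sssd-kcm, the unit typically shows up as failed here.
systemctl status sssd-kcm.service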
Versions:

```
sssd-common-pac-1.15.2-50.el7.x86_64
sssd-proxy-1.15.2-50.el7.x86_64
sssd-common-1.15.2-50.el7.x86_64
sssd-ad-1.15.2-50.el7.x86_64
python-sssdconfig-1.15.2-50.el7.noarch
sssd-client-1.15.2-50.el7.x86_64
sssd-krb5-common-1.15.2-50.el7.x86_64
sssd-krb5-1.15.2-50.el7.x86_64
sssd-1.15.2-50.el7.x86_64
sssd-ldap-1.15.2-50.el7.x86_64
sssd-ipa-1.15.2-50.el7.x86_64
sssd-kcm-1.15.2-50.el7.x86_64
```

sssd.conf:

```
[sssd]
config_file_version = 2
services = nss, pam
domains = centaur.test

[domain/centaur.test]
ad_domain = centaur.test
krb5_realm = CENTAUR.TEST
cache_credentials = True
id_provider = ad
krb5_store_password_if_offline = True
default_shell = /bin/bash
ldap_id_mapping = True
use_fully_qualified_names = True
fallback_homedir = /home/%u@%d
ldap_id_mapping = True
access_provider = ad
enumerate = True
debug_level = 6

[pam]
debug_level = 9

[ssh]

[sudo]

[kcm]
debug_level = 9
```

Ran the tests in the below method:

```
root  2282  2070  0 Jun22 ?    00:00:00 tail -f /var/log/sssd/systemctl.log
root  7043     1  0 16:56 ?    00:00:00 /usr/sbin/sssd -i -f
root  7046  7043  0 16:56 ?    00:00:00 /usr/libexec/sssd/sssd_nss --uid 0 --gid 0 --debug-to-files
root  7047  7043  0 16:56 ?    00:00:00 /usr/libexec/sssd/sssd_pam --uid 0 --gid 0 --debug-to-files
root 14851  7043  0 17:56 ?    00:00:21 /usr/libexec/sssd/sssd_be --domain centaur.test --uid 0 --gid 0 --debug-to-files
root 17756     1  0 19:45 ?    00:00:00 /usr/libexec/sssd/sssd_kcm --uid 0 --gid 0 --debug-to-files
root 17762  4301  0 19:45 pts/0 00:00:00 grep --color=auto sssd
```

1. Create 3 users in AD (foobar1, foobar2, foobar3, all with password Secret123).
2. Open 3 terminals and run the below in parallel, one user per terminal (a consolidated script is sketched after the steps).

On terminal 1:

```
#!/bin/bash
export KRB5CCNAME=KCM:
date
echo Secret123 | kinit foobar1
date; for i in $(seq 1 100); do klist -A; done
```

On terminals 2 and 3, the same with foobar2 and foobar3, respectively.
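The three per-terminal snippets differ only in the user name, so they can be folded into one small helper script. This is a sketch, not part of the original report; the script name kcm-stress.sh is hypothetical and the Secret123 password comes from the steps above:

```bash
#!/bin/bash
# Usage: run concurrently in three terminals, e.g.:
#   ./kcm-stress.sh foobar1
#   ./kcm-stress.sh foobar2
#   ./kcm-stress.sh foobar3
user=${1:?usage: $0 <username>}

# Point libkrb5 at the KCM credential cache served by sssd-kcm.
export KRB5CCNAME=KCM:

date
echo Secret123 | kinit "$user"

# Hammer the KCM responder with 100 list operations.
date
for i in $(seq 1 100); do klist -A; done
```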
After running the above in parallel, there was no crash in sssd_kcm:

```
[root@ipaqavmd sssd]# date; ps -ef | grep sssd
Fri Jun 23 10:18:13 EDT 2017
root 30734     1  0 09:18 ?    00:00:00 /usr/sbin/sssd -i -f
root 30735 30734  0 09:18 ?    00:00:00 /usr/libexec/sssd/sssd_be --domain juno.test --uid 0 --gid 0 --debug-to-files
root 30736 30734  0 09:18 ?    00:00:00 /usr/libexec/sssd/sssd_nss --uid 0 --gid 0 --debug-to-files
root 30737 30734  0 09:18 ?    00:00:00 /usr/libexec/sssd/sssd_pam --uid 0 --gid 0 --debug-to-files
root 31325     1 11 10:16 ?    00:00:11 /usr/libexec/sssd/sssd_kcm --uid 0 --gid 0 --debug-to-files
root 31605     1 17 10:17 ?    00:00:04 /usr/libexec/sssd/sssd_secrets --uid 0 --gid 0 --debug-to-files
root 31659 30900  0 10:18 pts/2 00:00:00 grep --color=auto sssd
[root@ipaqavmd sssd]#
```

sssd_kcm did not crash and was responding.

After running the tests multiple times, I see that sssd-secrets crashes, and it looks like sssd-kcm is restarted while the sssd-secrets service is being restarted. Initially the PID of sssd_kcm was 3504:

```
root 3452    1  0 18:30 ?    00:00:00 /usr/sbin/sssd -i -f
root 3453 3452  0 18:30 ?    00:00:00 /usr/libexec/sssd/sssd_be --domain juno.test --uid 0 --gid 0 --debug-to-files
root 3476 3452  0 18:30 ?    00:00:00 /usr/libexec/sssd/sssd_nss --uid 0 --gid 0 --debug-to-files
root 3477 3452  0 18:30 ?    00:00:00 /usr/libexec/sssd/sssd_pam --uid 0 --gid 0 --debug-to-files
root 3504    1 10 18:30 ?    00:00:02 /usr/libexec/sssd/sssd_kcm --uid 0 --gid 0 --debug-to-files
root 3589    1  0 18:30 ?    00:00:00 /usr/libexec/sssd/sssd_secrets --uid 0 --gid 0 --debug-to-files
root 3594 3454  0 18:30 pts/5 00:00:00 grep --color=auto sssd
```

After running kinit from multiple terminals simultaneously, I see the below:

```
Jun 23 18:37:46 ipaqavmd.idmqe.lab.eng.bos.redhat.com sssd[secrets][8549]: Starting up
Jun 23 18:37:46 ipaqavmd.idmqe.lab.eng.bos.redhat.com abrt-server[8548]: Duplicate: core backtrace
Jun 23 18:37:46 ipaqavmd.idmqe.lab.eng.bos.redhat.com abrt-server[8548]: DUP_OF_DIR: /var/spool/abrt/ccpp-2017-06-23-18:33:28-7001
Jun 23 18:37:46 ipaqavmd.idmqe.lab.eng.bos.redhat.com abrt-server[8548]: Deleting problem directory ccpp-2017-06-23-18:37:45-8406 (dup of ccpp-2017-06-23-18:33:28-7001)
Jun 23 18:37:46 ipaqavmd.idmqe.lab.eng.bos.redhat.com dbus[634]: [system] Activating service name='org.freedesktop.problems' (using servicehelper)
Jun 23 18:37:46 ipaqavmd.idmqe.lab.eng.bos.redhat.com dbus-daemon[634]: dbus[634]: [system] Activating service name='org.freedesktop.problems' (using servicehelper)
Jun 23 18:37:46 ipaqavmd.idmqe.lab.eng.bos.redhat.com dbus[634]: [system] Successfully activated service 'org.freedesktop.problems'
Jun 23 18:37:46 ipaqavmd.idmqe.lab.eng.bos.redhat.com dbus-daemon[634]: dbus[634]: [system] Successfully activated service 'org.freedesktop.problems'
Jun 23 18:38:01 ipaqavmd.idmqe.lab.eng.bos.redhat.com sssd_secrets[8549]: Assertion failure: 0 == rv, at ../../.././nspr/pr/src/pthreads/ptthread.c:1010
Jun 23 18:38:01 ipaqavmd.idmqe.lab.eng.bos.redhat.com abrt-hook-ccpp[8636]: Process 8549 (sssd_secrets) of user 0 killed by SIGABRT - ignoring (repeated crash)
Jun 23 18:38:01 ipaqavmd.idmqe.lab.eng.bos.redhat.com systemd[1]: sssd-secrets.service: main process exited, code=dumped, status=6/ABRT
Jun 23 18:38:01 ipaqavmd.idmqe.lab.eng.bos.redhat.com systemd[1]: Unit sssd-secrets.service entered failed state.
Jun 23 18:38:01 ipaqavmd.idmqe.lab.eng.bos.redhat.com systemd[1]: sssd-secrets.service failed.
Jun 23 18:38:01 ipaqavmd.idmqe.lab.eng.bos.redhat.com systemd[1]: Started SSSD Secrets Service responder.
-- Subject: Unit sssd-secrets.service has finished start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit sssd-secrets.service has finished starting up.
--
-- The start-up result is done.
Jun 23 18:38:01 ipaqavmd.idmqe.lab.eng.bos.redhat.com systemd[1]: Starting SSSD Secrets Service responder...
-- Subject: Unit sssd-secrets.service has begun start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit sssd-secrets.service has begun starting up.
Jun 23 18:38:01 ipaqavmd.idmqe.lab.eng.bos.redhat.com sssd[secrets][8639]: Starting up
Jun 23 18:38:09 ipaqavmd.idmqe.lab.eng.bos.redhat.com abrt-hook-ccpp[8661]: Process 3504 (sssd_kcm) of user 0 killed by SIGABRT - dumping core
Jun 23 18:38:09 ipaqavmd.idmqe.lab.eng.bos.redhat.com systemd[1]: sssd-kcm.service: main process exited, code=dumped, status=6/ABRT
Jun 23 18:38:09 ipaqavmd.idmqe.lab.eng.bos.redhat.com systemd[1]: Unit sssd-kcm.service entered failed state.
Jun 23 18:38:09 ipaqavmd.idmqe.lab.eng.bos.redhat.com systemd[1]: sssd-kcm.service failed.
Jun 23 18:38:09 ipaqavmd.idmqe.lab.eng.bos.redhat.com systemd[1]: Started SSSD Kerberos Cache Manager.
-- Subject: Unit sssd-kcm.service has finished start-up
```

I see there is BZ https://bugzilla.redhat.com/show_bug.cgi?id=1460689 filed for the secrets responder crashing. Is this crash similar to BZ 1460689?
I see two crashes, sssd-kcm and sssd-secrets. Attaching the core backtraces.

Created attachment 1291805 [details]: sssd_kcm core_backtrace

Created attachment 1291806 [details]: sssd_secrets core_backtrace
(In reply to Niranjan Mallapadi Raghavender from comment #13)
> Created attachment 1291805 [details]
> sssd_kcm core_backtrace

This is a different backtrace than the original report:

```
#0  0x00007ff6179621f7 in raise () from /lib64/libc.so.6
No symbol table info available.
#1  0x00007ff6179638e8 in abort () from /lib64/libc.so.6
No symbol table info available.
#2  0x00007ff6180fbcfc in talloc_abort (reason=0x7ff6181047d8 "Bad talloc magic value - wrong talloc version used/mixed") at ../talloc.c:426
No locals.
#3  0x00007ff6180ff702 in talloc_abort_magic (magic=<optimized out>) at ../talloc.c:434
No locals.
#4  talloc_chunk_from_ptr (ptr=0x55da430952a0) at ../talloc.c:454
        pp = 0x55da430952a0 "`\344\nA\332U"
        tc = 0x55da43095240
#5  __talloc_get_name (ptr=0x55da430952a0) at ../talloc.c:1486
No locals.
#6  _talloc_get_type_abort (ptr=0x55da430952a0, name=name@entry=0x55da410d8675 "struct tevent_req", location=location@entry=0x55da410dd5a0 "src/responder/kcm/kcmsrv_ops.c:178") at ../talloc.c:1543
No locals.
#7  0x000055da410b9858 in kcm_cmd_queue_done (subreq=0x55da430958f0) at src/responder/kcm/kcmsrv_ops.c:178
        req = <optimized out>
        state = <optimized out>
        ret = <optimized out>
        __FUNCTION__ = "kcm_cmd_queue_done"
#8  0x000055da410be367 in kcm_op_queue_entry_destructor (entry=entry@entry=0x55da430991e0) at src/responder/kcm/kcmsrv_op_queue.c:146
        next_entry = <optimized out>
        imm = <optimized out>
#9  0x00007ff6180fb2a0 in _tc_free_internal (location=0x7ff6183139b3 "../tevent_req.c:255", tc=0x55da43099180) at ../talloc.c:1078
        d = <optimized out>
        ptr_to_free = <optimized out>
        ptr = 0x55da430991e0
#10 _tc_free_children_internal (location=0x7ff6183139b3 "../tevent_req.c:255", ptr=0x55da430c50c0, tc=0x55da430c5060) at ../talloc.c:1593
        child = 0x55da430991e0
        new_parent = <optimized out>
#11 _tc_free_internal (tc=0x55da430c5060, location=0x7ff6183139b3 "../tevent_req.c:255") at ../talloc.c:1104
```

This might be some use-after-free. Are you able to reproduce this at will with valgrind as well? (Just running `valgrind /usr/libexec/sssd/sssd_kcm --uid 0 --gid 0 --debug-to-stderr` instead of the service might do.) If this is reproducible at will, with or without valgrind, then I think this bug should fail QE.

(In reply to Niranjan Mallapadi Raghavender from comment #14)
> Created attachment 1291806 [details]
> sssd_secrets core_backtrace

This is https://bugzilla.redhat.com/show_bug.cgi?id=1460689

Created attachment 1292204 [details]: Valgrind Output
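For completeness, a slightly more verbose way to run the responder under valgrind than the one-liner suggested above. The extra flags are generic valgrind options, and the unit names are assumptions based on this report (sssd-kcm is socket-activated, so stopping the socket unit first avoids racing with an auto-started instance):

```bash
# Stop the socket-activated service so it does not conflict with the
# manually started instance.
systemctl stop sssd-kcm.service sssd-kcm.socket

# Run the responder in the foreground under valgrind, logging to a file
# that can be attached to the bug.
valgrind --leak-check=full --track-origins=yes \
         --log-file=/tmp/sssd_kcm.valgrind \
         /usr/libexec/sssd/sssd_kcm --uid 0 --gid 0 --debug-to-stderr
```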
Ran the scripts kcm1, kcm2, kcm3 in parallel from different terminals. I have attached the valgrind output.

(In reply to Niranjan Mallapadi Raghavender from comment #19)
> Created attachment 1292204 [details]
> Valgrind Output

That valgrind output shows just leaks caused by improper shutdown. There is not any use-after-free.

I'll try to run the kcm* scripts in parallel.

(In reply to Lukas Slebodnik from comment #21)
> That valgrind output shows just leaks caused by improper shutdown.
> There is not any use-after-free.
>
> I'll try to run kcm* scripts in parallel.

It might be a good idea to patch sssd-secrets with github PR #225, otherwise sssd-secrets will crash quite soon.

(I tried running the scripts in parallel yesterday and only saw sssd-secrets crashes, not any sssd-kcm crash.)

(In reply to Jakub Hrozek from comment #22)
> It might be a good idea to patch sssd-secrets with github PR #225, otherwise
> sssd-secrets will crash quite soon.
>
> (I tried running the scripts in parallel yesterday and only saw sssd-secrets
> crashes, not any sssd-kcm crash.)

But that might be partially a reason why sssd-kcm crashed :-)
I somehow managed to reproduce the crash without valgrind:

```
sh# abrt-cli ls
id 141e0bf99a5fe6cc79eb66a2384d241496e59b42
reason:         sssd_kcm killed by SIGSEGV
time:           Tue 27 Jun 2017 04:30:44 AM EDT
cmdline:        /usr/libexec/sssd/sssd_kcm --uid 0 --gid 0 --debug-to-files
package:        sssd-kcm-1.15.2-50.el7
uid:            0 (root)
count:          1
Directory:      /var/spool/abrt/ccpp-2017-06-27-04:30:44-16530
Run 'abrt-cli report /var/spool/abrt/ccpp-2017-06-27-04:30:44-16530' for creating a case in Red Hat Customer Portal
```

(A sketch of extracting a backtrace from such a core follows below.)
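To pull a readable backtrace out of the core that abrt captured, something along these lines works. The problem directory is the one printed by abrt-cli above; the coredump file name inside it and the debuginfo package name are assumptions about the abrt and packaging layout:

```bash
# Install matching debug symbols for the crashing binary.
debuginfo-install -y sssd-kcm

# Print a full backtrace for every thread from the saved core.
gdb /usr/libexec/sssd/sssd_kcm \
    /var/spool/abrt/ccpp-2017-06-27-04:30:44-16530/coredump \
    -batch -ex 'thread apply all bt full'
```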
And the related part of the sssd-kcm log file:

```
(Tue Jun 27 04:30:41:055538 2017) [sssd[kcm]] [handle_socket] (0x2000): Activity on curl socket 13 socket data 0x55e575847fc0
(Tue Jun 27 04:30:41:055557 2017) [sssd[kcm]] [handle_curlmsg_done] (0x0400): Handled http://localhost/kcm/persistent/0/ccache/
(Tue Jun 27 04:30:41:055580 2017) [sssd[kcm]] [tcurl_request_done] (0x0400): TCURL request finished [0]: Success
(Tue Jun 27 04:30:41:055597 2017) [sssd[kcm]] [schedule_fd_processing] (0x2000): timeout_ms: -1
(Tue Jun 27 04:30:41:055618 2017) [sssd[kcm]] [check_curl_timeouts] (0x4000): Still tracking 0 outstanding requests
(Tue Jun 27 04:30:41:055643 2017) [sssd[kcm]] [sec_list_done] (0x2000): Found 0 items
(Tue Jun 27 04:30:41:055669 2017) [sssd[kcm]] [sec_list_done] (0x2000): list done
(Tue Jun 27 04:30:41:055687 2017) [sssd[kcm]] [sec_get_send] (0x2000): Retrieving ccache 306b717b-899a-459d-96ce-30e9d20f0263-0:6652
(Tue Jun 27 04:30:41:055727 2017) [sssd[kcm]] [tcurl_request_send] (0x0400): Sending TCURL request for http://localhost/kcm/persistent/0/ccache/306b717b-899a-459d-96ce-30e9d20f0263-0:6652, at socket /var/run/secrets.socket
(Tue Jun 27 04:30:41:055748 2017) [sssd[kcm]] [schedule_fd_processing] (0x2000): timeout_ms: 1
(Tue Jun 27 04:30:41:056919 2017) [sssd[kcm]] [handle_socket] (0x2000): Activity on curl socket 13 socket data (nil)
(Tue Jun 27 04:30:41:056955 2017) [sssd[kcm]] [schedule_fd_processing] (0x2000): timeout_ms: 5000
(Tue Jun 27 04:30:41:056972 2017) [sssd[kcm]] [check_curl_timeouts] (0x4000): Still tracking 1 outstanding requests
(Tue Jun 27 04:30:43:971912 2017) [sssd[kcm]] [responder_idle_handler] (0x2000): Terminating idle responder [0x55e57582d6a0]
(Tue Jun 27 04:30:43:972009 2017) [sssd[kcm]] [kcm_responder_ctx_destructor] (0x0400): Responder is being shut down
(Tue Jun 27 04:30:43:972065 2017) [sssd[kcm]] [client_close_fn] (0x2000): Terminated client [0x55e5758725d0][17]
(Tue Jun 27 04:30:43:972131 2017) [sssd[kcm]] [client_close_fn] (0x2000): Terminated client [0x55e575872240][16]
(Tue Jun 27 04:30:43:972154 2017) [sssd[kcm]] [tcurl_request_destructor] (0x0080): Terminating TCURL request...
(Tue Jun 27 04:30:43:972235 2017) [sssd[kcm]] [handle_socket] (0x2000): Activity on curl socket 13 socket data 0x55e575847b20
(Tue Jun 27 04:30:43:972256 2017) [sssd[kcm]] [schedule_fd_processing] (0x2000): timeout_ms: -1
(Tue Jun 27 04:30:43:972273 2017) [sssd[kcm]] [check_curl_timeouts] (0x4000): Still tracking 0 outstanding requests
(Tue Jun 27 04:30:44:195682 2017) [sssd[kcm]] [server_setup] (0x0400): CONFDB: /var/lib/sss/db/config.ldb
```

I will try to reproduce it with valgrind. I do not think that the sssd-kcm crash is related, because there is a 30 second delay between them. And maybe the crash in sssd-secrets is really related:

```
Jun 27 04:30:25 host.example.com sssd[secrets][26160]: Starting up
Jun 27 04:30:41 host.example.com sssd_secrets[26160]: Assertion failure: 0 == rv, at ../../.././nspr/pr/src/pthreads/ptthread.c:1010
Jun 27 04:30:41 host.example.com abrt-hook-ccpp[26803]: Process 26160 (sssd_secrets) of user 0 killed by SIGABRT - ignoring (repeated crash)
Jun 27 04:30:41 host.example.com systemd[1]: sssd-secrets.service: main process exited, code=dumped, status=6/ABRT
Jun 27 04:30:41 host.example.com systemd[1]: Unit sssd-secrets.service entered failed state.
Jun 27 04:30:41 host.example.com systemd[1]: sssd-secrets.service failed.
Jun 27 04:30:41 host.example.com systemd[1]: Started SSSD Secrets Service responder.
Jun 27 04:30:41 host.example.com systemd[1]: Starting SSSD Secrets Service responder...
Jun 27 04:30:41 host.example.com sssd[secrets][26805]: Starting up
Jun 27 04:30:43 host.example.com kernel: traps: sssd_kcm[16530] general protection ip:7f1d8797fa76 sp:7ffc2757b0a0 error:0 in libtalloc.so.2.1.9[7f1d8797c000+f000]
Jun 27 04:30:44 host.example.com abrt-hook-ccpp[26806]: Process 16530 (sssd_kcm) of user 0 killed by SIGSEGV - dumping core
Jun 27 04:30:44 host.example.com systemd[1]: sssd-kcm.service: main process exited, code=dumped, status=11/SEGV
Jun 27 04:30:44 host.example.com systemd[1]: Unit sssd-kcm.service entered failed state.
Jun 27 04:30:44 host.example.com systemd[1]: sssd-kcm.service failed.
```
The related part of the sssd-kcm log from a later occurrence:

```
(Tue Jun 27 05:09:17:541498 2017) [sssd[kcm]] [handle_socket] (0x2000): Activity on curl socket 15 socket data 0xdbce670
(Tue Jun 27 05:09:17:541710 2017) [sssd[kcm]] [handle_curlmsg_done] (0x0400): Handled http://localhost/kcm/persistent/0/ccache/28654582-0eb6-48d0-b4f3-55c9845e395c-0:6652
(Tue Jun 27 05:09:17:541974 2017) [sssd[kcm]] [tcurl_request_done] (0x0400): TCURL request finished [0]: Success
(Tue Jun 27 05:09:17:542143 2017) [sssd[kcm]] [schedule_fd_processing] (0x2000): timeout_ms: -1
(Tue Jun 27 05:09:17:542303 2017) [sssd[kcm]] [check_curl_timeouts] (0x4000): Still tracking 0 outstanding requests
(Tue Jun 27 05:09:17:546962 2017) [sssd[kcm]] [sec_get_done] (0x2000): GET done
(Tue Jun 27 05:09:17:547220 2017) [sssd[kcm]] [ccdb_sec_getbyname_done] (0x2000): Got ccache by UUID
(Tue Jun 27 05:09:17:547577 2017) [sssd[kcm]] [kcm_cmd_done] (0x0400): KCM operation GET_PRINCIPAL returned [0]: Success
(Tue Jun 27 05:09:17:547818 2017) [sssd[kcm]] [kcm_op_get_kdc_offset_send] (0x1000): Requested offset for principal 0:6652
(Tue Jun 27 05:09:17:548045 2017) [sssd[kcm]] [ccdb_sec_getbyname_send] (0x2000): Getting ccache by name
(Tue Jun 27 05:09:17:548218 2017) [sssd[kcm]] [sec_list_send] (0x2000): Listing all ccaches in the secrets store
(Tue Jun 27 05:09:17:548513 2017) [sssd[kcm]] [tcurl_request_send] (0x0400): Sending TCURL request for http://localhost/kcm/persistent/0/ccache/, at socket /var/run/secrets.socket
(Tue Jun 27 05:09:17:548719 2017) [sssd[kcm]] [schedule_fd_processing] (0x2000): timeout_ms: 1
(Tue Jun 27 05:09:17:548999 2017) [sssd[kcm]] [kcm_send_reply] (0x2000): Sending a reply
(Tue Jun 27 05:09:17:549166 2017) [sssd[kcm]] [kcm_output_construct] (0x1000): Sending a reply with 36 bytes of payload
(Tue Jun 27 05:09:17:549415 2017) [sssd[kcm]] [kcm_send] (0x2000): All data sent!
(Tue Jun 27 05:09:17:550251 2017) [sssd[kcm]] [responder_idle_handler] (0x2000): Terminating idle responder [0xd9e31d0]
(Tue Jun 27 05:09:17:550879 2017) [sssd[kcm]] [kcm_responder_ctx_destructor] (0x0400): Responder is being shut down
(Tue Jun 27 05:09:17:551630 2017) [sssd[kcm]] [client_close_fn] (0x2000): Terminated client [0x163c9af0][18]
(Tue Jun 27 05:09:17:552104 2017) [sssd[kcm]] [tcurl_request_destructor] (0x0080): Terminating TCURL request...
(Tue Jun 27 05:09:17:552602 2017) [sssd[kcm]] [schedule_fd_processing] (0x2000): timeout_ms: -1
(Tue Jun 27 05:09:17:552799 2017) [sssd[kcm]] [check_curl_timeouts] (0x4000): Still tracking 0 outstanding requests
(Tue Jun 27 05:09:26:512198 2017) [sssd[kcm]] [server_setup] (0x0400): CONFDB: /var/lib/sss/db/config.ldb
```

I will attach the valgrind log in another comment.

Created attachment 1292228 [details]: valgrind use after free
Thank you for the analysis. What I find strange is that the free that triggered the use-after-free was caused by:

```
==00:00:07:37.734 30982==    by 0x12B301: responder_idle_handler (responder_common.c:378)
==00:00:07:37.734 30982==    by 0x997BC96: tevent_common_loop_timer_delay (tevent_timed.c:369)
```

I don't know if this is caused by the sssd-secrets crash or not, but I'm quite confident that this kcm crash is not related to the original report of the concurrency issue in kcm_op_queue. In my opinion, this bug should be marked as VERIFIED and another one should be opened for the new issue.

Based on the above comments, marking it as verified.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:2294

I just updated to the patches specified here and am still facing this same issue. The patched version does not resolve the service crash.

Thanks

(In reply to Mike G from comment #30)
> I just updated to the patches specified here and am still facing this same
> issue. The patched version does not resolve the service crash.

Please open a new bugzilla and attach the core files and the debug logs.

(In reply to Mike G from comment #30)
> I just updated to the patches specified here and am still facing this same
> issue. The patched version does not resolve the service crash.

You probably hit a different bug with similar symptoms: https://bugzilla.redhat.com/show_bug.cgi?id=1465457
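When a supposedly patched system still crashes, one quick sanity check before filing a new bug is to confirm that the installed build actually includes the fix (sssd-1.15.2-35.el7 per the Fixed In Version above) and that the long-running responder was restarted onto it. A sketch, using the unit name from earlier in this report:

```bash
# Compare the installed versions against the Fixed In Version,
# sssd-1.15.2-35.el7.
rpm -q sssd-kcm sssd-common

# An updated package does not take effect until the responder restarts.
systemctl restart sssd-kcm.service
systemctl status sssd-kcm.service
```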