This bug has been migrated to another issue tracking site. It has been closed here and may no longer be monitored.

If you would like to get updates for this issue, or to participate in it, you may do so at the Red Hat Issue Tracker.
RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets there.

Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against the components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED".

If you cannot log in to RH Jira, please consult article #7032570. Failing that, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat.

Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "RHEL project" in Red Hat Jira (issue links are of the form "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that the bug has been migrated.
Bug 2162939 - Job for autofs.service failed because a fatal signal was delivered causing the control process to dump core.
Summary: Job for autofs.service failed because a fatal signal was delivered causing th...
Keywords:
Status: CLOSED MIGRATED
Alias: None
Product: Red Hat Enterprise Linux 9
Classification: Red Hat
Component: autofs
Version: 9.2
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Assignee: Ian Kent
QA Contact: Kun Wang
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2023-01-22 05:17 UTC by Anuj Borah
Modified: 2023-09-23 11:27 UTC
CC List: 5 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-09-23 11:27:10 UTC
Type: Bug
Target Upstream Version:
Embargoed:
pm-rhel: mirror+




Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker   RHEL-7921 0 None Migrated None 2023-09-23 11:27:02 UTC
Red Hat Issue Tracker RHELPLAN-145952 0 None None None 2023-01-22 05:20:41 UTC

Description Anuj Borah 2023-01-22 05:17:58 UTC
Description of problem:

systemctl restart autofs
Job for autofs.service failed because a fatal signal was delivered causing the control process to dump core.
See "systemctl status autofs.service" and "journalctl -xeu autofs.service" for details.




Version-Release number of selected component (if applicable):




[root@ip-10-0-199-29 sssd]# cat sssd.conf
[sssd]
config_file_version = 2
services = nss, pam, autofs
domains = example1

[domain/example1]
ldap_search_base = dc=example,dc=test
id_provider = ldap
auth_provider = ldap
ldap_user_home_directory = /home/%u
ldap_uri = ldaps://ip-10-0-195-55.rhos-01.prod.psi.rdu2.redhat.com
ldap_tls_cacert = /etc/openldap/cacerts/cacert.pem
use_fully_qualified_names = True
debug_level = 9
autofs_provider = ldap
ldap_autofs_map_object_class = automountMap
ldap_autofs_map_name = ou
ldap_autofs_entry_object_class = automount
ldap_autofs_entry_key = cn
ldap_autofs_entry_value = automountInformation

[root@ip-10-0-199-29 sssd]# 

[root@ip-10-0-199-29 sssd]# rpm -qa | grep sssd
python3-sssdconfig-2.8.2-2.el9.noarch
sssd-winbind-idmap-2.8.2-2.el9.x86_64
sssd-client-2.8.2-2.el9.x86_64
sssd-nfs-idmap-2.8.2-2.el9.x86_64
sssd-common-2.8.2-2.el9.x86_64
sssd-krb5-common-2.8.2-2.el9.x86_64
sssd-dbus-2.8.2-2.el9.x86_64
sssd-common-pac-2.8.2-2.el9.x86_64
sssd-ad-2.8.2-2.el9.x86_64
sssd-krb5-2.8.2-2.el9.x86_64
sssd-ldap-2.8.2-2.el9.x86_64
sssd-proxy-2.8.2-2.el9.x86_64
sssd-ipa-2.8.2-2.el9.x86_64
sssd-2.8.2-2.el9.x86_64
sssd-tools-2.8.2-2.el9.x86_64
sssd-kcm-2.8.2-2.el9.x86_64
[root@ip-10-0-199-29 sssd]# rpm -qa | grep autofs
autofs-5.1.7-36.el9.x86_64
libsss_autofs-2.8.2-2.el9.x86_64
[root@ip-10-0-199-29 sssd]# 





How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 2 Ian Kent 2023-01-23 02:29:55 UTC
I have a couple of questions.

You have included sssd configuration in the above comments but no autofs configuration
at all.

Are you using sssd with autofs?
Please also include autofs configuration information.

You posted something in comment#1.
I don't know what it is, can you elaborate please?

If it is a compressed core dump then there's not enough information here
for me to construct a system to examine it. I could try to use an sos-report
from a system where the problem has been seen to construct such a system,
but otherwise I can't do anything but guess, which is usually hopeless.

Comment 5 Alexey Tikhonov 2023-01-24 14:30:08 UTC
Looking at the coredump with the latest 9.2 compose (not sure if the `glibc` version is the same) and sssd-2.8.2-2 on top (not sure if it matters):

```

Core was generated by `/usr/sbin/automount --systemd-service --dont-check-daemon'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007f1894853588 in ?? ()
[Current thread is 1 (Thread 0x7f1893361640 (LWP 27824))]


(gdb) bt
#0  0x00007f1894853588 in ?? ()
#1  0x00007f1893361988 in ?? ()
#2  0x00007f1894c9c931 in __GI___nptl_deallocate_tsd () at nptl_deallocate_tsd.c:74
#3  __GI___nptl_deallocate_tsd () at nptl_deallocate_tsd.c:23
#4  0x00007f1894c9f6d6 in start_thread (arg=<optimized out>) at pthread_create.c:454
#5  0x00007f1894c3f450 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81


(gdb) info threads
  Id   Target Id                                   Frame 
* 1    Thread 0x7f1893361640 (LWP 27824) (Exiting) 0x00007f1894853588 in ?? ()
  2    Thread 0x7f18948608c0 (LWP 27817)           __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, 
    expected=0, futex_word=0x7ffe88f12dd0) at futex-internal.c:57
  3    Thread 0x7f1894363640 (LWP 27820)           __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, 
    expected=0, futex_word=0x7f1894ff4c68 <cond+40>) at futex-internal.c:57
  4    Thread 0x7f1893b62640 (LWP 27821)           __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, 
    expected=0, futex_word=0x5607d28651a8 <cond+40>) at futex-internal.c:57
  5    Thread 0x7f1892b60640 (LWP 27827)           clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:62


(gdb) thread 2
[Switching to thread 2 (Thread 0x7f18948608c0 (LWP 27817))]
#0  __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x7ffe88f12dd0)
    at futex-internal.c:57
57	    return INTERNAL_SYSCALL_CANCEL (futex_time64, futex_word, op, expected,
(gdb) bt
#0  __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x7ffe88f12dd0)
    at futex-internal.c:57
#1  __futex_abstimed_wait_common (futex_word=futex_word@entry=0x7ffe88f12dd0, expected=expected@entry=0, clockid=clockid@entry=0, 
    abstime=abstime@entry=0x0, private=private@entry=0, cancel=cancel@entry=true) at futex-internal.c:87
#2  0x00007f1894c9c3ff in __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0x7ffe88f12dd0, 
    expected=expected@entry=0, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=0) at futex-internal.c:139
#3  0x00007f1894c9eba0 in __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x7ffe88f12d80, cond=0x7ffe88f12da8)
    at pthread_cond_wait.c:504
#4  ___pthread_cond_wait (cond=cond@entry=0x7ffe88f12da8, mutex=mutex@entry=0x7ffe88f12d80) at pthread_cond_wait.c:619
#5  0x00005607d285035b in master_do_mount (entry=0x5607d3e76e40) at /usr/src/debug/autofs-5.1.7-36.el9.x86_64/daemon/master.c:1374
#6  master_mount_mounts (master=master@entry=0x5607d3e530a0, age=age@entry=3077)
    at /usr/src/debug/autofs-5.1.7-36.el9.x86_64/daemon/master.c:1541
#7  0x00005607d2851117 in master_read_master (master=0x5607d3e530a0, age=3077)
    at /usr/src/debug/autofs-5.1.7-36.el9.x86_64/daemon/master.c:1201
#8  0x00005607d283c7ee in main (argc=0, argv=<optimized out>) at /usr/src/debug/autofs-5.1.7-36.el9.x86_64/daemon/automount.c:2703


(gdb) thread 3
[Switching to thread 3 (Thread 0x7f1894363640 (LWP 27820))]
#0  __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x7f1894ff4c68 <cond+40>)
    at futex-internal.c:57
57	    return INTERNAL_SYSCALL_CANCEL (futex_time64, futex_word, op, expected,
(gdb) bt
#0  __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x7f1894ff4c68 <cond+40>)
    at futex-internal.c:57
#1  __futex_abstimed_wait_common (futex_word=futex_word@entry=0x7f1894ff4c68 <cond+40>, expected=expected@entry=0, 
    clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=0, cancel=cancel@entry=true) at futex-internal.c:87
#2  0x00007f1894c9c3ff in __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0x7f1894ff4c68 <cond+40>, 
    expected=expected@entry=0, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=0) at futex-internal.c:139
#3  0x00007f1894c9eba0 in __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x7f1894ff4c00 <mutex>, cond=0x7f1894ff4c40 <cond>)
    at pthread_cond_wait.c:504
#4  ___pthread_cond_wait (cond=cond@entry=0x7f1894ff4c40 <cond>, mutex=mutex@entry=0x7f1894ff4c00 <mutex>) at pthread_cond_wait.c:619
#5  0x00007f1894fdbb6a in alarm_handler (arg=<optimized out>) at /usr/src/debug/autofs-5.1.7-36.el9.x86_64/lib/alarm.c:221
#6  0x00007f1894c9f802 in start_thread (arg=<optimized out>) at pthread_create.c:443
#7  0x00007f1894c3f450 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81


(gdb) thread 4
[Switching to thread 4 (Thread 0x7f1893b62640 (LWP 27821))]
#0  __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x5607d28651a8 <cond+40>)
    at futex-internal.c:57
57	    return INTERNAL_SYSCALL_CANCEL (futex_time64, futex_word, op, expected,
(gdb) bt
#0  __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x5607d28651a8 <cond+40>)
    at futex-internal.c:57
#1  __futex_abstimed_wait_common (futex_word=futex_word@entry=0x5607d28651a8 <cond+40>, expected=expected@entry=0, 
    clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=0, cancel=cancel@entry=true) at futex-internal.c:87
#2  0x00007f1894c9c3ff in __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0x5607d28651a8 <cond+40>, 
    expected=expected@entry=0, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=0) at futex-internal.c:139
#3  0x00007f1894c9eba0 in __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x5607d28651c0 <mutex>, cond=0x5607d2865180 <cond>)
    at pthread_cond_wait.c:504
#4  ___pthread_cond_wait (cond=cond@entry=0x5607d2865180 <cond>, mutex=mutex@entry=0x5607d28651c0 <mutex>) at pthread_cond_wait.c:619
#5  0x00005607d284eedb in st_queue_handler (arg=<optimized out>) at /usr/src/debug/autofs-5.1.7-36.el9.x86_64/daemon/state.c:1072
#6  0x00007f1894c9f802 in start_thread (arg=<optimized out>) at pthread_create.c:443
#7  0x00007f1894c3f450 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81


(gdb) thread 5
[Switching to thread 5 (Thread 0x7f1892b60640 (LWP 27827))]
#0  clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:62
62		test	%RAX_LP, %RAX_LP
(gdb) bt
#0  clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:62
#1  0x0000000000000000 in ?? ()

```

Comment 9 Ian Kent 2023-01-25 01:12:36 UTC
(In reply to Alexey Tikhonov from comment #5)
> Looking at the coredump with latest 9.2 compose (not sure if `glibc` version
> is the same) and sssd-2.8.2-2 on top (not sure if it matters):
> 
> ```
> 
> Core was generated by `/usr/sbin/automount --systemd-service
> --dont-check-daemon'.

This is the normal unit start command.

> Program terminated with signal SIGSEGV, Segmentation fault.
> #0  0x00007f1894853588 in ?? ()
> [Current thread is 1 (Thread 0x7f1893361640 (LWP 27824))]
> 
> 
> (gdb) bt
> #0  0x00007f1894853588 in ?? ()
> #1  0x00007f1893361988 in ?? ()
> #2  0x00007f1894c9c931 in __GI___nptl_deallocate_tsd () at
> nptl_deallocate_tsd.c:74
> #3  __GI___nptl_deallocate_tsd () at nptl_deallocate_tsd.c:23
> #4  0x00007f1894c9f6d6 in start_thread (arg=<optimized out>) at
> pthread_create.c:454
> #5  0x00007f1894c3f450 in clone3 () at
> ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

This is what we see when execution of a thread-specific data (tsd) key
destructor fails.

The problem is (I'm pretty sure) that the first two entries there might be
declared static so they don't show up in the backtrace.

As a check I can remove the static declarations from any tsd key destructors
and see what the names are; that could tell us where to look.

Basically we don't really know who owns the tsd key.

> 
> 
> (gdb) info threads
>   Id   Target Id                                   Frame 
> * 1    Thread 0x7f1893361640 (LWP 27824) (Exiting) 0x00007f1894853588 in ??
> ()
>   2    Thread 0x7f18948608c0 (LWP 27817)          
> __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, 
>     expected=0, futex_word=0x7ffe88f12dd0) at futex-internal.c:57
>   3    Thread 0x7f1894363640 (LWP 27820)          
> __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, 
>     expected=0, futex_word=0x7f1894ff4c68 <cond+40>) at futex-internal.c:57
>   4    Thread 0x7f1893b62640 (LWP 27821)          
> __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, 
>     expected=0, futex_word=0x5607d28651a8 <cond+40>) at futex-internal.c:57
>   5    Thread 0x7f1892b60640 (LWP 27827)           clone3 () at
> ../sysdeps/unix/sysv/linux/x86_64/clone3.S:62
> 
> 
> (gdb) thread 2
> [Switching to thread 2 (Thread 0x7f18948608c0 (LWP 27817))]
> #0  __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0,
> op=393, expected=0, futex_word=0x7ffe88f12dd0)
>     at futex-internal.c:57
> 57	    return INTERNAL_SYSCALL_CANCEL (futex_time64, futex_word, op,
> expected,
> (gdb) bt
> #0  __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0,
> op=393, expected=0, futex_word=0x7ffe88f12dd0)
>     at futex-internal.c:57
> #1  __futex_abstimed_wait_common
> (futex_word=futex_word@entry=0x7ffe88f12dd0, expected=expected@entry=0,
> clockid=clockid@entry=0, 
>     abstime=abstime@entry=0x0, private=private@entry=0,
> cancel=cancel@entry=true) at futex-internal.c:87
> #2  0x00007f1894c9c3ff in __GI___futex_abstimed_wait_cancelable64
> (futex_word=futex_word@entry=0x7ffe88f12dd0, 
>     expected=expected@entry=0, clockid=clockid@entry=0,
> abstime=abstime@entry=0x0, private=private@entry=0) at futex-internal.c:139
> #3  0x00007f1894c9eba0 in __pthread_cond_wait_common (abstime=0x0,
> clockid=0, mutex=0x7ffe88f12d80, cond=0x7ffe88f12da8)
>     at pthread_cond_wait.c:504
> #4  ___pthread_cond_wait (cond=cond@entry=0x7ffe88f12da8,
> mutex=mutex@entry=0x7ffe88f12d80) at pthread_cond_wait.c:619
> #5  0x00005607d285035b in master_do_mount (entry=0x5607d3e76e40) at
> /usr/src/debug/autofs-5.1.7-36.el9.x86_64/daemon/master.c:1374
> #6  master_mount_mounts (master=master@entry=0x5607d3e530a0,
> age=age@entry=3077)
>     at /usr/src/debug/autofs-5.1.7-36.el9.x86_64/daemon/master.c:1541
> #7  0x00005607d2851117 in master_read_master (master=0x5607d3e530a0,
> age=3077)
>     at /usr/src/debug/autofs-5.1.7-36.el9.x86_64/daemon/master.c:1201
> #8  0x00005607d283c7ee in main (argc=0, argv=<optimized out>) at
> /usr/src/debug/autofs-5.1.7-36.el9.x86_64/daemon/automount.c:2703

Looks like the main thread, currently reading the autofs master map.

> 
> 
> (gdb) thread 3
> [Switching to thread 3 (Thread 0x7f1894363640 (LWP 27820))]
> #0  __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0,
> op=393, expected=0, futex_word=0x7f1894ff4c68 <cond+40>)
>     at futex-internal.c:57
> 57	    return INTERNAL_SYSCALL_CANCEL (futex_time64, futex_word, op,
> expected,
> (gdb) bt
> #0  __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0,
> op=393, expected=0, futex_word=0x7f1894ff4c68 <cond+40>)
>     at futex-internal.c:57
> #1  __futex_abstimed_wait_common (futex_word=futex_word@entry=0x7f1894ff4c68
> <cond+40>, expected=expected@entry=0, 
>     clockid=clockid@entry=0, abstime=abstime@entry=0x0,
> private=private@entry=0, cancel=cancel@entry=true) at futex-internal.c:87
> #2  0x00007f1894c9c3ff in __GI___futex_abstimed_wait_cancelable64
> (futex_word=futex_word@entry=0x7f1894ff4c68 <cond+40>, 
>     expected=expected@entry=0, clockid=clockid@entry=0,
> abstime=abstime@entry=0x0, private=private@entry=0) at futex-internal.c:139
> #3  0x00007f1894c9eba0 in __pthread_cond_wait_common (abstime=0x0,
> clockid=0, mutex=0x7f1894ff4c00 <mutex>, cond=0x7f1894ff4c40 <cond>)
>     at pthread_cond_wait.c:504
> #4  ___pthread_cond_wait (cond=cond@entry=0x7f1894ff4c40 <cond>,
> mutex=mutex@entry=0x7f1894ff4c00 <mutex>) at pthread_cond_wait.c:619
> #5  0x00007f1894fdbb6a in alarm_handler (arg=<optimized out>) at
> /usr/src/debug/autofs-5.1.7-36.el9.x86_64/lib/alarm.c:221
> #6  0x00007f1894c9f802 in start_thread (arg=<optimized out>) at
> pthread_create.c:443
> #7  0x00007f1894c3f450 in clone3 () at
> ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

autofs alarm handler thread, should be just waiting to be poked at
this stage.

> 
> 
> (gdb) thread 4
> [Switching to thread 4 (Thread 0x7f1893b62640 (LWP 27821))]
> #0  __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0,
> op=393, expected=0, futex_word=0x5607d28651a8 <cond+40>)
>     at futex-internal.c:57
> 57	    return INTERNAL_SYSCALL_CANCEL (futex_time64, futex_word, op,
> expected,
> (gdb) bt
> #0  __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0,
> op=393, expected=0, futex_word=0x5607d28651a8 <cond+40>)
>     at futex-internal.c:57
> #1  __futex_abstimed_wait_common (futex_word=futex_word@entry=0x5607d28651a8
> <cond+40>, expected=expected@entry=0, 
>     clockid=clockid@entry=0, abstime=abstime@entry=0x0,
> private=private@entry=0, cancel=cancel@entry=true) at futex-internal.c:87
> #2  0x00007f1894c9c3ff in __GI___futex_abstimed_wait_cancelable64
> (futex_word=futex_word@entry=0x5607d28651a8 <cond+40>, 
>     expected=expected@entry=0, clockid=clockid@entry=0,
> abstime=abstime@entry=0x0, private=private@entry=0) at futex-internal.c:139
> #3  0x00007f1894c9eba0 in __pthread_cond_wait_common (abstime=0x0,
> clockid=0, mutex=0x5607d28651c0 <mutex>, cond=0x5607d2865180 <cond>)
>     at pthread_cond_wait.c:504
> #4  ___pthread_cond_wait (cond=cond@entry=0x5607d2865180 <cond>,
> mutex=mutex@entry=0x5607d28651c0 <mutex>) at pthread_cond_wait.c:619
> #5  0x00005607d284eedb in st_queue_handler (arg=<optimized out>) at
> /usr/src/debug/autofs-5.1.7-36.el9.x86_64/daemon/state.c:1072
> #6  0x00007f1894c9f802 in start_thread (arg=<optimized out>) at
> pthread_create.c:443
> #7  0x00007f1894c3f450 in clone3 () at
> ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

autofs job queue runner, I think this should also be idle at this
point.

> 
> 
> (gdb) thread 5
> [Switching to thread 5 (Thread 0x7f1892b60640 (LWP 27827))]
> #0  clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:62
> 62		test	%RAX_LP, %RAX_LP
> (gdb) bt
> #0  clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:62
> #1  0x0000000000000000 in ?? ()
> 
> ```

Not sure, might be the main thread.

If we can find out the function names of what's being called it will
give us something to look for; unfortunately, they may be static and
not owned by automount. If they are mine, and they are in automount,
at least we will be able to find them by making sure they are not
declared static.

Historically what I've done to fix these things is to dlopen() the
shared library that owns the tsd at the start of autofs execution
in the main thread and dlclose() just before exit.

Maybe there's something else going on that I'm missing this time, not
sure ...

Comment 10 Ian Kent 2023-01-25 01:35:58 UTC
It's about now that we start wondering if there have been glibc changes.

There is one (not entirely recent) contributed change that uses a tsd
key. It uses free() as the descructor and now I'm wondering if free()
will be called with NULL or garbage if the key has never actually been
set ... this key is initialised to NULL ... which isn't done for the
others I use, maybe that's now a problem for glibc ...

I must say the documentation on exactly what I should be doing there isn't
clear.

Comment 11 Ian Kent 2023-01-25 01:42:44 UTC
(In reply to Ian Kent from comment #10)
> There is one (not entirely recent) contributed change that uses a tsd
> key. It uses free() as the destructor and now I'm wondering if free()
> will be called with NULL or garbage if the key has never actually been
> set ... this key is initialised to NULL ... which isn't done for the
> others I use, maybe that's now a problem for glibc ...

Mmm ... pthread.h says ...

/* Create a key value identifying a location in the thread-specific
   data area.  Each thread maintains a distinct thread-specific data
   area.  DESTR_FUNCTION, if non-NULL, is called with the value
   associated to that key when the key is destroyed.
   DESTR_FUNCTION is not called if the value associated is NULL when
   the key is destroyed.  */

So I shouldn't need to worry about NULL checks in destructors.

Comment 12 Ian Kent 2023-01-25 01:45:30 UTC
(In reply to Ian Kent from comment #11)
> Mmm ... pthread.h says ...
> 
> /* Create a key value identifying a location in the thread-specific
>    data area.  Each thread maintains a distinct thread-specific data
>    area.  DESTR_FUNCTION, if non-NULL, is called with the value
>    associated to that key when the key is destroyed.
>    DESTR_FUNCTION is not called if the value associated is NULL when
>    the key is destroyed.  */
> 
> So I shouldn't need to worry about NULL checks in destructors.

I wonder how the tsd key itself should be initialized so that glibc
won't try to access a key that's not been set, or whether that even matters?

Comment 13 Florian Weimer 2023-01-25 10:00:10 UTC
It's probably an issue like bug 2143159.

You can put this script into a .py file:

# Dump the registered tsd key destructors from glibc's __pthread_keys array.
pthread_keys = gdb.parse_and_eval('__pthread_keys')
pthread_keys.fetch_lazy()
pthread_keys_range = pthread_keys.type.range()
space = gdb.current_progspace()
for i in range(pthread_keys_range[0], pthread_keys_range[1] + 1):
    k = pthread_keys[i]
    seq = k['seq']
    destr = k['destr']
    if seq != 0 or destr != 0:
        # Resolve which shared object the destructor address belongs to.
        soname = space.solib_name(int(destr))
        print("[{}:{}] {} ({})".format(i, seq, destr, soname))

Install glibc debugging information with dnf debuginfo-install. Then run the script against a running automount process, like this:

gdb --batch -p `pgrep automount` -x /path/to/script.py

It should dump the registered TLS destructors; hopefully that provides a clue about what is going on. If you trigger the crash afterwards, we can hopefully compare addresses and find out which TLS destructor is responsible.

Comment 14 Frank Sorenson 2023-03-08 22:55:43 UTC
Also hitting the tls destructor crash in case 3368370:

#0  0x00007febab392510 in ?? ()
#1  0x00007febaaf40931 in __GI___nptl_deallocate_tsd () at nptl_deallocate_tsd.c:74
#2  __GI___nptl_deallocate_tsd () at nptl_deallocate_tsd.c:23
#3  0x00007febaaf436d6 in start_thread (arg=<optimized out>) at pthread_create.c:454
#4  0x00007febaaee3450 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

the other threads are uninteresting:

Thread 4 (Thread 0x7febaa4978c0 (LWP 1397)):
#0  0x00007febaaef9aaa in __GI___sigtimedwait (set=set@entry=0x7ffea1c2f590, info=info@entry=0x7ffea1c2f450, timeout=timeout@entry=0x0) at ../sysdeps/unix/sysv/linux/sigtimedwait.c:61
#1  0x00007febaaef90ec in __GI___sigwait (set=set@entry=0x7ffea1c2f590, sig=sig@entry=0x7ffea1c2f558) at ../sysdeps/unix/sysv/linux/sigwait.c:28
#2  0x000055c9f8ece884 in statemachine (arg=0x0) at /usr/src/debug/autofs-5.1.7-32.el9_1.1.x86_64/daemon/automount.c:1600
#3  main (argc=<optimized out>, argv=<optimized out>) at /usr/src/debug/autofs-5.1.7-32.el9_1.1.x86_64/daemon/automount.c:2754

Thread 3 (Thread 0x7febaa496640 (LWP 1486)):
#0  __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x7febab0f3c88 <cond+40>) at futex-internal.c:57
#1  __futex_abstimed_wait_common (futex_word=futex_word@entry=0x7febab0f3c88 <cond+40>, expected=expected@entry=0, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=0, cancel=cancel@entry=true) at futex-internal.c:87
#2  0x00007febaaf403ff in __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0x7febab0f3c88 <cond+40>, expected=expected@entry=0, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=0) at futex-internal.c:139
#3  0x00007febaaf42ba0 in __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x7febab0f3c20 <mutex>, cond=0x7febab0f3c60 <cond>) at pthread_cond_wait.c:504
#4  ___pthread_cond_wait (cond=cond@entry=0x7febab0f3c60 <cond>, mutex=mutex@entry=0x7febab0f3c20 <mutex>) at pthread_cond_wait.c:619
#5  0x00007febab0dab6a in alarm_handler (arg=<optimized out>) at /usr/src/debug/autofs-5.1.7-32.el9_1.1.x86_64/lib/alarm.c:221
#6  0x00007febaaf43802 in start_thread (arg=<optimized out>) at pthread_create.c:443
#7  0x00007febaaee3450 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

Thread 2 (Thread 0x7feba9c95640 (LWP 1487)):
#0  __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x55c9f8ef71ac <cond+44>) at futex-internal.c:57
#1  __futex_abstimed_wait_common (futex_word=futex_word@entry=0x55c9f8ef71ac <cond+44>, expected=expected@entry=0, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=0, cancel=cancel@entry=true) at futex-internal.c:87
#2  0x00007febaaf403ff in __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0x55c9f8ef71ac <cond+44>, expected=expected@entry=0, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=0) at futex-internal.c:139
#3  0x00007febaaf42ba0 in __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x55c9f8ef71c0 <mutex>, cond=0x55c9f8ef7180 <cond>) at pthread_cond_wait.c:504
#4  ___pthread_cond_wait (cond=cond@entry=0x55c9f8ef7180 <cond>, mutex=mutex@entry=0x55c9f8ef71c0 <mutex>) at pthread_cond_wait.c:619
#5  0x000055c9f8ee0a4b in st_queue_handler (arg=<optimized out>) at /usr/src/debug/autofs-5.1.7-32.el9_1.1.x86_64/daemon/state.c:1072
#6  0x00007febaaf43802 in start_thread (arg=<optimized out>) at pthread_create.c:443
#7  0x00007febaaee3450 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81


[0:1] 0x55c9f8ecf170 <key_thread_stdenv_vars_destroy at /usr/src/debug/autofs-5.1.7-32.el9_1.1.x86_64/daemon/automount.c:1982> (None)
[1:1] 0x7febaaf53910 <__GI___libc_free at malloc.c:3235> (/home/sos/3368370/root/lib64/libc.so.6)
[2:1] 0x7febab1a9670 <xmlFreeGlobalState at /usr/src/debug/libxml2-2.9.13-3.el9_1.x86_64/threads.c:558> (/home/sos/3368370/root/lib64/libxml2.so.2)
[3:1] 0x7febab392510 (None)
[4:1] 0x7febab392510 (None)
[5:1] 0x55c9f8ecf1b0 <key_mnt_params_destroy at /usr/src/debug/autofs-5.1.7-32.el9_1.1.x86_64/daemon/direct.c:53> (None)
[6:1] 0x55c9f8ecf1b0 <key_mnt_params_destroy at /usr/src/debug/autofs-5.1.7-32.el9_1.1.x86_64/daemon/direct.c:53> (None)
[7:1] 0x7feba845c630 <sss_at_thread_exit at src/sss_client/common.c:87> (/home/sos/3368370/root/lib64/libnss_sss.so.2)


This looks to match bz2143159, so is this an sssd issue instead of autofs?

I can provide coredump and sosreport, if needed.

Comment 15 Ian Kent 2023-03-09 00:03:33 UTC
(In reply to Frank Sorenson from comment #14)
> Also hitting the tls destructor crash in case 3368370:
> 
> #0  0x00007febab392510 in ?? ()
> #1  0x00007febaaf40931 in __GI___nptl_deallocate_tsd () at
> nptl_deallocate_tsd.c:74
> #2  __GI___nptl_deallocate_tsd () at nptl_deallocate_tsd.c:23
> #3  0x00007febaaf436d6 in start_thread (arg=<optimized out>) at
> pthread_create.c:454
> #4  0x00007febaaee3450 in clone3 () at
> ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
> 

snip ...

> [0:1] 0x55c9f8ecf170 <key_thread_stdenv_vars_destroy at
> /usr/src/debug/autofs-5.1.7-32.el9_1.1.x86_64/daemon/automount.c:1982> (None)

Part of main thread, always available.

> [1:1] 0x7febaaf53910 <__GI___libc_free at malloc.c:3235>
> (/home/sos/3368370/root/lib64/libc.so.6)

Libc is always available, has to be to issue that message.

> [2:1] 0x7febab1a9670 <xmlFreeGlobalState at
> /usr/src/debug/libxml2-2.9.13-3.el9_1.x86_64/threads.c:558>
> (/home/sos/3368370/root/lib64/libxml2.so.2)

Libxml2 is available, it's dlopened in the main thread as a workaround
to prevent the nptl crash, been there for ages.

> [3:1] 0x7febab392510 (None)
> [4:1] 0x7febab392510 (None)

Let's assume these are NULL destructors so it's not these.
Wonder why they are here, perhaps a tsd key leak ... don't know.

> [5:1] 0x55c9f8ecf1b0 <key_mnt_params_destroy at
> /usr/src/debug/autofs-5.1.7-32.el9_1.1.x86_64/daemon/direct.c:53> (None)
> [6:1] 0x55c9f8ecf1b0 <key_mnt_params_destroy at
> /usr/src/debug/autofs-5.1.7-32.el9_1.1.x86_64/daemon/direct.c:53> (None)

Both available in the main thread/executable.

> [7:1] 0x7feba845c630 <sss_at_thread_exit at src/sss_client/common.c:87>
> (/home/sos/3368370/root/lib64/libnss_sss.so.2)

Which leaves sss as the best candidate.

> 
> 
> This looks to match bz2143159, so is this an sssd issue instead of autofs?

I think so; we should check the sss version and perhaps try an update to verify.

Ian

Comment 18 RHEL Program Management 2023-09-23 11:26:06 UTC
Issue migration from Bugzilla to Jira is in process at this time. This will be the last message in Jira copied from the Bugzilla bug.

Comment 19 RHEL Program Management 2023-09-23 11:27:10 UTC
This BZ has been automatically migrated to the issues.redhat.com Red Hat Issue Tracker. All future work related to this report will be managed there.

Due to differences in account names between systems, some fields were not replicated. Be sure to add yourself to the Jira issue's "Watchers" field to continue receiving updates, and add others to the "Need Info From" field to continue requesting information.

To find the migrated issue, look in the "Links" section for a direct link to the new issue location. The issue key will have an icon of 2 footprints next to it, and begin with "RHEL-" followed by an integer.  You can also find this issue by visiting https://issues.redhat.com/issues/?jql= and searching the "Bugzilla Bug" field for this BZ's number, e.g. a search like:

"Bugzilla Bug" = 1234567

In the event you have trouble locating or viewing this issue, you can file an issue by sending mail to rh-issues. You can also visit https://access.redhat.com/articles/7032570 for general account information.

