Bug 2162939
| Summary: | Job for autofs.service failed because a fatal signal was delivered causing the control process to dump core. | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 9 | Reporter: | Anuj Borah <aborah> |
| Component: | autofs | Assignee: | Ian Kent <ikent> |
| Status: | CLOSED MIGRATED | QA Contact: | Kun Wang <kunwan> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 9.2 | CC: | atikhono, fsorenso, fweimer, ikent, xzhou |
| Target Milestone: | rc | Keywords: | MigratedToJIRA, Reopened |
| Target Release: | --- | Flags: | pm-rhel: mirror+ |
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2023-09-23 11:27:10 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
|
Description
Anuj Borah
2023-01-22 05:17:58 UTC
I have a couple of questions. You have included sssd configuration in the above comments but no autofs configuration at all. Are you using sssd with autofs? Please also include autofs configuration information.

You posted something in comment#1. I don't know what it is, can you elaborate please? If it is a compressed core dump then there's not enough information here for me to construct a system to examine it. I could try and use a sos-report from a system where the problem has been seen to construct such a system but otherwise I can't do anything but guess which is usually hopeless.

Looking at the coredump with latest 9.2 compose (not sure if `glibc` version is the same) and sssd-2.8.2-2 on top (not sure if it matters):
```
Core was generated by `/usr/sbin/automount --systemd-service --dont-check-daemon'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x00007f1894853588 in ?? ()
[Current thread is 1 (Thread 0x7f1893361640 (LWP 27824))]
(gdb) bt
#0 0x00007f1894853588 in ?? ()
#1 0x00007f1893361988 in ?? ()
#2 0x00007f1894c9c931 in __GI___nptl_deallocate_tsd () at nptl_deallocate_tsd.c:74
#3 __GI___nptl_deallocate_tsd () at nptl_deallocate_tsd.c:23
#4 0x00007f1894c9f6d6 in start_thread (arg=<optimized out>) at pthread_create.c:454
#5 0x00007f1894c3f450 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
(gdb) info threads
Id Target Id Frame
* 1 Thread 0x7f1893361640 (LWP 27824) (Exiting) 0x00007f1894853588 in ?? ()
2 Thread 0x7f18948608c0 (LWP 27817) __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393,
expected=0, futex_word=0x7ffe88f12dd0) at futex-internal.c:57
3 Thread 0x7f1894363640 (LWP 27820) __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393,
expected=0, futex_word=0x7f1894ff4c68 <cond+40>) at futex-internal.c:57
4 Thread 0x7f1893b62640 (LWP 27821) __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393,
expected=0, futex_word=0x5607d28651a8 <cond+40>) at futex-internal.c:57
5 Thread 0x7f1892b60640 (LWP 27827) clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:62
(gdb) thread 2
[Switching to thread 2 (Thread 0x7f18948608c0 (LWP 27817))]
#0 __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x7ffe88f12dd0)
at futex-internal.c:57
57 return INTERNAL_SYSCALL_CANCEL (futex_time64, futex_word, op, expected,
(gdb) bt
#0 __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x7ffe88f12dd0)
at futex-internal.c:57
#1 __futex_abstimed_wait_common (futex_word=futex_word@entry=0x7ffe88f12dd0, expected=expected@entry=0, clockid=clockid@entry=0,
abstime=abstime@entry=0x0, private=private@entry=0, cancel=cancel@entry=true) at futex-internal.c:87
#2 0x00007f1894c9c3ff in __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0x7ffe88f12dd0,
expected=expected@entry=0, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=0) at futex-internal.c:139
#3 0x00007f1894c9eba0 in __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x7ffe88f12d80, cond=0x7ffe88f12da8)
at pthread_cond_wait.c:504
#4 ___pthread_cond_wait (cond=cond@entry=0x7ffe88f12da8, mutex=mutex@entry=0x7ffe88f12d80) at pthread_cond_wait.c:619
#5 0x00005607d285035b in master_do_mount (entry=0x5607d3e76e40) at /usr/src/debug/autofs-5.1.7-36.el9.x86_64/daemon/master.c:1374
#6 master_mount_mounts (master=master@entry=0x5607d3e530a0, age=age@entry=3077)
at /usr/src/debug/autofs-5.1.7-36.el9.x86_64/daemon/master.c:1541
#7 0x00005607d2851117 in master_read_master (master=0x5607d3e530a0, age=3077)
at /usr/src/debug/autofs-5.1.7-36.el9.x86_64/daemon/master.c:1201
#8 0x00005607d283c7ee in main (argc=0, argv=<optimized out>) at /usr/src/debug/autofs-5.1.7-36.el9.x86_64/daemon/automount.c:2703
(gdb) thread 3
[Switching to thread 3 (Thread 0x7f1894363640 (LWP 27820))]
#0 __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x7f1894ff4c68 <cond+40>)
at futex-internal.c:57
57 return INTERNAL_SYSCALL_CANCEL (futex_time64, futex_word, op, expected,
(gdb) bt
#0 __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x7f1894ff4c68 <cond+40>)
at futex-internal.c:57
#1 __futex_abstimed_wait_common (futex_word=futex_word@entry=0x7f1894ff4c68 <cond+40>, expected=expected@entry=0,
clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=0, cancel=cancel@entry=true) at futex-internal.c:87
#2 0x00007f1894c9c3ff in __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0x7f1894ff4c68 <cond+40>,
expected=expected@entry=0, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=0) at futex-internal.c:139
#3 0x00007f1894c9eba0 in __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x7f1894ff4c00 <mutex>, cond=0x7f1894ff4c40 <cond>)
at pthread_cond_wait.c:504
#4 ___pthread_cond_wait (cond=cond@entry=0x7f1894ff4c40 <cond>, mutex=mutex@entry=0x7f1894ff4c00 <mutex>) at pthread_cond_wait.c:619
#5 0x00007f1894fdbb6a in alarm_handler (arg=<optimized out>) at /usr/src/debug/autofs-5.1.7-36.el9.x86_64/lib/alarm.c:221
#6 0x00007f1894c9f802 in start_thread (arg=<optimized out>) at pthread_create.c:443
#7 0x00007f1894c3f450 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
(gdb) thread 4
[Switching to thread 4 (Thread 0x7f1893b62640 (LWP 27821))]
#0 __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x5607d28651a8 <cond+40>)
at futex-internal.c:57
57 return INTERNAL_SYSCALL_CANCEL (futex_time64, futex_word, op, expected,
(gdb) bt
#0 __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x5607d28651a8 <cond+40>)
at futex-internal.c:57
#1 __futex_abstimed_wait_common (futex_word=futex_word@entry=0x5607d28651a8 <cond+40>, expected=expected@entry=0,
clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=0, cancel=cancel@entry=true) at futex-internal.c:87
#2 0x00007f1894c9c3ff in __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0x5607d28651a8 <cond+40>,
expected=expected@entry=0, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=0) at futex-internal.c:139
#3 0x00007f1894c9eba0 in __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x5607d28651c0 <mutex>, cond=0x5607d2865180 <cond>)
at pthread_cond_wait.c:504
#4 ___pthread_cond_wait (cond=cond@entry=0x5607d2865180 <cond>, mutex=mutex@entry=0x5607d28651c0 <mutex>) at pthread_cond_wait.c:619
#5 0x00005607d284eedb in st_queue_handler (arg=<optimized out>) at /usr/src/debug/autofs-5.1.7-36.el9.x86_64/daemon/state.c:1072
#6 0x00007f1894c9f802 in start_thread (arg=<optimized out>) at pthread_create.c:443
#7 0x00007f1894c3f450 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
(gdb) thread 5
[Switching to thread 5 (Thread 0x7f1892b60640 (LWP 27827))]
#0 clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:62
62 test %RAX_LP, %RAX_LP
(gdb) bt
#0 clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:62
#1 0x0000000000000000 in ?? ()
```
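In the backtrace above, frames #0 and #1 of the exiting thread are printed as `?? ()`, and the first named frame is `__GI___nptl_deallocate_tsd`. As a rough triage aid for pasted traces like this one, the sketch below lists the frames gdb could not symbolize. The function name `unresolved_frames` and the regular expression are assumptions made for illustration; gdb's text output is not a stable format, so treat this as a convenience rather than a reliable parser.

```python
import re

# Matches frames such as "#0 0x00007f1894853588 in ?? ()".
FRAME_RE = re.compile(r"^#(\d+)\s+(0x[0-9a-fA-F]+) in \?\? \(\)\s*$")


def unresolved_frames(backtrace):
    """Return (frame number, address) for every frame gdb printed as '?? ()'."""
    found = []
    for line in backtrace.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            found.append((int(match.group(1)), int(match.group(2), 16)))
    return found


# Crashing thread's backtrace, copied from the gdb session above.
SAMPLE = """\
#0 0x00007f1894853588 in ?? ()
#1 0x00007f1893361988 in ?? ()
#2 0x00007f1894c9c931 in __GI___nptl_deallocate_tsd () at nptl_deallocate_tsd.c:74
#3 __GI___nptl_deallocate_tsd () at nptl_deallocate_tsd.c:23
#4 0x00007f1894c9f6d6 in start_thread (arg=<optimized out>) at pthread_create.c:454
#5 0x00007f1894c3f450 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
"""

if __name__ == "__main__":
    for number, address in unresolved_frames(SAMPLE):
        print("frame #{}: {:#x}".format(number, address))
```

The addresses it returns are the ones worth comparing against loaded-library ranges or registered destructor addresses.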
(In reply to Alexey Tikhonov from comment #5)
> Looking at the coredump with latest 9.2 compose (not sure if `glibc` version
> is the same) and sssd-2.8.2-2 on top (not sure if it matters):
>
> ```
>
> Core was generated by `/usr/sbin/automount --systemd-service
> --dont-check-daemon'.

This is the normal unit start command.

> Program terminated with signal SIGSEGV, Segmentation fault.
> #0 0x00007f1894853588 in ?? ()
> [Current thread is 1 (Thread 0x7f1893361640 (LWP 27824))]
>
>
> (gdb) bt
> #0 0x00007f1894853588 in ?? ()
> #1 0x00007f1893361988 in ?? ()
> #2 0x00007f1894c9c931 in __GI___nptl_deallocate_tsd () at
> nptl_deallocate_tsd.c:74
> #3 __GI___nptl_deallocate_tsd () at nptl_deallocate_tsd.c:23
> #4 0x00007f1894c9f6d6 in start_thread (arg=<optimized out>) at
> pthread_create.c:454
> #5 0x00007f1894c3f450 in clone3 () at
> ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

This is what we see when a thread-specific key destructor fails to execute. The problem is (I'm pretty sure) that the first two entries there might be declared static so they don't show up in the back trace. As a check I can remove static declarations from any tsd key destructors and see if we can see what the names are; that could tell us where to look. Basically we don't really know who owns the tsd key.

>
>
> (gdb) info threads
> Id Target Id Frame
> * 1 Thread 0x7f1893361640 (LWP 27824) (Exiting) 0x00007f1894853588 in ??
> ()
> 2 Thread 0x7f18948608c0 (LWP 27817)
> __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393,
> expected=0, futex_word=0x7ffe88f12dd0) at futex-internal.c:57
> 3 Thread 0x7f1894363640 (LWP 27820)
> __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393,
> expected=0, futex_word=0x7f1894ff4c68 <cond+40>) at futex-internal.c:57
> 4 Thread 0x7f1893b62640 (LWP 27821)
> __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393,
> expected=0, futex_word=0x5607d28651a8 <cond+40>) at futex-internal.c:57
> 5 Thread 0x7f1892b60640 (LWP 27827) clone3 () at
> ../sysdeps/unix/sysv/linux/x86_64/clone3.S:62
>
>
> (gdb) thread 2
> [Switching to thread 2 (Thread 0x7f18948608c0 (LWP 27817))]
> #0 __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0,
> op=393, expected=0, futex_word=0x7ffe88f12dd0)
> at futex-internal.c:57
> 57 return INTERNAL_SYSCALL_CANCEL (futex_time64, futex_word, op,
> expected,
> (gdb) bt
> #0 __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0,
> op=393, expected=0, futex_word=0x7ffe88f12dd0)
> at futex-internal.c:57
> #1 __futex_abstimed_wait_common
> (futex_word=futex_word@entry=0x7ffe88f12dd0, expected=expected@entry=0,
> clockid=clockid@entry=0,
> abstime=abstime@entry=0x0, private=private@entry=0,
> cancel=cancel@entry=true) at futex-internal.c:87
> #2 0x00007f1894c9c3ff in __GI___futex_abstimed_wait_cancelable64
> (futex_word=futex_word@entry=0x7ffe88f12dd0,
> expected=expected@entry=0, clockid=clockid@entry=0,
> abstime=abstime@entry=0x0, private=private@entry=0) at futex-internal.c:139
> #3 0x00007f1894c9eba0 in __pthread_cond_wait_common (abstime=0x0,
> clockid=0, mutex=0x7ffe88f12d80, cond=0x7ffe88f12da8)
> at pthread_cond_wait.c:504
> #4 ___pthread_cond_wait (cond=cond@entry=0x7ffe88f12da8,
> mutex=mutex@entry=0x7ffe88f12d80) at pthread_cond_wait.c:619
> #5 0x00005607d285035b in master_do_mount (entry=0x5607d3e76e40) at
> /usr/src/debug/autofs-5.1.7-36.el9.x86_64/daemon/master.c:1374
> #6 master_mount_mounts (master=master@entry=0x5607d3e530a0,
> age=age@entry=3077)
> at /usr/src/debug/autofs-5.1.7-36.el9.x86_64/daemon/master.c:1541
> #7 0x00005607d2851117 in master_read_master (master=0x5607d3e530a0,
> age=3077)
> at /usr/src/debug/autofs-5.1.7-36.el9.x86_64/daemon/master.c:1201
> #8 0x00005607d283c7ee in main (argc=0, argv=<optimized out>) at
> /usr/src/debug/autofs-5.1.7-36.el9.x86_64/daemon/automount.c:2703

Looks like the main thread, currently reading the autofs master map.

>
>
> (gdb) thread 3
> [Switching to thread 3 (Thread 0x7f1894363640 (LWP 27820))]
> #0 __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0,
> op=393, expected=0, futex_word=0x7f1894ff4c68 <cond+40>)
> at futex-internal.c:57
> 57 return INTERNAL_SYSCALL_CANCEL (futex_time64, futex_word, op,
> expected,
> (gdb) bt
> #0 __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0,
> op=393, expected=0, futex_word=0x7f1894ff4c68 <cond+40>)
> at futex-internal.c:57
> #1 __futex_abstimed_wait_common (futex_word=futex_word@entry=0x7f1894ff4c68
> <cond+40>, expected=expected@entry=0,
> clockid=clockid@entry=0, abstime=abstime@entry=0x0,
> private=private@entry=0, cancel=cancel@entry=true) at futex-internal.c:87
> #2 0x00007f1894c9c3ff in __GI___futex_abstimed_wait_cancelable64
> (futex_word=futex_word@entry=0x7f1894ff4c68 <cond+40>,
> expected=expected@entry=0, clockid=clockid@entry=0,
> abstime=abstime@entry=0x0, private=private@entry=0) at futex-internal.c:139
> #3 0x00007f1894c9eba0 in __pthread_cond_wait_common (abstime=0x0,
> clockid=0, mutex=0x7f1894ff4c00 <mutex>, cond=0x7f1894ff4c40 <cond>)
> at pthread_cond_wait.c:504
> #4 ___pthread_cond_wait (cond=cond@entry=0x7f1894ff4c40 <cond>,
> mutex=mutex@entry=0x7f1894ff4c00 <mutex>) at pthread_cond_wait.c:619
> #5 0x00007f1894fdbb6a in alarm_handler (arg=<optimized out>) at
> /usr/src/debug/autofs-5.1.7-36.el9.x86_64/lib/alarm.c:221
> #6 0x00007f1894c9f802 in start_thread (arg=<optimized out>) at
> pthread_create.c:443
> #7 0x00007f1894c3f450 in clone3 () at
> ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

autofs alarm handler thread, should be just waiting to be poked at this stage.

>
>
> (gdb) thread 4
> [Switching to thread 4 (Thread 0x7f1893b62640 (LWP 27821))]
> #0 __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0,
> op=393, expected=0, futex_word=0x5607d28651a8 <cond+40>)
> at futex-internal.c:57
> 57 return INTERNAL_SYSCALL_CANCEL (futex_time64, futex_word, op,
> expected,
> (gdb) bt
> #0 __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0,
> op=393, expected=0, futex_word=0x5607d28651a8 <cond+40>)
> at futex-internal.c:57
> #1 __futex_abstimed_wait_common (futex_word=futex_word@entry=0x5607d28651a8
> <cond+40>, expected=expected@entry=0,
> clockid=clockid@entry=0, abstime=abstime@entry=0x0,
> private=private@entry=0, cancel=cancel@entry=true) at futex-internal.c:87
> #2 0x00007f1894c9c3ff in __GI___futex_abstimed_wait_cancelable64
> (futex_word=futex_word@entry=0x5607d28651a8 <cond+40>,
> expected=expected@entry=0, clockid=clockid@entry=0,
> abstime=abstime@entry=0x0, private=private@entry=0) at futex-internal.c:139
> #3 0x00007f1894c9eba0 in __pthread_cond_wait_common (abstime=0x0,
> clockid=0, mutex=0x5607d28651c0 <mutex>, cond=0x5607d2865180 <cond>)
> at pthread_cond_wait.c:504
> #4 ___pthread_cond_wait (cond=cond@entry=0x5607d2865180 <cond>,
> mutex=mutex@entry=0x5607d28651c0 <mutex>) at pthread_cond_wait.c:619
> #5 0x00005607d284eedb in st_queue_handler (arg=<optimized out>) at
> /usr/src/debug/autofs-5.1.7-36.el9.x86_64/daemon/state.c:1072
> #6 0x00007f1894c9f802 in start_thread (arg=<optimized out>) at
> pthread_create.c:443
> #7 0x00007f1894c3f450 in clone3 () at
> ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

autofs job queue runner, I think this should also be idle at this point.

>
>
> (gdb) thread 5
> [Switching to thread 5 (Thread 0x7f1892b60640 (LWP 27827))]
> #0 clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:62
> 62 test %RAX_LP, %RAX_LP
> (gdb) bt
> #0 clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:62
> #1 0x0000000000000000 in ?? ()
>
> ```

Not sure, might be the main thread.

If we can find out the function name of what's being called it will give us something to look for. Unfortunately, they may be static and not owned by automount. If it is me, and they are in automount, at least we will be able to find them by making sure they are not declared static.

Historically what I've done to fix these things is to dlopen() the shared library that owns the tsd at the start of autofs execution in the main thread and dlclose() it just before exit. Maybe there's something else going on that I'm missing this time, not sure ...

It's about now that we start wondering if there have been glibc changes.

There is one (not entirely recent) contributed change that uses a tsd key. It uses free() as the destructor and now I'm wondering if free() will be called with NULL or garbage if the key has never actually been set ... this key is initialised to NULL ... which isn't done for the others I use, maybe that's now a problem for glibc ... I must say documentation on exactly what I should be doing there isn't clear.

(In reply to Ian Kent from comment #10)
> There is one (not entirely recent) contributed change that uses a tsd
> key. It uses free() as the destructor and now I'm wondering if free()
> will be called with NULL or garbage if the key has never actually been
> set ... this key is initialised to NULL ... which isn't done for the
> others I use, maybe that's now a problem for glibc ...

Mmm ... pthread.h says ...

/* Create a key value identifying a location in the thread-specific
   data area. Each thread maintains a distinct thread-specific data area.
   DESTR_FUNCTION, if non-NULL, is called with the value associated to
   that key when the key is destroyed. DESTR_FUNCTION is not called if
   the value associated is NULL when the key is destroyed. */

So I shouldn't need to worry about NULL checks in destructors.

(In reply to Ian Kent from comment #11)
> Mmm ... pthread.h says ...
>
> /* Create a key value identifying a location in the thread-specific
> data area. Each thread maintains a distinct thread-specific data
> area. DESTR_FUNCTION, if non-NULL, is called with the value
> associated to that key when the key is destroyed.
> DESTR_FUNCTION is not called if the value associated is NULL when
> the key is destroyed. */
>
> So I shouldn't need to worry about NULL checks in destructors.

I wonder how the tsd key itself should be initialized so that glibc won't try and access a key that's not been set or even if that matters?

It's probably an issue like bug 2143159.

You can put this script into a .py file:

    pthread_keys = gdb.parse_and_eval('__pthread_keys')
    pthread_keys.fetch_lazy()
    pthread_keys_range = pthread_keys.type.range()
    space = gdb.current_progspace()
    for i in range(pthread_keys_range[0], pthread_keys_range[1] + 1):
        k = pthread_keys[i]
        seq = k['seq']
        destr = k['destr']
        if k['seq'] != 0 or k['destr'] != 0:
            soname = space.solib_name(int(destr))
            print("[{}:{}] {} ({})".format(i, seq, destr, soname))

Install glibc debugging information with dnf debuginfo-install. Then run the script against a running automount process, like this:

    gdb --batch -p `pgrep automount` -x /path/to/script.py

It should dump the registered TLS destructors, hopefully that provides a clue what is going on. If you trigger the crash afterwards, we can hopefully compare addresses and find out which TLS destructor is responsible.

Also hitting the tls destructor crash in case 3368370:

#0 0x00007febab392510 in ?? ()
#1 0x00007febaaf40931 in __GI___nptl_deallocate_tsd () at nptl_deallocate_tsd.c:74
#2 __GI___nptl_deallocate_tsd () at nptl_deallocate_tsd.c:23
#3 0x00007febaaf436d6 in start_thread (arg=<optimized out>) at pthread_create.c:454
#4 0x00007febaaee3450 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

the other threads are uninteresting:

Thread 4 (Thread 0x7febaa4978c0 (LWP 1397)):
#0 0x00007febaaef9aaa in __GI___sigtimedwait (set=set@entry=0x7ffea1c2f590, info=info@entry=0x7ffea1c2f450, timeout=timeout@entry=0x0) at ../sysdeps/unix/sysv/linux/sigtimedwait.c:61
#1 0x00007febaaef90ec in __GI___sigwait (set=set@entry=0x7ffea1c2f590, sig=sig@entry=0x7ffea1c2f558) at ../sysdeps/unix/sysv/linux/sigwait.c:28
#2 0x000055c9f8ece884 in statemachine (arg=0x0) at /usr/src/debug/autofs-5.1.7-32.el9_1.1.x86_64/daemon/automount.c:1600
#3 main (argc=<optimized out>, argv=<optimized out>) at /usr/src/debug/autofs-5.1.7-32.el9_1.1.x86_64/daemon/automount.c:2754

Thread 3 (Thread 0x7febaa496640 (LWP 1486)):
#0 __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x7febab0f3c88 <cond+40>) at futex-internal.c:57
#1 __futex_abstimed_wait_common (futex_word=futex_word@entry=0x7febab0f3c88 <cond+40>, expected=expected@entry=0, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=0, cancel=cancel@entry=true) at futex-internal.c:87
#2 0x00007febaaf403ff in __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0x7febab0f3c88 <cond+40>, expected=expected@entry=0, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=0) at futex-internal.c:139
#3 0x00007febaaf42ba0 in __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x7febab0f3c20 <mutex>, cond=0x7febab0f3c60 <cond>) at pthread_cond_wait.c:504
#4 ___pthread_cond_wait (cond=cond@entry=0x7febab0f3c60 <cond>, mutex=mutex@entry=0x7febab0f3c20 <mutex>) at pthread_cond_wait.c:619
#5 0x00007febab0dab6a in alarm_handler (arg=<optimized out>) at /usr/src/debug/autofs-5.1.7-32.el9_1.1.x86_64/lib/alarm.c:221
#6 0x00007febaaf43802 in start_thread (arg=<optimized out>) at pthread_create.c:443
#7 0x00007febaaee3450 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

Thread 2 (Thread 0x7feba9c95640 (LWP 1487)):
#0 __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x55c9f8ef71ac <cond+44>) at futex-internal.c:57
#1 __futex_abstimed_wait_common (futex_word=futex_word@entry=0x55c9f8ef71ac <cond+44>, expected=expected@entry=0, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=0, cancel=cancel@entry=true) at futex-internal.c:87
#2 0x00007febaaf403ff in __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0x55c9f8ef71ac <cond+44>, expected=expected@entry=0, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=0) at futex-internal.c:139
#3 0x00007febaaf42ba0 in __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x55c9f8ef71c0 <mutex>, cond=0x55c9f8ef7180 <cond>) at pthread_cond_wait.c:504
#4 ___pthread_cond_wait (cond=cond@entry=0x55c9f8ef7180 <cond>, mutex=mutex@entry=0x55c9f8ef71c0 <mutex>) at pthread_cond_wait.c:619
#5 0x000055c9f8ee0a4b in st_queue_handler (arg=<optimized out>) at /usr/src/debug/autofs-5.1.7-32.el9_1.1.x86_64/daemon/state.c:1072
#6 0x00007febaaf43802 in start_thread (arg=<optimized out>) at pthread_create.c:443
#7 0x00007febaaee3450 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

[0:1] 0x55c9f8ecf170 <key_thread_stdenv_vars_destroy at /usr/src/debug/autofs-5.1.7-32.el9_1.1.x86_64/daemon/automount.c:1982> (None)
[1:1] 0x7febaaf53910 <__GI___libc_free at malloc.c:3235> (/home/sos/3368370/root/lib64/libc.so.6)
[2:1] 0x7febab1a9670 <xmlFreeGlobalState at /usr/src/debug/libxml2-2.9.13-3.el9_1.x86_64/threads.c:558> (/home/sos/3368370/root/lib64/libxml2.so.2)
[3:1] 0x7febab392510 (None)
[4:1] 0x7febab392510 (None)
[5:1] 0x55c9f8ecf1b0 <key_mnt_params_destroy at /usr/src/debug/autofs-5.1.7-32.el9_1.1.x86_64/daemon/direct.c:53> (None)
[6:1] 0x55c9f8ecf1b0 <key_mnt_params_destroy at /usr/src/debug/autofs-5.1.7-32.el9_1.1.x86_64/daemon/direct.c:53> (None)
[7:1] 0x7feba845c630 <sss_at_thread_exit at src/sss_client/common.c:87> (/home/sos/3368370/root/lib64/libnss_sss.so.2)

This looks to match bz2143159, so is this an sssd issue instead of autofs?

I can provide coredump and sosreport, if needed.

(In reply to Frank Sorenson from comment #14)
> Also hitting the tls destructor crash in case 3368370:
>
> #0 0x00007febab392510 in ?? ()
> #1 0x00007febaaf40931 in __GI___nptl_deallocate_tsd () at
> nptl_deallocate_tsd.c:74
> #2 __GI___nptl_deallocate_tsd () at nptl_deallocate_tsd.c:23
> #3 0x00007febaaf436d6 in start_thread (arg=<optimized out>) at
> pthread_create.c:454
> #4 0x00007febaaee3450 in clone3 () at
> ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
>
> snip ...
> [0:1] 0x55c9f8ecf170 <key_thread_stdenv_vars_destroy at
> /usr/src/debug/autofs-5.1.7-32.el9_1.1.x86_64/daemon/automount.c:1982> (None)

Part of main thread, always available.

> [1:1] 0x7febaaf53910 <__GI___libc_free at malloc.c:3235>
> (/home/sos/3368370/root/lib64/libc.so.6)

Libc is always available, has to be to issue that message.

> [2:1] 0x7febab1a9670 <xmlFreeGlobalState at
> /usr/src/debug/libxml2-2.9.13-3.el9_1.x86_64/threads.c:558>
> (/home/sos/3368370/root/lib64/libxml2.so.2)

Libxml2 is available, it's dlopened in the main thread as a workaround to prevent the nptl crash, been there for ages.

> [3:1] 0x7febab392510 (None)
> [4:1] 0x7febab392510 (None)

Let's assume these are NULL destructors so it's not these. Wonder why they are here, perhaps a tsd key leak ... don't know.

> [5:1] 0x55c9f8ecf1b0 <key_mnt_params_destroy at
> /usr/src/debug/autofs-5.1.7-32.el9_1.1.x86_64/daemon/direct.c:53> (None)
> [6:1] 0x55c9f8ecf1b0 <key_mnt_params_destroy at
> /usr/src/debug/autofs-5.1.7-32.el9_1.1.x86_64/daemon/direct.c:53> (None)

Both available in the main thread/executable.

> [7:1] 0x7feba845c630 <sss_at_thread_exit at src/sss_client/common.c:87>
> (/home/sos/3368370/root/lib64/libnss_sss.so.2)

Which leaves sss as the best candidate.

>
>
> This looks to match bz2143159, so is this an sssd issue instead of autofs?

I think so, we should check sss version and perhaps try an update to verify.

Ian

Issue migration from Bugzilla to Jira is in process at this time. This will be the last message in Jira copied from the Bugzilla bug.

This BZ has been automatically migrated to the issues.redhat.com Red Hat Issue Tracker. All future work related to this report will be managed there.

Due to differences in account names between systems, some fields were not replicated. Be sure to add yourself to Jira issue's "Watchers" field to continue receiving updates and add others to the "Need Info From" field to continue requesting information.

To find the migrated issue, look in the "Links" section for a direct link to the new issue location. The issue key will have an icon of 2 footprints next to it, and begin with "RHEL-" followed by an integer. You can also find this issue by visiting https://issues.redhat.com/issues/?jql= and searching the "Bugzilla Bug" field for this BZ's number, e.g. a search like:

"Bugzilla Bug" = 1234567

In the event you have trouble locating or viewing this issue, you can file an issue by sending mail to rh-issues. You can also visit https://access.redhat.com/articles/7032570 for general account information.
|
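The thread suggests comparing the crash address with the registered destructor addresses. A minimal sketch of that comparison follows, using the key dump and the frame #0 address quoted from case 3368370. The helper names `parse_keys` and `keys_matching` and the line-format regular expression are assumptions for illustration, not part of autofs, glibc, or gdb. A literal address match only shows which key slots hold that destructor address; it does not by itself say which library registered them.

```python
import re

# One dump line: "[index:seq] 0xADDR <optional symbol> (soname or None)".
KEY_RE = re.compile(
    r"^\[(\d+):(\d+)\]\s+(0x[0-9a-fA-F]+)(?:\s+<(.*)>)?\s+\((.*)\)$"
)


def parse_keys(dump):
    """Parse the script's output into a list of dicts, one per key slot."""
    keys = []
    for line in dump.splitlines():
        match = KEY_RE.match(line.strip())
        if not match:
            continue
        index, seq, addr, symbol, soname = match.groups()
        keys.append({
            "index": int(index),
            "seq": int(seq),
            "destr": int(addr, 16),
            "symbol": symbol,  # None when gdb printed no symbol
            "soname": None if soname == "None" else soname,
        })
    return keys


def keys_matching(dump, crash_address):
    """Return the key slots whose destructor address equals crash_address."""
    return [k for k in parse_keys(dump) if k["destr"] == crash_address]


# Key dump and frame #0 address as quoted in this report.
KEY_DUMP = """\
[0:1] 0x55c9f8ecf170 <key_thread_stdenv_vars_destroy at /usr/src/debug/autofs-5.1.7-32.el9_1.1.x86_64/daemon/automount.c:1982> (None)
[1:1] 0x7febaaf53910 <__GI___libc_free at malloc.c:3235> (/home/sos/3368370/root/lib64/libc.so.6)
[2:1] 0x7febab1a9670 <xmlFreeGlobalState at /usr/src/debug/libxml2-2.9.13-3.el9_1.x86_64/threads.c:558> (/home/sos/3368370/root/lib64/libxml2.so.2)
[3:1] 0x7febab392510 (None)
[4:1] 0x7febab392510 (None)
[5:1] 0x55c9f8ecf1b0 <key_mnt_params_destroy at /usr/src/debug/autofs-5.1.7-32.el9_1.1.x86_64/daemon/direct.c:53> (None)
[6:1] 0x55c9f8ecf1b0 <key_mnt_params_destroy at /usr/src/debug/autofs-5.1.7-32.el9_1.1.x86_64/daemon/direct.c:53> (None)
[7:1] 0x7feba845c630 <sss_at_thread_exit at src/sss_client/common.c:87> (/home/sos/3368370/root/lib64/libnss_sss.so.2)
"""
CRASH_ADDRESS = 0x00007febab392510

if __name__ == "__main__":
    for key in keys_matching(KEY_DUMP, CRASH_ADDRESS):
        print("key {index}: destr={destr:#x} symbol={symbol} soname={soname}".format(**key))
```

With the values quoted in this report, the comparison selects key slots 3 and 4, the two entries for which gdb printed neither a symbol nor a shared object name. That observation would need to be checked against the core dump and the sssd version before drawing any conclusion from it.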