Bug 2181478
| Field | Value | Field | Value |
|---|---|---|---|
| Summary | Main process exited after running `systemctl reload autofs.service` | | |
| Product | Red Hat Enterprise Linux 9 | Reporter | Joerg <jkastnin> |
| Component | autofs | Assignee | Ian Kent <ikent> |
| Status | CLOSED MIGRATED | QA Contact | Kun Wang <kunwan> |
| Severity | unspecified | Docs Contact | |
| Priority | unspecified | | |
| Version | 9.1 | CC | xzhou |
| Target Milestone | rc | Keywords | MigratedToJIRA |
| Target Release | --- | Flags | pm-rhel: mirror+ |
| Hardware | Unspecified | | |
| OS | Unspecified | | |
| Whiteboard | | | |
| Fixed In Version | | Doc Type | If docs needed, set a value |
| Doc Text | | Story Points | --- |
| Clone Of | | Environment | |
| Last Closed | 2023-09-23 11:35:50 UTC | Type | Bug |
| Regression | --- | Mount Type | --- |
| Documentation | --- | CRM | |
| Verified Versions | | Category | --- |
| oVirt Team | --- | RHEL 7.3 requirements from Atomic Host | |
| Cloudforms Team | --- | Target Upstream Version | |
| Embargoed | | | |
| Attachments | | | |
Created attachment 1953346
Screenshot taken from console showing segfault

I checked and found that RHEL 8.7 with the following component versions **is not** affected:

- autofs-5.1.4-83.el8.x86_64
- libsss_autofs-2.7.3-4.el8_7.3.x86_64
- kernel 4.18.0-425.13.1.el8_7.x86_64

I checked the following package versions in RHEL 9, which are all affected:

- autofs.x86_64 1:5.1.7-27.el9 rhel-9-for-x86_64-baseos-rpms
- autofs.x86_64 1:5.1.7-31.el9 rhel-9-for-x86_64-baseos-rpms
- autofs.x86_64 1:5.1.7-32.el9_1.1 rhel-9-for-x86_64-baseos-rpms

Can you get onto the system where the crash occurred, get a gdb backtrace, and post it here, please? That would be a lot easier than me setting up a system identical to the one in the sosreport, especially when there are a couple of known problems that have already been resolved.

Are you using sssd on the system? There's a known problem with a particular version of sssd, and the description of how it happened was the same as what you describe here. It might not be the same, though.

I'm pretty sure that this will be resolved by the changes for bug 2179753 or the above fix for sssd. I'll get onto bug 2179753 as soon as I can and see if I can identify the fixed version of sssd.

Ian

Hi Ian,

As we had already discussed on Slack, I'm not able to get you a useful gdb backtrace. Here are some additional bits I could share.

The package sssd is not installed, but I found the same error happening on Fedora release 39 (Rawhide) with the following component versions:

- libsss_autofs-2.8.2-4.fc38.x86_64
- autofs-5.1.8-9.fc39.x86_64
- kernel 6.3.0-0.rc3.20230322gita1effab7a3a3.31.fc39.x86_64

Joerg

Hello again,

After tinkering with gdb for a while I got this:
~~~
[root@rhel91-2023-03-24 coredump]# coredumpctl list /usr/sbin/automount
TIME PID UID GID SIG COREFILE EXE SIZE
Fri 2023-03-24 13:49:49 CET 822 0 0 SIGSEGV present /usr/sbin/automount 244.5K
[root@rhel91-2023-03-24 coredump]# coredumpctl debug
PID: 822 (automount)
UID: 0 (root)
GID: 0 (root)
Signal: 11 (SEGV)
Timestamp: Fri 2023-03-24 13:49:49 CET (1min 54s ago)
Command Line: /usr/sbin/automount --systemd-service --dont-check-daemon
Executable: /usr/sbin/automount
Control Group: /system.slice/autofs.service
Unit: autofs.service
Slice: system.slice
Boot ID: 41a54fcbe5af408190332e4ecbf59691
Machine ID: b28b5c50076340e996780f4a472cf3cd
Hostname: rhel91-2023-03-24
Storage: /var/lib/systemd/coredump/core.automount.0.41a54fcbe5af408190332e4ecbf59691.822.1679662189000000.zst (present)
Disk Size: 244.5K
Message: Process 822 (automount) of user 0 dumped core.
Module linux-vdso.so.1 with build-id b2402caaf299e146b50e7caee420b3ac0677afa9
Module lookup_hosts.so with build-id 6502c9ac809410c115658c39cf619a60cb18e4e2
Module mount_bind.so with build-id 1db3141bf32b6ec1b2b7b1ea5948eadaf99d383a
Module mount_nfs.so with build-id 6ac450cdece6836c5303d9422d7ff9a7fc939a3e
Module parse_sun.so with build-id 4aedc2ea29fbcacf76ded03a868e9a915da1ccc4
Module lookup_file.so with build-id fed02f38e452e345c18084568a384adc14e0200b
Module libpcre2-8.so.0 with build-id dac773591ff85ee4d18b00795d8bca123f3d5d66
Module libselinux.so.1 with build-id 321a1f9b5537883ee8ec04c65a9edbaefcc7b5aa
Module libgpg-error.so.0 with build-id 9d27198f0ca61c66cd921675219dffc0bad16a1a
Module libresolv.so.2 with build-id dd26798426928fb454335411ecfeb883030b1f6c
Module libcrypto.so.3 with build-id 5a47668cb7ac23dbdfcce8a8a6923484fd67d8a5
Module libkeyutils.so.1 with build-id 83c6539bd0d3140678ba836b8baa1b215efa2632
Module libkrb5support.so.0 with build-id 22c23607e8875f3b081adbc7fe9fbd612b7a57a5
Module libm.so.6 with build-id c0eb573a2171d96b1aa970edb07f3368573bf845
Module libz.so.1 with build-id a39f7a92539115971debc39f2f9b66b74f8f7bb8
Module ld-linux-x86-64.so.2 with build-id df9c6b298bf5e3c1d0eb6a0911f3f561908a704d
Module libgcrypt.so.20 with build-id 7f21916b83ba6859ff1392a52958f355567ae339
Module libcap.so.2 with build-id c7625c8524a3d7756043555a1e7b1c3cb56fabbe
Module liblz4.so.1 with build-id 4d32cb5fa39c86b05cc10cc380f3a8a0d6d9d648
Module libzstd.so.1 with build-id f0c68ad1b3f8941857af47c6887736d835317ccc
Module liblzma.so.5 with build-id 330eb2fe0769e5466e2e0ac1b158e1e8452738c9
Module libcom_err.so.2 with build-id ec70fb11e14fe7dadde8353e95592eb7b8bd4b3a
Module libk5crypto.so.3 with build-id bd537be81f12497f2d5b8a590665ce28c303b85c
Module libkrb5.so.3 with build-id 8c62715e7b422618177de85f20fbc3a89128f06c
Module libgssapi_krb5.so.2 with build-id 4dae28e73361fa8c8b216353852acd992e669a06
Module libc.so.6 with build-id 82f7ae28e16376aa97cc3bf50b40ab2d1043924a
Module libgcc_s.so.1 with build-id 9526c65fed0e95fbb6b988476cc811ca19d5c9c9
Module libautofs.so with build-id abcf609c82d95711cd1563aa7504f263b528660e
Module libxml2.so.2 with build-id 3175d5777b54e42141250543b6acc4794da1b104
Module libsystemd.so.0 with build-id 0cce699958c66324d0a1bb698c28da0911b749f4
Module libtirpc.so.3 with build-id 6a25d54850681edbfacb53f44df71f4e1fa7b52b
Module automount with build-id 6c2c4e55d530a205f518be13e851d994c76e00fb
Stack trace of thread 1264:
#0 0x00007f39bd5cc510 n/a (n/a + 0x0)
#1 0x0000000000000000 n/a (n/a + 0x0)
ELF object binary architecture: AMD x86-64
GNU gdb (GDB) Red Hat Enterprise Linux 10.2-10.el9
Copyright (C) 2021 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/sbin/automount...
Reading symbols from /usr/lib/debug/usr/sbin/automount-5.1.7-32.el9_1.1.x86_64.debug...
[New LWP 1264]
[New LWP 822]
[New LWP 825]
[New LWP 826]
[New LWP 845]
[New LWP 1265]
[New LWP 1266]
[New LWP 838]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/usr/sbin/automount --systemd-service --dont-check-daemon'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x00007f39bd5cc510 in ?? ()
[Current thread is 1 (Thread 0x7f39bddd9640 (LWP 1264))]
(gdb) bt
#0 0x00007f39bd5cc510 in ?? ()
#1 0x00007f39c08b2931 in __GI___nptl_deallocate_tsd () at nptl_deallocate_tsd.c:74
#2 __GI___nptl_deallocate_tsd () at nptl_deallocate_tsd.c:23
#3 0x00007f39c08b56d6 in start_thread (arg=<optimized out>) at pthread_create.c:454
#4 0x00007f39c0855450 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
(gdb)
~~~
Hope that helps.
Regards,
Joerg
(In reply to Joerg from comment #7)
> Hello again,
> After tinkering with gdb for a while I got this:

Right, snip ...

> ~~~
> Core was generated by `/usr/sbin/automount --systemd-service --dont-check-daemon'.
> Program terminated with signal SIGSEGV, Segmentation fault.
> #0 0x00007f39bd5cc510 in ?? ()
> [Current thread is 1 (Thread 0x7f39bddd9640 (LWP 1264))]
> (gdb) bt
> #0 0x00007f39bd5cc510 in ?? ()
> #1 0x00007f39c08b2931 in __GI___nptl_deallocate_tsd () at nptl_deallocate_tsd.c:74
> #2 __GI___nptl_deallocate_tsd () at nptl_deallocate_tsd.c:23
> #3 0x00007f39c08b56d6 in start_thread (arg=<optimized out>) at pthread_create.c:454
> #4 0x00007f39c0855450 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
> (gdb)
> ~~~

This is what I expected. It's not conclusive, because we don't know who owns those thread-specific data (TSD) keys. It's the same signature we saw for the sssd bug and for another bug that appeared to be a mistake exposed by a newer version of glibc, but the latter was fixed in revision 32.

There were some other changes that went into revision 34, but I'm pretty sure this is a result of a large change I had to make recently for a customer that had over 32000 exports from an NFS server; the existing code just wasn't written to handle that many exports. The point being, it was a significant change and a couple of bugs got through testing.

We'll see Monday.

Ian

OK, I was able to reproduce this on my f37 machine here at home after building and installing from my Rawhide package source.

It looks like this is the sssd bug. I don't have sssd configured on my machine here either, but it still crashes.

I'm not sure in which version of sssd this was introduced, but I have libsss_autofs-2.8.2-1.fc37.x86_64 on f37, which is broken. It was fixed in sssd revision 2.8.2-2.

I expect you have `automount: sss files` in `/etc/nsswitch.conf`. Changing this to either `automount: files` or `automount: files sss` worked around the problem for me.
Give it a try and see if you get the same results as me. ;)

Ian

Hi Ian,

My /etc/nsswitch.conf contains:

~~~
# grep automount /etc/nsswitch.conf
automount: files sss
~~~

I have to change it to `automount: files` to work around the problem. If sss stays on the line, the service keeps crashing on reload.

Joerg

(In reply to Joerg from comment #10)
> Hi Ian,
>
> My /etc/nsswitch.conf contains:
>
> ~~~
> # grep automount /etc/nsswitch.conf
> automount: files sss
> ~~~
>
> I have to change it to `automount: files` to work around the problem. If
> sss stays on the line, the service keeps crashing on reload.

Either way, it's probably sss in this case, but there are a couple of other things that need to be fixed.

I'm pretty sure that it's fixed in RHEL in sssd 2.8.2-2.el9. The package revision numbers don't match in Fedora: it has 2.8.2-4, IIRC, and it doesn't have the fix. In any case, 2.8.2-2 is needed on RHEL.

Applying the changes to RHEL is going to take a couple of days: there are two other changes I have scheduled for RHEL-9.3.0, I want to keep the order the same between RHEL-8 and RHEL-9, and there's the CI testing that needs to be done for each change.

Ian

(In reply to Ian Kent from comment #11)
> In any case, 2.8.2-2 is needed on RHEL.
>
> Applying the changes to RHEL is going to take a couple of days: there are
> two other changes I have scheduled for RHEL-9.3.0, I want to keep the
> order the same between RHEL-8 and RHEL-9, and there's the CI testing that
> needs to be done for each change.
>
> Ian

Thanks for letting me know. I have two workarounds:

1. Change `automount: files sss` to `automount: files`, OR
2. Run `systemctl restart autofs` instead of `systemctl reload autofs`

so it's not urgent on my end. I'll monitor this Bugzilla and the linked one I filed against Rawhide to see when the fixed version is available and released.
Cheers,
Joerg

(In reply to Joerg from comment #12)
> (In reply to Ian Kent from comment #11)
> > In any case, 2.8.2-2 is needed on RHEL.
> >
> > Applying the changes to RHEL is going to take a couple of days: there are
> > two other changes I have scheduled for RHEL-9.3.0, I want to keep the
> > order the same between RHEL-8 and RHEL-9, and there's the CI testing that
> > needs to be done for each change.
> >
> > Ian
>
> Thanks for letting me know. I have two workarounds:
>
> 1. Change `automount: files sss` to `automount: files`, OR
> 2. Run `systemctl restart autofs` instead of `systemctl reload autofs`
>
> so it's not urgent on my end. I'll monitor this Bugzilla and the linked one I
> filed against Rawhide to see when the fixed version is available and
> released.

What Fedora release do you need?

The change I applied should make its way onto the mirrors fairly quickly; it's revision autofs-5.1.8-20.fc39. So Fedora has (or will have) the same changes as RHEL and CentOS.

We can go back 2 Fedora releases, but I'm not sure in which release the sssd regression was introduced.

Ian

(In reply to Ian Kent from comment #13)
> What Fedora release do you need?
>
> The change I applied should make its way onto the mirrors fairly quickly;
> it's revision autofs-5.1.8-20.fc39. So Fedora has (or will have) the same
> changes as RHEL and CentOS.
>
> We can go back 2 Fedora releases, but I'm not sure in which release the sssd
> regression was introduced.

I would need it in F37, but I could wait for F38 in case it causes too much trouble to bring it back to F37. IMHO it's more important to get the fix for RHEL 9.

Joerg

(In reply to Joerg from comment #14)
> (In reply to Ian Kent from comment #13)
> > What Fedora release do you need?
> >
> > The change I applied should make its way onto the mirrors fairly quickly;
> > it's revision autofs-5.1.8-20.fc39. So Fedora has (or will have) the same
> > changes as RHEL and CentOS.
> >
> > We can go back 2 Fedora releases, but I'm not sure in which release the sssd
> > regression was introduced.
>
> I would need it in F37, but I could wait for F38 in case it causes too much
> trouble to bring it back to F37.
> IMHO it's more important to get the fix for RHEL 9.

The big question is when the sssd regression was introduced. I can update autofs back to f37, but we need the sssd folks to fix any broken releases in Fedora. I'll see what I can find wrt. the sssd version as I go.

RHEL is a very different proposition; we'll discuss that in the RHEL-9 bug once I get the update done for RHEL-9.3.0.

Ian

Issue migration from Bugzilla to Jira is in process at this time. This will be the last message in Jira copied from the Bugzilla bug.

This BZ has been automatically migrated to the issues.redhat.com Red Hat Issue Tracker. All future work related to this report will be managed there.

Due to differences in account names between systems, some fields were not replicated. Be sure to add yourself to the Jira issue's "Watchers" field to continue receiving updates and add others to the "Need Info From" field to continue requesting information.

To find the migrated issue, look in the "Links" section for a direct link to the new issue location. The issue key will have an icon of 2 footprints next to it, and begin with "RHEL-" followed by an integer. You can also find this issue by visiting https://issues.redhat.com/issues/?jql= and searching the "Bugzilla Bug" field for this BZ's number, e.g. a search like:

"Bugzilla Bug" = 1234567

In the event you have trouble locating or viewing this issue, you can file an issue by sending mail to rh-issues. You can also visit https://access.redhat.com/articles/7032570 for general account information.
Created attachment 1953345
Coredump of automounter process

Description of problem:

Running the command `systemctl reload autofs.service` returns to the prompt without any output (no news is good news, right?). But checking the service status with `systemctl status autofs.service` shows that the main process exited. Instead of reloading the configuration, the service status changed from active to failed. To recover from this issue, run `systemctl start autofs.service`.

Version-Release number of selected component (if applicable):

- autofs-5.1.7-32.el9_1.1.x86_64
- libsss_autofs-2.7.3-4.el9.x86_64
- kernel 5.14.0-162.6.1.el9_1.x86_64

How reproducible:

This issue is reproducible on any fresh install of RHEL 9.1.

Steps to Reproduce:

1. Take a fresh install of RHEL 9.1 (e.g. with the minimal environment).
2. Run `# dnf in autofs`.
3. Run `# systemctl enable --now autofs`.
4. Check that `# systemctl status autofs` reports the service as active and running.
5. Run `# systemctl reload autofs`.
6. Run `# systemctl status autofs` again.

Actual results:

~~~
[root@rhel91-2023-03-24 ~]# systemctl reload autofs
[root@rhel91-2023-03-24 ~]# systemctl --no-pager status autofs
× autofs.service - Automounts filesystems on demand
Loaded: loaded (/usr/lib/systemd/system/autofs.service; enabled; vendor preset: disabled)
Active: failed (Result: signal) since Fri 2023-03-24 10:11:15 CET; 2s ago
Duration: 33.915s
Process: 14310 ExecStart=/usr/sbin/automount $OPTIONS --systemd-service --dont-check-daemon (code=killed, signal=SEGV)
Process: 14324 ExecReload=/usr/bin/kill -HUP $MAINPID (code=exited, status=0/SUCCESS)
Main PID: 14310 (code=killed, signal=SEGV)
CPU: 23ms

Mar 24 10:10:41 rhel91-2023-03-24 systemd[1]: Starting Automounts filesystems on demand...
Mar 24 10:10:41 rhel91-2023-03-24 systemd[1]: Started Automounts filesystems on demand.
Mar 24 10:11:14 rhel91-2023-03-24 systemd[1]: Reloading Automounts filesystems on demand...
Mar 24 10:11:14 rhel91-2023-03-24 systemd[1]: Reloaded Automounts filesystems on demand.
Mar 24 10:11:15 rhel91-2023-03-24 systemd[1]: autofs.service: Main process exited, code=killed, status=11/SEGV
Mar 24 10:11:15 rhel91-2023-03-24 systemd[1]: autofs.service: Failed with result 'signal'.
~~~

Expected results:

The service should reload without a segfault.

Additional info:

I have reproduced this issue on a new install using a test system. I attached the systemd coredump, a screenshot from the console and an sos report to this bugzilla. They contain no secrets, as it's a pure test system.