Bug 2137959
| Summary: | sbcl segfaults at startup | | |
|---|---|---|---|
| Product: | [Fedora] Fedora | Reporter: | Alexey Dobriyan <adobriyan> |
| Component: | sbcl | Assignee: | Rex Dieter <rdieter> |
| Status: | ASSIGNED --- | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 38 | CC: | ben.kreuter, green, hiredman, james, koppe, loganjerry, luca.giuzzi, rdieter, redhat |
| Target Milestone: | --- | Keywords: | Reopened |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2023-05-25 18:29:25 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |

Created attachment 1920582 [details]
strace -f -o 1.log sbcl
Created attachment 1920584 [details]
strace -f -o 1.log sbcl (SIGSEGV)
It is like 90% reproducible. In fact the first time it worked was when I tried to collect strace log for this bug. setarch -R doesn't help.

I am seeing the same problem on F37, x86_64. After a few tries SBCL starts up no problem.

I see the same problem. It also affects maxima (which is based upon sbcl).
Incidentally: I see the problem on a machine with the following cpu
model name : 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz
but I do not see the issue on a computer with the following
model name : Intel(R) Core(TM) i5-8259U CPU @ 2.30GHz
Both machines have exactly the same sw configuration (generated by a kickstart file, so I am sure it is the same) and have been updated at the same time.

I'm not alone! FWIW, I'm using this official SBCL build without problems:
$ sha256sum sbcl-2.2.10-x86-64-linux-binary.tar.bz2
7dde88ec2db3ca1012aa158366c3794a6170cf012e321a19339d422aa9309af6 sbcl-2.2.10-x86-64-linux-binary.tar.bz2

Luca, can you post /proc/cpuinfo from both i5 and i7?

Created attachment 1944423 [details]
cpuinfo i7 (not working)
Created attachment 1944424 [details]
cpuinfo i5 (working)
Witnessing the same on 12th Gen Intel(R) Core(TM) i7-1260P. SBCL 2.3.0 official build works fine.

Created attachment 1944742 [details]
cpuinfo i7 (old, working)
Incidentally, on an older i7 I do not have any problem at all.
I attach the relevant cpuinfo.
Both i7s are DELL laptops (xps15 9550 the old one and xps 13 9305 the new one which has issues)
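
For anyone wanting to compare the attached cpuinfo files mechanically, here is a minimal sketch. It assumes each file is a plain copy of /proc/cpuinfo; the function names are made up for illustration and are not part of any tool mentioned in this bug.

```python
def cpu_flags(cpuinfo_text):
    """Return the feature flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def flags_only_in_failing(working_text, failing_text):
    """Flags present on the failing machine but absent on the working one, sorted."""
    return sorted(cpu_flags(failing_text) - cpu_flags(working_text))
```

Calling flags_only_in_failing() with the text of a working and a non-working cpuinfo file gives a list of candidate features to test one by one. A flag appearing in that list is only a candidate, not a confirmed cause.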
By doing ltrace between instances which run and instances which crash of sbcl...
An instance where the interpreter runs has:
sigaction(SIGSEGV, { 0x4171c0, <0-2,11-14,16,19,22-28>, 0, nil }, nil) = 0
sigismember(<0-2,12-14,16,19,22-28>, SIGUSR2) = 0
sigaction(SIGUSR2, { 0x4171c0, <0-2,11-14,16,19,22-28>, 0, nil }, nil) = 0
mmap64(0, 0x452178, 7, 0x4022) = 0x7fe41eb6c000
sem_init(0x7fe41ef7a000, 0, 1, 0x7fe41eb70000) = 0
sem_init(0x7fe41ef7a020, 0, 0, 0x7fe41eb70000) = 0
sem_init(0x7fe41ef7a040, 0, 0, 0x7fe41eb70000) = 0
sysconf(250, 0, 0, 0x7fe41eb70000) = 0x38c0
sigaltstack(0x7fff53b9fbf0, 0, 0, 0x7fe41eb70000) = 0
pthread_self(0x7fff53b9fbf0, 0, 0, 0x7fe42c62f33b) = 0x7fe42c5f0b80
mprotect(0x7fe41eb70000, 0x8000, 0, 0x7fe42c62f33b) = 0
mprotect(0x7fe41ee68000, 0x8000, 0, 0x8000) = 0
mprotect(0x7fe41ee70000, 0x8000, 0, 0x7fe42c6f6b4b) = 0
mprotect(0x7fe41eb78000, 0x8000, 5, 0x7fe41eb78000) = 0
mprotect(0x7fe41ee60000, 0x8000, 0, 0x8000) = 0
mprotect(0x7fe41ee78000, 0x8000, 0, 0x7fe41ee78000) = 0
--- SIGSEGV (Segmentation fault) ---
__errno_location() = 0x7fe42c5f0af0
mprotect(0x50344000, 4096, 7, 11) = 0
--- SIGSEGV (Segmentation fault) ---
__errno_location() = 0x7fe42c5f0af0
mprotect(0x50340000, 4096, 7, 11) = 0
pthread_mutex_lock(0x44df20, 1, 0x7fe41ef72068, 0) = 0
pthread_mutex_unlock(0x44df20, 0, 0x7fe41efbf010, 0x7fe41efbf520) = 0
--- SIGSEGV (Segmentation fault) ---
__errno_location() = 0x7fe42c5f0af0
(and so on)
a crashing instance has:
sigaction(SIGSEGV, { 0x4171c0, <0-2,11-14,16,19,22-28>, 0, nil }, nil) = 0
sigismember(<0-2,12-14,16,19,22-28>, SIGUSR2) = 0
sigaction(SIGUSR2, { 0x4171c0, <0-2,11-14,16,19,22-28>, 0, nil }, nil) = 0
mmap64(0, 0x452178, 7, 0x4022) = 0x7f6f6bfad000
sem_init(0x7f6f6c3ba000, 0, 1, 0x7f6f6bfb0000) = 0
sem_init(0x7f6f6c3ba020, 0, 0, 0x7f6f6bfb0000) = 0
sem_init(0x7f6f6c3ba040, 0, 0, 0x7f6f6bfb0000) = 0
sysconf(250, 0, 0, 0x7f6f6bfb0000) = 0x38c0
sigaltstack(0x7fff468b1110, 0, 0, 0x7f6f6bfb0000) = 0
pthread_self(0x7fff468b1110, 0, 0, 0x7f6f79ac833b) = 0x7f6f79a89b80
mprotect(0x7f6f6bfb0000, 0x8000, 0, 0x7f6f79ac833b) = 0
mprotect(0x7f6f6c2a8000, 0x8000, 0, 0x8000) = 0
mprotect(0x7f6f6c2b0000, 0x8000, 0, 0x7f6f79b8fb4b) = 0
mprotect(0x7f6f6bfb8000, 0x8000, 5, 0x7f6f6bfb8000) = 0
mprotect(0x7f6f6c2a0000, 0x8000, 0, 0x8000) = 0
mprotect(0x7f6f6c2b8000, 0x8000, 0, 0x7f6f6c2b8000) = 0
--- SIGSEGV (Segmentation fault) ---
--- SIGSEGV (Segmentation fault) ---
+++ killed by SIGSEGV +++
It seems the segmentation fault is not caught. Could it be a race condition?
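
The comparison of the two ltrace logs can be automated roughly as follows. This is only a sketch: it assumes one call per line, and it replaces every hex value with a placeholder so that address randomization does not show up as a difference. That also hides real differences in hex-encoded sizes or flags, so treat the result as a hint.

```python
import re
from itertools import zip_longest

HEX = re.compile(r"0x[0-9a-fA-F]+")


def normalize(line):
    """Replace hex values so randomized addresses compare equal."""
    return HEX.sub("0xADDR", line.strip())


def first_divergence(good_lines, bad_lines):
    """Return (index, good_line, bad_line) for the first differing line, or None.

    A side that has run out of lines is reported as None.
    """
    for i, (good, bad) in enumerate(zip_longest(good_lines, bad_lines)):
        if good is None or bad is None or normalize(good) != normalize(bad):
            return i, good, bad
    return None
```

On the two excerpts above, the first difference is the line right after the first SIGSEGV: the working run calls __errno_location() (apparently from inside the handler), while the crashing run takes a second SIGSEGV immediately.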
I installed F37 in VM and sbcl doesn't segfault with default qemu 64-bit CPU (QEMU Virtual CPU version 2.5+). Segfaults with 5950X.
On 5950X
-cpu qemu64 # OK
-cpu qemu64,pku # OK
-cpu qemu64,xsave # OK
-cpu qemu64,pku,xsave # segfault
I've checked new cpuid flags from Luca's CPUs, and it is NOT clwb, fsrm, rdpid, rdt_a, sha_ni, umip, vaes, vpclmulqdq.

Workaround is to reboot with "nopke" kernel command line option.
[ 0.000000] Linux version 6.1.8-100.fc36.x86_64 (mockbuild.fedoraproject.org) (gcc (GCC) 12.2.1 20221121 (Red Hat 12.2.1-4), GNU ld version 2.37-37.fc36) #1 SMP PREEMPT_DYNAMIC Tue Jan 24 20:32:33 UTC 2023
[ 0.000000] Command line: ... nopku
[ 0.127655] x86: 'nopku' specified, disabling Memory Protection Keys

It is "nopku" of course.

This is something Fedora-specific:
$ cd /opt/sbcl-2.0.1-x86-64-linux
$ ./src/runtime/sbcl --core ./output/sbcl.core
doesn't segfault with vendor SBCL 2.0.1 (sbcl-2.0.1-x86-64-linux-binary.tar.bz2)

I have recompiled sbcl from its git source (commit 1a729d6bfe476e1e8de323f89f5240748f6bf942) on f37 without the nopku option and it seems to pass all of its tests and not to crash.

Actually, the recompiled sbcl works but I see as dmesg output
[80358.790590] x86/split lock detection: #AC: sbcl/253646 took a split_lock trap at address: 0x52396c19
[80376.387794] x86/split lock detection: #AC: sbcl/253781 took a split_lock trap at address: 0x52cb9b22
etc. etc. so I believe that there might be another problem waiting to happen. However, I am downloading the srpm of the fedora package now, to see if a recompile is enough to get it to run consistently.

I can confirm that a recompiled version of sbcl from the srpm is still affected and does not run in a consistent way.

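
The reasoning behind the -cpu matrix reported for the 5950X (each feature alone is fine, the combination crashes) can be written down as a small sketch. The function name and the data layout are assumptions for illustration; the four results are the ones quoted in this thread, with each entry listing the features added on top of qemu64.

```python
def minimal_failing_sets(results):
    """results maps a frozenset of CPU feature names to True if sbcl started.

    Returns the failing feature sets that contain no smaller failing set.
    """
    failing = [flags for flags, started in results.items() if not started]
    # Keep a failing set only if no other failing set is a strict subset of it.
    return [f for f in failing if not any(other < f for other in failing)]


# Results quoted in this thread.
REPORTED = {
    frozenset(): True,
    frozenset({"pku"}): True,
    frozenset({"xsave"}): True,
    frozenset({"pku", "xsave"}): False,
}

print(minimal_failing_sets(REPORTED))
```

With these four data points the only minimal failing set is pku together with xsave, which is consistent with pku needing xsave to have any effect on the saved register state.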
BTW "nopku" host kernel doesn't work for me, only guest VM kernel "nopku".

Essentially, what the pku/ospke feature does is bump the signal frame size:

-cpu qemu64,xsave
[ 0.000000] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers'
[ 0.000000] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
[ 0.000000] x86/fpu: Enabled xstate features 0x3, context size is 576 bytes, using 'standard' format.
[ 0.000000] signal: max sigframe size: 1520

-cpu qemu64,xsave,pku
[ 0.000000] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers'
[ 0.000000] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
[ 0.000000] x86/fpu: Supporting XSAVE feature 0x200: 'Protection Keys User registers'
[ 0.000000] x86/fpu: xstate_offset[9]: 2432, xstate_sizes[9]: 8
[ 0.000000] x86/fpu: Enabled xstate features 0x203, context size is 2440 bytes, using 'standard' format.
[ 0.000000] signal: max sigframe size: 3376

"nopku" in the guest kernel actually disables pku/ospke. "nopku" in the host kernel deletes "ospke" but doesn't change the fpu state and sigframe calculations, which are run very early.

I'd say recompiling the fedora kernel with CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS=n should fix fedora sbcl magically.

I am on fedora rawhide and hit this segfault issue a while ago.
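
To make the size change in the two boot logs explicit, here is a small sketch that pulls the numbers out of dmesg text. The regular expressions match the message wording quoted above; other kernel versions may word these lines differently, so treat the patterns as assumptions.

```python
import re

CONTEXT = re.compile(r"context size is (\d+) bytes")
SIGFRAME = re.compile(r"signal: max sigframe size: (\d+)")


def boot_sizes(dmesg_text):
    """Return (xstate_context_size, max_sigframe_size); a missing value is None."""
    ctx = CONTEXT.search(dmesg_text)
    frame = SIGFRAME.search(dmesg_text)
    return (int(ctx.group(1)) if ctx else None,
            int(frame.group(1)) if frame else None)


WITHOUT_PKU = """\
[ 0.000000] x86/fpu: Enabled xstate features 0x3, context size is 576 bytes, using 'standard' format.
[ 0.000000] signal: max sigframe size: 1520
"""

WITH_PKU = """\
[ 0.000000] x86/fpu: xstate_offset[9]: 2432, xstate_sizes[9]: 8
[ 0.000000] x86/fpu: Enabled xstate features 0x203, context size is 2440 bytes, using 'standard' format.
[ 0.000000] signal: max sigframe size: 3376
"""

ctx_a, frame_a = boot_sizes(WITHOUT_PKU)
ctx_b, frame_b = boot_sizes(WITH_PKU)
print("context grows by", ctx_b - ctx_a, "bytes")     # 2440 - 576 = 1864
print("sigframe grows by", frame_b - frame_a, "bytes")  # 3376 - 1520 = 1856
```

The PKU state itself is only 8 bytes, but in the 'standard' format it sits at offset 2432, so the context size becomes 2432 + 8 = 2440 and the maximum signal frame more than doubles. Any code that sizes a signal stack from a smaller fixed constant would be at risk; whether that is what the Fedora build of sbcl does is not established in this thread.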
I grabbed 9d6b2242c5531503c3040c995247fefad56869ec (tagged sbcl-2.3.0) from the sbcl git repo, and ran:
CFLAGS="${CFLAGS} -D_GNU_SOURCE" ./make.sh
using that build with the run-sbcl.sh script I haven't seen any segfaults.
Koschei informed me that the pvs-sbcl package is failing to build in Rawhide, apparently due to this issue.

This message is a reminder that Fedora Linux 36 is nearing its end of life. Fedora will stop maintaining and issuing updates for Fedora Linux 36 on 2023-05-16. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as EOL if it remains open with a 'version' of '36'.

Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, change the 'version' to a later Fedora Linux version. Note that the version field may be hidden. Click the "Show advanced fields" button if you do not see it.

Thank you for reporting this issue and we are sorry that we were not able to fix it before Fedora Linux 36 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora Linux, you are encouraged to change the 'version' to a later version prior to this bug being closed.

Fedora Linux 36 entered end-of-life (EOL) status on 2023-05-16. Fedora Linux 36 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of Fedora Linux please feel free to reopen this bug against that version. Note that the version field may be hidden. Click the "Show advanced fields" button if you do not see the version field. If you are unable to reopen this bug, please file a new report against an active release.

Thank you for reporting this bug and we are sorry it could not be fixed.

The bug is still present under fedora 38. Updating sbcl to a more recent version than the packaged one solves the issue.

Is this package even actively maintained anymore?
There are several open issues in regards to how outdated the version of SBCL on Fedora is and how many problems this is causing. SBCL 2.0.1 was released more than 3 years ago. I have opened a PR to update to version 2.3.5, which seems to fix the segfaulting issue. See https://src.fedoraproject.org/rpms/sbcl/pull-request/1. I have also attempted to contact the maintainers but have not received any response.

Rex did a great job of maintaining this package, and the rest of the Lisp ecosystem, for many years. I think I may still be listed as a maintainer for sbcl, and if nobody else will, I will try to update it. I still use sbcl daily. It's been many years since I've done any packaging work, and will likely need to relearn the tools/process, but I'll try.

I'd be happy to hand this off to folks more interested, I've limited time/availability these days. Thanks.

Thank you for keeping it working for so long, Rex. Anthony, if you would like some help, feel free to contact me. I'm about to retire the only package I maintain that uses sbcl (pvs-sbcl), but wouldn't mind lending a hand.

I saw that the PR mentioned in comment 29 had been merged, but not built. Since sbcl is on the list of packages to be retired due to failing to build for so long, I did the build. I hope I did not step on any toes. Anyway, sbcl should no longer segfault in Rawhide. Anthony, I am willing to walk you through the process of updating the package the next time that needs to be done. Contact me if you want me to do so.

Anthony I can also work with you to help with keeping this package up to date and would be happy to be added as a maintainer (I already maintain emacs-slime which depends on this package and am familiar with the package update process). My availability is somewhat unpredictable but in general I should be able to do the work.

Description of problem:
sbcl segfaults at startup, unusable

Version-Release number of selected component (if applicable):
2.0.1-8.fc36

How reproducible:
99.999999%

Steps to Reproduce:
$ sbcl

Actual results:
$ sbcl
This is SBCL 2.0.1-8.fc36, an implementation of ANSI Common Lisp.
More information about SBCL is available at <http://www.sbcl.org/>.

SBCL is free software, provided as is, with absolutely no warranty.
It is mostly in the public domain; some portions are provided under
BSD-style licenses. See the CREDITS and COPYING files in the
distribution for more information.
Segmentation fault (core dumped)

Expected results:

Additional info:
see strace log
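
Since the reproduction rate varies between reporters (99.999999% here, "like 90%" elsewhere in this bug), a rough way to measure it is sketched below. The helper name is made up, and the sbcl command line is an assumption: it uses the standard --non-interactive, --no-sysinit, --no-userinit and --eval options so that a healthy sbcl exits immediately instead of waiting at the REPL.

```python
import signal
import subprocess


def count_segfaults(cmd, runs):
    """Run cmd `runs` times and count how many runs were killed by SIGSEGV."""
    crashes = 0
    for _ in range(runs):
        proc = subprocess.run(cmd, stdin=subprocess.DEVNULL,
                              stdout=subprocess.DEVNULL,
                              stderr=subprocess.DEVNULL)
        # On POSIX, a return code of -N means the child died from signal N.
        if proc.returncode == -signal.SIGSEGV:
            crashes += 1
    return crashes


# Assumed command line; adjust the path to test a different sbcl build.
SBCL_CMD = ["sbcl", "--non-interactive", "--no-sysinit", "--no-userinit",
            "--eval", "(sb-ext:exit)"]
```

For example, count_segfaults(SBCL_CMD, 100) gives the number of crashing startups out of 100, which makes it easier to compare the packaged build against an upstream binary or a kernel booted with nopku.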