Bug 2137959 - sbcl segfaults at startup
Summary: sbcl segfaults at startup
Keywords:
Status: ASSIGNED
Alias: None
Product: Fedora
Classification: Fedora
Component: sbcl
Version: 38
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Rex Dieter
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2022-10-26 17:58 UTC by Alexey Dobriyan
Modified: 2023-07-12 17:05 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-05-25 18:29:25 UTC
Type: Bug
Embargoed:


Attachments
strace -f -o 1.log sbcl (22.79 KB, text/plain)
2022-10-26 17:58 UTC, Alexey Dobriyan
strace -f -o 1.log sbcl (SIGSEGV) (11.35 KB, text/plain)
2022-10-26 18:00 UTC, Alexey Dobriyan
cpuinfo i7 (not working) (14.07 KB, text/plain)
2023-02-15 21:27 UTC, Luca Giuzzi
cpuinfo i5 (working) (11.78 KB, text/plain)
2023-02-15 21:27 UTC, Luca Giuzzi
cpuinfo i7 (old, working) (11.64 KB, text/plain)
2023-02-17 11:11 UTC, Luca Giuzzi

Description Alexey Dobriyan 2022-10-26 17:58:15 UTC
Description of problem:
sbcl segfaults at startup, unusable


Version-Release number of selected component (if applicable):
2.0.1-8.fc36


How reproducible:
99.999999%


Steps to Reproduce:

    $ sbcl

Actual results:

$ sbcl
This is SBCL 2.0.1-8.fc36, an implementation of ANSI Common Lisp.
More information about SBCL is available at <http://www.sbcl.org/>.

SBCL is free software, provided as is, with absolutely no warranty.
It is mostly in the public domain; some portions are provided under
BSD-style licenses.  See the CREDITS and COPYING files in the
distribution for more information.
Segmentation fault (core dumped)


Expected results:


Additional info:
see strace log

Comment 1 Alexey Dobriyan 2022-10-26 17:58:56 UTC
Created attachment 1920582 [details]
strace -f -o 1.log sbcl

Comment 2 Alexey Dobriyan 2022-10-26 18:00:14 UTC
Created attachment 1920584 [details]
strace -f -o 1.log sbcl (SIGSEGV)

Comment 3 Alexey Dobriyan 2022-10-26 18:01:53 UTC
It is like 90% reproducible.

In fact the first time it worked was when I tried to collect strace log for this bug.
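
To put a number on "like 90% reproducible", the crash rate can be measured by launching the binary repeatedly and counting how many runs die from SIGSEGV. Here is a minimal Python sketch; the function name `count_segfaults` and the default of 20 runs are made up for illustration.

```python
import signal
import subprocess


def count_segfaults(cmd, runs=20):
    """Run cmd `runs` times and count how many runs were killed by SIGSEGV."""
    crashes = 0
    for _ in range(runs):
        result = subprocess.run(
            cmd,
            stdin=subprocess.DEVNULL,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        # On POSIX, a negative return code -N means "terminated by signal N".
        if result.returncode == -signal.SIGSEGV:
            crashes += 1
    return crashes
```

For example, `count_segfaults(["sbcl", "--non-interactive", "--eval", "(sb-ext:exit)"])` could be used, assuming that this invocation gets far enough into startup to hit the crash, which has not been verified in this report.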

Comment 4 Alexey Dobriyan 2022-10-26 18:13:57 UTC
setarch -R doesn't help.

Comment 5 Benjamin Kreuter 2023-01-07 15:09:45 UTC
I am seeing the same problem on F37, x86_64.  After a few tries SBCL starts up no problem.

Comment 6 Luca Giuzzi 2023-02-14 22:59:34 UTC
I see the same problem. It also affects maxima (which is based upon sbcl).
Incidentally: I see the problem on a machine with the following cpu

model name	: 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz

but I do not see the issue on a computer with the following 

model name	: Intel(R) Core(TM) i5-8259U CPU @ 2.30GHz

Both machines have exactly the same sw configuration (generated by a kickstart file, so I am sure
it is the same) and have been updated at the same time.

Comment 7 Alexey Dobriyan 2023-02-15 15:41:03 UTC
I'm not alone!

FWIW, I'm using this official SBCL build without problems:

$ sha256sum sbcl-2.2.10-x86-64-linux-binary.tar.bz2
7dde88ec2db3ca1012aa158366c3794a6170cf012e321a19339d422aa9309af6  sbcl-2.2.10-x86-64-linux-binary.tar.bz2

Comment 8 Alexey Dobriyan 2023-02-15 15:42:18 UTC
Luca, can you post /proc/cpuinfo from both i5 and i7?

Comment 9 Luca Giuzzi 2023-02-15 21:27:06 UTC
Created attachment 1944423 [details]
cpuinfo i7 (not working)

Comment 10 Luca Giuzzi 2023-02-15 21:27:54 UTC
Created attachment 1944424 [details]
cpuinfo i5 (working)

Comment 11 stefanvdwalt 2023-02-17 07:54:34 UTC
Witnessing the same on 12th Gen Intel(R) Core(TM) i7-1260P. SBCL 2.3.0 official build works fine.

Comment 12 Luca Giuzzi 2023-02-17 11:11:30 UTC
Created attachment 1944742 [details]
cpuinfo i7 (old, working)

Incidentally, on an older i7 I do not have any problem at all.
I attach the relevant cpuinfo.

Both i7s are DELL laptops (xps15 9550 the old one and xps 13 9305 the new one which has issues)

Comment 13 Luca Giuzzi 2023-02-17 14:04:51 UTC
Comparing ltrace output from sbcl instances that run with instances that crash:
An instance where the interpreter runs has:
sigaction(SIGSEGV, { 0x4171c0, <0-2,11-14,16,19,22-28>, 0, nil }, nil)                      = 0
sigismember(<0-2,12-14,16,19,22-28>, SIGUSR2)                                               = 0
sigaction(SIGUSR2, { 0x4171c0, <0-2,11-14,16,19,22-28>, 0, nil }, nil)                      = 0
mmap64(0, 0x452178, 7, 0x4022)                                                              = 0x7fe41eb6c000
sem_init(0x7fe41ef7a000, 0, 1, 0x7fe41eb70000)                                              = 0
sem_init(0x7fe41ef7a020, 0, 0, 0x7fe41eb70000)                                              = 0
sem_init(0x7fe41ef7a040, 0, 0, 0x7fe41eb70000)                                              = 0
sysconf(250, 0, 0, 0x7fe41eb70000)                                                          = 0x38c0
sigaltstack(0x7fff53b9fbf0, 0, 0, 0x7fe41eb70000)                                           = 0
pthread_self(0x7fff53b9fbf0, 0, 0, 0x7fe42c62f33b)                                          = 0x7fe42c5f0b80
mprotect(0x7fe41eb70000, 0x8000, 0, 0x7fe42c62f33b)                                         = 0
mprotect(0x7fe41ee68000, 0x8000, 0, 0x8000)                                                 = 0
mprotect(0x7fe41ee70000, 0x8000, 0, 0x7fe42c6f6b4b)                                         = 0
mprotect(0x7fe41eb78000, 0x8000, 5, 0x7fe41eb78000)                                         = 0
mprotect(0x7fe41ee60000, 0x8000, 0, 0x8000)                                                 = 0
mprotect(0x7fe41ee78000, 0x8000, 0, 0x7fe41ee78000)                                         = 0
--- SIGSEGV (Segmentation fault) ---
__errno_location()                                                                          = 0x7fe42c5f0af0
mprotect(0x50344000, 4096, 7, 11)                                                           = 0
--- SIGSEGV (Segmentation fault) ---
__errno_location()                                                                          = 0x7fe42c5f0af0
mprotect(0x50340000, 4096, 7, 11)                                                           = 0
pthread_mutex_lock(0x44df20, 1, 0x7fe41ef72068, 0)                                          = 0
pthread_mutex_unlock(0x44df20, 0, 0x7fe41efbf010, 0x7fe41efbf520)                           = 0
--- SIGSEGV (Segmentation fault) ---
__errno_location()                                                                          = 0x7fe42c5f0af0

(and so on)
a crashing instance has:
sigaction(SIGSEGV, { 0x4171c0, <0-2,11-14,16,19,22-28>, 0, nil }, nil)                      = 0
sigismember(<0-2,12-14,16,19,22-28>, SIGUSR2)                                               = 0
sigaction(SIGUSR2, { 0x4171c0, <0-2,11-14,16,19,22-28>, 0, nil }, nil)                      = 0
mmap64(0, 0x452178, 7, 0x4022)                                                              = 0x7f6f6bfad000
sem_init(0x7f6f6c3ba000, 0, 1, 0x7f6f6bfb0000)                                              = 0
sem_init(0x7f6f6c3ba020, 0, 0, 0x7f6f6bfb0000)                                              = 0
sem_init(0x7f6f6c3ba040, 0, 0, 0x7f6f6bfb0000)                                              = 0
sysconf(250, 0, 0, 0x7f6f6bfb0000)                                                          = 0x38c0
sigaltstack(0x7fff468b1110, 0, 0, 0x7f6f6bfb0000)                                           = 0
pthread_self(0x7fff468b1110, 0, 0, 0x7f6f79ac833b)                                          = 0x7f6f79a89b80
mprotect(0x7f6f6bfb0000, 0x8000, 0, 0x7f6f79ac833b)                                         = 0
mprotect(0x7f6f6c2a8000, 0x8000, 0, 0x8000)                                                 = 0
mprotect(0x7f6f6c2b0000, 0x8000, 0, 0x7f6f79b8fb4b)                                         = 0
mprotect(0x7f6f6bfb8000, 0x8000, 5, 0x7f6f6bfb8000)                                         = 0
mprotect(0x7f6f6c2a0000, 0x8000, 0, 0x8000)                                                 = 0
mprotect(0x7f6f6c2b8000, 0x8000, 0, 0x7f6f6c2b8000)                                         = 0
--- SIGSEGV (Segmentation fault) ---
--- SIGSEGV (Segmentation fault) ---
+++ killed by SIGSEGV +++

It seems the segmentation fault is not caught. Could it be a race condition?
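
The difference between the two traces can be checked mechanically. In the working trace every SIGSEGV marker is followed by calls made from the handler (`__errno_location`, then `mprotect(..., 7, ...)`), while the crashing trace shows two markers back to back with nothing in between. The Python sketch below flags that pattern; the function name `has_unhandled_segv` is illustrative, and it only understands the textual ltrace format quoted above.

```python
SEGV_MARKER = "--- SIGSEGV (Segmentation fault) ---"


def has_unhandled_segv(trace_lines):
    """Return True if two SIGSEGV markers appear with no traced call between
    them, or if the trace reports that the process was killed by SIGSEGV."""
    pending = False  # a SIGSEGV marker was seen and no call has followed yet
    for raw in trace_lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("+++ killed by SIGSEGV"):
            return True
        if line.startswith(SEGV_MARKER):
            if pending:
                return True
            # The next call is sometimes glued onto the marker line itself.
            rest = line[len(SEGV_MARKER):].strip()
            pending = not rest
            continue
        pending = False  # any other line is a traced library call
    return False
```

For reading the traces: the third `mprotect` argument is the protection mask, so 0 is `PROT_NONE`, 5 is `PROT_READ|PROT_EXEC`, and 7 is `PROT_READ|PROT_WRITE|PROT_EXEC`.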

Comment 14 Alexey Dobriyan 2023-02-17 17:54:57 UTC
I installed F37 in VM and sbcl doesn't segfault with default qemu 64-bit CPU (QEMU Virtual CPU  version 2.5+).

Segfaults with 5950X.

Comment 15 Alexey Dobriyan 2023-02-17 19:08:21 UTC
On 5950X

-cpu qemu64              # OK
-cpu qemu64,pku          # OK
-cpu qemu64,xsave        # OK
-cpu qemu64,pku,xsave    # segfault

I've checked the new cpuid flags from Luca's CPUs, and it is NOT clwb, fsrm, rdpid, rdt_a, sha_ni, umip, vaes, vpclmulqdq.
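
Since the crash only shows up when both `pku` and `xsave` are exposed to the guest, it can help to check which of these flags a given machine reports. Below is a small Python sketch that reads the `flags` line of `/proc/cpuinfo`. The function names `cpu_flags` and `report` are made up for illustration; `ospke` is included because the kernel lists it when protection keys are actually enabled.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of CPU feature flags from the first 'flags' line."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def report(cpuinfo_text, wanted=("pku", "ospke", "xsave")):
    """Map each flag of interest to True/False depending on its presence."""
    flags = cpu_flags(cpuinfo_text)
    return {name: name in flags for name in wanted}
```

On a real machine this would be called as `report(open("/proc/cpuinfo").read())`.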

Comment 16 Alexey Dobriyan 2023-02-17 19:21:48 UTC
Workaround is to reboot with "nopke" kernel command line option.

[    0.000000] Linux version 6.1.8-100.fc36.x86_64 (mockbuild.fedoraproject.org) (gcc (GCC) 12.2.1 20221121 (Red Hat 12.2.1-4), GNU ld version 2.37-37.fc36) #1 SMP PREEMPT_DYNAMIC Tue Jan 24 20:32:33 UTC 2023

[    0.000000] Command line: ... nopku

[    0.127655] x86: 'nopku' specified, disabling Memory Protection Keys

Comment 17 Alexey Dobriyan 2023-02-17 19:22:20 UTC
It is "nopku" of course.

Comment 18 Alexey Dobriyan 2023-02-17 19:42:39 UTC
This is something Fedora-specific:

$ cd /opt/sbcl-2.0.1-x86-64-linux
$ ./src/runtime/sbcl --core ./output/sbcl.core

doesn't segfault with vendor SBCL 2.0.1 (sbcl-2.0.1-x86-64-linux-binary.tar.bz2)

Comment 19 Luca Giuzzi 2023-02-18 06:34:15 UTC
I have recompiled sbcl from its git source (commit 1a729d6bfe476e1e8de323f89f5240748f6bf942) on f37 without the nopku option, and it seems to pass all of its tests and not crash.

Comment 20 Luca Giuzzi 2023-02-18 06:43:13 UTC
Actually, the recompiled sbcl works, but I see the following in the dmesg output:
[80358.790590] x86/split lock detection: #AC: sbcl/253646 took a split_lock trap at address: 0x52396c19
[80376.387794] x86/split lock detection: #AC: sbcl/253781 took a split_lock trap at address: 0x52cb9b22
etc. etc.

so I believe that there might be another problem waiting to happen.
However, I am downloading the srpm of the fedora package now, to see if a recompile is enough to get it to run consistently.

Comment 21 Luca Giuzzi 2023-02-18 06:56:49 UTC
I can confirm that a recompiled version of sbcl from the srpm is still affected and does not run in a consistent way.

Comment 22 Alexey Dobriyan 2023-02-18 19:11:20 UTC
BTW, "nopku" on the host kernel doesn't work for me, only "nopku" on the guest VM kernel.

Essentially, what the pku/ospke feature does is bump the signal frame size:


-cpu qemu64,xsave
[    0.000000] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers'
[    0.000000] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
[    0.000000] x86/fpu: Enabled xstate features 0x3, context size is 576 bytes, using 'standard' format.
[    0.000000] signal: max sigframe size: 1520


-cpu qemu64,xsave,pku
[    0.000000] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers'
[    0.000000] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
[    0.000000] x86/fpu: Supporting XSAVE feature 0x200: 'Protection Keys User registers'
[    0.000000] x86/fpu: xstate_offset[9]: 2432, xstate_sizes[9]:    8
[    0.000000] x86/fpu: Enabled xstate features 0x203, context size is 2440 bytes, using 'standard' format.
[    0.000000] signal: max sigframe size: 3376

"nopku" in the guest kernel actually disables pku/ospke.

"nopku" in the host kernel deletes "ospke" but doesn't change fpu state and sigframe calculations which are run very early.

I'd say recompiling the Fedora kernel with CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS=n should fix Fedora sbcl magically.
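
The size jump is easy to quantify from the two boot logs above. The sketch below (Python, with a hypothetical helper name `sigframe_info`) pulls the xstate context size and the maximum signal frame size out of dmesg text and compares the latter with 2048, the fixed `MINSIGSTKSZ` value that glibc has traditionally defined on x86 Linux. Whether the Fedora sbcl build actually sizes anything from that constant is an assumption; nothing in this report confirms it.

```python
import re

LEGACY_MINSIGSTKSZ = 2048  # traditional fixed glibc value on x86 Linux


def sigframe_info(dmesg_text):
    """Extract xstate context size and max sigframe size (bytes) from dmesg text."""
    ctx = re.search(r"context size is (\d+) bytes", dmesg_text)
    frame = re.search(r"max sigframe size: (\d+)", dmesg_text)
    return {
        "context_size": int(ctx.group(1)) if ctx else None,
        "max_sigframe": int(frame.group(1)) if frame else None,
    }


def exceeds_legacy_minimum(info):
    """True if the kernel's max signal frame no longer fits in 2048 bytes."""
    size = info["max_sigframe"]
    return size is not None and size > LEGACY_MINSIGSTKSZ
```

With the values quoted above, the maximum signal frame grows from 1520 to 3376 bytes (1856 bytes more), which takes it past the 2048-byte mark.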

Comment 23 Kevin Downey 2023-03-09 22:25:12 UTC
I am on fedora rawhide and hit this segfault issue a while ago.

I grabbed 9d6b2242c5531503c3040c995247fefad56869ec (tagged sbcl-2.3.0) from the sbcl git repo, and ran:

CFLAGS="${CFLAGS} -D_GNU_SOURCE" ./make.sh

using that build with the run-sbcl.sh script I haven't seen any segfaults.

Comment 24 Jerry James 2023-04-03 19:32:06 UTC
Koschei informed me that the pvs-sbcl package is failing to build in Rawhide, apparently due to this issue.

Comment 25 Ben Cotton 2023-04-25 18:07:55 UTC
This message is a reminder that Fedora Linux 36 is nearing its end of life.
Fedora will stop maintaining and issuing updates for Fedora Linux 36 on 2023-05-16.
It is Fedora's policy to close all bug reports from releases that are no longer
maintained. At that time this bug will be closed as EOL if it remains open with a
'version' of '36'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, change the 'version' 
to a later Fedora Linux version. Note that the version field may be hidden.
Click the "Show advanced fields" button if you do not see it.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora Linux 36 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora Linux, you are encouraged to change the 'version' to a later version
prior to this bug being closed.

Comment 26 Ludek Smid 2023-05-25 18:29:25 UTC
Fedora Linux 36 entered end-of-life (EOL) status on 2023-05-16.

Fedora Linux 36 is no longer maintained, which means that it
will not receive any further security or bug fix updates. As a result we
are closing this bug.

If you can reproduce this bug against a currently maintained version of Fedora Linux
please feel free to reopen this bug against that version. Note that the version
field may be hidden. Click the "Show advanced fields" button if you do not see
the version field.

If you are unable to reopen this bug, please file a new report against an
active release.

Thank you for reporting this bug and we are sorry it could not be fixed.

Comment 27 Luca Giuzzi 2023-05-26 16:23:05 UTC
The bug is still present under fedora 38.
Updating sbcl to a more recent version than the packaged one solves the issue.

Comment 28 Alexander Koppe 2023-06-12 23:22:49 UTC
Is this package even actively maintained anymore? There are several open issues in regards to how outdated the version of SBCL on Fedora is and how many problems this is causing. SBCL 2.0.1 was released more than 3 years ago.

Comment 29 Jerry James 2023-06-13 22:47:20 UTC
I have opened a PR to update to version 2.3.5, which seems to fix the segfaulting issue.  See https://src.fedoraproject.org/rpms/sbcl/pull-request/1.

Comment 30 Benjamin Kreuter 2023-06-14 13:33:15 UTC
I have also attempted to contact the maintainers but have not received any response.

Comment 31 Anthony Green 2023-06-14 14:04:40 UTC
Rex did a great job of maintaining this package, and the rest of the Lisp ecosystem, for many years.  I think I may still be listed as a maintainer for sbcl, and if nobody else will, I will try to update it.  I still use sbcl daily.  
It's been many years since I've done any packaging work, and will likely need to relearn the tools/process, but I'll try.

Comment 32 Rex Dieter 2023-06-14 19:40:24 UTC
I'd be happy to hand this off to folks more interested, I've limited time/availability these days.  Thanks.

Comment 33 Jerry James 2023-06-30 21:36:43 UTC
Thank you for keeping it working for so long, Rex.  Anthony, if you would like some help, feel free to contact me.  I'm about to retire the only package I maintain that uses sbcl (pvs-sbcl), but wouldn't mind lending a hand.

Comment 34 Jerry James 2023-07-11 16:43:03 UTC
I saw that the PR mentioned in comment 29 had been merged, but not built.  Since sbcl is on the list of packages to be retired due to failing to build for so long, I did the build.  I hope I did not step on any toes.  Anyway, sbcl should no longer segfault in Rawhide.

Anthony, I am willing to walk you through the process of updating the package the next time that needs to be done.  Contact me if you want me to do so.

Comment 35 Benjamin Kreuter 2023-07-12 17:05:06 UTC
Anthony, I can also work with you to help with keeping this package up to date and would be happy to be added as a maintainer (I already maintain emacs-slime, which depends on this package, and am familiar with the package update process).  My availability is somewhat unpredictable, but in general I should be able to do the work.

