Bug 124074

Summary: Loading a foreign function into SBCL crashes the kernel
Product: Fedora
Reporter: Pete Chown <2>
Component: kernel
Assignee: Arjan van de Ven <arjanv>
Status: CLOSED RAWHIDE
QA Contact: Brian Brock <bbrock>
Severity: medium
Priority: medium
Version: 2
CC: deweese, rdieter, rutledge.warren
Keywords: Security
Hardware: athlon
OS: Linux
Doc Type: Bug Fix
Last Closed: 2004-05-27 12:09:10 UTC

Description Pete Chown 2004-05-23 20:22:26 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.6) Gecko/20040518 Firefox/0.8

Description of problem:
SBCL (Steel Bank Common Lisp) provides a foreign function interface. 
This worked as expected on Fedora Core 1.  On Fedora Core 2,
attempting to use it crashes the kernel.  (It stops responding to the
keyboard and mouse, but there is no "oops" or panic message.)  As well
as being a nuisance in itself, this creates a local denial of service
risk.

Version-Release number of selected component (if applicable):
kernel-2.6.5-1.358

How reproducible:
Always

Steps to Reproduce:
1. Download and install the SBCL RPM (version 0.8.10) from
http://sourceforge.net/project/showfiles.php?group_id=1373

2. Ensure that /usr/lib/libpq.so is installed.  As far as I know,
Postgres has nothing to do with the problem, but the demo code
attempts to link against this library.

3. Create a file called test.lisp, containing the following line:

(load-foreign "/usr/lib/libpq.so")

4. Save work and switch to a text console!

5. Enter the following command:

sbcl --load test.lisp

Actual Results:  System stopped responding to the keyboard and mouse.

Expected Results:  The Lisp toplevel prompt should have appeared.

Additional info:

If I had to guess, I would say that this problem is related to
overcommitment of memory.  SBCL and CMUCL reserve large amounts of
address space that they never actually use.  This unusual behaviour
may have exposed a latent kernel bug.

I have tried setting /proc/sys/vm/overcommit_memory to 1, but this
makes no difference.

Comment 1 Pete Chown 2004-05-24 12:33:28 UTC
A couple of other things I have found out:

1.  Memory is not overcommitted -- I think that was probably a red
herring.  The machine I was using has 4G of swap; SBCL allocates a
lot of memory, but not nearly that much, so there is no
overcommitment.  With a more typical amount of swap, overcommitment
might be expected to occur.

2.  I wanted to verify that it really was a kernel hang, and not SBCL
disabling keyboard-generated signals for whatever reason.  Accordingly
I ran the command again, this time putting it in the background:

sbcl --load test.lisp &

As expected, I got the shell prompt back.  I pressed return a few
times.  At first, each press of the return key gave me a new shell
prompt.  However after perhaps half a second, pressing return stopped
having any effect.  At this point it was impossible to get a reaction
from the system -- I couldn't even switch virtual consoles --
suggesting that the kernel had indeed hung.  The only things that
still worked were keys like Num Lock, which operated the keyboard LED
as normal.

Comment 2 Dusty DeWeese 2004-05-25 12:28:30 UTC
I have also been having trouble with SBCL.  It locks up the kernel
completely whenever it loads any larger amount of code.  Not sure if
this is related, but CLISP and CMUCL also segfault under FC2.  CLISP
segfaults when loading code, and CMUCL segfaults immediately when run.
None of this happened under FC1.  This is on an i686.

Comment 3 Pete Chown 2004-05-25 12:48:23 UTC
Hi, sorry to hear you're having trouble too.  I just tried rebuilding
the SBCL RPM and again I got a lock-up.  This, together with your
experience, makes me think that it isn't specific to the foreign
function interface.  Probably anything that makes SBCL do a
significant amount of computation triggers the bug.

I also get the CMUCL segfault.  Clisp wasn't reliable for me on FC1
either, and as far as I can tell it is about the same on FC2.

Comment 4 Dusty DeWeese 2004-05-25 16:24:27 UTC
I have compiled the vanilla 2.6.6 kernel with SELinux, but without the
4K stack option.  SBCL, CMUCL, and Clisp work now.  Maybe it's the 4K
stack?  I don't know much about Lisp compilers, but I know SBCL
shouldn't cause Linux to hang.

Comment 5 Arjan van de Ven 2004-05-25 16:26:22 UTC
It's more likely the 4g/4g split patch.

Can any of you guys try the kernel from
http://people.redhat.com/arjanv/2.6, since we fixed at least one
crasher bug with that patch?

Comment 6 Dusty DeWeese 2004-05-25 18:29:23 UTC
I tried the kernel from the previous comment, and SBCL seems to work
correctly now.  It doesn't crash when loading defsystem-3.4i.  CMUCL
and Clisp, however, still segfault.  Maybe those problems are unrelated.

Comment 7 Pete Chown 2004-05-25 19:22:54 UTC
I'm seeing the same as Dusty -- SBCL works, CMUCL doesn't.  I don't
expect the CMUCL problem is the kernel's fault.  If the 4g/4g patch
causes some addresses to move around, it's likely that CMUCL needs
rebuilding.

Unfortunately, Lisps tend to be very sensitive to the details of the
system where they are built.  Because they are in effect loaded from a
core file, if anything at all moves, the addresses will be wrong.

Rebuilding CMUCL is, apparently, a big undertaking. :-( I've never
tried to do it, but difficulties in this area were the reason for SBCL
forking off in the first place.

I'm very impressed with the support -- two days to provide a patched
kernel is amazing.  Thank you.  Do you know how long it's likely to be
before a fixed kernel is made available as an official Fedora update?

Feel free to close the bug; the issue is resolved as far as I am
concerned.

Comment 8 Dave Jones 2004-05-31 21:57:40 UTC
*** Bug 124828 has been marked as a duplicate of this bug. ***