Bug 220053 - sbcl: ppc issues
Summary: sbcl: ppc issues
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: sbcl
Version: rawhide
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Rex Dieter
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks: F-ExcludeArch-ppc
Reported: 2006-12-18 18:10 UTC by Rex Dieter
Modified: 2007-11-30 22:11 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2007-01-07 18:57:31 UTC
Type: ---
Embargoed:


Attachments
Always assume page size is 64KiB on PowerPC. (571 bytes, patch)
2006-12-28 19:02 UTC, David Woodhouse
Updated patch for 64KiB page size. (2.45 KB, patch)
2006-12-28 23:58 UTC, David Woodhouse
Fix handling of the page-table core entry with large page sizes (768 bytes, patch)
2006-12-29 13:52 UTC, Juho Snellman
working (possibly) patch (3.22 KB, patch)
2006-12-29 22:50 UTC, David Woodhouse
patch fixing the test failures + cleaning up the other changes (9.12 KB, patch)
2006-12-30 16:59 UTC, Juho Snellman

Comment 1 Rex Dieter 2006-12-22 13:02:01 UTC
Crud, now seeing similar stuff on devel/fc7 trying to (re)build maxima:
http://buildsys.fedoraproject.org/build-status/job.psp?uid=24353

Maybe the latest sbcl/ppc is simply borked.  ):  

Comment 2 David Woodhouse 2006-12-22 18:04:00 UTC
Let me know if you need access to a PowerPC machine for debugging.

Comment 4 David Woodhouse 2006-12-28 11:12:32 UTC
What's the most recent release that worked? What changed? Again, let me know if
you need access to a PowerPC machine to debug this.

Comment 5 David Woodhouse 2006-12-28 18:38:54 UTC
I haven't seen a segfault but I've seen this:

This is SBCL 1.0, an implementation of ANSI Common Lisp.
More information about SBCL is available at <http://www.sbcl.org/>.

SBCL is free software, provided as is, with absolutely no warranty.
It is mostly in the public domain; some portions are provided under
BSD-style licenses.  See the CREDITS and COPYING files in the
distribution for more information.
in core: 0x40000000 - in runtime: 0x4000000
fatal error encountered in SBCL pid 29213:
core/runtime address mismatch: READ_ONLY_SPACE_START

This is because sbcl assumes that the page size will always match the result
of getpagesize() on the build host -- but the PPC ABI says it can be 64KiB and
(stupidly, IMO) that's what we did in FC-6.

I think that setting it to 64KiB unconditionally on PPC instead of using
getpagesize() probably ought to work. Testing that hypothesis now...
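
As a rough illustration of the reasoning above (not SBCL code), the sketch
below compares a page size assumed at build time with the page size of the
running kernel. BUILD_TIME_PAGE_SIZE and both helper names are hypothetical.
The assumption being illustrated is that a layout aligned to 64KiB should
also be usable on a 4KiB-page kernel, but not the other way round.

```python
import mmap
import os

# Hypothetical value baked in at build time, e.g. what getpagesize()
# returned on a 4KiB-page build host.
BUILD_TIME_PAGE_SIZE = 4096


def runtime_page_size() -> int:
    """Return the page size of the kernel we are actually running on."""
    try:
        return os.sysconf("SC_PAGE_SIZE")
    except (AttributeError, ValueError, OSError):
        # Fall back to the value the mmap module reports.
        return mmap.PAGESIZE


def layout_is_compatible(assumed: int, actual: int) -> bool:
    """A layout aligned to `assumed` bytes is only mappable if `assumed`
    is a whole multiple of the real page size."""
    return assumed >= actual and assumed % actual == 0


if __name__ == "__main__":
    actual = runtime_page_size()
    ok = layout_is_compatible(BUILD_TIME_PAGE_SIZE, actual)
    print(f"assumed={BUILD_TIME_PAGE_SIZE} actual={actual} compatible={ok}")
```

Under that assumption, always using 65536 on PowerPC passes the check on both
4KiB and 64KiB kernels, while a build-host value of 4096 fails on a 64KiB one.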

Comment 6 David Woodhouse 2006-12-28 19:02:23 UTC
Created attachment 144482 [details]
Always assume page size is 64KiB on PowerPC.

Comment 7 Rex Dieter 2006-12-28 20:14:29 UTC
Thanks David, I'll send the patch upstream asap.

After which, I guess we'll have to wait until upstream produces a fixed/patched
sbcl binary for bootstrapping.

Comment 8 David Woodhouse 2006-12-28 23:58:15 UTC
Created attachment 144512 [details]
Updated patch for 64KiB page size.

Building shouldn't be a problem -- we have machines with 4KiB pages on which we
can build.

The main problem is that the patch isn't sufficient -- there are a few more
places we make assumptions about the page size. I think that in _principle_ it
ought to work if we iron out the details, but we may need help from upstream to
do that. Here's a current patch (which will break non-PPC builds but it should
be simple enough to fix that if you know the language/environment).

Unfortunately it still doesn't work -- when I install it on a 64KiB-page host
and ask it to rebuild itself, it does this...

//entering make-host-1.sh
//building cross-compiler, and doing first genesis
This is SBCL 1.0.1, an implementation of ANSI Common Lisp.
More information about SBCL is available at <http://www.sbcl.org/>.

SBCL is free software, provided as is, with absolutely no warranty.
It is mostly in the public domain; some portions are provided under
BSD-style licenses.  See the CREDITS and COPYING files in the
distribution for more information.
* 
5
* *** glibc detected *** sbcl: free(): invalid next size (normal): 0x10054248 ***
======= Backtrace: =========
/lib/libc.so.6[0xf4ee394]
/lib/libc.so.6(cfree+0xc8)[0xf4ee5e8]
sbcl(wrapped_readlink+0x90)[0x100127b4]
sbcl(call_into_c+0x70)[0x10018c58]
[0x81a4]
[0x1]
sbcl(funcall0+0x1c)[0x1001321c]
sbcl[0x10011a24]
sbcl(main+0x218)[0x10010bd4]
/lib/libc.so.6[0xf48dd4c]
/lib/libc.so.6(__libc_start_main+0x144)[0xf48df74]
======= Memory map: ========
00100000-00120000 r-xp 00100000 00:00 0                 [vdso]
04000000-04010000 rwxp 00010000 08:03 15835950          /usr/lib/sbcl/sbcl.core
04010000-08000000 rwxp 04010000 00:00 0
08000000-08010000 rwxp 00020000 08:03 15835950          /usr/lib/sbcl/sbcl.core
08010000-09800000 rwxp 08010000 00:00 0
0a000000-0b000000 rwxp 0a000000 00:00 0
0f340000-0f350000 r-xp 00000000 08:03 47417259          /lib/libdl-2.5.so
0f350000-0f360000 r-xp 00000000 08:03 47417259          /lib/libdl-2.5.so
0f360000-0f370000 rwxp 00010000 08:03 47417259          /lib/libdl-2.5.so
0f390000-0f450000 r-xp 00000000 08:03 47417258          /lib/libm-2.5.so
0f450000-0f460000 r-xp 000b0000 08:03 47417258          /lib/libm-2.5.so
0f460000-0f470000 rwxp 000c0000 08:03 47417258          /lib/libm-2.5.so
0f470000-0f5d0000 r-xp 00000000 08:03 47417253          /lib/libc-2.5.so
0f5d0000-0f5e0000 r-xp 00160000 08:03 47417253          /lib/libc-2.5.so
0f5e0000-0f5f0000 rwxp 00170000 08:03 47417253          /lib/libc-2.5.so
0ffc0000-0ffe0000 r-xp 00000000 08:03 47417252          /lib/ld-2.5.so
0ffe0000-0fff0000 r-xp 00010000 08:03 47417252          /lib/ld-2.5.so
0fff0000-10000000 rwxp 00020000 08:03 47417252          /lib/ld-2.5.so
10000000-10020000 r-xp 00000000 08:03 15836373          /usr/bin/sbcl
10020000-10030000 rwxp 00020000 08:03 15836373          /usr/bin/sbcl
10030000-10080000 rwxp 10030000 00:00 0                 [heap]
40000000-40030000 rwxp 40000000 00:00 0
40030000-40040000 ---p 40030000 00:00 0
40040000-40050000 rwxp 40040000 00:00 0
40050000-40240000 rwxp 40050000 00:00 0
40240000-40250000 r-xp 40240000 00:00 0
40250000-404e0000 rwxp 40250000 00:00 0
4f000000-4f020000 rwxp 00030000 08:03 15835950          /usr/lib/sbcl/sbcl.core
4f020000-4f040000 r-xp 00050000 08:03 15835950          /usr/lib/sbcl/sbcl.core
4f040000-4f050000 rwxp 00070000 08:03 15835950          /usr/lib/sbcl/sbcl.core
4f050000-4f060000 r-xp 00080000 08:03 15835950          /usr/lib/sbcl/sbcl.core
4f060000-4f070000 r-xp 00090000 08:03 15835950          /usr/lib/sbcl/sbcl.core
4f070000-4f080000 rwxp 000a0000 08:03 15835950          /usr/lib/sbcl/sbcl.core
4f080000-4f090000 r-xp 000b0000 08:03 15835950          /usr/lib/sbcl/sbcl.core
4f090000-4f0a0000 rwxp 000c0000 08:03 15835950          /usr/lib/sbcl/sbcl.core
4f0a0000-4f0e0000 r-xp 000d0000 08:03 15835950          /usr/lib/sbcl/sbcl.core
4f0e0000-4f130000 rwxp 00110000 08:03 15835950          /usr/lib/sbcl/sbcl.core
4f130000-4f170000 r-xp 00160000 08:03 15835950          /usr/lib/sbcl/sbcl.core
4f170000-4f180000 r-xp 001a0000 08:03 15835950          /usr/lib/sbcl/sbcl.core
4f180000-4f1a0000 rwxp 001b0000 08:03 15835950          /usr/lib/sbcl/sbcl.core
4f1a0000-4f1d0000 r-xp 001d0000 08:03 15835950          /usr/lib/sbcl/sbcl.core
4f1d0000-4f1e0000 rwxp 00200000 08:03 15835950          /usr/lib/sbcl/sbcl.core
4f1e0000-4f270000 r-xp 00210000 08:03 15835950          /usr/lib/sbcl/sbcl.core
4f270000-4f290000 rwxp 002a0000 08:03 15835950
fatal error encountered in SBCL pid 2436:
%primitive halt called; the party is over.

error: Bad exit status from /var/tmp/rpm-tmp.80549 (%build)


I can provide access to hosts with both 64KiB and 4KiB pages.

Comment 9 David Woodhouse 2006-12-29 02:10:57 UTC
The same build (with my patch) also fails on the 4KiB-page kernel, this time
with a SEGV:

Program received signal SIGSEGV, Segmentation fault.
0x1000802c in load_core_file (file=0x10026120 "", file_offset=0)
    at coreparse.c:353
353                         page_table[offset++].first_object_offset = data[i++];

It seems that page_table[] isn't large enough for what we're copying into it.
Fixing that (don't write if !data[i]) leads to another segfault in a function
called from funcall0() from initial_thread_trampoline() -- although that may
just be GC. Run it normally instead of under gdb and it just seems to go into an
endless loop eating CPU time.

Without my patch it all works fine on a 4KiB-page host, and it does _build_ OK
with my patch. I suspect it's just creating and parsing the core file which
isn't working correctly. Could do with more help from someone who's familiar
with the language and the runtime environment.
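
To make the suspected overrun concrete, here is a minimal sketch (not the
actual coreparse.c logic) of a loader that refuses to write past the end of
its page table. It assumes, purely for illustration, that the table has one
entry per page of heap, so the same heap has 16 times fewer entries with
64KiB pages than with 4KiB pages. All names are hypothetical.

```python
def page_table_entries(heap_bytes: int, page_size: int) -> int:
    """Number of page-table slots needed to cover the heap."""
    return heap_bytes // page_size


def load_page_table(core_entries, heap_bytes: int, runtime_page_size: int):
    """Copy entries read from a core file into a fixed-size page table.

    Raises ValueError instead of writing past the end of the table, which
    is roughly what the unchecked copy loop would do in C.
    """
    capacity = page_table_entries(heap_bytes, runtime_page_size)
    if len(core_entries) > capacity:
        raise ValueError(
            f"core has {len(core_entries)} page-table entries, "
            f"runtime table holds only {capacity}"
        )
    table = [0] * capacity
    table[: len(core_entries)] = core_entries
    return table


if __name__ == "__main__":
    heap = 1 << 20  # 1 MiB heap, chosen only for the example
    written_with_4k = [1] * page_table_entries(heap, 4096)
    try:
        load_page_table(written_with_4k, heap, 65536)
    except ValueError as err:
        print("refused:", err)
```

A check like this turns silent memory corruption into an immediate, readable
error when the writer and reader of the core disagree about the page size.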


Comment 10 Juho Snellman 2006-12-29 13:52:23 UTC
Created attachment 144540 [details]
Fix handling of the page-table core entry with large page sizes

Hmm. Is it possible that there's a bug in the changes you made to coreparse.c?
Your patch + the attached one for coreparse works for me.

Comment 11 David Woodhouse 2006-12-29 20:41:28 UTC
Hm, I'm rarely incompetent enough to screw something as simple as that up, but
it's possible. Building with your version now to confirm, along with the other
changes in my patch of comment #8.

What platform did you try this on? I only changed page size definitions on
PowerPC, so if you want to try using 64KiB "pages" on other platforms you'll
need to change compiler/$CPU/{backend-,}parms.lisp accordingly.

If you mail me a SSH public key, I'll give you an account on PowerPC machines
with both 4KiB and 64KiB pages. Getting it to work with 64KiB pages on a host
which really only has 4KiB pages would be a good start though.
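
The idea of using 64KiB "pages" on a host that really has 4KiB pages rests
on simple alignment arithmetic, sketched below. The helpers are hypothetical
and assume the page size is a power of two. The point is that any address
aligned to 64KiB is also aligned to 4KiB.

```python
def is_aligned(addr: int, page_size: int) -> bool:
    """True if addr sits on a page boundary (page_size is a power of two)."""
    return addr & (page_size - 1) == 0


def align_down(addr: int, page_size: int) -> int:
    """Round addr down to the start of its page."""
    return addr & ~(page_size - 1)


def align_up(addr: int, page_size: int) -> int:
    """Round addr up to the next page boundary."""
    return (addr + page_size - 1) & ~(page_size - 1)


if __name__ == "__main__":
    big, small = 65536, 4096
    addr = align_up(0x4F000123, big)
    # Aligned for the large "page", hence aligned for the small one too.
    print(hex(addr), is_aligned(addr, big), is_aligned(addr, small))
```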

Comment 12 David Woodhouse 2006-12-29 22:50:55 UTC
Created attachment 144568 [details]
working (possibly) patch

OK, with this patch it does at least seem to build, although the results of the
self-tests look scary -- see http://david.woodhou.se/sbcl-64KiB-build.log

Someone more clueful than I would need to fix my hard-coded '65536' in
src/code/linux-os.lisp; it should be 65536 only for PowerPC. Then, if the build
output I linked above looks OK, we can apply this patch to the package?

Comment 13 Juho Snellman 2006-12-30 16:59:40 UTC
Created attachment 144583 [details]
patch fixing the test failures + cleaning up the other changes

The attached patch fixes the new test failures on ppc and cleans up the
Lisp-side changes.

I haven't been able to test it on the 64KiB-page ppc host yet due to network
problems, but assuming it works there, I think this could be applied to the
package. (Something similar will probably be committed to upstream SBCL for
the next release.)

Comment 14 David Woodhouse 2006-12-30 19:45:19 UTC
Looks much better; thanks. Having built with that patch I then rebuilt on a
machine with 64KiB pages; results at
http://david.woodhou.se/sbcl-build-on-64KiB-page.log

I can reboot net2-101.woodhou.se onto a 64KiB page kernel if you'd like to do
more tests.

Comment 15 Juho Snellman 2006-12-30 20:43:05 UTC
No, that looks good enough. Thanks for looking at this, and for the ppc access.

Comment 16 Rex Dieter 2006-12-31 02:56:48 UTC
Excellent.  If David (or any other trusted Fedora contributor, for that matter)
can build and create a binary ppc bootstrap (run binary-distribution.sh) and
make it available to me/fedora-buildsystem, we should(!) be good to go.

Comment 17 David Woodhouse 2006-12-31 11:57:24 UTC
http://david.woodhou.se/sbcl-1.0.1-binary.tar.bz2

4b8d12b891a6bb9b49aa1ca8262a5a75  sbcl-1.0.1-binary.tar.bz2

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)

iQDVAwUARZelTcKjXUokOhMpAQLKJAX/RowyKIKJOzOulhzgzgG2pkUL6XTKYcNt
znhxHc6A9Wk7AQyW3F4enoFJNi0qqX1QFUXXSPJiDudVowyI+eI3+T7gtokLBQxT
J9wlxwdwu5B4UrVpCa7TpMDIEX2AtqVlKwyAOa4bG0bsJOagxj8XAt9P14qupUvN
Fpz8t3YsNU9Mb3qBIouYpXoyOkd5tIlkow/iy1SekNCBJRG2y9TqSnNKju+eYnCV
vD3sr2h6onScIs5EIQy73FBmy08O0Oio
=QteW
-----END PGP SIGNATURE-----
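
For anyone fetching the tarball, a minimal sketch of checking it against the
MD5 sum posted above follows. The helper name and the local file path are
assumptions. An MD5 match only guards against a corrupted download; the PGP
signature is what vouches for the origin.

```python
import hashlib

# Digest quoted in the comment above.
EXPECTED_MD5 = "4b8d12b891a6bb9b49aa1ca8262a5a75"


def md5_of_file(path: str, chunk_size: int = 1 << 16) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    # Assumes the tarball was saved in the current directory.
    actual = md5_of_file("sbcl-1.0.1-binary.tar.bz2")
    print("OK" if actual == EXPECTED_MD5 else f"MISMATCH: {actual}")
```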


Comment 18 David Woodhouse 2006-12-31 11:58:24 UTC
I wonder if we should do this for x86_64 too -- doesn't the ABI allow 16KiB
pages there?

Comment 19 Rex Dieter 2007-01-01 03:18:45 UTC
Looks like we have a winner!
http://buildsys.fedoraproject.org/build-status/job.psp?uid=24817

Comment 20 Rex Dieter 2007-01-01 04:08:35 UTC
Good news:
FC-6 build humming along nicely:
http://buildsys.fedoraproject.org/build-status/job.psp?uid=24821

Bad news:
FC-5 build failed (same ppc segfault as before) ??
http://buildsys.fedoraproject.org/build-status/job.psp?uid=24820

EL-4 build failed due to GLIBC incompatibility with provided bootstrap
http://buildsys.fedoraproject.org/build-status/job.psp?uid=24819


Comment 21 David Woodhouse 2007-01-01 17:28:16 UTC
(In reply to comment #20)
> Bad news:
> FC-5 build failed (same ppc segfault as before) ??
> http://buildsys.fedoraproject.org/build-status/job.psp?uid=24820

Nah, that's glibc incompatibility too. No binaries built on FC-6 will run on
FC-5 or below due to the ld.so hash changes. I'll rebuild the bootstrap tarball
on RHEL4.



Comment 22 David Woodhouse 2007-01-01 22:05:14 UTC
http://david.woodhou.se/sbcl-1.0.1-binary-RHEL4-ppc.tar.bz2

88995b87e548be7b850cd3600cb5628c  sbcl-1.0.1-binary-RHEL4-ppc.tar.bz2

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)

iQDVAwUARZmFUcKjXUokOhMpAQLuPAYAk9xCmq2qwI9EYPRfom5VUcxQ2rk5iRA4
hlzaMo+ZKgtxvl4cQIHMFPGsnJkiNXIm2suwgN8n3aG4MTJqtJf369LPeX9H05Wf
nj2JgiuAFY83OTO4mAsS3igu14Igm5vSTBp82RxobqCR2jhr8f4sCdUDdKz/QAHz
FCMB4mBhkWcgoY4WptJ1nnnJ7hz8NZfPt8RiDTaNtlr31nm8ktR7d2YsMErRDfLK
SsYiI01FLwUCnOLOgQvctQFGGztTWDai
=jn97
-----END PGP SIGNATURE-----


Comment 23 Rex Dieter 2007-01-05 12:57:59 UTC
Thanks (again), David. (I should have asked about, or realized, the FC-6 binary
compatibility issues with previous releases.)

Comment 24 David Woodhouse 2007-01-07 02:38:57 UTC
Did you get it built ok for FC5/RHEL4?

Comment 25 Rex Dieter 2007-01-07 18:57:31 UTC
yup, we are now good to go, closing.

No peep from upstream yet regarding this bug (and patch). ): 
See thread starting at:
http://sourceforge.net/mailarchive/message.php?msg_id=37803381

Comment 26 David Woodhouse 2007-01-08 04:00:36 UTC
Thanks.

Regarding upstream: I was sort of assuming that Juho, who fixed up my initial
half-baked patch, was involved with upstream and would take care of that angle.

There remains the possibility that we should do something similar on x86_64,
because I have a feeling that the ELF ABI there _also_ allows for pages larger
than 4KiB.

Comment 27 Juho Snellman 2007-01-08 04:47:33 UTC
Yes, to all of that :-)

I'm part of upstream, will eventually take care of getting this merged, and
something will still need to be done for x86-64, which allows for page sizes
between 4kB and 64kB. But for the latter I want to first figure out which of
the page-size dependencies actually serve some useful purpose, and which ones
don't.

Comment 28 Rex Dieter 2007-01-08 04:50:42 UTC
Juho, thanks.

