Bug 104029 - Possible bug in Radeon DRI driver
Status: CLOSED RAWHIDE
Product: Red Hat Linux Beta
Classification: Retired
Component: XFree86
Version: beta1
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Assigned To: Mike A. Harris
David Lawrence
Duplicates: 101647
Depends On:
Blocks: CambridgeBlocker
Reported: 2003-09-09 02:15 EDT by Nils Philippsen
Modified: 2007-04-18 12:57 EDT
Fixed In Version: 4.3.0-25
Doc Type: Bug Fix
Last Closed: 2003-11-06 01:52:11 EST

Attachments
backtrace of crashed glmatrix (1.19 KB, text/plain)
2003-09-09 02:17 EDT, Nils Philippsen
backtrace of crashed skyrocket (1.01 KB, text/plain)
2003-09-09 02:18 EDT, Nils Philippsen
patch provides execute permission on memory allocated for code generation (16.71 KB, patch)
2003-09-19 14:45 EDT, John Dennis
patch modifies Imakefiles to correctly install in "tls" directories for a tls enabled build (11.05 KB, patch)
2003-09-19 15:05 EDT, John Dennis

Description Nils Philippsen 2003-09-09 02:15:36 EDT
Description of problem:

Some GL programs (not all -- glxgears works for example) crash with a sig11 in
the radeon dri driver (/usr/X11R6/lib/modules/dri/tls/radeon_dri.so). I believe
(but I'm not sure) this may have to do with finally upgrading to the Severn kernel
(from the latest Shrike kernel). I reproduced this problem with "glmatrix" from
xscreensaver and "skyrocket" from rss_glx (rss-glx.sf.net, self-built on Severn,
grab the RPM from http://lisas.de/~nils/redhat/severn).

Version-Release number of selected component (if applicable):

XFree86-4.3.0-24
kernel-2.4.20-20.9
kernel-2.4.22-1.2030.nptl
xscreensaver-4.12-1
rss_glx-0.7.4-3

How reproducible:

Always.

Steps to Reproduce:
1. Run /usr/X11R6/lib/xscreensaver/glmatrix
2. See it dump core
    
Actual results:

Dumps core

Expected results:

Runs w/o problems

Additional info:

Will try with the old kernel to assess whether this could be a DRI<->kernel
interface problem.

Uploading the rss_glx RPMs at the moment (they're not there yet).
Comment 1 Nils Philippsen 2003-09-09 02:17:43 EDT
Created attachment 94316 [details]
backtrace of crashed glmatrix
Comment 2 Nils Philippsen 2003-09-09 02:18:48 EDT
Created attachment 94317 [details]
backtrace of crashed skyrocket
Comment 3 Nils Philippsen 2003-09-09 02:27:12 EDT
rss_glx upload now finished. Also get the OpenAL packages if you want to try it.
Comment 4 Nils Philippsen 2003-09-09 03:29:17 EDT
Hmm, it works when exec-shield is switched off. Your call whether this is an
XFree86, application or kernel bug ;-).
Comment 5 Mike A. Harris 2003-09-09 08:52:56 EDT
Unfortunately, the backtraces are useless without debugging information in them.

It's hard to determine where the problem may lie, however if I had to hazard
a guess, I would guess the Mesa DRI 3D driver.  Can you rebuild the X src.rpm
with debug symbols by changing the DebuggableBuild toggle in the spec file and
adding .dbg to the release, then upgrading to the new packages?

That would help quite a lot.

TIA
Comment 6 Bill Nottingham 2003-09-09 12:41:35 EDT
This reminds me of a bug I filed about issues with an R9000 and DRI, lemme see
if I can dig it up.
Comment 7 Bill Nottingham 2003-09-09 12:42:57 EDT
See bug 91784.
Comment 8 Mike A. Harris 2003-09-09 18:14:55 EDT
Indeed, it seems these two bugs might be the same bug.  I'll leave both open
until we can conclude for sure it's a dupe or not though.  I was hoping to
get a good backtrace from Nils, in order to query bugzilla for similarish
bugs, but I think you saved me a bit of time.  ;o)

Thanks Bill
Comment 9 Nils Philippsen 2003-09-10 02:24:33 EDT
I'd be happy to test, but I guess I'd need to tidy up my system beforehand
then -- I don't want to build XFree86 more than once if possible because this is
not exactly the "killer compilation machine" ;-).

How much space do I need in .../redhat/ to do this? Any other prereqs?
Comment 10 Mike A. Harris 2003-09-11 01:11:37 EDT
We think we know the cause of this problem now.  Try disabling exec-shield
on your machine, and running the apps that previously failed.  The DRI
3D drivers appear to be trying to execute malloc()'d code somewhere, which
wasn't mapped with PROT_EXEC.  Please let me know if disabling exec-shield
causes the problem to stop.

Compiling X requires 1-2 GB of free disk space.  rpmbuild will complain
about any missing dependencies.  Hope this helps.
Comment 11 Nils Philippsen 2003-09-11 01:49:34 EDT
As I wrote above: the stuff works as before with "kernel.exec-shield = 0".
Comment 12 Mike A. Harris 2003-09-11 04:05:26 EDT
Ok, then this is definitely the same issue.  Basically something is wrong
with the Mesa DRI drivers.  They're dlopen()'d, and so should be exec-shield
friendly, however they're not.  Something appears to be allocating memory
without PROT_EXEC and then executing it.  The problem area has yet to be
located.

I've discussed this both with Ingo and Uli earlier today, and I've got
a vague idea where the problems are (radeon_vtxfmt_c.c and r200_vtxfmt_c.c)
however tracking the problem down to the ultimate cause will have to wait
until after I have completed the XFree86 security erratum which is a remotely
exploitable hole in XFree86 font server, X server, and other local holes.

If this issue is considered an urgent blocker to fix (which it is not
flagged as currently in bugzilla), perhaps John can investigate.  If John
has the time for this, I can discuss what is known over the phone with
him quickly.  I believe he's investigated similar issues inside the ELF
loader before as well.

Basically, anywhere malloc'd memory ends up containing executable code,
that memory needs to have mprotect() called on it to mark it executable
first.  Likewise, anywhere that mmap() is used for executable code, it
must be mmapped with PROT_EXEC.  It's almost certain that mesa or some of
the DRI drivers themselves are malloc()ing or mmapping incorrectly and
this is causing the code to fail when exec-shield is used.  Note that
this has nothing at all to do with the X server ELF loader, as these
modules are not X server modules, but are Mesa DRI modules.
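
To illustrate the rule described above, here is a minimal sketch in C. It is
not code from Mesa or the DRI drivers, and the helper names alloc_code_buffer
and free_code_buffer are hypothetical. It assumes a Linux system, where
mprotect() may be applied to page-aligned heap memory (POSIX only specifies
mprotect() for memory obtained from mmap()).

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: allocate a page-aligned heap buffer and mark it
 * readable, writable and executable. On success the rounded-up size is
 * stored in *alloc_size. Returns NULL on failure. */
void *alloc_code_buffer(size_t size, size_t *alloc_size)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0 || size == 0 || alloc_size == NULL)
        return NULL;

    /* mprotect() works on whole pages, so round the request up. */
    size_t psize = (size_t)page;
    size_t rounded = (size + psize - 1) / psize * psize;
    if (rounded < size)            /* overflow while rounding */
        return NULL;

    void *buf = NULL;
    if (posix_memalign(&buf, psize, rounded) != 0)
        return NULL;

    /* Without this call the pages stay non-executable under exec-shield. */
    if (mprotect(buf, rounded, PROT_READ | PROT_WRITE | PROT_EXEC) != 0) {
        free(buf);
        return NULL;
    }
    *alloc_size = rounded;
    return buf;
}

/* Drop the execute permission again before handing the pages back to
 * the heap, so later unrelated allocations are not left executable. */
void free_code_buffer(void *buf, size_t alloc_size)
{
    if (buf == NULL)
        return;
    assert(alloc_size > 0);
    mprotect(buf, alloc_size, PROT_READ | PROT_WRITE);
    free(buf);
}
```

Generated code would be copied into the returned buffer before being called.
On systems with a stricter policy (for example one that forbids writable and
executable mappings) the mprotect() call may fail, in which case the helper
returns NULL.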

Additional information: libGL dlopen()s the DRI modules, and
the tdfx, radeon, and r200 modules also use dlopen() internally.  It's
unlikely that the problems are caused in these areas.  It appears that
there might perhaps be memory getting malloc()'d that then gets used
as a function jump table without being mprotect()'ed, but that is just
an unverified hypothesis that jumped into my mind while studying the
complex web of token pasting, macro abuse and function pointer abuse
that is in the DRI driver code.

Bill: Basically DRI will not work at all if exec-shield is enabled.  If
you think this is a Target or Blocker, feel free to mark it as such, as
I'm prioritizing bugs based on their Blocker/Target status.

John: Do you have cycles to spare to look into this and/or are you
interested in poking at it?  I don't suspect it'd take any more than
a day or so to figure out.

TIA
Comment 13 John Dennis 2003-09-15 16:48:35 EDT
Hmm... I seem to be having problems reproducing this. Kernel is
2.4.22-1.2039.nptl, xscreensaver is xscreensaver-4.12-1, XFree86-4.3.0-29.

It does not seem to matter if /proc/sys/kernel/exec-shield is 0 or 1; xscreensaver
glmatrix works. How are you changing exec-shield? Are you cat'ing "0" or
"1" into /proc/sys/kernel/exec-shield? 

Is anybody else able to reproduce this? I've coded a fix, but unless I can
reproduce the failure, it may be a moot point.
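
For checking the setting programmatically rather than by hand, a small sketch
in C follows. It only assumes that the file named above,
/proc/sys/kernel/exec-shield, holds a single integer when it exists; the
function name read_sysctl_int is hypothetical. On kernels without exec-shield
the file is absent and the function reports failure.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: read one integer from a sysctl-style file such as
 * /proc/sys/kernel/exec-shield. Returns 0 and stores the value in *out
 * on success, or -1 if the file is missing or does not hold an integer. */
int read_sysctl_int(const char *path, int *out)
{
    assert(path != NULL && out != NULL);

    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;

    int value = 0;
    int parsed = (fscanf(f, "%d", &value) == 1);
    fclose(f);

    if (!parsed)
        return -1;
    *out = value;
    return 0;
}
```

Calling read_sysctl_int("/proc/sys/kernel/exec-shield", &v) before running a
test program records which mode the kernel was in at the time of the test.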
Comment 14 Bill Nottingham 2003-09-15 16:51:12 EDT
Yeah, this spontaneously started working for me in more recent builds, and I'm
not sure why.
Comment 15 John Dennis 2003-09-15 17:01:09 EDT
Hmm... that's a bit worrisome. Mesa hasn't changed; is it possible that
exec-shield got broken such that it's not enabled anymore?
Comment 16 Nils Philippsen 2003-09-16 02:22:50 EDT
"WORKSFORME":

--- 8< ---
nils@wombat:~> sudo sysctl -w "kernel.exec-shield=1"
kernel.exec-shield = 1
nils@wombat:~> /usr/X11R6/lib/xscreensaver/glmatrix
Segmentation fault (core dumped)
nils@wombat:~> uname -r
2.4.22-1.2044.nptl
nils@wombat:~> rpm -q XFree86
XFree86-4.3.0-30
nils@wombat:~> rpm -q xscreensaver
xscreensaver-4.12-1
nils@wombat:~> sudo sysctl -w "kernel.exec-shield=0"
kernel.exec-shield = 0
nils@wombat:~> /usr/X11R6/lib/xscreensaver/glmatrix
nils@wombat:~>
--- >8 ---
Comment 17 John Dennis 2003-09-19 14:45:32 EDT
Created attachment 94602 [details]
patch provides execute permission on memory allocated for code generation
Comment 18 John Dennis 2003-09-19 15:05:00 EDT
Created attachment 94603 [details]
patch modifies Imakefiles to correctly install in "tls" directories for a tls enabled build

This patch fixes these patches:

XFree86-4.3.0-redhat-libGL-opt.patch
XFree86-4.3.0-redhat-libGL-opt-v2.patch

Actually I think libGL-opt-v2 supersedes the other patch.

This patch corrects two flaws in the above patch.

1) GlxUseThreadLocalStorage was referenced before it was defined. This produced
errors on every Makefile that was made.

2) The intent is to produce two binary-different versions of the libraries, one
without threading and one supporting thread local storage (TLS). The tls
variants of the libraries are installed in a "tls" subdirectory. However, the
original patch never updated the destination directory for installs; instead,
tls versions of the libraries were installed in the non-tls (parent) directory
where the non-tls versions of the libraries lived. Then the spec file copied
the files into the tls subdirectory. In other words "make install" is very
broken and if done outside of an rpm build will trash the libraries on the
system :-(

After applying this patch we need to fix the spec file: we need to remove the
code that copies and links the tls libraries. We need to keep the code that
adds GlxUseThreadLocalStorage to host.def and then does a make clean, make
Makefiles, make, make install. When GlxUseThreadLocalStorage is defined to YES
the files will be installed where they belong.
Comment 19 John Dennis 2003-09-19 15:12:08 EDT
Mike: I'm assigning this to you so you can apply the patches and update the spec
file. Please read the note on the TLS Imakefile patch. That patch BTW was
created after the XFree86-4.3.0-redhat-libGL-opt-v2.patch was applied; perhaps
it should be merged with that patch as it's all related and undoes some of what
the earlier patch did. You'll have to delete some stuff from the spec file too;
hopefully the comment above will be clear.

I will submit the other patch upstream to the DRI folks, I noticed other
potential bugs in mem.c I'd like to bring to their attention as well.

You may be interested to know the patch contains two alternate implementations,
one that uses mprotect and one that uses anonymous mmap. Both were tested; the
patch turns on the mmap variant. There is also extensive documentation in mem.c
that I added.
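
For readers without the attachment, the anonymous mmap approach can be
sketched roughly as follows. This is an illustration only, not the code from
the patch: the names exec_mmap_alloc and exec_mmap_free are hypothetical, and
it assumes a Linux system where MAP_ANONYMOUS is available.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper: obtain zero-filled memory that is readable,
 * writable and executable straight from the kernel, bypassing malloc().
 * Returns NULL on failure. */
void *exec_mmap_alloc(size_t size)
{
    if (size == 0)
        return NULL;

    /* The fd argument is ignored for anonymous mappings on Linux;
     * -1 is the conventional value and some systems require it. */
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE | PROT_EXEC,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return (p == MAP_FAILED) ? NULL : p;
}

/* Release a region obtained from exec_mmap_alloc(); the caller must
 * pass the same size that was requested. */
void exec_mmap_free(void *p, size_t size)
{
    if (p == NULL)
        return;
    int rc = munmap(p, size);
    assert(rc == 0);
    (void)rc;   /* silence unused-variable warning under NDEBUG */
}
```

Compared with calling mprotect() on heap memory, this keeps executable pages
separate from ordinary malloc() data, at the cost of consuming at least one
whole page per allocation.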
Comment 20 Mike A. Harris 2003-09-24 04:45:19 EDT
*** Bug 101647 has been marked as a duplicate of this bug. ***
Comment 21 Mike A. Harris 2003-09-24 05:34:47 EDT
execute permission patch applies cleanly but does not compile, failing at:

gcc -m32 -O2 -march=i386 -mcpu=i686 -fno-strict-aliasing -pipe -ansi -pedantic
-Wall -Wpointer-arith -Wundef    -fno-merge-constants
-I../../../../../exports/include -I../../../../../exports/include/X11
-I../../../../../include/extensions                
-I../../../../../extras/Mesa/include -I../../../../../lib/GL/include           
-I../../../../../extras/Mesa/src              
-I../../../../../programs/Xserver/include  -I../../../../..
-I../../../../../exports/include   -Dlinux -D__i386__ -D_POSIX_C_SOURCE=199309L
-D_POSIX_SOURCE -D_XOPEN_SOURCE -D_BSD_SOURCE -D_SVID_SOURCE  -D_GNU_SOURCE 
-DSHAPE -DXINPUT -DXKB -DLBX -DXAPPGROUP -DXCSECURITY -DTOGCUP  -DXF86BIGFONT
-DDPMSExtension  -DPIXPRIV -DPANORAMIX  -DRENDER -DRANDR -DGCCUSESGAS
-DAVOID_GLYPHBLT -DPIXPRIV -DSINGLEDEPTH -DXFreeXDGA -DXvExtension
-DXFree86LOADER  -DXFree86Server -DXF86VIDMODE -DXvMCExtension  -DSMART_SCHEDULE
 -DXResExtension -DX_BYTE_ORDER=X_LITTLE_ENDIAN -DNDEBUG  -DFUNCPROTO=15
-DNARROWPROTO  -DIN_MODULE -DXFree86Module -DGLXEXT -DXF86DRI
-DGLX_DIRECT_RENDERING -DGLX_USE_DLOPEN -DGLX_USE_MESA   -c mem.c
In file included from /usr/include/bits/types.h:29,
                 from /usr/include/unistd.h:190,
                 from mem.c:42:
/usr/lib/gcc-lib/i386-redhat-linux/3.2/include/stddef.h:201: conflicting types
for `xf86size_t'
../../../../../programs/Xserver/include/xf86_libc.h:59: previous declaration of
`xf86size_t'
In file included from mem.c:42:
/usr/include/unistd.h:193: conflicting types for `xf86ssize_t'
../../../../../programs/Xserver/include/xf86_libc.h:60: previous declaration of
`xf86ssize_t'
In file included from mem.c:42:
/usr/include/unistd.h:310: conflicting types for `xf86read'
../../../../../programs/Xserver/include/xf86_ansic.h:270: previous declaration
of `xf86read'
/usr/include/unistd.h:313: conflicting types for `xf86write'
../../../../../programs/Xserver/include/xf86_ansic.h:271: previous declaration
of `xf86write'
/usr/include/unistd.h:383: conflicting types for `xf86usleep'
../../../../../programs/Xserver/include/xf86_ansic.h:342: previous declaration
of `xf86usleep'
In file included from mem.c:42:
/usr/include/unistd.h:820:29: macro "getpagesize" passed 1 arguments, but takes
just 0
In file included from mem.c:43:
/usr/include/sys/mman.h:59: conflicting types for `xf86mmap'
../../../../../programs/Xserver/include/xf86_ansic.h:272: previous declaration
of `xf86mmap'
/usr/include/sys/mman.h:77: conflicting types for `xf86munmap'
../../../../../programs/Xserver/include/xf86_ansic.h:273: previous declaration
of `xf86munmap'
mem.c:148: conflicting types for `_mesa_malloc'
../../../../../extras/Mesa/src/mem.h:59: previous declaration of `_mesa_malloc'
mem.c:158: conflicting types for `_mesa_calloc'
../../../../../extras/Mesa/src/mem.h:60: previous declaration of `_mesa_calloc'
mem.c:188: conflicting types for `_mesa_align_malloc'
../../../../../extras/Mesa/src/mem.h:63: previous declaration of
`_mesa_align_malloc'
mem.c:215: conflicting types for `_mesa_align_calloc'
../../../../../extras/Mesa/src/mem.h:64: previous declaration of
`_mesa_align_calloc'
mem.c:494: conflicting types for `_mesa_exec_malloc'
../../../../../extras/Mesa/src/mem.h:66: previous declaration of `_mesa_exec_malloc'
mem.c:569: conflicting types for `_mesa_memset16'
../../../../../extras/Mesa/src/mem.h:141: previous declaration of `_mesa_memset16'
make[7]: *** [mem.o] Error 1
make[7]: Leaving directory
`/home/mharris/rpmbuild/BUILD/XFree86-4.3.0/xc/programs/Xserver/GL/mesa/src'
make[6]: *** [all] Error 2
make[6]: Leaving directory
`/home/mharris/rpmbuild/BUILD/XFree86-4.3.0/xc/programs/Xserver/GL/mesa'
make[5]: *** [mesa] Error 2
make[5]: Leaving directory
`/home/mharris/rpmbuild/BUILD/XFree86-4.3.0/xc/programs/Xserver/GL'
make[4]: *** [GL] Error 2
make[4]: Leaving directory
`/home/mharris/rpmbuild/BUILD/XFree86-4.3.0/xc/programs/Xserver'
make[3]: *** [all] Error 2
make[3]: Leaving directory `/home/mharris/rpmbuild/BUILD/XFree86-4.3.0/xc/programs'
make[2]: *** [all] Error 2
make[2]: Leaving directory `/home/mharris/rpmbuild/BUILD/XFree86-4.3.0/xc'
make[1]: *** [World] Error 2
make[1]: Leaving directory `/home/mharris/rpmbuild/BUILD/XFree86-4.3.0/xc'
make: *** [World] Error 2
make: Leaving directory `/home/mharris/rpmbuild/BUILD/XFree86-4.3.0/xc'
error: Bad exit status from /home/mharris/rpmbuild/tmp/rpm-tmp.4364 (%build)
Comment 22 Mike A. Harris 2003-09-24 10:03:10 EDT
Some feedback from jakub on the patch..


<jakub> mharris: better use fd -1 instead of 0 for MAP_ANONYMOUS
Comment 23 Mike A. Harris 2003-09-24 11:22:33 EDT
<jakub> mharris: and at least glapi.c should simply EXEC_MALLOC (getpagesize ()
- 128, 16) the first time it needs it
<jakub> mharris: and then return addresses from that buffer...
<jakub> mharris: and free in __attribute__((destructor))
Comment 24 John Dennis 2003-09-24 16:37:42 EDT
O.K. I'll confess I don't understand why this isn't compiling. My patch was
against rev 29 of the rpm. I suspect things have changed since I notice the gcc
args passed to mem.c are not the same as I have.

It appears the xf86size_t is only defined in xf86_libc.h and xf86_OSlib.h,
neither of which get included when I compile mem.c.

Since I can't reproduce this perhaps the best thing is to send me a pointer to
the src rpm generating the error and I'll try debugging the build using that.
Comment 25 Behdad Esfahbod 2003-11-05 09:12:19 EST
It's been more than a month now.  I guess this has been fixed, right?
Comment 26 John Dennis 2003-11-05 17:58:09 EST
I just tried building the latest package XFree86-4.3.0-42.src.rpm with
this patch enabled and it built fine on x86. From my vantage point the
bug is fixed. However I did notice the application of the patch was
disabled in the spec file, that will probably have to be remedied.

I'll just assume the compile problem was one of those mysteries of the
universe. If the rpmbuild fails again with the patch enabled, give me
the details and assign it back to me again, otherwise I'm assuming
this will sail through.
Comment 27 Mike A. Harris 2003-11-06 01:52:11 EST
Version 2 of this patch was applied to the spec file on Sept 25:

* Thu Sep 25 2003 Mike A. Harris <mharris@redhat.com> 4.3.0-34
- Updated to XFree86-4.3.0-xf-4_3-branch-2003-09-26.patch to pick up new
  security fixes from CVS
- Updated XFree86-4.3.0-redhat-libGL-exec-shield-fixes.patch to new patch
  XFree86-4.3.0-redhat-libGL-exec-shield-fixes-v2.patch which reorders some
  includes in mem.c so it builds.  Still cambridge only.


It's been flagged to only compile in for build_cambridge previously,
and recently I renamed that flag to build_yarrow for the final
release name.  It was flagged this way because RHL 8.0/9 doesn't
have exec shield anyway so I didn't want to introduce the possibility
of regression for erratum updates for 9, or to needlessly break
8.0 if for some reason it didn't work (pedantic paranoia mostly),
and wanted it only in Fedora Core 1 until well tested enough in
the wild to apply to other builds potentially.

So this patch has been applied for over a month, but the bug report
was just not updated to reflect that.  Doh.

No problems reported yet John, so your fix seems to work.  I'm
closing the bug for now, but if anyone has any problems with
exec-shield, please reopen unless you think it is a different
issue, in which case open a new bug report for us to investigate.

Feel free if you test this to add a "tested and it works for me
now" to this report also if you like....

Closing as RAWHIDE, fixed in 4.3.0-25

