Bug 184045 - DRI problem
Summary: DRI problem
Keywords:
Status: CLOSED UPSTREAM
Alias: None
Product: Fedora
Classification: Fedora
Component: xorg-x11
Version: rawhide
Hardware: ia64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: X/OpenGL Maintenance List
QA Contact: Brian Brock
URL:
Whiteboard:
Duplicates: 193567
Depends On:
Blocks: fedora-ia64
 
Reported: 2006-03-05 13:17 UTC by Émeric Maschino
Modified: 2007-11-30 22:11 UTC (History)

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2006-10-03 15:05:19 UTC
Type: ---
Embargoed:


Attachments
Output of strace glxinfo (28.96 KB, text/plain)
2006-06-08 18:52 UTC, Émeric Maschino


Links
System ID Private Priority Status Summary Last Updated
FreeDesktop.org 7770 0 None None None Never

Description Émeric Maschino 2006-03-05 13:17:44 UTC
Description of problem:

glxinfo returns:

name of display: :0.0
X Error of failed request:  GLXBadContext
  Major opcode of failed request:  143 (GLX)
  Minor opcode of failed request:  5 (X_GLXMakeCurrent)
  Serial number of failed request:  20
  Current serial number in output stream:  20

glxgears returns (the glxgears window quickly appears for a few milliseconds and
then disappears):

X Error of failed request:  GLXBadContext
  Major opcode of failed request:  143 (GLX)
  Minor opcode of failed request:  5 (X_GLXMakeCurrent)
  Serial number of failed request:  25
  Current serial number in output stream:  25

I've checked the permissions: as user emeric, ls -lZ /dev/dri/card0 returns:

crw-------  emeric   root     system_u:object_r:dri_device_t   /dev/dri/card0

The DRI section of my xorg.conf file looks like this (consistent with other
Fedora Core and Red Hat installations):

Section "DRI"
        Group        0
        Mode         0666
EndSection

Except for glxinfo and glxgears, I don't currently have any serious application
to test DRI with. Mike A. Harris doesn't understand what's wrong and suggested
that this could be a udev issue. Hence my bug report ;-)

Version-Release number of selected component (if applicable):

udev-084-11

How reproducible:

Always

Steps to Reproduce:
1. Try to use DRI with glxinfo or glxgears
2.
3.
  
Actual results:

GLXBadContext X error

Expected results:

GLX information with glxinfo or 3D demo with glxgears

Additional info:

I've also rebooted my system with SELinux disabled: same problem

Comment 1 Harald Hoyer 2006-03-06 12:46:09 UTC
Please change in /etc/udev/rules.d/50-udev.rules
KERNEL=="card*",                NAME="dri/card%n"
to
KERNEL=="card*",                NAME="dri/card%n", MODE="0666"

then reboot.

Comment 2 Émeric Maschino 2006-03-06 17:47:45 UTC
Thanks for your suggestion. Unfortunately, this doesn't help. Furthermore, once
rebooted and logged as user emeric, /dev/dri/card0 is still

crw-------  emeric   root     system_u:object_r:dri_device_t   /dev/dri/card0

even with MODE="0666" in the 50-udev.rules file (I've also tried MODE="666"
without success). Is there a typo somewhere preventing udev from creating the
/dev/dri/card0 node with the right permissions?

I don't know if it's really a permission problem: the bits currently set on
/dev/dri/card0 should allow user emeric read and write access. Using su - to
gain root privileges produces the same GLXBadContext error, both with glxinfo
and glxgears.

Comment 3 Sammy 2006-03-07 19:54:05 UTC
What I find is that if the definition in the udev.rules file contains a "*",
the permissions you set are not applied. For names that do not have
a star it works fine; I do this for nvram, rtc, and a few others (see ptmx),
which all work fine. But if I change permissions for "dsp*" or any other
entry with a star, the new permission is ignored. Unless this is intentional,
there must be a syntax problem in some script.
FYI

Comment 4 Harald Hoyer 2006-03-08 11:19:55 UTC
dsp is set by pam_console and configured in /etc/security/console.perms*

Comment 5 Mike A. Harris 2006-06-06 10:12:44 UTC
*** Bug 193567 has been marked as a duplicate of this bug. ***

Comment 6 Mike A. Harris 2006-06-06 10:21:47 UTC
Harald,

There has only been a single report of this to date (3 duplicates from
same person), however it is only being reported on ia64.  DRI works fine
on other architectures.

We do not have a Fedora Core/ia64 release, and I do not have such an
installation handy to personally test/verify.

The code in X that handles the config file DRI permissions is not
architecture dependent, so it works 100% identically on all architectures.

For this reason, it is assumed that this problem is caused either by:

1) The reporter is using a custom 3rd party installation which may have
   modified something causing it to break.

or

2) A bug in udev/hal/selinux/whatever


If nobody is able to directly reproduce this and diagnose it, my
recommendation is to leave this bug open for now, until the RHEL-5
devel cycle is underway and we have RHEL-5 ISO images with which to
test.  If the problem is still present then, it will at least be
much easier for someone to diagnose.


To the reporter:

If you seriously want to diagnose this yourself, simply install the
debuginfo packages for the X server and drivers, etc.  Then add the
NoTrapSignals option to the xorg.conf (read manpage for details) and
run the X server as root - or twiddle the file in /proc/sys/kernel/whatever
that allows non-root to debug SUID processes.  You should then be able
to log in via ssh to the machine, and run gdb on the X server, and single
step it.

Hope this helps.



Comment 7 Harald Hoyer 2006-06-06 11:53:41 UTC
If a 
$ chmod 0666 /dev/dri/card0
does not fix the glx failure, then there is nothing udev can do anyway.

Also the 0600 comes from pam_console and not udev:
$ fgrep -r dri /etc/security/console.perms*
..
/etc/security/console.perms.d/50-default.perms:<dri>=/dev/nvidia* /dev/3dfx* /dev/dri/card*
/etc/security/console.perms.d/50-default.perms:<console> 0600 <dri> 0600 root
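
The check discussed above (which mode and owner the device node actually ends up with, whatever udev or pam_console were configured to do) can be sketched in a few lines of Python. This is only an illustration: the helper names `describe_node` and `world_rw` are made up for this example, and the only thing taken from the report is the path /dev/dri/card0.

```python
import os
import stat

def describe_node(path):
    """Return (octal mode string, owner uid, group gid) for a filesystem node."""
    st = os.stat(path)
    mode = stat.S_IMODE(st.st_mode)  # keep only the permission bits
    return format(mode, "04o"), st.st_uid, st.st_gid

def world_rw(path):
    """True if 'other' has both read and write permission on path."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    wanted = stat.S_IROTH | stat.S_IWOTH
    return mode & wanted == wanted

if __name__ == "__main__":
    node = "/dev/dri/card0"  # device path taken from the report
    if os.path.exists(node):
        print(node, describe_node(node), "world-rw:", world_rw(node))
    else:
        print(node, "not present on this machine")
```

Running this once after boot and once after logging in on the console would show whether something rewrites the mode between the two points. It says nothing about why GLX fails when the mode is already permissive.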


Comment 8 Mike A. Harris 2006-06-06 13:37:04 UTC
$ rpm -qf /etc/security/console.perms.d/50-default.perms
pam-0.79-9.6

Then it appears this is a pam bug.

It is somewhat confusing however that the X server, running as root,
may be unable to "chmod 0666" the device, as that's exactly what it tries
to do.

The DRI devices *MUST* be mode 0666 to work, and this is not an insecure
thing to do.  Why pam has it forced to 0600 I can only guess: whoever
came up with that setting presumably assumed, mistakenly, that the DRI
devices are insecure and should be restricted to root only.

The nvidia and 3dfx devices should have their own private settings, as
they do not use DRI.  We can't vouch for the security of either of them
and ship neither of them with the OS.

Thanks for the info Harald, reassigning to pam.


Comment 9 Émeric Maschino 2006-06-06 15:47:26 UTC
(In reply to comment #6)
> For this reason, it is assumed that this problem is caused either by:
> 
> 1) The reporter is using a custom 3rd party installation which may have
>    modified something causing it to break.

I'm using a 100% pure ia64 Fedora Rawhide installation from May 3rd 2006 updated
daily.

> If nobody is able to directly reproduce this and diagnose it, my
> recommendation is to leave this bug open for now, until the RHEL-5
> devel cycle is underway and we have RHEL-5 ISO images with which to
> test.  If the problem is still present then, it will at least be
> much easier for someone to diagnose.

This issue has been reproduced on another Itanium box. Please see
https://www.redhat.com/archives/fedora-ia64-list/2006-May/msg00006.html and
https://www.redhat.com/archives/fedora-ia64-list/2006-May/msg00023.html

> To the reporter:
> 
> If you seriously want to diagnose this yourself, simply install the
> debuginfo packages for the X server and drivers, etc.  Then add the
> NoTrapSignals option to the xorg.conf (read manpage for details) and
> run the X server as root - or twiddle the file in /proc/sys/kernel/whatever
> that allows non-root to debug SUID processes.  You should then be able
> to log in via ssh to the machine, and run gdb on the X server, and single
> step it.
> 
> Hope this helps.

Yes, it did help, thanks. But once all the needed debuginfo packages are
installed, what am I supposed to track down with gdb?

Comment 10 Émeric Maschino 2006-06-06 16:05:02 UTC
(In reply to comment #8)
> Then it appears this is a pam bug.

Well, I've modified the rules in /etc/security/console.perms.d/50-default.perms
so that ls -lZ /dev/dri/card0 now gives:

crw-rw-rw-  emeric root system_u:object_r:dri_device_t   /dev/dri/card0

But I'm still getting

libGL error: open DRM failed (Operation not permitted)
libGL error: reverting to (slow) indirect rendering

when invoking glxinfo or glxgears. BTW, that's why I also opened bug #193567
(though I don't know why it was also recorded as #193566) against the xorg
component, since (1) it appears that even with sufficient permissions, the
DRI/DRM framework isn't working and (2) the additional error messages that I
described in this bug report are no longer displayed (only the libGL errors are
still there, not the GLXBadContext, major/minor opcode and serial number ones).

> Thanks for the info Harald, reassigning to pam.

I'm not sure this will change anything w.r.t. the libGL error.

Comment 11 Tomas Mraz 2006-06-07 10:41:38 UTC
AFAIK the dri device doesn't have to be 0666 permission as the console.perms
setup (mode 0600 and ownership by the console user) works fine for everyone else
(non ia64 machines). Also see the previous two comments from the reporter. This
must be a different issue. Could you try stracing the glxinfo and find out which
calls fail?


Comment 12 Émeric Maschino 2006-06-08 18:52:29 UTC
Created attachment 130775
Output of strace glxinfo

Comment 13 Émeric Maschino 2006-06-08 18:55:45 UTC
(In reply to comment #11)
> AFAIK the dri device doesn't have to be 0666 permission as the console.perms
> setup (mode 0600 and ownership by the console user) works fine for everyone else
> (non ia64 machines). Also see the previous two comments from the reporter. This
> must be a different issue. Could you try stracing the glxinfo and find out which
> calls fail?

Please have a look at attachment #130775 for the output of strace glxinfo. I
don't know how to interpret these two lines:

write(2, "libGL error: open DRM failed (Op"..., 55libGL error: open DRM failed (Operation not permitted)
) = 55
write(2, "libGL error: reverting to (slow)"..., 52libGL error: reverting to (slow) indirect rendering
) = 52

Comment 14 Tomas Mraz 2006-06-08 19:02:09 UTC
The interesting lines are actually these:
open("/dev/dri/card0", O_RDWR)          = 4
ioctl(4, DECODER_SET_PICTURE, 0x60000fffff632df0) = -1 EACCES (Permission denied)

So the permissions on the device node are fine - open succeeds but the ioctl
call fails. Probably a kernel issue then?
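
Picking the failing calls out of a long strace log, as done by hand above, can be sketched with a small Python filter. This is a rough illustration, not a full strace parser: the regular expression and the function name `failed_calls` are assumptions made for this example, and ENOENT is skipped by default only because a dynamically linked program normally produces many harmless failed lookups while searching for libraries.

```python
import re

# Matches strace lines of the form:
#   name(args...) = -1 ERRNO (message)
FAILED_CALL = re.compile(
    r"^(?P<name>\w+)\((?P<args>.*)\)\s*=\s*-1\s+"
    r"(?P<errno>E[A-Z0-9]+)\s+\((?P<msg>[^)]*)\)"
)

def failed_calls(lines, ignore=("ENOENT",)):
    """Return (syscall, errno, message) for each failing call in strace output."""
    found = []
    for line in lines:
        m = FAILED_CALL.match(line.strip())
        if m and m.group("errno") not in ignore:
            found.append((m.group("name"), m.group("errno"), m.group("msg")))
    return found

# The two lines quoted in this comment.
sample = [
    'open("/dev/dri/card0", O_RDWR)          = 4',
    'ioctl(4, DECODER_SET_PICTURE, 0x60000fffff632df0) = -1 EACCES (Permission denied)',
]
print(failed_calls(sample))
```

On the two quoted lines only the ioctl is reported, which matches the reading above: the open on the device node succeeds and the later ioctl is what gets rejected.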


Comment 15 Émeric Maschino 2006-07-25 23:04:09 UTC
(In reply to comment #14)
> The interesting lines are actually these:
> open("/dev/dri/card0", O_RDWR)          = 4
> ioctl(4, DECODER_SET_PICTURE, 0x60000fffff632df0) = -1 EACCES (Permission denied)
> 
> So the permissions on the device node are fine - open succeeds but the ioctl
> call fails. Probably a kernel issue then?

Do you have fresh news from the kernel maintainer(s)? Should we report this
issue to the linux-ia64 list too?

Comment 16 Émeric Maschino 2006-08-02 20:08:26 UTC
From the few bug reports (exactly 12, one during the FC5 test2 times) I can find
on the web with Google and the keywords DECODER_SET_PICTURE, EACCES and
"Permission denied", it seems that such an issue is related to Mesa and not the
Linux kernel. Furthermore, recompiling a stock 2.6.17 kernel from ftp.kernel.org
doesn't solve the problem for me. Targeting against the mesa component.

Comment 17 Émeric Maschino 2006-08-31 22:53:32 UTC
FYI reported upstream: https://bugs.freedesktop.org/show_bug.cgi?id=7770

Comment 18 Émeric Maschino 2006-08-31 23:04:54 UTC
(In reply to comment #16)
> From the few bug reports (exactly 12, one during the FC5 test2 times) I can find
> on the web with Google and the keywords DECODER_SET_PICTURE, EACCES and
> "Permission denied", it seems that such an issue is related to Mesa and not the
> Linux kernel. Furthermore, recompiling a stock 2.6.17 kernel from ftp.kernel.org
> doesn't solve the problem for me. Targeting against the mesa component.

Michel Dänzer at Tungsten Graphics determined that this error is due to a PCI
domain mismatch between the X server and the kernel. The complete story is at
the link provided in comment #17. Thus retargeting to the xorg-x11 component.
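
To make "PCI domain mismatch" concrete, here is a minimal Python sketch that compares two PCI bus ID strings field by field. Everything in it is an assumption for illustration: the `pci:DDDD:BB:dd.f` string layout, the helper names, and the two example IDs are hypothetical, since this report does not show the bus IDs that the X server and the kernel actually used.

```python
from typing import NamedTuple

class PciAddress(NamedTuple):
    domain: int
    bus: int
    device: int
    function: int

def parse_bus_id(bus_id: str) -> PciAddress:
    """Parse an assumed 'pci:DDDD:BB:dd.f' string (hex fields) into a PciAddress."""
    prefix, domain, bus, devfn = bus_id.split(":")
    if prefix.lower() != "pci":
        raise ValueError(f"not a PCI bus ID: {bus_id!r}")
    device, function = devfn.split(".")
    return PciAddress(int(domain, 16), int(bus, 16), int(device, 16), int(function))

def mismatch(a: str, b: str) -> list:
    """Return the names of the address fields that differ between two bus IDs."""
    pa, pb = parse_bus_id(a), parse_bus_id(b)
    return [f for f in PciAddress._fields if getattr(pa, f) != getattr(pb, f)]

# Hypothetical IDs: same bus, device and function, different domain.
print(mismatch("pci:0000:01:00.0", "pci:0001:01:00.0"))
```

If two components disagree only on the domain field like this, a lookup keyed on the full bus ID fails even though both refer to the same card. Whether that is exactly what happened here is documented only in the upstream report.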

