Bug 1440353 - swell-foop dies with bad X event
Summary: swell-foop dies with bad X event
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Fedora
Classification: Fedora
Component: libglvnd
Version: 24
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Nicolas Chauvet (kwizart)
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2017-04-08 00:51 UTC by Tom Horsley
Modified: 2017-04-11 08:56 UTC

Fixed In Version:
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-04-11 08:56:14 UTC


Attachments
dnf history info for all apr 7th transactions (37.00 KB, text/plain)
2017-04-08 00:51 UTC, Tom Horsley
bt of all threads when swell-foop dies (2.86 KB, text/plain)
2017-04-08 19:09 UTC, Tom Horsley
The xorg log file (46.01 KB, text/plain)
2017-04-09 18:00 UTC, Tom Horsley

Description Tom Horsley 2017-04-08 00:51:14 UTC
Created attachment 1269967 [details]
dnf history info for all apr 7th transactions

Description of problem:

After an update today (which did not include swell-foop, but did include new nvidia drivers) the swell-foop program does this:

(swell-foop:13923): Gdk-ERROR **: The program 'swell-foop' received an X Window System error.
This probably reflects a bug in the program.
The error was 'GLXBadContext'.
  (Details: serial 186 error_code 167 request_code 153 (GLX) minor_code 6)
  (Note to programmers: normally, X errors are reported asynchronously;
   that is, you will receive the error a while after causing it.
   To debug your program, run it with the GDK_SYNCHRONIZE environment
   variable to change this behavior. You can then get a meaningful
   backtrace from your debugger if you break on the gdk_x_error() function.)
Trace/breakpoint trap

Version-Release number of selected component (if applicable):
swell-foop-3.20.0-1.fc24.x86_64

How reproducible:
100% of the time

Steps to Reproduce:
1. Run swell-foop.
2. See it die with the above error.

Actual results:
see above

Expected results:
run silly game

Additional info:

I'll attach a dnf history info of the updates I did this morning, but the only package I see that seems relevant is the new nvidia drivers. The swell-foop program certainly worked prior to doing the updates on apr 7th.

I tried several other gnome programs and didn't see anything else getting funny failures.
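
For reference, the "Details" line in the error output above can be decoded mechanically. The following is a minimal Python sketch, not part of the original report. The function name parse_x_error_details and the opcode table are illustrative; the table is transcribed from the core GLX request opcodes in glxproto.h as far as I know them, so treat it as an assumption and check it against your headers. The request_code (153 here) is the major opcode the X server assigned to the GLX extension and can differ between servers.

```python
import re

# Core GLX request (minor) opcodes, transcribed from glxproto.h.
# Assumption: only the first few opcodes are listed; extend as needed.
GLX_MINOR_OPCODES = {
    1: "GLXRender",
    2: "GLXRenderLarge",
    3: "GLXCreateContext",
    4: "GLXDestroyContext",
    5: "GLXMakeCurrent",
    6: "GLXIsDirect",
    7: "GLXQueryVersion",
    8: "GLXWaitGL",
    9: "GLXWaitX",
    10: "GLXCopyContext",
    11: "GLXSwapBuffers",
}

# Matches the "(Details: serial ... minor_code N)" line printed by GDK.
DETAILS_RE = re.compile(
    r"serial (?P<serial>\d+) error_code (?P<error_code>\d+) "
    r"request_code (?P<request_code>\d+) \((?P<extension>\w+)\) "
    r"minor_code (?P<minor_code>\d+)"
)


def parse_x_error_details(text):
    """Return the numeric fields of a GDK X error 'Details' line, or None."""
    match = DETAILS_RE.search(text)
    if match is None:
        return None
    fields = {
        key: int(value)
        for key, value in match.groupdict().items()
        if key != "extension"
    }
    fields["extension"] = match.group("extension")
    if fields["extension"] == "GLX":
        # Translate the minor opcode into a request name when we know it.
        fields["request_name"] = GLX_MINOR_OPCODES.get(
            fields["minor_code"], "unknown"
        )
    return fields
```

Applied to the line in this report, the sketch reports minor_code 6 as GLXIsDirect, meaning the server rejected the context handle passed in that request. That is consistent with a mismatch between the client-side libGL and the server-side GLX module, although the opcode alone does not prove it.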

Comment 1 Tom Horsley 2017-04-08 01:01:30 UTC
For what it is worth, I booted to my fedora 25 partition (which I also updated this morning), and swell-foop works fine there, but I'm using the nouveau drivers there rather than nvidia.

Comment 2 Tom Horsley 2017-04-08 01:22:56 UTC
I found the old bug 1392186 where similar symptoms cropped up due to a problem with libglvnd installation, and I do see a new libglvnd showed up in the apr 7 update.

Certainly there is no longer anything in the /usr/lib64/libglvnd directory where the nvidia version of libGL came from before the update (at least I think that is what was going on there).

Comment 3 Tom Horsley 2017-04-08 01:32:50 UTC
If I run swell-foop under a debugger, I find /lib64/libGL.so.1 loaded in the address space when it dies, and no mention of any files in /usr/lib64/nvidia
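
The same check can be made without a debugger by reading /proc/<pid>/maps for the running process. Below is a small Python sketch of that idea, offered as an illustration only. The function names and the regular expression are my own assumptions about which file names count as "GL related" (libGL*, libEGL*, libOpenGL*, libnvidia*, or anything under a directory named nvidia).

```python
import re

# Assumption: these name patterns cover the GL libraries of interest here.
GL_LIB_RE = re.compile(r"/lib(GL|EGL|OpenGL|nvidia)[^/]*\.so|/nvidia/")


def gl_libraries(maps_text):
    """Return the sorted GL-related file paths found in /proc/<pid>/maps text."""
    paths = set()
    for line in maps_text.splitlines():
        # Fields: address range, permissions, offset, device, inode, path.
        fields = line.split(None, 5)
        if len(fields) == 6 and fields[5].startswith("/"):
            path = fields[5].strip()
            if GL_LIB_RE.search(path):
                paths.add(path)
    return sorted(paths)


def gl_libraries_of_pid(pid):
    """Read the maps file of a live process and list its GL-related libraries."""
    with open(f"/proc/{pid}/maps") as maps_file:
        return gl_libraries(maps_file.read())
```

If the result for a process contains only /lib64/libGL.so.1 (or its resolved file name) and nothing under /usr/lib64/nvidia, that matches the observation in this comment.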

Comment 4 Tom Horsley 2017-04-08 15:41:03 UTC
Perhaps the component of this bug should switch to clutter. I just checked a few other programs that depend on clutter and while not all of them die, some of them get the same bad glx error:

(totem:9163): Gdk-ERROR **: The program 'totem' received an X Window System error.
This probably reflects a bug in the program.
The error was 'GLXBadContext'.
  (Details: serial 188 error_code 167 request_code 153 (GLX) minor_code 6)
  (Note to programmers: normally, X errors are reported asynchronously;
   that is, you will receive the error a while after causing it.
   To debug your program, run it with the GDK_SYNCHRONIZE environment
   variable to change this behavior. You can then get a meaningful
   backtrace from your debugger if you break on the gdk_x_error() function.)
Trace/breakpoint trap

(sushi-start:9309): Gdk-ERROR **: The program 'sushi-start' received an X Window System error.
This probably reflects a bug in the program.
The error was 'GLXBadContext'.
  (Details: serial 188 error_code 167 request_code 153 (GLX) minor_code 6)
  (Note to programmers: normally, X errors are reported asynchronously;
   that is, you will receive the error a while after causing it.
   To debug your program, run it with the GDK_SYNCHRONIZE environment
   variable to change this behavior. You can then get a meaningful
   backtrace from your debugger if you break on the gdk_x_error() function.)
Trace/breakpoint trap

Comment 5 Tom Horsley 2017-04-08 17:28:48 UTC
Lots more stuff doesn't work.

I made a brand new user and tried to log in with both the gnome and gnome classic sessions; both pop up the highly informative "oh no something has gone wrong" screen. (Fortunately I gave up on gnome years ago, and my custom fvwm session still works fine for my normal login.)

Comment 6 Samuel Sieb 2017-04-08 17:32:24 UTC
Try "GDK_SYNCHRONIZE=1 swell-foop" and see if you can get a stack trace.  It will likely run quite slowly, but abrt will give you a useful stack trace if it works.  
Although if there's that much broken now, maybe downgrading a package or two would be easier and more informative.
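
If abrt does not capture the crash, one alternative is to run the program under gdb with GDK_SYNCHRONIZE set and a breakpoint on gdk_x_error(), as the GDK message itself suggests. The Python sketch below only assembles and launches such a gdb invocation. The helper names are hypothetical; the gdb options used (-batch, -ex, --args, "set breakpoint pending on", "thread apply all bt") are standard, but the exact sequence is my assumption of a reasonable setup rather than anything prescribed in this bug.

```python
import os
import subprocess


def gdb_backtrace_command(program, *program_args):
    """Build an argv that runs `program` under gdb and dumps all thread backtraces."""
    return [
        "gdb", "-batch",
        # gdk_x_error lives in a shared library that is not loaded yet,
        # so allow the breakpoint to stay pending until it is.
        "-ex", "set breakpoint pending on",
        "-ex", "break gdk_x_error",
        "-ex", "run",
        # Executed once the program stops (breakpoint hit or fatal signal).
        "-ex", "thread apply all bt",
        "--args", program, *program_args,
    ]


def run_synchronized(program, *program_args):
    """Run the program under gdb with synchronous X error reporting enabled."""
    env = dict(os.environ, GDK_SYNCHRONIZE="1")
    return subprocess.run(gdb_backtrace_command(program, *program_args), env=env)
```

Calling run_synchronized("swell-foop") would print the backtraces to the terminal; installing the matching debuginfo packages first makes them far more readable.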

Comment 7 Tom Horsley 2017-04-08 19:09:34 UTC
Created attachment 1270058 [details]
bt of all threads when swell-foop dies

Turned on GDK_SYNCHRONIZE=1, ran swell-foop, it dies almost immediately, so nothing much got slowed down by the sync. The backtrace from all the threads is attached in this file.

Comment 8 Tom Horsley 2017-04-08 19:19:36 UTC
I tried to see what would happen by doing

dnf downgrade libglvnd

and dnf says I have the lowest version installed already, which is strange since my near identical fedora 24 system at work (which I haven't upgraded) claims to have an older version installed on it.

work: libglvnd-0.2.999-6.git28867bb.fc24.x86_64

home: libglvnd-0.2.999-14.20170308git8e6e102.fc24.x86_64
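
To make the ordering of those two builds explicit, here is a simplified Python sketch of a segment-wise version comparison in the spirit of rpm's algorithm. It is an approximation written for illustration: epochs, tildes and other special cases that real rpm handles are deliberately not modelled, and the function names are my own.

```python
import re


def _segments(text):
    # Split into runs of digits or letters; separators such as "." are dropped.
    return re.findall(r"\d+|[A-Za-z]+", text)


def simple_vercmp(a, b):
    """Return -1, 0 or 1 depending on whether a is older than, equal to or newer than b."""
    seg_a, seg_b = _segments(a), _segments(b)
    for x, y in zip(seg_a, seg_b):
        if x.isdigit() and y.isdigit():
            x, y = int(x), int(y)  # compare numeric segments as numbers
        elif x.isdigit() != y.isdigit():
            # A numeric segment is treated as newer than an alphabetic one.
            return 1 if x.isdigit() else -1
        if x != y:
            return 1 if x > y else -1
    # All shared segments are equal: the string with segments left over is newer.
    return (len(seg_a) > len(seg_b)) - (len(seg_a) < len(seg_b))


def compare_vr(a, b):
    """Compare two (version, release) pairs; the release only breaks version ties."""
    return simple_vercmp(a[0], b[0]) or simple_vercmp(a[1], b[1])
```

With work = ("0.2.999", "6.git28867bb.fc24") and home = ("0.2.999", "14.20170308git8e6e102.fc24"), compare_vr(work, home) is negative: the versions are equal and release 6 sorts before release 14, so the work machine does carry the older build.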

Comment 9 Samuel Sieb 2017-04-08 20:01:33 UTC
Looks like clutter is having trouble setting up the display, which would explain why other clutter apps are having the same problem.

Comment 10 Tom Horsley 2017-04-09 18:00:38 UTC
Created attachment 1270276 [details]
The xorg log file

Just in case driver related info turns out to be useful, here's my Xorg.0.log file.

Comment 11 Tom Horsley 2017-04-10 11:37:10 UTC
This is interesting. Today's update is offering to install these never before seen packages:

 libglvnd-egl             x86_64 1:0.2.999-14.20170308git8e6e102.fc24
                                                 updates                   43 k
 libglvnd-gles            x86_64 1:0.2.999-14.20170308git8e6e102.fc24
                                                 updates                   31 k
 libglvnd-glx             x86_64 1:0.2.999-14.20170308git8e6e102.fc24
                                                 updates                  124 k
 libglvnd-opengl          x86_64 1:0.2.999-14.20170308git8e6e102.fc24
                                                 updates                   44 k

Perhaps that will fix my problem and I just picked a bad time to update with incomplete repos?

I'll see when I get home today if things work better after an update.

Comment 12 Tom Horsley 2017-04-10 13:29:47 UTC
Might as well accumulate other relevant links in this bug:

http://forums.fedoraforum.org/showthread.php?t=313886

This fedoraforum post explains that "dnf downgrade" is specifically designed to not do anything useful and gives an example of how to actually downgrade libglvnd (and possibly fix the problems).

Comment 13 Hans de Goede 2017-04-10 14:52:51 UTC
Hi,

Are you using the rpmfusion nvidia rpms ? An update for those was pushed yesterday which fixes some bad interactions with the libglvnd update.

Regards,

Hans

Comment 14 Tom Horsley 2017-04-10 16:00:51 UTC
Yep, I'm using rpmfusion. I'll definitely try a new update when I get home and see if the new packages fix things.

Comment 15 Simone Caronni 2017-04-10 18:39:09 UTC
(In reply to Tom Horsley from comment #14)
> Yep, I'm using rpmfusion. I'll definitely try a new update when I get home
> and see if the new packages fix things.

The only consumers of libglvnd in Fedora 24 are the Nvidia drivers. Unfortunately the Nvidia change got postponed along with other things, while the libglvnd update's stay in updates-testing got shortened to a few hours.

Comment 16 Tom Horsley 2017-04-10 21:11:24 UTC
OK, I got all the new libglvnd updates installed and the new rpmfusion. I noticed that akmods did NOT rebuild the nvidia driver even though I got a new kmodsrc package from rpmfusion. I eventually found the /var/cache entries for the current kernel akmod build and removed them. That didn't work, so I did a dnf erase of the nvidia driver built by akmod for the current kernel, then I finally got "akmods --force" to rebuild things and when I rebooted, everything was actually working again.

No more error from swell-foop. No more error from gnome-control-center, etc.

I think I'm finally back to normal and maybe this is not a bug (or error "bad timing" :-).

Comment 17 Hans de Goede 2017-04-11 08:56:14 UTC
(In reply to Tom Horsley from comment #16)
> OK, I got all the new libglvnd updates installed and the new rpmfusion. I
> noticed that akmods did NOT rebuild the nvidia driver even though I got a
> new kmodsrc package from rpmfusion. I eventually found the /var/cache
> entries for the current kernel akmod build and removed them. That didn't
> work, so I did a dnf erase of the nvidia driver built by akmod for the
> current kernel, then I finally got "akmods --force" to rebuild things and
> when I rebooted, everything was actually working again.
> 
> No more error from swell-foop. No more error from gnome-control-center, etc.
> 
> I think I'm finally back to normal and maybe this is not a bug (or error
> "bad timing" :-).

Yeah this was a case of bad timing I'm afraid (between the Fedora and rpmfusion updates). Anyways fixed now and not a Fedora bug, so closing as such.

