Bug 728044 - [Arrandale] Applications die on resume with kernel 2.6.40 (i.e. 3.0.0 for F15) on ThinkPad T510 with XI_BadDevice (invalid Device parameter)
Status: CLOSED ERRATA
Product: Fedora
Classification: Fedora
Component: gtk3
Version: 15
Hardware: x86_64 Linux
Priority: unspecified
Severity: high
Assigned To: Matthias Clasen
QA Contact: Fedora Extras Quality Assurance
Whiteboard: [cat:modesetting]
Keywords: Reopened, Triaged
Reported: 2011-08-03 21:38 EDT by Bojan Smojver
Modified: 2011-12-10 14:38 EST
Fixed In Version: gtk3-3.2.2-2.fc16
Doc Type: Bug Fix
Last Closed: 2011-12-10 14:38:31 EST
Attachments
Programs die on resume in Gnome (53.65 KB, image/png)
2011-08-13 21:32 EDT, Bojan Smojver
dmesg on resume when processes die (78.16 KB, text/plain)
2011-08-18 22:58 EDT, Bojan Smojver
X session errors (8.83 KB, text/plain)
2011-09-04 20:07 EDT, Bojan Smojver
X11 log of the session where programs die on resume (kernel 2.6.40) (43.67 KB, text/plain)
2011-09-04 20:08 EDT, Bojan Smojver
X11 log of the session where programs do not die on resume (kernel 2.6.38) (75.57 KB, text/plain)
2011-09-04 20:09 EDT, Bojan Smojver
X session errors from F-16 (got completely logged out) (65.19 KB, text/plain)
2011-11-10 15:39 EST, Bojan Smojver
Kernel divide error when trying to replicate this problem in F-16 (48.34 KB, image/jpeg)
2011-11-27 18:29 EST, Bojan Smojver

Description Bojan Smojver 2011-08-03 21:38:34 EDT
Description of problem:
On suspend, kernel hangs. If iwlagn is put in SUSPEND_MODULES, suspend will succeed, but other problems are encountered on resume, such as gnome-terminals going away (probably because GConf dies in the background).

Version-Release number of selected component (if applicable):
2.6.40.1-0.fc15.x86_64
2.6.40-4.fc15.x86_64

How reproducible:
Always.

Steps to Reproduce:
1. Boot into Gnome.
2. Try to suspend.
  
Actual results:
Hangs on suspend after VT switch.

Expected results:
Did not hang with 2.6.38.8-35.fc15.x86_64.

Additional info:
Listing iwlagn in SUSPEND_MODULES helps only marginally, as explained above.

Similar problems reported here:

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/811214
https://bugzilla.kernel.org/show_bug.cgi?id=40072
http://lists.debian.org/debian-kernel/2011/07/msg00639.html
Comment 1 Bojan Smojver 2011-08-09 00:46:36 EDT
Kernel kernel-2.6.40.1-1.fc15.x86_64 from koji indeed fixes the suspend problem.

However, on resume, all gnome-terminals are gone. I still don't know why that is.
Comment 2 Bojan Smojver 2011-08-10 01:03:36 EDT
Kernel 2.6.40.1-2.fc15.x86_64 is not much better. On resume, nautilus died this time. On the second suspend/resume cycle, gnome-panel died and was automatically restarted (i.e. I can see a different PID for it).

So, something is still very wrong on suspend/resume here.
Comment 3 Hans de Goede 2011-08-13 01:58:08 EDT
Not sure if this is an oversight or deliberate, since this bug is still being worked on, but judging from the kernel spec changelog, this patch has not been added to F-16. I guess we want it there too?
Comment 4 Bojan Smojver 2011-08-13 02:28:57 EDT
(In reply to comment #3)
> Not sure if this is an oversight, or deliberate since this bug is still being
> worked on, but judging from the kernel spec changelog, this patch has not been
> added to F-16, I guess we want it there too?

I have a feeling upstream is working on a different, more complicated fix for this. So, maybe that will go into F-16 when upstream finalises it?

Anyway, as I said before, even with the fix, I'm losing processes on resume, so things are still not as good as on .38.
Comment 5 Bojan Smojver 2011-08-13 21:31:27 EDT
Just tried 2.6.40.2-0.fc15.x86_64. Still no good. It comes back from suspend and a whole lot of stuff dies. I'll attach a screenshot of what popped up, but Evo died, gnome-terminals did too and more.
Comment 6 Bojan Smojver 2011-08-13 21:32:22 EDT
Created attachment 518164 [details]
Programs die on resume in Gnome
Comment 7 Fedora Update System 2011-08-16 08:46:48 EDT
kernel-2.6.40.3-0.fc15 has been submitted as an update for Fedora 15.
https://admin.fedoraproject.org/updates/kernel-2.6.40.3-0.fc15
Comment 8 Fedora Update System 2011-08-16 21:17:24 EDT
Package kernel-2.6.40.3-0.fc15:
* should fix your issue,
* was pushed to the Fedora 15 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing kernel-2.6.40.3-0.fc15'
as soon as you are able to, then reboot.
Please go to the following url:
https://admin.fedoraproject.org/updates/kernel-2.6.40.3-0.fc15
then log in and leave karma (feedback).
Comment 9 Fedora Update System 2011-08-17 22:29:42 EDT
kernel-2.6.40.3-0.fc15 has been pushed to the Fedora 15 stable repository.  If problems still persist, please make note of it in this bug report.
Comment 10 Bojan Smojver 2011-08-17 22:41:54 EDT
Reopening, changing what the problem actually is.
Comment 11 Bojan Smojver 2011-08-17 22:43:21 EDT
Note that with kernel-2.6.40.3-0.fc15 the problem is less severe, but it still happens. I have seen gnome-panel and nautilus die on resume thus far. The widespread process death (as seen in the screenshots) is not happening any more, but there is no doubt that the problem still persists.
Comment 12 Dave Jones 2011-08-18 14:01:03 EDT
please attach a dmesg from after the resume when things have been killed.
Comment 13 Bojan Smojver 2011-08-18 22:58:52 EDT
Created attachment 518969 [details]
dmesg on resume when processes die
Comment 14 Bojan Smojver 2011-08-18 22:59:37 EDT
Note that in this particular case, I had gnome-terminal, nautilus, gnome-panel and a whole bunch of applets die.
Comment 15 Dave Jones 2011-08-24 15:16:55 EDT
I don't see anything obvious in that log to explain why things are dying.

abrtd[968]: Unrecognized variable 'DumpLocation' in '/etc/abrt/abrt.conf'

You might want to fix that up, to see if that makes abrt start catching the crashes, perhaps that'll give some clues.
Comment 16 Bojan Smojver 2011-08-24 19:17:19 EDT
(In reply to comment #15)
> I don't see anything obvious in that log to explain why things are dying.
> 
> abrtd[968]: Unrecognized variable 'DumpLocation' in '/etc/abrt/abrt.conf'
> 
> You might want to fix that up, to see if that makes abrt start catching the
> crashes, perhaps that'll give some clues.

Haven't touched this file. Didn't even know I had one there, to be honest.

BTW, abrt does work for me. I logged plenty of bugs with it. Will look a bit more.
Comment 17 Bojan Smojver 2011-08-27 04:45:20 EDT
I just tried kernel-2.6.40.3-2.fc15.x86_64 from koji. I know I'm going to regret saying this, but I haven't seen any crashes/restarts except for pidgin, which appears to be reloaded on resume. Not sure whether anything from the stable queue should make a difference here.

Will keep testing...
Comment 18 Bojan Smojver 2011-08-27 20:32:20 EDT
(In reply to comment #17)
> I know I'm going to regret saying this

Yup. Resumed after the machine has been asleep overnight and many Gnome apps are gone, including nautilus, gnome-panel etc. Back to square one.
Comment 19 Bojan Smojver 2011-08-28 21:07:52 EDT
From ~/.xsession-errors:
----------------------------------------
(gnome-terminal:1915): Gdk-WARNING **: The program 'gnome-terminal' received an X Window System error.
This probably reflects a bug in the program.
The error was 'XI_BadDevice (invalid Device parameter)'.
  (Details: serial 10608 error_code 149 request_code 141 minor_code 48)
  (Note to programmers: normally, X errors are reported asynchronously;
   that is, you will receive the error a while after causing it.
   To debug your program, run it with the --sync command line
   option to change this behavior. You can then get a meaningful
   backtrace from your debugger if you break on the gdk_x_error() function.)


(gnome-power-manager:1845): Gdk-WARNING **: The program 'gnome-power-manager' received an X Window System error.
This probably reflects a bug in the program.
The error was 'XI_BadDevice (invalid Device parameter)'.
  (Details: serial 1540 error_code 149 request_code 141 minor_code 48)
  (Note to programmers: normally, X errors are reported asynchronously;
   that is, you will receive the error a while after causing it.
   To debug your program, run it with the --sync command line
   option to change this behavior. You can then get a meaningful
   backtrace from your debugger if you break on the gdk_x_error() function.)


(gnome-panel:1781): Gdk-WARNING **: The program 'gnome-panel' received an X Window System error.
This probably reflects a bug in the program.
The error was 'XI_BadDevice (invalid Device parameter)'.
  (Details: serial 6140 error_code 149 request_code 141 minor_code 48)
  (Note to programmers: normally, X errors are reported asynchronously;
   that is, you will receive the error a while after causing it.
   To debug your program, run it with the --sync command line
   option to change this behavior. You can then get a meaningful
   backtrace from your debugger if you break on the gdk_x_error() function.)


(nautilus:1863): Gdk-WARNING **: The program 'nautilus' received an X Window System error.
This probably reflects a bug in the program.
The error was 'XI_BadDevice (invalid Device parameter)'.
  (Details: serial 1724 error_code 149 request_code 141 minor_code 48)
  (Note to programmers: normally, X errors are reported asynchronously;
   that is, you will receive the error a while after causing it.
   To debug your program, run it with the --sync command line
   option to change this behavior. You can then get a meaningful
   backtrace from your debugger if you break on the gdk_x_error() function.)

** Message: applet now removed from the notification area
** Message: applet now embedded in the notification area
----------------------------------------

This _only_ happens with 3.0.x (i.e. 2.6.40.x) kernels. Looks like other folks are seeing this too, in bug #720792.
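
To see at a glance which programs were hit and whether they all failed on the same X request, an excerpt like the one above can be summarised with a short script. This is only a sketch: the regular expressions assume the exact Gdk-WARNING layout shown above, and `parse_x_errors` and `summarise_x_errors` are hypothetical helper names, not part of any existing tool.

```python
import os
import re
from collections import Counter

# First line of a GDK X error report, e.g.
# "(nautilus:1863): Gdk-WARNING **: The program 'nautilus' received an X Window System error."
HEADER_RE = re.compile(
    r"\((?P<prog>[^:()]+):(?P<pid>\d+)\): Gdk-WARNING \*\*: "
    r"The program '[^']+' received an X Window System error\."
)
ERROR_RE = re.compile(r"The error was '(?P<error>[^']+)'\.")
DETAILS_RE = re.compile(
    r"serial (?P<serial>\d+) error_code (?P<error_code>\d+) "
    r"request_code (?P<request_code>\d+) minor_code (?P<minor_code>\d+)"
)


def parse_x_errors(text):
    """Return one dict per GDK X error report found in text."""
    records = []
    current = None
    for line in text.splitlines():
        m = HEADER_RE.search(line)
        if m:
            current = {"prog": m["prog"], "pid": int(m["pid"])}
            records.append(current)
            continue
        if current is None:
            continue
        m = ERROR_RE.search(line)
        if m:
            current["error"] = m["error"]
            continue
        m = DETAILS_RE.search(line)
        if m:
            current.update({k: int(v) for k, v in m.groupdict().items()})
            current = None  # the Details line ends the part we care about
    return records


def summarise_x_errors(records):
    """Count reports by (error text, request_code, minor_code)."""
    return Counter(
        (r.get("error"), r.get("request_code"), r.get("minor_code"))
        for r in records
    )


if __name__ == "__main__":
    # Assumed log location, taken from the comment above.
    path = os.path.expanduser("~/.xsession-errors")
    with open(path, errors="replace") as fh:
        recs = parse_x_errors(fh.read())
    for key, count in summarise_x_errors(recs).most_common():
        print(count, key)
    print(sorted({r["prog"] for r in recs}))
```

If every report collapses to the same (error, request_code, minor_code) triple, as it does in the excerpt above, that suggests all the clients are failing on the same kind of request rather than on unrelated bugs.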
Comment 20 Bojan Smojver 2011-08-28 22:06:09 EDT
Maybe we need the latest Intel graphics drivers (http://intellinuxgraphics.org/2011Q3.html) with kernel 3.0? If there is a build for F-15 floating around, I wouldn't mind trying it.
Comment 21 Dave Jones 2011-08-29 13:35:16 EDT
I'm going to reassign this to X to see if those guys have any insight.
It may still be a kernel change that causes the problem, but X should be able to handle devices coming and going without apps caring.
Comment 22 Bojan Smojver 2011-08-30 21:24:12 EDT
X folks,

Any ideas here? Look from comment #19 onwards.
Comment 23 Matěj Cepl 2011-09-01 20:24:35 EDT
Well, I would guess more data could help.

Please attach

* your X server config file (/etc/X11/xorg.conf, if available), and
* X server log file (/var/log/Xorg.*.log*; check with grep Backtrace /var/log/Xorg* which logs might be the most interesting ones, send us at least Xorg.0.log)

to the bug report as individual uncompressed file attachments using the bugzilla file attachment link above.

We will review this issue again once you've had a chance to attach this information.

Thanks in advance.
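
The `grep Backtrace /var/log/Xorg*` check suggested above can also be scripted, which makes it easier to rank the logs by how many backtraces each contains. This is a rough sketch rather than an official tool: `count_backtraces` is a hypothetical helper, and the glob pattern simply mirrors the log path given in this comment.

```python
import glob


def count_backtraces(paths, needle="Backtrace"):
    """Map each readable log path to the number of lines containing needle."""
    counts = {}
    for path in paths:
        try:
            with open(path, errors="replace") as fh:
                counts[path] = sum(1 for line in fh if needle in line)
        except OSError:
            # Unreadable or vanished logs are skipped rather than aborting.
            continue
    return counts


if __name__ == "__main__":
    hits = count_backtraces(sorted(glob.glob("/var/log/Xorg.*.log*")))
    # Logs with the most matches first; a count of 0 means grep would print nothing.
    for path, n in sorted(hits.items(), key=lambda kv: -kv[1]):
        print(f"{n:4d}  {path}")
```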
Comment 24 Bojan Smojver 2011-09-04 20:06:45 EDT
That grep doesn't match anything in any of the files, so I'm just going to attach Xorg.0.log and Xorg.0.log.old (this is with the .38 kernel), for comparison. I will also attach the latest .xsession-errors file, for reference.

It took 3 suspend/resume cycles this time to trigger this. Kernel was kernel-2.6.40.4-5.fc15.x86_64.
Comment 25 Bojan Smojver 2011-09-04 20:07:46 EDT
Created attachment 521414 [details]
X session errors
Comment 26 Bojan Smojver 2011-09-04 20:08:47 EDT
Created attachment 521415 [details]
X11 log of the session where programs die on resume (kernel 2.6.40)
Comment 27 Bojan Smojver 2011-09-04 20:09:22 EDT
Created attachment 521416 [details]
X11 log of the session where programs do not die on resume (kernel 2.6.38)
Comment 28 Bojan Smojver 2011-09-04 20:12:08 EDT
Oh, and I don't have /etc/X11/xorg.conf file. It's all configured automatically.
Comment 29 Bojan Smojver 2011-09-05 06:57:58 EDT
(In reply to comment #15)
> abrtd[968]: Unrecognized variable 'DumpLocation' in '/etc/abrt/abrt.conf'
> 
> You might want to fix that up, to see if that makes abrt start catching the
> crashes, perhaps that'll give some clues.

FYI - it's a bug #715456.
Comment 30 Bojan Smojver 2011-09-07 03:58:04 EDT
And now that I have new abrt, the darn thing won't crash on me. Will keep trying.
Comment 31 Bojan Smojver 2011-09-07 18:26:09 EDT
(In reply to comment #30)
> And now that I have new abrt, the darn thing won't crash on me. Will keep
> trying.

Just happened again. As usual, I got:
-----------------------------
Window manager warning: Log level 8: meta_window_focus: assertion `!window->override_redirect' failed

(nm-applet:1866): Gdk-WARNING **: The program 'nm-applet' received an X Window System error.
This probably reflects a bug in the program.
The error was 'XI_BadDevice (invalid Device parameter)'.
  (Details: serial 53493 error_code 149 request_code 141 minor_code 48)
  (Note to programmers: normally, X errors are reported asynchronously;
   that is, you will receive the error a while after causing it.
   To debug your program, run it with the --sync command line
   option to change this behavior. You can then get a meaningful
   backtrace from your debugger if you break on the gdk_x_error() function.)


(nautilus:1932): Gdk-WARNING **: The program 'nautilus' received an X Window System error.
This probably reflects a bug in the program.
The error was 'XI_BadDevice (invalid Device parameter)'.
  (Details: serial 3083 error_code 149 request_code 141 minor_code 48)
  (Note to programmers: normally, X errors are reported asynchronously;
   that is, you will receive the error a while after causing it.
   To debug your program, run it with the --sync command line
   option to change this behavior. You can then get a meaningful
   backtrace from your debugger if you break on the gdk_x_error() function.)

** (notification-daemon:1918): DEBUG: Adding id 3
** (notification-daemon:1918): DEBUG: Bubble destroyed
** (notification-daemon:1918): DEBUG: No queued notifications
-----------------------------

Nothing significant in the Xorg.0.log. Nothing significant in /var/log/messages. New abrt didn't pick up the crashes at all.

What do I do next?
Comment 32 Bojan Smojver 2011-09-07 19:35:47 EDT
One more piece of info here. I tried stracing the apps that crash and I even tried debugging them with GDB on other VTs. Found nothing from strace and they wouldn't crash when being debugged with GDB.
Comment 33 Bojan Smojver 2011-09-09 00:28:13 EDT
Some more FYI: I switched to xorg-x11-drv-intel-2.16.0-2.fc15.x86_64, which I built locally from the F-16 RPM. Also running xorg-x11-server-Xorg-1.10.4-1.fc15.x86_64. We'll see how that combo goes.
Comment 34 Bojan Smojver 2011-09-09 09:54:48 EDT
(In reply to comment #33)
> Some more FYI: I switched to xorg-x11-drv-intel-2.16.0-2.fc15.x86_64, which I
> built locally from the F-16 RPM. Also running
> xorg-x11-server-Xorg-1.10.4-1.fc15.x86_64. We'll see how that combo goes.

Didn't help.
Comment 35 Sergio Monteiro Basto 2011-09-09 11:54:45 EDT
I also have an Arrandale:
[    91.820] (II) intel(0): Integrated Graphics Chipset: Intel(R) Arrandale
[    91.820] (--) intel(0): Chipset: "Arrandale"

I haven't had any of these problems with an updated F15 (in KDE).
Comment 36 Bojan Smojver 2011-09-09 19:11:01 EDT
(In reply to comment #35)
> I also have an Arrandale:
> [    91.820] (II) intel(0): Integrated Graphics Chipset: Intel(R) Arrandale
> [    91.820] (--) intel(0): Chipset: "Arrandale"
> 
> I haven't had any of these problems with an updated F15 (in KDE).

Then this must be a Gnome specific problem. Just resumed and lost gnome-terminal this time:

(gnome-terminal:27963): Gdk-WARNING **: The program 'gnome-terminal' received an X Window System error.
This probably reflects a bug in the program.
The error was 'XI_BadDevice (invalid Device parameter)'.
  (Details: serial 42085 error_code 149 request_code 141 minor_code 48)
  (Note to programmers: normally, X errors are reported asynchronously;
   that is, you will receive the error a while after causing it.
   To debug your program, run it with the --sync command line
   option to change this behavior. You can then get a meaningful
   backtrace from your debugger if you break on the gdk_x_error() function.)
Comment 37 Bojan Smojver 2011-09-09 19:13:21 EDT
Assigning back to xorg-x11-server. Maybe folks will have some more pointers for me.
Comment 38 Bojan Smojver 2011-09-09 19:24:37 EDT
It just occurred to me that I run an unusual bit of configuration: Gnome fallback mode with Mutter as a window manager (which obviously was not a problem before 2.6.40 kernel).

Could it be that I'm hitting some bugs in Mutter that otherwise never show (i.e. because it's normally run under gnome-shell)?
Comment 39 Bojan Smojver 2011-09-09 19:31:11 EDT
I'll bet this fixed it:

http://pkgs.fedoraproject.org/gitweb/?p=libXi.git;a=commitdiff;h=e9e6be20f0f57a75c8a70eb52b40a005af1afac9

Rebooting in a few...
Comment 40 Bojan Smojver 2011-09-11 00:50:06 EDT
(In reply to comment #35)

> I haven't had any of these problems with an updated F15 (in KDE).

Or, maybe you just have different input devices.
Comment 41 Bojan Smojver 2011-09-11 00:58:37 EDT
In the meantime, I had an X crash on resume (got gdm login screen again). Not sure whether it's related to this in any way. Nothing useful in the logs, abrt didn't pick anything up either.
Comment 42 Bojan Smojver 2011-09-13 04:02:48 EDT
(In reply to comment #39)
> I'll bet this fixed it:
> 
> http://pkgs.fedoraproject.org/gitweb/?p=libXi.git;a=commitdiff;h=e9e6be20f0f57a75c8a70eb52b40a005af1afac9
> 
> Rebooting in a few...

Nope:
------------------------------
(notification-daemon:1890): Gdk-WARNING **: The program 'notification-daemon' received an X Window System error.
This probably reflects a bug in the program.
The error was 'XI_BadDevice (invalid Device parameter)'.
  (Details: serial 11024 error_code 149 request_code 141 minor_code 48)
  (Note to programmers: normally, X errors are reported asynchronously;
   that is, you will receive the error a while after causing it.
   To debug your program, run it with the --sync command line
   option to change this behavior. You can then get a meaningful
   backtrace from your debugger if you break on the gdk_x_error() function.)


(gnome-panel:1825): Gdk-WARNING **: The program 'gnome-panel' received an X Window System error.
This probably reflects a bug in the program.
The error was 'XI_BadDevice (invalid Device parameter)'.
  (Details: serial 45770 error_code 149 request_code 141 minor_code 48)
  (Note to programmers: normally, X errors are reported asynchronously;
   that is, you will receive the error a while after causing it.
   To debug your program, run it with the --sync command line
   option to change this behavior. You can then get a meaningful
   backtrace from your debugger if you break on the gdk_x_error() function.)


(gnome-screensaver:1892): Gdk-WARNING **: The program 'gnome-screensaver' received an X Window System error.
This probably reflects a bug in the program.
The error was 'XI_BadDevice (invalid Device parameter)'.
  (Details: serial 11026 error_code 149 request_code 141 minor_code 48)
  (Note to programmers: normally, X errors are reported asynchronously;
   that is, you will receive the error a while after causing it.
   To debug your program, run it with the --sync command line
   option to change this behavior. You can then get a meaningful
   backtrace from your debugger if you break on the gdk_x_error() function.)
------------------------------

Back to square one.
Comment 43 Bojan Smojver 2011-09-19 22:41:32 EDT
No progress, so let's see what libXi people think of this.
Comment 44 Bojan Smojver 2011-09-25 20:05:35 EDT
Does anyone have any idea how to debug and eventually fix this?
Comment 45 Bojan Smojver 2011-09-28 18:47:21 EDT
This must be something in one of the libraries. For the first time I had Evo die as well:
--------------------------
(evolution:1995): Gdk-WARNING **: The program 'evolution' received an X Window System error.
This probably reflects a bug in the program.
The error was 'XI_BadDevice (invalid Device parameter)'.
  (Details: serial 236980 error_code 149 request_code 141 minor_code 48)
  (Note to programmers: normally, X errors are reported asynchronously;
   that is, you will receive the error a while after causing it.
   To debug your program, run it with the --sync command line
   option to change this behavior. You can then get a meaningful
   backtrace from your debugger if you break on the gdk_x_error() function.)
--------------------------
Comment 46 Bojan Smojver 2011-09-29 19:29:38 EDT
And again, X crash on resume. Nothing worth reporting in the log. Abrt didn't catch anything.
Comment 47 Bojan Smojver 2011-10-05 17:40:05 EDT
Ditto with 2.6.40.6-0.fc15.x86_64.
Comment 48 Bojan Smojver 2011-10-05 23:02:00 EDT
Not sure whether this is a fluke, but I cannot replicate this with F-16 Beta Live on the same hardware. Suspended/resumed many times. Hmm...
Comment 49 Bojan Smojver 2011-10-10 17:23:59 EDT
(In reply to comment #46)
> And again, X crash on resume. Nothing worth reporting in the log. Abrt didn't
> catch anything.

Another crash of X on resume, this time with no Gnome sessions opened (i.e. at GDM login screen).
Comment 50 Fedora Admin XMLRPC Client 2011-10-11 19:10:40 EDT
This package has changed ownership in the Fedora Package Database.  Reassigning to the new owner of this component.
Comment 51 Peter Hutterer 2011-10-12 00:48:03 EDT
I'm just going to reassign this to gtk for now. XI_BadDevice is the standard error clients will get when they use an invalid device parameter for an X Input extension request. Even if there's a bug in the server or libXi, since devices can disappear at any time, any client must expect such an error to happen and handle it appropriately.
Comment 52 Matthias Clasen 2011-10-12 11:38:10 EDT
(In reply to comment #45)
> This must be something in one of the libraries. For the first time I had Evo
> die as well:
> --------------------------
> (evolution:1995): Gdk-WARNING **: The program 'evolution' received an X Window
> System error.
> This probably reflects a bug in the program.
> The error was 'XI_BadDevice (invalid Device parameter)'.
>   (Details: serial 236980 error_code 149 request_code 141 minor_code 48)
>   (Note to programmers: normally, X errors are reported asynchronously;
>    that is, you will receive the error a while after causing it.
>    To debug your program, run it with the --sync command line
>    option to change this behavior. You can then get a meaningful
>    backtrace from your debugger if you break on the gdk_x_error() function.)
> --------------------------


Can you do what GTK+ is asking you to do in those messages you posted?
Comment 53 Bojan Smojver 2011-10-12 18:11:39 EDT
(In reply to comment #52)

> Can you do what GTK+ is asking you to do in those messages you posted?

I actually tried this (although some programs that are crashing were not aware of --sync option), but was unable to replicate the problem. I'll try again.
Comment 54 Bojan Smojver 2011-10-13 03:00:08 EDT
nautilus: doesn't understand --sync
gnome-terminal: doesn't understand --sync
gnome-panel: doesn't understand --sync

These crash most often. Any other ideas?
Comment 55 Matthias Clasen 2011-10-13 13:12:30 EDT
Oh bummer. Looks like that error message is outdated in gdk.

You can set the GDK_SYNCHRONIZE env var to achieve the same, nowadays.
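
As a rough illustration of that advice, the sketch below starts a program under gdb with `GDK_SYNCHRONIZE=1` in its environment and a breakpoint on `gdk_x_error`, the function named in the GDK warning. The helper name `gdb_sync_command` is made up for this example. The sketch also assumes that gdb is installed and that debug symbols for gtk3 are available, since without them gdb may not be able to resolve the breakpoint.

```python
import os
import subprocess


def gdb_sync_command(program, *args):
    """Build the gdb argv and environment for a synchronous GDK debug run."""
    # Copy the current environment and force synchronous X requests in GDK.
    env = dict(os.environ, GDK_SYNCHRONIZE="1")
    argv = [
        "gdb",
        # The symbol lives in a shared library that is not loaded yet,
        # so let gdb keep the breakpoint pending instead of asking.
        "-ex", "set breakpoint pending on",
        "-ex", "break gdk_x_error",
        "-ex", "run",
        "--args", program, *args,
    ]
    return argv, env


if __name__ == "__main__":
    # gnome-terminal is just one of the programs reported as dying in this bug.
    argv, env = gdb_sync_command("gnome-terminal")
    subprocess.call(argv, env=env)
```

Once the breakpoint is hit, typing `bt` at the gdb prompt prints the stack that issued the failing request.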
Comment 56 Bojan Smojver 2011-10-16 18:07:59 EDT
Running things like that now. I set a breakpoint on gdk_x_error() in the gdb instances that are tracing several binaries that usually crash. Of course, now they won't crash - isn't it always like this? :-(

Will keep running my system like this in the hope that they do eventually crash. Keep you posted.
Comment 57 Luca Villa 2011-10-17 05:09:02 EDT
I think I'm hitting this bug on my ThinkPad T510 with F15 and GNOME. It happens frequently after resuming from suspend: sometimes the whole Xorg session crashes, and other times only some programs get terminated.

Excerpt from my .xsession-errors:

(gnome-power-manager:12475): Gdk-WARNING **: The program 'gnome-power-manager' received an X Window System error.
This probably reflects a bug in the program.
The error was 'XI_BadDevice (invalid Device parameter)'.
  (Details: serial 24339 error_code 149 request_code 141 minor_code 48)
  (Note to programmers: normally, X errors are reported asynchronously;
   that is, you will receive the error a while after causing it.
   To debug your program, run it with the --sync command line
   option to change this behavior. You can then get a meaningful
   backtrace from your debugger if you break on the gdk_x_error() function.)


(gnome-screensaver:12420): Gdk-WARNING **: The program 'gnome-screensaver' received an X Window System error.
This probably reflects a bug in the program.
The error was 'XI_BadDevice (invalid Device parameter)'.
  (Details: serial 25521 error_code 149 request_code 141 minor_code 48)
  (Note to programmers: normally, X errors are reported asynchronously;
   that is, you will receive the error a while after causing it.
   To debug your program, run it with the --sync command line
   option to change this behavior. You can then get a meaningful
   backtrace from your debugger if you break on the gdk_x_error() function.)

gnome-session[12202]: WARNING: Detected that screensaver has left the bus

(gnome-settings-daemon:12366): media-keys-plugin-WARNING **: Unable to get default sink
** (deja-dup-monitor:12421): DEBUG: monitor.vala:263: Invalid next run date.  Not scheduling a backup.
Error getting primary device: GDBus.Error:org.gnome.PowerManager.Failed: There is no primary device to reflect system state (don't show any UI)
gnome-session[12202]: WARNING: Detected that screensaver has left the bus
gnome-session[12202]: WARNING: Detected that screensaver has left the bus
gnome-session[12202]: WARNING: Detected that screensaver has left the bus

Please let me know if there is anything I can do to help identifying the root cause of the issue.
Comment 58 Bojan Smojver 2011-10-18 17:53:40 EDT
I was tracing notification area applet in gdb with that environment variable set to 1, but got nothing - the program did not break on gdk_x_error() function at all - just existed. No stack.

No idea what to do next.
Comment 59 Bojan Smojver 2011-10-18 17:54:10 EDT
I meant to say, just exited, not just existed.
Comment 60 Dag Wieers 2011-10-26 07:17:22 EDT
Hans de Goede forwarded me to this problem, I am not sure if this is related.

On RHEL 6.2 Beta I have had a few incidents that after a resume I immediately get a complaint of Gnome that was XKB related. The window reappears immediately after close for about 15 times.

Meanwhile, any Gnome application (e.g. LibreOffice) complains about being unable to write. Inside a terminal window, however, there are no I/O-related issues: all file systems are mounted r/w and I can still perform writes to disk. So it's as if the gnome-vfs back-end died or something?

If there's anything I can do to increase verbosity in order to analyse this better next time, let me know. If this deserves another bug report, I'll do that as well.

PS Looking at .xsession-errors.old I find (among others) the below, but I am not entirely sure that this was a direct result from the problem (ie. it might have been due to a shutdown). The fact that this file has no timestamps makes it hard to relate it with incidents after the fact:

----

 (polkit-gnome-authentication-agent-1:2627): polkit-gnome-1-WARNING **: Error enumerating temporary authorizations: Remote Exception invoking org.freedesktop.PolicyKit1.Authority.EnumerateTemporaryAuthorizations() on /org/freedesktop/PolicyKit1/Authority at name org.freedesktop.PolicyKit1: org.freedesktop.PolicyKit1.Error.Failed: Cannot determine session the caller is in
 gnome-settings-daemon: Fatal IO error 11 (Resource temporarily unavailable) on X server :0.0.
 gpk-update-icon: Fatal IO error 11 (Resource temporarily unavailable) on X server :0.0.
 gnome-volume-control-applet: Fatal IO error 11 (Resource temporarily unavailable) on X server :0.0.
 gdu-notification-daemon: Fatal IO error 11 (Resource temporarily unavailable) on X server :0.0.
 local:  fatal IO error 11 (Resource temporarily unavailable) or KillClient on X server ":0.0"
 polkit-gnome-authentication-agent-1: Fatal IO error 11 (Resource temporarily unavailable) on X server :0.0.
 bluetooth-applet: Fatal IO error 11 (Resource temporarily unavailable) on X server :0.0.
 seapplet: Fatal IO error 11 (Resource temporarily unavailable) on X server :0.0.
 applet.py: Fatal IO error 11 (Resource temporarily unavailable) on X server :0.0.
 abrt-applet: Fatal IO error 11 (Resource temporarily unavailable) on X server :0.0.
 gnome-screensaver: Fatal IO error 11 (Resource temporarily unavailable) on X server :0.0.
 nm-applet: Fatal IO error 11 (Resource temporarily unavailable) on X server :0.0.
 Stopping Bluetooth ObexFTP server failed: Did not receive a reply. Possible causes include: the remote application did not send a reply, the  message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken.
----
Comment 61 Hans de Goede 2011-10-26 07:31:37 EDT
While trying to find this bug for Dag, I did a google search and I also found this archlinux bug, which almost certainly is the same issue, and which *may* contain relevant info:

https://bugs.archlinux.org/task/24096
Comment 62 Bojan Smojver 2011-10-27 00:40:35 EDT
(In reply to comment #61)
> While trying to find this bug for Dag, I did a google search and I also found
> this archlinux bug, which almost certainly is the same issue, and which *may*
> contain relevant info:
> 
> https://bugs.archlinux.org/task/24096

Yeah, that looks like the same thing.

Dag's RHEL 6.2 problem, don't think so.
Comment 63 Bojan Smojver 2011-11-02 00:29:55 EDT
I upgraded my laptop to F-16 now. Didn't have this problem on resume yet, but it might be just a fluke.

PS. I am running metacity again, because of bug #750476. As we've seen from other people's reports here, it shouldn't matter, but just for the record.
Comment 64 Ernesto Revilla 2011-11-03 10:09:42 EDT
Hi.

I had a similar issue:
https://bugs.launchpad.net/ubuntu/+source/gnome-session/+bug/882956

(see https://bugs.launchpad.net/ubuntu/+source/gnome-session/+bug/882956/comments/4)

I could resolve it by unloading kernel modules before suspend. That removes X input devices. After wakeup, I just load modules again.

Regards.
Comment 65 Bojan Smojver 2011-11-04 20:01:53 EDT
(In reply to comment #64)
> Hi.
> 
> I had a similar issue:
> https://bugs.launchpad.net/ubuntu/+source/gnome-session/+bug/882956
> 
> (see
> https://bugs.launchpad.net/ubuntu/+source/gnome-session/+bug/882956/comments/4)
> 
> I could resolve it by unloading kernel modules before suspend. That removes X
> input devices. After wakeup, I just load modules again.

Thanks for the tips. These sound like workarounds more than solutions to me, to be honest.

Anyhow, I'm now on Fedora 16 and I haven't had this happen once. So, maybe there is hope. :-)
Comment 66 Bojan Smojver 2011-11-09 15:47:48 EST
(In reply to comment #65)
 
> Anyhow, I'm now on Fedora 16 and I haven't had this happen once. So, maybe
> there is hope. :-)

Still haven't seen this in F-16. I know I'm going to regret saying this, but it looks like it's been fixed there.
Comment 67 Bojan Smojver 2011-11-10 15:38:50 EST
(In reply to comment #66)
> I know I'm going to regret saying this, but it
> looks like it's been fixed there.

I knew I'd eat my words here. Yeah, it happened: essentially everything went down and I got logged off. I'll attach .xsession-errors from F-16.
Comment 68 Bojan Smojver 2011-11-10 15:39:42 EST
Created attachment 532925 [details]
X session errors from F-16 (got completely logged out)
Comment 69 Matthias Clasen 2011-11-10 22:39:49 EST
Unfortunately, all those xsession-errors, and X logs are of no use here. I really need a stack trace from a crashing client, to show which X request is triggering the BadDevice.
Comment 70 Bojan Smojver 2011-11-10 22:48:27 EST
(In reply to comment #69)
> Unfortunately, all those xsession-errors, and X logs are of no use here. I
> really need a stack trace from a crashing client, to show which X request is
> triggering the BadDevice.

As you can see from comment #58 and above, I tried. Got absolutely nothing in gdb. No idea what to do next.
Comment 71 Bojan Smojver 2011-11-23 18:01:48 EST
Just FYI, new Intel X graphics drivers didn't make a difference here.
Comment 72 Ernesto Revilla 2011-11-23 18:18:32 EST
No, it's an input device. Does the computer have a fingerprint reader?
Comment 73 Bojan Smojver 2011-11-23 18:37:23 EST
(In reply to comment #72)
> No, it's an input device. Does the computer have a fingerprint reader?

It does.
Comment 74 Bojan Smojver 2011-11-27 18:29:35 EST
Created attachment 537274 [details]
Kernel divide error when trying to replicate this problem in F-16

This happened probably after about 50 suspend/resume cycles. The box hung. 3.1.2-1.fc16.x86_64. Just FYI.
Comment 75 Samuel Sieb 2011-11-29 00:07:32 EST
I get this regularly with F16.  It may be aggravated by the fact that I'm suspending it while docked and resuming without.  The laptop has a fingerprint reader, but it isn't supported, so that's most likely not relevant.
Comment 76 Bojan Smojver 2011-11-29 01:33:56 EST
(In reply to comment #74)
> Created attachment 537274 [details]
> Kernel divide error when trying to replicate this problem in F-16
> 
> This happened probably after about 50 suspend/resume cycles. The box hung.
> 3.1.2-1.fc16.x86_64. Just FYI.

And again. Looks like newer kernels are even more rotten on this hardware.
Comment 77 Matthias Clasen 2011-11-29 10:08:17 EST
The only thing that will help fix this bug is an actual stacktrace...
Comment 78 Samuel Sieb 2011-11-29 12:38:48 EST
I have no idea what to get a stacktrace on.  The last time this happened there was no indication in any of the logs to suggest what happened.
Comment 79 Bojan Smojver 2011-11-29 16:28:59 EST
(In reply to comment #77)
> The only thing that will help fix this bug is an actual stacktrace...

Been running nautilus, gnome-terminal and gnome-panel like this for a while, under gdb, with GDK_SYNCHRONIZE=1. It either doesn't happen (did suspend/resume in the vicinity of 300 times like that) or, when it does, there is no usable trace.

Will keep trying...
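
For anyone else trying to capture the trace, here is a minimal sketch of such a session. GDK_SYNCHRONIZE and the gdk_x_error breakpoint are what GTK+ itself suggests in its X error message; the helper name write_gdb_script and the script path are made up for illustration, and the breakpoint will likely only resolve with the gtk3 debuginfo package installed.

```shell
#!/bin/sh
# Writes a gdb command file that stops where GDK reports an X error
# and prints a backtrace. write_gdb_script is a hypothetical helper name.
write_gdb_script() {
    cat > "$1" <<'EOF'
set breakpoint pending on
break gdk_x_error
run
bt full
EOF
}

SCRIPT="${TMPDIR:-/tmp}/gdk-x-error.gdb"
write_gdb_script "$SCRIPT"

# With GDK_SYNCHRONIZE set, X requests are made synchronously, so the
# backtrace should point at the request that triggered the error.
# The command is only printed here; nautilus is just an example client.
echo "GDK_SYNCHRONIZE=1 gdb -x $SCRIPT --args nautilus"
```

The script prints the command rather than starting the client, so it can be adapted for gnome-terminal or gnome-panel before being run by hand.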
Comment 80 Bojan Smojver 2011-11-29 16:49:23 EST
(In reply to comment #79)

> Will keep trying...

For instance, I just switched user a few times, which crashed my X session. Got no usable traces from that.
Comment 81 Samuel Sieb 2011-11-29 23:55:34 EST
Upstream bug for the chipset I have:
https://bugs.freedesktop.org/show_bug.cgi?id=40625
Comment 82 Fedora Update System 2011-11-30 00:41:54 EST
gtk3-3.2.2-2.fc16 has been submitted as an update for Fedora 16.
https://admin.fedoraproject.org/updates/gtk3-3.2.2-2.fc16
Comment 83 Fedora Update System 2011-12-02 16:27:57 EST
Package gtk3-3.2.2-2.fc16:
* should fix your issue,
* was pushed to the Fedora 16 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing gtk3-3.2.2-2.fc16'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2011-16634/gtk3-3.2.2-2.fc16
then log in and leave karma (feedback).
Comment 84 Bojan Smojver 2011-12-03 03:07:51 EST
Thank you for pushing this update. Of course, it will take a while to verify that the problem is fixed. Will keep you posted.
Comment 85 Fedora Update System 2011-12-10 14:38:31 EST
gtk3-3.2.2-2.fc16 has been pushed to the Fedora 16 stable repository.  If problems still persist, please make note of it in this bug report.
