Bug 394071 - X hangs in SIGALRM/rt_sigreturn loop
Status: CLOSED CANTFIX
Product: Fedora
Classification: Fedora
Component: xorg-x11-drv-nv
Version: 10
Hardware: x86_64 Linux
Priority: low
Severity: low
Assigned To: Adam Jackson
Reported: 2007-11-21 07:34 EST by Paul Howarth
Modified: 2013-12-08 09:01 EST (History)
Doc Type: Bug Fix
Last Closed: 2009-11-18 07:58:35 EST


Attachments
/etc/X11/xorg.conf for affected machine (787 bytes, text/plain)
2007-11-21 07:34 EST, Paul Howarth
Xorg.0.log (61.64 KB, text/plain)
2007-11-21 08:35 EST, Paul Howarth
Xorg.0.log from run with no xorg.conf (50.75 KB, text/plain)
2007-11-21 19:27 EST, Paul Howarth
Xorg.0.log from work machine (52.83 KB, text/plain)
2008-04-08 08:36 EDT, Paul Howarth

Description Paul Howarth 2007-11-21 07:34:47 EST
This is a problem I used to see on F7 but happens much more frequently since
doing a fresh install of F8 yesterday (using a browser for a few minutes is
generally enough to trigger it).

The symptom is that X locks up, consuming 100% CPU, e.g.

top - 23:43:37 up 13:36,  9 users,  load average: 0.93, 0.78, 0.56
Tasks: 239 total,   2 running, 237 sleeping,   0 stopped,   0 zombie
Cpu(s):  5.0%us,  1.5%sy,  0.4%ni, 91.9%id,  1.1%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   2061728k total,  2004608k used,    57120k free,     6032k buffers
Swap:  4194296k total,      144k used,  4194152k free,  1209692k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND            
25746 root      20   0  371m  29m 8868 R  100  1.4   7:24.22 X                  
    1 root      20   0 10308  656  556 S    0  0.0   0:02.38 init               
    2 root      15  -5     0    0    0 S    0  0.0   0:00.00 kthreadd           
    3 root      RT  -5     0    0    0 S    0  0.0   0:00.28 migration/0        
    4 root      15  -5     0    0    0 S    0  0.0   0:00.13 ksoftirqd/0        
    5 root      RT  -5     0    0    0 S    0  0.0   0:00.00 watchdog/0         
    6 root      RT  -5     0    0    0 S    0  0.0   0:00.28 migration/1        
    7 root      15  -5     0    0    0 S    0  0.0   0:00.11 ksoftirqd/1        
    8 root      RT  -5     0    0    0 S    0  0.0   0:00.00 watchdog/1         
    9 root      15  -5     0    0    0 S    0  0.0   0:00.15 events/0           
   10 root      15  -5     0    0    0 S    0  0.0   0:00.15 events/1           
   11 root      15  -5     0    0    0 S    0  0.0   0:00.00 khelper            
   54 root      15  -5     0    0    0 S    0  0.0   0:00.04 kblockd/0          
   55 root      15  -5     0    0    0 S    0  0.0   0:00.01 kblockd/1          
   58 root      15  -5     0    0    0 S    0  0.0   0:00.00 kacpid             
   59 root      15  -5     0    0    0 S    0  0.0   0:00.00 kacpi_notify       
...


I have no problem logging in remotely but the local machine is unusable until
"kill -9" is applied to the X server.

Attaching strace to the X process shows a never-ending sequence of
SIGALRM/rt_sigreturn:

Process 25756 attached - interrupt to quit
--- SIGALRM (Alarm clock) @ 0 (0) ---
rt_sigreturn(0xe)                       = 46912535113796
--- SIGALRM (Alarm clock) @ 0 (0) ---
rt_sigreturn(0xe)                       = 46912535113796
--- SIGALRM (Alarm clock) @ 0 (0) ---
rt_sigreturn(0xe)                       = 46912535113796
--- SIGALRM (Alarm clock) @ 0 (0) ---
rt_sigreturn(0xe)                       = 46912535113796
--- SIGALRM (Alarm clock) @ 0 (0) ---
rt_sigreturn(0xe)                       = 46912535113796
--- SIGALRM (Alarm clock) @ 0 (0) ---
rt_sigreturn(0xe)                       = 46912535113796
--- SIGALRM (Alarm clock) @ 0 (0) ---
Process 25756 detached
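
For anyone who wants to check a longer capture for the same pattern, here is a minimal sketch in Python. It assumes the strace output has been saved as text in the format shown above; the function names `summarize_strace` and `looks_like_sigalrm_loop` are made up for illustration and are not part of strace or X.

```python
import re

# A signal delivery line looks like: "--- SIGALRM (Alarm clock) @ 0 (0) ---"
SIGNAL_RE = re.compile(r"^\s*--- (SIG[A-Z0-9]+) ")
# A syscall line starts with the syscall name followed by "(".
SYSCALL_RE = re.compile(r"^\s*([a-z_0-9]+)\(")


def summarize_strace(text):
    """Count signal deliveries and syscalls by name in saved strace output."""
    signals, syscalls = {}, {}
    for line in text.splitlines():
        m = SIGNAL_RE.match(line)
        if m:
            signals[m.group(1)] = signals.get(m.group(1), 0) + 1
            continue
        m = SYSCALL_RE.match(line)
        if m:
            syscalls[m.group(1)] = syscalls.get(m.group(1), 0) + 1
    return signals, syscalls


def looks_like_sigalrm_loop(text, min_signals=3):
    """True if the trace holds only SIGALRM deliveries and signal returns."""
    signals, syscalls = summarize_strace(text)
    other_syscalls = set(syscalls) - {"rt_sigreturn", "sigreturn"}
    return (
        set(signals) == {"SIGALRM"}
        and signals["SIGALRM"] >= min_signals
        and not other_syscalls
    )
```

A trace that contains any other syscall (for example a `select`) is reported as not matching, which is the point: during the lockup the server does not appear to make any ordinary syscalls at all.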

I'm using:
xorg-x11-drv-nv-2.1.5-2.fc8
xorg-x11-drivers-7.2-9.fc8
xorg-x11-server-Xorg-1.3.0.0-33.fc8

The machine's smolt profile is:
http://smolt.fedoraproject.org/show?UUID=1eb81574-1847-4692-b440-fddafb2b875d

Please let me know if there is any further information I can provide to help
track down this problem.
Comment 1 Paul Howarth 2007-11-21 07:34:47 EST
Created attachment 265951 [details]
/etc/X11/xorg.conf for affected machine
Comment 2 Matěj Cepl 2007-11-21 08:08:40 EST
Can we get /var/log/Xorg.0.log from your computer as well, please? Also, could
you try running X after renaming /etc/X11/xorg.conf (i.e., running X without the
configuration file at all)? What happens?
Comment 3 Paul Howarth 2007-11-21 08:35:32 EST
Created attachment 265981 [details]
Xorg.0.log

Here's the Xorg.0.log from a session that locked up.

I'll try running without xorg.conf but it'll run at 800x600 because I use
component video cables to connect the monitor and hence it won't get auto
detected.
Comment 4 Paul Howarth 2007-11-21 19:27:53 EST
Created attachment 266381 [details]
Xorg.0.log from run with no xorg.conf

Regarding Comment #2, running without xorg.conf results in an 800x600 display
and the problem occurs just the same.
Comment 5 Paul Howarth 2007-11-27 11:07:46 EST
FWIW, adding `Option "NoAccel" "true"' to the "Device" section for the graphics
card results in a horribly slow but lockup-free display.
Comment 6 Paul Howarth 2008-04-08 08:36:33 EDT
Created attachment 301626 [details]
Xorg.0.log from work machine

Yesterday I swapped out an old radeon card for an nvidia card on my work
machine, which has completely different hardware, as I was having problems
getting my shiny new HP LCD panel to work at its native 1680x1050 resolution.
Today I had a lockup on the work machine, and strace revealed that it was a
very similar problem:

 --- SIGALRM (Alarm clock) @ 0 (0) ---
sigreturn()				= ? (mask now [])
--- SIGALRM (Alarm clock) @ 0 (0) ---
sigreturn()				= ? (mask now [])
--- SIGALRM (Alarm clock) @ 0 (0) ---
sigreturn()				= ? (mask now [])
--- SIGALRM (Alarm clock) @ 0 (0) ---
sigreturn()				= ? (mask now [])
--- SIGALRM (Alarm clock) @ 0 (0) ---
sigreturn()				= ? (mask now [])
--- SIGALRM (Alarm clock) @ 0 (0) ---
sigreturn()				= ? (mask now [])
--- SIGALRM (Alarm clock) @ 0 (0) ---
sigreturn()				= ? (mask now [])
--- SIGALRM (Alarm clock) @ 0 (0) ---
sigreturn()				= ? (mask now [])

I'm running with no xorg.conf on this machine.
Comment 7 Orion Poplawski 2008-04-08 18:47:28 EDT
I'm seeing something similar on logout of KDE4 in rawhide with an nVidia NV34M
GeForce FX Go5200 64M.
Comment 8 Orion Poplawski 2008-04-11 12:51:41 EDT
Completely reproducible for me.  Anything I should test?
Comment 9 Paul Howarth 2008-05-20 04:58:46 EDT
FWIW I'm using the nv driver in Fedora 9 on both my home and work boxes for a
week now and I haven't been hit by this problem (yet). Given the frequency with
which it happened on F8, I suspect the problem is gone in F9.
Comment 10 Paul Howarth 2008-05-20 17:49:14 EDT
Famous last words...

OK so I got hit by it on Fedora 9 now:

--- SIGALRM (Alarm clock) @ 0 (0) ---
rt_sigreturn(0xe)                       = 14468
--- SIGALRM (Alarm clock) @ 0 (0) ---
rt_sigreturn(0xe)                       = 14468
--- SIGALRM (Alarm clock) @ 0 (0) ---
rt_sigreturn(0xe)                       = 14468
--- SIGALRM (Alarm clock) @ 0 (0) ---
rt_sigreturn(0xe)                       = 14468
--- SIGALRM (Alarm clock) @ 0 (0) ---
rt_sigreturn(0xe)                       = 14468
--- SIGALRM (Alarm clock) @ 0 (0) ---
rt_sigreturn(0xe)                       = 14468

and so on...
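
For what it's worth, a trace like this is what one would expect from any process that spins in user space while an interval timer keeps firing: strace only shows syscalls and signals, so a busy loop that makes no syscalls shows up as nothing but SIGALRM deliveries and signal returns. My understanding (an assumption, not confirmed anywhere in this report) is that the X server arms such a timer for its scheduler, so the pattern would fit the server being stuck in a tight loop somewhere. The sketch below is a Python stand-in for that situation, not X server code; `spin_with_timer` is a made-up name.

```python
import signal
import time


def spin_with_timer(duration=0.2, interval=0.01):
    """Busy-wait for `duration` seconds while SIGALRM fires every `interval`.

    Returns the number of timer signals handled during the spin.
    Must be called from the main thread (a Python signal-module restriction).
    """
    ticks = 0

    def on_alarm(signum, frame):
        nonlocal ticks
        ticks += 1

    old_handler = signal.signal(signal.SIGALRM, on_alarm)
    # Repeating real-time timer: first expiry and period are both `interval`.
    signal.setitimer(signal.ITIMER_REAL, interval, interval)
    try:
        deadline = time.monotonic() + duration
        while time.monotonic() < deadline:
            pass  # spin in user space; no blocking syscalls here
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)  # disarm the timer
        if old_handler is not None:
            signal.signal(signal.SIGALRM, old_handler)
    return ticks
```

Running this under strace on Linux should show mostly SIGALRM deliveries followed by rt_sigreturn for the duration of the spin, assuming the clock read is served without a real syscall (as it normally is via the vDSO).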
Comment 11 Bug Zapper 2008-11-26 03:37:56 EST
This message is a reminder that Fedora 8 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 8.  It is Fedora's policy to close all
bug reports from releases that are no longer maintained.  At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '8'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 8's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 8 is end of life.  If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora please change the 'version' of this 
bug to the applicable version.  If you are unable to change the version, 
please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events.  Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping
Comment 12 Paul Howarth 2008-11-27 05:19:38 EST
This is still an issue on Fedora 9. Haven't scheduled an update of my home box to Fedora 10 yet but it'll happen before the new year and we'll see if the problem is still there.
Comment 13 Orion Poplawski 2008-11-27 10:16:54 EST
Any chance this is the same as https://bugs.freedesktop.org/show_bug.cgi?id=6111 ?  That seems to have been an AGP aperture size mismatch issue.
Comment 14 Paul Howarth 2009-04-23 10:56:51 EDT
(In reply to comment #13)
> Any chance this is the same as
> https://bugs.freedesktop.org/show_bug.cgi?id=6111 ?  That seems to have been an
> AGP aperture size mismatch issue.

Don't think so. My card is PCI Express and there are no BIOS settings for AGP aperture size as there's no AGP slot on the motherboard.

Still an issue on F-10.

Will try again on F-11 when it's released.
Comment 15 Bug Zapper 2009-06-09 19:13:34 EDT
This message is a reminder that Fedora 9 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 9.  It is Fedora's policy to close all
bug reports from releases that are no longer maintained.  At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '9'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 9's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 9 is end of life.  If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora please change the 'version' of this 
bug to the applicable version.  If you are unable to change the version, 
please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events.  Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping
Comment 16 Paul Howarth 2009-06-10 02:48:36 EDT
Bumping version to 10 as it's still a problem on F-10.
Comment 17 Bug Zapper 2009-11-18 07:23:21 EST
This message is a reminder that Fedora 10 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 10.  It is Fedora's policy to close all
bug reports from releases that are no longer maintained.  At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '10'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 10's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 10 is end of life.  If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora please change the 'version' of this 
bug to the applicable version.  If you are unable to change the version, 
please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events.  Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping
Comment 18 Paul Howarth 2009-11-18 07:58:35 EST
In view of the problems I had with nvidia drivers, I have replaced all of my nvidia cards with ATI ones and am currently using those without problems.

I can therefore no longer provide any help on this issue.
Comment 19 mdidomenico 2013-12-08 09:01:35 EST
I'm seeing this same problem with RHEL 6.4 and an Nvidia graphics card.

I see the original poster swapped out cards; I'm not willing to do that.  Even though this is a fairly old bug, does the backtrace point to anything specific?  My machine locks up randomly. It doesn't happen all the time, but it most often happens when firefox is started right after an rdesktop session is closed.

[143259.190] (WW) NVIDIA(0): WAIT (2, 8, 0x8000, 0x0000ee90, 0x00004b84)
[143266.190] (WW) NVIDIA(0): WAIT (1, 8, 0x8000, 0x0000ee90, 0x00004b84)
[143269.192] (WW) NVIDIA(0): WAIT (2, 8, 0x8000, 0x0000ee90, 0x0000c664)
(EE) [mi] EQ overflowing.  Additional events will be discarded until existing events are processed.
(EE) 
(EE) Backtrace:
(EE) 0: /usr/bin/Xorg (xorg_backtrace+0x36) [0x46ced6]
(EE) 1: /usr/bin/Xorg (mieqEnqueue+0x273) [0x593df3]
(EE) 2: /usr/bin/Xorg (QueuePointerEvents+0x4e) [0x44f8de]
(EE) 3: /usr/lib64/xorg/modules/input/evdev_drv.so (0x7f1df070e000+0x4731) [0x7f1df0712731]
(EE) 4: /usr/bin/Xorg (0x400000+0x8b6f7) [0x48b6f7]
(EE) 5: /usr/bin/Xorg (0x400000+0xb62db) [0x4b62db]
(EE) 6: /lib64/libpthread.so.0 (0x3a7f200000+0xf500) [0x3a7f20f500]
(EE) 7: /usr/lib64/xorg/modules/drivers/nvidia_drv.so (0x7f1df14b6000+0x63dcb) [0x7f1df1519dcb]
(EE) 8: /usr/lib64/xorg/modules/drivers/nvidia_drv.so (0x7f1df14b6000+0x64791) [0x7f1df151a791]
(EE) 9: /usr/lib64/xorg/modules/drivers/nvidia_drv.so (0x7f1df14b6000+0xdca0a) [0x7f1df1592a0a]
(EE) 10: /usr/lib64/xorg/modules/drivers/nvidia_drv.so (0x7f1df14b6000+0x93ca2) [0x7f1df1549ca2]
(EE) 11: /usr/lib64/xorg/modules/drivers/nvidia_drv.so (0x7f1df14b6000+0x4d310c) [0x7f1df198910c]
(EE) 12: /usr/lib64/xorg/modules/drivers/nvidia_drv.so (0x7f1df14b6000+0x4ae459) [0x7f1df1964459]
(EE) 13: /usr/bin/Xorg (BlockHandler+0x4a) [0x43b70a]
(EE) 14: /usr/bin/Xorg (WaitForSomething+0x15c) [0x46a5bc]
(EE) 15: /usr/bin/Xorg (0x400000+0x379d2) [0x4379d2]
(EE) 16: /usr/bin/Xorg (0x400000+0x7cd2a) [0x47cd2a]
(EE) 17: /lib64/libc.so.6 (__libc_start_main+0xfd) [0x3a7ee1ecdd]
(EE) 18: /usr/bin/Xorg (0x400000+0x260b9) [0x4260b9]
(EE) 
(EE) [mi] These backtraces from mieqEnqueue may point to a culprit higher up the stack.
(EE) [mi] mieq is *NOT* the cause.  It is a victim.
(EE) [mi] EQ overflow continuing.  100 events have been dropped.
(EE) 
(EE) Backtrace:
(EE) 0: /usr/bin/Xorg (xorg_backtrace+0x36) [0x46ced6]
(EE) 1: /usr/bin/Xorg (QueuePointerEvents+0x4e) [0x44f8de]
(EE) 2: /usr/lib64/xorg/modules/input/evdev_drv.so (0x7f1df070e000+0x4731) [0x7f1df0712731]
(EE) 3: /usr/bin/Xorg (0x400000+0x8b6f7) [0x48b6f7]
(EE) 4: /usr/bin/Xorg (0x400000+0xb62db) [0x4b62db]
(EE) 5: /lib64/libpthread.so.0 (0x3a7f200000+0xf500) [0x3a7f20f500]
(EE) 6: /usr/lib64/xorg/modules/drivers/nvidia_drv.so (0x7f1df14b6000+0x63dcb) [0x7f1df1519dcb]
(EE) 7: /usr/lib64/xorg/modules/drivers/nvidia_drv.so (0x7f1df14b6000+0x64791) [0x7f1df151a791]
(EE) 8: /usr/lib64/xorg/modules/drivers/nvidia_drv.so (0x7f1df14b6000+0xdca0a) [0x7f1df1592a0a]
(EE) 9: /usr/lib64/xorg/modules/drivers/nvidia_drv.so (0x7f1df14b6000+0x93ca2) [0x7f1df1549ca2]
(EE) 10: /usr/lib64/xorg/modules/drivers/nvidia_drv.so (0x7f1df14b6000+0x4d310c) [0x7f1df198910c]
(EE) 11: /usr/lib64/xorg/modules/drivers/nvidia_drv.so (0x7f1df14b6000+0x4ae459) [0x7f1df1964459]
(EE) 12: /usr/bin/Xorg (BlockHandler+0x4a) [0x43b70a]
(EE) 13: /usr/bin/Xorg (WaitForSomething+0x15c) [0x46a5bc]
(EE) 14: /usr/bin/Xorg (0x400000+0x379d2) [0x4379d2]
(EE) 15: /usr/bin/Xorg (0x400000+0x7cd2a) [0x47cd2a]
(EE) 16: /lib64/libc.so.6 (__libc_start_main+0xfd) [0x3a7ee1ecdd]
(EE) 17: /usr/bin/Xorg (0x400000+0x260b9) [0x4260b9]
(EE) 
(EE) [mi] EQ overflow continuing.  200 events have been dropped.
(EE) 
(EE) Backtrace:
(EE) 0: /usr/bin/Xorg (xorg_backtrace+0x36) [0x46ced6]
(EE) 1: /usr/bin/Xorg (QueuePointerEvents+0x4e) [0x44f8de]
(EE) 2: /usr/lib64/xorg/modules/input/evdev_drv.so (0x7f1df070e000+0x4731) [0x7f1df0712731]
(EE) 3: /usr/bin/Xorg (0x400000+0x8b6f7) [0x48b6f7]
(EE) 4: /usr/bin/Xorg (0x400000+0xb62db) [0x4b62db]
(EE) 5: /lib64/libpthread.so.0 (0x3a7f200000+0xf500) [0x3a7f20f500]
(EE) 6: /usr/lib64/xorg/modules/drivers/nvidia_drv.so (0x7f1df14b6000+0x63dcb) [0x7f1df1519dcb]
(EE) 7: /usr/lib64/xorg/modules/drivers/nvidia_drv.so (0x7f1df14b6000+0x64791) [0x7f1df151a791]
(EE) 8: /usr/lib64/xorg/modules/drivers/nvidia_drv.so (0x7f1df14b6000+0xdca0a) [0x7f1df1592a0a]
(EE) 9: /usr/lib64/xorg/modules/drivers/nvidia_drv.so (0x7f1df14b6000+0x93ca2) [0x7f1df1549ca2]
(EE) 10: /usr/lib64/xorg/modules/drivers/nvidia_drv.so (0x7f1df14b6000+0x4d310c) [0x7f1df198910c]
(EE) 11: /usr/lib64/xorg/modules/drivers/nvidia_drv.so (0x7f1df14b6000+0x4ae459) [0x7f1df1964459]
(EE) 12: /usr/bin/Xorg (BlockHandler+0x4a) [0x43b70a]
(EE) 13: /usr/bin/Xorg (WaitForSomething+0x15c) [0x46a5bc]
(EE) 14: /usr/bin/Xorg (0x400000+0x379d2) [0x4379d2]
(EE) 15: /usr/bin/Xorg (0x400000+0x7cd2a) [0x47cd2a]
(EE) 16: /lib64/libc.so.6 (__libc_start_main+0xfd) [0x3a7ee1ecdd]
(EE) 17: /usr/bin/Xorg (0x400000+0x260b9) [0x4260b9]
(EE) 
[143276.191] (WW) NVIDIA(0): WAIT (1, 8, 0x8000, 0x0000ee90, 0x0000c664)
[143276.191] [mi] Increasing EQ size to 1024 to prevent dropped events.
[143276.191] [mi] EQ processing has resumed after 227 dropped events.
[143276.191] [mi] This may be caused my a misbehaving driver monopolizing the server's resources.
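
To make the backtraces above easier to read, here is a minimal Python sketch that collapses consecutive frames from the same module into runs. It assumes the frame lines have the "(EE) N: path (symbol+offset) [address]" shape seen in this log; `parse_frames` and `group_by_module` are made-up names, not Xorg tooling.

```python
import os
import re

# Matches e.g. "(EE) 7: /usr/lib64/.../nvidia_drv.so (0x7f1df14b6000+0x63dcb) [0x7f1df1519dcb]"
FRAME_RE = re.compile(r"^\(EE\) (\d+): (\S+) \((.*)\) \[(0x[0-9a-f]+)\]$")


def parse_frames(log_text):
    """Return (index, module basename, location) for each backtrace frame line."""
    frames = []
    for line in log_text.splitlines():
        m = FRAME_RE.match(line.strip())
        if m:
            module = os.path.basename(m.group(2))
            frames.append((int(m.group(1)), module, m.group(3)))
    return frames


def group_by_module(frames):
    """Collapse consecutive frames from the same module into (module, count) runs."""
    runs = []
    for _, module, _ in frames:
        if runs and runs[-1][0] == module:
            runs[-1][1] += 1
        else:
            runs.append([module, 1])
    return [tuple(run) for run in runs]
```

Applied to the first backtrace, this gives a run of Xorg and evdev frames on top, one libpthread frame, six nvidia_drv.so frames, and then Xorg again starting at BlockHandler. If the libpthread frame is the signal-return trampoline (an assumption; the log does not say so), the frames above it belong to the input signal handler and the server was inside nvidia_drv.so, called from BlockHandler, when the event queue overflowed. That would be consistent with the log's own note that mieq is a victim rather than the cause.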
