Bug 120065 - X server hangs in mutex
Summary: X server hangs in mutex
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Fedora
Classification: Fedora
Component: XFree86
Version: rawhide
Hardware: i686
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: X/OpenGL Maintenance List
QA Contact: David Lawrence
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2004-04-05 19:14 UTC by Steve Knodle
Modified: 2007-11-30 22:10 UTC

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2004-10-12 18:14:28 UTC
Type: ---
Embargoed:


Attachments
combined strace, and gdb output, and XFree86.0.log (51.18 KB, text/plain)
2004-04-05 19:16 UTC, Steve Knodle

Description Steve Knodle 2004-04-05 19:14:33 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.6) Gecko/20040312

Description of problem:
X server hangs in a mutex wait daily at around 4am.
X server log indicates a crash of some sort, but X server
is still spinning in mutex wait.

XFree log, "strace -p PID" output, and "gdb -p PID" output
are attached.
Note: my machine has an nvidia card.  This has also happened to one
of my coworkers, who also has an nvidia card.

Version-Release number of selected component (if applicable):
xorg-x11-0.0.6.6-0.0.2004_03_11.9

How reproducible:
Sometimes

Steps to Reproduce:
1. This happens regularly at about the same time of day.

Actual Results:  X screen has died, system is at the console login
prompt on pty1
"ps ax" shows X process still running.
Killing the primary gdm-binary process restarts everything correctly.

Additional info:

# strace -p PID
Process 5062 attached - interrupt to quit
futex(0xd20760, FUTEX_WAIT, 2, NULL)    = -1 EINTR (Interrupted system call)
--- SIGALRM (Alarm clock) @ 0 (0) ---
sigreturn()                             = ? (mask now [SEGV IO])
futex(0xd20760, FUTEX_WAIT, 2, NULL)    = -1 EINTR (Interrupted system call)
--- SIGALRM (Alarm clock) @ 0 (0) ---
sigreturn()                             = ? (mask now [SEGV IO])

# gdb -p PID
(gdb) where
#0  0x00bf07a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1  0x00cd004e in __lll_mutex_lock_wait () from /lib/tls/libc.so.6
#2  0x00c6c710 in _L_mutex_lock_7373 () from /lib/tls/libc.so.6
#3  0x08fcc480 in ?? ()
#4  0x00000000 in ?? ()
(gdb) quit

Comment 1 Steve Knodle 2004-04-05 19:16:26 UTC
Created attachment 99120
combined strace, and gdb output, and XFree86.0.log

Comment 2 Mike A. Harris 2004-04-06 10:50:35 UTC
You cannot debug the X server with gdb, or strace it, from a terminal
running inside the X server you are debugging.  That will never
work, and is not a bug.

The only way to strace or debug the X server is to use two
computers connected via ethernet, serial cable, or similar, and to
debug the X server from a remote shell on the computer running it.

Comment 3 Steve Knodle 2004-04-06 16:02:36 UTC
Comment #2 is entirely irrelevant to the situation described.
The X server is waiting for a mutex apparently held by a
process/thread that has either died, or forgotten it held the mutex lock.
No commands were running from terminal sessions inside the
X server at the times of the problem.  (I am NOT in and working
at 4am.  Sorry, my body is not 24/7.)  Very likely the problem
is cron related, but how?

My goal is to:
1. Find out which process/thread holds the lock.
2. Find out why it died.
3. Find out why the kernel didn't clean up the lock if the process died.

The "strace" and "gdb" commands described here were run from a 
different virtual terminal. Even gdb works fine, as long as you
don't stop the X server at a breakpoint and then hot-key to the
X server's window :-)
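
As a starting point for goal 1, here is a minimal sketch that lists every thread of a process with its state and kernel wait channel. It assumes a Linux kernel that exposes /proc/PID/task, and the helper name list_threads is hypothetical. It does not identify the lock holder by itself; it only shows whether the process has other threads that could be holding the lock.

```shell
#!/bin/sh
# list_threads PID (hypothetical helper): print "TID STATE WCHAN" for each
# thread of PID, read from /proc/PID/task. Assumes Linux with /proc mounted.
list_threads() {
    pid=$1
    for task in /proc/"$pid"/task/*; do
        [ -d "$task" ] || continue
        tid=${task##*/}
        # Second field of the "State:" line is the one-letter state (R, S, D, ...).
        state=$(awk '/^State:/ {print $2}' "$task/status" 2>/dev/null)
        # wchan may be unreadable without sufficient privileges; print "?" then.
        wchan=$(cat "$task/wchan" 2>/dev/null)
        printf '%s %s %s\n' "$tid" "${state:-?}" "${wchan:-?}"
    done
}
```

Usage would be something like `list_threads 5062` for the PID shown in the strace output above. If only one line comes back, the process is single-threaded, and the lock would have to be held by that same thread or by another process sharing the memory.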

Comment 4 Steve Knodle 2004-04-07 16:18:47 UTC
To preclude confusion:
My X server was again hung when I came in this morning, as was
my colleague's.
This time I ssh'd in from another machine, ran strace and gdb
remotely, as described in comment#2.
The X server was once again waiting in a mutex, the output the
same as in the previous logs.
By the way, neither of us is using the binary nvidia driver.
Our installations are both pure test2.
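
For anyone wanting to collect the same data, the remote capture can be sketched roughly as below. This is an illustration under assumptions, not the exact commands used here: the helper name capture_hang and the STRACE/GDB override variables are hypothetical, `timeout` is assumed to be available from coreutils, gdb is assumed to be recent enough to support `-batch` and `-ex`, and attaching to the X server normally requires root.

```shell
#!/bin/sh
# capture_hang PID [OUTDIR] (hypothetical helper): save a short strace sample
# and a backtrace of every thread of PID. Run it from an ssh session or a
# different virtual terminal, not from a terminal inside the hung X server.
capture_hang() {
    pid=$1
    out=${2:-.}
    mkdir -p "$out" || return 1
    # Sample system calls for 10 seconds; -f also follows the other threads.
    timeout 10 "${STRACE:-strace}" -f -p "$pid" -o "$out/strace.$pid.txt"
    # Batch-mode gdb: attach, print the stack of every thread, then detach.
    "${GDB:-gdb}" -p "$pid" -batch -ex "thread apply all bt" \
        > "$out/gdb.$pid.txt" 2>&1
}
```

A call might look like `capture_hang "$(pidof X)" /tmp/xhang`, assuming the server binary is named X on the machine in question.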

Comment 5 Mike A. Harris 2004-10-12 18:14:28 UTC
We are unable to reproduce this problem in any OS release.  Users
who have experienced this problem are encouraged to upgrade to the
latest version of Fedora Core, which can be obtained from:

        http://fedora.redhat.com/download

If this issue turns out to still be reproducible in the latest
version of Fedora Core, please file a bug report in the X.Org
bugzilla located at http://bugs.freedesktop.org in the "xorg"
component.

Once you've filed your bug report to X.Org, if you paste the new
bug URL here, Red Hat will continue to track the issue in the
centralized X.Org bug tracker, and will review any bug fixes that
become available for consideration in future updates.

