From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.6) Gecko/20040312

Description of problem:
X server hangs in a mutex wait daily at around 4am. X server log indicates a crash of some sort, but X server is still spinning in mutex wait. XFree log, "strace -p PID" output, and "gdb -p PID" output are attached.

Note: my machine has nvidia card. This has happened to one of my coworkers who also has an nvidia card.

Version-Release number of selected component (if applicable):
xorg-x11-0.0.6.6-0.0.2004_03_11.9

How reproducible:
Sometimes

Steps to Reproduce:
1. This happens regularly at about the same time of day.
2.
3.

Actual Results:
X screen has died, system is at the console login prompt on pty1. "ps ax" shows X process still running. Killing primary gdm-binary process restarts it all correctly.

Additional info:

# strace -p PID
Process 5062 attached - interrupt to quit
futex(0xd20760, FUTEX_WAIT, 2, NULL) = -1 EINTR (Interrupted system call)
--- SIGALRM (Alarm clock) @ 0 (0) ---
sigreturn() = ? (mask now [SEGV IO])
futex(0xd20760, FUTEX_WAIT, 2, NULL) = -1 EINTR (Interrupted system call)
--- SIGALRM (Alarm clock) @ 0 (0) ---
sigreturn() = ? (mask now [SEGV IO])

#### gdb -p PID
(gdb) where
#0 0x00bf07a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1 0x00cd004e in __lll_mutex_lock_wait () from /lib/tls/libc.so.6
#2 0x00c6c710 in _L_mutex_lock_7373 () from /lib/tls/libc.so.6
#3 0x08fcc480 in ?? ()
#4 0x00000000 in ?? ()
(gdb) quit
Created attachment 99120: combined strace and gdb output, and XFree86.0.log
You cannot debug the X server with gdb, nor strace it, from a terminal running inside the X server you are debugging. That will never work, and is not a bug. The only way to strace or debug the X server is to have two computers connected via ethernet, serial cable, or similar, and to debug the X server from a remote shell on the computer running it.
Comment #2 is entirely irrelevant to the situation described. The X server is waiting for a mutex apparently held by a process/thread that has either died or forgotten that it held the lock. No commands were running from terminal sessions inside the X server at the times of the problem. (I am NOT at work at 4am. Sorry, my body is not 24/7.) Very likely the problem is cron related, but how?

My goal is to:
1. Find out which process/thread holds the lock.
2. Find out why it died.
3. Find out why the kernel didn't clean up the lock if the process died.

The "strace" and "gdb" commands described here were run from a different virtual terminal. Even gdb works fine, as long as you don't stop the X server at a breakpoint and then hot-key to the X server's window :-)
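To illustrate the failure mode suspected above (a lock whose holder exited without releasing it), here is a minimal sketch. It is only an analogy, not the X server's code: Python's threading.Lock stands in for the mutex seen in the gdb trace, and the function name try_lock_after_owner_exit is made up for this example. The point it shows is that nothing releases an ordinary lock just because the thread that took it has gone away, so any later waiter blocks until it gives up.

```python
import threading


def try_lock_after_owner_exit(timeout_s: float = 0.5) -> bool:
    """Return True if the lock could be taken after its holder exited.

    Hypothetical helper for illustration only; threading.Lock is an
    assumption standing in for the mutex the X server is waiting on.
    """
    lock = threading.Lock()

    def owner() -> None:
        # Take the lock and return without ever releasing it,
        # mimicking a thread that died or forgot it held the lock.
        lock.acquire()

    t = threading.Thread(target=owner)
    t.start()
    t.join()  # the holder is now gone, but the lock is still held

    # A plain acquire() here would block forever, like the hung X server.
    # Using a timeout lets the sketch report the stuck lock instead.
    return lock.acquire(timeout=timeout_s)


if __name__ == "__main__":
    print("acquired:", try_lock_after_owner_exit())
```

Running it waits for the timeout and then reports that the lock was not acquired, which is the same symptom as the repeated FUTEX_WAIT calls in the strace output, minus the timeout.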
To preclude confusion: My X server was again hung when I came in this morning, as was my colleague's. This time I ssh'd in from another machine and ran strace and gdb remotely, as described in comment #2. The X server was once again waiting in a mutex, with the same output as in the previous logs. By the way, neither of us is using the binary nvidia driver. Our installations are both pure test2.
We are unable to reproduce this problem in any OS release. Users who have experienced this problem are encouraged to upgrade to the latest version of Fedora Core, which can be obtained from: http://fedora.redhat.com/download

If this issue turns out to still be reproducible in the latest version of Fedora Core, please file a bug report in the X.Org bugzilla located at http://bugs.freedesktop.org in the "xorg" component.

Once you've filed your bug report with X.Org, if you paste the new bug URL here, Red Hat will continue to track the issue in the centralized X.Org bug tracker, and will review any bug fixes that become available for consideration in future updates.