Description of Problem:

At 02:00:31 this morning my X server hung. Strangely, the mouse pointer still moved, but none of my running X processes could output any changes to the screen. I could not select windows or do anything on the desktop. The keyboard was also non-functional, and I could not even switch to a different VT. Attempts to start new X processes while logged in via SSH from another machine were unsuccessful.

After logging in via SSH from another workstation, I noticed that the system load was hovering around 5 (on a VA Linux 420, i810 m/b, Celeron 550MHz, 256MB RAM). top showed that "X" was soaking up all the available CPU, but luckily it only had an RSS of 17MB.

An strace of the X process showed:

    02:21:04.499334 [4010f581] gettimeofday({1017908464, 500095}, NULL) = 0 <0.000044>
    02:21:04.500281 [4010f581] gettimeofday({1017908464, 500531}, NULL) = 0 <0.000886>
    02:21:04.501463 [4010f581] gettimeofday({1017908464, 501555}, NULL) = 0 <0.000012>
    02:21:04.501630 [4010f581] gettimeofday({1017908464, 501682}, NULL) = 0 <0.000013>
    02:21:04.501755 [4010f581] gettimeofday({1017908464, 501807}, NULL) = 0 <0.000011>
    02:21:04.501880 [4010f581] gettimeofday({1017908464, 501933}, NULL) = 0 <0.000012>
    02:21:04.502007 [4010f581] gettimeofday({1017908464, 502059}, NULL) = 0 <0.000012>
    02:21:04.502133 [4010f581] gettimeofday({1017908464, 502185}, NULL) = 0 <0.000012>
    02:21:04.502260 [4010f581] gettimeofday({1017908464, 502313}, NULL) = 0 <0.000011>
    02:21:04.502387 [4010f581] gettimeofday({1017908464, 502439}, NULL) = 0 <0.000012>

Note: I don't show it here, but the X server process was responding to SIGIO, generated when I moved the mouse. I don't know why the keyboard didn't work similarly.
Searching around for another report of, or solution to, this problem turned up this relevant link at xfree86.org:

http://www.xfree86.org/pipermail/xpert/2001-January/004736.html

This thread shows that another person had their X server hang, likely because the system time changed underneath the X server, causing the X protocol messages to have inconsistent timestamps. In their case it was XFree86 4.0.2 looping in the WaitForSomething function, called from Dispatch().

In my case I am running DJB's clockspeed to sync the clock. It seems possible to me that clockspeed corrected some minor drift in the system time and caused some sort of internal race within the X server process. Just my guess. We do run clockadj to manually reset the system time, but that runs out of cron.daily, which doesn't kick off until 04:02 AM, and according to gkrellm the X server stopped at 02:00:31 AM.

ps output:

    root 6313 2.9 6.7 113820 17232 ? R< Mar26 384:25 /etc/X11/X :0 -auth /var/gdm/:0.Xauth

Note: I run the X server at nice -10, I also run artsd at nice -20, and I was playing MP3s over NFS when this happened. It's all Rob Zombie's fault 8^).

    root 5396 0.0 0.1 1352 320 pts/9 S 02:32 0:00 supervise /var/service/clockspeed
    root 5398 0.0 0.1 1348 300 pts/9 S 02:32 0:00 /etc/clockspeed/bin/clockspeed

Note: I restarted clockspeed myself while diagnosing this problem.

Version-Release number of selected component (if applicable):

XFree86-4.1.0-15
kernel-2.4.9-21 (I have 2.4.9-31 installed but I haven't rebooted yet)

How Reproducible:

Not sure; probably by putzing with the system time while X is running. I may try an experiment, but I also want to go home and catch some ZZzzzz's. 8^)

Steps to Reproduce:
1.
2.
3.

Actual Results:

The X server hangs. The mouse works, but the server does not accept input, running programs cannot update the screen, and any new programs that try to talk to the X server just sit there looking dumb.

Expected Results:

It should keep on trucking.

Additional Information:

URLs and file snippets listed above
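The clock-change guess in the report above can be illustrated with a small sketch. This is not XFree86 code. The Timer class and its clock parameter are hypothetical names made up for this example, and it only shows one plausible way a timer built on wall-clock time goes wrong when the time is stepped, compared with the same timer on a monotonic clock.

```python
import time


class Timer:
    """Hypothetical one-shot timer (illustration only, not XFree86 code).

    The deadline is computed once from whatever clock is passed in.
    If that clock is the wall clock and it gets stepped, the deadline
    no longer matches the delay that was asked for.
    """

    def __init__(self, delay_s, clock=time.monotonic):
        self._clock = clock
        self._deadline = clock() + delay_s

    def remaining(self):
        # Seconds left until the deadline, never negative.
        return max(0.0, self._deadline - self._clock())

    def expired(self):
        return self.remaining() == 0.0


if __name__ == "__main__":
    # Default clock is time.monotonic(), which is not affected by
    # anything that sets or steps the system time.
    t = Timer(0.05)
    time.sleep(0.06)
    print("expired:", t.expired())
```

With clock=time.time, stepping the system time backwards makes remaining() larger than the original delay, and stepping it forwards makes the timer expire at once. With the default time.monotonic neither happens.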
Created attachment 52166
Strace summary
This problem is often reported, and is common to all releases of XFree86 and X11. It only occurs, of course, when someone or some event changes the time out from underneath X. It is indeed considered a bug, but it is one that the upstream X and XFree86 maintainers should be made aware of and should address, since it should be fixed by someone extremely familiar with the areas of code that cause it. As such, I suggest reporting the problem upstream in the hope that it gets fixed in a future release, and that the fixes make it into current releases as well. Because I am unfamiliar with that code, I'll leave this bug report open and apply any fixes upstream provides once they've fixed the problem.
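For anyone trying to confirm that the time really was changed underneath a running process, here is a rough sketch of one way to notice it: sample the wall clock and a monotonic clock together and compare how far each has advanced. The function name clock_step and the 0.5 second tolerance are assumptions made for this example, not part of any X or clockspeed tooling.

```python
import time


def clock_step(prev_wall, prev_mono, wall, mono, tolerance_s=0.5):
    """Estimate how far the wall clock was stepped between two samples.

    Each sample is a (wall, monotonic) pair taken at the same moment.
    Returns the step in seconds (negative means the time went backwards),
    or 0.0 if both clocks advanced by about the same amount.
    """
    drift = (wall - prev_wall) - (mono - prev_mono)
    return drift if abs(drift) > tolerance_s else 0.0


if __name__ == "__main__":
    prev_wall, prev_mono = time.time(), time.monotonic()
    time.sleep(1)
    wall, mono = time.time(), time.monotonic()
    # Prints 0.0 unless the system time was set during the sleep.
    print(clock_step(prev_wall, prev_mono, wall, mono))
```

Run in a loop while adjusting the system time, this would show whether a step lines up with the moment the X server stops responding.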
Thank you very much for your response and your candor. I filed it with the bug form at http://www.xfree86.org/cgi-bin/bugform.cgi. Unfortunately they don't seem to hand out serial numbers for bug reports. Thanks --Mark
Please read the bug report I'm duping this against, and follow testing instructions at the bottom of it. All followups in the dupe report please, thanks. *** This bug has been marked as a duplicate of 63509 ***
Changed to 'CLOSED' state since 'RESOLVED' has been deprecated.