Bug 62678

Summary: X hangs, loops on gettimeofday()
Product: [Retired] Red Hat Linux Reporter: Mark Tinberg <mtinberg>
Component: XFree86 Assignee: Mike A. Harris <mharris>
Status: CLOSED DUPLICATE QA Contact: David Lawrence <dkl>
Severity: medium Docs Contact:
Priority: medium    
Version: 7.2   
Target Milestone: ---   
Target Release: ---   
Hardware: i686   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2006-02-21 18:48:40 UTC Type: ---
Attachments:
Description Flags
Strace summary none

Description Mark Tinberg 2002-04-04 09:36:58 UTC
Description of Problem:

At 02:00:31 this morning my X server hung.  Strangely, the mouse pointer still moved even though none of my running X clients could update the screen.  I could not select windows or do anything on the desktop.  Attempts to start new X clients while logged in via SSH from another machine were unsuccessful.  The keyboard was also non-functional, and I could not even switch to a different VT.

After logging in via SSH from another workstation, I noticed that the system load was hovering around 5 (on a VA Linux 420, i810 m/b, Celeron 550MHz, 256MB RAM).  top showed that "X" was soaking up all the available CPU, but luckily it only had an RSS of 17MB.  An strace of the X process showed:


02:21:04.499334 [4010f581] gettimeofday({1017908464, 500095}, NULL)             = 0 <0.000044>
02:21:04.500281 [4010f581] gettimeofday({1017908464, 500531}, NULL)             = 0 <0.000886>
02:21:04.501463 [4010f581] gettimeofday({1017908464, 501555}, NULL)             = 0 <0.000012>
02:21:04.501630 [4010f581] gettimeofday({1017908464, 501682}, NULL)             = 0 <0.000013>
02:21:04.501755 [4010f581] gettimeofday({1017908464, 501807}, NULL)             = 0 <0.000011>
02:21:04.501880 [4010f581] gettimeofday({1017908464, 501933}, NULL)             = 0 <0.000012>
02:21:04.502007 [4010f581] gettimeofday({1017908464, 502059}, NULL)             = 0 <0.000012>
02:21:04.502133 [4010f581] gettimeofday({1017908464, 502185}, NULL)             = 0 <0.000012>
02:21:04.502260 [4010f581] gettimeofday({1017908464, 502313}, NULL)             = 0 <0.000011>
02:21:04.502387 [4010f581] gettimeofday({1017908464, 502439}, NULL)             = 0 <0.000012>

Note:  I don't show it here, but the X server process was responding to SIGIO, generated when I moved the mouse.  I don't know why the keyboard didn't work similarly.

Searching around for another report/solution for this problem turned up this relevant link at xfree86.org:
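
As a rough cross-check of the strace excerpt above, the sketch below parses the last five lines of it and computes the gap between the times returned by successive gettimeofday() calls. The helper names (`returned_times_us`, `gaps_us`) are made up for illustration and are not part of strace or XFree86. It only assumes the line format shown in the excerpt.

```python
import re

# Last five lines of the strace excerpt from this report, copied verbatim.
STRACE_EXCERPT = """\
02:21:04.501880 [4010f581] gettimeofday({1017908464, 501933}, NULL)             = 0 <0.000012>
02:21:04.502007 [4010f581] gettimeofday({1017908464, 502059}, NULL)             = 0 <0.000012>
02:21:04.502133 [4010f581] gettimeofday({1017908464, 502185}, NULL)             = 0 <0.000012>
02:21:04.502260 [4010f581] gettimeofday({1017908464, 502313}, NULL)             = 0 <0.000011>
02:21:04.502387 [4010f581] gettimeofday({1017908464, 502439}, NULL)             = 0 <0.000012>
"""

# Matches the {seconds, microseconds} pair that gettimeofday() returned.
TV_RE = re.compile(r"gettimeofday\(\{(\d+), (\d+)\}")


def returned_times_us(text):
    """Return each gettimeofday() result as integer microseconds since the epoch."""
    times = []
    for line in text.splitlines():
        match = TV_RE.search(line)
        if match:
            seconds, micros = int(match.group(1)), int(match.group(2))
            times.append(seconds * 1_000_000 + micros)
    return times


def gaps_us(times):
    """Return the difference between each consecutive pair of timestamps."""
    return [later - earlier for earlier, later in zip(times, times[1:])]


if __name__ == "__main__":
    print(gaps_us(returned_times_us(STRACE_EXCERPT)))  # [126, 126, 128, 126]
```

The gaps are all positive and roughly 126 microseconds (which includes strace's own overhead), with no other system call in between. That is consistent with a busy loop polling the clock. It says nothing by itself about what condition the loop is waiting for.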

http://www.xfree86.org/pipermail/xpert/2001-January/004736.html

This thread shows that another person had their X server hang, likely because the system time changed underneath the X server, causing the X protocol messages to have inconsistent timestamps.   In their case it was XFree86 4.0.2 looping in the WaitForSomething function, called from Dispatch().

In my case I am running DJB's clockspeed to sync the clock.  It seems possible to me that clockspeed corrected some minor drift in the system time and caused some sort of internal race within the X server process.  Just my guess.
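
To illustrate the general hazard guessed at above, here is a minimal sketch of a timeout loop that computes a deadline from the wall clock and then has the clock stepped backwards underneath it. This is not the XFree86 code and makes no claim about what WaitForSomething actually does. `SteppableClock` and `spins_until_deadline` are hypothetical names, and the clock is simulated in integer milliseconds so that the example does not touch the real system time.

```python
class SteppableClock:
    """Hypothetical stand-in for a wall clock that can be stepped, in integer ms."""

    def __init__(self, start_ms):
        self.now_ms = start_ms

    def read(self):
        # Each read advances the clock by 1 ms to simulate elapsed time.
        self.now_ms += 1
        return self.now_ms

    def step(self, delta_ms):
        # Simulates an external adjustment of the system time.
        self.now_ms += delta_ms


def spins_until_deadline(clock, timeout_ms, step_back_ms=0, max_spins=1_000_000):
    """Count iterations of a naive wall-clock deadline loop.

    The deadline is computed first, then the clock is stepped back by
    step_back_ms, as if the time had been changed while the loop was waiting.
    """
    deadline = clock.read() + timeout_ms
    clock.step(-step_back_ms)
    spins = 0
    while clock.read() < deadline and spins < max_spins:
        spins += 1
    return spins


if __name__ == "__main__":
    # Undisturbed clock: the 10 ms timeout expires after a handful of polls.
    print(spins_until_deadline(SteppableClock(1_000_000), 10))
    # Clock stepped back 5 s: the same 10 ms timeout now takes about 5 s of polling.
    print(spins_until_deadline(SteppableClock(1_000_000), 10, step_back_ms=5_000))
```

In this toy model every millisecond the clock is moved backwards adds one more poll before the deadline is reached, so a large backwards step turns a short timeout into a long busy loop. Code that only needs elapsed time usually avoids this by reading a clock that cannot be stepped, for example `time.monotonic()` in Python.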

We do run clockadj to manually reset the system time, but that runs out of cron.daily which doesn't kick off until 04:02 AM, and according to gkrellm the X server stopped at 02:00:31 AM.

ps output:

root      6313  2.9  6.7 113820 17232 ?      R<   Mar26 384:25 /etc/X11/X :0 -auth /var/gdm/:0.Xauth

Note:  I run the X server at nice -10, I also run artsd at nice -20, and I was playing MP3s over NFS when this happened.  It's all Rob Zombie's fault 8^).

root      5396  0.0  0.1  1352  320 pts/9    S    02:32   0:00 supervise /var/service/clockspeed
root      5398  0.0  0.1  1348  300 pts/9    S    02:32   0:00 /etc/clockspeed/bin/clockspeed

Note:  I restarted clockspeed myself while diagnosing this problem.

Version-Release number of selected component (if applicable):

XFree86-4.1.0-15
kernel-2.4.9-21 (I have 2.4.9-31 installed but I haven't rebooted yet)

How Reproducible:

Not sure, probably putz with the system time while X is running.  I may try an experiment but I also want to go home and catch some ZZzzzz's. 8^)

Actual Results:

X server hangs.  The mouse pointer still moves, but the server accepts no other input, running programs cannot update the screen, and any new programs that try to talk to the X server just sit there looking dumb.

Expected Results:

It should keep on trucking.

Additional Information:
	
URLs and file snippets listed above

Comment 1 Mark Tinberg 2002-04-04 09:38:10 UTC
Created attachment 52166 [details]
Strace summary

Comment 2 Mike A. Harris 2002-04-19 22:10:13 UTC
This problem is often reported, and is common to all releases of XFree86
and X11.  The problem only occurs of course when someone or some event
changes the time out from underneath X.  It is indeed considered a bug, but
it is something that the upstream X and XFree86 maintainers should be made
aware of, and something they should address.  This sort of problem should
be fixed by someone extremely familiar with the areas of code which cause
the problem.

As such, I suggest reporting the problem upstream in hopes it gets fixed
in a future release, and hopefully fixes make it into current releases
as well.  Due to unfamiliarity with the code, I'll leave the bug report
open, and apply any fixes upstream can provide once they've fixed the
problem.



Comment 3 Mark Tinberg 2002-04-20 01:41:27 UTC
Thank you very much for your response and your candor.  I filed it with the 
bug form at http://www.xfree86.org/cgi-bin/bugform.cgi.  Unfortunately they 
don't seem to hand out serial numbers for bug reports.

Thanks

--Mark

Comment 4 Mike A. Harris 2002-12-19 10:05:55 UTC
Please read the bug report I'm duping this against, and follow testing
instructions at the bottom of it.  All followups in the dupe report please,
thanks.

*** This bug has been marked as a duplicate of 63509 ***

Comment 5 Red Hat Bugzilla 2006-02-21 18:48:40 UTC
Changed to 'CLOSED' state since 'RESOLVED' has been deprecated.