Bug 62678 - X hangs, loops on gettimeofday()
Status: CLOSED DUPLICATE of bug 63509
Product: Red Hat Linux
Classification: Retired
Component: XFree86
Version: 7.2
Hardware: i686 Linux
Priority: medium, Severity: medium
Assigned To: Mike A. Harris
David Lawrence
Reported: 2002-04-04 04:36 EST by Mark Tinberg
Modified: 2007-04-18 12:41 EDT (History)
Last Closed: 2006-02-21 13:48:40 EST
Attachments
Strace summary (557 bytes, text/plain)
2002-04-04 04:38 EST, Mark Tinberg

Description Mark Tinberg 2002-04-04 04:36:58 EST
Description of Problem:

At 02:00:31 this morning my X server hung.  Strangely, the mouse was still working even though none of my running X processes could output any changes to the screen.  I could not select windows or do anything on the desktop.  Attempts to start new X processes while logged in via SSH from another machine were unsuccessful.  The keyboard was also non-functional, and I was not even able to switch to a different VT.

After logging in via SSH from another workstation, I noticed that the system load was hovering around 5 (on a VA Linux 420, i810 m/b, Celeron 550MHz, 256MB RAM).  top showed that "X" was soaking up all the available CPU, but luckily only had an RSS of 17MB.  An strace of the X process showed:


02:21:04.499334 [4010f581] gettimeofday({1017908464, 500095}, NULL)             = 0 <0.000044>
02:21:04.500281 [4010f581] gettimeofday({1017908464, 500531}, NULL)             = 0 <0.000886>
02:21:04.501463 [4010f581] gettimeofday({1017908464, 501555}, NULL)             = 0 <0.000012>
02:21:04.501630 [4010f581] gettimeofday({1017908464, 501682}, NULL)             = 0 <0.000013>
02:21:04.501755 [4010f581] gettimeofday({1017908464, 501807}, NULL)             = 0 <0.000011>
02:21:04.501880 [4010f581] gettimeofday({1017908464, 501933}, NULL)             = 0 <0.000012>
02:21:04.502007 [4010f581] gettimeofday({1017908464, 502059}, NULL)             = 0 <0.000012>
02:21:04.502133 [4010f581] gettimeofday({1017908464, 502185}, NULL)             = 0 <0.000012>
02:21:04.502260 [4010f581] gettimeofday({1017908464, 502313}, NULL)             = 0 <0.000011>
02:21:04.502387 [4010f581] gettimeofday({1017908464, 502439}, NULL)             = 0 <0.000012>

Note:  I don't show it here, but the X server process was responding to SIGIO, generated when I moved the mouse.  I don't know why the keyboard didn't work similarly.
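
One way to quantify the loop in the excerpt above is to compute the gap between consecutive calls. The following is a minimal sketch, assuming the input is strace output in the same layout as the lines above (wall-clock timestamp, instruction pointer in brackets, call, return value, duration in angle brackets). The helper names `parse` and `gaps_us` are made up for illustration.

```python
import re

# Three consecutive lines copied from the strace excerpt in this report.
SAMPLE = """\
02:21:04.501630 [4010f581] gettimeofday({1017908464, 501682}, NULL)             = 0 <0.000013>
02:21:04.501755 [4010f581] gettimeofday({1017908464, 501807}, NULL)             = 0 <0.000011>
02:21:04.501880 [4010f581] gettimeofday({1017908464, 501933}, NULL)             = 0 <0.000012>
"""

# timestamp, [instruction pointer], syscall(args) = return <duration>
LINE_RE = re.compile(
    r"^(\d+):(\d+):(\d+)\.(\d+)\s+\[[0-9a-f]+\]\s+(\w+)\(.*\)\s+=\s+(-?\d+)\s+<([\d.]+)>"
)


def parse(text):
    """Return a list of (start_microseconds, syscall_name, duration_seconds)."""
    calls = []
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue  # skip lines that are not in the expected layout
        hours, minutes, seconds, frac, name, _ret, duration = m.groups()
        # Normalise the fractional part to exactly six digits (microseconds).
        micros = int(frac.ljust(6, "0")[:6])
        start = ((int(hours) * 60 + int(minutes)) * 60 + int(seconds)) * 1_000_000 + micros
        calls.append((start, name, float(duration)))
    return calls


def gaps_us(calls):
    """Microseconds between the start of each call and the start of the next."""
    return [later[0] - earlier[0] for earlier, later in zip(calls, calls[1:])]


calls = parse(SAMPLE)
print(sorted({name for _, name, _ in calls}))  # which syscalls appear
print(gaps_us(calls))                          # prints [125, 125]
```

For these lines the calls start about 125 microseconds apart and nothing but gettimeofday() appears between them, which matches a process spinning in user space rather than sleeping in a blocking call.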

Searching around for another report/solution for this problem turned up this relevant link at xfree86.org:

http://www.xfree86.org/pipermail/xpert/2001-January/004736.html

This thread shows that another person had their X server hang, likely because the system time changed underneath the X server, causing the X protocol messages to have inconsistent timestamps.  In their case it was XFree86 4.0.2 looping in the WaitForSomething function, called from Dispatch().

In my case I am running DJB's clockspeed to sync the clock.  It seems possible to me that clockspeed corrected some minor drift in the system time and caused some sort of internal race within the X server process.  Just my guess.

We do run clockadj to manually reset the system time, but that runs out of cron.daily which doesn't kick off until 04:02 AM, and according to gkrellm the X server stopped at 02:00:31 AM.
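
The suspected mechanism can be illustrated with a small simulation. This is only a sketch of the general failure mode, not XFree86 code: it assumes a loop that computes a deadline from the wall clock and then polls that clock until the deadline passes. All names (`FakeClock`, `polls_until_deadline`, `run`) and all numbers (1 ms per poll, a 2000 ms backward step at the fifth poll) are invented for the example, and the clocks are simulated in integer milliseconds so the result is deterministic.

```python
class FakeClock:
    """A hand-driven clock, in integer milliseconds."""

    def __init__(self, start_ms):
        self.now = start_ms

    def __call__(self):
        return self.now


def polls_until_deadline(read_clock, advance, timeout_ms, max_polls=1_000_000):
    """Poll read_clock until timeout_ms appears to have elapsed.

    advance(poll) is called once per poll to move simulated time forward.
    Returns how many polls were needed (capped at max_polls).
    """
    deadline = read_clock() + timeout_ms
    for poll in range(1, max_polls + 1):
        advance(poll)
        if read_clock() >= deadline:
            return poll
    return max_polls


def run(clock_kind, step_back_ms=2000, step_at_poll=5, timeout_ms=10):
    """Simulate a 10 ms wait during which the wall clock is stepped backward."""
    wall = FakeClock(1_017_908_464_000)  # wall-clock time, can be stepped
    mono = FakeClock(0)                  # monotonic time, never stepped

    def advance(poll):
        wall.now += 1                    # each poll costs 1 ms of real time
        mono.now += 1
        if poll == step_at_poll:
            wall.now -= step_back_ms     # something resets the system time

    clock = wall if clock_kind == "wall" else mono
    return polls_until_deadline(clock, advance, timeout_ms)


print(run("mono"))  # prints 10
print(run("wall"))  # prints 2010
```

In this model a 2 second backward step turns a 10 ms wait into a 2010 ms spin when the deadline is tied to the wall clock, while the same loop driven by a monotonic clock is unaffected. Whether this is what actually happened inside the X server here is not established by this report.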

ps output:

root      6313  2.9  6.7 113820 17232 ?      R<   Mar26 384:25 /etc/X11/X :0 -auth /var/gdm/:0.Xauth

Note:  I run the X server at nice -10, I also run artsd at nice -20, and I was playing MP3s over NFS when this happened.  It's all Rob Zombie's fault 8^).

root      5396  0.0  0.1  1352  320 pts/9    S    02:32   0:00 supervise /var/service/clockspeed
root      5398  0.0  0.1  1348  300 pts/9    S    02:32   0:00 /etc/clockspeed/bin/clockspeed

Note:  I restarted clockspeed myself while diagnosing this problem.

Version-Release number of selected component (if applicable):

XFree86-4.1.0-15
kernel-2.4.9-21 (I have 2.4.9-31 installed but I haven't rebooted yet)

How Reproducible:

Not sure, probably putz with the system time while X is running.  I may try an experiment but I also want to go home and catch some ZZzzzz's. 8^)
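
For such an experiment it helps to log exactly when the system time was stepped, so a hang can be matched against it. The following is a minimal sketch, assuming a host where Python's `time.monotonic()` is available; the function names `find_steps` and `sample_clocks` and the 0.5 second threshold are arbitrary choices for illustration.

```python
import time


def find_steps(samples, threshold=0.5):
    """Return (index, jump_seconds) for each wall-clock step in samples.

    samples is a sequence of (monotonic, wall) readings taken together.
    A step is reported when, between two consecutive readings, the wall
    clock moved at least `threshold` seconds more or less than the
    monotonic clock did.
    """
    steps = []
    for i in range(1, len(samples)):
        mono_delta = samples[i][0] - samples[i - 1][0]
        wall_delta = samples[i][1] - samples[i - 1][1]
        jump = wall_delta - mono_delta
        if abs(jump) >= threshold:
            steps.append((i, jump))
    return steps


def sample_clocks(count, interval):
    """Take `count` paired readings, `interval` seconds apart."""
    readings = []
    for _ in range(count):
        readings.append((time.monotonic(), time.time()))
        time.sleep(interval)
    return readings


# Synthetic readings: the wall clock is set back 2 s between samples 1 and 2.
synthetic = [(0.0, 100.0), (1.0, 101.0), (2.0, 100.0), (3.0, 101.0)]
print(find_steps(synthetic))  # prints [(2, -2.0)]
```

On a live system, something like `find_steps(sample_clocks(60, 1.0))` could be left running in another terminal while the system time is changed by hand.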

Steps to Reproduce:
1. 
2. 
3. 

Actual Results:

The X server hangs.  The mouse works, but the server does not accept input, running programs cannot output anything, and any new programs that try to talk to the X server just sit there looking dumb.

Expected Results:

It should keep on trucking.

Additional Information:
	
URLs and file snippets listed above
Comment 1 Mark Tinberg 2002-04-04 04:38:10 EST
Created attachment 52166
Strace summary
Comment 2 Mike A. Harris 2002-04-19 18:10:13 EDT
This problem is often reported, and is common to all releases of XFree86
and X11.  The problem only occurs of course when someone or some event
changes the time out from underneath X.  It is indeed considered a bug, but
it is something that the upstream X and XFree86 maintainers should be made
aware of, and something they should address.  This sort of problem should
be fixed by someone extremely familiar with the areas of code which cause
the problem.

As such, I suggest reporting the problem upstream in hopes it gets fixed
in a future release, and hopefully fixes make it into current releases
as well.  Due to my unfamiliarity with the code, I'll leave the bug report
open, and apply any fixes upstream can provide once they've fixed the
problem.

Comment 3 Mark Tinberg 2002-04-19 21:41:27 EDT
Thank you very much for your response and your candor.  I filed it with the 
bug form at http://www.xfree86.org/cgi-bin/bugform.cgi.  Unfortunately they 
don't seem to hand out serial numbers for bug reports.

Thanks

--Mark
Comment 4 Mike A. Harris 2002-12-19 05:05:55 EST
Please read the bug report I'm duping this against, and follow testing
instructions at the bottom of it.  All followups in the dupe report please,
thanks.

*** This bug has been marked as a duplicate of 63509 ***
Comment 5 Red Hat Bugzilla 2006-02-21 13:48:40 EST
Changed to 'CLOSED' state since 'RESOLVED' has been deprecated.
