Bug 849587 - exceptions.OverflowError: timeout is too large
exceptions.OverflowError: timeout is too large
Status: CLOSED DUPLICATE of bug 639255
Product: Beaker
Classification: Community
Component: beah (Show other bugs)
0.9
Unspecified Unspecified
unspecified Severity unspecified (vote)
: ---
: ---
Assigned To: Bill Peck
:
Depends On:
Blocks:
  Show dependency treegraph
 
Reported: 2012-08-20 05:40 EDT by Jan Stancek
Modified: 2014-08-12 00:34 EDT (History)
8 users (show)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2012-09-14 08:55:14 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)

  None (edit)
Description Jan Stancek 2012-08-20 05:40:22 EDT
Description of problem:
While running RHEL7 recipe, harness appeared to be stuck in a loop and kept emitting following error to console.log:

2012-08-20 03:34:49,093 rhts_task.twisted emit: ERROR Unhandled Error 
Traceback (most recent call last): 
  File "/bin/beah-rhts-task", line 9, in <module> 
    load_entry_point('beah==0.6.40.dev201208031240', 'console_scripts', 'beah-rhts-task')() 
  File "/usr/lib/python2.7/site-packages/beah/tasks/rhts_xmlrpc.py", line 677, in main 
    reactor.run() 
  File "/usr/lib64/python2.7/site-packages/twisted/internet/base.py", line 1169, in run 
    self.mainLoop() 
--- <exception caught here> --- 
  File "/usr/lib64/python2.7/site-packages/twisted/internet/base.py", line 1181, in mainLoop 
    self.doIteration(t) 
  File "/usr/lib64/python2.7/site-packages/twisted/internet/epollreactor.py", line 362, in doPoll 
    l = self._poller.poll(timeout, len(self._selectables)) 
exceptions.OverflowError: timeout is too large 

After ~30min I rebooted host from WEBUI and recipe started to progress.

Version-Release number of selected component (if applicable):
rhts-test-env.noarch 0:4.51-1.el7
RHEL-7.0-20120711.2 Server ppc64

How reproducible:
sporadic

Steps to Reproduce:
1. provision RHEL-7.0-20120711.2 Server ppc64

Actual results:
harness was emitting same error for ~30min (presumably stuck)

Expected results:
harness should not get stuck and keep filling logs with this message

Additional info:
Comment 2 Marian Csontos 2012-09-14 08:51:30 EDT
Likely the problem is related to "unstable" system time:

    02/14/14 05:40:09  testID:7516670 start: 
    2012-08-20 03:34:49,093 rhts_task.twisted emit: ERROR Unhandled Error 
    ...

So the time was shift 2 years back. Apparently it is too much for select/epoll...

We had seen the issue in other job where the console log got stuck on:

    09/14/12 12:42:55  testID:7940822 start: 
    2012-09-14 12:42:55,774 backend _send_cmd: INFO Command ['Command', 'forward', 'efec2e09-190b-462b-a349-188d3c2a25a0', {'event': ['Event', 'extend_watchdog', '5ab5aa1a-dc94-4be2-aa39-7b4781eaed1b', {'source': 'beah_beaker_backend', 'id': 'c5b0053b-1a4a-4b79-8d1b-eec56280399f'}, None, {'timeout': 2994}]}] sent. 
    [-- MARK -- Fri Sep 14 04:45:00 2012] 
    2012-09-14 12:45:08,723 rhts_task checkin_finish: INFO resetting nohup 
    09/14/12 12:45:08  testID:7940822 finish: 
    2012-09-14 12:45:08,772 rhts_task task_exited: INFO task_exited([Failure instance: Traceback (failure with no frames): <class 'twisted.internet.error.ProcessDone'>: A process has ended without apparent errors: process finished with exit code 0. 
]) 
    2012-09-14 12:45:08,773 rhts_task on_exit: INFO quitting... 
    2012-09-14 12:45:08,773 rhts_task task_ended: INFO task_ended([Failure instance: Traceback (failure with no frames): <class 'twisted.internet.error.ProcessDone'>: A process has ended without apparent errors: process finished with exit code 0. 
]) 

`beah-rhts-task` process was hang here, waiting for "1 second", but the time was shifted few hours back during the second, so it was actually waiting for few hours.

Shifting time 4 hours to the future has unstucked the job.

*Solution?*

twisted handles delayedCall using system time (time.time) - see _timeFunction in twisted.python.runtime.

Not sure changing the _timeFunction for unix systems to use uptime would be safe.

Function getting uptime in seconds has to be (somehow?) passed to:
- all DelayedCall (easy: we could pass it as seconds argument?)[1]
  - but it does not deal with any internal ocurences
- all reactor.callLater calls :-/
  - by setting reactor.seconds to the function?

Also time.time is used elsewhere to get time-delta.

[1] http://twistedmatrix.com/documents/8.1.0/api/twisted.internet.base.DelayedCall.html#__init__
Comment 3 Marian Csontos 2012-09-14 08:55:14 EDT

*** This bug has been marked as a duplicate of bug 639255 ***

Note You need to log in before you can comment on or make changes to this bug.