Bug 849587
| Summary: | exceptions.OverflowError: timeout is too large | ||
|---|---|---|---|
| Product: | [Retired] Beaker | Reporter: | Jan Stancek <jstancek> |
| Component: | beah | Assignee: | Bill Peck <bpeck> |
| Status: | CLOSED DUPLICATE | QA Contact: | |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 0.9 | CC: | bpeck, dcallagh, gozen, jburke, kbaker, mcsontos, pbunyan, rmancy |
| Target Milestone: | --- | ||
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | Bug Fix | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2012-09-14 12:55:14 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Likely the problem is related to "unstable" system time:
02/14/14 05:40:09 testID:7516670 start:
2012-08-20 03:34:49,093 rhts_task.twisted emit: ERROR Unhandled Error
...
So the time was shift 2 years back. Apparently it is too much for select/epoll...
We had seen the issue in other job where the console log got stuck on:
09/14/12 12:42:55 testID:7940822 start:
2012-09-14 12:42:55,774 backend _send_cmd: INFO Command ['Command', 'forward', 'efec2e09-190b-462b-a349-188d3c2a25a0', {'event': ['Event', 'extend_watchdog', '5ab5aa1a-dc94-4be2-aa39-7b4781eaed1b', {'source': 'beah_beaker_backend', 'id': 'c5b0053b-1a4a-4b79-8d1b-eec56280399f'}, None, {'timeout': 2994}]}] sent.
[-- MARK -- Fri Sep 14 04:45:00 2012]
2012-09-14 12:45:08,723 rhts_task checkin_finish: INFO resetting nohup
09/14/12 12:45:08 testID:7940822 finish:
2012-09-14 12:45:08,772 rhts_task task_exited: INFO task_exited([Failure instance: Traceback (failure with no frames): <class 'twisted.internet.error.ProcessDone'>: A process has ended without apparent errors: process finished with exit code 0.
])
2012-09-14 12:45:08,773 rhts_task on_exit: INFO quitting...
2012-09-14 12:45:08,773 rhts_task task_ended: INFO task_ended([Failure instance: Traceback (failure with no frames): <class 'twisted.internet.error.ProcessDone'>: A process has ended without apparent errors: process finished with exit code 0.
])
`beah-rhts-task` process was hang here, waiting for "1 second", but the time was shifted few hours back during the second, so it was actually waiting for few hours.
Shifting time 4 hours to the future has unstucked the job.
*Solution?*
twisted handles delayedCall using system time (time.time) - see _timeFunction in twisted.python.runtime.
Not sure changing the _timeFunction for unix systems to use uptime would be safe.
Function getting uptime in seconds has to be (somehow?) passed to:
- all DelayedCall (easy: we could pass it as seconds argument?)[1]
- but it does not deal with any internal ocurences
- all reactor.callLater calls :-/
- by setting reactor.seconds to the function?
Also time.time is used elsewhere to get time-delta.
[1] http://twistedmatrix.com/documents/8.1.0/api/twisted.internet.base.DelayedCall.html#__init__
*** This bug has been marked as a duplicate of bug 639255 *** |
Description of problem: While running RHEL7 recipe, harness appeared to be stuck in a loop and kept emitting following error to console.log: 2012-08-20 03:34:49,093 rhts_task.twisted emit: ERROR Unhandled Error Traceback (most recent call last): File "/bin/beah-rhts-task", line 9, in <module> load_entry_point('beah==0.6.40.dev201208031240', 'console_scripts', 'beah-rhts-task')() File "/usr/lib/python2.7/site-packages/beah/tasks/rhts_xmlrpc.py", line 677, in main reactor.run() File "/usr/lib64/python2.7/site-packages/twisted/internet/base.py", line 1169, in run self.mainLoop() --- <exception caught here> --- File "/usr/lib64/python2.7/site-packages/twisted/internet/base.py", line 1181, in mainLoop self.doIteration(t) File "/usr/lib64/python2.7/site-packages/twisted/internet/epollreactor.py", line 362, in doPoll l = self._poller.poll(timeout, len(self._selectables)) exceptions.OverflowError: timeout is too large After ~30min I rebooted host from WEBUI and recipe started to progress. Version-Release number of selected component (if applicable): rhts-test-env.noarch 0:4.51-1.el7 RHEL-7.0-20120711.2 Server ppc64 How reproducible: sporadic Steps to Reproduce: 1. provision RHEL-7.0-20120711.2 Server ppc64 Actual results: harness was emitting same error for ~30min (presumably stuck) Expected results: harness should not get stuck and keep filling logs with this message Additional info: