Description of problem:
The Beaker job is aborted when a test rpm is not downloaded.

Version-Release number of selected component (if applicable):
Version 0.11.3

How reproducible:
Rarely

Steps to Reproduce:
1. Sometimes (for reasons unknown to me) yum doesn't download the test rpm:
https://beaker.engineering.redhat.com/recipes/844894#task11845642
I don't know why the rpm cannot be downloaded. Someone may have uploaded a newer version of the task rpm in the meantime, but Beaker shouldn't abort the whole job because of that.

Actual results:
console.log
2006-12-31 10:00:40,254 backend.twisted emit: ERROR Unhandled Error
Traceback (most recent call last):
  File "/usr/bin/beah-beaker-backend", line 9, in <module>
    load_entry_point('beah==0.6.43.dev201303102204', 'console_scripts', 'beah-beaker-backend')()
  File "/usr/lib/python2.7/site-packages/beah/backends/beakerlc.py", line 2007, in main
    debug.runcall(reactor.run)
Fi ? wait_for_xmitr+0xa0/0xa0
[ 266
[-- MARK -- Wed Apr 10 03:55:00 2013]
[-- MARK -- Wed Apr 10 04:00:00 2013]
...
[-- MARK -- Wed Apr 10 06:10:00 2013]
[-- MARK -- Wed Apr 10 06:15:00 2013]
--------
job aborted

Expected results:
A single task may fail, but the whole job is not aborted.

Additional info:
https://beaker.engineering.redhat.com/recipes/842046#task11796119
https://beaker.engineering.redhat.com/recipes/844894
(In reply to comment #0)
> Steps to Reproduce:
> 1. Sometimes (for reasons unknown to me) yum doesn't download the test rpm:
> https://beaker.engineering.redhat.com/recipes/844894#task11845642
> I don't know why the rpm cannot be downloaded. Someone may have uploaded a
> newer version of the task rpm in the meantime, but Beaker shouldn't abort
> the whole job because of that.

The util-linux-ng package wasn't installed because it isn't present in the RHEL 7 tree you used. The package is called util-linux there:

$ repoquery --disablerepo=* --enablerepo=RHEL-7.0-20130306.0 --repofrompath=RHEL-7.0-20130306.0,http://download.eng.bos.redhat.com/rel-eng/RHEL-7.0-20130306.0/compose/Server/x86_64/os/ util-linux-ng
$ repoquery --disablerepo=* --enablerepo=RHEL-7.0-20130306.0 --repofrompath=RHEL-7.0-20130306.0,http://download.eng.bos.redhat.com/rel-eng/RHEL-7.0-20130306.0/compose/Server/x86_64/os/ util-linux
util-linux-0:2.22.1-2.4.el7.x86_64

But that didn't abort your job. The actual error seems to be here:

2013-04-09 21:49:25,697 backend async_proc: INFO Extending Watchdog for task 11845649 by 9000..
04/09/13 21:49:25 JobID:402207 Test:/CoreOS/vixie-cron/Regression/bug-232439_fail_on_first_Jan Response:1
2013-04-09 21:49:25,804 rhts_task checkin_start: INFO setting nohup
04/09/13 21:49:25 testID:11845649 start:
2006-12-31 09:56:00,282 backend.twisted emit: ERROR Unhandled Error
Traceback (most recent call last):
  File "/usr/bin/beah-beaker-backend", line 9, in <module>
    load_entry_point('beah==0.6.43.dev201303102204', 'console_scripts', 'beah-beaker-backend')()
  File "/usr/lib/python2.7/site-packages/beah/backends/beakerlc.py", line 2007, in main
    debug.runcall(reactor.run)
  File "/usr/lib/python2.7/site-packages/beah/core/debug.py", line 11, in runcall
    a_callable(*args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/twisted/internet/base.py", line 1169, in run
    self.mainLoop()
--- <exception caught here> ---
  File "/usr/lib64/python2.7/site-packages/twisted/internet/base.py", line 1181, in mainLoop
    self.doIteration(t)
  File "/usr/lib64/python2.7/site-packages/twisted/internet/epollreactor.py", line 362, in doPoll
    l = self._poller.poll(timeout, len(self._selectables))
exceptions.OverflowError: timeout is too large

The OverflowError repeats on every iteration of the reactor loop until the watchdog aborts the job. I'm not sure why this would happen. It looks like a harness bug, particularly since the same thing happened at the same point in your recipe on another system.

I also noticed a very large number of RAID and SCSI offline errors from the kernel in the console log for R:844894. Are those expected as part of the util-linux tests?
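The thread does not confirm a cause for the OverflowError, so the following is only a hypothesis. In the log, the timestamps jump from 2013-04-09 21:49:25 back to 2006-12-31 09:56:00 just as the vixie-cron test bug-232439_fail_on_first_Jan starts. That suggests (an assumption) that the test moves the system clock back. Suppose the Twisted reactor had a timed call scheduled against the old clock. Its poll timeout would then be years long. As we understand CPython 2.7's select.epoll.poll(), it converts the timeout to milliseconds, stores it in a C int, and raises OverflowError("timeout is too large") when the value does not fit. The sketch below does the arithmetic. It does not call select.epoll itself, because epoll is Linux-only and Python 3 uses a different check. The helper epoll_timeout_overflows is hypothetical and only mirrors that assumed range check.

```python
from datetime import datetime

# Largest value of a 32-bit C int (assumed to be the type used for the
# millisecond timeout in Python 2.7's select.epoll.poll()).
INT_MAX = 2**31 - 1


def epoll_timeout_overflows(timeout_seconds):
    """Hypothetical helper: mirrors the assumed Python 2.7 range check,
    which converts seconds to milliseconds and rejects values above INT_MAX."""
    return timeout_seconds * 1000.0 > INT_MAX


# Timestamps copied from the console log: the last entry before the error,
# and the first entry of the error after the clock apparently jumped back.
before_jump = datetime(2013, 4, 9, 21, 49, 25)
after_jump = datetime(2006, 12, 31, 9, 56, 0)

# If a timed call was scheduled just before the jump, after the jump it
# appears to be roughly this many seconds in the future.
gap_seconds = (before_jump - after_jump).total_seconds()

# Longest timeout (in days) that still fits in a C int of milliseconds.
max_timeout_days = INT_MAX / 1000.0 / 86400

print(gap_seconds)
print(epoll_timeout_overflows(gap_seconds))
print(round(max_timeout_days, 2))
```

Under this assumption, any backward clock jump of more than about 24.86 days would make every poll() call fail the same way. That would fit both the endless repetition and the failure recurring at the same task on a second system.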
Hi Petr, as per Dan's question above, could you provide a bit more info on the expected impact of the util-linux tests?
Hi, I think this is not caused by util-linux(-ng) on RHEL 7. Our whole team shares a set of tier tests with roughly 100 tests. While a job is being scheduled, a user may update one of the tests and bump its version several times. Then the whole job is aborted instead of just that one task failing. I will try it to be sure and let you know.
Petr, bug 880855 affected versions prior to Beaker 0.13 and could result in jobs failing due to new task versions being uploaded. That's not the bug covered by this issue though - we're interested in the OverflowError noted above.
I tried to reproduce this but didn't succeed. I ran the same sets of tests and it works now (J:506814 or J:506813). FYI, the util-linux(-ng) test cases do not expect any RAID/SCSI errors.
OK, we made a few reliability improvements to both beah and task repo creation over the last few releases, so it's quite plausible that this has been fixed since it was first encountered. Closing this one - please file a new bug report if you have anything similar recur.