All,

Notably, the hosts are s390/s390x.

best,
-pbunyan
Looking at the console log, it seems this is yet another case of the following exception:

2013-04-26 11:40:44,925 backend: ERROR Encoutnered problem while running task '12172480'.
Traceback (most recent call last):
  File "/usr/lib/python2.3/site-packages/beah/backends/beakerlc.py", line 510, in simple_recipe
    result = task.getResult()
  File "/usr/lib64/python2.3/site-packages/twisted/internet/defer.py", line 584, in getResult
    self.result.raiseException()
  File "/usr/lib64/python2.3/site-packages/twisted/python/failure.py", line 326, in raiseException
    raise self.type, self.value, self.tb
TypeError: exceptions must be classes, instances, or strings (deprecated), not type

which we have seen occasionally before. It happens because something is raising an exception inherited from object, which doesn't work in Python < 2.5. So there would have been a *real* error, but its details are lost now.

I've audited the beah source for exceptions inheriting from object, and I can't find any. It could be originating in Twisted. I also checked the server logs for a corresponding error that might have caused the exception on the harness side, but I couldn't find anything there either.

Right now I don't have any other ideas. I am going to try cloning your job, Paul, and assuming I can reproduce the beah crash then there might be some clues in the local beah logs.
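For illustration, the masking effect described above can be reproduced on a current Python 3 interpreter, where raising a class that does not derive from BaseException also ends in a TypeError. This is only a sketch of the mechanism, not beah or Twisted code: ObjectBasedError and reraise_like_failure are hypothetical names, and the Python 3 message differs from the Python 2.3 one in the log.

```python
class ObjectBasedError(object):
    """Hypothetical error type that inherits from object, not BaseException."""

    def __init__(self, message):
        self.message = message


def reraise_like_failure(exc_type, message):
    """Raise exc_type(message) and report what the caller actually sees.

    Loosely mimics a re-raise of a stored exception type and value.
    Returns a (type name, message) pair for whatever was caught.
    """
    try:
        raise exc_type(message)
    except Exception as caught:
        return type(caught).__name__, str(caught)


if __name__ == "__main__":
    # A proper exception class keeps its type and message.
    print(reraise_like_failure(RuntimeError, "connection refused"))
    # An object-based class is replaced by a TypeError, so the
    # original "connection refused" detail never reaches the log.
    print(reraise_like_failure(ObjectBasedError, "connection refused"))
```

In the second call the interpreter rejects the raise itself, so the caller only ever sees the TypeError and the underlying message is gone, which matches the symptom in the console log.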
(In reply to comment #3)
> Right now I don't have any other ideas. I am going to try cloning your job,
> Paul, and assuming I can reproduce the beah crash then there might be some
> clues in the local beah logs.

I cloned job 409857 three times and they all ran successfully without hitting this error. So I think it must be very intermittent, or else there is some other key to reproducing it which I am missing.

Paul, the next time you see this problem could you please grab the beah logs /var/log/beah* and output /mnt/testarea/beah* for me to have a look at?
Created attachment 775302 [details] beah.log
Created attachment 775303 [details] beah_beaker_backend.log
Created attachment 775304 [details] beah_forwarder_backend.log
Created attachment 775305 [details] mnt_testarea_beah_
Thanks for grabbing these logs, Paul. I was hoping something in there might shed more light, but unfortunately not.

I have checked the server logs at the time of the failure, and the call which the harness attempted never reached the server. I've requested a copy of the LC logs for the same time period to check there too, but I suspect this was actually a network connectivity problem. It's being misreported as TypeError on RHEL4 (probably RHEL3 as well) due to the different versions of Python and Twisted.

I will do some more digging to see if I can find any network-related exceptions in the Python 2.3 standard library which could cause this TypeError when raised by Twisted.
Realistically the best way to fix the exception reporting is probably just to build Python 2.6 for RHEL4 and start running beah in that instead (like we do for RHEL3 already).
I've split out bug 986153 to track migrating to running beah in a separately installed Python 2.6 runtime for both RHEL 4 and 5 (as we already do for RHEL 3). It's not an especially elegant solution, but on the upside it does raise the possibility of upgrading to a more recent version of Twisted as well, which could improve beah's IPv6 story.
The issue with

TypeError: exceptions must be classes, instances, or strings (deprecated), not type

is fixed in beah 0.7.5 with this commit:

https://git.beaker-project.org/cgit/beah/commit/?id=0304055c9a76a360b6e911ca76200e0fde8c75e9

However, that TypeError was just masking the real exception, which in this case was probably the same issue as bug 908354, namely that beah tasks would get into an inconsistent state depending on the order in which processes are killed during reboot. That is also fixed in beah 0.7.6.