Bug 732998
Summary: | test_io failure during build on PPC | ||||||
---|---|---|---|---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Karsten Hopp <karsten> | ||||
Component: | python | Assignee: | Dave Malcolm <dmalcolm> | ||||
Status: | CLOSED CURRENTRELEASE | QA Contact: | Fedora Extras Quality Assurance <extras-qa> | ||||
Severity: | high | Docs Contact: | |||||
Priority: | high | ||||||
Version: | 16 | CC: | dmalcolm, ivazqueznet, jonathansteffan | ||||
Target Milestone: | --- | ||||||
Target Release: | --- | ||||||
Hardware: | ppc64 | ||||||
OS: | Linux | ||||||
Whiteboard: | |||||||
Fixed In Version: | Doc Type: | Bug Fix | |||||
Doc Text: | Story Points: | --- | |||||
Clone Of: | Environment: | ||||||
Last Closed: | 2011-10-28 10:22:57 UTC | Type: | --- | ||||
Regression: | --- | Mount Type: | --- | ||||
Documentation: | --- | CRM: | |||||
Verified Versions: | Category: | --- | |||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
Cloudforms Team: | --- | Target Upstream Version: | |||||
Embargoed: | |||||||
Bug Depends On: | |||||||
Bug Blocks: | 718269 | ||||||
Attachments: |
|
Description
Karsten Hopp
2011-08-24 12:24:34 UTC
http://ppc.koji.fedoraproject.org/koji/taskinfo?taskID=278191 python-2.7.2-4.1.fc16 failed on ppc64: http://ppc.koji.fedoraproject.org/koji/getfile?taskID=278200&name=build.log which shows: test_destructor (test.test_io.CIOTest) ... make: *** [test] Alarm clock which appears to be a timeout, due to an unhandled SIGALRM. http://ppc.koji.fedoraproject.org/koji/taskinfo?taskID=278193 python3-3.2.1-2.1.fc16 failed on ppc: http://ppc.koji.fedoraproject.org/koji/getfile?taskID=278201&name=build.log which shows: [152/346] test_io test test_io failed -- multiple errors occurred; run in verbose mode for details ... snip .. [157/346/1] test_itertools /var/tmp/rpm-tmp.gab4vH: line 32: 3533 Alarm clock LD_LIBRARY_PATH=$ConfDir $ConfDir/python -m test.regrtest -x $EXCLUDED_TESTS which again appears to be a timeout; this is happening whilst test_itertools is running, not test_io. So there appear to be two things going on here: (i) a genuine failure seen within test_io, on ppc, at least (ii) timeouts of the build (possibly a stuck test, but I'm not convinced about that) I'm investigating (i), and I've added --verbose3 to the invocation in python3 in rawhide to better identify failing tests in future. As for (ii), is there anything different about the ppc.koji.fedoraproject.org that might be setting up timeouts? I'm able to reliably run test_io on python on the ppc64 test chroot given at https://bugzilla.redhat.com/show_bug.cgi?id=698726#c7 which makes me think that perhaps there's some kind of external timer killing the build. (In reply to comment #1) > http://ppc.koji.fedoraproject.org/koji/taskinfo?taskID=278191 > python-2.7.2-4.1.fc16 failed on ppc64: > http://ppc.koji.fedoraproject.org/koji/getfile?taskID=278200&name=build.log > which shows: > test_destructor (test.test_io.CIOTest) ... make: *** [test] Alarm clock > which appears to be a timeout, due to an unhandled SIGALRM. Looks like I may have snipped too much here, a test _is_ failing with test_io for python, but it looks like the SIGALRM kills it. test_io /builddir/build/BUILD/Python-2.7.2/Lib/test/test_io.py:2613: DeprecationWarning: classic int division 1/0 test test_io failed -- multiple errors occurred; run in verbose mode for details I tried a scratch build of python3 for ppc with verbosity turned up: http://ppc.koji.fedoraproject.org/koji/taskinfo?taskID=280329 This shows 4 specific errors inside test_io: the 4 variants of check_interrupted_write_retry, which was introduced in: http://hg.python.org/cpython/rev/675b48c4c88a in response to: http://bugs.python.org/issue10956 Those tests (and others in that file) are indeed wiring up handlers to SIGALRM, and calling signal.alarm() which could explain why we're seeing the default glibc SIGALRM handler in the test logs ("Alarm clock") I'm not sure exactly what the issue is, though. I ran another scratch build, this time of python, with --verbose $ koji -s http://ppc.koji.fedoraproject.org/kojihub --weburl=http://ppc.koji.fedoraproject.org/koji build dist-f16 --scratch python-2.7.2-8.with_verbose_tests.fc17.src.rpm Task info: http://ppc.koji.fedoraproject.org/koji/taskinfo?taskID=280359 This shows the same 4 tests failing as on the python3 scratch build in comment #2: http://ppc.koji.fedoraproject.org/koji/getfile?taskID=280360&name=build.log which is followed some time later by a SIGALRM - apparently during test_iterlen, though I suspect that some of the SIGALRM handling from test_io is "leaking", and killing that test. The specfile is testing the debug configuration, then would test the optimized build, but is erroring out on the failures on the debug configuration. I tried a scratch build without the debug configuration, just the optimized one: $ koji -s http://ppc.koji.fedoraproject.org/kojihub --weburl=http://ppc.koji.fedoraproject.org/koji build dist-f16 --scratch python-2.7.2-8.with_verbose_tests.without_debug_build.fc17.src.rpm Task info: http://ppc.koji.fedoraproject.org/koji/taskinfo?taskID=280362 The same 4 tests within test_io failed with an optimized build: http://ppc.koji.fedoraproject.org/koji/getfile?taskID=280364&name=build.log [This leads me to be reassured that the problem is not due to the patch for bug 698726) On the grounds that something else might be broken, I tried a scratch build with test_io disabled. It successfully ran to completion on both ppc/ppc64: python-2.7.2-8.with_verbose_tests.without_debug_build.without_test_io.fc17.src.rpm Task info: http://ppc.koji.fedoraproject.org/koji/taskinfo?taskID=280365 I then re-enabled test_io, and added some debug logging. python-2.7.2-8.with_verbose_tests.without_debug_build.with_verbose_test_io.2.fc17.src.rpm Task info: http://ppc.koji.fedoraproject.org/koji/taskinfo?taskID=280371 It failed, logging some more information on the failures. All four failing tests are variants of: check_interrupted_write_retry with slightly different parameters, and with Python vs C implementations of the underlying io library. This test case sets a callback to respond to SIGALRM, calls signal.alarm(1), and writes 1MB of data to a pipe. AIUI, PIPE_MAX is 64k on Linux, so the pipe ought to fill, and the writing call ought to block, leading the SIGALRM to be delivered about 1 second later. The test case is written to respond to the SIGALRM by triggering another SIGALRM 1 second later, which it responds to by starting a reader thread, which reads from the other end of the pipe. However, what appears to be happening is that the first SIGALRM handler is never called; the write call runs to completion, and claims it wrote 1MB of data (this assumes that I wrote my logging correctly). The test case then tracebacks when it waits for the reader thread to finish, because the reader thread was never started. Created attachment 521111 [details] Log from test run, with verbosity turned up I tried another version, with a patch to turn up the verbosity, and to only join() if the other thread was started, which works OK (albeit very verbosely) on my local builds: python-2.7.2-8.with_verbose_tests.without_debug_build.with_verbose_test_io.4.fc17.src.rpm Task info: http://ppc.koji.fedoraproject.org/koji/taskinfo?taskID=280422 Am attaching the log from the above, showing that indeed no data has been read from the pipes. In order to have an up-to-date ppc build, I'm applying a patch on ppc/ppc64 which disables those 4 test_io tests. Successful scratch build: http://ppc.koji.fedoraproject.org/koji/taskinfo?taskID=280473 Committed to "master" for python Rawhide/f17 as: http://pkgs.fedoraproject.org/gitweb/?p=python.git;a=commitdiff;h=e252525e39cf5e0755986490d367ddcf79430875 Building python-2.7.2-9.fc17 for dist-rawhide Task info: http://koji.fedoraproject.org/koji/taskinfo?taskID=3319638 (succeeded) Committed to "f16" for python as: http://pkgs.fedoraproject.org/gitweb/?p=python.git;a=commit;h=1e06160ce665a8500d35e8f7e736793187157e52 Building python-2.7.2-4.2.fc16 for f16-candidate Task info: http://koji.fedoraproject.org/koji/taskinfo?taskID=3319863 Committed to "master" for python3 Rawhide/f17 as: http://pkgs.fedoraproject.org/gitweb/?p=python3.git;a=commitdiff;h=28391cb8f7ca7456115eb2a853f0b57775e7d1eb Building python3-3.2.1-7.fc17 for dist-rawhide Task info: http://koji.fedoraproject.org/koji/taskinfo?taskID=3319885 Merged to "f16" for python3 as: http://pkgs.fedoraproject.org/gitweb/?p=python3.git;a=commit;h=72dcf3b6227ffeb687a202024d0d7749c1c5ea11 Building python3-3.2.1-2.2.fc16 for f16-candidate Task info: http://koji.fedoraproject.org/koji/taskinfo?taskID=3319887 python-2.7.2-4.2.fc16 and python3-3.2.1-2.2.fc16 built in F16 on PPC. http://ppc.koji.fedoraproject.org/mash/branched-20110902/ already includes these packages. Thanks Dave ! I'll leave this bugzilla open just in case you're still looking into the test_io failures. Feel free to close it if you're sure that the testcases were borked and it wasn't an issue in python itself. Marking as MODIFIED to note that we have up-to-date python and python3 builds with a workaround - albeit without a full analysis/fix of the issue. |