Bug 732998 - test_io failure during build on PPC
Summary: test_io failure during build on PPC
Alias: None
Product: Fedora
Classification: Fedora
Component: python
Version: 16
Hardware: ppc64
OS: Linux
Target Milestone: ---
Assignee: Dave Malcolm
QA Contact: Fedora Extras Quality Assurance
Depends On:
Blocks: F16Alphappc
TreeView+ depends on / blocked
Reported: 2011-08-24 12:24 UTC by Karsten Hopp
Modified: 2012-03-14 13:06 UTC (History)
3 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Last Closed: 2011-10-28 10:22:57 UTC
Type: ---

Attachments (Terms of Use)
Log from test run, with verbosity turned up (5.01 KB, text/plain)
2011-09-01 22:27 UTC, Dave Malcolm
no flags Details

Description Karsten Hopp 2011-08-24 12:24:34 UTC
Description of problem:
python and python3 both fail during the test_io self check:

test test_io failed -- multiple errors occurred; run in verbose mode for details
Re-running test 'test_io' in verbose mode
test_BufferedIOBase_destructor (test.test_io.CIOTest) ... ok
test_IOBase_destructor (test.test_io.CIOTest) ... ok
test_RawIOBase_destructor (test.test_io.CIOTest) ... ok
test_RawIOBase_read (test.test_io.CIOTest) ... ok
test_TextIOBase_destructor (test.test_io.CIOTest) ... ok
test_append_mode_tell (test.test_io.CIOTest) ... ok
test_array_writes (test.test_io.CIOTest) ... ok
test_buffered_file_io (test.test_io.CIOTest) ... ok
test_close_flushes (test.test_io.CIOTest) ... ok
test_closefd (test.test_io.CIOTest) ... ok
test_closefd_attr (test.test_io.CIOTest) ... ok
test_destructor (test.test_io.CIOTest) ... make: *** [test] Alarm clock

test test_io failed -- multiple errors occurred; run in verbose mode for details
[153/346/1] test_ioctl
test_ioctl skipped -- Neither the process group nor the session are attached to /dev/tty
[154/346/1] test_isinstance
[155/346/1] test_iter
[156/346/1] test_iterlen
[157/346/1] test_itertools
/var/tmp/rpm-tmp.gab4vH: line 32:  3533 Alarm clock             LD_LIBRARY_PATH=$ConfDir $ConfDir/python -m test.regrtest -x $EXCLUDED_TESTS

Actual results:

Comment 1 Dave Malcolm 2011-08-31 19:24:35 UTC
  python-2.7.2-4.1.fc16 failed on ppc64:
  which shows:
    test_destructor (test.test_io.CIOTest) ... make: *** [test] Alarm clock
  which appears to be a timeout, due to an unhandled SIGALRM.

  python3-3.2.1-2.1.fc16 failed on ppc:
  which shows:
    [152/346] test_io
    test test_io failed -- multiple errors occurred; run in verbose mode for details
    ... snip ..
    [157/346/1] test_itertools
    /var/tmp/rpm-tmp.gab4vH: line 32:  3533 Alarm clock             LD_LIBRARY_PATH=$ConfDir $ConfDir/python -m test.regrtest -x $EXCLUDED_TESTS
  which again appears to be a timeout; this is happening whilst test_itertools is running, not test_io.

So there appear to be two things going on here:
  (i) a genuine failure seen within test_io, on ppc, at least
  (ii) timeouts of the build (possibly a stuck test, but I'm not convinced about that)

I'm investigating (i), and I've added --verbose3 to the invocation in python3 in rawhide to better identify failing tests in future.

As for (ii), is there anything different about the ppc.koji.fedoraproject.org that might be setting up timeouts?

I'm able to reliably run test_io on python on the ppc64 test chroot given at 
which makes me think that perhaps there's some kind of external timer killing the build.

Comment 2 Dave Malcolm 2011-08-31 23:04:26 UTC
(In reply to comment #1)
> http://ppc.koji.fedoraproject.org/koji/taskinfo?taskID=278191
>   python-2.7.2-4.1.fc16 failed on ppc64:
>     http://ppc.koji.fedoraproject.org/koji/getfile?taskID=278200&name=build.log
>   which shows:
>     test_destructor (test.test_io.CIOTest) ... make: *** [test] Alarm clock
>   which appears to be a timeout, due to an unhandled SIGALRM.

Looks like I may have snipped too much here, a test _is_ failing with test_io for python, but it looks like the SIGALRM kills it.
/builddir/build/BUILD/Python-2.7.2/Lib/test/test_io.py:2613: DeprecationWarning: classic int division
test test_io failed -- multiple errors occurred; run in verbose mode for details

I tried a scratch build of python3 for ppc with verbosity turned up:

This shows 4 specific errors inside test_io: the 4 variants of check_interrupted_write_retry, which was introduced in:
in response to:

Those tests (and others in that file) are indeed wiring up handlers to SIGALRM, and calling
which could explain why we're seeing the default glibc SIGALRM handler in the test logs ("Alarm clock")

I'm not sure exactly what the issue is, though.

Comment 3 Dave Malcolm 2011-08-31 23:40:28 UTC
I ran another scratch build, this time of python, with --verbose
$ koji -s http://ppc.koji.fedoraproject.org/kojihub --weburl=http://ppc.koji.fedoraproject.org/koji build dist-f16 --scratch python-2.7.2-8.with_verbose_tests.fc17.src.rpm 
Task info: http://ppc.koji.fedoraproject.org/koji/taskinfo?taskID=280359

This shows the same 4 tests failing as on the python3 scratch build in comment #2:
which is followed some time later by a SIGALRM - apparently during test_iterlen, though I suspect that some of the SIGALRM handling from test_io is "leaking", and killing that test.

Comment 4 Dave Malcolm 2011-09-01 00:02:57 UTC
The specfile is testing the debug configuration, then would test the optimized build, but is erroring out on the failures on the debug configuration.

I tried a scratch build without the debug configuration, just the optimized one:
$ koji -s http://ppc.koji.fedoraproject.org/kojihub --weburl=http://ppc.koji.fedoraproject.org/koji build dist-f16 --scratch python-2.7.2-8.with_verbose_tests.without_debug_build.fc17.src.rpm 
Task info: http://ppc.koji.fedoraproject.org/koji/taskinfo?taskID=280362

The same 4 tests within test_io failed with an optimized build:

[This leads me to be reassured that the problem is not due to the patch for bug 698726)

Comment 5 Dave Malcolm 2011-09-01 15:31:39 UTC
On the grounds that something else might be broken, I tried a scratch build with test_io disabled.

It successfully ran to completion on both ppc/ppc64:
  Task info: http://ppc.koji.fedoraproject.org/koji/taskinfo?taskID=280365

I then re-enabled test_io, and added some debug logging.
  Task info: http://ppc.koji.fedoraproject.org/koji/taskinfo?taskID=280371

It failed, logging some more information on the failures.

All four failing tests are variants of:
with slightly different parameters, and with Python vs C implementations of the underlying io library.

This test case sets a callback to respond to SIGALRM, calls signal.alarm(1), and writes 1MB of data to a pipe.  AIUI, PIPE_MAX is 64k on Linux, so the pipe ought to fill, and the writing call ought to block, leading the SIGALRM to be delivered about 1 second later.

The test case is written to respond to the SIGALRM by triggering another SIGALRM 1 second later, which it responds to by starting a reader thread, which reads from the other end of the pipe.

However, what appears to be happening is that the first SIGALRM handler is never called; the write call runs to completion, and claims it wrote 1MB of data (this assumes that I wrote my logging correctly).

The test case then tracebacks when it waits for the reader thread to finish, because the reader thread was never started.

Comment 6 Dave Malcolm 2011-09-01 22:27:17 UTC
Created attachment 521111 [details]
Log from test run, with verbosity turned up

I tried another version, with a patch to turn up the verbosity, and to only join() if the other thread was started, which works OK (albeit very verbosely) on my local builds:
Task info: http://ppc.koji.fedoraproject.org/koji/taskinfo?taskID=280422

Am attaching the log from the above, showing that indeed no data has been read from the pipes.

Comment 7 Dave Malcolm 2011-09-01 22:27:47 UTC
In order to have an up-to-date ppc build, I'm applying a patch on ppc/ppc64 which disables those 4 test_io tests.

Successful scratch build:

Committed to "master" for python Rawhide/f17 as:
  Building python-2.7.2-9.fc17 for dist-rawhide
  Task info: http://koji.fedoraproject.org/koji/taskinfo?taskID=3319638 (succeeded)

Committed to "f16" for python as:
  Building python-2.7.2-4.2.fc16 for f16-candidate
  Task info: http://koji.fedoraproject.org/koji/taskinfo?taskID=3319863

Comment 8 Dave Malcolm 2011-09-01 22:50:47 UTC
Committed to "master" for python3 Rawhide/f17 as:
  Building python3-3.2.1-7.fc17 for dist-rawhide
  Task info: http://koji.fedoraproject.org/koji/taskinfo?taskID=3319885

Merged to "f16" for python3 as:
  Building python3-3.2.1-2.2.fc16 for f16-candidate
  Task info: http://koji.fedoraproject.org/koji/taskinfo?taskID=3319887

Comment 9 Karsten Hopp 2011-09-02 10:45:52 UTC
python-2.7.2-4.2.fc16 and python3-3.2.1-2.2.fc16 built in F16 on PPC.
http://ppc.koji.fedoraproject.org/mash/branched-20110902/ already includes these packages.

Thanks Dave !
I'll leave this bugzilla open just in case you're still looking into the test_io failures. Feel free to close it if you're sure that the testcases were borked and it wasn't an issue in python itself.

Comment 10 Dave Malcolm 2011-09-02 14:12:28 UTC
Marking as MODIFIED to note that we have up-to-date python and python3 builds with a workaround - albeit without a full analysis/fix of the issue.

Note You need to log in before you can comment on or make changes to this bug.