Bug 732998 - test_io failure during build on PPC
test_io failure during build on PPC
Status: CLOSED CURRENTRELEASE
Product: Fedora
Classification: Fedora
Component: python (Show other bugs)
16
ppc64 Linux
high Severity high
: ---
: ---
Assigned To: Dave Malcolm
Fedora Extras Quality Assurance
:
Depends On:
Blocks: F16Alphappc
  Show dependency treegraph
 
Reported: 2011-08-24 08:24 EDT by Karsten Hopp
Modified: 2012-03-14 09:06 EDT (History)
3 users (show)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2011-10-28 06:22:57 EDT
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)
Log from test run, with verbosity turned up (5.01 KB, text/plain)
2011-09-01 18:27 EDT, Dave Malcolm
no flags Details

  None (edit)
Description Karsten Hopp 2011-08-24 08:24:34 EDT
Description of problem:
python and python3 both fail during the test_io self check:

python-2.7.2-4.1.fc16:
test test_io failed -- multiple errors occurred; run in verbose mode for details
Re-running test 'test_io' in verbose mode
test_BufferedIOBase_destructor (test.test_io.CIOTest) ... ok
test_IOBase_destructor (test.test_io.CIOTest) ... ok
test_RawIOBase_destructor (test.test_io.CIOTest) ... ok
test_RawIOBase_read (test.test_io.CIOTest) ... ok
test_TextIOBase_destructor (test.test_io.CIOTest) ... ok
test_append_mode_tell (test.test_io.CIOTest) ... ok
test_array_writes (test.test_io.CIOTest) ... ok
test_buffered_file_io (test.test_io.CIOTest) ... ok
test_close_flushes (test.test_io.CIOTest) ... ok
test_closefd (test.test_io.CIOTest) ... ok
test_closefd_attr (test.test_io.CIOTest) ... ok
test_destructor (test.test_io.CIOTest) ... make: *** [test] Alarm clock

python3-3.2.1-2.1.fc16:
test test_io failed -- multiple errors occurred; run in verbose mode for details
[153/346/1] test_ioctl
test_ioctl skipped -- Neither the process group nor the session are attached to /dev/tty
[154/346/1] test_isinstance
[155/346/1] test_iter
[156/346/1] test_iterlen
[157/346/1] test_itertools
/var/tmp/rpm-tmp.gab4vH: line 32:  3533 Alarm clock             LD_LIBRARY_PATH=$ConfDir $ConfDir/python -m test.regrtest -x $EXCLUDED_TESTS

  
Actual results:
http://ppc.koji.fedoraproject.org/koji/taskinfo?taskID=278191
http://ppc.koji.fedoraproject.org/koji/taskinfo?taskID=278193
Comment 1 Dave Malcolm 2011-08-31 15:24:35 EDT
http://ppc.koji.fedoraproject.org/koji/taskinfo?taskID=278191
  python-2.7.2-4.1.fc16 failed on ppc64:
    http://ppc.koji.fedoraproject.org/koji/getfile?taskID=278200&name=build.log
  which shows:
    test_destructor (test.test_io.CIOTest) ... make: *** [test] Alarm clock
  which appears to be a timeout, due to an unhandled SIGALRM.

http://ppc.koji.fedoraproject.org/koji/taskinfo?taskID=278193
  python3-3.2.1-2.1.fc16 failed on ppc:
    http://ppc.koji.fedoraproject.org/koji/getfile?taskID=278201&name=build.log
  which shows:
    [152/346] test_io
    test test_io failed -- multiple errors occurred; run in verbose mode for details
    ... snip ..
    [157/346/1] test_itertools
    /var/tmp/rpm-tmp.gab4vH: line 32:  3533 Alarm clock             LD_LIBRARY_PATH=$ConfDir $ConfDir/python -m test.regrtest -x $EXCLUDED_TESTS
  which again appears to be a timeout; this is happening whilst test_itertools is running, not test_io.

So there appear to be two things going on here:
  (i) a genuine failure seen within test_io, on ppc, at least
  (ii) timeouts of the build (possibly a stuck test, but I'm not convinced about that)

I'm investigating (i), and I've added --verbose3 to the invocation in python3 in rawhide to better identify failing tests in future.

As for (ii), is there anything different about the ppc.koji.fedoraproject.org that might be setting up timeouts?

I'm able to reliably run test_io on python on the ppc64 test chroot given at 
  https://bugzilla.redhat.com/show_bug.cgi?id=698726#c7
which makes me think that perhaps there's some kind of external timer killing the build.
Comment 2 Dave Malcolm 2011-08-31 19:04:26 EDT
(In reply to comment #1)
> http://ppc.koji.fedoraproject.org/koji/taskinfo?taskID=278191
>   python-2.7.2-4.1.fc16 failed on ppc64:
>     http://ppc.koji.fedoraproject.org/koji/getfile?taskID=278200&name=build.log
>   which shows:
>     test_destructor (test.test_io.CIOTest) ... make: *** [test] Alarm clock
>   which appears to be a timeout, due to an unhandled SIGALRM.

Looks like I may have snipped too much here, a test _is_ failing with test_io for python, but it looks like the SIGALRM kills it.
test_io
/builddir/build/BUILD/Python-2.7.2/Lib/test/test_io.py:2613: DeprecationWarning: classic int division
  1/0
test test_io failed -- multiple errors occurred; run in verbose mode for details

I tried a scratch build of python3 for ppc with verbosity turned up:
  http://ppc.koji.fedoraproject.org/koji/taskinfo?taskID=280329

This shows 4 specific errors inside test_io: the 4 variants of check_interrupted_write_retry, which was introduced in:
  http://hg.python.org/cpython/rev/675b48c4c88a
in response to:
  http://bugs.python.org/issue10956

Those tests (and others in that file) are indeed wiring up handlers to SIGALRM, and calling
  signal.alarm()
which could explain why we're seeing the default glibc SIGALRM handler in the test logs ("Alarm clock")

I'm not sure exactly what the issue is, though.
Comment 3 Dave Malcolm 2011-08-31 19:40:28 EDT
I ran another scratch build, this time of python, with --verbose
$ koji -s http://ppc.koji.fedoraproject.org/kojihub --weburl=http://ppc.koji.fedoraproject.org/koji build dist-f16 --scratch python-2.7.2-8.with_verbose_tests.fc17.src.rpm 
Task info: http://ppc.koji.fedoraproject.org/koji/taskinfo?taskID=280359

This shows the same 4 tests failing as on the python3 scratch build in comment #2:
  http://ppc.koji.fedoraproject.org/koji/getfile?taskID=280360&name=build.log
which is followed some time later by a SIGALRM - apparently during test_iterlen, though I suspect that some of the SIGALRM handling from test_io is "leaking", and killing that test.
Comment 4 Dave Malcolm 2011-08-31 20:02:57 EDT
The specfile is testing the debug configuration, then would test the optimized build, but is erroring out on the failures on the debug configuration.

I tried a scratch build without the debug configuration, just the optimized one:
$ koji -s http://ppc.koji.fedoraproject.org/kojihub --weburl=http://ppc.koji.fedoraproject.org/koji build dist-f16 --scratch python-2.7.2-8.with_verbose_tests.without_debug_build.fc17.src.rpm 
Task info: http://ppc.koji.fedoraproject.org/koji/taskinfo?taskID=280362

The same 4 tests within test_io failed with an optimized build:
  http://ppc.koji.fedoraproject.org/koji/getfile?taskID=280364&name=build.log

[This leads me to be reassured that the problem is not due to the patch for bug 698726)
Comment 5 Dave Malcolm 2011-09-01 11:31:39 EDT
On the grounds that something else might be broken, I tried a scratch build with test_io disabled.

It successfully ran to completion on both ppc/ppc64:
  python-2.7.2-8.with_verbose_tests.without_debug_build.without_test_io.fc17.src.rpm
  Task info: http://ppc.koji.fedoraproject.org/koji/taskinfo?taskID=280365

I then re-enabled test_io, and added some debug logging.
  python-2.7.2-8.with_verbose_tests.without_debug_build.with_verbose_test_io.2.fc17.src.rpm
  Task info: http://ppc.koji.fedoraproject.org/koji/taskinfo?taskID=280371

It failed, logging some more information on the failures.

All four failing tests are variants of:
   check_interrupted_write_retry
with slightly different parameters, and with Python vs C implementations of the underlying io library.

This test case sets a callback to respond to SIGALRM, calls signal.alarm(1), and writes 1MB of data to a pipe.  AIUI, PIPE_MAX is 64k on Linux, so the pipe ought to fill, and the writing call ought to block, leading the SIGALRM to be delivered about 1 second later.

The test case is written to respond to the SIGALRM by triggering another SIGALRM 1 second later, which it responds to by starting a reader thread, which reads from the other end of the pipe.

However, what appears to be happening is that the first SIGALRM handler is never called; the write call runs to completion, and claims it wrote 1MB of data (this assumes that I wrote my logging correctly).

The test case then tracebacks when it waits for the reader thread to finish, because the reader thread was never started.
Comment 6 Dave Malcolm 2011-09-01 18:27:17 EDT
Created attachment 521111 [details]
Log from test run, with verbosity turned up

I tried another version, with a patch to turn up the verbosity, and to only join() if the other thread was started, which works OK (albeit very verbosely) on my local builds:
python-2.7.2-8.with_verbose_tests.without_debug_build.with_verbose_test_io.4.fc17.src.rpm
Task info: http://ppc.koji.fedoraproject.org/koji/taskinfo?taskID=280422

Am attaching the log from the above, showing that indeed no data has been read from the pipes.
Comment 7 Dave Malcolm 2011-09-01 18:27:47 EDT
In order to have an up-to-date ppc build, I'm applying a patch on ppc/ppc64 which disables those 4 test_io tests.

Successful scratch build:
  http://ppc.koji.fedoraproject.org/koji/taskinfo?taskID=280473

Committed to "master" for python Rawhide/f17 as:
  http://pkgs.fedoraproject.org/gitweb/?p=python.git;a=commitdiff;h=e252525e39cf5e0755986490d367ddcf79430875
  Building python-2.7.2-9.fc17 for dist-rawhide
  Task info: http://koji.fedoraproject.org/koji/taskinfo?taskID=3319638 (succeeded)

Committed to "f16" for python as:
  http://pkgs.fedoraproject.org/gitweb/?p=python.git;a=commit;h=1e06160ce665a8500d35e8f7e736793187157e52
  Building python-2.7.2-4.2.fc16 for f16-candidate
  Task info: http://koji.fedoraproject.org/koji/taskinfo?taskID=3319863
Comment 8 Dave Malcolm 2011-09-01 18:50:47 EDT
Committed to "master" for python3 Rawhide/f17 as:
  http://pkgs.fedoraproject.org/gitweb/?p=python3.git;a=commitdiff;h=28391cb8f7ca7456115eb2a853f0b57775e7d1eb
  Building python3-3.2.1-7.fc17 for dist-rawhide
  Task info: http://koji.fedoraproject.org/koji/taskinfo?taskID=3319885

Merged to "f16" for python3 as:
  http://pkgs.fedoraproject.org/gitweb/?p=python3.git;a=commit;h=72dcf3b6227ffeb687a202024d0d7749c1c5ea11
  Building python3-3.2.1-2.2.fc16 for f16-candidate
  Task info: http://koji.fedoraproject.org/koji/taskinfo?taskID=3319887
Comment 9 Karsten Hopp 2011-09-02 06:45:52 EDT
python-2.7.2-4.2.fc16 and python3-3.2.1-2.2.fc16 built in F16 on PPC.
http://ppc.koji.fedoraproject.org/mash/branched-20110902/ already includes these packages.

Thanks Dave !
I'll leave this bugzilla open just in case you're still looking into the test_io failures. Feel free to close it if you're sure that the testcases were borked and it wasn't an issue in python itself.
Comment 10 Dave Malcolm 2011-09-02 10:12:28 EDT
Marking as MODIFIED to note that we have up-to-date python and python3 builds with a workaround - albeit without a full analysis/fix of the issue.

Note You need to log in before you can comment on or make changes to this bug.