Bug 1045193 - python3 fails test_faulthandler test_gdb tests on aarch64
Summary: python3 fails test_faulthandler test_gdb tests on aarch64
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Fedora
Classification: Fedora
Component: glibc
Version: rawhide
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Siddhesh Poyarekar
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On: 1045187
Blocks: ARM64, F-ExcludeArch-aarch64
Reported: 2013-12-19 20:33 UTC by Peter Robinson
Modified: 2019-08-14 22:19 UTC (History)

Fixed In Version: python3-3.4.1-3.fc21
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-06-03 12:42:08 UTC


Attachments
Reproducer demonstrating broken behaviour of tcgetpgrp/tcsetpgrp and ioctl (415 bytes, text/plain)
2014-05-28 16:36 UTC, Bohuslav "Slavek" Kabrda


Links
System ID Priority Status Summary Last Updated
Python 21131 None None None Never

Description Peter Robinson 2013-12-19 20:33:15 UTC
python3-3.3.2-7.fc21 fails the test_faulthandler and test_gdb tests on aarch64

http://arm.koji.fedoraproject.org/koji/taskinfo?taskID=2189314

Ran 47 tests in 2.342s
OK (skipped=3)
343 tests OK.
2 tests failed:
    test_faulthandler test_gdb
2 tests altered the execution environment:
    test_site test_urllib2_localnet
26 tests skipped:
    test_codecmaps_cn test_codecmaps_hk test_codecmaps_jp
    test_codecmaps_kr test_codecmaps_tw test_curses test_devpoll
    test_ioctl test_kqueue test_msilib test_ossaudiodev test_pep277
    test_smtpnet test_socketserver test_startfile test_systemtap
    test_timeout test_tk test_ttk_guionly test_unicode_file
    test_urllib2net test_urllibnet test_winreg test_winsound
    test_xmlrpc_net test_zipfile64
4 skips unexpected on linux:
    test_ioctl test_systemtap test_tk test_ttk_guionly
[2377399 refs]

Comment 1 Dave Malcolm 2013-12-19 21:29:20 UTC
blc gave me access to the build chroot.

For "test_gdb", I saw similar noise from gdb:
  "Failed to read a valid object file image from memory."
as seen in bug 1045187, with the pretty-printers appearing to otherwise be functioning normally.

Comment 2 Dave Malcolm 2013-12-19 21:35:52 UTC
test_faulthandler failed thusly according to the build logs in comment #0:

======================================================================
FAIL: test_register_chain (test.test_faulthandler.FaultHandlerTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/builddir/build/BUILD/Python-3.3.2/Lib/test/test_faulthandler.py", line 588, in test_register_chain
    self.check_register(chain=True)
  File "/builddir/build/BUILD/Python-3.3.2/Lib/test/test_faulthandler.py", line 566, in check_register
    self.assertRegex(trace, regex)
AssertionError: Regex didn't match: '^Traceback \\(most recent call first\\):\n  File "<string>", line 7 in func\n  File "<string>", line 28 in <module>$' not found in 'Traceback (most recent call first):\n  File "<string>", line 7 in func\n  File "<string>", line 28 in <module>\npython: /builddir/build/BUILD/Python-3.3.2/Modules/gcmodule.c:332: update_refs: Assertion `gc->gc.gc_refs == (-3)\' failed.'
----------------------------------------------------------------------

However, on attempting to reproduce in the chroot, I get:

======================================================================
FAIL: test_register_chain (__main__.FaultHandlerTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/builddir/build/BUILD/Python-3.3.2/Lib/test/test_faulthandler.py", line 588, in test_register_chain
    self.check_register(chain=True)
  File "/builddir/build/BUILD/Python-3.3.2/Lib/test/test_faulthandler.py", line 572, in check_register
    self.assertEqual(exitcode, 0)
AssertionError: -11 != 0
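
A note on reading the exit code above: a negative value is how Python's subprocess module reports a child that was killed by a signal, so -11 means signal 11, which is SIGSEGV on Linux. The following is a minimal sketch of that decoding; the helper name describe_exit is hypothetical and is not part of the test suite.

```python
import signal


def describe_exit(returncode):
    """Turn a subprocess-style return code into a readable description."""
    if returncode < 0:
        # subprocess reports "killed by signal N" as -N.
        signum = -returncode
        try:
            name = signal.Signals(signum).name
        except ValueError:
            name = "unknown signal"
        return "killed by signal %d (%s)" % (signum, name)
    return "exited with status %d" % returncode


print(describe_exit(-11))
print(describe_exit(0))
```

Read this way, the chroot run shows the child interpreter crashing outright, whereas the build log shows it tripping a garbage collector assertion.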

Comment 3 Brendan Conoboy 2013-12-19 22:11:57 UTC
The test_gdb case is covered in 1045187, so this BZ specifically concerns the test_faulthandler issue.

Comment 4 Peter Robinson 2014-02-03 08:11:41 UTC
python-2.7.5-11.fc21 built fine with the 3.13 kernel and gcc-4.8.2-14.fc21

Comment 5 Peter Robinson 2014-02-03 10:39:12 UTC
Closed the wrong bug.

Comment 6 Peter Robinson 2014-03-17 14:07:24 UTC
So still seeing the test_faulthandler issue.

http://arm.koji.fedoraproject.org/koji/taskinfo?taskID=2254074

Ran 47 tests in 2.520s
OK (skipped=3)
344 tests OK.
1 test failed:
    test_faulthandler
2 tests altered the execution environment:
    test_site test_urllib2_localnet
26 tests skipped:
    test_codecmaps_cn test_codecmaps_hk test_codecmaps_jp
    test_codecmaps_kr test_codecmaps_tw test_curses test_devpoll
    test_ioctl test_kqueue test_msilib test_ossaudiodev test_pep277
    test_smtpnet test_socketserver test_startfile test_systemtap
    test_timeout test_tk test_ttk_guionly test_unicode_file
    test_urllib2net test_urllibnet test_winreg test_winsound
    test_xmlrpc_net test_zipfile64
4 skips unexpected on linux:
    test_ioctl test_systemtap test_tk test_ttk_guionly
[2380528 refs]

Comment 7 Bohuslav "Slavek" Kabrda 2014-03-28 09:50:46 UTC
I don't have access to an aarch64 machine, so I can't debug this. Is there a possibility of getting a testing machine for this?

Comment 8 Peter Robinson 2014-03-28 09:56:47 UTC
(In reply to Bohuslav "Slavek" Kabrda from comment #7)
> I don't have access to an aarch64 machine, so I can't debug this. Is there a
> possibility of getting a testing machine for this?

I believe there's access to devices via Beaker. Brendan, can you confirm this with Bohuslav please?

Comment 9 Brendan Conoboy 2014-03-28 14:19:22 UTC
Bohuslav, hardware is available, I will send you information.

Comment 10 Bohuslav "Slavek" Kabrda 2014-04-02 09:19:12 UTC
I managed to track this down a bit and created an upstream bug report - all the relevant information I've come up with so far is summarized there: http://bugs.python.org/issue21131

Comment 11 Peter Robinson 2014-04-25 16:37:11 UTC
Any status update on this? With the last scratch build I tried, I get the following on aarch64:

Ran 47 tests in 2.591s
OK (skipped=3)
343 tests OK.
2 tests failed:
    test_faulthandler test_sqlite
2 tests altered the execution environment:
    test_site test_urllib2_localnet
26 tests skipped:
    test_codecmaps_cn test_codecmaps_hk test_codecmaps_jp
    test_codecmaps_kr test_codecmaps_tw test_curses test_devpoll
    test_ioctl test_kqueue test_msilib test_ossaudiodev test_pep277
    test_smtpnet test_socketserver test_startfile test_systemtap
    test_timeout test_tk test_ttk_guionly test_unicode_file
    test_urllib2net test_urllibnet test_winreg test_winsound
    test_xmlrpc_net test_zipfile64
4 skips unexpected on linux:
    test_ioctl test_systemtap test_tk test_ttk_guionly
[2373339 refs]

Comment 12 Brendan Conoboy 2014-05-05 18:12:33 UTC
Slavek, any update?

Comment 13 Bohuslav "Slavek" Kabrda 2014-05-06 09:50:27 UTC
Hi Brendan,

- the test_sqlite failure has already been solved upstream and the fix for it will be part of the upcoming Python 3.4 (it's already being built in a Koji side tag and will be merged for F21)
- as for test_faulthandler, I think that no one actually knows what's going on there - not even upstream - so I guess the best thing to do here would be to disable the test on aarch64. If that's ok with you, we'll do it for Python 3.4, which will eventually get merged to F21.

Comment 14 Brendan Conoboy 2014-05-06 22:16:19 UTC
If it's okay with you it's okay with me.

Comment 15 Bohuslav "Slavek" Kabrda 2014-05-07 13:21:17 UTC
(In reply to Brendan Conoboy from comment #14)
> If it's okay with you it's okay with me.

Good. I'll let you know when we merge Python 3.4 with the fix and the disabled test into the Koji Rawhide tag.

Comment 16 Peter Robinson 2014-05-07 16:04:39 UTC
(In reply to Bohuslav "Slavek" Kabrda from comment #15)
> (In reply to Brendan Conoboy from comment #14)
> > If it's okay with you it's okay with me.
> 
> Good. I'll let you know when we merge Python 3.4 with the fix and disabled
> test into Koji Rawhide tag.

What's the timeframe for 3.4 landing in rawhide main repos?

Comment 17 Bohuslav "Slavek" Kabrda 2014-05-13 09:36:20 UTC
My current guess is by the end of the month. The problem is that although we announced it on the fedora-devel and python-devel mailing lists, most of the maintainers don't rebuild their packages.
While we don't require all packages to be rebuilt before we merge f21-python into Rawhide, we'd like to have at least the "important" ones (e.g. big frameworks, important build tools, etc) - that's a lot of packages to rebuild. So far it's been going fine, but we may still hit some obstacles, so I can't give you a better estimate right now, sorry.

Comment 18 Peter Robinson 2014-05-13 09:55:32 UTC
(In reply to Bohuslav "Slavek" Kabrda from comment #17)
> My current guess is by the end of month. The problem is that although we
> announced on fedora-devel and python-devel mailing lists, most of the
> maintainers don't rebuild their packages.
> While we don't require all packages to be rebuilt before we merge f21-python
> into Rawhide, we'd like to have at least the "important" ones (e.g. big
> frameworks, important build tools, etc) - that's a lot of packages to
> rebuild. So far it's been going fine, but we may still hit some obstacles,
> so I can't give you a better estimate right now, sorry.

The fact of the matter is what you have now is what you'll get from maintainers. You will need to do it yourself.... like pretty much all the other maintainers of the core of big stacks do. If you wait for them to do it the fact is it won't happen so just get on with it, the longer you wait the more in the main rawhide repos will change and the more screwed you'll be.

Comment 19 Bohuslav "Slavek" Kabrda 2014-05-13 10:12:54 UTC
(In reply to Peter Robinson from comment #18)
> (In reply to Bohuslav "Slavek" Kabrda from comment #17)
> > My current guess is by the end of month. The problem is that although we
> > announced on fedora-devel and python-devel mailing lists, most of the
> > maintainers don't rebuild their packages.
> > While we don't require all packages to be rebuilt before we merge f21-python
> > into Rawhide, we'd like to have at least the "important" ones (e.g. big
> > frameworks, important build tools, etc) - that's a lot of packages to
> > rebuild. So far it's been going fine, but we may still hit some obstacles,
> > so I can't give you a better estimate right now, sorry.
> 
> The fact of the matter is what you have now is what you'll get from
> maintainers. You will need to do it yourself.... like pretty much all the
> other maintainers of the core of big stacks do. If you wait for them to do
> it the fact is it won't happen so just get on with it, the longer you wait
> the more in the main rawhide repos will change and the more screwed you'll
> be.

I *am* doing it myself, that's exactly why I'm saying that I have a very bad time estimate.

Comment 20 Peter Robinson 2014-05-13 10:15:44 UTC
> I *am* doing it myself, that's exactly why I'm saying that I have a very bad
> time estimate.

Why aren't you using the mass rebuild scripts then and automating it? Ask the perl team what they use, or possibly the ruby team. There are ways and means to automate this.

Comment 21 Bohuslav "Slavek" Kabrda 2014-05-13 10:55:54 UTC
(In reply to Peter Robinson from comment #20)
> > I *am* doing it myself, that's exactly why I'm saying that I have a very bad
> > time estimate.
> 
> Why aren't you using the mass rebuild scripts then and automating it? Ask
> the perl team what they use, or possibly the ruby team. There's ways and
> means to automate this

There are also many circular dependencies in the Python stack, which means automated scripts aren't much help until certain packages are rebuilt. Once I manage to rebuild these, I'll use automated scripts.

Comment 22 Bohuslav "Slavek" Kabrda 2014-05-15 05:42:37 UTC
Just BTW, Dennis Gilmore announced that releng will be merging all Koji side tags on 2014-05-26 [1], because of the Fedora mass rebuild, so if we don't manage to do it sooner, this is the date.

[1] https://lists.fedoraproject.org/pipermail/devel-announce/2014-May/001404.html

Comment 23 Bohuslav "Slavek" Kabrda 2014-05-27 14:18:59 UTC
f21-python has just been merged to rawhide. Could you please re-test with python3-3.4.1-3.fc21?

Comment 24 Peter Robinson 2014-05-28 05:09:22 UTC
Retested, we've regressed. 4 test failures. Because of all the noarch packages built against 3.4 this is now blocking _ALL_ aarch64 builds and is now critical path. 

Ran 47 tests in 2.364s
OK (skipped=3)
358 tests OK.
4 tests failed:
    test_ensurepip test_faulthandler test_os test_venv
1 test altered the execution environment:
    test_site
26 tests skipped:
    test_codecmaps_cn test_codecmaps_hk test_codecmaps_jp
    test_codecmaps_kr test_codecmaps_tw test_curses test_devpoll
    test_ioctl test_kqueue test_msilib test_ossaudiodev test_pep277
    test_smtpnet test_socketserver test_startfile test_systemtap
    test_timeout test_tk test_ttk_guionly test_unicode_file
    test_urllib2net test_urllibnet test_winreg test_winsound
    test_xmlrpc_net test_zipfile64
error: Bad exit status from /var/tmp/rpm-tmp.CGoTsu (%check)
    Bad exit status from /var/tmp/rpm-tmp.CGoTsu (%check)
RPM build errors:
Child return code was: 1

http://arm.koji.fedoraproject.org/koji/taskinfo?taskID=2356557

Comment 25 Bohuslav "Slavek" Kabrda 2014-05-28 07:29:53 UTC
(In reply to Peter Robinson from comment #24)
> Retested, we've regressed. 4 test failures. Because of all the noarch
> packages built against 3.4 this is now blocking _ALL_ aarch64 builds and is
> now critical path. 
> 
> Ran 47 tests in 2.364s
> OK (skipped=3)
> 358 tests OK.
> 4 tests failed:
>     test_ensurepip test_faulthandler test_os test_venv

I've disabled test_faulthandler for now, it's reported in the linked Python upstream issue.
As for the others:
- test_ensurepip and test_venv are caused by the same root issue - Python doesn't find python3-pip and/or python3-setuptools package(s) where it should, I don't know why right now
- test_os failure seems to be new, I'll need to investigate

I'm working on this right now with the highest priority. I'll let you know as soon as I figure something out.

> 1 test altered the execution environment:
>     test_site
> 26 tests skipped:
>     test_codecmaps_cn test_codecmaps_hk test_codecmaps_jp
>     test_codecmaps_kr test_codecmaps_tw test_curses test_devpoll
>     test_ioctl test_kqueue test_msilib test_ossaudiodev test_pep277
>     test_smtpnet test_socketserver test_startfile test_systemtap
>     test_timeout test_tk test_ttk_guionly test_unicode_file
>     test_urllib2net test_urllibnet test_winreg test_winsound
>     test_xmlrpc_net test_zipfile64
> error: Bad exit status from /var/tmp/rpm-tmp.CGoTsu (%check)
>     Bad exit status from /var/tmp/rpm-tmp.CGoTsu (%check)
> RPM build errors:
> Child return code was: 1
> 
> http://arm.koji.fedoraproject.org/koji/taskinfo?taskID=2356557

Comment 26 Bohuslav "Slavek" Kabrda 2014-05-28 16:34:40 UTC
So, it seems that the test_os problem is caused by a regression in glibc: the test passes with the older glibc-2.18.90-17.fc21.aarch64 and fails with the current glibc-2.19.90-18.fc21.aarch64.

gdb debugging gave me this:
- the test calls os.tcgetpgrp() with an invalid file descriptor (*) (this is done on purpose to test proper errno)
- this calls CPython's posix_tcgetpgrp() function, which is just a simple wrapper over tcgetpgrp()
- tcgetpgrp() returns -1, but errno is still 0 when it should be EBADF; this makes the test fail and is an actual bug

I'm CCing the main glibc maintainer - could you please have a look at this? This seems to affect not only tcgetpgrp, but also tcsetpgrp and ioctl. Since (if I'm not mistaken) tcgetpgrp and tcsetpgrp call ioctl underneath, my guess is that the bug is in just one place, ioctl (not sure though).

I'm attaching a reproducer that demonstrates this behaviour when used with glibc-2.19.90-18.fc21.aarch64.

(*) invalid is, in this case, a descriptor of a closed file
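
For readers without the attachment, the check described in this comment can be approximated from Python. This is a minimal sketch, not the attached C reproducer and not the actual test_os code; the function name is hypothetical. It closes a descriptor, calls os.tcgetpgrp() on it, and reports the errno carried by the resulting OSError.

```python
import errno
import os


def tcgetpgrp_errno_on_closed_fd():
    """Call os.tcgetpgrp() on a closed descriptor and return the errno seen."""
    read_fd, write_fd = os.pipe()
    os.close(write_fd)
    os.close(read_fd)  # read_fd now names a closed, hence invalid, descriptor
    try:
        os.tcgetpgrp(read_fd)
    except OSError as exc:
        return exc.errno
    return None  # no exception at all would also be unexpected


seen = tcgetpgrp_errno_on_closed_fd()
print("errno seen:", seen, "expected:", errno.EBADF)
```

With a working C library the two values should match; with the regression described here, the expectation is that the reported errno would be 0 instead.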

Comment 27 Bohuslav "Slavek" Kabrda 2014-05-28 16:36:09 UTC
Created attachment 900077
Reproducer demonstrating broken behaviour of tcgetpgrp/tcsetpgrp and ioctl

Comment 28 Carlos O'Donell 2014-05-28 23:52:55 UTC
(In reply to Bohuslav "Slavek" Kabrda from comment #26)
> gdb debugging gave me this:
> - the test calls os.tcgetpgrp() with an invalid file descriptor (*) (this is
> done on purpose to test proper errno)
> - this calls cPython's posix_tcgetpgrp() function, which is just a simple
> wrapper over tcgetpgrp()
> - tcgetpgrp() returns -1, but errno is still set to 0 (but should be set to
> EBADF), which makes the test fail (and is actually a bug)

The tcgetpgrp() function is a thin wrapper around the __ioctl() function which is itself a syscall wrapper.

If downgrading glibc fixes the issue then it's more likely a problem with the syscall wrapper than the kernel (which I assume remained constant and is returning the right errno).

In the -18 release we pulled in Richard Henderson's changes to sysdep.h and that is likely the problem.

Richard, Would you be able to have a look at this?
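
The wrapper relationship described in this comment can be probed from Python as well. The sketch below assumes that, on Linux, tcgetpgrp() corresponds to the TIOCGPGRP ioctl request; it issues that request directly on a closed descriptor through fcntl.ioctl(). The function name is hypothetical.

```python
import errno
import fcntl
import os
import struct
import termios


def ioctl_errno_on_closed_fd():
    """Issue the TIOCGPGRP ioctl on a closed descriptor; return the errno seen."""
    read_fd, write_fd = os.pipe()
    os.close(write_fd)
    os.close(read_fd)  # read_fd is now invalid
    buf = struct.pack("i", 0)  # room for the process group id result
    try:
        fcntl.ioctl(read_fd, termios.TIOCGPGRP, buf)
    except OSError as exc:
        return exc.errno
    return None


print("errno seen:", ioctl_errno_on_closed_fd(), "expected:", errno.EBADF)
```

If the direct ioctl call and tcgetpgrp() show the same wrong errno, that points at the shared syscall wrapper rather than at either function individually.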

Comment 29 Peter Robinson 2014-05-29 15:59:27 UTC
For reference we're seeing the same issue with python2 
http://arm.koji.fedoraproject.org/koji/taskinfo?taskID=2357787

Comment 30 Matej Stuchlik 2014-06-02 08:49:34 UTC
(In reply to Peter Robinson from comment #24)
> Retested, we've regressed. 4 test failures. Because of all the noarch
> packages built against 3.4 this is now blocking _ALL_ aarch64 builds and is
> now critical path. 
> 
> Ran 47 tests in 2.364s
> OK (skipped=3)
> 358 tests OK.
> 4 tests failed:
>     test_ensurepip test_faulthandler test_os test_venv
> 1 test altered the execution environment:
>     test_site
> 26 tests skipped:
>     test_codecmaps_cn test_codecmaps_hk test_codecmaps_jp
>     test_codecmaps_kr test_codecmaps_tw test_curses test_devpoll
>     test_ioctl test_kqueue test_msilib test_ossaudiodev test_pep277
>     test_smtpnet test_socketserver test_startfile test_systemtap
>     test_timeout test_tk test_ttk_guionly test_unicode_file
>     test_urllib2net test_urllibnet test_winreg test_winsound
>     test_xmlrpc_net test_zipfile64
> error: Bad exit status from /var/tmp/rpm-tmp.CGoTsu (%check)
>     Bad exit status from /var/tmp/rpm-tmp.CGoTsu (%check)
> RPM build errors:
> Child return code was: 1
> 
> http://arm.koji.fedoraproject.org/koji/taskinfo?taskID=2356557

The original test_venv failure was likely caused by python-pip having been built against Python 3.3 in your repo, so rewheel couldn't find it and failed. It appears you have since rebuilt pip against Python 3.4, so that is no longer an issue. However, test_venv still fails because the rewheel patch was not updated to reflect the changed directory structure in Lib/ensurepip. I'll fix that today.

I've also added a note to the python3 spec about how to bootstrap Python 3.4 with the rewheel module; that should help us avoid the initial test_venv issue in the future.

Comment 31 Richard Henderson 2014-06-02 18:32:30 UTC
(In reply to Bohuslav "Slavek" Kabrda from comment #27)
> Created attachment 900077 [details]
> Reproducer demonstrating broken behaviour of tcgetpgrp/tcsetpgrp and ioctl

Presumably you meant

-  pid_t pgid = tcgetpgrp((int)fp);
+  pid_t pgid = tcgetpgrp(fileno(fp));

But either way, with glibc-2.17-55.9.sa1.3.aarch64 I get

$ ./a.out
tcgetpgrp returned: -1
errno is: 9

ioctl returned: -1
errno is: 9

which appears to be exactly what you were looking for.
It's certainly the same results as I get on x86_64.

Comment 32 Richard Henderson 2014-06-02 18:36:07 UTC
(In reply to Carlos O'Donell from comment #28)
> In the -18 release we pulled in Richard Henderson's changes to sysdep.h and
> that is likely the problem.

Pardon?  The -18 release was

* Wed Jul 31 2013 Siddhesh Poyarekar <siddhesh@redhat.com> - 2.17-18

Further, I have yet to push my sysdep.h changes to any RH branch, so
I'm not really certain which patches you are referring to...

Comment 33 Carlos O'Donell 2014-06-02 18:54:41 UTC
(In reply to Richard Henderson from comment #32)
> (In reply to Carlos O'Donell from comment #28)
> > In the -18 release we pulled in Richard Henderson's changes to sysdep.h and
> > that is likely the problem.
> 
> Pardon?  The -18 release was
> 
> * Wed Jul 31 2013 Siddhesh Poyarekar <siddhesh@redhat.com> - 2.17-18
> 
> Further, I have yet to push my sysdep.h changes to any RH branch, so
> I'm not really certain which patches to which you are referring...

Keep in mind this is rawhide, and -18 is:

* Mon May 26 2014 Siddhesh Poyarekar <siddhesh@redhat.com> - 2.19.90-18
- Sync with upstream master.
- Adjust rtkaio patches to build with upstream master.

The glibc team rebases the rawhide branches against upstream master on a weekly basis. Therefore you need not do anything to get your upstream patches into rawhide. I noted that -18 was the upstream rebase release which included your changes, and noted that those changes touched code in the errno handling paths. I haven't debugged any further.

Does that clarify the situation?

Comment 34 Richard Henderson 2014-06-02 19:09:08 UTC
Ah, wonderful.  We're now on the same page.

And yes, I broke ioctl here:

ca3cfa40c16ef34c74951a07a57cfcbcd58898b1

committed on May 25th, and fixed it here:

74f31c18593111725478a991b395ae45661985a3

committed on May 30th, which is after the 2.19.90-18 revision cited.
So in theory everything should be fixed in the next pull.

Comment 35 Peter Robinson 2014-06-02 19:15:34 UTC
> committed on May 30th, which is after the 2.19.90-18 revision cited.
> So in theory everything should be fixed in the next pull.

Can we expedite that pull? It's currently blocking all aarch64 builds

Comment 36 Carlos O'Donell 2014-06-02 19:49:29 UTC
(In reply to Peter Robinson from comment #35)
> > committed on May 30th, which is after the 2.19.90-18 revision cited.
> > So in theory everything should be fixed in the next pull.
> 
> Can we expedite that pull? It's currently blocking all aarch64 builds

I'm assigning to Siddhesh. We'll do the rawhide update on Wednesday after which a rebuild of python3 should just work. We'll work in the background to see if we can get this done sooner by promoting rth to packager so he can do it himself.

Comment 37 Siddhesh Poyarekar 2014-06-03 12:42:08 UTC
glibc rawhide has now been rebased to upstream master.

Comment 38 Victor Stinner 2019-08-14 22:19:07 UTC
This bug has been reported upstream and I just fixed it: https://bugs.python.org/issue21131 "test_faulthandler.test_register_chain fails on 64bit ppc/arm with kernel >= 3.10" So ppc64 is also affected, not only ARM.

It was a bug in the size of the stack that faulthandler allocates for its signal handlers: the stack was too small. Whether the bug triggers depends on the CPU model and the size of the FPU state.
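
For context, the scenario that test_register_chain exercises can be approximated as follows. This is a hedged sketch rather than the test's actual code: it assumes a POSIX system with SIGUSR1, and the names CHILD_CODE and run_chain_check are hypothetical. A child interpreter installs a Python-level handler, registers faulthandler on the same signal with chain=True, sends itself the signal, and exits with 0 only if the chained handler ran.

```python
import subprocess
import sys

# Script run in a child interpreter, so a crash cannot take down the caller.
CHILD_CODE = r"""
import faulthandler, os, signal, sys

def handler(signum, frame):
    handler.called = True
handler.called = False

signal.signal(signal.SIGUSR1, handler)
# chain=True asks faulthandler to call the previously installed handler
# after it has dumped the traceback.
faulthandler.register(signal.SIGUSR1, file=sys.stderr, chain=True)
os.kill(os.getpid(), signal.SIGUSR1)
sys.exit(0 if handler.called else 2)
"""


def run_chain_check():
    """Run the child script and return (exit code, captured stderr)."""
    proc = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    return proc.returncode, proc.stderr


returncode, stderr_text = run_chain_check()
print("exit code:", returncode)
print(stderr_text)
```

On an interpreter that includes the fix, the expectation is an exit code of 0 and a traceback dump on stderr; a negative exit code would indicate that the child was killed by a signal, as in comment 2.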

