Created attachment 439024 [details]
List of packages David Malcolm requested.

Description of problem:
When doing something like

reposync -c fedora.repo --repoid=fedora-source-13 --source

After the last package downloads I get a supposed 'core dump'

[fedora-source-13: 9152 of 9152 ] Downloading zzuf-0.13-1.fc13.src.rpm
zzuf-0.13-1.fc13.src.rpm | 456 kB 00:00
Fatal Python error: deallocating None
Aborted (core dumped)

No core seems to be dumped though.. not sure why.

Version-Release number of selected component (if applicable):
# rpm -qf /usr/bin/reposync /usr/bin/python
yum-utils-1.1.26-11.el6.noarch
python-2.6.5-3.el6.x86_64

How reproducible:
80% with EL6 beta2. Large downloads cause it to occur often. Short 10-30 package ones it does not occur.
Something is messing up the reference counts for the "None" singleton: either libpython, or an extension module; my guess is an extension module.

This is the tp_dealloc for PyNone_Type:

    static void
    none_dealloc(PyObject* ignore)
    {
        /* This should never get called, but we also don't want to SEGV if
         * we accidentally decref None out of existence.
         */
        Py_FatalError("deallocating None");
    }
This issue has been proposed when we are only considering blocker issues in the current Red Hat Enterprise Linux release. ** If you would still like this issue considered for the current release, ask your support representative to file as a blocker on your behalf. Otherwise ask that it be considered for the next Red Hat Enterprise Linux release. **
The None singleton normally has a fairly large reference count. For example, starting a python process under gdb shows this refcount at the first interactive prompt:

(gdb) p _Py_NoneStruct
$2 = {ob_refcnt = 877, ob_type = 0x77f2a0}

What I suspect is happening is that something isn't doing a Py_INCREF on None when it should, so when enough of these calls happen in a loop, None ends up with a lower refcount than it should have. At some point you're presumably issuing lots of Py_DECREF calls on the singleton, and this drives the count below one, hitting the Py_FatalError.

My hope is that a backtrace might suggest the rough location of the bug (by analogy between the cleanup and the setup).
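The suspected mechanism can be illustrated with a toy model. This is only a sketch in pure Python, not CPython's real accounting: ToyObject, incref, decref, buggy_reset, correct_reset and calls_until_fatal are all hypothetical names made up for the illustration, and the starting count of 877 is simply the value from the gdb session above. It assumes the caller discards each return value (one decref per call) and that nothing else touches the count.

```python
class FatalError(Exception):
    """Stand-in for Py_FatalError aborting the process."""


class ToyObject:
    """Toy stand-in for a C-level object: a name plus a reference count."""

    def __init__(self, name, refcnt):
        self.name = name
        self.refcnt = refcnt


def incref(obj):
    obj.refcnt += 1


def decref(obj):
    obj.refcnt -= 1
    if obj.refcnt == 0:
        # Mirrors none_dealloc(): reaching zero is treated as fatal.
        raise FatalError("deallocating " + obj.name)


def buggy_reset(none):
    # Hands the caller a reference without taking one first.
    return none


def correct_reset(none):
    # Takes a new reference before returning it.
    incref(none)
    return none


def calls_until_fatal(reset, start_refcnt, limit):
    """Return the call number at which the toy 'None' dies, or None if it survives."""
    none = ToyObject("None", start_refcnt)
    for n in range(1, limit + 1):
        result = reset(none)
        try:
            decref(result)  # the caller drops the returned reference
        except FatalError:
            return n
    return None


print(calls_until_fatal(buggy_reset, 877, 20000))    # -> 877
print(calls_until_fatal(correct_reset, 877, 20000))  # -> None
```

In a real interpreter many other references to None come and go, so the actual number of calls needed will differ from this idealised figure.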
Having a look at the obvious candidate, pycurl. I see that the .reset() function does:

    /* Last, free the options */
    for (i = 0; i < OPTIONS_SIZE; i++) {
        if (self->options[i] != NULL) {
            free(self->options[i]);
            self->options[i] = NULL;
        }
    }

    return Py_None;
}

...unlike every other usage which is the (I assume) correct:

    Py_INCREF(Py_None);
    return Py_None;

...urlgrabber calls .reset() for every download.
(In reply to comment #5)
> Having a look at the obvious candidate, pycurl. I see that the .reset()
> function does:
[snip]
> ...urlgrabber calls .reset() for every download.

Thanks!

Yes: a trivial reproducer:

>>> import pycurl
>>> c = pycurl.Curl()
>>> while True: c.reset()
...
Fatal Python error: deallocating None
Aborted (core dumped)

This is with:
python-pycurl-7.19.0-5.el6.x86_64
Fixed upstream in http://pycurl.cvs.sourceforge.net/viewvc/pycurl/pycurl/src/pycurl.c?r1=1.148&r2=1.149
Created attachment 439117 [details] One-liner version of the patch, plus a test case
I've attached a one-liner version of the patch, plus a test case.

Without this patch, it's trivial to crash python:

>>> import pycurl
>>> c = pycurl.Curl()
>>> for i in xrange(20000):
...     c.reset()
...
Fatal Python error: deallocating None
Aborted (core dumped)

With the patch, the test case passes (it takes a minute or so to run; we could tone down the number of iterations from 100000 to 20000 if that's an issue, but I don't think it should be).
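A non-crashing way to look for this kind of leak is to compare the refcount of None before and after a batch of calls. The helper below is a sketch, not the attached test case: refcount_drift is a hypothetical name, and it relies only on sys.getrefcount from the standard library. Note that on Python 3.12 and later None is immortal, so the reported drift is always 0 there; the check is only meaningful on older interpreters such as the python-2.6 build discussed here.

```python
import sys


def refcount_drift(func, iterations):
    """Call func() repeatedly and return the change in None's refcount.

    A correctly written function should leave the count roughly unchanged;
    a function that returns Py_None without Py_INCREF would be expected to
    lose about one reference per call.
    """
    before = sys.getrefcount(None)
    for _ in range(iterations):
        func()
    after = sys.getrefcount(None)
    return after - before


# A well-behaved callable that returns None should show little or no drift.
print(refcount_drift(lambda: None, 1000))

# Hypothetical usage against pycurl (requires pycurl to be installed):
#   import pycurl
#   c = pycurl.Curl()
#   print(refcount_drift(c.reset, 1000))
```

Keeping the iteration count well below the current refcount of None avoids triggering the fatal error while still making a per-call loss visible.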
urlgrabber's PyCurlFileObject calls "reset" in its _do_open method, so the reference issue happens once per file downloaded by yum.

So it appears that a "yum update" or "yum install" that downloads many thousands of packages will trip this error, but N may need to be very high.

Testing "debuginfo-install" with a breakpoint on do_curl_reset shows:

(gdb) p _Py_NoneStruct
$1 = {ob_refcnt = 23146, ob_type = 0x3b73385ca0}

Having said that, I'm not sure about the structure of the reference graph keeping ob_refcnt that high.
Fatal Python error: deallocating None

Program received signal SIGABRT, Aborted.
0x0000003fa2e329c5 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
64	  return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);
Missing separate debuginfos, use: debuginfo-install cyrus-sasl-lib-2.1.23-8.el6.x86_64 elfutils-libelf-0.148-1.el6.x86_64 file-libs-5.04-5.el6.x86_64 glib2-2.22.5-5.el6.x86_64 gpgme-1.1.8-3.el6.x86_64 keyutils-libs-1.4-1.el6.x86_64 krb5-libs-1.8.2-2.el6.x86_64 libacl-2.2.49-4.el6.x86_64 libattr-2.4.44-4.el6.x86_64 libcap-2.16-5.2.el6.x86_64 libcom_err-1.41.12-3.el6.x86_64 libcurl-7.19.7-16.el6.x86_64 libgcc-4.4.4-13.el6.x86_64 libgpg-error-1.7-3.el6.x86_64 libidn-1.18-2.el6.x86_64 libselinux-2.0.94-2.el6.x86_64 libssh2-1.2.2-7.el6.x86_64 libxml2-2.7.6-1.el6.x86_64 lua-5.1.4-4.1.el6.x86_64 nspr-4.8.4-2.el6.x86_64 nss-3.12.6-3.el6.x86_64 nss-softokn-3.12.4-19.el6.x86_64 nss-softokn-freebl-3.12.4-19.el6.x86_64 nss-util-3.12.6-1.el6.x86_64 openldap-2.4.19-15.el6.x86_64 popt-1.13-7.el6.x86_64 pygpgme-0.1-18.20090824bzr68.el6.x86_64 python-pycurl-7.19.0-5.el6.x86_64 rpm-libs-4.8.0-12.el6.x86_64 rpm-python-4.8.0-12.el6.x86_64 xz-libs-4.999.9-0.3.beta.20091007git.el6.x86_64 yum-metadata-parser-1.1.2-14.1.el6.x86_64
(gdb) where
#0  0x0000003fa2e329c5 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1  0x0000003fa2e341a5 in abort () at abort.c:92
#2  0x0000003fa52fad7e in Py_FatalError (msg=<value optimized out>) at Python/pythonrun.c:1661
#3  0x0000003fa528124b in PyDict_Clear (op=<value optimized out>) at Objects/dictobject.c:817
#4  0x0000003fa52f2f80 in PyImport_Cleanup () at Python/import.c:516
#5  0x0000003fa52fc3eb in Py_Finalize () at Python/pythonrun.c:438
#6  0x0000003fa5308ce8 in Py_Main (argc=<value optimized out>, argv=<value optimized out>) at Modules/main.c:596
#7  0x0000003fa2e1ec5d in __libc_start_main (main=0x400710 <main>, argc=6, ubp_av=0x7fffffffe658, init=<value optimized out>, fini=<value optimized out>, rtld_fini=<value optimized out>, stack_end=0x7fffffffe648) at libc-start.c:226
#8  0x0000000000400649 in _start ()
(In reply to comment #11)
> urlgrabber's PyCurlFileObject calls "reset" in its _do_open method, so the
> reference issue happens once per file downloaded by yum.
>
> So it appears that a "yum update" or "yum install" that downloads many
> thousands of packages will trip this error, but N may need to be very high.

I attempted to trigger this with a very large "yum install"; this was on a RHEL6 box within our internal "beaker" lab, with lots of repositories.

# yum install * --skip-broken
(snip)
Transaction Summary
================================================================================
Install    9042 Package(s)
Upgrade       0 Package(s)

Total download size: 7.7 G
Installed size: 25 G

During the downloads:
(gdb) p _Py_NoneStruct
$1 = {ob_refcnt = 365835, ob_type = 0x3b73385ca0}

at (1287/9042):
(gdb) p _Py_NoneStruct
$1 = {ob_refcnt = 364095, ob_type = 0x3b73385ca0}

at (2186/9042):
(gdb) p _Py_NoneStruct
$2 = {ob_refcnt = 362644, ob_type = 0x3b73385ca0}

at (4143/9042):
(gdb) p _Py_NoneStruct
$3 = {ob_refcnt = 359517, ob_type = 0x3b73385ca0}

at (5331/9042):
(gdb) p _Py_NoneStruct
$4 = {ob_refcnt = 357365, ob_type = 0x3b73385ca0}

at (8531/9042):
(gdb) p _Py_NoneStruct
$5 = {ob_refcnt = 351284, ob_type = 0x3b73385ca0}

So it's clearly slowly losing references to None.

As it happened, it failed with some 404s (misconfigured test repository), with "Error Downloading Packages:", and it managed to exit cleanly (with sane error messages).

I wonder if it's possible to trigger the fatal error when using mirrors: if you have, say, 15 mirrors, most of which are failing, does this dramatically increase the number of PyCurlFileObject._do_open calls (and thus "Curl.reset" calls)? That might be a situation in which the bug manifests.

Seth: does yum's mirror handling work this way?
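The gdb samples above can be turned into a rough loss rate. This is just arithmetic on the five readings that carry a package index (the first reading, taken "during the downloads", has no index and is left out); the names samples, loss_rates and overall_rate are made up for the sketch.

```python
# (package index, ob_refcnt of None) pairs taken from the gdb readings above.
samples = [
    (1287, 364095),
    (2186, 362644),
    (4143, 359517),
    (5331, 357365),
    (8531, 351284),
]


def loss_rates(points):
    """References to None lost per package between consecutive readings."""
    return [
        (r0 - r1) / (p1 - p0)
        for (p0, r0), (p1, r1) in zip(points, points[1:])
    ]


def overall_rate(points):
    """References to None lost per package between first and last reading."""
    (p0, r0), (p1, r1) = points[0], points[-1]
    return (r0 - r1) / (p1 - p0)


for rate in loss_rates(samples):
    print(round(rate, 2))
print(round(overall_rate(samples), 2))  # -> 1.77
```

Every interval shows more than one reference lost per package, which would fit more than one reset call per package (for example through retries), though other refcount activity in the process could also contribute; the data alone does not distinguish these. At about 1.77 references per package, a starting count in the hundreds of thousands would take on the order of 200,000 downloads to exhaust.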
The mirror handling is all inside urlgrabber. Yum just passes the list of mirrors to the mirrorgrab instance for that repo. It does retry more, so if you have a bad mirror you'd get that many more retries.
Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
When the reset() method was called, the number of references to the "Py_None" object was not counted properly. Consequent to this, Python could terminate unexpectedly with the following error message:

Fatal Python error: deallocating None
Aborted (core dumped)

With this update, the underlying source code has been modified to address this issue, and references to the "Py_None" object are now counted as expected.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-0295.html