Bug 1058297 - SystemError: error return without exception set
Summary: SystemError: error return without exception set
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: yum
Version: 7.0
Hardware: x86_64
OS: Unspecified
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: James Antill
QA Contact: Karel Srot
URL:
Whiteboard: abrt_hash:3cc8cff1107febe84a2e5e457c7...
Duplicates: 1061664
Depends On:
Blocks: 782468 1094654 1086308
Reported: 2014-01-27 13:23 UTC by Martin
Modified: 2014-09-15 00:04 UTC
CC List: 16 users

Fixed In Version: yum-3.4.3-118.el7
Doc Type: Known Issue
Doc Text:
Under certain rare circumstances, the anaconda installer does not interact correctly with yum and returns an error with no exception set. Since this issue occurs rarely, reattempt the installation to work around this problem. Alternatively, use the text mode installation, where this bug does not occur.
Clone Of:
Environment:
Last Closed: 2014-06-16 10:22:31 UTC
Target Upstream Version:


Attachments
File: anaconda-tb (382.24 KB, text/plain), 2014-01-27 13:23 UTC, Martin
File: anaconda.log (6.89 KB, text/plain), 2014-01-27 13:23 UTC, Martin
File: environ (404 bytes, text/plain), 2014-01-27 13:23 UTC, Martin
File: lsblk_output (1.45 KB, text/plain), 2014-01-27 13:23 UTC, Martin
File: nmcli_dev_list (1.78 KB, text/plain), 2014-01-27 13:23 UTC, Martin
File: os_info (495 bytes, text/plain), 2014-01-27 13:23 UTC, Martin
File: program.log (21.41 KB, text/plain), 2014-01-27 13:23 UTC, Martin
File: storage.log (58.73 KB, text/plain), 2014-01-27 13:23 UTC, Martin
File: syslog (99.03 KB, text/plain), 2014-01-27 13:23 UTC, Martin
File: ifcfg.log (1.65 KB, text/plain), 2014-01-27 13:23 UTC, Martin
File: packaging.log (111.99 KB, text/plain), 2014-01-27 13:23 UTC, Martin
anaconda-tb (345.23 KB, text/plain), 2014-03-12 07:56 UTC, Michal Kovarik
patch fixing the issue (998 bytes, patch), 2014-04-08 09:17 UTC, Vratislav Podzimek
anaconda.log (7.19 KB, text/plain), 2014-04-11 10:33 UTC, Peter Kotvan
anaconda-tb-PxxtnE (271.98 KB, text/plain), 2014-04-11 10:33 UTC, Peter Kotvan
anaconda-tb-VDWZWJ (270.84 KB, text/plain), 2014-04-11 10:34 UTC, Peter Kotvan
program.log (23.70 KB, text/plain), 2014-04-11 10:34 UTC, Peter Kotvan
storage.log (89.27 KB, text/plain), 2014-04-11 10:34 UTC, Peter Kotvan
storage.state (20.00 KB, application/octet-stream), 2014-04-11 10:34 UTC, Peter Kotvan
syslog (62.85 KB, text/plain), 2014-04-11 10:34 UTC, Peter Kotvan
proposed patch from the version 5 of the updates.img (940 bytes, patch), 2014-04-14 08:59 UTC, Vratislav Podzimek


Links
System            ID       Priority  Status  Summary  Last Updated
Red Hat Bugzilla  1104108  None      None    None     Never

Internal Links: 1104108

Description Martin 2014-01-27 13:23:00 UTC
Description of problem:
Keep default English language and US layout
Accept my fate.

The hub is displayed, then anaconda crashes.

Version-Release number of selected component:
anaconda-19.31.51-1

The following was filed automatically by anaconda:
anaconda 19.31.51-1 exception report
Traceback (most recent call first):
  File "/usr/lib/python2.7/site-packages/yum/yumRepo.py", line 1770, in _checkMD
    l_csum = self._checksum(r_ctype, file, datasize=size)
  File "/usr/lib/python2.7/site-packages/yum/yumRepo.py", line 280, in _check_uncompressed_db_gen
    check_can_fail=True):
  File "/usr/lib/python2.7/site-packages/yum/yumRepo.py", line 231, in populate
    db_un_fn = self._check_uncompressed_db_gen(repo, mydbtype)
  File "/usr/lib/python2.7/site-packages/yum/repos.py", line 383, in populateSack
    sack.populate(repo, mdtype, callback, cacheonly)
  File "/usr/lib/python2.7/site-packages/yum/__init__.py", line 776, in _getSacks
    self.repos.populateSack(which=repos)
  File "/usr/lib/python2.7/site-packages/yum/__init__.py", line 1071, in <lambda>
    pkgSack = property(fget=lambda self: self._getSacks(),
  File "/usr/lib/python2.7/site-packages/yum/__init__.py", line 910, in _getGroups
    self.pkgSack
  File "/usr/lib/python2.7/site-packages/yum/__init__.py", line 1094, in <lambda>
    comps = property(fget=lambda self: self._getGroups(),
  File "/usr/lib/python2.7/site-packages/yum/__init__.py", line 3774, in selectGroup
    thesegroups = self.comps.return_groups(grpid)
  File "/usr/lib64/python2.7/site-packages/pyanaconda/packaging/yumpayload.py", line 1258, in _selectYumGroup
    self._yum.selectGroup(groupid, group_package_types=pkg_types)
  File "/usr/lib64/python2.7/site-packages/pyanaconda/packaging/yumpayload.py", line 1366, in _applyYumSelections
    self._selectYumGroup("core")
  File "/usr/lib64/python2.7/site-packages/pyanaconda/packaging/yumpayload.py", line 1440, in checkSoftwareSelection
    self._applyYumSelections()
  File "/usr/lib64/python2.7/site-packages/pyanaconda/ui/gui/spokes/software.py", line 104, in checkSoftwareSelection
    self.payload.checkSoftwareSelection()
  File "/usr/lib64/python2.7/threading.py", line 764, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/usr/lib64/python2.7/site-packages/pyanaconda/threads.py", line 192, in run
    threading.Thread.run(self, *args, **kwargs)
SystemError: error return without exception set

Additional info:
cmdline:        /usr/bin/python  /sbin/anaconda
cmdline_file:   initrd=/distrotrees/65715/initrd method=http://download-01.eng.brq.redhat.com/pub/rhel/rel-eng/RHEL-7.0-20140127.0/compose/Workstation/x86_64/os/ repo=http://download-01.eng.brq.redhat.com/pub/rhel/rel-eng/RHEL-7.0-20140127.0/compose/Workstation/x86_64/os/  BOOT_IMAGE=/distrotrees/65715/kernel 
executable:     /sbin/anaconda
hashmarkername: anaconda
kernel:         3.10.0-78.el7.x86_64
product:        Red Hat Enterprise Linux
release:        Red Hat Enterprise Linux Client release 7.0 Beta (Maipo)
type:           anaconda
version:        7.0

Comment 1 Martin 2014-01-27 13:23:03 UTC
Created attachment 856051
File: anaconda-tb

Comment 2 Martin 2014-01-27 13:23:07 UTC
Created attachment 856052
File: anaconda.log

Comment 3 Martin 2014-01-27 13:23:09 UTC
Created attachment 856053
File: environ

Comment 4 Martin 2014-01-27 13:23:12 UTC
Created attachment 856054
File: lsblk_output

Comment 5 Martin 2014-01-27 13:23:18 UTC
Created attachment 856055
File: nmcli_dev_list

Comment 6 Martin 2014-01-27 13:23:20 UTC
Created attachment 856056
File: os_info

Comment 7 Martin 2014-01-27 13:23:23 UTC
Created attachment 856057
File: program.log

Comment 8 Martin 2014-01-27 13:23:25 UTC
Created attachment 856058
File: storage.log

Comment 9 Martin 2014-01-27 13:23:27 UTC
Created attachment 856059
File: syslog

Comment 10 Martin 2014-01-27 13:23:28 UTC
Created attachment 856060
File: ifcfg.log

Comment 11 Martin 2014-01-27 13:23:31 UTC
Created attachment 856061
File: packaging.log

Comment 13 David Shea 2014-01-27 16:57:03 UTC
Exception appears to be happening in yum

Comment 14 James Antill 2014-03-05 21:40:32 UTC
 What does:

SystemError: error return without exception set

...mean? And while anaconda is pretty special, this bit of code is run pretty much all the time. I'd also guess it's impossible for yum to traceback at the line given, for any problem ... almost like the lowest part of the traceback is missing?
 I mean this line:

    l_csum = self._checksum(r_ctype, file, datasize=size)

...just calls a function with 3 args. As long as the function name and the argument names are correct, it just can't fail.

Comment 15 James Antill 2014-03-05 21:40:49 UTC
*** Bug 1061664 has been marked as a duplicate of this bug. ***

Comment 16 David Shea 2014-03-05 21:54:14 UTC
(In reply to James Antill from comment #14)
>  What does:
> 
> SystemError: error return without exception set
> 
> ...mean?

Usually it means that a C-function called from Python isn't setting an error correctly, but the code that checks for this condition isn't specific to the C bindings. If it isn't a problem in functions being called by the yum checksum code, maybe there was a problem in the compiled yumRepo.pyc on the install media?

Comment 17 James Antill 2014-03-12 06:36:49 UTC
(In reply to David Shea from comment #16)
> If it isn't a problem in functions being called by the yum
> checksum code, maybe there was a problem in the compiled yumRepo.pyc on the
> install media?

That guess is as good to me as anything else ... which is to say I've no idea how it could happen or what could cause it. Going to close it as NOTABUG (NaB) on the yum side.

Comment 18 Michal Kovarik 2014-03-12 07:56:15 UTC
Created attachment 873363
anaconda-tb

I saw this issue on RHEL-7.0-20140311.n.0 with anaconda 19.31.66-1.

Please don't close this bug as NOTABUG; this is a bug and it should be investigated. Please change the component if you think that it's not a yum issue.

Comment 19 David Shea 2014-03-31 21:11:46 UTC
Reassigning to python as this appears to be an error caused by Python itself.

Comment 20 Bohuslav "Slavek" Kabrda 2014-04-03 10:47:19 UTC
- Can anyone provide a working reproducer? I can't reproduce this bug.
- I have to admit I'm a bit confused about the line "threading.Thread.run(self, *args, **kwargs)" - how is this supposed to work? "threading.Thread.run" only takes "self" as an argument AFAICS. Does anaconda change this in some way?

Comment 22 Dave Malcolm 2014-04-07 15:26:47 UTC
The exception in question comes from this safety-check within the core bytecode evaluation loop, in Python/ceval.c:

    /* Double-check exception status */

    if (why == WHY_EXCEPTION || why == WHY_RERAISE) {
        if (!PyErr_Occurred()) {
            PyErr_SetString(PyExc_SystemError,
                "error return without exception set");
            why = WHY_EXCEPTION;
        }
    }

Comment 23 Dave Malcolm 2014-04-07 15:29:53 UTC
How is _checkMD implemented?  Is it in a C extension module?

Comment 24 Dave Malcolm 2014-04-07 15:35:34 UTC
I've seen this kind of exception before: within the internals of CPython, functions return a PyObject*, and the NULL value is used to indicate an exception has been thrown.  The exception itself is IIRC stored in a field of a per-thread struct.  So what has happened is a function has returned NULL (as if an exception has been thrown), and control has returned to ceval.c's bytecode evaluation loop, but the per-thread state doesn't have an exception set on it.  Hence when ceval.c goes into exception-throwing mode, but finds there isn't one, it synthesizes the exception you're seeing.

How could this happen:
(a) code in a C extension module is returning NULL, without an API call to set an exception having occurred (most likely IMHO)
(b) perhaps (I'm speculating) some weird threading issue that's causing the interpreter to pick up the wrong interpreter-state struct for the current thread - or maybe the exception is being set on the wrong thread?
(c) something else I'm not thinking of
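The mechanism described in Comments 22 and 24 can be modelled in a few lines of Python. This is a toy model under stated assumptions, not CPython's actual code: a `threading.local` object stands in for the per-thread exception slot, returning `None` stands in for a C function returning NULL, and the names `set_error`, `error_occurred`, `call_c_function` and the sample functions are hypothetical. It shows that case (a), a function signalling failure without recording an exception, and case (b), an exception recorded on the wrong thread, both end in the same `SystemError`.

```python
import threading

# Toy model, NOT CPython's real implementation: a thread-local object stands
# in for the exception slot that CPython keeps in its per-thread state.
_tstate = threading.local()


def set_error(exc):
    """Hypothetical analogue of PyErr_SetString: record an exception for this thread."""
    _tstate.error = exc


def error_occurred():
    """Hypothetical analogue of PyErr_Occurred: this thread's recorded exception, if any."""
    return getattr(_tstate, "error", None)


def call_c_function(func, *args):
    """Mimic the evaluation loop's double-check after calling a 'C' function.

    Returning None stands in for returning a NULL PyObject*; by convention the
    function must call set_error() before doing so.
    """
    result = func(*args)
    if result is not None:
        return result
    exc = error_occurred()
    _tstate.error = None  # the exception is about to be raised, so clear the slot
    if exc is None:
        # The safety net quoted from Python/ceval.c in Comment 22.
        raise SystemError("error return without exception set")
    raise exc


def well_behaved(x):
    """Sets the error indicator before signalling failure."""
    if x < 0:
        set_error(ValueError("negative input"))
        return None
    return x * 2


def returns_null_silently(x):
    """Case (a): signals failure without setting any exception."""
    return None


def sets_error_on_other_thread(x):
    """Case (b): the exception is recorded in a different thread's state."""
    t = threading.Thread(target=set_error, args=(RuntimeError("lost"),))
    t.start()
    t.join()
    return None
```

In this model `call_c_function(well_behaved, 21)` returns 42 and `call_c_function(well_behaved, -1)` raises the `ValueError`, while both `returns_null_silently` and `sets_error_on_other_thread` surface as `SystemError: error return without exception set`, which is why the message alone cannot tell the two hypotheses apart.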

Comment 25 Dave Malcolm 2014-04-07 15:38:48 UTC
(In reply to Dave Malcolm from comment #23)
> How is _checkMD implemented?  Is it in a C extension module?

Sorry, self._checksum, I think.

Comment 26 James Antill 2014-04-07 16:01:06 UTC
(In reply to Dave Malcolm from comment #25)
> (In reply to Dave Malcolm from comment #23)
> > How is _checkMD implemented?  Is it in a C extension module?
> 
> Sorry, self._checksum, I think.

It's roughly: self.checkMD => self._checksum => yum.misc.checksum => yum.misc.Checksums => file.read() / hashlib.*

I don't think there are any other C module calls, and I'd guess of the two that it's hashlib that's the more likely to be failing. But, again, I've never seen a report of this outside of anaconda ... so I'm very suspicious of the threading, or the general anaconda env.
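The chain above bottoms out in a loop that feeds `file.read()` into a `hashlib` object. A minimal sketch of that bottom layer follows; the name `checksum`, its parameters and the `CHUNK` value are illustrative assumptions, not yum's actual code. Because a file object can be passed in instead of a path, the sketch only closes the file when it opened it itself.

```python
import hashlib

CHUNK = 2 ** 16  # hypothetical read size for this sketch


def checksum(sumtype, file, chunk=CHUNK):
    """Hypothetical sketch: hash a file's contents with the named algorithm.

    `file` may be a path (opened and closed here) or an already-open binary
    file object (left open for the caller to manage).
    """
    h = hashlib.new(sumtype)  # C-implemented hashing (hashlib)
    if isinstance(file, str):
        fo = open(file, "rb")
        owns_fo = True
    else:
        fo = file
        owns_fo = False
    try:
        while True:
            data = fo.read(chunk)  # the other C-level call: file.read()
            if not data:
                break
            h.update(data)
    finally:
        if owns_fo:
            fo.close()
    return h.hexdigest()
```

For example, `checksum("sha256", io.BytesIO(b"abc"))` gives the well-known SHA-256 digest of `abc`, and passing the path of a file containing those bytes gives the same result.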

Comment 27 Bohuslav "Slavek" Kabrda 2014-04-07 16:35:00 UTC
- We *think* (and this is really just a hunch, although it seems to be working) that there may be a problem with buffered reading in multithreaded applications.
- It seems that a remotely similar problem was reported upstream without a reliable reproducer [1], so the upstream fix may not have actually fixed it.
- During our debugging, we were getting the output "close failed in file object destructor", which is printed by the file object destructor. The destructor gets invoked by "del fo", line 363 in /usr/lib/python2.7/site-packages/yum/misc.py, method "checksum". After removing this line, we can't reproduce the problem any more.

I'm however still not sure that this is the right solution and why it works. James, why is the line there in the first place? Any reason to delete the file descriptor explicitly?

[1] http://bugs.python.org/issue9295
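Comment 20 asks for a working reproducer, and the hunch above is a race in buffered reads across threads. One way to hunt for such a race is to hash the same file from many threads at once and record any exception or digest mismatch. The following is a hypothetical Python 3 harness (`sha256_of` and `stress` are illustrative names); since the interpreter in this report was Python 2.7, it is not expected to reproduce this specific bug on a modern interpreter, where it should simply report no failures.

```python
import hashlib
import os
import tempfile
import threading


def sha256_of(path):
    """Hash a file by path, reading in fixed-size chunks (hypothetical helper)."""
    h = hashlib.sha256()
    with open(path, "rb") as fo:
        for block in iter(lambda: fo.read(65536), b""):
            h.update(block)
    return h.hexdigest()


def stress(path, threads=8, rounds=50):
    """Hash the same file from many threads at once and collect any failures.

    On a correct interpreter every digest matches and nothing is recorded.
    """
    expected = sha256_of(path)
    failures = []
    lock = threading.Lock()

    def worker():
        for _ in range(rounds):
            try:
                if sha256_of(path) != expected:
                    raise AssertionError("digest mismatch")
            except Exception as exc:  # SystemError or anything else
                with lock:
                    failures.append(exc)

    workers = [threading.Thread(target=worker) for _ in range(threads)]
    for t in workers:
        t.start()
    for t in workers:
        t.join()
    return failures


if __name__ == "__main__":
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as out:
        out.write(os.urandom(1 << 20))  # 1 MiB of random data
    try:
        print("failures:", stress(path))
    finally:
        os.remove(path)
```

A race that only shows up rarely, as Comment 65 later suggests, may need many more rounds and threads than the defaults here.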

Comment 28 James Antill 2014-04-07 20:24:30 UTC
(In reply to Bohuslav "Slavek" Kabrda from comment #27)

> I'm however still not sure that this is the right solution and why it works.
> James, why is the line there in the first place? Any reason to delete the
> file descriptor explicitly?

 Not sure, git blame shows it's "from" 2004 but it might well have been pasted from somewhere else at that time making it much older ... I'd guess it's just someone from a C background being explicit.
 The only other thing I can think is that because the fo can be passed in there could have been some weird code path with an object that really wanted a __del__ method to be called there (but I hope that isn't true).

 I have no problem removing the del line, if you want me to.

Comment 29 Bohuslav "Slavek" Kabrda 2014-04-08 06:57:49 UTC
(In reply to James Antill from comment #28)
> (In reply to Bohuslav "Slavek" Kabrda from comment #27)
> 
> > I'm however still not sure that this is the right solution and why it works.
> > James, why is the line there in the first place? Any reason to delete the
> > file descriptor explicitly?
> 
>  Not sure, git blame shows it's "from" 2004 but it might well have been
> pasted from somewhere else at that time making it much older ... I'd guess
> it's just someone from a C background being explicit.
>  The only other thing I can think is that because the fo can be passed in
> there could have been some weird code path with an object that really wanted
> a __del__ method to be called there (but I hope that isn't true).

Judging from anaconda-tb, the "file" argument passed into this function is a string, which means that fo is file opened inside this function. (That makes the problem even more weird, though...)

>  I have no problem removing the del line, if you want me to.

I'll try to confirm that removing this line solves the issue - if yes, I'll reassign this bug to Yum so that you can fix it.

(I'll continue investigating this problem, since it shouldn't be there in the first place - it's probably caused by a race condition inside Python's C code.)

Comment 30 Vratislav Podzimek 2014-04-08 09:16:36 UTC
The extra del(fo) in the yum codebase seems to cause the issue (though it is a valid statement there). Reassigning.

Comment 31 Vratislav Podzimek 2014-04-08 09:17:55 UTC
Created attachment 883935
patch fixing the issue

Patch that seems to fix the issue. No additional info needed.

Comment 33 James Antill 2014-04-08 14:16:27 UTC
(In reply to Bohuslav "Slavek" Kabrda from comment #29)

> Judging from anaconda-tb, the "file" argument passed into this function is a
> string, which means that fo is file opened inside this function. (That makes
> the problem even more weird, though...)

 Yeh, missed that. It's only called when it's guaranteed to be a noop, nice :).
 On the upside there's roughly 0% chance of removing the del line causing a problem, given it doesn't do anything :).

Comment 41 Peter Kotvan 2014-04-11 10:33:52 UTC
Created attachment 885359
anaconda.log

Comment 42 Peter Kotvan 2014-04-11 10:33:56 UTC
Created attachment 885360
anaconda-tb-PxxtnE

Comment 43 Peter Kotvan 2014-04-11 10:34:00 UTC
Created attachment 885361
anaconda-tb-VDWZWJ

Comment 44 Peter Kotvan 2014-04-11 10:34:03 UTC
Created attachment 885362
program.log

Comment 45 Peter Kotvan 2014-04-11 10:34:07 UTC
Created attachment 885363
storage.log

Comment 46 Peter Kotvan 2014-04-11 10:34:14 UTC
Created attachment 885364
storage.state

Comment 47 Peter Kotvan 2014-04-11 10:34:18 UTC
Created attachment 885366
syslog

Comment 52 Vratislav Podzimek 2014-04-14 08:59:14 UTC
Created attachment 886071
proposed patch from the version 5 of the updates.img

Comment 54 James Antill 2014-04-15 13:43:38 UTC
 Vratislav, you want to remove the CHUNK argument to open?

 I assume you want to do this as well as the previous removal of the "del"?

 Should we be doing this upstream too, or is this a temporary workaround for an issue that'll be fixed in python?

Comment 55 Vratislav Podzimek 2014-04-15 15:16:23 UTC
The updates.img that is considered to fix the issue contains both changes: removing the "del()" call and removing the CHUNK argument to open. So I think these should both be applied.

And I believe it should be done upstream. It is, with high probability, a bug in Python (as per comment #27), but it is a long-term issue and nobody knows when, or if, it will be resolved. On the other hand, that argument is unnecessary; it's better to let Python/OS decide on buffering.
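Comment 55 argues that the buffer-size argument is unnecessary. Assuming, as Comment 54 implies, that `CHUNK` was passed as the `buffering` argument of `open()`, dropping it only changes how the file object buffers I/O, not the bytes that `read()` returns. The sketch below (the `digest` helper and the `CHUNK` value are hypothetical) hashes a file with the default buffering policy (`buffering=-1`) and with an explicit buffer size; the two digests should match. It also relies on a `with` block to close the file, so no explicit `del` is involved.

```python
import hashlib
import io
import os
import tempfile

CHUNK = 8192  # hypothetical stand-in for yum's CHUNK constant


def digest(path, buffering=-1):
    """SHA-256 of a file's contents.

    buffering=-1 is open()'s default ("let Python/OS decide", as after the
    patch); passing an integer mirrors handing CHUNK to open() explicitly.
    """
    h = hashlib.sha256()
    # The with block closes the file; no explicit `del` is needed.
    with open(path, "rb", buffering) as fo:
        for block in iter(lambda: fo.read(CHUNK), b""):
            h.update(block)
    return h.hexdigest()


if __name__ == "__main__":
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as out:
        out.write(os.urandom(100000))
    try:
        print("default buffer size:", io.DEFAULT_BUFFER_SIZE)
        print("digests match:", digest(path) == digest(path, buffering=CHUNK))
    finally:
        os.remove(path)
```

The design point is that buffering is a performance knob owned by the I/O layer; correctness of the checksum does not depend on it, which is what makes removing the argument a low-risk change.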

Comment 56 James Antill 2014-04-15 15:23:13 UTC
Ok, thanks.

Comment 57 Karel Srot 2014-05-06 12:21:59 UTC
Hello,
do we have any update on this bug? Is this still an issue? Alternatively, is anybody able to confirm the fix on a recent compose?

Comment 64 Martin Kolman 2014-06-02 14:17:57 UTC
So the reporter in bug 1103692 replied that they are seeing the issue ("return without exception set" in yum) in RC 3.1 DVD server variant only, not on workstation, compute node or desktop. This would indicate that the fix included in RC 3.1 and mentioned in comment 61 might be ineffective.

Comment 65 Martin Kolman 2014-06-03 12:15:32 UTC
Radek Vykydal was able to reproduce the issue on a KVM VM booted from PXE, but only once. Subsequent runs did not reproduce it. This seems to be another indication that the issue is some sort of a race condition.

Comment 66 Ludek Smid 2014-06-16 10:22:31 UTC
This request was resolved in Red Hat Enterprise Linux 7.0.

Contact your manager or support representative in case you have further questions about the request.

