Bug 1610456 - [hawkey] occasional segfault when interrupting (SIGINT) dnf process (may be caused by particular plugins in use, e.g. "leaves" ones in the past)
Summary: [hawkey] occasional segfault when interrupting (SIGINT) dnf process (may be c...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: libdnf
Version: 30
Hardware: Unspecified
OS: Unspecified
unspecified
unspecified
Target Milestone: ---
Assignee: amatej
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks: 1684270
TreeView+ depends on / blocked
 
Reported: 2018-07-31 16:37 UTC by Jan Pokorný [poki]
Modified: 2019-03-31 03:00 UTC (History)
5 users (show)

Fixed In Version: libdnf-0.28.1-1.fc30 libdnf-0.28.1-1.fc29
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1684270 (view as bug list)
Environment:
Last Closed: 2019-03-31 00:04:12 UTC


Attachments (Terms of Use)


Links
System ID Priority Status Summary Last Updated
Red Hat Bugzilla 1610542 None CLOSED [hawkey/libsolv integration] dnf crashes when rpm-extension-full and -less arguments are combined 2019-06-24 07:07:12 UTC

Internal Links: 1610542

Description Jan Pokorný [poki] 2018-07-31 16:37:04 UTC
In a sense, this is an extension over [bug 1425476], though I am not
sure how closely these bugs are related.

What I recklessly did was to run what boils down to something like:

> /usr/bin/python3 /usr/bin/dnf update --allowerasing zsh

then pressing Ctrl-C shortly after, and the crash emerged.  Sadly,
the coredump was not generated due to ulimit settings, but I went on
to reproduce this, with a chance of hitting that circumstance once
in about 10 to 30 attempts, assuming python3-dnf-plugin-{,show-}leaves
packages are installed, because that's a prerequisite as discovered
later on.

I've figured out that the problem is around hawkey - Python object
system integration -- all traces in most recent first order:

- brief stack trace (somewhat sifted by coredumpctl compared to
  72 frames shown in gdb; there was just a single thread):

> #0  0x00007f7005938691 PyObject_GetIter (libpython3.7m.so.1.0)
> #1  0x00007f6ffd11aefd _ZL10query_iterP7_object (_hawkey.so)
> #2  0x00007f70059386a6 PyObject_GetIter (libpython3.7m.so.1.0)
> #3  0x00007f70059da6e2 _PyEval_EvalFrameDefault (libpython3.7m.so.1.0)
> #4  0x00007f700597b7ca _PyFunction_FastCallKeywords (libpython3.7m.so.1.0)
> #5  0x00007f70059d9ab4 _PyEval_EvalFrameDefault (libpython3.7m.so.1.0)
> #6  0x00007f70059358e6 _PyEval_EvalCodeWithName (libpython3.7m.so.1.0)
> #7  0x00007f700597b971 _PyFunction_FastCallKeywords (libpython3.7m.so.1.0)
> #8  0x00007f70059d9ab4 _PyEval_EvalFrameDefault (libpython3.7m.so.1.0)
> #9  0x00007f70059369ea _PyFunction_FastCallDict (libpython3.7m.so.1.0)
> #10 0x00007f7005945dc6 _PyObject_Call_Prepend (libpython3.7m.so.1.0)
> #11 0x00007f700593711b PyObject_Call (libpython3.7m.so.1.0)
> #12 0x00007f7005a47fd8 methodcaller_call (libpython3.7m.so.1.0)
> #13 0x00007f7005935391 _PyObject_FastCallDict (libpython3.7m.so.1.0)
> #14 0x00007f700596a4d1 map_next (libpython3.7m.so.1.0)
> #15 0x00007f700598f447 list___init__ (libpython3.7m.so.1.0)
> #16 0x00007f70059a5949 _PyObject_FastCallKeywords (libpython3.7m.so.1.0)
> #17 0x00007f70059dea44 _PyEval_EvalFrameDefault (libpython3.7m.so.1.0)
> #18 0x00007f70059358e6 _PyEval_EvalCodeWithName (libpython3.7m.so.1.0)
> #19 0x00007f700597b971 _PyFunction_FastCallKeywords (libpython3.7m.so.1.0)
> #20 0x00007f70059de6c7 _PyEval_EvalFrameDefault (libpython3.7m.so.1.0)
> #21 0x00007f70059358e6 _PyEval_EvalCodeWithName (libpython3.7m.so.1.0)
> #22 0x00007f700597b971 _PyFunction_FastCallKeywords (libpython3.7m.so.1.0)
> #23 0x00007f70059d9ab4 _PyEval_EvalFrameDefault (libpython3.7m.so.1.0)
> #24 0x00007f70059358e6 _PyEval_EvalCodeWithName (libpython3.7m.so.1.0)
> #25 0x00007f700597b971 _PyFunction_FastCallKeywords (libpython3.7m.so.1.0)
> #26 0x00007f70059d9ab4 _PyEval_EvalFrameDefault (libpython3.7m.so.1.0)
> #27 0x00007f700597b7ca _PyFunction_FastCallKeywords (libpython3.7m.so.1.0)
> #28 0x00007f70059d9c69 _PyEval_EvalFrameDefault (libpython3.7m.so.1.0)
> #29 0x00007f700597b7ca _PyFunction_FastCallKeywords (libpython3.7m.so.1.0)
> #30 0x00007f70059d9c69 _PyEval_EvalFrameDefault (libpython3.7m.so.1.0)
> #31 0x00007f700597b7ca _PyFunction_FastCallKeywords (libpython3.7m.so.1.0)
> #32 0x00007f70059d9c69 _PyEval_EvalFrameDefault (libpython3.7m.so.1.0)
> #33 0x00007f70059358e6 _PyEval_EvalCodeWithName (libpython3.7m.so.1.0)
> #34 0x00007f700597b971 _PyFunction_FastCallKeywords (libpython3.7m.so.1.0)
> #35 0x00007f70059d9c69 _PyEval_EvalFrameDefault (libpython3.7m.so.1.0)
> #36 0x00007f70059358e6 _PyEval_EvalCodeWithName (libpython3.7m.so.1.0)
> #37 0x00007f700597b971 _PyFunction_FastCallKeywords (libpython3.7m.so.1.0)
> #38 0x00007f70059daa85 _PyEval_EvalFrameDefault (libpython3.7m.so.1.0)
> #39 0x00007f70059358e6 _PyEval_EvalCodeWithName (libpython3.7m.so.1.0)
> #40 0x00007f70059368a3 PyEval_EvalCodeEx (libpython3.7m.so.1.0)
> #41 0x00007f70059368cb PyEval_EvalCode (libpython3.7m.so.1.0)
> #42 0x00007f7005a5f9ae run_mod (libpython3.7m.so.1.0)
> #43 0x00007f7005a5fd46 PyRun_FileExFlags (libpython3.7m.so.1.0)
> #44 0x00007f7005a61e08 PyRun_SimpleFileExFlags (libpython3.7m.so.1.0)
> #45 0x00007f7005a637a2 pymain_main.constprop.361 (libpython3.7m.so.1.0)
> #46 0x00007f7005a640e0 _Py_UnixMain (libpython3.7m.so.1.0)
> #47 0x00007f70054d50e3 __libc_start_main (libc.so.6)
> #48 0x0000558e2f7b708a _start (python3.7)

- detailed trace of the 4 most recent frames:

> #0  0x00007f7005938691 in PyObject_GetIter (o=0x0, o=<optimized out>) at /usr/src/debug/python3-3.7.0-4.fc29.x86_64/Objects/abstract.c:2518
>        	t = <optimized out>
>        	f = <optimized out>

2512│ PyObject *
2513│ PyObject_GetIter(PyObject *o)
2514│ {
2515│     PyTypeObject *t = o->ob_type;
2516│     getiterfunc f;
2517│
2518├>    f = t->tp_iter;

> #1  0x00007f6ffd11aefd in query_iter(_object*) (self=<unknown at remote 0x7f6ffc5e32d0>) at /usr/src/debug/libdnf-0.16.1-2.fc29.x86_64/python/hawkey/iutil-py.hpp:48
>        	pset = <optimized out>
>        	list = {
>           pyObj = 0x0
>        	}
>        	iter = <optimized out>

>> 33│ /**
>> 34│ * @brief Smart pointer to PyObject
>> 35│ *
>> 36│ * Owns and manages PyObject. Decrements the reference count for the PyObject
>> 37│ * (calls Py_XDECREF on it) when  UniquePtrPyObject goes out of scope.
>> 38│ *
>> 39│ * Implements subset of standard std::unique_ptr methods.
>> 40│ */
>> 41│ class UniquePtrPyObject {
>> 42│ public:
>> 43│     constexpr UniquePtrPyObject() noexcept : pyObj(NULL) {}
>> 44│     explicit UniquePtrPyObject(PyObject * pyObj) noexcept : pyObj(pyObj) {}
>> 45│     UniquePtrPyObject(UniquePtrPyObject && src) noexcept : pyObj(src.pyObj) { src.pyObj = NULL; }
>> 46│     UniquePtrPyObject & operator =(UniquePtrPyObject && src) noexcept;
>> 47│     explicit operator bool() const noexcept { return pyObj != NULL; }
>> 48├>    PyObject * get() const noexcept { return pyObj; }
>> 49│     PyObject * release() noexcept;
>> 50│     void reset(PyObject * pyObj = NULL) noexcept;
>> 51│     ~UniquePtrPyObject();
>> 52│ private:
>> 53│     PyObject * pyObj;
>> 54│ };

> #2  0x00007f70059386a6 in PyObject_GetIter (o=<unknown at remote 0x7f6ffc5e32d0>, o=<optimized out>) at /usr/src/debug/python3-3.7.0-4.fc29.x86_64/Objects/abstract.c:2525
>        	res = <optimized out>
>        	t = <optimized out>
>        	f = <optimized out>

>> 2512│ PyObject *
>> 2513│ PyObject_GetIter(PyObject *o)
>> 2514│ {
>> 2515│     PyTypeObject *t = o->ob_type;
>> 2516│     getiterfunc f;
>> 2517│
>> 2518│     f = t->tp_iter;
>> 2519│     if (f == NULL) {
>> 2520│         if (PySequence_Check(o))
>> 2521│             return PySeqIter_New(o);
>> 2522│         return type_error("'%.200s' object is not iterable", o);
>> 2523│     }
>> 2524│     else {
>> 2525├>        PyObject *res = (*f)(o);
>> 2526│         if (res != NULL && !PyIter_Check(res)) {
>> 2527│             PyErr_Format(PyExc_TypeError,
>> 2528│                          "iter() returned non-iterator "
>> 2529│                          "of type '%.100s'",
>> 2530│                          res->ob_type->tp_name);
>> 2531│             Py_DECREF(res);
>> 2532│             res = NULL;
>> 2533│         }
>> 2534│         return res;
>> 2535│     }
>> 2536│ }

> #3  0x00007f70059da6e2 in _PyEval_EvalFrameDefault (f=<optimized out>, throwflag=<optimized out>) at /usr/src/debug/python3-3.7.0-4.fc29.x86_64/Python/ceval.c:2759
>        	iterable = <unknown at remote 0x7f6ffc5e32d0>
>        	iter = <optimized out>
>        	stack_pointer = <optimized out>
>        	next_instr = 0x7f6ffc7b2114
>        	opcode = <optimized out>
>        	oparg = <optimized out>
>        	why = <optimized out>
>        	fastlocals = <optimized out>
>         freevars = <optimized out>
>         retval = <optimized out>
>        	tstate = <optimized out>
>        	co = <optimized out>
>         instr_ub = -1
>        	instr_lb = 0
>        	instr_prev = -1
>        	first_instr = <optimized out>
>        	names = <optimized out>
>        	consts = <optimized out>
>        	opcode_targets = { [nothing telling list OMITTED] }

>> 2756│         TARGET(GET_ITER) {
>> 2757│      	  /* before: [obj]; after [getiter(obj)] */
>> 2758│             PyObject *iterable = TOP();
>> 2759├>            PyObject *iter = PyObject_GetIter(iterable);
>> 2760│             Py_DECREF(iterable);
>> 2761│             SET_TOP(iter);
>> 2762│             if (iter == NULL)
>> 2763│                 goto error;
>> 2764│             PREDICT(FOR_ITER);
>> 2765│             PREDICT(CALL_FUNCTION);
>> 2766│             DISPATCH();
>> 2767│         }

- guesstimated Python trace (sadly, py-* functions were not working for
  me in gdb, not sure if because of Python 3.7, so I did little bit of
  gdb wizardry to extract that by hand):

/usr/lib/python3.7/site-packages/dnf-plugins/leaves.py:

> 153│  def findleaves(self):
> 154│      (packages, depends, rdepends) = self.buildgraph()
> 155├>     return [packages[i] for i in self.kosaraju(depends, rdepends)]


/usr/lib/python3.7/site-packages/dnf/util.py:

> 235│  def mapall(fn, *seq):
> 236│      """Like functools.map(), but return a list instead of an iterator.
> 237│
> 238│      This means all side effects of fn take place even without iterating the
> 239│      result.
> 240│
> 241│      """
> 242├>     return list(map(fn, *seq))

/usr/lib/python3.7/site-packages/dnf/plugin.py:

>  94│ class Plugins(object):
>  95│     def __init__(self):
>  96│         self.plugin_cls = []
>  97│         self.plugins = []
>  98│ 
>  99│     def _caller(method):
> 100│         def fn(self):
> 101├>            dnf.util.mapall(operator.methodcaller(method), self.plugins)
> 102│         return fn
> ...│
> 141│     run_sack = _caller('sack')
> 142├>    run_resolved = _caller('resolved')
> 143│     run_pre_transaction = _caller('pre_transaction')
> 144│     run_transaction = _caller('transaction')

/usr/lib/python3.7/site-packages/dnf/base.py:

> 802│      def resolve(self, allow_erasing=False):
> 803│          # :api
> 804│          """Build the transaction set."""
> 805│          exc = None
> 806│          self._finalize_comps_trans()
> ...│
> 847├>         self._plugins.run_resolved()
> 848│          self.repo_module_dict.enable_based_on_rpms()
> 849│          return got_transaction

/usr/lib/python3.7/site-packages/dnf/cli/main.py:

> 142│  def resolving(cli, base):
> 143│      """Perform the depsolve, download and RPM transaction stage."""
> 144│  
> 145│      if base.transaction is None:
> 146├>         base.resolve(cli.demands.allow_erasing)
> 147│          logger.info(_('Dependencies resolved.'))

* * *

Based on the above, perhaps the simplest workaround is to either
use --noplugins switch or to avoid python3-dnf-plugin-{,show-}leaves
packages/plugins -- at least I wasn't able to reproduce the issue
afterwards.  On the other hand, "leaves" plugins my not be the
only affected, I just happened to have only dnf-plugins-core
installed in addition to those.

My guess this corner case is not covered because of either, sorted
with increasing importance:

- questionable arrangements around signal handling and arising
  asynchronicity

- API misuse in "leaves" plugins (and possibly others)

- design flaw around plugin management
  (possibly wrt. critical sections)

I think it would be trivial to fix python/hawkey/iutil-py.hpp with
explicit inline NULL test, however, that could be rather shortsighted
approach leading just to masking the possible deeper issue that
could resurface at other places where the mole hasn't be whacked yet.
And dnf amounts to a critical piece of software.

Comment 1 Jan Kurik 2018-08-14 08:39:57 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 29 development cycle.
Changing version to '29'.

Comment 2 Jan Pokorný [poki] 2018-12-14 17:49:23 UTC
Have hit something similar again, this time around only with
{python3-,}dnf-plugins-core present, and with relatively
recent packages:

$ rpm -q dnf rpm libdnf librepo libsolv
> dnf-4.0.10-1g5bbf4ac2.fc30.noarch
> rpm-4.14.2.1-3.fc30.x86_64
> libdnf-0.24.1-1ga0afd844.fc30.x86_64
> librepo-1.9.2-1.fc30.x86_64
> libsolv-0.7.2-11g95dcddc7.fc30.x86_64

From "coredumpctl info"

#0  0x00007f37338f2861 PyObject_GetIter (libpython3.7m.so.1.0)
#1  0x00007f3730c56c3d query_iter (_hawkey.so)
#2  0x00007f37338f2876 PyObject_GetIter (libpython3.7m.so.1.0)
#3  0x00007f37339a8142 _PyEval_EvalFrameDefault (libpython3.7m.so.1.0)
#4  0x00007f373393689a _PyFunction_FastCallKeywords (libpython3.7m.so.1.0)
#5  0x00007f37339a7629 _PyEval_EvalFrameDefault (libpython3.7m.so.1.0)
#6  0x00007f373393689a _PyFunction_FastCallKeywords (libpython3.7m.so.1.0)
#7  0x00007f37339a7474 _PyEval_EvalFrameDefault (libpython3.7m.so.1.0)
#8  0x00007f373393689a _PyFunction_FastCallKeywords (libpython3.7m.so.1.0)
#9  0x00007f37339a7474 _PyEval_EvalFrameDefault (libpython3.7m.so.1.0)
#10 0x00007f37338efc08 _PyEval_EvalCodeWithName (libpython3.7m.so.1.0)
#11 0x00007f3733936a41 _PyFunction_FastCallKeywords (libpython3.7m.so.1.0)
#12 0x00007f37339a84e7 _PyEval_EvalFrameDefault (libpython3.7m.so.1.0)
#13 0x00007f373393689a _PyFunction_FastCallKeywords (libpython3.7m.so.1.0)
#14 0x00007f37339a7474 _PyEval_EvalFrameDefault (libpython3.7m.so.1.0)
#15 0x00007f373393689a _PyFunction_FastCallKeywords (libpython3.7m.so.1.0)
#16 0x00007f37339a7474 _PyEval_EvalFrameDefault (libpython3.7m.so.1.0)
#17 0x00007f373393689a _PyFunction_FastCallKeywords (libpython3.7m.so.1.0)
#18 0x00007f37339a7629 _PyEval_EvalFrameDefault (libpython3.7m.so.1.0)
#19 0x00007f373393689a _PyFunction_FastCallKeywords (libpython3.7m.so.1.0)
#20 0x00007f37339a7629 _PyEval_EvalFrameDefault (libpython3.7m.so.1.0)
#21 0x00007f37338efc08 _PyEval_EvalCodeWithName (libpython3.7m.so.1.0)
#22 0x00007f3733936a41 _PyFunction_FastCallKeywords (libpython3.7m.so.1.0)
#23 0x00007f37339a7629 _PyEval_EvalFrameDefault (libpython3.7m.so.1.0)
#24 0x00007f37338efc08 _PyEval_EvalCodeWithName (libpython3.7m.so.1.0)
#25 0x00007f3733936a41 _PyFunction_FastCallKeywords (libpython3.7m.so.1.0)
#26 0x00007f37339a84e7 _PyEval_EvalFrameDefault (libpython3.7m.so.1.0)
#27 0x00007f37338efc08 _PyEval_EvalCodeWithName (libpython3.7m.so.1.0)
#28 0x00007f37338f0aa3 PyEval_EvalCodeEx (libpython3.7m.so.1.0)
#29 0x00007f37338f0acb PyEval_EvalCode (libpython3.7m.so.1.0)
#30 0x00007f3733a15a02 n/a (libpython3.7m.so.1.0)
#31 0x00007f3733a175ba PyRun_FileExFlags (libpython3.7m.so.1.0)
#32 0x00007f3733a188a8 PyRun_SimpleFileExFlags (libpython3.7m.so.1.0)
#33 0x00007f3733a1a60c n/a (libpython3.7m.so.1.0)
#34 0x00007f3733a1a86c _Py_UnixMain (libpython3.7m.so.1.0)
#35 0x00007f3733b78ee3 __libc_start_main (libc.so.6)
#36 0x000055d5ff74b08e _start (python3.7)

The command being run (with shell quoting and regardless it's a bad
specification):
/usr/bin/python3 /usr/bin/dnf provides 'pkgconfig(glib-2.0)>=2'

The reproducibility per [comment 0] is about the same
(more towards the higher bound than the lower one).

Sadly, debugger itself is currently crashing for me, retaining the
coredumps for when I have more luck.  But the pattern is seemingly
the same as in [comment 0].

That being said, bumping to Rawhide.

Comment 3 Jan Pokorný [poki] 2018-12-20 23:45:48 UTC
You can find the coredump at another (gdb targeted) [bug 1661199].
Let me know if that's not available to download for you.

Comment 4 Jan Pokorný [poki] 2018-12-20 23:48:07 UTC
Also it looks like --noplugins is a definitive workaround for the time
being, should anyone want to use that (I don't for now).

Comment 5 Ben Cotton 2019-02-19 17:11:25 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 30 development cycle.
Changing version to '30.

Comment 6 amatej 2019-03-07 08:43:19 UTC
I created a PR with the NULL test. 
https://github.com/rpm-software-management/libdnf/pull/694

I think it's a correct solution, because the signal interrupt in C is handled by Python signal module, after that calls into Python/C API return NULL/-1. We should always check for that and safely propagate the "error" up. Once we get back to Python the interpreter notices the signal flags (set in the mentioned Python signal module) and throws Keyboard Exception.

Comment 7 Fedora Update System 2019-03-27 13:37:22 UTC
dnf-4.2.2-1.fc30 librepo-1.9.6-1.fc30 libdnf-0.28.1-1.fc30 has been submitted as an update to Fedora 30. https://bodhi.fedoraproject.org/updates/FEDORA-2019-b504d6ad8f

Comment 8 Fedora Update System 2019-03-27 13:37:46 UTC
dnf-4.2.2-1.fc29 librepo-1.9.6-1.fc29 libdnf-0.28.1-1.fc29 has been submitted as an update to Fedora 29. https://bodhi.fedoraproject.org/updates/FEDORA-2019-7fbfa37585

Comment 9 Fedora Update System 2019-03-27 18:18:27 UTC
dnf-4.2.2-1.fc30, libdnf-0.28.1-1.fc30, librepo-1.9.6-1.fc30 has been pushed to the Fedora 30 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2019-b504d6ad8f

Comment 10 Fedora Update System 2019-03-27 20:01:58 UTC
dnf-4.2.2-1.fc29, libdnf-0.28.1-1.fc29, librepo-1.9.6-1.fc29 has been pushed to the Fedora 29 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2019-7fbfa37585

Comment 11 Fedora Update System 2019-03-31 00:04:12 UTC
dnf-4.2.2-1.fc30, libdnf-0.28.1-1.fc30, librepo-1.9.6-1.fc30 has been pushed to the Fedora 30 stable repository. If problems still persist, please make note of it in this bug report.

Comment 12 Fedora Update System 2019-03-31 03:00:47 UTC
dnf-4.2.2-1.fc29, libdnf-0.28.1-1.fc29, librepo-1.9.6-1.fc29 has been pushed to the Fedora 29 stable repository. If problems still persist, please make note of it in this bug report.


Note You need to log in before you can comment on or make changes to this bug.