Bug 1417870
| Field | Value |
|---|---|
| Summary | [abrt] system-python: strlen(): system-python killed by signal 11 |
| Product | [Fedora] Fedora |
| Component | libdnf |
| Version | rawhide |
| Hardware | x86_64 |
| OS | Unspecified |
| Status | CLOSED RAWHIDE |
| Severity | unspecified |
| Priority | low |
| Keywords | Triaged |
| Reporter | Igor Gnatenko <ignatenko> |
| Assignee | rpm-software-management |
| QA Contact | Fedora Extras Quality Assurance <extras-qa> |
| CC | arjun.is, bkabrda, codonell, cstratak, dj, dominik, fweimer, jmracek, kdudka, klember, law, mcyprian, mfabian, mhroncok, mluscon, mustafa1024m, pfrankli, pviktori, randy, rkuska, rlpowell, rpm-software-management, siddhesh, tomspur, torsava, vondruch |
| URL | https://retrace.fedoraproject.org/faf/reports/bthash/8862e0d7d1762017693f2e1a4c4687a162b004f2 |
| Whiteboard | abrt_hash:42ce4b38346841f156bce3ef0831cf7ba2d2c418;VARIANT_ID=workstation; |
| Fixed In Version | |
| Doc Type | If docs needed, set a value |
| Last Closed | 2017-02-10 15:02:54 UTC |

Description
Igor Gnatenko
2017-01-31 10:42:13 UTC
Created attachment 1246125
File: backtrace
Created attachment 1246126
File: cgroup
Created attachment 1246127
File: core_backtrace
Created attachment 1246128
File: dso_list
Created attachment 1246129
File: environ
Created attachment 1246130
File: exploitable
Created attachment 1246131
File: limits
Created attachment 1246132
File: maps
Created attachment 1246133
File: open_fds
Created attachment 1246134
File: proc_pid_status
Created attachment 1246135
File: var_log_messages
Removing /var/cache/dnf fixes the problem.

* dnf --disablerepo=\* --enablerepo=rawhide update -> doesn't crash
* dnf --disablerepo=\* --enablerepo=rawhide --enablerepo=kdudka-covscan update -> doesn't crash
* dnf --disablerepo=\* --enablerepo=rawhide --enablerepo=kdudka-covscan --enablerepo=rpmfusion-free-rawhide update -> crashes in horrible fire
* dnf --refresh update -> doesn't really refresh the cache, but fixes the problem
* removing *.solv and *.solvx files doesn't fix the problem
* removing packages.db fixes the problem

Downgrading glibc from -29 to -28 fixes the problem.

glibc x86_64 2.24.90-28.fc26 @commandline 3.4 M
glibc-all-langpacks x86_64 2.24.90-28.fc26 @commandline 7.0 M
glibc-common x86_64 2.24.90-28.fc26 @commandline 878 k
glibc-devel x86_64 2.24.90-28.fc26 @commandline 962 k
glibc-headers x86_64 2.24.90-28.fc26 @commandline 512 k
glibc-langpack-en x86_64 2.24.90-28.fc26 @commandline 288 k
libcrypt-nss x86_64 2.24.90-28.fc26 @commandline 50 k

This looks like a use-after-free issue in libdnf or its Python bindings. Reassigning.

Igor, if you still have the files around that make dnf crash, it might be worth running it under valgrind to see if it can pinpoint any use-after-free issues.

(In reply to Kalev Lember from comment #18)
> Igor, if you still have the files around that make dnf crash, it might be
> worth running it under valgrind to see if it can pinpoint any use-after-free
> issues.

https://ignatenkobrain.fedorapeople.org/dnf-cache.tar.xz

Not sure this is a glibc issue, since I get bug 1418172 with glibc-2.24.90-26.fc26.x86_64.

Actually, there seems to be a similar issue on F25: https://retrace.fedoraproject.org/faf/reports/1505192/

*** Bug 1418172 has been marked as a duplicate of this bug. ***

It's hard to reproduce. To avoid these situations in the future (that is, to actually know when you need to restart the system), please use the tracer plugin from dnf-plugins-extras. DNF should probably take the information from the updateinfo metadata and report to the user that a package requires a restart.

(In reply to Honza Silhan from comment #23)

It seems that you know the reason behind this issue, so could you please enlighten me as to what the problem is? As I said, in my case I got the error with a much older version of glibc than Igor used, and I am pretty sure that I updated just a few packages, none of which is running in the background. So why should I restart anything?

Created attachment 1248405
Valgrind output
Some observations.
1) Interestingly, in this case "dnf update" crashes only with the "--refresh" parameter. It won't crash otherwise.
2) The log is in Czech, since I was not able to reproduce the error with LANG=C.utf-8.
3) I was not able to reproduce this when running under Valgrind or GDB.
$ rpm -q glibc
glibc-2.24.90-29.fc26.x86_64
$ rpm -q dnf
dnf-2.0.0-2.fc26.noarch
$ rpm -q libdnf
libdnf-0.7.0-0.7gitf9b798c.fc26.x86_64
$ rpm -q python3-hawkey
python3-hawkey-0.7.0-0.7gitf9b798c.fc26.x86_64
There's a bunch of invalid memory accesses (use-after-free) in the valgrind log. Could someone who understands the hawkey python bindings look at those please? I think that fixing those should fix the crash.

So from the Vagrant output, it all begins in reldep_from_pystr [1] and this is the implementation:

~~~
DnfReldep *
reldep_from_pystr(PyObject *o, DnfSack *sack)
{
    DnfReldep *reldep = NULL;
    const char *reldep_str = NULL;
    PyObject *tmp_py_str = NULL;

    reldep_str = pycomp_get_string(o, &tmp_py_str);
    if (reldep_str == NULL)
        return NULL;
    Py_XDECREF(tmp_py_str);

    reldep = reldep_from_str(sack, reldep_str);
    return reldep;
}
~~~

The string is allocated by "pycomp_get_string". Its description says:

~~~
/**
 * bytes, basic string or unicode string in Python 2/3 to c string converter,
 * you need to call Py_XDECREF(tmp_py_str) after usage of returned string
 */
~~~

"Py_XDECREF" is indeed called, but *before* the string is used. So now I can only guess that once the reference count is decreased, the GC kicks in in some cases and cleans up the memory, and later, when reldep_from_str is called, it might crash.

So this [2] is a PR which might fix the SEGFAULT (and it fixes another similar-looking place). But:

1. I have not tested the patch at all.
2. I don't have a reproducer at hand, but the shortest path seems to be something like a properly modified test_reldep_list [3].

I would really appreciate it if somebody else (more skilled in Python and DNF) could find the right reproducer and provide a regression test before this gets merged.

[1]: https://github.com/rpm-software-management/libdnf/blob/master/python/hawkey/iutil-py.c#L313
[2]: https://github.com/rpm-software-management/libdnf/pull/255
[3]: https://github.com/rpm-software-management/libdnf/blob/master/python/hawkey/tests/tests/test_query.py#L189

(In reply to Vít Ondruch from comment #27)
> Vagrant

s/Vagrant/Valgrind/ of course ... in my defense, I faced this issue during work on Vagrant :D

*** Bug 1421797 has been marked as a duplicate of this bug. ***

Confirmed latest libdnf works; thanks!

Igor, could you please refer to the actual commit that fixed this bug and set the Fixed In Version field accordingly?

(In reply to Kamil Dudka from comment #31)
> Igor, could you please refer to the actual commit that fixed this bug and
> set the Fixed In Version field accordingly?

1. It's linked from Vit's comment above
2. I have absolutely no idea with which version it got fixed
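
For readers who want to see the ordering problem from comment #27 in isolation, here is a minimal sketch in plain C. It does not use CPython or libdnf, and it is not the patch from the linked PR: `TmpStr`, `tmpstr_xdecref`, `get_string`, `parse_reldep` and `reldep_from_src` are hypothetical stand-ins for the temporary Python object, `Py_XDECREF`, `pycomp_get_string`, `reldep_from_str` and `reldep_from_pystr` respectively. The only point it illustrates is that the borrowed C string has to be consumed before the object that owns it is released.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the temporary Python object: a refcounted
 * heap buffer. None of these names come from CPython or libdnf. */
typedef struct {
    int refcount;
    char *data;
} TmpStr;

static TmpStr *tmpstr_new(const char *s)
{
    TmpStr *t = malloc(sizeof *t);
    if (t == NULL)
        return NULL;
    t->data = malloc(strlen(s) + 1);
    if (t->data == NULL) {
        free(t);
        return NULL;
    }
    strcpy(t->data, s);
    t->refcount = 1;
    return t;
}

/* Analogue of Py_XDECREF: accepts NULL, frees the buffer at refcount zero. */
static void tmpstr_xdecref(TmpStr *t)
{
    if (t != NULL && --t->refcount == 0) {
        free(t->data);
        free(t);
    }
}

/* Analogue of pycomp_get_string: returns a pointer borrowed from *tmp. */
static const char *get_string(const char *src, TmpStr **tmp)
{
    *tmp = tmpstr_new(src);
    return *tmp != NULL ? (*tmp)->data : NULL;
}

/* Analogue of reldep_from_str: consumes the C string (here it copies it). */
static char *parse_reldep(const char *s)
{
    char *copy = malloc(strlen(s) + 1);
    if (copy != NULL)
        strcpy(copy, s);
    return copy;
}

/* Safe ordering: use the borrowed string first, release its owner last.
 * Swapping the two marked lines reproduces the use-after-free pattern. */
char *reldep_from_src(const char *src)
{
    TmpStr *tmp = NULL;
    const char *str = get_string(src, &tmp);
    if (str == NULL)
        return NULL;
    char *reldep = parse_reldep(str);  /* str still points at live memory */
    tmpstr_xdecref(tmp);               /* only now drop the reference */
    return reldep;
}
```

Calling `reldep_from_src("glibc >= 2.24")` returns a heap copy of the string that the caller frees. With the release moved before the parse, the same call would read freed memory, which may or may not crash depending on what the allocator does with the block; that would be consistent with the bug being hard to reproduce.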