Bug 2054230
Summary: | Kickstart installation hangs before Verifying: stage of dns transaction | ||
---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Radek Vykydal <rvykydal> |
Component: | dnf | Assignee: | rpm-software-management |
Status: | CLOSED INSUFFICIENT_DATA | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
Severity: | unspecified | Docs Contact: | |
Priority: | unspecified | ||
Version: | 37 | CC: | anaconda-maint-list, awilliam, daniel.mach, jmracek, jonathan, jrohel, jstodola, kellin, mblaha, mhatina, packaging-team-maint, pkratoch, rpm-software-management, vanmeeuwen+fedora, vmukhame, vponcova, w |
Target Milestone: | --- | Keywords: | Triaged |
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2023-08-25 09:28:59 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Attachments: |
Description
Radek Vykydal
2022-02-14 13:34:45 UTC
Jaroslave, would you have any idea or hint what can be the reason, or how to debug the issue ? It looks like an issue in DNF. Reassigning. Well it is not easy to provide correct suggestion. The interruption happened when the last package is installed and scriptlets were running (post trans). They can require unpredictable time. See output - the last statement is related to systemd and some fails related to services. (suggesting close - not a bug) Recently we also have got some reports related to sudden transaction freezes but it was related to file systems. It means non of those possibilities are related to DNF and we cannot improve it. The problem appeared recently therefore if the issue is not reproducible in Fedora 34 it is likely it (suggesting close - not a bug) What can be problem in DNF is synchronization between our internal database and rpm transaction result. But it will not cause random freezes. If the problem is caused by the synchronization then we will improve it in DNF 5 (next major DNF/MICRODNF) (suggesting close - wontfix). The last option would be that there is an issue in RPM but it is unlikely (suggesting changing component to RPM) Personally I believe that it is not an issue and the freeze was related to running scriptlet or consequence of it. I am changing the component back to Anaconda for reevaluation. Please feel free to return back to DNF when you will feel that the issue is related to us with additional information. Created attachment 1867716 [details]
/tmp/packaging.log from installer
/tmp/packaging.log from another instance of the issue.
I was able to reproduce the issue in kstests batch, have a few living instances running and I can provide you access to poke around if you want. In the previous comment I added one more log, not much info there but maybe can be helpful. Unfortunately, those running instances are VMs using user networking so it is not possible to ssh to the VMs. I am using virt-viewer with X tunneled from hypervisor to access the guest terminal. Or I can try to provide any further logging info or hanging process core dump if you guide me. Thank you, Radek Created attachment 1867815 [details]
backtrace from hanging anaconda module running dnf transaction
Outputs of '$(gdb) thread apply bt all' and '$(gdb) thread apply py-bt all' applied on core dumps collected in the issue reproduced in kickstart-tests run.
Thank you very much for additional information. The last visible call is: File "/usr/lib/python3.10/site-packages/dnf/base.py", line 1154, in <listcomp> names = set([i.name for i in q]) But it is strange and may be the crash is related with something else like incorrect memory handling or so on. The problem is that there are multiple thinks happened on the line. There is a creation of set, iteration over dnf.query, creation package object and accessing of the name attribute. The line can be split into: names = set() for pkg in q: name = pkg.name names.add(name) May be it will say more specific where it fails. What can be wrong? Iteration over q returns not valid pkg therefore pkg.name will crash in string conversion because it returns randome memory instead string. I think the problem is in `for i in q` when i is dereferenced (so it can hold the next item). That triggers package_dealloc, g_object_unref and g_signal_handlers_destroy and eventually stops at g_mutex_lock_slowpath. However, I am not sure. Thread 1 (Thread 0x7efeb37fe640 (LWP 2678)): #0 syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38 #1 0x00007efeddb9d37c in g_mutex_lock_slowpath (mutex=0x7efeddad3110 <g.g_signal_mutex_lock.lto_priv>) at ../glib/gthread-posix.c:1493 #2 0x00007efeddb9d3c3 in g_mutex_lock (mutex=<optimized out>) at ../glib/gthread-posix.c:1517 #3 0x00007efeddaa47ac in g_signal_handlers_destroy (instance=0x7efea019ccb0) at ../gobject/gsignal.c:2776 #4 0x00007efedda93231 in g_object_real_dispose (object=0x7efea019ccb0) at ../gobject/gobject.c:1182 #5 0x00007efedda96d24 in g_object_unref (_object=<optimized out>) at ../gobject/gobject.c:3636 #6 g_object_unref (_object=0x7efea019ccb0) at ../gobject/gobject.c:3553 #7 0x00007efed344f426 in package_dealloc(_PackageObject*) () from /usr/lib64/python3.10/site-packages/hawkey/_hawkey.so #8 0x00007efeec221366 in subtype_dealloc (self=<Package at remote 0x7efed223adc0>) at /usr/src/debug/python3.10-3.10.3-1.fc37.x86_64/Objects/typeobject.c:1458 #9 0x00007efeec1fd92a in _Py_Dealloc (op=<optimized out>) at /usr/src/debug/python3.10-3.10.3-1.fc37.x86_64/Objects/object.c:2292 #10 _Py_DECREF (op=<optimized out>) at /usr/src/debug/python3.10-3.10.3-1.fc37.x86_64/Include/object.h:500 #11 _Py_XDECREF (op=<optimized out>) at /usr/src/debug/python3.10-3.10.3-1.fc37.x86_64/Include/object.h:567 #12 list_dealloc (op=0x7efeb83973c0) at /usr/src/debug/python3.10-3.10.3-1.fc37.x86_64/Objects/listobject.c:343 #13 0x00007efeec221016 in _Py_Dealloc (op=<optimized out>) at /usr/src/debug/python3.10-3.10.3-1.fc37.x86_64/Objects/object.c:2298 #14 _Py_DECREF (op=<optimized out>) at /usr/src/debug/python3.10-3.10.3-1.fc37.x86_64/Include/object.h:500 #15 listiter_next (it=<optimized out>) at /usr/src/debug/python3.10-3.10.3-1.fc37.x86_64/Objects/listobject.c:32 ... #17 0x00007efeec209c43 in _PyEval_EvalFrame (throwflag=0, f=Frame 0x7efeb817f920, for file /usr/lib/python3.10/site-packages/dnf/base.py, line 1154, in <listcomp> ... ... #23 0x00007efeec209c43 in _PyEval_EvalFrame (throwflag=0, f=Frame 0x7efea0946930, for file /usr/lib/python3.10/site-packages/dnf/base.py, line 1154, in _verify_transaction Reassigning to dnf based on comments 7 and 8. Created attachment 1871006 [details] Call Trace for the update from https://bugzilla.redhat.com/show_bug.cgi?id=2054230#c7 (In reply to Jaroslav Mracek from comment #7) > The line can be split into: > > names = set() > for pkg in q: > name = pkg.name > names.add(name) > > May be it will say more specific where it fails. What can be wrong? > Iteration over q returns not valid pkg therefore pkg.name will crash in > string conversion because it returns randome memory instead string. Attaching call trace for that update of base.py 1154,1160c1154 < #names = set([i.name for i in q]) < logger.log(dnf.logging.DDEBUG, 'DDDDDEBUG2 for loop') < names = set() < for pkg in q: < name = pkg.name < names.add(name) < --- > names = set([i.name for i in q]) We are probably hitting the same problem on RHEL-9, see bug 2073510. Created attachment 1875022 [details]
list of packages used to create the installer image
I can confirm that the issue is reproducible also with Fedora 35 installer. Attaching the package versions used in the F35 installer iso.
Created attachment 1877257 [details]
list of packages used to create the F34 installer image
I was not able to reproduce the issue with Fedora 34 installer. Attaching the package versions used in the F34 installer iso.
This bug appears to have been reported against 'rawhide' during the Fedora Linux 37 development cycle. Changing version to 37. We haven't seen the flake on daily-iso since 04/05/2023 (tracking in the issue in https://github.com/rhinstaller/kickstart-tests/issues/675). *** Bug 2051753 has been marked as a duplicate of this bug. *** According to Comment 15 the issue is somehow resolved by changes outside of DNF components. |