Created attachment 1860984 [details] Log from kickstart test whith Anaconda (dnf transaction) actually hanging (after 02:53:00,518 or 02:53:01,095). In our daily Anaconda kickstart tests we are seeing a lot of flakes when the installation hangs just before Verifying stage of the transaction installing the system in chroot: 02:18:24,179 INFO anaconda:anaconda: progress: Configuring rootfiles.noarch 02:18:24,240 INFO anaconda:dnf: Running in chroot, ignoring command 'daemon-reload'#012Running in chroot, ignoring command 'reload-or-restart' 02:18:24,248 INFO anaconda:anaconda: modules.payloads.payload.dnf.transaction_progress: Configuring (running scriptlet for): rootfiles-8.1-31.fc36.noarch 1642789586 c2c9054a57bbf8937e5e99b5f61127c2c76b6042a688f53caf37fd31a5cd7895 02:18:24,249 INFO anaconda:anaconda: progress: Configuring rootfiles.noarch 02:18:24,337 INFO anaconda:anaconda: modules.payloads.payload.dnf.transaction_progress: Configuring (running scriptlet for): rootfiles-8.1-31.fc36.noarch 1642789586 c2c9054a57bbf8937e5e99b5f61127c2c76b6042a688f53caf37fd31a5cd7895 02:18:24,338 INFO anaconda:anaconda: progress: Configuring rootfiles.noarch 02:18:25,241 INFO anaconda:anaconda: modules.payloads.payload.dnf.transaction_progress: Configuring (running scriptlet for): rootfiles-8.1-31.fc36.noarch 1642789586 c2c9054a57bbf8937e5e99b5f61127c2c76b6042a688f53caf37fd31a5cd7895 02:18:25,243 INFO anaconda:anaconda: progress: Configuring rootfiles.noarch 02:18:25,299 INFO anaconda:dnf: Running in chroot, ignoring request. 02:18:25,308 INFO anaconda:anaconda: modules.payloads.payload.dnf.transaction_progress: Configuring (running scriptlet for): rootfiles-8.1-31.fc36.noarch 1642789586 c2c9054a57bbf8937e5e99b5f61127c2c76b6042a688f53caf37fd31a5cd7895 02:18:25,309 INFO anaconda:anaconda: progress: Configuring rootfiles.noarch 02:18:25,420 INFO anaconda:anaconda: modules.payloads.payload.dnf.transaction_progress: Configuring (running scriptlet for): rootfiles-8.1-31.fc36.noarch 1642789586 c2c9054a57bbf8937e5e99b5f61127c2c76b6042a688f53caf37fd31a5cd7895 02:18:25,421 INFO anaconda:anaconda: progress: Configuring rootfiles.noarch 02:18:25,663 INFO systemd:Started run-r58f4d818ef4640daaf1750e584289455.service - /usr/bin/systemctl start man-db-cache-update. 02:18:25,668 NOTICE audit:SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:kernel_t:s0 msg='unit=run-r58f4d818ef4640daaf1750e584289455 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success' 02:18:25,906 INFO systemctl:Failed to start man-db-cache-update.service: Unit man-db-cache-update.service not found. 02:18:25,912 NOTICE systemd:run-r58f4d818ef4640daaf1750e584289455.service: Main process exited, code=exited, status=5/NOTINSTALLED 02:18:25,913 WARNING systemd:run-r58f4d818ef4640daaf1750e584289455.service: Failed with result 'exit-code'. 02:18:25,916 NOTICE audit:SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:kernel_t:s0 msg='unit=run-r58f4d818ef4640daaf1750e584289455 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=failed' >>>>> HERE THE INSTALLATION HANGS SOMETIMES 02:18:26,180 INFO anaconda:anaconda: modules.payloads.payload.dnf.transaction_progress: Verifying: NetworkManager-1:1.36.0-0.9.fc36.x86_64 1643999059 0f38560ebb20f328d71d150a9122e0c0615c9bee5e1f026c2c9dc9a061debc93 02:18:26,180 INFO anaconda:anaconda: modules.payloads.payload.dnf.transaction_progress: Verifying: NetworkManager-libnm-1:1.36.0-0.9.fc36.x86_64 1643999059 01a81ec2b6864ce71d3a8f89db5499c1d39be2102c47203a8b74020e67cadea6 02:18:26,181 INFO anaconda:anaconda: progress: Verifying NetworkManager.x86_64 (1/331) 02:18:26,181 INFO anaconda:anaconda: progress: Verifying NetworkManager-libnm.x86_64 (2/331) 02:18:26,182 INFO anaconda:anaconda: modules.payloads.payload.dnf.transaction_progress: Verifying: acl-2.3.1-3.fc36.x86_64 1642625839 7275a39a6126e48f48578813420a7a5eb557d73bf7be0b9c42de17b9e25767b0 02:18:26,182 INFO anaconda:anaconda: modules.payloads.payload.dnf.transaction_progress: Verifying: alternatives-1.19-2.fc36.x86_64 1642634100 904a986d78a659d2c8fa5b810d884c0362f345979c74e0731065e22ab8988c9d 02:18:26,182 INFO anaconda:anaconda: progress: Verifying acl.x86_64 (3/331)
Jaroslave, would you have any idea or hint what can be the reason, or how to debug the issue ?
It looks like an issue in DNF. Reassigning.
Well it is not easy to provide correct suggestion. The interruption happened when the last package is installed and scriptlets were running (post trans). They can require unpredictable time. See output - the last statement is related to systemd and some fails related to services. (suggesting close - not a bug) Recently we also have got some reports related to sudden transaction freezes but it was related to file systems. It means non of those possibilities are related to DNF and we cannot improve it. The problem appeared recently therefore if the issue is not reproducible in Fedora 34 it is likely it (suggesting close - not a bug) What can be problem in DNF is synchronization between our internal database and rpm transaction result. But it will not cause random freezes. If the problem is caused by the synchronization then we will improve it in DNF 5 (next major DNF/MICRODNF) (suggesting close - wontfix). The last option would be that there is an issue in RPM but it is unlikely (suggesting changing component to RPM) Personally I believe that it is not an issue and the freeze was related to running scriptlet or consequence of it. I am changing the component back to Anaconda for reevaluation. Please feel free to return back to DNF when you will feel that the issue is related to us with additional information.
Created attachment 1867716 [details] /tmp/packaging.log from installer /tmp/packaging.log from another instance of the issue.
I was able to reproduce the issue in kstests batch, have a few living instances running and I can provide you access to poke around if you want. In the previous comment I added one more log, not much info there but maybe can be helpful. Unfortunately, those running instances are VMs using user networking so it is not possible to ssh to the VMs. I am using virt-viewer with X tunneled from hypervisor to access the guest terminal. Or I can try to provide any further logging info or hanging process core dump if you guide me. Thank you, Radek
Created attachment 1867815 [details] backtrace from hanging anaconda module running dnf transaction Outputs of '$(gdb) thread apply bt all' and '$(gdb) thread apply py-bt all' applied on core dumps collected in the issue reproduced in kickstart-tests run.
Thank you very much for additional information. The last visible call is: File "/usr/lib/python3.10/site-packages/dnf/base.py", line 1154, in <listcomp> names = set([i.name for i in q]) But it is strange and may be the crash is related with something else like incorrect memory handling or so on. The problem is that there are multiple thinks happened on the line. There is a creation of set, iteration over dnf.query, creation package object and accessing of the name attribute. The line can be split into: names = set() for pkg in q: name = pkg.name names.add(name) May be it will say more specific where it fails. What can be wrong? Iteration over q returns not valid pkg therefore pkg.name will crash in string conversion because it returns randome memory instead string.
I think the problem is in `for i in q` when i is dereferenced (so it can hold the next item). That triggers package_dealloc, g_object_unref and g_signal_handlers_destroy and eventually stops at g_mutex_lock_slowpath. However, I am not sure. Thread 1 (Thread 0x7efeb37fe640 (LWP 2678)): #0 syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38 #1 0x00007efeddb9d37c in g_mutex_lock_slowpath (mutex=0x7efeddad3110 <g.g_signal_mutex_lock.lto_priv>) at ../glib/gthread-posix.c:1493 #2 0x00007efeddb9d3c3 in g_mutex_lock (mutex=<optimized out>) at ../glib/gthread-posix.c:1517 #3 0x00007efeddaa47ac in g_signal_handlers_destroy (instance=0x7efea019ccb0) at ../gobject/gsignal.c:2776 #4 0x00007efedda93231 in g_object_real_dispose (object=0x7efea019ccb0) at ../gobject/gobject.c:1182 #5 0x00007efedda96d24 in g_object_unref (_object=<optimized out>) at ../gobject/gobject.c:3636 #6 g_object_unref (_object=0x7efea019ccb0) at ../gobject/gobject.c:3553 #7 0x00007efed344f426 in package_dealloc(_PackageObject*) () from /usr/lib64/python3.10/site-packages/hawkey/_hawkey.so #8 0x00007efeec221366 in subtype_dealloc (self=<Package at remote 0x7efed223adc0>) at /usr/src/debug/python3.10-3.10.3-1.fc37.x86_64/Objects/typeobject.c:1458 #9 0x00007efeec1fd92a in _Py_Dealloc (op=<optimized out>) at /usr/src/debug/python3.10-3.10.3-1.fc37.x86_64/Objects/object.c:2292 #10 _Py_DECREF (op=<optimized out>) at /usr/src/debug/python3.10-3.10.3-1.fc37.x86_64/Include/object.h:500 #11 _Py_XDECREF (op=<optimized out>) at /usr/src/debug/python3.10-3.10.3-1.fc37.x86_64/Include/object.h:567 #12 list_dealloc (op=0x7efeb83973c0) at /usr/src/debug/python3.10-3.10.3-1.fc37.x86_64/Objects/listobject.c:343 #13 0x00007efeec221016 in _Py_Dealloc (op=<optimized out>) at /usr/src/debug/python3.10-3.10.3-1.fc37.x86_64/Objects/object.c:2298 #14 _Py_DECREF (op=<optimized out>) at /usr/src/debug/python3.10-3.10.3-1.fc37.x86_64/Include/object.h:500 #15 listiter_next (it=<optimized out>) at /usr/src/debug/python3.10-3.10.3-1.fc37.x86_64/Objects/listobject.c:32 ... #17 0x00007efeec209c43 in _PyEval_EvalFrame (throwflag=0, f=Frame 0x7efeb817f920, for file /usr/lib/python3.10/site-packages/dnf/base.py, line 1154, in <listcomp> ... ... #23 0x00007efeec209c43 in _PyEval_EvalFrame (throwflag=0, f=Frame 0x7efea0946930, for file /usr/lib/python3.10/site-packages/dnf/base.py, line 1154, in _verify_transaction
Reassigning to dnf based on comments 7 and 8.
Created attachment 1871006 [details] Call Trace for the update from https://bugzilla.redhat.com/show_bug.cgi?id=2054230#c7 (In reply to Jaroslav Mracek from comment #7) > The line can be split into: > > names = set() > for pkg in q: > name = pkg.name > names.add(name) > > May be it will say more specific where it fails. What can be wrong? > Iteration over q returns not valid pkg therefore pkg.name will crash in > string conversion because it returns randome memory instead string. Attaching call trace for that update of base.py 1154,1160c1154 < #names = set([i.name for i in q]) < logger.log(dnf.logging.DDEBUG, 'DDDDDEBUG2 for loop') < names = set() < for pkg in q: < name = pkg.name < names.add(name) < --- > names = set([i.name for i in q])
We are probably hitting the same problem on RHEL-9, see bug 2073510.
Created attachment 1875022 [details] list of packages used to create the installer image I can confirm that the issue is reproducible also with Fedora 35 installer. Attaching the package versions used in the F35 installer iso.
Created attachment 1877257 [details] list of packages used to create the F34 installer image I was not able to reproduce the issue with Fedora 34 installer. Attaching the package versions used in the F34 installer iso.
This bug appears to have been reported against 'rawhide' during the Fedora Linux 37 development cycle. Changing version to 37.
We haven't seen the flake on daily-iso since 04/05/2023 (tracking in the issue in https://github.com/rhinstaller/kickstart-tests/issues/675).
*** Bug 2051753 has been marked as a duplicate of this bug. ***
According to Comment 15 the issue is somehow resolved by changes outside of DNF components.