Bug 2054230 - Kickstart installation hangs before Verifying: stage of dns transaction
Summary: Kickstart installation hangs before Verifying: stage of dns transaction
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Fedora
Classification: Fedora
Component: dnf
Version: 37
Hardware: Unspecified
OS: Unspecified
unspecified
unspecified
Target Milestone: ---
Assignee: rpm-software-management
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
: 2051753 (view as bug list)
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2022-02-14 13:34 UTC by Radek Vykydal
Modified: 2023-08-25 09:28 UTC (History)
17 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-08-25 09:28:59 UTC
Type: Bug
Embargoed:


Attachments (Terms of Use)
Log from kickstart test whith Anaconda (dnf transaction) actually hanging (after 02:53:00,518 or 02:53:01,095). (803.69 KB, text/plain)
2022-02-14 13:34 UTC, Radek Vykydal
no flags Details
backtrace from hanging anaconda module running dnf transaction (94.40 KB, text/plain)
2022-03-23 13:50 UTC, Radek Vykydal
no flags Details
list of packages used to create the F34 installer image (23.86 KB, text/plain)
2022-05-05 07:19 UTC, Radek Vykydal
no flags Details

Description Radek Vykydal 2022-02-14 13:34:45 UTC
Created attachment 1860984 [details]
Log from kickstart test whith Anaconda (dnf transaction) actually hanging (after 02:53:00,518 or 02:53:01,095).

In our daily Anaconda kickstart tests we are seeing a lot of flakes when the installation hangs just before Verifying stage of the transaction installing the system in chroot:

02:18:24,179 INFO anaconda:anaconda: progress: Configuring rootfiles.noarch
02:18:24,240 INFO anaconda:dnf: Running in chroot, ignoring command 'daemon-reload'#012Running in chroot, ignoring command 'reload-or-restart'
02:18:24,248 INFO anaconda:anaconda: modules.payloads.payload.dnf.transaction_progress: Configuring (running scriptlet for): rootfiles-8.1-31.fc36.noarch 1642789586 c2c9054a57bbf8937e5e99b5f61127c2c76b6042a688f53caf37fd31a5cd7895
02:18:24,249 INFO anaconda:anaconda: progress: Configuring rootfiles.noarch
02:18:24,337 INFO anaconda:anaconda: modules.payloads.payload.dnf.transaction_progress: Configuring (running scriptlet for): rootfiles-8.1-31.fc36.noarch 1642789586 c2c9054a57bbf8937e5e99b5f61127c2c76b6042a688f53caf37fd31a5cd7895
02:18:24,338 INFO anaconda:anaconda: progress: Configuring rootfiles.noarch
02:18:25,241 INFO anaconda:anaconda: modules.payloads.payload.dnf.transaction_progress: Configuring (running scriptlet for): rootfiles-8.1-31.fc36.noarch 1642789586 c2c9054a57bbf8937e5e99b5f61127c2c76b6042a688f53caf37fd31a5cd7895
02:18:25,243 INFO anaconda:anaconda: progress: Configuring rootfiles.noarch
02:18:25,299 INFO anaconda:dnf: Running in chroot, ignoring request.
02:18:25,308 INFO anaconda:anaconda: modules.payloads.payload.dnf.transaction_progress: Configuring (running scriptlet for): rootfiles-8.1-31.fc36.noarch 1642789586 c2c9054a57bbf8937e5e99b5f61127c2c76b6042a688f53caf37fd31a5cd7895
02:18:25,309 INFO anaconda:anaconda: progress: Configuring rootfiles.noarch
02:18:25,420 INFO anaconda:anaconda: modules.payloads.payload.dnf.transaction_progress: Configuring (running scriptlet for): rootfiles-8.1-31.fc36.noarch 1642789586 c2c9054a57bbf8937e5e99b5f61127c2c76b6042a688f53caf37fd31a5cd7895
02:18:25,421 INFO anaconda:anaconda: progress: Configuring rootfiles.noarch
02:18:25,663 INFO systemd:Started run-r58f4d818ef4640daaf1750e584289455.service - /usr/bin/systemctl start man-db-cache-update.
02:18:25,668 NOTICE audit:SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:kernel_t:s0 msg='unit=run-r58f4d818ef4640daaf1750e584289455 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
02:18:25,906 INFO systemctl:Failed to start man-db-cache-update.service: Unit man-db-cache-update.service not found.
02:18:25,912 NOTICE systemd:run-r58f4d818ef4640daaf1750e584289455.service: Main process exited, code=exited, status=5/NOTINSTALLED
02:18:25,913 WARNING systemd:run-r58f4d818ef4640daaf1750e584289455.service: Failed with result 'exit-code'.
02:18:25,916 NOTICE audit:SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:kernel_t:s0 msg='unit=run-r58f4d818ef4640daaf1750e584289455 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=failed'

>>>>> HERE THE INSTALLATION HANGS SOMETIMES

02:18:26,180 INFO anaconda:anaconda: modules.payloads.payload.dnf.transaction_progress: Verifying: NetworkManager-1:1.36.0-0.9.fc36.x86_64 1643999059 0f38560ebb20f328d71d150a9122e0c0615c9bee5e1f026c2c9dc9a061debc93
02:18:26,180 INFO anaconda:anaconda: modules.payloads.payload.dnf.transaction_progress: Verifying: NetworkManager-libnm-1:1.36.0-0.9.fc36.x86_64 1643999059 01a81ec2b6864ce71d3a8f89db5499c1d39be2102c47203a8b74020e67cadea6
02:18:26,181 INFO anaconda:anaconda: progress: Verifying NetworkManager.x86_64 (1/331)
02:18:26,181 INFO anaconda:anaconda: progress: Verifying NetworkManager-libnm.x86_64 (2/331)
02:18:26,182 INFO anaconda:anaconda: modules.payloads.payload.dnf.transaction_progress: Verifying: acl-2.3.1-3.fc36.x86_64 1642625839 7275a39a6126e48f48578813420a7a5eb557d73bf7be0b9c42de17b9e25767b0
02:18:26,182 INFO anaconda:anaconda: modules.payloads.payload.dnf.transaction_progress: Verifying: alternatives-1.19-2.fc36.x86_64 1642634100 904a986d78a659d2c8fa5b810d884c0362f345979c74e0731065e22ab8988c9d
02:18:26,182 INFO anaconda:anaconda: progress: Verifying acl.x86_64 (3/331)

Comment 1 Radek Vykydal 2022-02-14 13:44:57 UTC
Jaroslave, would you have any idea or hint what can be the reason, or how to debug the issue ?

Comment 2 Vendula Poncova 2022-02-22 14:14:45 UTC
It looks like an issue in DNF. Reassigning.

Comment 3 Jaroslav Mracek 2022-03-08 09:02:18 UTC
Well it is not easy to provide correct suggestion. The interruption happened when the last package is installed and scriptlets were running (post trans). They can require unpredictable time. See output - the last statement is related to systemd and some fails related to services. (suggesting close - not a bug)

Recently we also have got some reports related to sudden transaction freezes but it was related to file systems. It means non of those possibilities are related to DNF and we cannot improve it. The problem appeared recently therefore if the issue is not reproducible in Fedora 34 it is likely it (suggesting close - not a bug)

What can be problem in DNF is synchronization between our internal database and rpm transaction result. But it will not cause random freezes. If the problem is caused by the synchronization then we will improve it in DNF 5 (next major DNF/MICRODNF) (suggesting close - wontfix).

The last option would be that there is an issue in RPM but it is unlikely (suggesting changing component to RPM)

Personally I believe that it is not an issue and the freeze was related to running scriptlet or consequence of it.

I am changing the component back to Anaconda for reevaluation. Please feel free to return back to DNF when you will feel that the issue is related to us with additional information.

Comment 4 Radek Vykydal 2022-03-23 08:08:46 UTC
Created attachment 1867716 [details]
/tmp/packaging.log from installer

/tmp/packaging.log from another instance of the issue.

Comment 5 Radek Vykydal 2022-03-23 08:18:39 UTC
I was able to reproduce the issue in kstests batch, have a few living instances running and I can provide you access to poke around if you want.
In the previous comment I added one more log, not much info there but maybe can be helpful.

Unfortunately, those running instances are VMs using user networking so it is not possible to ssh to the VMs. I am using virt-viewer with X tunneled from hypervisor to access the guest terminal.

Or I can try to provide any further logging info or hanging process core dump if you guide me.

Thank you, Radek

Comment 6 Radek Vykydal 2022-03-23 13:50:55 UTC
Created attachment 1867815 [details]
backtrace from hanging anaconda module running dnf transaction

Outputs of '$(gdb) thread apply bt all' and '$(gdb) thread apply py-bt all' applied on core dumps collected in the issue reproduced in kickstart-tests run.

Comment 7 Jaroslav Mracek 2022-03-29 14:08:47 UTC
Thank you very much for additional information. The last visible call is:

  File "/usr/lib/python3.10/site-packages/dnf/base.py", line 1154, in <listcomp>
    names = set([i.name for i in q])

But it is strange and may be the crash is related with something else like incorrect memory handling or so on. The problem is that there are multiple thinks happened on the line. There is a creation of set, iteration over dnf.query, creation package object and accessing of the name attribute.

The line can be split into:

names = set()
for pkg in q:
    name = pkg.name
    names.add(name)

May be it will say more specific where it fails. What can be wrong? Iteration over q returns not valid pkg therefore pkg.name will crash in string conversion because it returns randome memory instead string.

Comment 8 Vendula Poncova 2022-03-29 18:44:53 UTC
I think the problem is in `for i in q` when i is dereferenced (so it can hold the next item). That triggers package_dealloc, g_object_unref and g_signal_handlers_destroy and eventually stops at g_mutex_lock_slowpath. However, I am not sure.

Thread 1 (Thread 0x7efeb37fe640 (LWP 2678)):
#0  syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
#1  0x00007efeddb9d37c in g_mutex_lock_slowpath (mutex=0x7efeddad3110 <g.g_signal_mutex_lock.lto_priv>) at ../glib/gthread-posix.c:1493
#2  0x00007efeddb9d3c3 in g_mutex_lock (mutex=<optimized out>) at ../glib/gthread-posix.c:1517
#3  0x00007efeddaa47ac in g_signal_handlers_destroy (instance=0x7efea019ccb0) at ../gobject/gsignal.c:2776
#4  0x00007efedda93231 in g_object_real_dispose (object=0x7efea019ccb0) at ../gobject/gobject.c:1182
#5  0x00007efedda96d24 in g_object_unref (_object=<optimized out>) at ../gobject/gobject.c:3636
#6  g_object_unref (_object=0x7efea019ccb0) at ../gobject/gobject.c:3553
#7  0x00007efed344f426 in package_dealloc(_PackageObject*) () from /usr/lib64/python3.10/site-packages/hawkey/_hawkey.so
#8  0x00007efeec221366 in subtype_dealloc (self=<Package at remote 0x7efed223adc0>) at /usr/src/debug/python3.10-3.10.3-1.fc37.x86_64/Objects/typeobject.c:1458
#9  0x00007efeec1fd92a in _Py_Dealloc (op=<optimized out>) at /usr/src/debug/python3.10-3.10.3-1.fc37.x86_64/Objects/object.c:2292
#10 _Py_DECREF (op=<optimized out>) at /usr/src/debug/python3.10-3.10.3-1.fc37.x86_64/Include/object.h:500
#11 _Py_XDECREF (op=<optimized out>) at /usr/src/debug/python3.10-3.10.3-1.fc37.x86_64/Include/object.h:567
#12 list_dealloc (op=0x7efeb83973c0) at /usr/src/debug/python3.10-3.10.3-1.fc37.x86_64/Objects/listobject.c:343
#13 0x00007efeec221016 in _Py_Dealloc (op=<optimized out>) at /usr/src/debug/python3.10-3.10.3-1.fc37.x86_64/Objects/object.c:2298
#14 _Py_DECREF (op=<optimized out>) at /usr/src/debug/python3.10-3.10.3-1.fc37.x86_64/Include/object.h:500
#15 listiter_next (it=<optimized out>) at /usr/src/debug/python3.10-3.10.3-1.fc37.x86_64/Objects/listobject.c:32
...
#17 0x00007efeec209c43 in _PyEval_EvalFrame (throwflag=0, f=Frame 0x7efeb817f920, for file /usr/lib/python3.10/site-packages/dnf/base.py, line 1154, in <listcomp> ...
...
#23 0x00007efeec209c43 in _PyEval_EvalFrame (throwflag=0, f=Frame 0x7efea0946930, for file /usr/lib/python3.10/site-packages/dnf/base.py, line 1154, in _verify_transaction

Comment 9 Vendula Poncova 2022-04-05 13:15:46 UTC
Reassigning to dnf based on comments 7 and 8.

Comment 10 Radek Vykydal 2022-04-06 08:05:14 UTC
Created attachment 1871006 [details]
Call Trace for the update from https://bugzilla.redhat.com/show_bug.cgi?id=2054230#c7

(In reply to Jaroslav Mracek from comment #7)
 
> The line can be split into:
> 
> names = set()
> for pkg in q:
>     name = pkg.name
>     names.add(name)
> 
> May be it will say more specific where it fails. What can be wrong?
> Iteration over q returns not valid pkg therefore pkg.name will crash in
> string conversion because it returns randome memory instead string.


Attaching call trace for that update of base.py

1154,1160c1154
<         #names = set([i.name for i in q])
<         logger.log(dnf.logging.DDEBUG, 'DDDDDEBUG2 for loop')
<         names = set()
<         for pkg in q:
<             name = pkg.name
<             names.add(name)
< 
---
>         names = set([i.name for i in q])

Comment 11 Jan Stodola 2022-04-08 16:15:58 UTC
We are probably hitting the same problem on RHEL-9, see bug 2073510.

Comment 12 Radek Vykydal 2022-04-26 08:18:50 UTC
Created attachment 1875022 [details]
list of packages used to create the installer image

I can confirm that the issue is reproducible also with Fedora 35 installer. Attaching the package versions used in the F35 installer iso.

Comment 13 Radek Vykydal 2022-05-05 07:19:05 UTC
Created attachment 1877257 [details]
list of packages used to create the F34 installer image

I was not able to reproduce the issue with Fedora 34 installer. Attaching the package versions used in the F34 installer iso.

Comment 14 Ben Cotton 2022-08-09 13:13:02 UTC
This bug appears to have been reported against 'rawhide' during the Fedora Linux 37 development cycle.
Changing version to 37.

Comment 15 Radek Vykydal 2023-07-18 07:33:21 UTC
We haven't seen the flake on daily-iso since 04/05/2023 (tracking in the issue in https://github.com/rhinstaller/kickstart-tests/issues/675).

Comment 16 Jaroslav Mracek 2023-08-25 09:27:21 UTC
*** Bug 2051753 has been marked as a duplicate of this bug. ***

Comment 17 Jaroslav Mracek 2023-08-25 09:28:59 UTC
According to Comment 15 the issue is somehow resolved by changes outside of DNF components.


Note You need to log in before you can comment on or make changes to this bug.