RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets here. Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED". If you cannot log in to RH Jira, please consult article #7032570. That failing, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "RHEL project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that that bug has been migrated.
Bug 2127057 - Anaconda crashes with Segmentation fault after updating cloned connection values (in #012#6 0x00007f05b8178e09 nm_connection_to_dbus (libnm.so.0 + 0xe4e09))
Summary: Anaconda crashes with Segmentation fault after updating cloned connection val...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: anaconda
Version: ---
Hardware: Unspecified
OS: Unspecified
medium
medium
Target Milestone: rc
: ---
Assignee: Radek Vykydal
QA Contact: Release Test Team
URL:
Whiteboard:
: 2050674 2152786 (view as bug list)
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2022-09-15 08:50 UTC by Radek Vykydal
Modified: 2023-05-16 08:51 UTC (History)
15 users (show)

Fixed In Version: anaconda-33.16.8.3-1.el8
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1931389
Environment:
Last Closed: 2023-05-16 08:15:29 UTC
Type: Bug
Target Upstream Version:
Embargoed:
pm-rhel: mirror+


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker RHELPLAN-134026 0 None None None 2022-09-15 08:57:38 UTC
Red Hat Issue Tracker RTT-4898 0 None None None 2022-10-14 16:45:19 UTC
Red Hat Issue Tracker RTT-4899 0 None None None 2022-10-14 16:45:23 UTC
Red Hat Product Errata RHBA-2023:2748 0 None None None 2023-05-16 08:15:40 UTC

Description Radek Vykydal 2022-09-15 08:50:53 UTC
+++ This bug was initially created as a clone of Bug #1931389 +++

Created attachment 1758588 [details]
virt-install.log with backtrace from network-device-mac test of rawhide scenarion from https://github.com/rhinstaller/kickstart-tests/actions/runs/587393076

In our nightly kickstart tests we are seeing crashes (segmentation fault) after cloning of a connection via libnm API when reading its values. The backtrace is attached in the log, the head of the backtrace:

23:37:05,585 CRIT systemd-coredump:Process 1649 (python3) of user 0 dumped core.#012#012Stack trace of thread 1774:#012#0  0x00007f05c9a58292 raise (libc.so.6 + 0x3d292)#012#1  0x00007f05c9a58310 __restore_rt (libc.so.6 + 0x3d310)#012#2  0x00007f05bb719711 g_type_check_instance_is_fundamentally_a (libgobject-2.0.so.0 + 0x36711)#012#3  0x00007f05bb7088b5 g_object_get_property (libgobject-2.0.so.0 + 0x258b5)#012#4  0x00007f05b8178a6a property_to_dbus.constprop.0 (libnm.so.0 + 0xe4a6a)#012#5  0x00007f05b8178c97 _nm_setting_to_dbus.constprop.0 (libnm.so.0 + 0xe4c97)#012#6  0x00007f05b8178e09 nm_connection_to_dbus (libnm.so.0 + 0xe4e09)#012#7  0x00007f05bb6dec04 ffi_call_unix64 (libffi.so.6 + 0x6c04)#012#8  0x00007f05bb6de107 ffi_call (libffi.so.6 + 0x6107)#012#9  0x00007f05bb8c6e1b pygi_invoke_c_callable (_gi.cpython-39-x86_64-linux-gnu.so + 0x2de1b) ...

The first occurrence I noticed is from nightly test of 2021-02-17/18.
Attaching log from 2021-02-21 (yesterday).
The source of isos for nightly tests:
https://download.fedoraproject.org/pub/fedora/linux/development/rawhide/Server/x86_64/os/images/boot.iso

Seems like a race condition, happening in ~ 1 / 60 (or maybe 30) tests by very rough estimate trying to guess only kickstart tests touching the code.

The code in anaconda that is referred in the crash in the log:
https://github.com/rhinstaller/anaconda/blob/67dfc2dffcebf31dc375c815f84963f1d5ee45bb/pyanaconda/modules/network/nm_client.py#L719

--- Additional comment from Radek Vykydal on 2021-02-22 10:49:51 UTC ---



--- Additional comment from Radek Vykydal on 2021-02-22 14:53:26 UTC ---

Hello Thomas, asking for info to make sure I am not missing some known issue, or using libnm with threads in an obviously wrong way. If there is nothing apparent I'll try to come with an isolated reproducer (the issue seems to be race condition which is rather hard to hit).

--- Additional comment from Thomas Haller on 2021-02-22 15:25:21 UTC ---

> Hello Thomas, asking for info to make sure I am not missing some known issue, or using libnm with threads in an obviously wrong way.

Is anaconda multi threaded? (aside of course the GDBus worker thread and other non-glib threads).

libnm in general is not thread-safe -- in the sense, that multiple thread can access the same libnm object *instance*. You would need to add locking for that.
However, multiple thread can create distinct NMConnection, NMSetting, NMClient instances, if you take care to only access them from one thread.

With NMConnection and NMSetting instances, they are mostly simple data objects with some API. Meaning, they don't interact with a GMainContext.

But NMClient is the cache of D-Bus Objects. It registers timeouts and D-Bus signals, and thus events will happen by enqueuing them in the GMainContext instance that is associated with the NMClient instance. Those events will be dispatched, when you iterate that GMainContext. Meaning, you can only access an NMClient instance if you have ownership of the GMainContext, eg. by iterating it.

Before libnm 1.22.0, only g_main_context_get_default() could be used by NMClient. That means, you could not have NMClient instances associated with another GMainContext. Commonly you iterate g_main_context_get_default() on one thread (although theoretically you could pass it between threads), and that means, you cannot really have an NMClient instance on another thread because access to it would always go through g_main_context_get_default().

Since 1.22.0, the NMClient instance is associate with the g_main_context_get_thread_default() instance from the moment when you create it. That means, you can have a worker thread, create a new GMainContext, and use a NMClient with that context. If you do that, beware of leaks when you stop iterating the GMainContext (which is tricky, I could explain...).





Hm. I feel like I didn't answer your question. What was your question again? No, there are no known issues with multi threadding, if you use libnm-1.22 or newer and if you do it "right". But why would that matter for anaconda?

--- Additional comment from Radek Vykydal on 2021-02-22 16:15:10 UTC ---

Relevant part from the log from the description made more readable:

23:37:04,882 DEBUG anaconda:anaconda: network: Initialization started.
23:37:04,886 WARNING org.fedoraproject.Anaconda.Modules.Network:DEBUG:anaconda.modules.network.network:Dumping configuration state - Initialization started.
23:37:04,888 WARNING org.fedoraproject.Anaconda.Modules.Network:DEBUG:anaconda.modules.network.network:{'connection': {'autoconnect-retries': <1>, 'id': <'ksdev0'>, 'interface-name': <'ksdev0'>, 'multi-connect': <1>, 'permissions': <@as []>, 'timestamp': <uint64 1613950601>, 'type': <'802-3-ethernet'>, 'uuid': <'85295384-46da-40dd-97f5-8f05223179b3'>, 'wait-device-timeout': <60000>}, '802-3-ethernet': {'auto-negotiate': <false>, 'mac-address-blacklist': <@as []>, 's390-options': <@a{ss} {}>}, 'ipv4': {'address-data': <@aa{sv} []>, 'dhcp-timeout': <90>, 'dhcp-vendor-class-identifier': <'anaconda-Linux'>, 'dns': <@au []>, 'dns-search': <@as []>, 'may-fail': <false>, 'method': <'auto'>, 'route-data': <@aa{sv} []>}, 'ipv6': {'addr-gen-mode': <0>, 'address-data': <@aa{sv} []>, 'dhcp-timeout': <90>, 'dns': <@aay []>, 'dns-search': <@as []>, 'method': <'auto'>, 'route-data': <@aa{sv} []>}, 'proxy': {}}
23:37:04,893 DEBUG anaconda:anaconda: network: Devices found: ['ksdev0']
23:37:04,898 WARNING org.fedoraproject.Anaconda.Modules.Network:DEBUG:dasbus.connection:Publishing an object at /org/fedoraproject/Anaconda/Modules/Network/Task/1.
23:37:04,903 DEBUG anaconda:anaconda: network: Running task Apply kickstart
23:37:04,905 WARNING org.fedoraproject.Anaconda.Modules.Network:INFO:anaconda.threading:Running Thread: AnaTaskThread-ApplyKickstartTask-1 (139662525113920)
23:37:04,907 WARNING org.fedoraproject.Anaconda.Modules.Network:DEBUG:anaconda.modules.network.nm_client:kickstart specification --device=52:54:00:12:34:56 -> ksdev0 (existing device found)
23:37:04,907 WARNING org.fedoraproject.Anaconda.Modules.Network:DEBUG:anaconda.modules.network.initialization:Apply kickstart: updating connection 85295384-46da-40dd-97f5-8f05223179b3 of device ksdev0
23:37:04,907 WARNING org.fedoraproject.Anaconda.Modules.Network:DEBUG:anaconda.modules.network.nm_client:updating connection 85295384-46da-40dd-97f5-8f05223179b3:
23:37:04,907 WARNING org.fedoraproject.Anaconda.Modules.Network:{'connection': {'autoconnect-retries': <1>, 'id': <'ksdev0'>, 'interface-name': <'ksdev0'>, 'multi-connect': <1>, 'permissions': <@as []>, 'timestamp': <uint64 1613950601>, 'type': <'802-3-ethernet'>, 'uuid': <'85295384-46da-40dd-97f5-8f05223179b3'>, 'wait-device-timeout': <60000>}, '802-3-ethernet': {'auto-negotiate': <false>, 'mac-address-blacklist': <@as []>, 's390-options': <@a{ss} {}>}, 'ipv4': {'address-data': <@aa{sv} []>, 'dhcp-timeout': <90>, 'dhcp-vendor-class-identifier': <'anaconda-Linux'>, 'dns': <@au []>, 'dns-search': <@as []>, 'may-fail': <false>, 'method': <'auto'>, 'route-data': <@aa{sv} []>}, 'ipv6': {'addr-gen-mode': <0>, 'address-data': <@aa{sv} []>, 'dhcp-timeout': <90>, 'dns': <@aay []>, 'dns-search': <@as []>, 'method': <'auto'>, 'route-data': <@aa{sv} []>}, 'proxy': {}}
23:37:04,910 WARNING org.fedoraproject.Anaconda.Modules.Network:DEBUG:anaconda.modules.network.nm_client:update connection: mac 52:54:00:12:34:56 is bound to name ksdev0
23:37:04,910 WARNING org.fedoraproject.Anaconda.Modules.Network:DEBUG:anaconda.modules.network.nm_client:Bind to mac: bound to 52:54:00:12:34:56
23:37:04,912 WARNING org.fedoraproject.Anaconda.Modules.Network:WARNING:py.warnings:/usr/lib64/python3.9/site-packages/pyanaconda/modules/network/nm_client.py:1016: Warning: g_main_context_push_thread_default: assertion 'acquired_context' failed
23:37:04,912 WARNING org.fedoraproject.Anaconda.Modules.Network:  connection.update2(
23:37:04,914 DEBUG NetworkManager:<debug> [1613950624.9149] create NMAuditManager singleton (96855c95cb3f38e9)
23:37:04,917 DEBUG NetworkManager:<debug> [1613950624.9178] ++ connection 'update connection' (0x55818099ae60/NMSimpleConnection/"802-3-ethernet" < 0x5581809f0b00/NMSimpleConnection/"802-3-ethernet") [/org/freedesktop/NetworkManager/Settings/1]:
23:37:04,918 DEBUG NetworkManager:<debug> [1613950624.9183] ++ connection                [ 0x7f7398006ba0 < 0x7f7394003de0 ]
23:37:04,918 DEBUG NetworkManager:<debug> [1613950624.9184] ++ connection.timestamp      = 1613950601
23:37:04,918 DEBUG NetworkManager:<debug> [1613950624.9184] ++ 802-3-ethernet            [ 0x7f73980021f0 < 0x55818098e150 ]
23:37:04,918 WARNING org.fedoraproject.Anaconda.Modules.Network:WARNING:py.warnings:/usr/lib64/python3.9/site-packages/pyanaconda/modules/network/nm_client.py:1016: Warning: g_main_context_pop_thread_default: assertion 'stack != NULL' failed
23:37:04,918 WARNING org.fedoraproject.Anaconda.Modules.Network:  connection.update2(
23:37:04,918 DEBUG NetworkManager:<debug> [1613950624.9188] ++ 802-3-ethernet.mac-address = '52:54:00:12:34:56'
23:37:04,918 DEBUG NetworkManager:<debug> [1613950624.9189] ++ ipv4                      [ 0x558180a47b20 < 0x5581809f88d0 ]
23:37:04,919 DEBUG NetworkManager:<debug> [1613950624.9189] ++ ipv4.dhcp-timeout         < 90
23:37:04,919 DEBUG NetworkManager:<debug> [1613950624.9190] ++ ipv4.dhcp-vendor-class-identifier < 'anaconda-Linux'
23:37:04,919 DEBUG NetworkManager:<debug> [1613950624.9191] ++ ipv4.may-fail             < FALSE
23:37:04,919 DEBUG NetworkManager:<debug> [1613950624.9192] ++ ipv6                      [ 0x558180a47bf0 < 0x5581809f89a0 ]
23:37:04,919 DEBUG NetworkManager:<debug> [1613950624.9193] ++ ipv6.addresses            = ((GPtrArray*) 0x55818097a560) < ((GPtrArray*) 0x7f7394001da0)
23:37:04,919 DEBUG NetworkManager:<debug> [1613950624.9194] ++ ipv6.method               = 'manual' < 'auto'
23:37:04,919 DEBUG NetworkManager:<debug> [1613950624.9196] ++ ipv6.dhcp-timeout         < 90
23:37:04,921 DEBUG NetworkManager:<debug> [1613950624.9211] Saving secrets for connection /org/freedesktop/NetworkManager/Settings/1 (ksdev0)
23:37:04,922 INFO NetworkManager:<info>  [1613950624.9223] audit: op="connection-update" uuid="85295384-46da-40dd-97f5-8f05223179b3" name="ksdev0" args="connection.timestamp,ipv4.dhcp-vendor-class-identifier,ipv4.may-fail,ipv4.dhcp-timeout,ipv6.addresses,ipv6.dhcp-timeout,ipv6.method,802-3-ethernet.mac-address" pid=1649 uid=0 result="success"
23:37:04,923 WARNING org.fedoraproject.Anaconda.Modules.Network:Fatal Python error: Segmentation fault
23:37:04,923 WARNING org.fedoraproject.Anaconda.Modules.Network:Current thread 0x00007f05b732a640 (most recent call first):
23:37:04,926 WARNING org.fedoraproject.Anaconda.Modules.Network:  File "/usr/lib64/python3.9/site-packages/pyanaconda/modules/network/nm_client.py", line 719 in update_connection_from_ksdata
23:37:04,928 WARNING org.fedoraproject.Anaconda.Modules.Network:  File "/usr/lib64/python3.9/site-packages/pyanaconda/modules/network/initialization.py", line 109 in run
23:37:04,929 WARNING org.fedoraproject.Anaconda.Modules.Network:  File "/usr/lib64/python3.9/site-packages/pyanaconda/modules/network/utils.py", line 130 in wrapped
23:37:04,930 WARNING org.fedoraproject.Anaconda.Modules.Network:  File "/usr/lib64/python3.9/site-packages/pyanaconda/modules/common/task/task.py", line 109 in _task_run_callback
23:37:04,930 WARNING org.fedoraproject.Anaconda.Modules.Network:  File "/usr/lib64/python3.9/site-packages/pyanaconda/modules/common/task/task.py", line 96 in _thread_run_callback
23:37:04,930 WARNING org.fedoraproject.Anaconda.Modules.Network:  File "/usr/lib64/python3.9/threading.py", line 892 in run
23:37:04,930 WARNING org.fedoraproject.Anaconda.Modules.Network:  File "/usr/lib64/python3.9/site-packages/pyanaconda/threading.py", line 275 in run
23:37:04,930 WARNING org.fedoraproject.Anaconda.Modules.Network:  File "/usr/lib64/python3.9/threading.py", line 954 in _bootstrap_inner
23:37:04,930 WARNING org.fedoraproject.Anaconda.Modules.Network:  File "/usr/lib64/python3.9/threading.py", line 912 in _bootstrap
23:37:04,930 WARNING org.fedoraproject.Anaconda.Modules.Network:Thread 0x00007f05c98a5740 (most recent call first):
23:37:04,931 WARNING org.fedoraproject.Anaconda.Modules.Network:  File "/usr/lib/python3.9/site-packages/gi/overrides/GLib.py", line 497 in run
23:37:04,931 NOTICE audit:ANOM_ABEND auid=4294967295 uid=0 gid=0 ses=4294967295 subj=system_u:system_r:kernel_t:s0 pid=1649 comm="python3" exe="/usr/bin/python3.9" sig=11 res=1
23:37:04,931 WARNING org.fedoraproject.Anaconda.Modules.Network:  File "/usr/lib64/python3.9/site-packages/pyanaconda/modules/common/base/base.py", line 92 in run
23:37:04,931 WARNING org.fedoraproject.Anaconda.Modules.Network:  File "/usr/lib64/python3.9/site-packages/pyanaconda/modules/network/network.py", line 105 in run
23:37:04,931 WARNING org.fedoraproject.Anaconda.Modules.Network:  File "/usr/lib64/python3.9/site-packages/pyanaconda/modules/network/__main__.py", line 25 in <module>
23:37:04,931 WARNING org.fedoraproject.Anaconda.Modules.Network:  File "/usr/lib64/python3.9/runpy.py", line 87 in _run_code
23:37:04,931 WARNING org.fedoraproject.Anaconda.Modules.Network:  File "/usr/lib64/python3.9/runpy.py", line 197 in _run_module_as_main
23:37:04,959 INFO systemd:Created slice system-systemd\x2dcoredump.slice.
23:37:04,964 NOTICE audit:BPF prog-id=54 op=LOAD
23:37:04,965 NOTICE audit:BPF prog-id=55 op=LOAD
23:37:04,965 NOTICE audit:BPF prog-id=56 op=LOAD
23:37:04,976 INFO systemd:Started Process Core Dump (PID 1775/UID 0).
23:37:04,977 NOTICE audit:SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:kernel_t:s0 msg='unit=systemd-coredump@0-1775-0 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
23:37:05,585 CRIT systemd-coredump:Process 1649 (python3) of user 0 dumped core.#012#012Stack trace of thread 1774:
#012#0  0x00007f05c9a58292 raise (libc.so.6 + 0x3d292)
#012#1  0x00007f05c9a58310 __restore_rt (libc.so.6 + 0x3d310)
#012#2  0x00007f05bb719711 g_type_check_instance_is_fundamentally_a (libgobject-2.0.so.0 + 0x36711)
#012#3  0x00007f05bb7088b5 g_object_get_property (libgobject-2.0.so.0 + 0x258b5)
#012#4  0x00007f05b8178a6a property_to_dbus.constprop.0 (libnm.so.0 + 0xe4a6a)
#012#5  0x00007f05b8178c97 _nm_setting_to_dbus.constprop.0 (libnm.so.0 + 0xe4c97)
#012#6  0x00007f05b8178e09 nm_connection_to_dbus (libnm.so.0 + 0xe4e09)
#012#7  0x00007f05bb6dec04 ffi_call_unix64 (libffi.so.6 + 0x6c04)
#012#8  0x00007f05bb6de107 ffi_call (libffi.so.6 + 0x6107)
#012#9  0x00007f05bb8c6e1b pygi_invoke_c_callable (_gi.cpython-39-x86_64-linux-gnu.so + 0x2de1b)
#012#10 0x00007f05bb8c58d6 _wrap_g_callable_info_invoke (_gi.cpython-39-x86_64-linux-gnu.so + 0x2c8d6)
#012#11 0x00007f05bb8ba60e _callable_info_call (_gi.cpython-39-x86_64-linux-gnu.so + 0x2160e)
#012#12 0x00007f05c9cfb9a7 _PyObject_MakeTpCall (libpython3.9.so.1.0 + 0x1119a7)
#012#13 0x00007f05c9cf88f6 _PyEval_EvalFrameDefault (libpython3.9.so.1.0 + 0x10e8f6)
#012#14 0x00007f05c9cf229d _PyEval_EvalCode (libpython3.9.so.1.0 + 0x10829d)
#012#15 0x00007f05c9cfffde _PyFunction_Vectorcall (libpython3.9.so.1.0 + 0x115fde)
#012#16 0x00007f05c9cf460c _PyEval_EvalFrameDefault (libpython3.9.so.1.0 + 0x10a60c)
#012#17 0x00007f05c9d00303 function_code_fastcall (libpython3.9.so.1.0 + 0x116303)
#012#18 0x00007f05c9cf64ad _PyEval_EvalFrameDefault (libpython3.9.so.1.0 + 0x10c4ad)
#012#19 0x00007f05c9cf229d _PyEval_EvalCode (libpython3.9.so.1.0 + 0x10829d)
#012#20 0x00007f05c9cfffde _PyFunction_Vectorcall (libpython3.9.so.1.0 + 0x115fde)
#012#21 0x00007f05c9cf388d _PyEval_EvalFrameDefault (libpython3.9.so.1.0 + 0x10988d)
#012#22 0x00007f05c9d00303 function_code_fastcall (libpython3.9.so.1.0 + 0x116303)
#012#23 0x00007f05c9cf388d _PyEval_EvalFrameDefault (libpython3.9.so.1.0 + 0x10988d)
#012#24 0x00007f05c9d00303 function_code_fastcall (libpython3.9.so.1.0 + 0x116303)
#012#25 0x00007f05c9d08522 method_vectorcall (libpython3.9.so.1.0 + 0x11e522)
#012#26 0x00007f05c9cf64ad _PyEval_EvalFrameDefault (libpython3.9.so.1.0 + 0x10c4ad)
#012#27 0x00007f05c9d00303 function_code_fastcall (libpython3.9.so.1.0 + 0x116303)
#012#28 0x00007f05c9cf8315 _PyEval_EvalFrameDefault (libpython3.9.so.1.0 + 0x10e315)
#012#29 0x00007f05c9d00303 function_code_fastcall (libpython3.9.so.1.0 + 0x116303)
#012#30 0x00007f05c9cf388d _PyEval_EvalFrameDefault (libpython3.9.so.1.0 + 0x10988d)
#012#31 0x00007f05c9d00303 function_code_fastcall (libpython3.9.so.1.0 + 0x116303)
#012#32 0x00007f05c9cf388d _PyEval_EvalFrameDefault (libpython3.9.so.1.0 + 0x10988d)
#012#33 0x00007f05c9d00303 function_code_fastcall (libpython3.9.so.1.0 + 0x116303)
#012#34 0x00007f05c9d08522 method_vectorcall (libpython3.9.so.1.0 + 0x11e522)
#012#35 0x00007f05c9db790a t_bootstrap (libpython3.9.so.1.0 + 0x1cd90a)
#012#36 0x00007f05c9db7578 pythread_wrapper (libpython3.9.so.1.0 + 0x1cd578)
#012#37 0x00007f05c9a031c9 start_thread (libpthread.so.0 + 0x91c9)
#012#38 0x00007f05c9b1b473 __clone (libc.so.6 + 0x100473)
#012#012Stack trace of thread 1649:
#012#0  0x00007f05c9b107cf __poll (libc.so.6 + 0xf57cf)
#012#1  0x00007f05bb80bc36 g_main_context_iterate.constprop.0 (libglib-2.0.so.0 + 0xa8c36)
#012#2  0x00007f05bb7b77e3 g_main_loop_run (libglib-2.0.so.0 + 0x547e3)
#012#3  0x00007f05bb6dec04 ffi_call_unix64 (libffi.so.6 + 0x6c04)
#012#4  0x00007f05bb6de107 ffi_call (libffi.so.6 + 0x6107)
#012#5  0x00007f05bb8c6e1b pygi_invoke_c_callable (_gi.cpython-39-x86_64-linux-gnu.so + 0x2de1b)
#012#6  0x00007f05bb8c58d6 _wrap_g_callable_info_invoke (_gi.cpython-39-x86_64-linux-gnu.so + 0x2c8d6)
#012#7  0x00007f05bb8ba60e _callable_info_call (_gi.cpython-39-x86_64-linux-gnu.so + 0x2160e)
#012#8  0x00007f05c9cfb9a7 _PyObject_MakeTpCall (libpython3.9.so.1.0 + 0x1119a7)
#012#9  0x00007f05c9cf88f6 _PyEval_EvalFrameDefault (libpython3.9.so.1.0 + 0x10e8f6)
#012#10 0x00007f05c9cf229d _PyEval_EvalCode (libpython3.9.so.1.0 + 0x10829d)
#012#11 0x00007f05c9cfffde _PyFunction_Vectorcall (libpython3.9.so.1.0 + 0x115fde)
#012#12 0x00007f05c9cf388d _PyEval_EvalFrameDefault (libpython3.9.so.1.0 + 0x10988d)
#012#13 0x00007f05c9d00303 function_code_fastcall (libpython3.9.so.1.0 + 0x116303)
#012#14 0x00007f05c9d08481 method_vectorcall (libpython3.9.so.1.0 + 0x11e481)
#012#15 0x00007f05c9cf8315 _PyEval_EvalFrameDefault (libpython3.9.so.1.0 + 0x10e315)
#012#16 0x00007f05c9cf229d _PyEval_EvalCode (libpython3.9.so.1.0 + 0x10829d)
#012#17 0x00007f05c9cfffde _PyFunction_Vectorcall (libpython3.9.so.1.0 + 0x115fde)
#012#18 0x00007f05c9cf388d _PyEval_EvalFrameDefault (libpython3.9.so.1.0 + 0x10988d)
#012#19 0x00007f05c9cf229d _PyEval_EvalCode (libpython3.9.so.1.0 + 0x10829d)
#012#20 0x00007f05c9d6ecf5 _PyEval_EvalCodeWithName (libpython3.9.so.1.0 + 0x184cf5)
#012#21 0x00007f05c9d6ec8d PyEval_EvalCodeEx (libpython3.9.so.1.0 + 0x184c8d)
#012#22 0x00007f05c9d6ec3f PyEval_EvalCode (libpython3.9.so.1.0 + 0x184c3f)
#012#23 0x00007f05c9d74823 builtin_exec (libpython3.9.so.1.0 + 0x18a823)
#012#24 0x00007f05c9d005ec cfunction_vectorcall_FASTCALL (libpython3.9.so.1.0 + 0x1165ec)
#012#25 0x00007f05c9cf3620 _PyEval_EvalFrameDefault (libpython3.9.so.1.0 + 0x109620)
#012#26 0x00007f05c9cf229d _PyEval_EvalCode (libpython3.9.so.1.0 + 0x10829d)
#012#27 0x00007f05c9cfffde _PyFunction_Vectorcall (libpython3.9.so.1.0 + 0x115fde)
#012#28 0x00007f05c9cf3620 _PyEval_EvalFrameDefault (libpython3.9.so.1.0 + 0x109620)
#012#29 0x00007f05c9cf229d _PyEval_EvalCode (libpython3.9.so.1.0 + 0x10829d)
#012#30 0x00007f05c9cfffde _PyFunction_Vectorcall (libpython3.9.so.1.0 + 0x115fde)
#012#31 0x00007f05c9d8f2f9 pymain_run_module (libpython3.9.so.1.0 + 0x1a52f9)
#012#32 0x00007f05c9d8ece7 Py_RunMain (libpython3.9.so.1.0 + 0x1a4ce7)
#012#33 0x00007f05c9d6170d Py_BytesMain (libpython3.9.so.1.0 + 0x17770d)
#012#34 0x00007f05c9a42b75 __libc_start_main (libc.so.6 + 0x27b75)
#012#35 0x000055d6ad53309e _start (python3.9 + 0x109e)
#012#012Stack trace of thread 1673:
#012#0  0x00007f05c9b107cf __poll (libc.so.6 + 0xf57cf)
#012#1  0x00007f05bb80bc36 g_main_context_iterate.constprop.0 (libglib-2.0.so.0 + 0xa8c36)
#012#2  0x00007f05bb7b59f3 g_main_context_iteration (libglib-2.0.so.0 + 0x529f3)
#012#3  0x00007f05bb7b5a41 glib_worker_main (libglib-2.0.so.0 + 0x52a41)
#012#4  0x00007f05bb7e68c2 g_thread_proxy (libglib-2.0.so.0 + 0x838c2)
#012#5  0x00007f05c9a031c9 start_thread (libpthread.so.0 + 0x91c9)
#012#6  0x00007f05c9b1b473 __clone (libc.so.6 + 0x100473)
#012#012Stack trace of thread 1676:
#012#0  0x00007f05c9b107cf __poll (libc.so.6 + 0xf57cf)
#012#1  0x00007f05bb80bc36 g_main_context_iterate.constprop.0 (libglib-2.0.so.0 + 0xa8c36)
#012#2  0x00007f05bb7b77e3 g_main_loop_run (libglib-2.0.so.0 + 0x547e3)
#012#3  0x00007f05bb59ac7a gdbus_shared_thread_func.lto_priv.0 (libgio-2.0.so.0 + 0x10fc7a)
#012#4  0x00007f05bb7e68c2 g_thread_proxy (libglib-2.0.so.0 + 0x838c2)
#012#5  0x00007f05c9a031c9 start_thread (libpthread.so.0 + 0x91c9)
#012#6  0x00007f05c9b1b473 __clone (libc.so.6 + 0x100473)
#012#012Stack trace of thread 1700:
#012#0  0x00007f05c9b15f2d syscall (libc.so.6 + 0xfaf2d)
#012#1  0x00007f05bb8061ac g_cond_wait_until (libglib-2.0.so.0 + 0xa31ac)
#012#2  0x00007f05bb7882e1 g_async_queue_pop_intern_unlocked (libglib-2.0.so.0 + 0x252e1)
#012#3  0x00007f05bb788466 g_async_queue_timeout_pop (libglib-2.0.so.0 + 0x25466)
#012#4  0x00007f05bb7e9539 g_thread_pool_thread_proxy.lto_priv.0 (libglib-2.0.so.0 + 0x86539)
#012#5  0x00007f05bb7e68c2 g_thread_proxy (libglib-2.0.so.0 + 0x838c2)
#012#6  0x00007f05c9a031c9 start_thread (libpthread.so.0 + 0x91c9)
#012#7  0x00007f05c9b1b473 __clone (libc.so.6 + 0x100473)
23:37:05,608 WARNING org.fedoraproject.Anaconda.Modules.Network:/usr/libexec/anaconda/start-module: line 41:  1649 Segmentation fault      (core dumped) python3 -m $1
23:37:05,611 INFO systemd:systemd-coredump: Succeeded.

--- Additional comment from Thomas Haller on 2021-02-22 16:24:16 UTC ---

it's not clear to me.

It seems thread 1649 is running the "main" thread, while thread 1774 crashes accessing a NMConnection instance.


-- first of all, in libnm `NMConnection` is an interface, implemented by the glib classes `NMSimpleConnection` and `NMRemoteConnection`. The latter are the instances that live inside the NMClient cache (`nm_client_get_connections()`).


I would assume that there is one NMClient instance associated with thread 1649. Is that the case?
What is the `NMConnection` instance involved in the crash? If it is actually an `NMRemoteConnection` from the `NMClient` cache, then you cannot access it outside of the main thread (well... technically you could, if you acquire exclusive access to the GMainContext -- which of course would be dangerous to deadlock, unless you do it really right). 

Where is the python code that runs on that other thread?

--- Additional comment from Radek Vykydal on 2021-02-23 10:51:33 UTC ---

(In reply to Thomas Haller from comment #5)

Thank you for help Thomas. Looks like the problem is sharing the NMClient in main thread and task thread? 
(The crashing process is not Anaconda but network module run separately and accessed via DBus from Anaconda.)

> it's not clear to me.
> 
> It seems thread 1649 is running the "main" thread, while thread 1774 crashes
> accessing a NMConnection instance.
> 
> 
> -- first of all, in libnm `NMConnection` is an interface, implemented by the
> glib classes `NMSimpleConnection` and `NMRemoteConnection`. The latter are
> the instances that live inside the NMClient cache
> (`nm_client_get_connections()`).
> 

I think we are using NMRemoteConnection here.

> 
> I would assume that there is one NMClient instance associated with thread
> 1649. Is that the case?

We are passing the client instance created in the main thread to the task calling it in a thread.
(We are running the task threads serially in the module, but it is possible that the NMClient is accessed from main thread
by another asynchronous DBus call.)

> What is the `NMConnection` instance involved in the crash? If it is actually
> an `NMRemoteConnection` from the `NMClient` cache, then you cannot access it
> outside of the main thread (well... technically you could, if you acquire
> exclusive access to the GMainContext -- which of course would be dangerous
> to deadlock, unless you do it really right). 
> 
> Where is the python code that runs on that other thread?

(In reply to Radek Vykydal from comment #0)
 
> The code in anaconda that is referred in the crash in the log:
> https://github.com/rhinstaller/anaconda/blob/67dfc2dffcebf31dc375c815f84963f1d5ee45bb/pyanaconda/modules/network/nm_client.py#L719

The caller (update_connection_from_ksdata) is run from a task in its thread:
https://github.com/rhinstaller/anaconda/blob/67dfc2dffcebf31dc375c815f84963f1d5ee45bb/pyanaconda/modules/network/initialization.py#L109

A task:
https://github.com/rhinstaller/anaconda/blob/67dfc2dffcebf31dc375c815f84963f1d5ee45bb/pyanaconda/modules/common/task/task.py#L62

Also the warnings from the logs seem to be suggesting we are doing something wrong ?
23:37:04,912 WARNING org.fedoraproject.Anaconda.Modules.Network:WARNING:py.warnings:/usr/lib64/python3.9/site-packages/pyanaconda/modules/network/nm_client.py:1016: Warning: g_main_context_push_thread_default: assertion 'acquired_context' failed
...
23:37:04,918 WARNING org.fedoraproject.Anaconda.Modules.Network:WARNING:py.warnings:/usr/lib64/python3.9/site-packages/pyanaconda/modules/network/nm_client.py:1016: Warning: g_main_context_pop_thread_default: assertion 'stack != NULL' failed

The line 1016: https://github.com/rhinstaller/anaconda/blob/67dfc2dffcebf31dc375c815f84963f1d5ee45bb/pyanaconda/modules/network/nm_client.py#L1016

--- Additional comment from Thomas Haller on 2021-02-23 21:21:12 UTC ---

> We are passing the client instance created in the main thread to the task calling it in a thread.
(We are running the task threads serially in the module, but it is possible that the NMClient is accessed from main thread
by another asynchronous DBus call.)

The NMClient instance is strongly associate with the GMainContext (g_main_context_get_thread_default()) of the moment when you created it.
There is also nm_client_get_main_context().

As NMClient registers to events, whenever an event happens, that will be enqueued (as idle action) in that main context. That means, whenever that main context gets iterated/run, those callbacks get invoked.

If you now pass the NMClient to another thread and access it there, while also iterating the main context on the original thread, it's likely to result in a crash.

You can only access the NMClient, if you g_main_context_acquire() it's context (for example, while iterating it).


> Also the warnings from the logs seem to be suggesting we are doing something wrong ?

that's right. nm_client_get_main_context() is already acquired on the main thread. You cannot meaningfully do something on this thread before releasing it from the main thread.

--- Additional comment from Radek Vykydal on 2021-03-01 13:47:25 UTC ---

I am trying to create a new NMClient in the thread (applying https://github.com/rhinstaller/anaconda/pull/3207) but still I am getting assertion failures, the first one when creating the Client:

13:13:14,431 WARNING org.fedoraproject.Anaconda.Modules.Network:INFO:anaconda.threading:Running Thread: AnaTaskThread-ApplyKickstartTask-1 (140229622785600)
13:13:14,435 WARNING org.fedoraproject.Anaconda.Modules.Network:WARNING:py.warnings:/usr/lib64/python3.9/site-packages/pyanaconda/modules/network/nm_client.py:63: Warning: g_main_context_push_thread_default: assertion 'acquired_context' failed
13:13:14,435 WARNING org.fedoraproject.Anaconda.Modules.Network:  nm_client = NM.Client.new(None)
13:13:14,435 WARNING org.fedoraproject.Anaconda.Modules.Network:WARNING:py.warnings:/usr/lib64/python3.9/site-packages/pyanaconda/modules/network/nm_client.py:63: Warning: g_main_context_pop_thread_default: assertion 'g_queue_peek_head (stack) == context' failed
13:13:14,435 WARNING org.fedoraproject.Anaconda.Modules.Network:  nm_client = NM.Client.new(None)

(more in the attached log)

--- Additional comment from Radek Vykydal on 2021-03-01 13:51:28 UTC ---

Adding the log for the case from comment #8 without the patch for comparison.

--- Additional comment from Radek Vykydal on 2021-03-03 16:05:04 UTC ---

Thomas, please see comment #8.

Do you think the PR https://github.com/rhinstaller/anaconda/pull/3207 can fix the segfault issue ?
Does it even make sense to create NMClient instances in the PR without creating new contexts for threads like in:
https://github.com/rvykydal/anaconda/commit/b8f91fa4d567e2c4f886acdae14dd895bde094b5
- which seems to lead to deadlocks, for example here:
https://github.com/rvykydal/anaconda/blob/65c067c00344838ee8917751903cef5f864fc3b7/pyanaconda/modules/network/nm_client.py#L1025
The finish callback is never called, maybe the syncing on Queue is broken here?
Seems like we'd need to rework synchronisation of the calls...

--- Additional comment from Thomas Haller on 2021-06-22 12:14:34 UTC ---

(In reply to Radek Vykydal from comment #10)
> Thomas, please see comment #8.

sorry for the late reply

> Do you think the PR https://github.com/rhinstaller/anaconda/pull/3207 can
> fix the segfault issue ?

hm, I am not familiar with the code there. It seems you create now new NMClient instances
at various places. But the branch does nothing special about GMainContext, does it?

Is this code executed on different threads?

I also don't see that the code creates multiple GMainContext instances (for example
per thread) and that it pushes them with g_main_context_push_thread_default()). I would
have expected that. If not, how is that supposed to work?

> Does it even make sense to create NMClient instances in the PR without
> creating new contexts for threads like in:
> https://github.com/rvykydal/anaconda/commit/
> b8f91fa4d567e2c4f886acdae14dd895bde094b5

this patch alone does not seem right. Sure, it creates NMClient instances
with their own GMainContext (good). But this entire mode of operation implies
that the corresponding GMainContext gets iterated. I think in this patch,
the context does not get iterated and all events get queued up indefinitely and
the NMClient instance does not get any signals.

> - which seems to lead to deadlocks, for example here:
> https://github.com/rvykydal/anaconda/blob/
> 65c067c00344838ee8917751903cef5f864fc3b7/pyanaconda/modules/network/
> nm_client.py#L1025
> The finish callback is never called, maybe the syncing on Queue is broken
> here?
> Seems like we'd need to rework synchronisation of the calls...



Hm...   let me try to formulate how I think it should be.

The NMClient is strongly associated with the g_main_context_get_thread_default(),
at the time of creation (see also, nm_client_get_main_context()).

NMClient registers to D-Bus signals and makes asynchronous D-Bus calls. It always
does that with nm_client_get_main_context() as current g_main_context_get_thread_default().
That means, signals and callbacks get invoked, as the main context gets iterated/run.

For example, what does not work is iterating the main context on the main thread, and accessing
NMClient from another thread (because while the main context gets iterated, signals get emitted
and the NMClient cache changes). The means for synchronizing access to NMClient is by "going through"
the GMainContext.

that can mean, that on the other thread you call g_main_context_aquire() (or g_main_context_iteration())
and then you have access to the NMClient. Of course, if you currently are iterating the context on
another thread, you cannot acquire the context.

That means there are multiple modes of operations. For example,

  - you could have only one NMClient instance with it's own GMainContext, and pass that
    GMainContext around between thread when somebody wants to access the NMClient. Other
    threads would call g_main_context_aquire() (or g_main_context_iteration()) to get
    exclusive access.

  - or each thread could have its own NMClient instance (with separate GMainContext). The downside
    is that each NMClient instance has an overhead.

  - Maybe the simpler solution is to only use the NMClient from the main thread, and somehow
    propagate information from the NMClient cache to other threads via your own mechanism (queue.Queue()?).

Whichever you choose, the main context associated with the NMClient instance must be used only by one
thread at a time.


Highly recommended reading about GMainContext: https://developer.gnome.org/programming-guidelines/stable/main-contexts.html.en



Does this help? Please reach out on IRC for a quicker response next time.

--- Additional comment from Ben Cotton on 2021-08-10 12:52:46 UTC ---

This bug appears to have been reported against 'rawhide' during the Fedora 35 development cycle.
Changing version to 35.

--- Additional comment from Radek Vykydal on 2021-10-07 13:33:40 UTC ---

(In reply to Thomas Haller from comment #11)

> Highly recommended reading about GMainContext:
> https://developer.gnome.org/programming-guidelines/stable/main-contexts.html.
> en

for the record, moved to:
https://developer-old.gnome.org/programming-guidelines/stable/main-contexts.html.en

--- Additional comment from Radek Vykydal on 2022-06-30 11:29:37 UTC ---

https://github.com/rhinstaller/anaconda/pull/4212

--- Additional comment from Radek Vykydal on 2022-09-08 12:28:34 UTC ---

(In reply to Radek Vykydal from comment #14)
> https://github.com/rhinstaller/anaconda/pull/4212

Obsoleted by this PR:
https://github.com/rhinstaller/anaconda/pull/4322

Comment 1 Jiri Konecny 2022-09-15 09:02:40 UTC
Related to bug 2125256.

Comment 3 Radek Vykydal 2022-10-21 12:03:00 UTC
https://github.com/rhinstaller/anaconda/pull/4384

Comment 4 Radek Vykydal 2022-10-25 09:39:14 UTC
(In reply to Radek Vykydal from comment #3)
> https://github.com/rhinstaller/anaconda/pull/4384

I am running network kickstart tests to try to confirm the patch fixes the issue, results so far:
RHEL-8.8.0-20221018.d.3             : reproduced in 5/1500 tests 
RHEL-8.8.0-20221018.d.3 + the patch : reproduced in 0/1500 tests

Comment 7 Radek Vykydal 2022-11-03 09:39:49 UTC
We were missing a patch in the original PR from comment #3, opening new PR:
https://github.com/rhinstaller/anaconda/pull/4417

The mass tests in comment #4 included the patch so there is no need to re-run them. I've just forgot to update the PR and network kickstart tests were not run on the PR before merging.

Comment 8 Radek Vykydal 2022-11-03 11:27:28 UTC
Disabling related kickstart tests in dailies temporarily: https://github.com/rhinstaller/kickstart-tests/pull/825

Comment 11 Jan Stodola 2022-11-28 14:41:50 UTC
Confirmed with Radek that this issue is no longer present.

Marking as Verified:Tested

Comment 13 Jan Stodola 2022-12-14 10:04:15 UTC
*** Bug 2152786 has been marked as a duplicate of this bug. ***

Comment 20 Jan Stodola 2023-01-19 16:18:13 UTC
*** Bug 2050674 has been marked as a duplicate of this bug. ***

Comment 22 errata-xmlrpc 2023-05-16 08:15:29 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (anaconda bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:2748


Note You need to log in before you can comment on or make changes to this bug.