Version-Release number of selected component: NetworkManager-1.0.8-1.fc23 Additional info: reporter: libreport-2.6.3 backtrace_rating: 4 cmdline: /usr/sbin/NetworkManager --no-daemon crash_function: nm_supplicant_interface_get_scanning executable: /usr/sbin/NetworkManager global_pid: 4733 kernel: 4.2.8-300.fc23.i686 runlevel: N 5 type: CCpp uid: 0 Truncated backtrace: Thread no. 1 (9 frames) #0 nm_supplicant_interface_get_scanning at supplicant-manager/nm-supplicant-interface.c:329 #1 scanning_allowed at nm-device-wifi.c:1366 #2 ffi_call_SYSV at ../src/x86/sysv.S:65 #3 ffi_call at ../src/x86/ffi.c:382 #4 g_cclosure_marshal_generic at gclosure.c:1487 #8 g_signal_emitv at gsignal.c:3122 #9 check_scanning_allowed at nm-device-wifi.c:1418 #10 request_wireless_scan at nm-device-wifi.c:1495 #11 request_wireless_scan_periodic at nm-device-wifi.c:1550
Created attachment 1114194 [details] File: backtrace
Created attachment 1114195 [details] File: cgroup
Created attachment 1114196 [details] File: core_backtrace
Created attachment 1114197 [details] File: dso_list
Created attachment 1114198 [details] File: environ
Created attachment 1114199 [details] File: exploitable
Created attachment 1114200 [details] File: limits
Created attachment 1114201 [details] File: maps
Created attachment 1114202 [details] File: mountinfo
Created attachment 1114203 [details] File: namespaces
Created attachment 1114204 [details] File: open_fds
Created attachment 1114205 [details] File: proc_pid_status
Created attachment 1114206 [details] File: var_log_messages
fix for review at: th/supplicant-manager-fix-ref-count-rh1298007
> device/ethernet: cleanup clearing handlers registered to supplicant interface supplicant_interface_release() no longer removes the priv->supplicant.iface_state_id handler, so the signal never gets disconnected. > supplicant: don't pass start_now argument to nm_supplicant_interface_new() Instead of starting the interface if the die_count has not been exceeded, shouldn't it just use is_available()? That already takes the die_count into account and also checks priv->running. A bug in the old code, but we should fix that up. The rest of the patch is fine, but we can just get rid of the "start_now" variable entirely and just do: if (is_available (self)) nm_supplicant_interface_set_supplicant_available (iface, TRUE); We must have the if() there unfortunately, because the supplicant_interface doesn't track the supplicant_available state internally and ignore the call if the state doesn't actually change. But that's OK. > wifi: fix crash due to wrong ownership handling in nm_supplicant_manager_iface_release() nm_supplicant_manager_create_interface() should really check if there's an existing NMSupplicantInterface for the same 'iface' name and bail out early with a big warning. We really need to know what causes that problem, because it should never ever happen.
(In reply to Dan Williams from comment #15) > > device/ethernet: cleanup clearing handlers registered to supplicant interface > > supplicant_interface_release() no longer removes the > priv->supplicant.iface_state_id handler, so the signal never gets > disconnected. done in fixup! commit > > supplicant: don't pass start_now argument to nm_supplicant_interface_new() > > Instead of starting the interface if the die_count has not been exceeded, > shouldn't it just use is_available()? That already takes the die_count into > account and also checks priv->running. A bug in the old code, but we should > fix that up. The rest of the patch is fine, but we can just get rid of the > "start_now" variable entirely and just do: > > if (is_available (self)) > nm_supplicant_interface_set_supplicant_available (iface, TRUE); > > We must have the if() there unfortunately, because the supplicant_interface > doesn't track the supplicant_available state internally and ignore the call > if the state doesn't actually change. But that's OK. done in fixup. > > wifi: fix crash due to wrong ownership handling in nm_supplicant_manager_iface_release() > > nm_supplicant_manager_create_interface() should really check if there's an > existing NMSupplicantInterface for the same 'iface' name and bail out early > with a big warning. We really need to know what causes that problem, > because it should never ever happen. done in fixup. Repushed.
+ /* manager caches and reuses teh supplicant. In case the supplicant is already + * registered, aquire a the supplicant first. */ + supplicant_interface_release (self); + if (!sup_iface) { Isn't this comment outdated? (The manager doesn't cache the supplicant interface anymore I think). static NMSupplicantConfig * build_supplicant_config (NMDeviceWifi *self, NMConnection *connection, guint32 fixed_freq, GError **error) { - NMDeviceWifiPrivate *priv = NM_DEVICE_WIFI_GET_PRIVATE (self); + NMDeviceWifiPrivate *priv; [...] - g_return_val_if_fail (self != NULL, NULL); + g_return_val_if_fail (self, NULL); + + priv = NM_DEVICE_WIFI_GET_PRIVATE (self); + g_return_val_if_fail (priv->sup_iface, NULL); Nitpick: in my opinion we should keep: NMDeviceWifiPrivate *priv = NM_DEVICE_WIFI_GET_PRIVATE (self) and remove g_return_val_if_fail (self != NULL, NULL)' since the former already throws an assertion if the pointer is NULL. /* priv->ifaces may be modified if availability changes; can't use GHashTableIter */ - ifaces = g_hash_table_get_values (priv->ifaces); The comment is obsolete now. - g_hash_table_remove (priv->ifaces, ifname); + priv->ifaces = g_slist_remove (priv->ifaces, sup_iface); + g_object_remove_toggle_ref ((GObject *) sup_iface, _sup_iface_last_ref, self); I'm not very familiar with toggle references, does it make any difference here between g_object_remove_toggle_ref() and g_object_unref()?
(In reply to Beniamino Galvani from comment #17) > + /* manager caches and reuses teh supplicant. In case the supplicant > is already > + * registered, aquire a the supplicant first. */ > + supplicant_interface_release (self); > + if (!sup_iface) { > > Isn't this comment outdated? (The manager doesn't cache the supplicant > interface anymore I think). fixup added. > static NMSupplicantConfig * > build_supplicant_config (NMDeviceWifi *self, > NMConnection *connection, > guint32 fixed_freq, > GError **error) > { > - NMDeviceWifiPrivate *priv = NM_DEVICE_WIFI_GET_PRIVATE (self); > + NMDeviceWifiPrivate *priv; > [...] > - g_return_val_if_fail (self != NULL, NULL); > + g_return_val_if_fail (self, NULL); > + > + priv = NM_DEVICE_WIFI_GET_PRIVATE (self); > + g_return_val_if_fail (priv->sup_iface, NULL); > > Nitpick: in my opinion we should keep: > > NMDeviceWifiPrivate *priv = NM_DEVICE_WIFI_GET_PRIVATE (self) > > and remove > > g_return_val_if_fail (self != NULL, NULL)' > > since the former already throws an assertion if the pointer is NULL. fixup! > /* priv->ifaces may be modified if availability changes; can't use > GHashTableIter */ > - ifaces = g_hash_table_get_values (priv->ifaces); > > The comment is obsolete now. comment updated. > - g_hash_table_remove (priv->ifaces, ifname); > + priv->ifaces = g_slist_remove (priv->ifaces, sup_iface); > + g_object_remove_toggle_ref ((GObject *) sup_iface, > _sup_iface_last_ref, self); > > I'm not very familiar with toggle references, does it make any > difference here between g_object_remove_toggle_ref() and > g_object_unref()? remove_toggle_ref() is an object_unref() and unsubscribing the toggle callback. I think this works nice, because the manager owns a reference to the object, but if everybody else drops it's reference, the manager destroys the object. Only downside is, that the object is owned by the manager too, which is non-obvious. I added a comment about that. Repushed.
merged to master: http://cgit.freedesktop.org/NetworkManager/NetworkManager/commit/?id=de19bc20d041c3a5267dc05303b24e5871d10632 A simplified fix was added to nm-1-0: http://cgit.freedesktop.org/NetworkManager/NetworkManager/commit/?id=bd27102277e5d7e52d87bd26711ae6c431e08192 Note that this does only converts the crash of the reporter to a (non-crashing) assertion. As syslog from comment 13 shows, the issue is that the wifi devices got renamed, but seems that the renaming didn't clean up the supplicant interface (leading to creating supplicant interfaces for wlan0 twice). Leaving BZ open
I backported the nm-1-0 workaround [1] to Fedora 22 and Fedora 23. It will be in updates https://bodhi.fedoraproject.org/updates/NetworkManager-1.0.10-3.fc23 https://bodhi.fedoraproject.org/updates/NetworkManager-1.0.10-3.fc22 I don't close this bug, because the patch only avoids the crash, but doesn't properly handle renaming of wifi devices. [1] http://cgit.freedesktop.org/NetworkManager/NetworkManager/commit/?id=bd27102277e5d7e52d87bd26711ae6c431e08192
NetworkManager-1.0.12-1.fc23 has been pushed to the Fedora 23 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2016-8201e3fefa
NetworkManager-1.0.12-1.fc23 has been pushed to the Fedora 23 stable repository. If problems still persist, please make note of it in this bug report.