Bug 472774

Summary: NM appears to hang for 5-6 seconds during shutdown
Product: [Fedora] Fedora Reporter: Tom London <selinux>
Component: NetworkManagerAssignee: Dan Williams <dcbw>
Status: CLOSED CURRENTRELEASE QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium Docs Contact:
Priority: medium    
Version: 11CC: dcbw, jns, wtogami
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2009-11-11 00:30:01 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Tom London 2008-11-24 14:41:45 UTC
Description of problem:
NetworkManager appears to hang 5-6 seconds during shutdown, as shown by normal "shutdown messages".

Checking /var/log/messages, looks like this is correct: 16:59:33->16:59:43.

Is "loosing HAL" significant?  Timeout on message bus?

Or should this take this long?

Nov 23 16:59:33 tlondon NetworkManager: <info>  (wlan0): device state change: 8 -> 3
Nov 23 16:59:33 tlondon NetworkManager: <info>  (wlan0): deactivating device (reason: 38).
Nov 23 16:59:33 tlondon NetworkManager: <info>  wlan0: canceled DHCP transaction, dhcp client pid 3333
Nov 23 16:59:34 tlondon nscd: 28637 Access Vector Cache (AVC) started
Nov 23 16:59:34 tlondon audio-entropyd: audio-entropyd stopping due to signal 15
Nov 23 16:59:34 tlondon NetworkManager: <info>  Policy set 'Auto eth0' (eth0) as default for routing and DNS.
Nov 23 16:59:36 tlondon NetworkManager: <info>  HAL disappeared
Nov 23 16:59:36 tlondon ntpd[2362]: ntpd exiting on signal 15
Nov 23 16:59:36 tlondon bluetoothd[2338]: bridge pan0 removed
Nov 23 16:59:36 tlondon bluetoothd[2338]: Stopping SDP server
Nov 23 16:59:36 tlondon bluetoothd[2338]: Exit
Nov 23 16:59:37 tlondon NetworkManager: <WARN>  nm_signal_handler(): Caught signal 15, shutting down normally.
Nov 23 16:59:37 tlondon NetworkManager: <info>  (eth0): now unmanaged
Nov 23 16:59:37 tlondon NetworkManager: <info>  (eth0): device state change: 8 -> 1
Nov 23 16:59:37 tlondon NetworkManager: <info>  (eth0): deactivating device (reason: 36).
Nov 23 16:59:37 tlondon NetworkManager: <info>  eth0: canceled DHCP transaction, dhcp client pid 2582
Nov 23 16:59:37 tlondon NetworkManager: <WARN>  check_one_route(): (eth0) error -34 returned from rtnl_route_del(): Sucess#012
Nov 23 16:59:37 tlondon NetworkManager: <info>  (eth0): cleaning up...
Nov 23 16:59:37 tlondon NetworkManager: <info>  (eth0): taking down device.
Nov 23 16:59:37 tlondon NetworkManager: <info>  (wlan0): now unmanaged
Nov 23 16:59:37 tlondon NetworkManager: <info>  (wlan0): device state change: 3 -> 1
Nov 23 16:59:37 tlondon NetworkManager: <info>  (wlan0): cleaning up...
Nov 23 16:59:37 tlondon NetworkManager: <info>  (wlan0): taking down device.
Nov 23 16:59:39 tlondon kernel: iwlagn 0000:03:00.0: PCI INT A disabled
Nov 23 16:59:39 tlondon NetworkManager: <info>  disconnected by the system bus.
Nov 23 16:59:39 tlondon nm-dispatcher.action: Error in get_property: Message did not receive a reply (timeout by message bus)
Nov 23 16:59:43 tlondon nm-system-settings: disconnected from the system bus, exiting.
Nov 23 16:59:43 tlondon nm-system-settings: nm_sysconfig_settings_remove_connection: assertion `NM_IS_SYSCONFIG_SETTINGS (self)' failed
Nov 23 16:59:43 tlondon nm-dispatcher.action: Disconnected from the system bus, exiting.
Nov 23 16:59:43 tlondon restorecond: terminated
Nov 23 16:59:43 tlondon rpcbind: rpcbind terminating on signal. Restart with "rpcbind -w"
Nov 23 16:59:43 tlondon auditd[1961]: The audit daemon is exiting.
Nov 23 16:59:43 tlondon kernel: audit(1227488383.913:43): audit_pid=0 old=1961 auid=4294967295 ses=4294967295 subj=system_u:system_r:auditd_t:s0 res=1


Version-Release number of selected component (if applicable):
NetworkManager-0.7.0-0.12.svn4326.fc11.x86_64

How reproducible:
Every shutdown

Steps to Reproduce:
1.
2.
3.
  
Actual results:


Expected results:


Additional info:

Comment 1 Bug Zapper 2008-11-26 05:51:53 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 10 development cycle.
Changing version to '10'.

More information and reason for this action is here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 2 Tom London 2008-12-27 18:07:58 UTC
Nope, this continues with current rawhide.

Dec 26 19:47:25 tlondon NetworkManager: <info>  HAL disappeared
Dec 26 19:47:25 tlondon ntpd[2423]: Deleting interface #7 wlan0, 192.168.1.103#123, interface stats: received=0, sent=0, dropped=0, active_time=9365 secs
Dec 26 19:47:25 tlondon ntpd[2423]: ntpd exiting on signal 15
Dec 26 19:47:25 tlondon bluetoothd[2393]: bridge pan0 removed
Dec 26 19:47:25 tlondon bluetoothd[2393]: Stopping SDP server
Dec 26 19:47:25 tlondon bluetoothd[2393]: Exit
Dec 26 19:47:25 tlondon NetworkManager: <WARN>  nm_signal_handler(): Caught signal 15, shutting down normally.
Dec 26 19:47:25 tlondon NetworkManager: <info>  (eth0): now unmanaged
Dec 26 19:47:25 tlondon NetworkManager: <info>  (eth0): device state change: 8 -> 1
Dec 26 19:47:25 tlondon NetworkManager: <info>  (eth0): deactivating device (reason: 36).
Dec 26 19:47:26 tlondon NetworkManager: <info>  eth0: canceled DHCP transaction, dhcp client pid 2634
Dec 26 19:47:26 tlondon NetworkManager: <WARN>  check_one_route(): (eth0) error -34 returned from rtnl_route_del(): Sucess#012
Dec 26 19:47:26 tlondon NetworkManager: <info>  (eth0): cleaning up...
Dec 26 19:47:26 tlondon NetworkManager: <info>  (eth0): taking down device.
Dec 26 19:47:26 tlondon NetworkManager: <info>  (wlan0): now unmanaged
Dec 26 19:47:26 tlondon NetworkManager: <info>  (wlan0): device state change: 3 -> 1
Dec 26 19:47:26 tlondon NetworkManager: <info>  (wlan0): cleaning up...
Dec 26 19:47:26 tlondon NetworkManager: <info>  (wlan0): taking down device.
Dec 26 19:47:28 tlondon kernel: iwlagn 0000:03:00.0: PCI INT A disabled
Dec 26 19:47:28 tlondon NetworkManager: <info>  disconnected by the system bus.
Dec 26 19:47:28 tlondon dbus: Rejected send message, 26 matched rules; type="method_return", sender=":1.8" (uid=0 pid=2351 comm="/usr/sbin/wpa_supplicant -c /etc/wpa_supplicant/wp") interface="(unset)" member="(unset)" error name="(unset)" requested_reply=0 destination=":1.7" (uid=0 pid=2344 comm="NetworkManager --pid-file=/var/run/NetworkManager/"))
Dec 26 19:47:28 tlondon nm-dispatcher.action: Error in get_property: Message did not receive a reply (timeout by message bus)
Dec 26 19:47:31 tlondon nm-dispatcher.action: Disconnected from the system bus, exiting.
Dec 26 19:47:31 tlondon nm-system-settings: disconnected from the system bus, exiting.
Dec 26 19:47:31 tlondon nm-system-settings: nm_sysconfig_settings_remove_connection: assertion `NM_IS_SYSCONFIG_SETTINGS (self)' failed

If NM starts shutdown at 19:47:25, it looks like it completes take down of eth0 at :26 (T+1 sec), completes the take down of wlan0 at :28 (T+3 sec), followed by 3 seconds of further cleanup (T+6 seconds).

Any need for the iwlagn "disabling" to take +2 seconds?

Other?

Comment 3 Dan Williams 2009-02-15 14:31:28 UTC
Ah right, that issue.  The Intel 3945, 4956, and 5000 drivers have a problem where they disable their interrupts and hang for 4 seconds when you take the device down.  Not really a NetworkManager problem, but a kernel problem, and something Intel is slowly trying to fix.

The "nm_sysconfig_settings_remove_connection" assertion should be fixed now as well.  I'm adding some logging to NM's exit to see exactly when NM exits.

With those changes in the next Fedora update, lets see if this issue gets better.  The update containing the changes will be 0.7.1-0.something.  Whenever you get that update, let me know how the shutdown goes.

Comment 4 Tom London 2009-02-16 19:33:52 UTC
Hmmm..... checking today (BEFORE updated NM), looks better:

Feb 16 08:24:18 tlondon NetworkManager: <info>  HAL disappeared
Feb 16 08:24:19 tlondon ntpd[2618]: ntpd exiting on signal 15
Feb 16 08:24:19 tlondon bluetoothd[2564]: Unregistered interface org.bluez.NetworkPeer on path /org/bluez/2564/hci0
Feb 16 08:24:19 tlondon bluetoothd[2564]: Unregistered interface org.bluez.NetworkHub on path /org/bluez/2564/hci0
Feb 16 08:24:19 tlondon bluetoothd[2564]: Unregistered interface org.bluez.NetworkRouter on path /org/bluez/2564/hci0
Feb 16 08:24:19 tlondon bluetoothd[2564]: bridge pan0 removed
Feb 16 08:24:19 tlondon bluetoothd[2564]: Stopping SDP server
Feb 16 08:24:19 tlondon bluetoothd[2564]: Exit
Feb 16 08:24:19 tlondon NetworkManager: <WARN>  nm_signal_handler(): Caught signal 15, shutting down normally.
Feb 16 08:24:19 tlondon NetworkManager: <info>  (eth1): now unmanaged
Feb 16 08:24:19 tlondon NetworkManager: <info>  (eth1): device state change: 8 -> 1
Feb 16 08:24:19 tlondon NetworkManager: <info>  (eth1): deactivating device (reason: 36).
Feb 16 08:24:19 tlondon NetworkManager: <info>  eth1: canceled DHCP transaction, dhcp client pid 2786
Feb 16 08:24:19 tlondon NetworkManager: <WARN>  check_one_route(): (eth1) error -34 returned from rtnl_route_del(): Sucess#012
Feb 16 08:24:19 tlondon NetworkManager: <info>  (eth1): cleaning up...
Feb 16 08:24:19 tlondon NetworkManager: <info>  (eth1): taking down device.
Feb 16 08:24:19 tlondon NetworkManager: <info>  (wlan1): now unmanaged
Feb 16 08:24:19 tlondon NetworkManager: <info>  (wlan1): device state change: 3 -> 1
Feb 16 08:24:19 tlondon NetworkManager: <info>  (wlan1): cleaning up...
Feb 16 08:24:19 tlondon NetworkManager: <info>  (wlan1): taking down device.
Feb 16 08:24:19 tlondon kernel: iwlagn 0000:03:00.0: PCI INT A disabled
Feb 16 08:24:19 tlondon dbus: Rejected send message, 21 matched rules; type="method_return", sender=":1.8" (uid=0 pid=2532 comm="/usr/sbin/wpa_supplicant -c /etc/wpa_supplicant/wp") interface="(unset)" member="(unset)" error name="(unset)" requested_reply=0 destination=":1.7" (uid=0 pid=2525 comm="NetworkManager --pid-file=/var/run/NetworkManager/"))
Feb 16 08:24:19 tlondon NetworkManager: <info>  disconnected by the system bus.
Feb 16 08:24:19 tlondon nm-dispatcher.action: Error in get_property: Message did not receive a reply (timeout by message bus)
Feb 16 08:24:20 tlondon nm-dispatcher.action: Disconnected from the system bus, exiting.
Feb 16 08:24:20 tlondon nm-system-settings: disconnected from the system bus, exiting.
Feb 16 08:24:20 tlondon nm-system-settings: nm_sysconfig_settings_remove_connection: assertion `NM_IS_SYSCONFIG_SETTINGS (self)' failed
Feb 16 08:24:20 tlondon restorecond: terminated

I am running a newer kernel: kernel-2.6.29-0.119.rc5.fc11.x86_64, NetworkManager: NetworkManager-0.7.0-2.git20090207.fc11.x86_64

and, I'm running with Vt-d/intel_iommu=on.

any of that possibly impact this?

Comment 5 Dan Williams 2009-02-17 11:54:00 UTC
Probably not; at least I can't think of any off the top of my head.

The "nm_sysconfig_settings_remove_connection: assertion `NM_IS_SYSCONFIG_SETTINGS
(self)' failed" message is a harmless bug I'll get rid of.

Comment 6 Jessica Sterling 2009-03-10 17:01:14 UTC
This bug has been triaged

-- 
Fedora Bugzappers volunteer triage team
https://fedoraproject.org/wiki/BugZappers

Comment 7 Bug Zapper 2009-06-09 09:55:04 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 11 development cycle.
Changing version to '11'.

More information and reason for this action is here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 8 Dan Williams 2009-11-11 00:30:01 UTC
I haven't seen this in a while, and if iwlagn is involved, I suspect a bug in older kernels that hung the machine for a few seconds when the device was taken down (due to spinning with disabled interrupts or something).

commit 83e0428fff5578ffa7e7bb3281a000a205825fda
Author: Zhu Yi <yi.zhu>
Date:   Tue Dec 2 12:14:04 2008 -0800

    iwlwifi: fix DMA channel number in iwl_txq_ctx_stop
    
    The patch fixes the misuse of DMA channel number by Tx queue number in
    iwl_tx_ctx_stop().
    
    The problem was originally reported by Wu Fengguang who complains
    iwlagn driver takes too long time when issuing `ifconfig wlan0 down`.
    The patch now decreases the interface bring down time from 2 seconds
    to 0.8 second.
    
    This fixes bugs:
    http://bugzilla.kernel.org/show_bug.cgi?id=11956
    http://www.intellinuxwireless.org/bugzilla/show_bug.cgi?id=1790
    
    Signed-off-by: Zhu Yi <yi.zhu>
    Tested-by: Fengguang Wu <fengguang.wu>
    Acked-by: Tomas Winkler <tomas.winkler>
    Signed-off-by: Reinette Chatre <reinette.chatre>
    Signed-off-by: John W. Linville <linville>

I'd imagine it's fixed in 2.6.30 and later.