Bug 635006 - Network does not come back up on resume from hibernate
Summary: Network does not come back up on resume from hibernate
Keywords:
Status: CLOSED DUPLICATE of bug 549089
Alias: None
Product: Fedora
Classification: Fedora
Component: NetworkManager
Version: 13
Hardware: All
OS: Linux
Priority: low
Severity: medium
Target Milestone: ---
Assignee: Dan Williams
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2010-09-17 15:10 UTC by Orion Poplawski
Modified: 2010-09-23 21:31 UTC
CC: 2 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2010-09-23 21:31:14 UTC



Description Orion Poplawski 2010-09-17 15:10:40 UTC
Description of problem:

I'm seeing this message in the logs of a number of machines:

NetworkManager[1441]: <error> [1284099212.540711] [nm-manager.c:1312] user_proxy_init(): could not init user settings proxy: (3) Could not get owner of name 'org.freedesktop.NetworkManagerUserSettings': no such name

What brought it to my attention was a machine that failed to bring up its network from hibernate:

Sep 17 08:24:48 kolea kernel: Restarting tasks ... done.
Sep 17 08:24:50 kolea ntpd[1346]: Deleting interface #7 eth0, fe80::213:d3ff:fef7:f6e0#123, interface stats: received=0, sent=0, dropped=0, active_time=182902 secs
Sep 17 08:24:50 kolea ntpd[1346]: Deleting interface #6 eth0, 192.168.0.44#123, interface stats: received=624, sent=624, dropped=0, active_time=182902 secs
Sep 17 08:24:50 kolea ntpd[1346]: 65.44.101.180 interface 192.168.0.44 -> (null)
Sep 17 08:24:51 kolea automount[7829]: umount_autofs_indirect: ask umount returned busy /data
Sep 17 08:24:53 kolea automount[7829]: umount_autofs_indirect: ask umount returned busy /home
Sep 17 08:24:58 kolea automount[10862]: bind_ldap_simple: lookup(ldap): Unable to bind to the LDAP server: (default), error Can't contact LDAP server
Sep 17 08:24:58 kolea automount[10862]: bind_ldap_simple: lookup(ldap): Unable to bind to the LDAP server: (default), error Can't contact LDAP server
Sep 17 08:24:58 kolea automount[10862]: bind_ldap_simple: lookup(ldap): Unable to bind to the LDAP server: (default), error Can't contact LDAP server
Sep 17 08:24:58 kolea automount[10862]: bind_ldap_simple: lookup(ldap): Unable to bind to the LDAP server: (default), error Can't contact LDAP server
Sep 17 08:40:02 kolea sssd[be[default]]: Could not start TLS. (null)
Sep 17 08:40:02 kolea sssd[be[default]]: Could not start TLS. (null)
Sep 17 08:40:02 kolea sssd[be[default]]: Could not start TLS. (null)
Sep 17 08:44:23 kolea kernel: usb 1-5: new high speed USB device using ehci_hcd and address 10
Sep 17 08:44:23 kolea kernel: usb 1-5: New USB device found, idVendor=0424, idProduct=2504
Sep 17 08:44:23 kolea kernel: usb 1-5: New USB device strings: Mfr=0, Product=0, SerialNumber=0
Sep 17 08:44:23 kolea kernel: hub 1-5:1.0: USB hub found
Sep 17 08:44:23 kolea kernel: hub 1-5:1.0: 4 ports detected
Sep 17 08:44:38 kolea kdm: :0[1629]: Received unknown command 1 from greeter
Sep 17 08:45:00 kolea sssd[be[default]]: Could not start TLS. (null)
Sep 17 08:45:00 kolea sssd[be[default]]: Could not start TLS. (null)
Sep 17 08:45:00 kolea sssd[be[default]]: Could not start TLS. (null)
Sep 17 08:45:01 kolea NetworkManager[1058]: <error> [1284734701.579551] [nm-manager.c:1331] user_proxy_init(): could not init user settings proxy: (3) Could not get owner of name 'org.freedesktop.NetworkManagerUserSettings': no such name

I ended up restarting NetworkManager and it came up okay.  /var/log/pm-suspend.log appears okay:

-rw-r--r--. 1 root root 5095 Sep 17 08:24 /var/log/pm-suspend.log

Fri Sep 17 08:24:49 MDT 2010: Running hooks for thaw
....
/usr/lib64/pm-utils/sleep.d/55NetworkManager thaw hibernate: success.


We use LDAP for user info and autofs maps, and we automount home directories.  I suppose this could be an sssd issue.

The other time this appears in the logs is during system shutdown, where it is the very first shutdown-related message:

Sep 10 00:13:32 ventus NetworkManager[1191]: <error> [1284099212.540711] [nm-manager.c:1312] user_proxy_init(): could not init user settings proxy: (3) Could not get owner of name 'org.freedesktop.NetworkManagerUserSettings': no such name
Sep 10 00:13:32 ventus init: tty4 main process (1668) killed by TERM signal


Version-Release number of selected component (if applicable):
NetworkManager-0.8.1-6.git20100831.fc13.x86_64

Comment 1 Stephen Gallagher 2010-09-20 12:58:14 UTC
What version of SSSD are you running?

Comment 2 Orion Poplawski 2010-09-20 14:48:13 UTC
sssd-1.2.2-21.fc13.x86_64

Comment 3 Stephen Gallagher 2010-09-20 14:59:38 UTC
That's very strange. That error message implies that you were able to connect to the LDAP server, but not to initiate TLS.

That "(null)" is supposed to be the reason returned by the LDAP server, but we're not getting one.

Is it possible that the connection dropped between the bind and the request to start TLS?

This sounds like NM to me (sorry!)

Comment 4 Orion Poplawski 2010-09-20 15:27:54 UTC
Well, I'm pretty sure the network wasn't up at all at that point.  No idea why NM didn't bring it back up since there are no messages from it.  I've turned on more debugging so hopefully will have more info next time it happens.

Comment 5 Orion Poplawski 2010-09-20 19:47:38 UTC
Stephen - just a thought.  Could sssd have a socket connection left over to the LDAP server that it is trying to reuse and timing out when the machine comes back up from hibernate?  Does sssd need any hooks into the pm system to bump it when a machine resumes?  Otherwise I'll see what the NM debug logs show it trying to do on resume.

Comment 6 Stephen Gallagher 2010-09-20 20:37:01 UTC
(In reply to comment #5)
> Stephen - just a thought.  Could sssd have a socket connection left over to the
> LDAP server that it is trying to reuse and timing out when the machine comes
> back up from hibernate?  Does sssd need any hooks into the pm system to bump it
> when a machine resumes?  Otherwise I'll see what the NM debug logs show it
> trying to do on resume.

It's possible, but I'd need to see detailed SSSD debug logs to be certain.

Comment 7 Orion Poplawski 2010-09-22 17:50:55 UTC
Most recently:

hibernate:
Sep 21 16:11:03 kolea NetworkManager[20071]: <info> (eth0): cleaning up...
Sep 21 16:11:03 kolea NetworkManager[20071]: <info> (eth0): taking down device.
Sep 21 16:11:03 kolea kernel: tg3 0000:02:00.0: PME# enabled
Sep 21 16:11:03 kolea kernel: pci0000:00: wake-up capability enabled by ACPI
Sep 21 16:11:03 kolea NetworkManager[20071]: <debug> [1285107063.655884] [nm-netlink-monitor.c:117] link_msg_handler(): netlink link message: iface idx 2 flags 0x1002
Sep 21 16:11:03 kolea NetworkManager[20071]: <info> (eth0): carrier now OFF (device state 1)

wakeup:
Sep 22 10:11:39 kolea kernel: Restarting tasks ... done.
Sep 22 10:11:39 kolea kernel: PM: Basic memory bitmaps freed
Sep 22 10:11:42 kolea ntpd[1346]: Deleting interface #11 eth0, fe80::213:d3ff:fef7:f6e0#123, interface stats: received=0, sent=0, dropped=0, active_time=33302 secs
Sep 22 10:11:42 kolea ntpd[1346]: Deleting interface #10 eth0, 192.168.0.44#123, interface stats: received=190, sent=190, dropped=0, active_time=33302 secs
Sep 22 10:11:42 kolea ntpd[1346]: 65.44.101.180 interface 192.168.0.44 -> (null)
Sep 22 10:11:48 kolea NetworkManagerDispatcher: performing autofs restart
Sep 22 10:11:52 kolea automount[30427]: umount_autofs_indirect: ask umount returned busy /data
Sep 22 10:11:58 kolea automount[30427]: umount_autofs_indirect: ask umount returned busy /home
Sep 22 10:12:04 kolea run-parts(/etc/cron.hourly)[17108]: finished zz-hibernate
Sep 22 10:12:04 kolea kernel: SELinux: initialized (dev autofs, type autofs), uses genfs_contexts
Sep 22 10:12:04 kolea automount[17110]: bind_ldap_simple: lookup(ldap): Unable to bind to the LDAP server: (default), error Can't contact LDAP server

NM just appears very unresponsive.  System was then shut down and restarted at 10:15.

sssd seems pretty quiet:

(Wed Sep 22 10:11:40 2010) [sssd[be[default]]] [cleanup_users_send] (9): Cache expiration is set to 0 days
(Wed Sep 22 10:11:42 2010) [sssd[be[default]]] [sysdb_search_users_check_handle] (6): Search users with filter: (&(objectclass=user)(&(!(dataExpireTimestamp=0))(dataExpireTimestamp<=1285171900)(!(lastLogin=*))))
(Wed Sep 22 10:11:42 2010) [sssd[be[default]]] [cleanup_users_process] (4): Found 1 expired user entries!
(Wed Sep 22 10:11:43 2010) [sssd[be[default]]] [cleanup_users_delete] (9): About to delete user orion
(Wed Sep 22 10:11:43 2010) [sssd[be[default]]] [sldb_request_callback] (6): LDB Error: 1 (ltdb modify without transaction)
(Wed Sep 22 10:11:43 2010) [sssd[be[default]]] [sysdb_op_default_done] (6): Error: 5 (Input/output error)
(Wed Sep 22 10:11:43 2010) [sssd[be[default]]] [cleanup_users_delete_done] (2): User delete returned 5 (Input/output error)
(Wed Sep 22 10:11:43 2010) [sssd[be[default]]] [ldap_id_cleanup_users_done] (1): Failed to cleanup users (5 [Input/output error]), retrying later!
(Wed Sep 22 10:11:43 2010) [sssd[be[default]]] [ldap_id_cleanup_set_timer] (6): Scheduling next cleanup at 1285182702.1047662
(Wed Sep 22 10:11:43 2010) [sssd[be[default]]] [sbus_dispatch] (9): dbus conn: 2267AE0
(Wed Sep 22 10:11:43 2010) [sssd[be[default]]] [sbus_dispatch] (9): Dispatching.
(Wed Sep 22 10:11:43 2010) [sssd[be[default]]] [sbus_message_handler] (9): Received SBUS method [ping]
(Wed Sep 22 10:11:44 2010) [sssd[be[default]]] [sbus_dispatch] (9): dbus conn: 2267AE0
(Wed Sep 22 10:11:45 2010) [sssd[be[default]]] [sbus_dispatch] (9): Dispatching.
(Wed Sep 22 10:11:45 2010) [sssd[be[default]]] [sbus_message_handler] (9): Received SBUS method [resInit]
(Wed Sep 22 10:11:45 2010) [sssd[be[default]]] [resolv_reread_configuration] (4): Recreating all c-ares channels
(Wed Sep 22 10:11:45 2010) [sssd[be[default]]] [recreate_ares_channel] (4): Initializing new c-ares channel
(Wed Sep 22 10:11:46 2010) [sssd[be[default]]] [recreate_ares_channel] (4): Destroying the old c-ares channel
(Wed Sep 22 10:11:50 2010) [sssd[be[default]]] [sbus_dispatch] (9): dbus conn: 2267AE0
(Wed Sep 22 10:11:50 2010) [sssd[be[default]]] [sbus_dispatch] (9): Dispatching.
(Wed Sep 22 10:11:50 2010) [sssd[be[default]]] [sbus_message_handler] (9): Received SBUS method [ping]
...

Comment 8 Orion Poplawski 2010-09-23 17:24:30 UTC
Okay, more info.  I put some more debugging into NM and determined the following.  NM is ignoring the wake message coming from the pm-utils thaw script because dbus_bus_get_unix_user (in nm_auth_get_caller_uid) fails with a NameHasNoOwner DBusError message:

Could not get UID of name ':1.54': no such name

In dbus_bus_get_unix_user() the reply to dbus_connection_send_with_reply_and_block is NULL.

I cannot seem to reproduce this by running the command the thaw script uses:

dbus-send --system --dest=org.freedesktop.NetworkManager /org/freedesktop/NetworkManager org.freedesktop.NetworkManager.wake

as root from a shell - dbus_bus_get_unix_user() appears to work there.  System is being put into hibernate by a script run from cron, which may have something to do with it.
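For reference, the ownership lookup that fails inside nm_auth_get_caller_uid() can be exercised by hand: dbus_bus_get_unix_user() is a wrapper around the bus daemon's GetConnectionUnixUser method. A sketch of the equivalent query (':1.54' is the unique name from the failure above; substitute whatever name NM reports):

```shell
# Ask the bus daemon for the Unix UID owning a unique connection name.
# Fails with NameHasNoOwner if that connection has already gone away,
# which is exactly the error NM hits when the dbus-send caller exits early.
dbus-send --system --print-reply \
  --dest=org.freedesktop.DBus /org/freedesktop/DBus \
  org.freedesktop.DBus.GetConnectionUnixUser string:":1.54"
```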

Comment 9 Dan Williams 2010-09-23 20:43:18 UTC
NM 20100831 and later also listen for upower events for suspend/resume so that we don't have to bother with pm-utils problems like this anymore (pm-utils invocations of dbus-send don't stick around long enough to be able to authenticate the call).  I'm sure we all agree that making sure that the sleep/wake calls are authenticated is a good thing :)  Is upower running on this machine?

Comment 10 Orion Poplawski 2010-09-23 20:59:18 UTC
upower was not installed; I'm running KDE, which doesn't appear to use it.  Also, I need this to work with no one logged in at all.

Comment 11 Orion Poplawski 2010-09-23 21:01:10 UTC
What if the pm-utils calls to dbus-send used --print-reply?  Would that allow them to be authenticated reliably?
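If --print-reply helps, it would be because dbus-send then blocks until the method return arrives, so the sending connection (and its unique bus name) stays alive while NM resolves the caller's UID. A hypothetical variant of the wake call the 55NetworkManager thaw hook issues:

```shell
# Hypothetical: adding --print-reply keeps the dbus-send process (and thus
# its bus connection) around until NetworkManager replies, instead of
# exiting as soon as the message is queued.
dbus-send --system --print-reply \
  --dest=org.freedesktop.NetworkManager \
  /org/freedesktop/NetworkManager \
  org.freedesktop.NetworkManager.wake
```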

Comment 12 Orion Poplawski 2010-09-23 21:27:48 UTC
Looks like this is essentially the same as bug 610299.

Comment 13 Orion Poplawski 2010-09-23 21:29:18 UTC
Bug 477964 claims this was fixed in dbus 1.2.6, but apparently not, or has been broken again.

Comment 14 Orion Poplawski 2010-09-23 21:31:14 UTC

*** This bug has been marked as a duplicate of bug 549089 ***

