Bug 635006
| Summary: | Network does not come back up on resume from hibernate | | |
|---|---|---|---|
| Product: | [Fedora] Fedora | Reporter: | Orion Poplawski <orion> |
| Component: | NetworkManager | Assignee: | Dan Williams <dcbw> |
| Status: | CLOSED DUPLICATE | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
| Severity: | medium | Priority: | low |
| Version: | 13 | CC: | dcbw, sgallagh |
| Hardware: | All | OS: | Linux |
| Doc Type: | Bug Fix | Last Closed: | 2010-09-23 21:31:14 UTC |
Description
Orion Poplawski
2010-09-17 15:10:40 UTC
What version of SSSD are you running?

sssd-1.2.2-21.fc13.x86_64

That's very strange. That error message implies that you were able to connect to the LDAP server, but not to initiate TLS. That "(null)" is supposed to be the reason returned by the LDAP server, but we're not getting one. Is it possible that the connection dropped between the bind and the request to start TLS? This sounds like NM to me (sorry!)

Well, I'm pretty sure the network wasn't up at all at that point. No idea why NM didn't bring it back up, since there are no messages from it. I've turned on more debugging, so hopefully I will have more info the next time it happens.

Stephen - just a thought. Could sssd have a socket connection left over to the LDAP server that it is trying to reuse, timing out when the machine comes back up from hibernate? Does sssd need any hooks into the pm system to bump it when a machine resumes? Otherwise I'll see what the NM debug logs show it trying to do on resume.

(In reply to comment #5)
> Stephen - just a thought. Could sssd have a socket connection left over to the
> LDAP server that it is trying to reuse and timing out when the machine comes
> back up from hibernate? Does sssd need any hooks into the pm system to bump it
> when a machine resumes? Otherwise I'll see what the NM debug logs show it
> trying to do on resume.

It's possible, but I'd need to see detailed SSSD debug logs to be certain.

Most recently, on hibernate:

```
Sep 21 16:11:03 kolea NetworkManager[20071]: <info> (eth0): cleaning up...
Sep 21 16:11:03 kolea NetworkManager[20071]: <info> (eth0): taking down device.
Sep 21 16:11:03 kolea kernel: tg3 0000:02:00.0: PME# enabled
Sep 21 16:11:03 kolea kernel: pci0000:00: wake-up capability enabled by ACPI
Sep 21 16:11:03 kolea NetworkManager[20071]: <debug> [1285107063.655884] [nm-netlink-monitor.c:117] link_msg_handler(): netlink link message: iface idx 2 flags 0x1002
Sep 21 16:11:03 kolea NetworkManager[20071]: <info> (eth0): carrier now OFF (device state 1)
```

On wakeup:

```
Sep 22 10:11:39 kolea kernel: Restarting tasks ... done.
Sep 22 10:11:39 kolea kernel: PM: Basic memory bitmaps freed
Sep 22 10:11:42 kolea ntpd[1346]: Deleting interface #11 eth0, fe80::213:d3ff:fef7:f6e0#123, interface stats: received=0, sent=0, dropped=0, active_time=33302 secs
Sep 22 10:11:42 kolea ntpd[1346]: Deleting interface #10 eth0, 192.168.0.44#123, interface stats: received=190, sent=190, dropped=0, active_time=33302 secs
Sep 22 10:11:42 kolea ntpd[1346]: 65.44.101.180 interface 192.168.0.44 -> (null)
Sep 22 10:11:48 kolea NetworkManagerDispatcher: performing autofs restart
Sep 22 10:11:52 kolea automount[30427]: umount_autofs_indirect: ask umount returned busy /data
Sep 22 10:11:58 kolea automount[30427]: umount_autofs_indirect: ask umount returned busy /home
Sep 22 10:12:04 kolea run-parts(/etc/cron.hourly)[17108]: finished zz-hibernate
Sep 22 10:12:04 kolea kernel: SELinux: initialized (dev autofs, type autofs), uses genfs_contexts
Sep 22 10:12:04 kolea automount[17110]: bind_ldap_simple: lookup(ldap): Unable to bind to the LDAP server: (default), error Can't contact LDAP server
```

NM just appears very unresponsive. The system was then shut down and restarted at 10:15.
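One hypothesis raised earlier in the thread is that sssd keeps a cached LDAP socket across hibernate and then blocks on it after resume. As a minimal sketch (this is not sssd's actual code; `connection_alive` is a hypothetical helper), a daemon could cheaply probe a cached connection before reusing it, rather than issuing a request that hangs until a long timeout:

```python
import socket

def connection_alive(sock):
    """Best-effort check of whether a cached TCP connection is still usable.

    Peeks at the socket without consuming data: a cleanly closed peer
    yields b'', a reset raises OSError, and an open-but-idle connection
    raises BlockingIOError (no data pending).
    """
    try:
        sock.setblocking(False)
        data = sock.recv(1, socket.MSG_PEEK)
        return data != b''          # b'' means the peer closed the connection
    except BlockingIOError:
        return True                 # nothing to read, but still connected
    except OSError:
        return False                # reset or other transport error
    finally:
        sock.setblocking(True)

# Demonstration with a local socket pair standing in for an LDAP connection.
client, server = socket.socketpair()
print(connection_alive(client))     # peer open: True
server.close()
print(connection_alive(client))     # peer gone: False; reuse would stall
```

This only catches connections the peer has actually torn down; a connection that silently died during hibernate (no FIN or RST ever received) still looks alive, which is why TCP keepalives or an explicit "network changed" signal from the pm system would be needed in the general case.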
sssd seems pretty quiet:

```
(Wed Sep 22 10:11:40 2010) [sssd[be[default]]] [cleanup_users_send] (9): Cache expiration is set to 0 days
(Wed Sep 22 10:11:42 2010) [sssd[be[default]]] [sysdb_search_users_check_handle] (6): Search users with filter: (&(objectclass=user)(&(!(dataExpireTimestamp=0))(dataExpireTimestamp<=1285171900)(!(lastLogin=*))))
(Wed Sep 22 10:11:42 2010) [sssd[be[default]]] [cleanup_users_process] (4): Found 1 expired user entries!
(Wed Sep 22 10:11:43 2010) [sssd[be[default]]] [cleanup_users_delete] (9): About to delete user orion
(Wed Sep 22 10:11:43 2010) [sssd[be[default]]] [sldb_request_callback] (6): LDB Error: 1 (ltdb modify without transaction)
(Wed Sep 22 10:11:43 2010) [sssd[be[default]]] [sysdb_op_default_done] (6): Error: 5 (Input/output error)
(Wed Sep 22 10:11:43 2010) [sssd[be[default]]] [cleanup_users_delete_done] (2): User delete returned 5 (Input/output error)
(Wed Sep 22 10:11:43 2010) [sssd[be[default]]] [ldap_id_cleanup_users_done] (1): Failed to cleanup users (5 [Input/output error]), retrying later!
(Wed Sep 22 10:11:43 2010) [sssd[be[default]]] [ldap_id_cleanup_set_timer] (6): Scheduling next cleanup at 1285182702.1047662
(Wed Sep 22 10:11:43 2010) [sssd[be[default]]] [sbus_dispatch] (9): dbus conn: 2267AE0
(Wed Sep 22 10:11:43 2010) [sssd[be[default]]] [sbus_dispatch] (9): Dispatching.
(Wed Sep 22 10:11:43 2010) [sssd[be[default]]] [sbus_message_handler] (9): Received SBUS method [ping]
(Wed Sep 22 10:11:44 2010) [sssd[be[default]]] [sbus_dispatch] (9): dbus conn: 2267AE0
(Wed Sep 22 10:11:45 2010) [sssd[be[default]]] [sbus_dispatch] (9): Dispatching.
(Wed Sep 22 10:11:45 2010) [sssd[be[default]]] [sbus_message_handler] (9): Received SBUS method [resInit]
(Wed Sep 22 10:11:45 2010) [sssd[be[default]]] [resolv_reread_configuration] (4): Recreating all c-ares channels
(Wed Sep 22 10:11:45 2010) [sssd[be[default]]] [recreate_ares_channel] (4): Initializing new c-ares channel
(Wed Sep 22 10:11:46 2010) [sssd[be[default]]] [recreate_ares_channel] (4): Destroying the old c-ares channel
(Wed Sep 22 10:11:50 2010) [sssd[be[default]]] [sbus_dispatch] (9): dbus conn: 2267AE0
(Wed Sep 22 10:11:50 2010) [sssd[be[default]]] [sbus_dispatch] (9): Dispatching.
(Wed Sep 22 10:11:50 2010) [sssd[be[default]]] [sbus_message_handler] (9): Received SBUS method [ping]
...
```

Okay, more info. I put some more debugging into NM and determined the following. NM is ignoring the wake message coming from the pm-utils thaw script because dbus_bus_get_unix_user (in nm_auth_get_caller_uid) fails with a NameHasNoOwner DBusError message:

```
Could not get UID of name ':1.54': no such name
```

In dbus_bus_get_unix_user() the reply to dbus_connection_send_with_reply_and_block is NULL. I cannot seem to reproduce this by running (which the thaw script does):

```
dbus-send --system --dest=org.freedesktop.NetworkManager /org/freedesktop/NetworkManager org.freedesktop.NetworkManager.wake
```

as root from a shell - dbus_bus_get_unix_user() appears to work there. The system is being put into hibernate by a script run from cron, which may have something to do with it.

NM 20100831 and later also listen for upower events for suspend/resume, so that we don't have to bother with pm-utils problems like this anymore (pm-utils invocations of dbus-send don't stick around long enough to be able to authenticate the call). I'm sure we all agree that making sure the sleep/wake calls are authenticated is a good thing :) Is upower running on this machine?

upower was not installed; I'm running KDE, which doesn't appear to use it. Also, I need this to work with no one logged in at all.

What if the pm-utils calls to dbus-send used --print-reply? Would that allow them to be authenticated reliably?

Looks like this is essentially the same as bug 610299. Bug 477964 claims this was fixed in dbus 1.2.6, but apparently it was not, or it has been broken again.

*** This bug has been marked as a duplicate of bug 549089 ***
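The --print-reply idea floated above can be sketched as a pm-utils hook. This is an untested illustration, not the actual Fedora hook: the file path and hook name are assumptions, and whether blocking for the reply actually keeps the sender on the bus long enough for NM's UID lookup is exactly the open question in this bug.

```shell
#!/bin/sh
# Hypothetical /etc/pm/sleep.d/55NetworkManager hook (path is an assumption).
# --print-reply makes dbus-send block until NetworkManager answers, so the
# sending connection stays on the system bus while NM resolves its UID,
# instead of disconnecting immediately and triggering NameHasNoOwner.
case "$1" in
    thaw|resume)
        dbus-send --system --print-reply \
            --dest=org.freedesktop.NetworkManager \
            /org/freedesktop/NetworkManager \
            org.freedesktop.NetworkManager.wake
        ;;
esac
```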