Bug 740342 - Use of --print-reply in /usr/lib64/pm-utils/sleep.d/95packagekit causes it to block subsequent pm-utils hooks on wake
Summary: Use of --print-reply in /usr/lib64/pm-utils/sleep.d/95packagekit causes it to...
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Fedora
Classification: Fedora
Component: PackageKit
Version: 16
Hardware: Unspecified
OS: Unspecified
unspecified
medium
Target Milestone: ---
Assignee: Richard Hughes
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2011-09-21 17:24 UTC by Adam Williamson
Modified: 2013-02-14 01:16 UTC (History)
14 users (show)

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2013-02-14 01:16:26 UTC
Type: ---
Embargoed:


Attachments (Terms of Use)
fuller extract from /var/log/messages (4.24 KB, text/plain)
2011-09-21 17:25 UTC, Adam Williamson
no flags Details
log message after waking from suspend (50.46 KB, text/plain)
2011-11-21 09:57 UTC, Nikolai Maziashvili
no flags Details


Links
System ID Private Priority Status Summary Last Updated
FreeDesktop.org 45710 0 None None None Never

Description Adam Williamson 2011-09-21 17:24:44 UTC
In at least F15 and F16, there's a noticeable 25 second delay between when you wake the system from suspend, and when NetworkManager tries to reconnect to the network.

I finally decided to take a look at why today, and found something interesting. On both my laptop and my desktop - one F15, one F16 - what happens is the kernel wakes up, and dbus tries to activate the org.freedesktop.PackageKit service. In the meantime, some kernel stuff goes on, and ntpd discovers it doesn't have a network any more and takes itself down. dbus successfully activates the net.reactivated.Fprint service. The system spins its wheels for another 10 seconds, then finally:

Sep 21 10:16:54 adam dbus[1074]: [system] Failed to activate service 'org.freedesktop.PackageKit': timed out
Sep 21 10:16:54 adam dbus-daemon[1074]: dbus[1074]: [system] Failed to activate service 'org.freedesktop.PackageKit': timed out
Sep 21 10:16:54 adam dbus-daemon[1074]: (upowerd:1467): UPower-DEBUG: emitting resuming
Sep 21 10:16:54 adam dbus-daemon[1074]: (upowerd:1467): UPower-Linux-DEBUG: SYSFS add /sys/devices/pci0000:00/0000:00:1d.0/usb2/2-1/2-1.4
Sep 21 10:16:54 adam NetworkManager[1057]: NetworkManager[1057]: <info> wake requested (sleeping: yes  enabled: yes)

it sure reads as if NM doesn't get woken up until after the timeout and failure of the attempt to activate org.freedesktop.PackageKit.

Why would things be ordered like this? Can it be solved, so we get a network connection immediately after resume? Thanks!

Comment 1 Adam Williamson 2011-09-21 17:25:41 UTC
Created attachment 524251 [details]
fuller extract from /var/log/messages

Here's a fuller extract, showing all the stuff that happens between resume and NM firing.

Comment 2 Dan Williams 2011-09-21 17:38:14 UTC
Quite interesting; there's nothing explicit in NM that requires PolicyKit, but the sleep/wake stuff may request authentication of the caller via PolicyKit.  If that fails, NM does listen for upower events as well, so perhaps pm-utils is calling NM which then requests PK auth for pm-utils, which fails for some reason, and then after that upower is woken up and NM gets the wakeup from the upower signal?

Comment 3 Adam Williamson 2011-09-21 17:48:36 UTC
um, it's *Package*Kit in the log, not PolicyKit. (I realize this makes it a lot weirder).

To be clear - would you expect NM to wait for the upower signal before kicking in, or should something else wake it up first? Right now it seems like the upower 'resuming' signal is somehow blocked by the PackageKit service activation, and then as soon as upower actually fires its signal, NM picks it up and wakes up.

Comment 4 Adam Williamson 2011-10-27 21:28:49 UTC
anyone got any new ideas on this?



-- 
Fedora Bugzappers volunteer triage team
https://fedoraproject.org/wiki/BugZappers

Comment 5 Nikolai Maziashvili 2011-11-21 09:55:50 UTC
I'm experiencing same issue in F16. When i hit a key to wake laptop from suspend my desktop shows up first for about 5 seconds. It means that anyone can see what i was doing on the laptop before suspend. Nothing is editable though, screen is locked, it is just shows content for 5 seconds then it goes black and shows login (unlock) dialogue. 
I takes about 45 seconds until i am able to enter password to unlock session, till then nothing is responding. As soon as i was logged in i "cat"ted /var/log/message to file and you can see that from the moment i pressed key to moment i could "cat" log 1 minute has passed.
Hopefully you have some extra info from my log.

Comment 6 Nikolai Maziashvili 2011-11-21 09:57:38 UTC
Created attachment 534752 [details]
log message after waking from suspend

This is log message "recorded" right after i logged in to session woken up from suspend.

Comment 7 Adam Williamson 2011-11-21 23:47:14 UTC
nikolai: that is not the same as this bug. this bug is specifically about the resumption of the network connection, and you're talking about something else. please file a separate bug.

fwiw, I see the same thing as you at first - after resuming, I see the pre-suspend desktop for a few seconds before it 'fades out' to the unlock screen - but I don't see any delay before being able to type the unlock password. But please file a new bug for that issue, do not post further to this bug. Thanks.



-- 
Fedora Bugzappers volunteer triage team
https://fedoraproject.org/wiki/BugZappers

Comment 8 Adam Williamson 2012-01-20 17:27:29 UTC
hughsie suggested a possible bad interaction with ntpd, but I tried disabling that, no difference. this is still happening on current rawhide, log:

Jan 20 09:24:25 adam kernel: [  493.561711] Restarting tasks ... done.
Jan 20 09:24:25 adam kernel: [  493.571609] sd 8:0:0:0: [sdc] 7823360 512-byte logical blocks: (4.00 GB/3.73 GiB)
Jan 20 09:24:25 adam kernel: [  493.575224] sd 8:0:0:0: [sdc] No Caching mode page present
Jan 20 09:24:25 adam kernel: [  493.575226] sd 8:0:0:0: [sdc] Assuming drive cache: write through
Jan 20 09:24:25 adam kernel: [  493.577509] sd 8:0:0:0: [sdc] No Caching mode page present
Jan 20 09:24:25 adam kernel: [  493.577512] sd 8:0:0:0: [sdc] Assuming drive cache: write through
Jan 20 09:24:25 adam kernel: [  493.579461]  sdc: sdc1
Jan 20 09:24:25 adam dbus[899]: [system] Activating service name='org.freedesktop.PackageKit' (using servicehelper)
Jan 20 09:24:25 adam dbus-daemon[899]: Unmounting CIFS filesystem: /share/data dbus[899]: [system] Activating service name='org.freedesktop.PackageKit' (using servicehelper)
Jan 20 09:24:50 adam dbus[899]: [system] Failed to activate service 'org.freedesktop.PackageKit': timed out
Jan 20 09:24:50 adam dbus-daemon[899]: dbus[899]: [system] Failed to activate service 'org.freedesktop.PackageKit': timed out
Jan 20 09:24:50 adam NetworkManager[885]: <info> wake requested (sleeping: yes  enabled: yes)



-- 
Fedora Bugzappers volunteer triage team
https://fedoraproject.org/wiki/BugZappers

Comment 9 Dan Williams 2012-01-20 17:40:39 UTC
NM has two mechanisms for being automatically woken up:

1) it gets poked explicitly by pm-utils via /usr/lib64/pm-utils/sleep.d/55NetworkManager

2) NM itself listens to dbus signals from upower

In the case of (2) in the bad log:

Sep 21 10:16:29 adam dbus[1074]: [system] Activating service name='org.freedesktop.PackageKit' (using servicehelper)
...
Sep 21 10:16:54 adam dbus-daemon[1074]: (upowerd:1467): UPower-DEBUG: emitting resuming
Sep 21 10:16:54 adam NetworkManager[1057]: NetworkManager[1057]: <info> wake requested (sleeping: yes  enabled: yes)

it's obvious that resume happens very early on, and for some reason either pm-utils isn't getting triggered, but more worrisome than that is that upower doesn't get around to emitting the wakeup until 14 seconds later.  So what wakes up upower and why is it taking that long?

Comment 10 Adam Williamson 2012-01-20 18:34:25 UTC
So, this one turns out to be hilariously simple. 55NetworkManager tries to run dbus_send. dbus_send does not exist. Doing 's,dbus_send,dbus-send' in /usr/lib64/pm-utils/sleep.d/55NetworkManager is enough to fix the problem.

(though I note that, after doing that, I have no "dbus[899]: [system] Activating service name='org.freedesktop.PackageKit' (using servicehelper)" in the logs at all any more on resume, so maybe pm-utils now isn't making it past 55NetworkManager?)

Comment 11 Adam Williamson 2012-01-20 18:40:32 UTC
hum, well I stuck a 'echo blahblah > /tmp/pk_was_called' into 99packagekit and indeed /tmp/pk_was_called shows up on resume, so obviously 99packagekit is still getting called, but for some reason nothing shows up in /var/log/messages for it any more. Maybe it succeeds now?

Comment 12 Adam Williamson 2012-01-20 18:57:29 UTC
so, dbus_send is a function in /usr/lib64/pm-utils/functions which just does this:

dbus_send ()
{
        command dbus-send "$@" 2>/dev/null || return $NA
}

and should be sourced by the . "${PM_FUNCTIONS}" in 55NetworkManager.

But changing dbus_send to dbus-send undeniably 'fixes' the problem. I'm 4/4 on resumes since changing it. I don't understand why dbus_send isn't working, but it isn't.

Comment 13 Adam Williamson 2012-01-20 19:04:10 UTC
another interesting note: on my laptop the PackageKit service activation succeeds, and so the upower activation of NM happens quickly and I don't see the delay.

so there's two bugs here: 55NetworkManager not firing correctly unless dbus_send is changed to dbus-send, and 95packagekit failing on my desktop but working on my laptop.

Comment 14 Adam Williamson 2012-01-20 22:41:03 UTC
so I think I'm wrong with the dbus_send / dbus-send thing. I think somehow 55NetworkManager just either never fires or never manages to wake up NM, and the only thing that ever manages to wake it up is upower. I was misled by the fact that sometimes it works really fast and without a packagekit entry in the log, but I think those are the times that 95packagekit *succeeds* and so the upower wakeup happens really quickly.

I've tested this by moving 55NetworkManager entirely out of the way (to /tmp) and sometimes the wakeup still happens really fast. It seems the first NM reconnect on wake after booting is slow (PK fails), but if you suspend soon after that and resume quickly, the NM wakeup happens fast (PK succeeds). So I think 55NetworkManager is just a dead letter, somehow. It doesn't fire, no matter whether you use dbus_send or dbus-send, no matter if you apply https://bugs.freedesktop.org/show_bug.cgi?id=42500 .



-- 
Fedora Bugzappers volunteer triage team
https://fedoraproject.org/wiki/BugZappers

Comment 15 Adam Williamson 2012-01-20 22:59:26 UTC
well, sheesh, think I finally have it: on resume the hooks get called *in reverse order*. So 95packagekit is called before 55NetworkManager. And because 95packagekit doesn't have && , it blocks everything after it.



-- 
Fedora Bugzappers volunteer triage team
https://fedoraproject.org/wiki/BugZappers

Comment 16 Adam Williamson 2012-01-20 23:22:48 UTC
Okay. Yeah. That's it.

95packagekit blocks everything that happens after it, which on resume, is pretty much everything. You can't just add && to the command in it, because then the ;; on the next line becomes invalid. I'm not sure what the hell that does.

55NetworkManager also needs to have https://bugs.freedesktop.org/show_bug.cgi?id=42500 applied to actually work.

If you move 95packagekit to 50packagekit, and patch 55NetworkManager, then things work: on resume, 55NetworkManager happens before 50packagekit, and it actually works. I'm not sure if it makes sense to move packagekit in the ordering, but it may, I'm not sure it makes sense for it to go to sleep after networking and get woken up before networking. But it may be more correct or sufficient to simply fix it to run in 'parallel' with the other hooks.

thoughts, hughsie?



-- 
Fedora Bugzappers volunteer triage team
https://fedoraproject.org/wiki/BugZappers

Comment 17 Adam Williamson 2012-01-21 00:07:20 UTC
Okay, another way to fix this is to fix 55NetworkManager's dbus methods and then simply to drop the --print-reply from 95packagekit. This is what causes it to block everything else. Without the --print-reply, the other hooks go ahead and run immediately, and 95packagekit actually winds up succeeding after the network comes back up: so this not only unblocks NM, it actually allows PK to succeed in my case as well.

Hughsie, do you see anything wrong with simply dropping --print-reply from /usr/lib64/pm-utils/sleep.d/95packagekit ? it's part of PackageKit not pm-utils. I can't see any rationale for using --print-reply there, I can't see any reason we should wait for that reply before continuing with resume.

Applying the patch for 55NetworkManager's dbus methods seems to be definitely correct, so I'll just send through a build that does that.



-- 
Fedora Bugzappers volunteer triage team
https://fedoraproject.org/wiki/BugZappers

Comment 18 Adam Williamson 2012-02-06 19:02:25 UTC
Filed https://bugs.freedesktop.org/show_bug.cgi?id=45710 upstream to have the dbus-send parameter dropped from PK's pm-utils script.



-- 
Fedora Bugzappers volunteer triage team
https://fedoraproject.org/wiki/BugZappers

Comment 19 Richard Hughes 2012-03-16 20:19:56 UTC
I've committed this upstream, thanks.

Comment 20 Fedora End Of Life 2013-01-16 22:44:08 UTC
This message is a reminder that Fedora 16 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 16. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '16'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 16's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 16 is end of life. If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora, you are encouraged to click on 
"Clone This Bug" and open it against that version of Fedora.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 21 Fedora End Of Life 2013-02-14 01:16:31 UTC
Fedora 16 changed to end-of-life (EOL) status on 2013-02-12. Fedora 16 is 
no longer maintained, which means that it will not receive any further 
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of 
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.


Note You need to log in before you can comment on or make changes to this bug.