Bug 477964

Summary: Use --print-reply in NetworkManager sleep/wake script to work around D-Bus bug
Product: [Fedora] Fedora
Reporter: James <james>
Component: pm-utils
Assignee: Phil Knirsch <pknirsch>
Status: CLOSED CURRENTRELEASE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium
Priority: low
Version: 12
CC: 17083525, andrew_dockery, anmar, bbaetz, bugzilla, cbredesen, choeger, clancy.kieran+redhat, dan.c.hooper, danw, daryll, dbn.lists, dcbw, dlrobin874, fkooman, jbeuree, joachim.deguara, joshua.bakerlepain, jph, mishu, murphyl, opensource, pknirsch, richard, rvokal, schveiguy, scottt.tw, tim, tore, wtogami
Hardware: All
OS: Linux
Doc Type: Bug Fix
Last Closed: 2009-11-18 21:25:07 UTC
Attachments:
Description                              Flags
nm log with "disabled network bug"       none
NM log when all networking disabled      none

Description James 2008-12-26 15:42:29 UTC
Description of problem:
The summary says it all: sometimes when I resume my notebook, whether I had been using wireless (iwlagn, Intel 4965) or wired (r8169), networking is disabled in NetworkManager (that is, Networking Enabled is unchecked in the applet). Checking Networking Enabled brings the network back up.

Version-Release number of selected component (if applicable):
kernel-2.6.27.10-167.fc10.x86_64
NetworkManager-0.7.0-0.12.svn4326.fc10.x86_64
gnome-power-manager-2.24.2-2.fc10.x86_64
dbus-1.2.4-2.fc10.x86_64

How reproducible:
Intermittent.

Steps to Reproduce:
1. Suspend.
2. Resume.
  
Actual results:
Networking sometimes disabled.

Expected results:
Networking consistently re-enabled IF the ethernet cable is plugged in or one of the auto-connect wireless networks is available.

Comment 1 Dox96 2009-01-02 01:31:59 UTC
I also get this problem, except I use Fedora 9, and the problem occurs when I resume from hibernation.  Otherwise it is exactly as described above.  I have been hibernating my laptop for more than a year every night, and this problem has only appeared recently.  At the moment, I am only using a wired connection.

I am using the following components:
kernel-2.6.27.9-73.fc9.i686
NetworkManager-0.7.0-0.12.svn4326.fc9.i386
gnome-power-manager-2.22.1-1.fc9.i386
dbus-1.2.4-2.fc9.i386

Comment 2 James 2009-01-06 14:17:26 UTC
Still present in NetworkManager-0.7.0-1.git20090102.fc10.x86_64.

Comment 3 Dan Williams 2009-01-08 17:56:00 UTC
Can you attach some logs from /var/log/messages showing a suspend/resume cycle where this happens?

Comment 4 Christoph Höger 2009-01-09 10:40:10 UTC
Here is one from this morning.
(Note that all gaps are due to a slow-syncing Cisco switch and our DHCP server being a little lazy.)

I had a look over the code with that log and one from a successful run. The only difference I could see so far was that null in the first DHCP message.
Everything else seems to run fine. Maybe it's all just an nm-applet race condition on wakeup. Next time it hits me, I'll use ifconfig to see whether NetworkManager is running fine in the background.

Comment 5 Christoph Höger 2009-01-09 10:40:53 UTC
Created attachment 328535 [details]
nm log with "disabled network bug"

Comment 6 James 2009-01-10 13:04:40 UTC
Created attachment 328620 [details]
NM log when all networking disabled

I've attached mine above. The system started off working on wired (r8169), was then suspended, and resumed at home with no network cable plugged in, but in the presence of my wireless network.

Comment 7 Christoph Höger 2009-01-14 07:50:27 UTC
I had the same issue again this morning and this time had a look at my log files _before_ I reactivated NM. I can confirm that the "Waking up" message never appears, which would mean that it's not NM's fault at all. Instead, the process that should send the "Sleep" dbus message (or dbus itself) is buggy.
So which process is in charge of this? (I've seen a lot of distribution-specific scripts in various /etc/* folders in the past, but what does F9 do?) And how can I get dbus to log all messages that go to org.freedesktop.NetworkManager?

Comment 8 Dan Williams 2009-01-14 19:02:19 UTC
Look in the pm-utils package:

/usr/lib/pm-utils/sleep.d/10NetworkManager

that's the script that tells NM to wake up.  I'm not sure exactly when that gets run, but if it definitely sends the message out to the bus, then it's either dbus, libdbus, or NetworkManager.  A long time ago there was a bug in dbus where it wouldn't send the message if the sender process exited too quickly, which was usually triggered by dbus-send.  Try adding --print-reply to the call in that script so that it reads:

        dbus-send --system                            \
                --print-reply                         \
                --dest=org.freedesktop.NetworkManager \
                /org/freedesktop/NetworkManager       \
                org.freedesktop.NetworkManager.wake
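
For reference, a minimal sketch of a complete sleep.d hook built around the call above. It assumes the usual pm-utils convention that hooks receive suspend or hibernate as $1 on the way down and resume or thaw on the way up; the function names are hypothetical, and this is not the 10NetworkManager file pm-utils actually ships.

```shell
#!/bin/sh
# Hypothetical sketch of a pm-utils sleep.d hook for NetworkManager with the
# --print-reply workaround; not the file pm-utils actually ships.

# Map the pm-utils event passed as $1 to the NetworkManager method name.
nm_method_for_event() {
    case "$1" in
        suspend|hibernate) echo sleep ;;
        resume|thaw)       echo wake ;;
        *)                 return 1 ;;  # not an event this hook handles
    esac
}

# Call the method and wait for the reply: --print-reply keeps dbus-send
# running until NetworkManager answers instead of exiting right after sending.
nm_notify() {
    method=$(nm_method_for_event "$1") || return 0
    dbus-send --system --print-reply \
        --dest=org.freedesktop.NetworkManager \
        /org/freedesktop/NetworkManager \
        "org.freedesktop.NetworkManager.$method" >/dev/null || true
    # "|| true": an error reply (e.g. AlreadyAsleepOrAwake) should not make
    # the hook itself fail; this is a choice of the sketch, not pm-utils policy.
}

# pm-utils runs the hook with the event as its first argument.
if [ $# -gt 0 ]; then
    nm_notify "$1"
fi
```

Separating the event mapping from the D-Bus call keeps the only change of the workaround (the --print-reply flag) in one place.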

Comment 9 Christoph Höger 2009-01-28 09:53:52 UTC
So after two weeks, I'll call this the observer effect!

No resume problems since I added --print-reply.

That seems to be a race condition somewhere between that particular script and either dbus not yet being able to handle that message and/or NM not being able to receive it at that point.

I would suggest asking a dbus expert about that.

Comment 10 Kieran Clancy 2009-01-29 22:54:24 UTC
I can confirm this bug. I see it quite often these days (on a wired network).

When I put the following in /etc/dbus-1/system.conf:
<allow receive_interface="*" eavesdrop="true" />

And then run: dbus-monitor --system "interface='org.freedesktop.NetworkManager'"

Sometimes I see the following when I resume:
signal sender=:1.120 -> dest=org.freedesktop.NetworkManager path=/org/freedesktop/NetworkManager; interface=org.freedesktop.NetworkManager; member=wake

So it is receiving the wake but not the sleep signal.

At other times (when the bug happens), I get this instead:
signal sender=:1.188 -> dest=org.freedesktop.NetworkManager path=/org/freedesktop/NetworkManager; interface=org.freedesktop.NetworkManager; member=sleep
signal sender=:1.29 -> dest=(null destination) path=/org/freedesktop/NetworkManager; interface=org.freedesktop.NetworkManager; member=StateChanged
signal sender=:1.29 -> dest=(null destination) path=/org/freedesktop/NetworkManager; interface=org.freedesktop.NetworkManager; member=StateChange
method call sender=:1.23 -> dest=org.freedesktop.NetworkManager path=/org/freedesktop/NetworkManager; interface=org.freedesktop.NetworkManager; member=state
signal sender=:1.29 -> dest=(null destination) path=/org/freedesktop/NetworkManager; interface=org.freedesktop.NetworkManager; member=PropertiesChanged

So it gets the sleep signal but never gets the wake signal.

For what it's worth, this makes all my applications using NFS hang, which is not particularly fun.
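
The diagnosis above boils down to checking, in the captured dbus-monitor output, whether the last sleep message sent to NetworkManager was followed by a wake. A minimal sketch of that check, assuming the output was saved to a file in the format shown above; the function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read saved dbus-monitor output on stdin and print the
# last sleep/wake message addressed to NetworkManager:
#   "sleep" -> NM was put to sleep and no wake followed (the bug)
#   "wake"  -> the last message was a wake
#   "none"  -> neither message appears
nm_last_power_event() {
    awk '
        /dest=org\.freedesktop\.NetworkManager / {
            if ($NF == "member=sleep") last = "sleep"
            else if ($NF == "member=wake") last = "wake"
        }
        END { print (last == "" ? "none" : last) }
    '
}
```

Usage would be along the lines of nm_last_power_event < monitor.log after a resume; broadcast signals such as StateChanged have a null destination and are ignored.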

Comment 11 Dan Williams 2009-01-30 01:09:12 UTC
This is probably a D-Bus bug then.  Michael Meeks discovered a race in D-Bus with short-lived processes that is (as yet) unresolved, which is probably what's causing this issue.

I'd suggest adding --print-reply in the pm-utils script.

Comment 12 Dan Williams 2009-02-03 16:49:45 UTC
*** Bug 483288 has been marked as a duplicate of this bug. ***

Comment 13 Phil Knirsch 2009-02-06 16:36:34 UTC
Sounds reasonable. Richard, do you agree? We can remove the hack once d-bus has been fixed.

Thanks & regards, Phil

Comment 14 Dan Williams 2009-02-14 21:28:26 UTC
*** Bug 485581 has been marked as a duplicate of this bug. ***

Comment 15 Chris Schanzle 2009-02-26 16:51:54 UTC
Just saying "Thanks" for the interim workaround; no 'networking disabled' issues after wakeup on F10 x86_64.

Comment 16 Dan Williams 2009-03-24 15:54:46 UTC
*** Bug 491878 has been marked as a duplicate of this bug. ***

Comment 17 Dan Nicholson 2009-03-27 21:43:17 UTC
The problem with using --print-reply is that the call will block waiting for the reply from dbus. This will cause pm-utils to block in that hook, slowing down the rest of the process as it waits for the hooks to complete. This can take a couple of seconds (the last time I checked).

You could mess around with backgrounding the process, but it would really be better if this was fixed in dbus. What pm-utils is doing is perfectly reasonable, and, in fact, the way dbus is supposed to work.

Of course, if the dbus fix is not forthcoming, then workarounds are necessary. I just don't want the workaround to go in and then lose sight of the real problem.
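
One way to follow the backgrounding idea without giving up --print-reply is to run the call in the background and bound how long dbus-send waits with its --reply-timeout option. This is only a sketch: the function name and the 5000 ms value are arbitrary assumptions, and it suits the wake side better than the sleep side, since a backgrounded sleep call might not be processed before the machine actually suspends.

```shell
#!/bin/sh
# Hypothetical helper: call a NetworkManager method ($1, e.g. "wake") from a
# pm-utils hook without blocking it.
#   --print-reply    keeps dbus-send alive until the reply arrives (the workaround)
#   --reply-timeout  caps that wait, in milliseconds (5000 is an arbitrary choice)
#   &                runs the call in the background so the hook returns at once
nm_call_async() {
    dbus-send --system --print-reply --reply-timeout=5000 \
        --dest=org.freedesktop.NetworkManager \
        /org/freedesktop/NetworkManager \
        "org.freedesktop.NetworkManager.$1" >/dev/null 2>&1 &
}
```

In a resume branch this would be called as nm_call_async wake; the hook returns immediately while dbus-send waits for the reply on its own.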

Comment 18 Niels Haase 2009-04-08 23:23:36 UTC
*** Bug 493785 has been marked as a duplicate of this bug. ***

Comment 19 Dan Williams 2009-04-09 01:19:33 UTC
*** Bug 494955 has been marked as a duplicate of this bug. ***

Comment 20 Jean-Philippe 2009-04-09 13:55:59 UTC
As happy as you may be with --print-reply, there is, shall we say, a little issue:

-{*~*}- ~ less /usr/lib/pm-utils/sleep.d/10NetworkManager
/usr/lib/pm-utils/sleep.d/10NetworkManager: No such file or directory

Yes, indeed: you 64-bit users out there (and I suspect your numbers are increasing) would have to edit:

-{*~*}- ~ sudo vim /usr/lib64/pm-utils/sleep.d/55NetworkManager

instead.

Makes sense, though.

Cheers
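
Since the hook's location differs between installs (/usr/lib vs. /usr/lib64, 10NetworkManager vs. 55NetworkManager), a minimal sketch that lists whichever NetworkManager sleep hooks exist, so you can see which file your system actually uses before touching anything (and keep a backup of it). The function name and its optional root-directory argument, handy for testing, are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: list NetworkManager pm-utils sleep hooks under an
# optional root directory (default: the running system), checking both the
# 32-bit (/usr/lib) and 64-bit (/usr/lib64) locations.
find_nm_sleep_hooks() {
    root=${1:-}
    for dir in "$root/usr/lib/pm-utils/sleep.d" "$root/usr/lib64/pm-utils/sleep.d"; do
        for hook in "$dir"/*NetworkManager; do
            # When nothing matches, the glob stays literal; skip that case.
            if [ -e "$hook" ]; then
                echo "$hook"
            fi
        done
    done
}
```

On a 64-bit system like the one described above, this would print /usr/lib64/pm-utils/sleep.d/55NetworkManager.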

Comment 21 Jean-Philippe 2009-04-09 14:23:52 UTC
^^^^^some^admin^please^remove^above^message^^^^^

  uname -r
  2.6.27.9-159.fc10.x86_64

GREAT - THE ABOVE HACK KILLED THE 'SUSPEND' ABILITY OF MY LAPTOP, EVEN AFTER REVERTING TO THE FORMER, UNHACKED VERSION OF SAID FILE. 

64-BIT PEOPLE, DO _NOT_ EDIT /usr/lib64/pm-utils/sleep.d/55NetworkManager, OR DO SO AT YOUR OWN RISK.

GENTLEMEN, YOU JUST BROUGHT THE WHOLE 'I DON'T USE LINUX BECAUSE IT DOESN'T WORK' TOPIC ON THE TABLE, RIGHT HERE, RIGHT NOW & MY LANGUAGE SKILLS AREN'T HALFWAY GOOD ENOUGH TO FULLY EXPRESS THE EXACT SUBSTANCE OF MY FEELINGS ABOUT THE SITUATION.

f**CK

Jean-Philippe

Comment 22 Jean-Philippe 2009-04-09 14:26:22 UTC

Comment 23 Jean-Philippe 2009-04-09 15:14:16 UTC
I started a forum thread to solve the issue (the result of my naively applying the suggestions above):
http://forums.fedoraforum.org/showthread.php?t=219304

Sorry for the noise. Still angry though.
Jean-Philippe

Comment 24 Niels Haase 2009-05-07 22:01:39 UTC
*** Bug 499716 has been marked as a duplicate of this bug. ***

Comment 25 Kieran Clancy 2009-05-07 22:09:29 UTC
I have had the --print-reply workaround in place for some time now, and it was working 100% of the time, but recently it has started failing from time to time.

Occasionally now, I send the signal with --print-reply and NetworkManager doesn't re-enable the network.

When this happens, I have even tried sending the org.freedesktop.NetworkManager.wake signal manually several times, but I just get the message:
Error org.freedesktop.NetworkManager.AlreadyAsleepOrAwake: Already awake

Next time it happens I will check /var/log/messages.

Comment 26 Niels Haase 2009-05-16 21:59:21 UTC
*** Bug 501094 has been marked as a duplicate of this bug. ***

Comment 27 Dan Nicholson 2009-07-09 01:20:19 UTC
FYI, upstream dbus bug is fdo #896.

https://bugs.freedesktop.org/show_bug.cgi?id=896

Comment 28 James 2009-09-07 21:04:59 UTC
What's the state of this at the moment? Can this be closed now, or are we still waiting on a dbus fix?

Comment 29 James 2009-09-15 08:01:17 UTC
Just popped up again (it's rare) with:

NetworkManager-0.7.1-8.git20090708.fc11.x86_64
dbus-1.2.12-2.fc11.x86_64

Comment 30 Joachim Deguara 2009-10-07 05:17:54 UTC
It is not too rare; I am hitting it too with F11.  The dbus bug mentioned in comment #27 has now been resolved.  Can we get a new (patched) version of dbus? The patch was pushed in July, but dbus-1.2.12-2 was built in June.

Comment 31 Joachim Deguara 2009-10-07 05:49:51 UTC
Never mind, my bug was caused by a typo by whoever created /usr/lib/pm-utils/sleep.d/56dhclient: it looks like $DEVICE was used where the variable $device was meant.  See bug #527641.

Comment 32 James 2009-11-07 10:47:57 UTC
I've not seen this so far now in Rawhide (dbus-1.2.16-8.fc12.x86_64), anyone else?

Comment 33 Dan Williams 2009-11-11 01:14:29 UTC
*** Bug 493784 has been marked as a duplicate of this bug. ***

Comment 34 Bug Zapper 2009-11-18 10:34:20 UTC
This message is a reminder that Fedora 10 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 10.  It is Fedora's policy to close all
bug reports from releases that are no longer maintained.  At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '10'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 10's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 10 is end of life.  If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora please change the 'version' of this 
bug to the applicable version.  If you are unable to change the version, 
please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events.  Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 35 Dan Williams 2009-11-18 20:39:56 UTC
-> F12 since this is still an issue there until/if it gets fixed in dbus...

Comment 36 James 2009-11-18 21:06:27 UTC
Upstream claims the dbus bug was fixed; is this reflected in the Fedora build? (I've not seen it so far.)

Comment 37 Dan Williams 2009-11-18 21:25:07 UTC
You're correct. The upstream commit is http://cgit.freedesktop.org/dbus/dbus/commit/?id=87ddff6b24d9b9d4bba225c33890db25022d8cbe and F-12 ships with dbus-1.2.16, which has the fix.
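
To check whether an installed system already has the fix, a minimal sketch for RPM-based systems such as Fedora that compares the installed dbus version against 1.2.16, the version named above as containing it. The function names are hypothetical, it assumes GNU sort with -V for version-aware ordering, and a build carrying a backported patch under an older version number would not be detected.

```shell
#!/bin/sh
# Succeed if version $1 >= version $2. Relies on GNU "sort -V", which orders
# 1.2.4 before 1.2.16 (a plain lexical sort would not).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Hypothetical check: is the installed dbus at least 1.2.16?
dbus_has_fix() {
    versions=$(rpm -q --qf '%{VERSION}\n' dbus 2>/dev/null) || return 1
    # Multilib systems may list dbus more than once; check the first entry.
    installed=$(printf '%s\n' "$versions" | head -n 1)
    version_ge "$installed" 1.2.16
}
```

For example, dbus_has_fix && echo "dbus has the fix" would print the message on F-12 (dbus-1.2.16) but not with the F11 dbus-1.2.12 mentioned earlier.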