Bug 739836 - NetworkManager makes reboot take several minutes
NetworkManager makes reboot take several minutes
Status: CLOSED ERRATA
Product: Fedora
Classification: Fedora
Component: NetworkManager (Show other bugs)
16
Unspecified Unspecified
unspecified Severity unspecified
: ---
: ---
Assigned To: Dan Williams
Fedora Extras Quality Assurance
RejectedBlocker, AcceptedNTH
:
: 743518 772400 798684 799005 812646 (view as bug list)
Depends On:
Blocks: F16-accepted/F16FinalFreezeExcept
  Show dependency treegraph
 
Reported: 2011-09-20 03:50 EDT by Kamil Páral
Modified: 2012-09-17 09:54 EDT (History)
20 users (show)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2012-05-27 21:23:15 EDT
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:


Attachments (Terms of Use)
systemd debug log (229.93 KB, text/plain)
2011-09-20 03:50 EDT, Kamil Páral
no flags Details
messages (2.84 MB, text/plain)
2011-09-22 05:17 EDT, Kamil Páral
no flags Details
Shutdown debug log (264.05 KB, text/plain)
2011-11-14 15:34 EST, Ron Yorston
no flags Details
dmesg shutdown log (122.23 KB, text/plain)
2012-01-14 04:43 EST, hannes
no flags Details
strace of networkmanager (2.56 MB, text/plain)
2012-01-14 04:45 EST, hannes
no flags Details
NetworkManager.service status messages at shutdown with rsyslog mod (5.65 KB, text/x-log)
2012-04-04 18:12 EDT, Saurav Sengupta
no flags Details
Patch to prevent NM from handling interruptible signals when already doing it (426 bytes, patch)
2012-04-05 20:05 EDT, Rodrigo Araujo
no flags Details | Diff
[PATCH] improve signal handling (7.25 KB, patch)
2012-04-16 08:22 EDT, Jirka Klimes
no flags Details | Diff
[PATCH] rework a bit signal handling (9.96 KB, patch)
2012-04-16 13:43 EDT, Yann Droneaud
no flags Details | Diff

  None (edit)
Description Kamil Páral 2011-09-20 03:50:36 EDT
Created attachment 523968 [details]
systemd debug log

Description of problem:
Clean install of Fedora 16 Beta RC1 i386 DVD default package set. After booting into GDM, switching to console, logging in as root and issuing 'reboot' command, the restart takes several minutes. NetworkManager service seems to be to blame.

I captures systemd debug messages from boot to reboot. This pops up:

NetworkManager.service stopping timed out. Killing.
NetworkManager.service changed stop-sigterm -> stop-sigkill

See log.

OTOH, when logging into a session as a standard user and restarting machine from user menu, the restart is quick.

Version-Release number of selected component (if applicable):
NetworkManager-0.9.0-1.fc16.i686
systemd-35-1.fc16.i686

How reproducible:
always

Steps to Reproduce:
1. boot to gdm
2. switch to VT2
3. log in as root
4. run 'reboot'
5. wait, wait, wait
Comment 1 Kamil Páral 2011-09-20 03:52:32 EDT
Proposing as F16 Blocker, because:

"All release-blocking desktops' offered mechanisms (if any) for shutting down, logging out and rebooting must work"
https://fedoraproject.org/wiki/Fedora_16_Beta_Release_Criteria

It works eventually (therefore not marking as Beta blocker), but it is not an optimal behavior.
Comment 2 Kamil Páral 2011-09-22 05:10:32 EDT
I have retested this many times. It doesn't happen always. It happens in 30-50% of reboots. I tested with NetworkManager-0.9.0-1 and I also updated to NetworkManager-0.9.0-2, still the same.

With the system fully updated (from updates-testing), I see this message in systemd log:

NetworkManager.service stopping timed out. Killing.
NetworkManager.service: main process exited, code=killed, status=9
Unit NetworkManager.service entered failed state.
Comment 3 Kamil Páral 2011-09-22 05:17:36 EDT
Created attachment 524361 [details]
messages
Comment 4 Tim Flink 2011-10-03 12:45:30 EDT
Discussed in the 2011-09-30 blocker bug review meeting. Rejected as a Fedora 16 final blocker as it does not violate any release criteria - the system still shuts down.

However, delayed shutdown is an annoyance and this would be nice to have fixed so it was accepted as a NTH bug for Fedora 16 final.
Comment 5 Yann Droneaud 2011-11-04 06:09:48 EDT
I think this bug was also reported on Fedora devel list, see thread "F16 - random shutdown delays - systemd related ?"
http://lists.fedoraproject.org/pipermail/devel/2011-November/158968.html
Comment 6 Michal Schmidt 2011-11-04 08:02:33 EDT
Could you please produce a log of the problem using the method described in this message?:
http://lists.fedoraproject.org/pipermail/devel/2011-November/158955.html
Comment 7 Kamil Páral 2011-11-07 05:32:58 EST
I can no longer reproduce with F16 GOLD. Feel free to close if there is no one else to reproduce this.

NetworkManager-0.9.1.90-5.git20110927
systemd-36-3
Comment 8 Michal Schmidt 2011-11-07 07:17:18 EST
*** Bug 743518 has been marked as a duplicate of this bug. ***
Comment 9 Michal Schmidt 2011-11-07 07:18:51 EST
We now have a log of apparently the same problem from a different user:
https://bugzilla.redhat.com/show_bug.cgi?id=743518#c16
Comment 10 Ron Yorston 2011-11-14 15:34:07 EST
Created attachment 533610 [details]
Shutdown debug log

Just a 'me too' on this one. I've been getting delays during shutdown since I first installed F16 beta.

I'm currently on NetworkManager-0.9.1.90-5.git20110927.
Comment 11 Oleg Andreev 2011-11-22 04:17:19 EST
Just my 2 cents:
 1. When NetworkManager hangs during termination, it first prints

NetworkManager[XXX]: <warn> disconnected by the system bus.

    and only then

NetworkManager[XXX]: <info> caught signal 15, shutting down normally.

 2. When NetworkManager terminates properly, it prints

NetworkManager[XXX]: <info> caught signal 15, shutting down normally.

   two times in a row. Why does he do it (or should I ask this question on GNOME bugzilla?)

Slightly offtopic:
 1. Why does dbus.service exit before dependent services (e.g. NetworkManager) are terminated? Is it the desired behavior?
 2. This bug is partially mitigated by setting TimeoutSec to reasonable value in NetworkManager.service. Unfortunately I have only stumbled on this parameter when reading systemd source code. I believe that this parameter should be noted in Known Bugs or other documentation.
Comment 12 Michal Schmidt 2011-11-22 07:28:27 EST
(In reply to comment #11)
>  2. This bug is partially mitigated by setting TimeoutSec to reasonable value
> in NetworkManager.service. Unfortunately I have only stumbled on this parameter
> when reading systemd source code. I believe that this parameter should be noted
> in Known Bugs or other documentation.

TimeoutSec is documented in "man systemd.service".
Comment 13 Michal Schmidt 2012-01-11 19:29:03 EST
*** Bug 772400 has been marked as a duplicate of this bug. ***
Comment 14 hannes 2012-01-12 07:21:02 EST
Just another me too. Switched to wicd for the time being but a fixed NetworkManager would be great.
Comment 15 Jirka Klimes 2012-01-12 07:28:39 EST
(In reply to comment #14)
> Just another me too. Switched to wicd for the time being but a fixed
> NetworkManager would be great.

We would like to fix that. But I was not able to reproduce and thus don't know what causes that.

Would you provide more details about how to reproduce, what package versions do you have, etc?
What do you have in /var/log/messages?
Comment 16 Michal Schmidt 2012-01-12 12:36:52 EST
It would be useful to run NM under strace. Here's how:

Make sure SELinux is permissive or disabled.
Save this script as /root/nm-wrap.sh and make it executable:

#!/bin/sh
exec strace -o /root/nm-trace-$(date +%F--%T).txt -f -tt -s 1024 \
  /usr/sbin/NetworkManager --no-daemon

In /lib/systemd/system/NetworkManager.service change the ExecStart line to:
ExecStart=/root/nm-wrap.sh

Let the modifications take effect:
systemctl daemon-reload
systemctl restart NetworkManager.service

Hopefully the bug will still be reproducible. When you reproduce it, attach the corresponding /root/nm-trace-* log here.
Comment 17 hannes 2012-01-14 04:42:28 EST
Morning,

I didn't have to wait right now but received some messages during shutdown.
Comment 18 hannes 2012-01-14 04:43:17 EST
Created attachment 555210 [details]
dmesg shutdown log
Comment 19 hannes 2012-01-14 04:45:02 EST
Created attachment 555211 [details]
strace of networkmanager

strace of networkmanager
Comment 20 hannes 2012-01-22 18:16:10 EST
Ok, since I had selinux disabled the bug didn't occur and I thought it was fixed by several systemd updates during the last weeks. So I enabled selinux again and relabeled the file system. Now the bug is back and it seems to be clearly related to selinux.
Comment 21 Freddy Willemsen 2012-01-23 03:27:15 EST
I am not convinced selinux is the problem. I am having the same issue. Thing is though, if I use the debug stuff from Comment 16, the slowdown does not occur. As soon as I remove the strace stuff again, shutdown takes about 2-3 minutes (Ultimately resulting in NetworkManager being killed). It does seem te be intermittent, can't really put my finger on it yet.
Comment 22 hannes 2012-01-23 08:29:24 EST
Ah true, I made this changes pretty simultaneously. I could try to put the trace command back and see if it's still appearing.
Comment 23 hannes 2012-02-24 17:09:56 EST
Ok, so it's unrelated to selinux and does not occur so often, when the trace script is used. What I observed is that if it doesn't shut down cleanly it doesn't create a trace output file. So perhaps that could give an additional hint on what is really failing here.
Comment 24 Michal Schmidt 2012-03-02 06:56:47 EST
*** Bug 798684 has been marked as a duplicate of this bug. ***
Comment 25 Saurav Sengupta 2012-03-04 04:22:23 EST
The TimeoutSec parameter does seem to be doing the trick as a workaround.
Comment 26 Michal Schmidt 2012-03-05 07:40:18 EST
*** Bug 799005 has been marked as a duplicate of this bug. ***
Comment 27 Rodrigo Araujo 2012-03-31 12:57:01 EDT
Didn't anybody notice that if you restart NetworkManager on the command line it is fast, and leaves a lot of output in /var/log/messages, but when you poweroff or restart the system you have no related information there until a new rsyslog restart?

Should anyone care, the problem is that NetworkManager writes to syslog. If rsyslog is terminated before NetworkManager when shutting down the system, NetworkManager hangs.

It happens sometimes on some systems, and happens most times on faster systems where there's a higher probability of more services (rsyslog included) being terminated faster. I've verified it as always happening on 2 different systems with core i7 CPU and unencrypted partitions, and happening rarely on another with a core i3 CPU and LUKS encrypted partitions.

Once I've added the following line to the [Unit] section of
/lib/systemd/system/rsyslog.service and reloaded the systemd daemon with `systemctl --system daemon-reload', the problem was gone for good:

DefaultDependencies=no

I took that line from syslog.target and it seemed appropriate as, if I understand correctly, it makes rsyslog.service to be one of the last services to be shut off. Should there be a better way (I'm still getting used to systemd) I'd like to hear about it. Either way, I think you have here enough data to release a fix for this very annoying issue.

Regards.
Comment 28 Marc Muehlfeld 2012-04-01 07:47:47 EDT
I can confirm, that if I add the entry to rsyslog.service, shutdown always was quick, where it hangs almost everytime before.
Comment 29 hannes 2012-04-02 01:45:31 EDT
So this bug should be reassigned to the rsyslog maintainer, so that we hopefully get his opinion on this. Perhaps there's a reason for this missing entry, who knows.
I'll reassign it.
Comment 30 Michal Schmidt 2012-04-02 06:47:42 EDT
(In reply to comment #27)
> Should anyone care, the problem is that NetworkManager writes to syslog. If
> rsyslog is terminated before NetworkManager when shutting down the system,
> NetworkManager hangs.

Even if rsyslog terminates before NetworkManager, the hang is unexpected. NM's writes to /dev/log (syslog.socket) will activate systemd-kmsg-syslogd.service which will read the messages out from the socket and write them to the kernel log buffer.

So while changing the ordering of rsyslog.service may be a useful workaround for some, I don't think the reason for the hang is sufficiently understood.
Comment 31 Saurav Sengupta 2012-04-03 11:34:33 EDT
Setting DefaultDependencies=No for rsyslog.service as mentioned in comment #27 does not resolve the hang. I have experienced it even after making the modification. So far I have only seen that setting the TImeoutSec parameter in NetworkManager.service provides a workaround, as mentioned in comment #11.
Comment 32 Rodrigo Araujo 2012-04-03 19:07:02 EDT
I have to agree with Michal Schmidt, as I tried to reproduce the hang by shutting down rsyslog and stopping NetworkManager, but no luck: its output goes to the kernel log buffer and appears when running dmesg. Hang only occurs to me when powering off or rebooting (and without setting any of the 2 workarounds mentioned, of course).

Saurav,

One thing to try, to ensure the bug you're experiencing is the same, would be to manually stop NetworkManager, making sure it isn't running, before rebooting. Can you do that?

If so, can you check if you can, with the setting I mentioned in rsyslog.service, see any additional information that can give new hints in /var/log/messages regarding Network Manager?

If it logs stuff before rsyslog ends, I would also try with:

[logging]
level=DEBUG

to check if we can get even more info and take us in the right direction.
If you see anything interesting there, maybe it would be good to share it here.

Also can you try changing the log level in /etc/NetworkManager.conf like:

[logging]
level=ERR  # to check if it still hangs with less output

Another option would be to try with syslog-ng instead of rsyslog, and see if it also happens, or if it logs anything useful that we are missing. I still haven't had the time to try that.

Best regards, hope these suggestions can be useful.
Comment 33 Marc Muehlfeld 2012-04-04 02:37:54 EDT
Since I added DefaultDependencies=no to rsyslog.service three days ago, I shutdown my computer several times. And everytime it was perfekt fast like it should.

Before that workaround I "pkill NetworkManager" as root before I shutdown my computer.
Comment 34 Saurav Sengupta 2012-04-04 06:51:43 EDT
@Rodrigo (comment #32): Strangely, the Fedora 16 KDE spin does not hang so often. The default GNOME edition does. I tried the rsyslog workaround in the KDE spin and the hang occurred. But because of the almost non-existent occurrence on the KDE spin, I am unable to reproduce the conditions. I will be switching to the GNOME edition again and then I will do what you have suggested, investigate the matter, and report back here.
Comment 35 Saurav Sengupta 2012-04-04 18:10:28 EDT
@Rodrigo (comment #32): With DefaultDependencies=no in the Unit section of rsyslog.service, I got the hang again and have again found that it is caused by NetworkManager.service not stopping in time, as earlier. I am attaching the section of /var/log/messages which contains the status information showing NetworkManager.service to be the culprit.
Comment 36 Saurav Sengupta 2012-04-04 18:12:59 EDT
Created attachment 575228 [details]
NetworkManager.service status messages at shutdown with rsyslog mod

This is the section of /var/log/messages which contains the status messages for NetworkManager.service at shutdown when it did not stop in time. This was with the rsyslog modification in place as mentioned in comment #27.
Comment 37 Saurav Sengupta 2012-04-04 18:15:47 EDT
For attachment #575228 [details] I would like to call attention to line no. 22 and then lines no. 47-49.
Comment 38 Rodrigo Araujo 2012-04-05 20:05:13 EDT
Created attachment 575571 [details]
Patch to prevent NM from handling interruptible signals when already doing it
Comment 39 Rodrigo Araujo 2012-04-05 20:26:57 EDT
Saurav,

analyzing your log, something came to my attention:

Apr  5 03:25:50 nemo NetworkManager[858]: <info> caught signal 15, shutting down normally.
(...)
Apr  5 03:25:50 nemo NetworkManager[858]: <info> exiting (success)
Apr  5 03:25:50 nemo abrtd: Got signal 15, exiting
Apr  5 03:25:50 nemo NetworkManager[858]: NetworkManager[858]: <info> exiting (success)
Apr  5 03:25:50 nemo NetworkManager[858]: NetworkManager[858]: <info> caught signal 15, shutting down normally.

Seems like NM is getting the same signal twice, or somehow it's just getting itself in a loop. I found strange that it would let itself be interrupted by the same signal it is already handling, however analyzing the code I found this in src/main.c:


nm_signal_handler (int signo)
{
        static int in_fatal = 0, x;

        /* avoid loops */
        if (in_fatal > 0)
                return;
        ++in_fatal;

        switch (signo) {
(...)
        case SIGINT:
        case SIGTERM:
                /* let the fatal signals interrupt us */
                --in_fatal;
                nm_log_info (LOGD_CORE, "caught signal %d, shutting down normally.", signo);


I don't know why it should "let the fatal signals interrupt us" when we are handling a fatal signal already and shutting down, but probably there is a good reason?

Anyway I changed the code and created a patch, see attachment 575571 [details], to comment the line that resets the in_fatal flag when handling SIGTERM (and SIGINT). After recompiling NM, I removed my "fix" in rsyslog, rebooted to ensure I was affected, installed the NM I had compiled and the system does not hang now.

Can anyone also verify this? The patch is very simple and applies cleanly on fc16 and rawhide versions of NM (although I wasn't able to make the current rawhide version work but it had nothing to do with this patch). Please don't forget to reset any previous "fixes" like the TimeOutSec or rsyslog DefaultDependencies before trying, to ensure we don't have a false positive.

So it seems to me that:
- somehow NM is getting a SIGTERM twice (systemd problem? other service shutting down that does it?) or for some other reason this part of the code is called again, causing the hang;
- the rsyslog "fix" maybe just made that in some systems the order of simultaneous services shutting down is slightly changed, so whatever makes that NM gets TERM-signaled twice doesn't affect it; however, it helped us since the whole shutdown process is now logged on /var/log/messages so I think it (or something like it) still should be considered to be included in future versions;
- perhaps its better for NM mantainers to check this as they'll know better if it's really necessary to "let the fatal signals interrupt" for some more advanced reason;

I hope this gets us closer to solve the issue :)
Best regards.
Comment 40 Saurav Sengupta 2012-04-06 13:15:01 EDT
@Rodrigo: I went through the source file. The --in_fatal is present for some other signals as well. The file is very poorly documented and I can't figure out the purpose of using in_fatal as a counter. There is definitely more to it than meets the eye. For example, why is the signal allowed to loop via the default switch case at all? What does it mean? And what does quit_early, part of the SIGTERM handling code, do? I too think that it would be helpful if some NetworkManager developer could comment on this issue. And is this not really a NetworkManager bug instead of an rsyslog bug?
Comment 41 hannes 2012-04-09 10:27:28 EDT
Ok, reassigned it to NetworkManager, since I also hit it with a changed unit file. Sorry for changing it in first place.
Comment 42 Rodrigo Araujo 2012-04-09 16:31:11 EDT
Hannes,

if you are experiencing the bug, can you try the patch I attached?

Best regards.
Comment 43 Michal Schmidt 2012-04-16 07:06:08 EDT
*** Bug 812646 has been marked as a duplicate of this bug. ***
Comment 44 Jirka Klimes 2012-04-16 08:22:52 EDT
Created attachment 577696 [details]
[PATCH] improve signal handling

Even if I'm not able to reproduce the problem myself, I think there is a race
or a similar tricky issue while handling signals.

The attached patch should fix that.

I've prepared a scratch build for F17 with the patch incorporated. Would someone
who is able to reproduce the bug please test the build?

http://koji.fedoraproject.org/koji/taskinfo?taskID=3994257
Comment 45 Yann Droneaud 2012-04-16 13:42:16 EDT
(In reply to comment #44)
> Created attachment 577696 [details]
> [PATCH] improve signal handling
> 
> Even if I'm not able to reproduce the problem myself, I think there is a race
> or a similar tricky issue while handling signals.
> 
> The attached patch should fix that.
>

Seems a bit dangerous. As you wrote, it seems to require all threads to have the correct blocking signal set.

Why not, in the current code, set the default signal handler when one signal is caught. If the signal is triggered twice, there's a problem, so it's time to panic, let's kill the program. Patch to come.
Comment 46 Yann Droneaud 2012-04-16 13:43:43 EDT
Created attachment 577796 [details]
[PATCH] rework a bit signal handling

Here's a bunch of little patches to improve signal handling. (Not tested).
Comment 47 Jirka Klimes 2012-04-17 03:06:14 EDT
(In reply to comment #45)
> (In reply to comment #44)
> > Created attachment 577696 [details]
> > [PATCH] improve signal handling
> > 
> > Even if I'm not able to reproduce the problem myself, I think there is a race
> > or a similar tricky issue while handling signals.
> > 
> > The attached patch should fix that.
> >
> 
> Seems a bit dangerous. As you wrote, it seems to require all threads to have
> the correct blocking signal set.
> 
What is not reliable was the handling signals with sigaction() in multithreaded
application and many restrictions what can be run in a signal handler, possible races, etc.
sigwait() solves these issues and blocking signals is OK and actually necessary for sigwait() function. It will atomically unblock the signals and wait until one is delivered itself.

See e.g. http://docs.oracle.com/cd/E19455-01/806-5257/gen-75415/index.html
or links in the commit.

hannes, could you try the build from comment #44?
Comment 48 hannes 2012-04-17 03:28:20 EDT
I don't have f17 yet and am still on f16. But the problem is that it appears so rare that I could hardly say for sure that the bug is gone after a decent period of time. I also experienced it when I was not connected to any wifi but I couldn't find a common thing in all those situations.
Comment 49 Rodrigo Araujo 2012-04-21 18:43:17 EDT
A recent update via yum bumped my NM version to 0.9.4-2.git20120403, overwriting one I had installed that had my patch from Comment 38, and I began to experience the issue once again when I shut down the system.

I tried the buid provided by Jirka Klimes in Comment 44 by rebuilding it on this system. No problems rebuilding it (Fedora 16), freshened the rpms and the bug seems to be gone.

Here's hope the next "oficial" update via yum already has this fix. Meanwhile I'll stay with NetworkManager{,gtk,glib,gnome}-0.9.4-4test.git20120403.fc16.x86_64 installed since those packages solve my problem (if I have any serious issue with them I'll report back here).
Comment 50 Jirka Klimes 2012-04-24 11:11:09 EDT
The fix is in upstream:
http://cgit.freedesktop.org/NetworkManager/NetworkManager/commit/?id=217c5bf6ac2284261e5c868d393d4b7d02ca5569

Fedora updates will follow shortly.
Comment 51 hannes 2012-04-24 16:58:25 EDT
Thanks a lot for your work on this issue! It really is appreciated!
Comment 52 luca.santoro80 2012-04-24 17:44:59 EDT
Dear Support,
  I continue to have this issue sometime when I'm connected to my Wyreless and I shutdown without closing VPN connections...
Comment 53 luca.santoro80 2012-04-29 14:14:12 EDT
I updated my Dedora 16 to latest patches and Netwok Manager package has release 1:0.9.4-3.git20120403.fc16 but sometimes the bug happend.
Comment 54 Sebastian Krämer 2012-05-01 06:54:57 EDT
I too use that package in F17 with kernel 3.3.4. There is a long delay during shutdown which I think is caused by this bug (hang, then finally the process gets killed). This happens always (not just randomly) and also when using logout functionality from xfce-session (not just from console as someone reported earlier).
Anything (logs?) needed that I can help with to debug this further?
Comment 55 Michal Schmidt 2012-05-01 13:07:15 EDT
(In reply to comment #52)
> Dear Support,

Just to clear the misconception: Bugzilla is not an avenue for technical assistance or support, but simply a bug tracking system. You do not have a support contract for Fedora. If you need a supported product, you can buy a subscription for Red Hat Enterprise Linux.

>   I continue to have this issue sometime when I'm connected to my Wyreless and
> I shutdown without closing VPN connections...
(In reply to comment #53)
> I updated my Dedora 16 to latest patches and Netwok Manager package has release
> 1:0.9.4-3.git20120403.fc16 but sometimes the bug happend.

That's probably because Jirka has not added the fix to a Fedora update yet. The fix is only in the upstream git repository so far.

(In reply to comment #54)
> I too use that package in F17 with kernel 3.3.4. There is a long delay during
> shutdown which I think is caused by this bug (hang, then finally the process
> gets killed). This happens always (not just randomly) and also when using
> logout functionality from xfce-session (not just from console as someone
> reported earlier).
> Anything (logs?) needed that I can help with to debug this further?

You may be seeing this bug, or perhaps bug 816842. The method from comment #6 is still effective to find the hanging service on shutdown.
Comment 56 luca.santoro80 2012-05-02 03:40:27 EDT
Dear Michal,
  I think we are helping you to find bugs but if you don't think the same I can stop to do this.

  I moved fron Open Suse to Fedora because, in our company, we are thinking to RHL instead of SLES and so I'm trying first the client but, if there are this kind of bugs, I think we will not move...

  Thanks and regards,

Luca
Comment 57 Michal Schmidt 2012-05-02 06:51:14 EDT
(In reply to comment #56)
> Dear Michal,
>   I think we are helping you to find bugs 

Yes. And I have never said anything to the contrary.

> but if you don't think the same I can stop to do this.

Look. We appreciate bugreports. I was merely pointing out an apparent misunderstanding. Fedora is a different product than Red Hat Enterprise Linux. RHEL comes with official support, Fedora does not.

And this bug is likely to get fixed soon enough.
Comment 58 Fedora Update System 2012-05-02 09:15:18 EDT
NetworkManager-0.9.4-5.git20120502.fc16 has been submitted as an update for Fedora 16.
https://admin.fedoraproject.org/updates/NetworkManager-0.9.4-5.git20120502.fc16
Comment 59 Fedora Update System 2012-05-02 09:20:14 EDT
NetworkManager-0.9.4.0-8.git20120502.fc17 has been submitted as an update for Fedora 17.
https://admin.fedoraproject.org/updates/NetworkManager-0.9.4.0-8.git20120502.fc17
Comment 60 Fedora Update System 2012-05-02 16:31:43 EDT
Package NetworkManager-0.9.4.0-8.git20120502.fc17:
* should fix your issue,
* was pushed to the Fedora 17 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing NetworkManager-0.9.4.0-8.git20120502.fc17'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2012-7066/NetworkManager-0.9.4.0-8.git20120502.fc17
then log in and leave karma (feedback).
Comment 61 Michal Schmidt 2012-05-02 17:46:44 EDT
Jirka,

In my F17 VM NetworkManager-0.9.4.0-8.git20120502.fc17 actually makes the problem worse. The service refuses to stop normally. It reproducibly stays in "deactivating" state until it's killed by SIGKILL from systemd:

NetworkManager.service - Network Manager
	  Loaded: loaded (/usr/lib/systemd/system/NetworkManager.service; enabled)
	  Active: deactivating (final-sigterm) since Wed, 02 May 2012 23:40:01 +0200; 5s ago
	 Process: 1564 ExecStart=/usr/sbin/NetworkManager --no-daemon (code=exited, status=0/SUCCESS)
	  CGroup: name=systemd:/system/NetworkManager.service
		  └ 1567 /sbin/dhclient -d -4 -sf /usr/libexec/nm-dhcp-client.action -pf /var/run/dhcli...


Apparently the dhclient process that was spawned by NM ignores SIGTERM. This may be a hint:

With NetworkManager-0.9.4.0-8.git20120502.fc17.x86_64:
$ grep SigBlk /proc/$(pidof dhclient)/status
SigBlk:	00000000000056ef

With NetworkManager-0.9.4.0-7.git20120403.fc17.x86_64:
$ grep SigBlk /proc/$(pidof dhclient)/status
SigBlk:	0000000000000000
Comment 62 Freddy Willemsen 2012-05-03 02:39:38 EDT
Confirmed, my F17 VM takes a long time to shut down after update to NetworkManager-0.9.4.0-8.git20120502.fc17.
Comment 63 Michal Schmidt 2012-05-03 04:24:38 EDT
fork() copies the signal mask to the child. execve() preserves the signal mask.

NM uses g_spawn_async() to start helper processes. One of the parameters is a pointer to a child_setup function. It's possible to reset the signal mask in those functions.
Comment 64 Jirka Klimes 2012-05-21 09:34:10 EDT
(In reply to comment #63)
> fork() copies the signal mask to the child. execve() preserves the signal
> mask.
> 
> NM uses g_spawn_async() to start helper processes. One of the parameters is
> a pointer to a child_setup function. It's possible to reset the signal mask
> in those functions.

Yes, I forgot about child processes and that they inherit signal mask.
http://www.gnu.org/software/libc/manual/html_node/Creating-a-Process.html
Now, the original signal mask is restored for child processes so that they can react to signals.
Thanks Michal for pointing to it.

Upstream fix: 78dda3b093b69f0c13773449dc732be5de40f620
Comment 65 Marc Muehlfeld 2012-05-21 09:36:22 EDT
Just for information: F17 works fine out of the box.
Comment 66 Fedora Update System 2012-05-21 13:11:07 EDT
NetworkManager-0.9.4.0-9.git20120521.fc17 has been submitted as an update for Fedora 17.
https://admin.fedoraproject.org/updates/NetworkManager-0.9.4.0-9.git20120521.fc17
Comment 67 Sergio Monteiro Basto 2012-05-21 13:16:59 EDT
(In reply to comment #66)
> NetworkManager-0.9.4.0-9.git20120521.fc17 has been submitted as an update
> for Fedora 17.
> https://admin.fedoraproject.org/updates/NetworkManager-0.9.4.0-9.git20120521.
> fc17

can we have the same package for F16 please ?
Comment 68 Fedora Update System 2012-05-21 13:17:27 EDT
NetworkManager-0.9.4-6.git20120521.fc16 has been submitted as an update for Fedora 16.
https://admin.fedoraproject.org/updates/NetworkManager-0.9.4-6.git20120521.fc16
Comment 69 Fedora Update System 2012-05-21 22:20:46 EDT
Package NetworkManager-0.9.4-6.git20120521.fc16:
* should fix your issue,
* was pushed to the Fedora 16 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing NetworkManager-0.9.4-6.git20120521.fc16'
as soon as you are able to, then reboot.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2012-8211/NetworkManager-0.9.4-6.git20120521.fc16
then log in and leave karma (feedback).
Comment 70 Fedora Update System 2012-05-27 21:23:15 EDT
NetworkManager-0.9.4-6.git20120521.fc16 has been pushed to the Fedora 16 stable repository.  If problems still persist, please make note of it in this bug report.
Comment 71 Fedora Update System 2012-05-29 06:28:20 EDT
NetworkManager-0.9.4.0-9.git20120521.fc17 has been pushed to the Fedora 17 stable repository.  If problems still persist, please make note of it in this bug report.
Comment 72 rbz 2012-08-27 16:32:47 EDT
I have what appears to be the same problem (same symptom - system takes a long time to shutdown).
I _am_ running the supposedly fixed version of NetworkManager:
NetworkManager-0.9.4-6.git20120521.fc16.x86_64
The OS is F16:
Linux rubicon.tson.de 3.4.9-1.fc16.x86_64 #1 SMP Wed Aug 15 20:45:23 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
When shutting down, the system logs (repeatedly):
> NetworkManager[1272]: <error> [1346097791.395177] [nm-dbus-manager.c:278] nm_dbus_manager_init_bus(): Could not get the system bus.  Make sure the message bus daemon is running!  Message: Failed to connect to socket /var/run/dbus/system_bus_socket: Connection refused

I followed the advice in comment 6 to create a debug log... But I don't see how to add it here - maybe because this bug is closed?
Comment 73 Sandro Bonazzola 2012-09-17 09:54:08 EDT
(In reply to comment #72)
> I have what appears to be the same problem (same symptom - system takes a
> long time to shutdown).
> I _am_ running the supposedly fixed version of NetworkManager:

Same here with NetworkManager-0.9.4.0-9.git20120521.fc17.i686
From the log:

[   98.909121] systemd[1]: Job NetworkManager.service/stop finished,
result=done
[   98.909889] systemd[1]: Accepted connection on private bus.
[   98.910231] systemd[1]: Got D-Bus request:
org.freedesktop.systemd1.Agent.Released() on /org/freedesktop/systemd1/agent
[   98.910298] systemd[1]: NetworkManager.service: cgroup is empty
[   98.910356] systemd[1]: Got D-Bus request:
org.freedesktop.DBus.Local.Disconnected() on /org/freedesktop/DBus/Local
[  188.435840] systemd[1]: dbus.service stopping timed out (2). Killing.
[  188.436000] systemd[1]: dbus.service changed final-sigterm -> final-sigkill
[  188.436016] systemd[1]: Running GC...
[  188.436349] systemd[1]: Received SIGCHLD from PID 685 (modem-manager).
[  188.436391] systemd[1]: Got SIGCHLD for process 685 (modem-manager)
[  188.436467] systemd[1]: Child 685 died (code=killed, status=9/KILL)

Maybe the same patch or something similar should be applied to modem-manager also?

Note You need to log in before you can comment on or make changes to this bug.