Bug 1054364 - nm-online returns success with no network connectivity
Summary: nm-online returns success with no network connectivity
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: NetworkManager
Version: 20
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: low
Target Milestone: ---
Assignee: Jirka Klimes
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2014-01-16 17:11 UTC by Tim Taylor
Modified: 2016-01-04 06:03 UTC

Fixed In Version: NetworkManager-0.9.9.0-38.git20131003.fc20
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-04-22 03:59:30 UTC
Type: Bug



Description Tim Taylor 2014-01-16 17:11:18 UTC
Description of problem:
nm-online is always returning success (exit code 0) regardless of whether a network connection exists or not.


Version-Release number of selected component (if applicable):
NetworkManager-0.9.9.0-24.git20131003.fc20.x86_64


How reproducible:
always.

Steps to Reproduce:
1. Disconnect wired cable and disable wireless networks.
2. Execute nm-online

Actual results:
# nmcli general status
STATE         CONNECTIVITY  WIFI-HW  WIFI      WWAN-HW  WWAN     
disconnected  none          enabled  disabled  enabled  disabled

2. Use nm-online to determine if connected:
# nm-online; echo "Exit code: $?"
Exit code: 0


Expected results:
Should get "Exit code: 1"

Additional info:

Comment 1 Freddy Willemsen 2014-01-17 18:36:39 UTC
Confirmed. Got a VirtualBox VM and when I disconnect network, nm-online happily responds with 0 exit status, no matter what.

$ nmcli general status
STATE   CONNECTIVITY  WIFI-HW  WIFI     WWAN-HW  WWAN     
asleep  none          enabled  enabled  enabled  disabled 

$ nm-online; echo "Exit code: $?"
Exit code: 0
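
The one-liner used in both reports can be wrapped in a small helper so that the probe is easy to swap out when testing. This is only a sketch: `check_online` and the `NM_ONLINE_CMD` variable are hypothetical names introduced here, not part of NetworkManager. By default the helper calls the real nm-online.

```shell
#!/bin/sh
# check_online [ARGS...]
# Runs the probe, prints its exit code in the same form as the reports
# above, and returns that code to the caller.
# NM_ONLINE_CMD is an assumption made for this sketch; it defaults to
# the real nm-online binary.
check_online() {
    status=0
    ${NM_ONLINE_CMD:-nm-online} "$@" >/dev/null 2>&1 || status=$?
    echo "Exit code: $status"
    return "$status"
}
```

On a disconnected machine with a fixed NetworkManager, `check_online -q -t 5` is expected to report a non-zero exit code.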

Comment 2 Jirka Klimes 2014-03-12 12:53:38 UTC
Oh, it seems to be broken upstream since August last year and we didn't notice :(

I've pushed a fix to an upstream branch for review:
jk/rh1054364-nm-online

Comment 3 Dan Williams 2014-03-12 13:28:13 UTC
I know we changed the semantics of the tool last year at some point, so I'd like to get danw's review on this when he gets back from PTO.

Comment 4 Thomas Haller 2014-03-12 13:40:04 UTC
The order of the following statements seems wrong:

state = nm_client_get_state (client);
if (!nm_client_get_manager_running (client)) {



Also, I don't understand the point of
	if (exit_no_nm && (state != NM_STATE_CONNECTING)) {
		g_object_unref (client);
		return 1;
	}
Could you please add a comment why it is this way?

Comment 5 Jirka Klimes 2014-03-13 08:31:28 UTC
(In reply to Thomas Haller from comment #4)
> The order of the following statements seems wrong:
> 
> state = nm_client_get_state (client);
> if (!nm_client_get_manager_running (client)) {
> 
That's fine because you will get NM_STATE_UNKNOWN when NM doesn't run.

> 
> 
> Also, I don't understand the point of
> 	if (exit_no_nm && (state != NM_STATE_CONNECTING)) {
> 		g_object_unref (client);
> 		return 1;
> 	}
> Could you please add a comment why it is this way?
It is there to implement the '--exit' option correctly: nm-online should quit immediately when NM is not running or is not connecting.
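
A rough shell model of that '--exit' check, for readers who do not have the C source at hand. The function name `exit_early`, its arguments, and the lowercase state strings (borrowed from the nmcli STATE column earlier in this bug) are assumptions made for illustration. The real check is the C condition quoted in this comment.

```shell
# exit_early STATE EXIT_FLAG
# Succeeds (returns 0) when nm-online should give up at once, mirroring
#   if (exit_no_nm && (state != NM_STATE_CONNECTING)) return 1;
# STATE is a hypothetical lowercase state name ("unknown" stands in for
# NM_STATE_UNKNOWN, which is what you get when NM is not running).
# EXIT_FLAG is 1 when '--exit' was given and 0 otherwise.
exit_early() {
    state=$1
    exit_flag=$2
    [ "$exit_flag" = 1 ] && [ "$state" != connecting ]
}
```

With this model, `exit_early unknown 1` and `exit_early disconnected 1` succeed, while `exit_early connecting 1` does not, so only a connection attempt in progress keeps the tool waiting.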

Comment 6 Dan Winship 2014-03-18 15:53:52 UTC
(In reply to Dan Williams from comment #3)
> I know we changed the semantics of the tool last year at some point, so I'd
> like to get danw's review on this when he gets back from PTO.

Hm... I don't remember nm-online having a man page. I thought it basically only existed for the purpose of NetworkManager-wait-online, so there was no problem with changing its semantics.

If we want to keep the old behavior as well, then we need to add a flag for NetworkManager-wait-online to use to get the behavior it wants (which is to check *only* the startup property, and not care at all whether we are actually online or not).

Comment 7 Dan Williams 2014-03-20 22:35:34 UTC
nm-online's purpose was pretty much just for blocking boot until you're connected (or until a time/startup), which is now handled by NM-wait-online.  I know we had different behavior in the past, and when you landed the 'startup' stuff to support NM-wait-online we did change behavior somewhat.

People are apparently using it for other stuff than that though, so I guess we should add another argument like --wait-for-startup or something that NM-wait-online would use, and then support the old behavior too.  What do you think?

Comment 8 Thomas Haller 2014-03-21 11:20:40 UTC
Repushed a fixup! for refactoring the timeouts.

It is surprisingly difficult (to me) to synchronize the timeouts so that both the progress bar and the ticking of the seconds looks smooth, while minimizing wakeups. :)

Comment 9 Dan Winship 2014-03-21 12:34:39 UTC
(In reply to Dan Williams from comment #7)
> People are apparently using it for other stuff than that though, so I guess
> we should add another argument like --wait-for-startup or something that
> NM-wait-online would use, and then support the old behavior too.  What do
> you think?

Yup, that's what I was thinking

Comment 10 Jirka Klimes 2014-03-25 09:56:27 UTC
* I have introduced '--wait-for-startup' option to wait for startup instead of a connection.
* Squashed Thomas' timeout fixes
* rebased to master
Please re-review jk/rh1054364-nm-online.

Note:
Do we really want to wait just for startup in NM-wait-online?
There is a bug 1073419 about NetworkManager-wait-online.service declaring "online" just after initialization of loopback. Even though in this case the problem was probably caused by NM declaring "startup complete" right after the loopback. I'm not sure whether OpenSLP daemon needs a network connection to setup multicasting.

Comment 11 Dan Winship 2014-03-25 13:38:16 UTC
(In reply to Jirka Klimes from comment #10)
> Do we really want to wait just for startup in NM-wait-online?
> There is a bug 1073419 about NetworkManager-wait-online.service declaring
> "online" just after initialization of loopback. Even though in this case the
> problem was probably caused by NM declaring "startup complete" right after
> the loopback. I'm not sure whether OpenSLP daemon needs a network connection
> to setup multicasting.

"startup" is supposed to become false at the point when every network connection that *could* be activated at startup *has been* activated. ie, at the point where it would be useless to wait any longer, because we know for sure that nothing is going to change after that. It's hard to say for sure exactly what's happening in bug 1073419 because it doesn't include the complete journal, but it looks like maybe the virtio ethernet device isn't showing up in the initial device scan?

Comment 12 Dan Winship 2014-03-25 14:05:22 UTC
> nm-online: fix nm-online to report online status correctly (rh #1054364)

This makes it so that plain "nm-online" waits for *both* startup to be complete, *and* for at least one network connection to be available. So if that's intentional, you should update the man page too. (ie, previously, nm-online during startup would return as soon as any network interface was activated. Now it will wait for all activatable ONBOOT connections to be activated).

Alternatively, you could just switch it back to the old behavior.

> nm-online: introduce '--wait-for-startup' option waiting for NM startup

I'd swap the order of this patch and the previous one, rather than breaking NM-wait-online in the first patch and then unbreaking it in the second.

Also, given that nm-online without --wait-for-startup also waits for startup to complete now, the description of that flag isn't quite complete.

Also, you have to update the ExecStart in data/NetworkManager-wait-online.service.in to use the new flag.

> nm-online: refactor calculation of wait timeout and allow sub second precision

This seems unnecessary...

Comment 13 Jirka Klimes 2014-03-26 17:28:52 UTC
(In reply to Dan Winship from comment #12)
> > nm-online: fix nm-online to report online status correctly (rh #1054364)
> 
> This makes it so that plain "nm-online" waits for *both* startup to be
> complete, *and* for at least one network connection to be available. So if
> that's intentional, you should update the man page too. (ie, previously,
> nm-online during startup would return as soon as any network interface was
> activated. Now it will wait for all activatable ONBOOT connections to be
> activated).
> 
> Alternatively, you could just switch it back to the old behavior.
> 
Thinking more about it we should not change the behaviour again. So nm-online without arguments will wait for a connection. With --wait-for-startup it will wait for startup.

> > nm-online: introduce '--wait-for-startup' option waiting for NM startup
> 
> I'd swap the order of this patch and the previous one, rather than breaking
> NM-wait-online in the first patch and then unbreaking it in the second.
> 
I squashed the commits.

> Also, given that nm-online without --wait-for-startup also waits for startup
> to complete now, the description of that flag isn't quite complete.
> 
> Also, you have to update the ExecStart in
> data/NetworkManager-wait-online.service.in to use the new flag.
> 
Done.

> > nm-online: refactor calculation of wait timeout and allow sub second precision
> 
> This seems unnecessary...
Yep, that's not necessary. And we may drop the commit for this bug.
I'll leave it to Thomas to defend it; he seems to like that stuff.
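
The semantics settled on in this comment (plain nm-online waits for a connection, '--wait-for-startup' waits for startup) can be modelled roughly as below. `wait_satisfied`, the mode names and the yes/no flags are hypothetical and exist only for this sketch. The real tool reads these values from NetworkManager rather than taking them as arguments.

```shell
# wait_satisfied MODE STARTUP_COMPLETE CONNECTED
# MODE is "online" (nm-online without arguments) or "startup"
# (nm-online --wait-for-startup).
# STARTUP_COMPLETE and CONNECTED are hypothetical yes/no inputs.
# Returns 0 when the wait condition for MODE is met, 1 when it is not,
# and 2 for an unknown MODE.
wait_satisfied() {
    mode=$1
    startup_complete=$2
    connected=$3
    case $mode in
        online)  [ "$connected" = yes ] ;;
        startup) [ "$startup_complete" = yes ] ;;
        *)       echo "unknown mode: $mode" >&2; return 2 ;;
    esac
}
```

The point of keeping the two modes separate is that each one looks at exactly one condition: `wait_satisfied online yes no` fails even though startup has finished, and `wait_satisfied startup yes no` succeeds even though nothing is connected.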

Comment 14 Dan Winship 2014-03-26 19:08:51 UTC
looks good now

Comment 15 Dan Williams 2014-03-28 22:23:50 UTC
+1 here too, I'm happy with everything up to the timeout refactor.  I haven't reviewed the timeout refactor patch yet, waiting to see what thomas' reply to danw is on that one.  But the rest looks good.

Comment 16 Jirka Klimes 2014-03-31 06:52:13 UTC
Commits pushed to upstream master:
0d1bdff Merge fixes for nm-online (rh #rh1054364)
0a85bff nm-online: fix considering the --quiet option
ea962ce trivial: correct nm-online's '--exit' option description
520d281 systemd: update NetworkManager-wait-online.service to wait for startup
20fb078 nm-online: fix nm-online to report online status correctly (rh #1054364)

Timeout refactoring moved to branch th/nm-online-timeout.

Comment 17 Albert Pool 2014-03-31 19:18:08 UTC
Hello,

I am also experiencing this bug on a system where the nfs mounts are already done before the network connection is up (IP address assignment can take over 20 seconds here in the network - that needs to be fixed too). I have traced it down to nm-online returning too early or just immediately.

Is there any chance of the fixes appearing in the Fedora repository updates soon?

Comment 18 Fedora Update System 2014-04-08 20:12:18 UTC
NetworkManager-0.9.9.0-34.git20131003.fc20 has been submitted as an update for Fedora 20.
https://admin.fedoraproject.org/updates/NetworkManager-0.9.9.0-34.git20131003.fc20

Comment 19 Fedora Update System 2014-04-09 13:23:52 UTC
Package NetworkManager-0.9.9.0-34.git20131003.fc20:
* should fix your issue,
* was pushed to the Fedora 20 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing NetworkManager-0.9.9.0-34.git20131003.fc20'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2014-4964/NetworkManager-0.9.9.0-34.git20131003.fc20
then log in and leave karma (feedback).

Comment 20 Fedora Update System 2014-04-10 09:28:26 UTC
NetworkManager-0.9.9.0-35.git20131003.fc20 has been submitted as an update for Fedora 20.
https://admin.fedoraproject.org/updates/NetworkManager-0.9.9.0-35.git20131003.fc20

Comment 21 Fedora Update System 2014-04-14 14:48:58 UTC
NetworkManager-0.9.9.0-36.git20131003.fc20 has been submitted as an update for Fedora 20.
https://admin.fedoraproject.org/updates/NetworkManager-0.9.9.0-36.git20131003.fc20

Comment 22 Fedora Update System 2014-04-15 11:57:04 UTC
NetworkManager-0.9.9.0-37.git20131003.fc20 has been submitted as an update for Fedora 20.
https://admin.fedoraproject.org/updates/NetworkManager-0.9.9.0-37.git20131003.fc20

Comment 23 Albert Pool 2014-04-15 15:22:54 UTC
Have not been able to reproduce any problems yet so I think nm-online is fixed.
Using version NetworkManager-0.9.9.0-35.git20131003.fc20

I however had to undo the patch that changed NetworkManager-wait-online to include -s command-line argument. I need the system to wait for an actual network connection before network dependent services like nfs mount are done, instead of just waiting until networkmanager has started.

I do not think it makes sense to check with -s here, since any network dependent services will just fail to start this way instead of waiting for connection, because networkmanager is there but the connection not yet. The system is not online yet. And that I think is the purpose of "wait-online": waiting until the system is online.
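
For a script that needs the behaviour described here (wait for an actual connection, then start a network-dependent step), a minimal sketch could gate the step on nm-online's exit code. `run_when_online`, `NM_ONLINE_CMD` and `NM_ONLINE_TIMEOUT` are hypothetical names introduced for this example. `-q` (quiet) and `-t` (timeout in seconds) are nm-online options. The sketch relies on nm-online returning non-zero when no connection comes up, which is the behaviour this bug is about.

```shell
#!/bin/sh
# run_when_online COMMAND [ARGS...]
# Runs COMMAND only if the probe reports a connection within the timeout.
# NM_ONLINE_CMD and NM_ONLINE_TIMEOUT are assumptions made for this
# sketch; the probe defaults to the real nm-online binary.
run_when_online() {
    if ${NM_ONLINE_CMD:-nm-online} -q -t "${NM_ONLINE_TIMEOUT:-30}"; then
        "$@"
    else
        echo "network not up; skipping: $*" >&2
        return 1
    fi
}
```

Something like `run_when_online mount -a -t nfs` would then attempt the NFS mounts only once nm-online reports a connection.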

Comment 24 Thomas Haller 2014-04-15 16:04:09 UTC
(In reply to Albert Pool from comment #23)
> Have not been able to reproduce any problems yet so I think nm-online is fixed.
> Using version NetworkManager-0.9.9.0-35.git20131003.fc20
> 
> I however had to undo the patch that changed NetworkManager-wait-online to
> include -s command-line argument. I need the system to wait for an actual
> network connection before network dependent services like nfs mount are
> done, instead of just waiting until networkmanager has started.
> 
> I do not think it makes sense to check with -s here, since any network
> dependent services will just fail to start this way instead of waiting for
> connection, because networkmanager is there but the connection not yet. The
> system is not online yet. And that I think is the purpose of "wait-online":
> waiting until the system is online.

--wait-for-startup should block until NM has connected *all* its devices, or until all devices are in a state, where no further state progression is to be expected.

If -s does not do what you want, please open a new bug because it is a different, related issue.


Please also provide logfiles with debug-logging enabled.


You do this by adding

[logging]
  level=DEBUG
  domains=ALL

to /etc/NetworkManager/NetworkManager.conf.
Then something like: `journalctl -u NetworkManager -b 0 | gzip > nm-log.txt.gz`



Thank you!!

Comment 25 Albert Pool 2014-04-15 17:19:59 UTC
Ok, then I misunderstood the manpage. If -s worked as you just described, I wouldn't object. The systems I am having problems with, have only one network interface.

It seems however that -s did not do what it should, since looking at the journalctl log of a failed boot attempt told me that nm-online had still returned immediately, before the connection was made. Reading your last comment this does not sound like expected behaviour. I.e.: the original problem for which this bug was opened, came back for me when -s was added.

For the moment, I have worked around this by removing -s. I have not had any problems with NFS mount anymore after booting some 5-10 times without -s, while before it failed to mount about 1/3 of the boot attempts because the network was not up yet.

But to me it looks like the original bug has not yet been fixed, if there is still an nm-online command-line option that can trigger the buggy behaviour.

I'll try to get a log of a failed boot attempt as you described. But I do not have much time for it at the moment; I will probably have to postpone this until next week.

Comment 26 Fedora Update System 2014-04-17 16:41:39 UTC
NetworkManager-0.9.9.0-38.git20131003.fc20 has been submitted as an update for Fedora 20.
https://admin.fedoraproject.org/updates/NetworkManager-0.9.9.0-38.git20131003.fc20

Comment 27 Fedora Update System 2014-04-22 03:59:30 UTC
NetworkManager-0.9.9.0-38.git20131003.fc20 has been pushed to the Fedora 20 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 28 Albert Pool 2014-04-22 21:05:31 UTC
On the machine for which I initially posted here (March 31), the problem appears to be fixed.
The system on that machine is freshly reinstalled and upgraded.
On April 15 I used an identical machine but there I have probably messed too much.

Since a clean Fedora 20 works right with latest updates installed I don't mind setting the bug as fixed.

