Bug 1446367 - New IPv6 DAD support lets activation without carrier hang indefinitely
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: NetworkManager
Version: 7.4
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Lubomir Rintel
QA Contact: Desktop QE
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2017-04-27 18:33 UTC by Thomas Haller
Modified: 2017-08-01 09:27 UTC (History)

Fixed In Version: NetworkManager-1.8.0-1.el7
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-08-01 09:27:08 UTC
Target Upstream Version:


Attachments
Proposed patch (1.62 KB, text/plain)
2017-04-30 15:53 UTC, Lubomir Rintel


Links
System: Red Hat Product Errata
ID: RHSA-2017:2299
Priority: normal
Status: SHIPPED_LIVE
Summary: Moderate: NetworkManager and libnl3 security, bug fix and enhancement update
Last Updated: 2017-08-01 12:40:28 UTC

Description Thomas Haller 2017-04-27 18:33:54 UTC
DAD can only complete if the device has carrier. Hence, with the new DAD support, NM waits until the device has carrier before even trying to do DAD.

As a result, activating a connection with static IPv6 addresses on a device without carrier hangs, and nmcli fails after timeout.

It's not clear what to do.

 - at the very least, if the device is set to have ipv6.may-fail=yes and the 
   device has some static IPv4 addresses, the fully activated state should be 
   reached together with IPv4. Currently, the device hangs in IP config state.
   Note that ipv6.may-fail only helps if IPv4 completes.

 - the activation hangs indefinitely waiting for carrier. That may or may
   not be correct, but at least it's different from what happens with
   waiting for other address methods to complete. Why would waiting for
   carrier block indefinitely, while waiting for a DHCP response times out?

 - for SLAAC/DHCP mode, the activation request is rejected right away if
   the device has no carrier (regardless of ignore-carrier setting).
   Maybe, if there are static IPv6 addresses, NM should do the same. Which 
   basically would mean, you cannot activate a connection without carrier 
   anymore. That seems bad.
   

Other ideas welcome. Anyway, the new behavior might seem reasonable in parts, but it seems to have grave consequences during upgrade to 7.4.


This needs more evaluation.

Comment 2 Lubomir Rintel 2017-04-30 15:53:57 UTC
Created attachment 1275293
Proposed patch

Well, I believe what we do is the correct thing to do; we ought not to disable DAD by default in any case (and we currently have no way to disable it), and until DAD finishes the addresses are not useful (programs can't even bind to them). Thus pretending the connection is "activated" is certainly not the correct thing to do.

Nevertheless the hanging nmcli and thus also poor interaction with network.service's ifup is something that needs fixing.

I'd prefer to work around this the same way as we treat master connections that have no slaves: don't bother waiting for states beyond IP_CONFIG. Sadly, there currently doesn't seem to be any way for the client to discover that the pending IPv6 DAD is the only thing blocking activation.

Thus I've decided to revert to the old behaviour and activate the connection even when the carrier is not present. Hopefully whoever activates a connection without carrier knows what they are doing.

At the very least this behavior is consistent with what we've been doing previously.

(In reply to Thomas Haller from comment #0)
> DAD can only complete if the device has carrier. Hence, with the new DAD
> support, NM waits until the device has carrier before even trying to do DAD.
> 
> As a result, activating a connection with static IPv6 addresses on a device
> without carrier hangs, and nmcli fails after timeout.
> 
> It's not clear what to do.
> 
>  - at the very least, if the device is set to have ipv6.may-fail=yes and the
>    device has some static IPv4 addresses, the fully activated state should be
>    reached together with IPv4. Currently, the device hangs in IP config state.
>    Note that ipv6.may-fail only helps if IPv4 completes.
> 
>  - the activation hangs indefinitely waiting for carrier. That may or may
>    not be correct, but at least it's different from what happens with
>    waiting for other address methods to complete. Why would waiting for
>    carrier block indefinitely, while waiting for a DHCP response times out?

Starting DHCP without carrier is silly and probably just done by accident.

Nevertheless, I believe considering the auto methods (be it DHCP or SLAAC) is not too useful. Where this really matters is manual configuration, which succeeds immediately with IPv4 but awaits DAD with IPv6.

>  - for SLAAC/DHCP mode, the activation request is rejected right away if
>    the device has no carrier (regardless of ignore-carrier setting).
>    Maybe, if there are static IPv6 addresses, NM should do the same. Which 
>    basically would mean, you cannot activate a connection without carrier 
>    anymore. That seems bad.

Well, I can see why this would upset the users, especially those with existing configurations that don't care about the addresses being tentative until the carrier appears.

Comment 3 Thomas Haller 2017-05-10 10:12:07 UTC
_LOGI (LOGD_DEVICE | LOGD_IP6, "IPv6 DAD: carrier missing and ignored

Seems a bit too much noise for info level. _LOGD()?



Later, when carrier comes up, will DAD start?

Comment 5 Vladimir Benes 2017-05-22 08:30:50 UTC
When I have 'ipv6.may-fail no', the connection up still hangs. Is it something we want to fix as well? Should I file a separate bug? ipv4.may-fail no doesn't matter.

Comment 6 Vladimir Benes 2017-05-22 08:37:12 UTC
And I can see no difference in NetworkManager-1.8.0-0.4.rc3.el7, where this shouldn't be fixed yet.

with this test:
    @nmcli_general_finish_dad_without_carrier
    Scenario: nmcli - general - finish dad with no carrier
    * Add a new connection of type "ethernet" and options "ifname testX con-name ethernet0 autoconnect no"
    * Prepare simulated veth device "testX" wihout carrier
    * Execute "nmcli con modify ethernet0 ipv4.may-fail no ipv4.method manual ipv4.addresses 1.2.3.4/24"
    * Execute "nmcli con modify ethernet0 ipv6.method manual ipv6.addresses 2001::2/128"
    * Bring "up" connection "ethernet0"
    * "connected:ethernet0" is visible with command "nmcli -t -f STATE,CONNECTION device" in "60" seconds
    Then "1.2.3.4" is visible with command "ip a s testX" in "60" seconds
    Then "2001::2" is visible with command "ip a s testX" in "60" seconds
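
For readers unfamiliar with the '"X" is visible with command "Y" in "N" seconds' steps, the following is a minimal Python sketch of the polling such a step presumably performs. The step's real implementation is not shown in this report, so the function name visible_within and its parameters are hypothetical, and the stand-in command only exists so the sketch runs on a machine without NetworkManager.

```python
import subprocess
import sys
import time


def visible_within(command, needle, timeout_s=60.0, interval_s=1.0):
    """Re-run `command` until `needle` shows up in its stdout or time runs out.

    Hypothetical helper: it approximates what a step such as
    '"X" is visible with command "Y" in "60" seconds' would have to do.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        result = subprocess.run(command, capture_output=True, text=True)
        if needle in result.stdout:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


if __name__ == "__main__":
    # Stand-in for the real check from the scenario, which would be:
    #   ["nmcli", "-t", "-f", "STATE,CONNECTION", "device"]
    fake_nmcli = [sys.executable, "-c", "print('connected:ethernet0')"]
    print(visible_within(fake_nmcli, "connected:ethernet0", timeout_s=5))
```

With the bug present, the equivalent check against the real nmcli command would be expected to run into its timeout, because the device never leaves the IP config state.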

Comment 7 Lubomir Rintel 2017-05-26 13:28:55 UTC
(In reply to Vladimir Benes from comment #5)
> When I have 'ipv6.may-fail no' the connection up still hangs. Is it
> something we want to fix as well? Should I file separate bug? ipv4.may-fail
> no doesn't matter.

No, that is the correct behavior. The connection would proceed with activation once the carrier appears and DAD can start.

The scenario looks like this:

* NetworkManager-config-server is installed (thus the carrier is ignored)
* The carrier is off
* The connection has manually configured ipv6 addressing
* The connection has ipv6.may-fail yes

With the older NetworkManager, the connection should hang while activating, while with the fixed one it should reach the active state.

That pretty much looks like your test case. Does the test case succeed? Does DAD start? (You would see the "tentative" addresses being added on "testX" and then the "tentative" flag disappearing in "ip monitor".) It should not. Maybe the veth devices are different in this respect?
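
One way to check for the "tentative" flag without watching "ip monitor" by hand is to scan the output of "ip -6 addr show dev testX". The Python sketch below assumes the usual iproute2 layout, where address flags follow the scope on each "inet6" line. The helper name tentative_addresses is hypothetical, and SAMPLE is an illustrative snippet written for this example, not output captured from this bug.

```python
from typing import List

# Illustrative output in the style of "ip -6 addr show dev testX".
# Assumption: this was written by hand for the example, not taken from the bug.
SAMPLE = """\
5: testX: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 state DOWN qlen 1000
    inet6 2001::2/128 scope global tentative
       valid_lft forever preferred_lft forever
"""


def tentative_addresses(ip_addr_output: str) -> List[str]:
    """Return the IPv6 addresses that still carry the 'tentative' flag."""
    found = []
    for line in ip_addr_output.splitlines():
        fields = line.split()
        # Address lines look like: inet6 <addr>/<plen> scope <scope> [flags...]
        if len(fields) >= 2 and fields[0] == "inet6" and "tentative" in fields[2:]:
            found.append(fields[1])
    return found


if __name__ == "__main__":
    print(tentative_addresses(SAMPLE))
```

If DAD really does not start while the carrier is off, the address would be expected to stay in this list for as long as the carrier is missing.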

Comment 8 errata-xmlrpc 2017-08-01 09:27:08 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:2299

