1243958 – network-online.target met before IPv6 is up

RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets here. Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED". If you cannot log in to RH Jira, please consult article #7032570. That failing, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "RHEL project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that that bug has been migrated.

Bug 1243958 - network-online.target met before IPv6 is up

Summary: network-online.target met before IPv6 is up

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 7
Classification:	Red Hat
Component:	NetworkManager
Sub Component:
Version:	7.1
Hardware:	x86_64
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	rc
Target Release:	---
Assignee:	Beniamino Galvani
QA Contact:	Desktop QE
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	1301628 1313485
TreeView+	depends on / blocked

Reported:	2015-07-16 17:03 UTC by Shabba
Modified:	2023-03-24 13:34 UTC (History)
CC List:	12 users (show)
Fixed In Version:	NetworkManager-1.4.0-0.1.git20160606.b769b4df.el7
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2016-11-03 19:14:32 UTC
Target Upstream Version:
Embargoed:

Attachments	(Terms of Use)

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Issue Tracker	NMT-439	0	None	None	None	2023-03-24 13:34:39 UTC
Red Hat Product Errata	RHSA-2016:2581	0	normal	SHIPPED_LIVE	Low: NetworkManager security, bug fix, and enhancement update	2016-11-03 12:08:07 UTC

Description Shabba 2015-07-16 17:03:07 UTC

Description of problem:

I have a system where a daemon(nginx) is a multi-user target in systemd. nginx opens ports for listening on explicit IPv4/IPv6 addresses found in its config file. In its service file, I originally had:

After=nss-lookup.target remote-fs.target

This was resulting in a failure to start the service by systemd, with the following logged:

2015/07/16 12:01:41 [emerg] 682#0: bind() to 192.0.2.1:80 failed (99: Cannot assign requested address)

I determined this was occurring because systemd was starting nginx before networking was up. I altered my service file to have:

After=network-online.target remote-fs.target nss-lookup.target
Wants=network-online.target

This changed the error to:

2015/07/16 12:44:25 [emerg] 1325#0: bind() to [2001:db8::1]:80 failed (99: Cannot assign requested address)

If I start the daemon manually after networking is up, it works fine. This indicates to me that the target network-online, is being reached prior to IPv6 being up, which is not correct behavior since network-online means more than just IPv4 being up.

Version-Release number of selected component (if applicable):

systemd-208-20.el7_1.5.x86_64

How reproducible:

100%

Steps to Reproduce:
1. Install a dual stack system with a static IPv4 and IPv6 address
2. Install nginx
3. Configure nginx to listen for http requests on IPv4:80 and IPv6:80
4. Configure systemd to start nginx with After=network-online.target
5. Reboot; it will fail to start because IPv6 is not up yet.

Actual results:

Jul 16 12:47:31 build1 systemd: nginx.service: control process exited, code=exited status=1

Expected results:

nginx starts

Additional info:

Comment 2 Lukáš Nykrýn 2015-07-17 07:02:11 UTC

We just provide the synchronization point, it's up to services that manages network to order some of their wait-online before that and make sure that the network is really up.
I assume that you are using NM, because the network-scripts are totally synchronous and their are ordered even before network.target.

Comment 3 Thomas Haller 2015-07-17 09:24:23 UTC

Could you attach a debug-logfile or NetworkManager?


Enable debug-logging by configuring

[logging]
level=DEBUG

in /etc/NetworkManager/NetworkManager.conf


Afterwards, reboot the machine, ensure the issue got successfully reproduced and show the output of `journalctl -u NetworkManager -b0`



Thank you!!

Comment 4 Alex Vorona 2015-09-16 05:45:13 UTC

Issue seems to be not related to NetworkManager. It's due to IPv6 DAD, you can find similar bugs like really old but still not fixed #782042.

Comment 5 Dan Williams 2016-03-15 17:04:49 UTC

Note that if you're not setting ipv6.may-fail=no already, you'll want to do that.  That ensures that NM will block startup for IPv6 to complete on that specific connection.

nmcli connection modify "<name of connection or UUID>" ipv6.may-fail no

Comment 6 Beniamino Galvani 2016-03-16 16:07:00 UTC

(In reply to Dan Williams from comment #5)
> Note that if you're not setting ipv6.may-fail=no already, you'll want to do
> that.  That ensures that NM will block startup for IPv6 to complete on that
> specific connection.
> 
> nmcli connection modify "<name of connection or UUID>" ipv6.may-fail no

This is required, but probably not sufficient, as we signal the
activation before IPv6 DAD terminates when addresses are still
tentative and so a bind() to a static address will fail:

 $ nmcli c add type ethernet ifname veth0 con-name veth0+ autoconnect no ip4 1.2.3.4/32 ip6 fd01::42/64
 $ nmcli c mod veth0+ ipv6.may-fail no

 $ nmcli connection up veth0+ && socat -d -d TCP6-LISTEN:1234,bind=[fd01::42] UNIX-CLIENT:/tmp/foo.sock
  Connection successfully activated (D-Bus active path: /org/freedesktop/NetworkManager/ActiveConnection/10)
  socat[18988] E bind(3, {AF=10 [fd01:0000:0000:0000:0000:0000:0000:0042]:1234}, 28): Cannot assign requested address
 $ ip a show veth0
 [...] inet6 fd01::42/64 scope global tentative [...]

 $ ip a show veth0
 [...] inet6 fd01::42/64 scope global [...]
 $ socat -d -d TCP6-LISTEN:1234,bind=[fd01::42] UNIX-CLIENT:/tmp/foo.sock
  socat[19443] N listening on AF=10 [fd01:0000:0000:0000:0000:0000:0000:0042]:1234

This is not consistent with IPv4 since we guarantee that if
ipv4.may-fail=no the IPv4 addresses will be usable immediately after
activation (of course, since duplicate detection is done in userspace
before assigning addresses to interface).

How about branch bg/ipv6-dad-rh1243958 ? AFAIU systemd-networkd does
something similar [1].

Maybe we should also add support for configuring the DAD timeout for
IPv6 (IPv4 already has it).

[1] https://github.com/systemd/systemd/issues/650

Comment 7 Mike McCune 2016-03-28 22:19:01 UTC

This bug was accidentally moved from POST to MODIFIED via an error in automation, please see mmccune with any questions

Comment 8 Thomas Haller 2016-04-19 14:37:14 UTC

(In reply to Beniamino Galvani from comment #6)

> How about branch bg/ipv6-dad-rh1243958 ? AFAIU systemd-networkd does
> something similar [1].

dad6_timeout() always fails the connection on timeout. Don't we have a check-fail, which only fails when may-fail=no?


Also, can you clear dad6_timer_id before the state-change in dad6_timeout()?


priv->dad6_ip6_config and priv->dad6_timer_id is set in activate_stage5_ip6_config_commit(), but there are few places where those fields are cleared. If you are sure that you don't overwrite a previous setting, add a

   g_assert (!priv->dad6_ip6_config);
   g_assert (!priv->dad6_timer_id);
   priv->dad6_ip6_config = nm_ip6_config_new (nm_device_get_ip_ifindex (self));
   ...


It general, it seems that after clearing dad6_timer_id, you can always clear dad6_ip6_config too.

Comment 9 Beniamino Galvani 2016-04-23 14:34:50 UTC

(In reply to Thomas Haller from comment #8)
> dad6_timeout() always fails the connection on timeout. Don't we have a
> check-fail, which only fails when may-fail=no?

We have a check_ip_failed(), but it doesn't fail immediately when may-fail=no. The correct fix for this is in bug 741347.

> Also, can you clear dad6_timer_id before the state-change in dad6_timeout()?

Ok.

> priv->dad6_ip6_config and priv->dad6_timer_id is set in
> activate_stage5_ip6_config_commit(), but there are few places where those
> fields are cleared.

Yes, it turns out that the fields could be overwritten, I added a check.

Branch repushed.

Comment 10 Thomas Haller 2016-04-26 13:01:34 UTC

if (nm_logging_enabled (LOGL_TRACE, LOGD_DEVICE | LOGD_IP6)) {
    ...
    _LOGt (...)
}

should be

if (_LOGt_enabled (LOGD_DEVICE | LOGD_IP6)) {
    ...
}


The reason is, that _LOGt() is a NOP unless compiled with --with-more-logging. Thus, you also want to use _LOGt_enabled() which returns always FALSE in that case.



https://bugzilla.gnome.org/show_bug.cgi?id=741347 seems important, and should be fixed.

Comment 11 Beniamino Galvani 2016-05-03 09:17:21 UTC

(In reply to Thomas Haller from comment #10)
> The reason is, that _LOGt() is a NOP unless compiled with
> --with-more-logging. Thus, you also want to use _LOGt_enabled() which
> returns always FALSE in that case.

Fixed.

> https://bugzilla.gnome.org/show_bug.cgi?id=741347 seems important, and
> should be fixed.

It has been merged now and I've updated bg/ipv6-dad-rh1243958 to use the new function for failing an IP method.

Comment 12 Lubomir Rintel 2016-05-05 14:45:37 UTC

Please add more comments there -- describe what ipv6_dad_start() does. It's not immediately obvious to me why would it start dad when kernel does that.

Also, perhaps we could add an option to use optimistic DAD; or just honor what the sysctl is set to? (If it even is a good idea; maybe it isn't)

Comment 13 Beniamino Galvani 2016-05-10 09:19:52 UTC

(In reply to Lubomir Rintel from comment #12)
> Please add more comments there -- describe what ipv6_dad_start() does. It's
> not immediately obvious to me why would it start dad when kernel does that.

I did some changes:
 - reworded commit message and comments
 - removed timer
 - better deal with DAD failures
 - added ipv6_get_tentative_addresses()

and repushed.

> Also, perhaps we could add an option to use optimistic DAD; or just honor
> what the sysctl is set to? (If it even is a good idea; maybe it isn't)

It seems that the optimistic flag can only be added by kernel, and to autoconfigured addresses. Since NM configures addresses by itself I don't think optimistic DAD would be possible, if I understand well how it works...

Comment 14 Thomas Haller 2016-05-11 16:00:53 UTC

+         tmp_config = nm_ip6_config_new_cloned (priv->ext_ip6_config_captured);
+         nm_ip6_config_intersect (tmp_config, priv->dad6_ip6_config);
+         num = nm_ip6_config_get_num_addresses (tmp_config);
+
+         for (i = 0; i < num; i++) {
+              addr = nm_ip6_config_get_address (tmp_config, i);
+              dad_pending |=    (addr->n_ifa_flags & IFA_F_TENTATIVE)
+                             && !(addr->n_ifa_flags & IFA_F_DADFAILED);
+         }

maybe you could add a function

  gboolean
  nm_ip6_config_has_any_dad_pending (const NMIP6Config *self,
                                     const NMIP6Config *candidates)

which basically iterates over the addresses of @self (ignoring addresses
which are not also in @candidates), and returns TRUE if there are any addresses that are TENTATIVE.

it allows a more efficient implementation, which doesn't involve cloning+intersecting the NMIP6Config instance.


Interesting, it uses nm_platform_ip6_address_get(). I didn't expect this function to be ever useful beside testing :)


s/nm_platform_get ()/NM_PLATFORM_GET/





priv->dad6_ip6_config = ipv6_get_tentative_addresses (self);

these two items are closely related, but their name is completely different. I think they should be named somehow similar (I like same prefixes)...
for example:
  priv->dad6_ip6_config = dad6_get_tentative_addresses (self);
  priv->dad6_ip6_config = dad6_ip6_config_get (self);




in ipv6_get_tentative_addresses(), could we cache nm_device_get_ip_ifindex() in a variable ifindex? Possibly with

  ifindex = nm_device_get_ip_ifindex();
  g_return_val_if_fail (ifindex > 0, NULL);





And can we construct dad6_config lazy, when needed:

     if (pl_addr && (pl_addr->n_ifa_flags & IFA_F_TENTATIVE)) {
         if (!dad6_config)
              dad6_config = nm_ip6_config_new (nm_device_get_ip_ifindex (self));

I wouldn't expect that finding dad-candidates happens in most of the cases.





+         /* Check if we have to wait for DAD */
+         if (priv->ip6_state == IP_CONF && !priv->dad6_ip6_config) {

doesn't that mean, we only create the dad6_ip6_config instance once? What if there are multiple calls to ip6_config_merge_and_apply(), don't we want to re-generate it every time?



(I like the commit message, and the patch in general)

Comment 15 Beniamino Galvani 2016-05-12 12:49:18 UTC

(In reply to Thomas Haller from comment #14)

> maybe you could add a function
> 
>   gboolean
>   nm_ip6_config_has_any_dad_pending (const NMIP6Config *self,
>                                      const NMIP6Config *candidates)
> 

> s/nm_platform_get ()/NM_PLATFORM_GET/

> these two items are closely related, but their name is completely different.
> I think they should be named somehow similar (I like same prefixes)...

> in ipv6_get_tentative_addresses(), could we cache nm_device_get_ip_ifindex()
> in a variable ifindex? 

> And can we construct dad6_config lazy, when needed:

Fixed all the above and repushed.


> +         /* Check if we have to wait for DAD */
> +         if (priv->ip6_state == IP_CONF && !priv->dad6_ip6_config) {
> 
> doesn't that mean, we only create the dad6_ip6_config instance once? What if
> there are multiple calls to ip6_config_merge_and_apply(), don't we want to
> re-generate it every time?

activate_stage5_ip6_config_commit() can be called multiple times but only one succeeds and changes the state from IP_CONFIG to IP_CHECK. When that state transition is about to happen we are guaranteed to have all necessary addresses (static, SLAAC or DHCP for managed conf), so I think it is safe to ignore, from a DAD perspective, any following call.

Comment 16 Lubomir Rintel 2016-05-21 11:59:09 UTC

For me it looks good now.

Comment 17 Beniamino Galvani 2016-05-22 12:35:17 UTC

Merged to master:

https://cgit.freedesktop.org/NetworkManager/NetworkManager/commit/?id=7dbabe9bdca2d9c3b9df11bea52cdd500f699d31

Comment 19 Vladimir Benes 2016-09-24 07:20:09 UTC

nm-online exits when tentative addresses are gone (dad check was performed before), that is different to 1.0.6 behaviour.

Comment 21 errata-xmlrpc 2016-11-03 19:14:32 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2016-2581.html

Note You need to log in before you can comment on or make changes to this bug.