Description of problem: I have a system where a daemon(nginx) is a multi-user target in systemd. nginx opens ports for listening on explicit IPv4/IPv6 addresses found in its config file. In its service file, I originally had: After=nss-lookup.target remote-fs.target This was resulting in a failure to start the service by systemd, with the following logged: 2015/07/16 12:01:41 [emerg] 682#0: bind() to 192.0.2.1:80 failed (99: Cannot assign requested address) I determined this was occurring because systemd was starting nginx before networking was up. I altered my service file to have: After=network-online.target remote-fs.target nss-lookup.target Wants=network-online.target This changed the error to: 2015/07/16 12:44:25 [emerg] 1325#0: bind() to [2001:db8::1]:80 failed (99: Cannot assign requested address) If I start the daemon manually after networking is up, it works fine. This indicates to me that the target network-online, is being reached prior to IPv6 being up, which is not correct behavior since network-online means more than just IPv4 being up. Version-Release number of selected component (if applicable): systemd-208-20.el7_1.5.x86_64 How reproducible: 100% Steps to Reproduce: 1. Install a dual stack system with a static IPv4 and IPv6 address 2. Install nginx 3. Configure nginx to listen for http requests on IPv4:80 and IPv6:80 4. Configure systemd to start nginx with After=network-online.target 5. Reboot; it will fail to start because IPv6 is not up yet. Actual results: Jul 16 12:47:31 build1 systemd: nginx.service: control process exited, code=exited status=1 Expected results: nginx starts Additional info:
We just provide the synchronization point, it's up to services that manages network to order some of their wait-online before that and make sure that the network is really up. I assume that you are using NM, because the network-scripts are totally synchronous and their are ordered even before network.target.
Could you attach a debug-logfile or NetworkManager? Enable debug-logging by configuring [logging] level=DEBUG in /etc/NetworkManager/NetworkManager.conf Afterwards, reboot the machine, ensure the issue got successfully reproduced and show the output of `journalctl -u NetworkManager -b0` Thank you!!
Issue seems to be not related to NetworkManager. It's due to IPv6 DAD, you can find similar bugs like really old but still not fixed #782042.
Note that if you're not setting ipv6.may-fail=no already, you'll want to do that. That ensures that NM will block startup for IPv6 to complete on that specific connection. nmcli connection modify "<name of connection or UUID>" ipv6.may-fail no
(In reply to Dan Williams from comment #5) > Note that if you're not setting ipv6.may-fail=no already, you'll want to do > that. That ensures that NM will block startup for IPv6 to complete on that > specific connection. > > nmcli connection modify "<name of connection or UUID>" ipv6.may-fail no This is required, but probably not sufficient, as we signal the activation before IPv6 DAD terminates when addresses are still tentative and so a bind() to a static address will fail: $ nmcli c add type ethernet ifname veth0 con-name veth0+ autoconnect no ip4 1.2.3.4/32 ip6 fd01::42/64 $ nmcli c mod veth0+ ipv6.may-fail no $ nmcli connection up veth0+ && socat -d -d TCP6-LISTEN:1234,bind=[fd01::42] UNIX-CLIENT:/tmp/foo.sock Connection successfully activated (D-Bus active path: /org/freedesktop/NetworkManager/ActiveConnection/10) socat[18988] E bind(3, {AF=10 [fd01:0000:0000:0000:0000:0000:0000:0042]:1234}, 28): Cannot assign requested address $ ip a show veth0 [...] inet6 fd01::42/64 scope global tentative [...] $ ip a show veth0 [...] inet6 fd01::42/64 scope global [...] $ socat -d -d TCP6-LISTEN:1234,bind=[fd01::42] UNIX-CLIENT:/tmp/foo.sock socat[19443] N listening on AF=10 [fd01:0000:0000:0000:0000:0000:0000:0042]:1234 This is not consistent with IPv4 since we guarantee that if ipv4.may-fail=no the IPv4 addresses will be usable immediately after activation (of course, since duplicate detection is done in userspace before assigning addresses to interface). How about branch bg/ipv6-dad-rh1243958 ? AFAIU systemd-networkd does something similar [1]. Maybe we should also add support for configuring the DAD timeout for IPv6 (IPv4 already has it). [1] https://github.com/systemd/systemd/issues/650
This bug was accidentally moved from POST to MODIFIED via an error in automation, please see mmccune with any questions
(In reply to Beniamino Galvani from comment #6) > How about branch bg/ipv6-dad-rh1243958 ? AFAIU systemd-networkd does > something similar [1]. dad6_timeout() always fails the connection on timeout. Don't we have a check-fail, which only fails when may-fail=no? Also, can you clear dad6_timer_id before the state-change in dad6_timeout()? priv->dad6_ip6_config and priv->dad6_timer_id is set in activate_stage5_ip6_config_commit(), but there are few places where those fields are cleared. If you are sure that you don't overwrite a previous setting, add a g_assert (!priv->dad6_ip6_config); g_assert (!priv->dad6_timer_id); priv->dad6_ip6_config = nm_ip6_config_new (nm_device_get_ip_ifindex (self)); ... It general, it seems that after clearing dad6_timer_id, you can always clear dad6_ip6_config too.
(In reply to Thomas Haller from comment #8) > dad6_timeout() always fails the connection on timeout. Don't we have a > check-fail, which only fails when may-fail=no? We have a check_ip_failed(), but it doesn't fail immediately when may-fail=no. The correct fix for this is in bug 741347. > Also, can you clear dad6_timer_id before the state-change in dad6_timeout()? Ok. > priv->dad6_ip6_config and priv->dad6_timer_id is set in > activate_stage5_ip6_config_commit(), but there are few places where those > fields are cleared. Yes, it turns out that the fields could be overwritten, I added a check. Branch repushed.
if (nm_logging_enabled (LOGL_TRACE, LOGD_DEVICE | LOGD_IP6)) { ... _LOGt (...) } should be if (_LOGt_enabled (LOGD_DEVICE | LOGD_IP6)) { ... } The reason is, that _LOGt() is a NOP unless compiled with --with-more-logging. Thus, you also want to use _LOGt_enabled() which returns always FALSE in that case. https://bugzilla.gnome.org/show_bug.cgi?id=741347 seems important, and should be fixed.
(In reply to Thomas Haller from comment #10) > The reason is, that _LOGt() is a NOP unless compiled with > --with-more-logging. Thus, you also want to use _LOGt_enabled() which > returns always FALSE in that case. Fixed. > https://bugzilla.gnome.org/show_bug.cgi?id=741347 seems important, and > should be fixed. It has been merged now and I've updated bg/ipv6-dad-rh1243958 to use the new function for failing an IP method.
Please add more comments there -- describe what ipv6_dad_start() does. It's not immediately obvious to me why would it start dad when kernel does that. Also, perhaps we could add an option to use optimistic DAD; or just honor what the sysctl is set to? (If it even is a good idea; maybe it isn't)
(In reply to Lubomir Rintel from comment #12) > Please add more comments there -- describe what ipv6_dad_start() does. It's > not immediately obvious to me why would it start dad when kernel does that. I did some changes: - reworded commit message and comments - removed timer - better deal with DAD failures - added ipv6_get_tentative_addresses() and repushed. > Also, perhaps we could add an option to use optimistic DAD; or just honor > what the sysctl is set to? (If it even is a good idea; maybe it isn't) It seems that the optimistic flag can only be added by kernel, and to autoconfigured addresses. Since NM configures addresses by itself I don't think optimistic DAD would be possible, if I understand well how it works...
+ tmp_config = nm_ip6_config_new_cloned (priv->ext_ip6_config_captured); + nm_ip6_config_intersect (tmp_config, priv->dad6_ip6_config); + num = nm_ip6_config_get_num_addresses (tmp_config); + + for (i = 0; i < num; i++) { + addr = nm_ip6_config_get_address (tmp_config, i); + dad_pending |= (addr->n_ifa_flags & IFA_F_TENTATIVE) + && !(addr->n_ifa_flags & IFA_F_DADFAILED); + } maybe you could add a function gboolean nm_ip6_config_has_any_dad_pending (const NMIP6Config *self, const NMIP6Config *candidates) which basically iterates over the addresses of @self (ignoring addresses which are not also in @candidates), and returns TRUE if there are any addresses that are TENTATIVE. it allows a more efficient implementation, which doesn't involve cloning+intersecting the NMIP6Config instance. Interesting, it uses nm_platform_ip6_address_get(). I didn't expect this function to be ever useful beside testing :) s/nm_platform_get ()/NM_PLATFORM_GET/ priv->dad6_ip6_config = ipv6_get_tentative_addresses (self); these two items are closely related, but their name is completely different. I think they should be named somehow similar (I like same prefixes)... for example: priv->dad6_ip6_config = dad6_get_tentative_addresses (self); priv->dad6_ip6_config = dad6_ip6_config_get (self); in ipv6_get_tentative_addresses(), could we cache nm_device_get_ip_ifindex() in a variable ifindex? Possibly with ifindex = nm_device_get_ip_ifindex(); g_return_val_if_fail (ifindex > 0, NULL); And can we construct dad6_config lazy, when needed: if (pl_addr && (pl_addr->n_ifa_flags & IFA_F_TENTATIVE)) { if (!dad6_config) dad6_config = nm_ip6_config_new (nm_device_get_ip_ifindex (self)); I wouldn't expect that finding dad-candidates happens in most of the cases. + /* Check if we have to wait for DAD */ + if (priv->ip6_state == IP_CONF && !priv->dad6_ip6_config) { doesn't that mean, we only create the dad6_ip6_config instance once? What if there are multiple calls to ip6_config_merge_and_apply(), don't we want to re-generate it every time? (I like the commit message, and the patch in general)
(In reply to Thomas Haller from comment #14) > maybe you could add a function > > gboolean > nm_ip6_config_has_any_dad_pending (const NMIP6Config *self, > const NMIP6Config *candidates) > > s/nm_platform_get ()/NM_PLATFORM_GET/ > these two items are closely related, but their name is completely different. > I think they should be named somehow similar (I like same prefixes)... > in ipv6_get_tentative_addresses(), could we cache nm_device_get_ip_ifindex() > in a variable ifindex? > And can we construct dad6_config lazy, when needed: Fixed all the above and repushed. > + /* Check if we have to wait for DAD */ > + if (priv->ip6_state == IP_CONF && !priv->dad6_ip6_config) { > > doesn't that mean, we only create the dad6_ip6_config instance once? What if > there are multiple calls to ip6_config_merge_and_apply(), don't we want to > re-generate it every time? activate_stage5_ip6_config_commit() can be called multiple times but only one succeeds and changes the state from IP_CONFIG to IP_CHECK. When that state transition is about to happen we are guaranteed to have all necessary addresses (static, SLAAC or DHCP for managed conf), so I think it is safe to ignore, from a DAD perspective, any following call.
For me it looks good now.
Merged to master: https://cgit.freedesktop.org/NetworkManager/NetworkManager/commit/?id=7dbabe9bdca2d9c3b9df11bea52cdd500f699d31
nm-online exits when tentative addresses are gone (dad check was performed before), that is different to 1.0.6 behaviour.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHSA-2016-2581.html