Bug 1129500 - DHCPv6-only ifup-eth fails to even run dhclient -6 due to a link-local IPv6 address still being 'tentative'
Summary: DHCPv6-only ifup-eth fails to even run dhclient -6 due to a link-local IPv6 a...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: dhcp
Version: 20
Hardware: All
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: Jiri Popelka
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2014-08-13 00:53 UTC by Ondřej Svoboda
Modified: 2014-12-17 14:45 UTC (History)

Fixed In Version: dhcp-4.2.7-2.fc20
Doc Type: Bug Fix
Doc Text:
Cause: The DHCPv6 client (i.e. dhclient with the '-6' command-line option) is sometimes started to configure a network interface that is not yet fully "up".
Consequence: dhclient fails to run because the network interface does not yet have a link-local address, which the DHCPv6 client needs.
Fix: A wait loop was added to dhclient-script.
Result: dhclient no longer fails due to a missing link-local address.
Clone Of:
Clones: 1130803 1130804
Environment:
Last Closed: 2014-08-27 01:33:13 UTC


Attachments
ifcfg file on the client side of a veth pair (192 bytes, text/plain)
2014-08-13 00:53 UTC, Ondřej Svoboda
commands to reproduce the bug (359 bytes, application/x-shellscript)
2014-08-13 00:55 UTC, Ondřej Svoboda

Description Ondřej Svoboda 2014-08-13 00:53:01 UTC
Created attachment 926230 [details]
ifcfg file on the client side of a veth pair

For testing VDSM's (part of oVirt) IPv6 capabilities, I have set up a simple virtual ethernet environment with dnsmasq acting as a DHCPv6 server on one side and ifup (run by VDSM or alone) on the other.

Only DHCPV6C is enabled in an ifcfg file – the setup is meant to rely on DHCPv6.

As there is no delay after assigning a link-local address to the 'client' side, the address is still reported as 'tentative' (e.g. by 'ip addr show veth_45') and cannot yet be used. dhclient -6 thus refuses to start. From the journal, its error output is as follows:

Can't bind to dhcp address: Cannot assign requested address
Please make sure there is no other dhcp server
running and that there's no entry for dhcp or
bootp in /etc/inetd.conf.   Also make sure you
are not running HP JetAdmin software, which
includes a bootp server.

Giving the address some time to "stabilize" is enough (and a must). As a preliminary solution, I added a 'sleep 1' before the 'dhclient -6' invocation in ifup-eth (before line 311 in https://git.fedorahosted.org/cgit/initscripts.git/tree/sysconfig/network-scripts/ifup-eth#n311).

A simple way to fix this issue is to busy-wait until 'ip -6 -o addr show veth_45' stops reporting the 'tentative' flag. A harder one is to wait in dhclient.
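The busy-wait described above could be sketched roughly as follows. This is only an illustration, not the code that was eventually shipped: the helper names `has_tentative` and `wait_not_tentative`, the 0.1 s interval and the default of 50 attempts are assumptions made for the example. Note that it checks only for the 'tentative' flag, not for whether a link-local address exists at all.

```shell
# Hypothetical helpers; the names are illustrative and not part of initscripts.

# Succeeds (returns 0) if the captured `ip -6 -o addr show` output in $1
# still contains an address flagged as tentative.
has_tentative() {
    printf '%s\n' "$1" | grep -qw tentative
}

# Poll the device until no address is tentative, or give up.
# $1: device name, $2: maximum number of attempts (assumed default: 50)
wait_not_tentative() {
    local dev="$1"
    local tries="${2:-50}"
    local i=0
    while [ "$i" -lt "$tries" ]; do
        has_tentative "$(ip -6 -o addr show dev "$dev")" || return 0
        sleep 0.1
        i=$((i + 1))
    done
    return 1   # still tentative after all attempts
}
```

With these definitions, `wait_not_tentative veth_45` would be called just before 'dhclient -6' is started, in place of the fixed 'sleep 1'.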

ifcfg-veth_45 is attached; I will also add a shell script with the commands used for testing. For convenience, both are reproduced below:

# Generated by VDSM version 4.16.0-175.gite9e8110.fc20
DEVICE=veth_45
#HWADDR=52:27:15:53:22:04
ONBOOT=no
BOOTPROTO=none
DEFROUTE=no
NM_CONTROLLED=no
IPV6INIT=yes
DHCPV6C=yes
IPV6_AUTOCONF=no

ip link add name veth_63 type veth peer name veth_45
ip addr add dev veth_63 fdb3:84e5:4ff4:55e3::1/64
ip link set dev veth_63 up 
firewall-cmd --zone=trusted --change-interface=veth_63
dnsmasq --dhcp-authoritative -p 0 --dhcp-option=3 -O 6 -i veth_63 -I lo --bind-dynamic --dhcp-range=fdb3:84e5:4ff4:55e3::a,fdb3:84e5:4ff4:55e3::64,2m
ifup veth_45

Comment 1 Ondřej Svoboda 2014-08-13 00:55:24 UTC
Created attachment 926231 [details]
commands to reproduce the bug

Comment 2 Jiri Popelka 2014-08-13 14:19:31 UTC
(In reply to Ondřej Svoboda from comment #0)
> A simple way to fix this issue is to busy-wait until 'ip -6 -o addr show
> veth_45' stops reporting the 'tentative' flag. A harder one is to wait in
> dhclient.

We might add such a wait to dhclient-script after we 'up' the interface in PREINIT6, see line 707 in
http://pkgs.fedoraproject.org/cgit/dhcp.git/tree/dhclient-script

Comment 3 Jiri Popelka 2014-08-13 14:41:04 UTC
Something like this works for me:

    PREINIT6)
        # ensure interface is up
        ip link set dev ${interface} up
        
        # remove any stale addresses from aborted clients
        ip -6 addr flush dev ${interface} scope global permanent

        for i in $(seq 3); do
            # tentative flag == DAD is still not complete
            tentative=$(ip -6 addr show dev ${interface} scope link | grep tentative)
            if [ -z "${tentative}" ] ; then
                # DAD is over
                exit_with_hooks 0
            fi
            sleep 1
        done

        exit_with_hooks 0
        ;;

Could you try modifying PREINIT6 in /usr/sbin/dhclient-script like that and see if it fixes the problem for you, Ondrej?

Comment 4 Ondřej Svoboda 2014-08-13 16:21:17 UTC
I confirm this approach works. It takes ~1.5 seconds for the link-local address to lose the tentative flag and approx. 2 seconds for the global address, "measured" using the following snippet in the same environment as in the original report.

 while true; do date +%X.%N; ip addr show dev veth_45; sleep 0.1; done;

To avoid waiting too much, could we shorten the period to less than half a second? If it would not be too resource-intensive (which I doubt), even go down to 0.1 s and have 30–50 cycles (in a similar situation, there is a 5s limit in dhclient-script [1]).

[1] http://pkgs.fedoraproject.org/cgit/dhcp.git/tree/dhclient-script#n563

Comment 5 Jiri Popelka 2014-08-14 09:43:21 UTC
Fixed in rawhide/F21 with:
http://pkgs.fedoraproject.org/cgit/dhcp.git/commit/?h=f21&id=991bd354d956cc2f31ba75689ddcca0021706f0b

I'd rather not push this into F20.

Comment 6 Ondřej Svoboda 2014-08-14 11:38:38 UTC
It does not happen 100% of the time, but even the existence of the link-local address is not guaranteed that early (after 100 ms). We also have to check for that:

 linklocal=$(ip -6 addr show dev ${interface} scope link)
 …
 if [ ! -z "${linklocal}" -a -z "${tentative}" ] ; then

Dan, I assume we need this change backported to EL because that is where VDSM is supposed to run. What systems do we want to support DHCPv6 in?

Comment 7 Jiri Popelka 2014-08-14 11:56:44 UTC
Ondrej, what about this?

for i in $(seq 50); do
    linklocal=$(ip -6 addr show dev ${interface} scope link)
    tentative=$(echo "${linklocal}" | grep tentative)
    [[ -n "${linklocal}" &&  -z "${tentative}" ]] && exit_with_hooks 0
    sleep 0.1
done
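
The two conditions tested in the loop above (a link-local address exists, and it is no longer tentative) can also be written as a predicate over captured output, which makes the logic easy to exercise without a live interface. This is a minimal sketch, not code from dhclient-script: the function name `link_local_ready` is hypothetical, and it assumes that the output of 'ip -6 addr show dev <interface> scope link' contains an 'inet6' line whenever an address is present.

```shell
# Hypothetical predicate; the name is illustrative, not from dhclient-script.
# $1: captured output of `ip -6 addr show dev <interface> scope link`
# Returns 0 only if a link-local address is present and not tentative.
link_local_ready() {
    local out="$1"
    # No inet6 line at all: the address has not been assigned yet.
    printf '%s\n' "$out" | grep -q 'inet6' || return 1
    # A tentative flag means DAD is still in progress.
    if printf '%s\n' "$out" | grep -qw 'tentative'; then
        return 1
    fi
    return 0
}
```

Inside the loop, the test would then read `link_local_ready "$(ip -6 addr show dev "${interface}" scope link)" && exit_with_hooks 0`.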

Comment 8 Ondřej Svoboda 2014-08-14 12:11:32 UTC
Jirka, thanks, this is exactly it and it works reliably :-)

Comment 9 Dan Kenigsberg 2014-08-14 12:52:33 UTC
Jiri, can we have this backported to f20, el7 and el6? We would like oVirt's Vdsm to start using dhcp6.

Comment 10 Jiri Popelka 2014-08-14 13:07:19 UTC
(In reply to Dan Kenigsberg from comment #9)
> Jiri, can we have this backported to f20

Sure, dhcp-4.2.7-2.fc20 should do the trick:
http://pkgs.fedoraproject.org/cgit/dhcp.git/commit/?h=f20&id=88bb3be48ee88795f43ba90c4ef1bf3a33a03c96

> el7 and el6? 

You have to clone this bug for these products if you want it to be fixed there.

Comment 11 Fedora Update System 2014-08-15 07:56:52 UTC
dhcp-4.2.7-2.fc20 has been submitted as an update for Fedora 20.
https://admin.fedoraproject.org/updates/FEDORA-2014-8287/dhcp-4.2.7-2.fc20

Comment 12 Fedora Update System 2014-08-16 00:29:50 UTC
Package dhcp-4.2.7-2.fc20:
* should fix your issue,
* was pushed to the Fedora 20 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing dhcp-4.2.7-2.fc20'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2014-8287/dhcp-4.2.7-2.fc20
then log in and leave karma (feedback).

Comment 13 Ondřej Svoboda 2014-08-17 23:34:39 UTC
4.2.7-2.fc20 works fine! :-) I cloned the bug to EL6 and EL7.

Comment 14 Fedora Update System 2014-08-27 01:33:13 UTC
dhcp-4.2.7-2.fc20 has been pushed to the Fedora 20 stable repository.  If problems still persist, please make note of it in this bug report.

