Bug 1129500

Summary: DHCPv6-only ifup-eth fails to even run dhclient -6 due to a link-local IPv6 address still being 'tentative'
Product: [Fedora] Fedora
Reporter: Ondřej Svoboda <osvoboda>
Component: dhcp
Assignee: Jiri Popelka <jpopelka>
Status: CLOSED ERRATA
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium
Docs Contact:
Priority: unspecified
Version: 20
CC: danken, jonathan, jpopelka, lnykryn, thozza, vpavlin, zbyszek
Target Milestone: ---
Keywords: Reopened
Target Release: ---
Hardware: All
OS: Linux
Whiteboard:
Fixed In Version: dhcp-4.2.7-2.fc20
Doc Type: Bug Fix
Doc Text:
Cause: Sometimes the DHCPv6 client (i.e. dhclient with the '-6' command line option) is started to configure a network interface which is not yet fully "up".
Consequence: dhclient fails to run, because the network interface does not yet have a link-local address, which the DHCPv6 client needs.
Fix: A wait loop was added to dhclient-script.
Result: dhclient no longer fails due to a missing link-local address.
Story Points: ---
Clone Of:
: 1130803 1130804
Environment:
Last Closed: 2014-08-27 01:33:13 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Attachments (description, flags):
ifcfg file on the client side of a veth pair (flags: none)
commands to reproduce the bug (flags: none)

Description Ondřej Svoboda 2014-08-13 00:53:01 UTC
Created attachment 926230
ifcfg file on the client side of a veth pair

For testing VDSM's (part of oVirt) IPv6 capabilities, I have set up a simple virtual ethernet environment with dnsmasq acting as a DHCPv6 server on one side and ifup (run by VDSM or alone) on the other.

Only DHCPV6C is enabled in an ifcfg file – the setup is meant to rely on DHCPv6.

As there is no delay after assigning a link-local address to the 'client' side, the address is still reported as 'tentative' (by e.g. 'ip addr show veth_45') and cannot be used yet. dhclient -6 thus refuses to start. From the journal, its error output is as follows:

Can't bind to dhcp address: Cannot assign requested address
Please make sure there is no other dhcp server
running and that there's no entry for dhcp or
bootp in /etc/inetd.conf.   Also make sure you
are not running HP JetAdmin software, which
includes a bootp server.

Giving the address some time to "stabilize" is enough (and a must). As a preliminary solution, I added a 'sleep 1' before the 'dhclient -6' invocation in ifup-eth (before line 311 in https://git.fedorahosted.org/cgit/initscripts.git/tree/sysconfig/network-scripts/ifup-eth#n311).

A simple way to fix this issue is to busy-wait until 'ip -6 -o addr show veth_45' stops reporting the 'tentative' flag. A harder one is to wait in dhclient.

ifcfg-veth_45 is attached; I will also add a shell script with the commands used for testing. For convenience, both are also below:

# Generated by VDSM version 4.16.0-175.gite9e8110.fc20
DEVICE=veth_45
#HWADDR=52:27:15:53:22:04
ONBOOT=no
BOOTPROTO=none
DEFROUTE=no
NM_CONTROLLED=no
IPV6INIT=yes
DHCPV6C=yes
IPV6_AUTOCONF=no

ip link add name veth_63 type veth peer name veth_45
ip addr add dev veth_63 fdb3:84e5:4ff4:55e3::1/64
ip link set dev veth_63 up 
firewall-cmd --zone=trusted --change-interface=veth_63
dnsmasq --dhcp-authoritative -p 0 --dhcp-option=3 -O 6 -i veth_63 -I lo --bind-dynamic --dhcp-range=fdb3:84e5:4ff4:55e3::a,fdb3:84e5:4ff4:55e3::64,2m
ifup veth_45

Comment 1 Ondřej Svoboda 2014-08-13 00:55:24 UTC
Created attachment 926231
commands to reproduce the bug

Comment 2 Jiri Popelka 2014-08-13 14:19:31 UTC
(In reply to Ondřej Svoboda from comment #0)
> A simple way to fix this issue is to busy-wait until 'ip -6 -o addr show
> veth_45' stops reporting the 'tentative' flag. A harder one is to wait in
> dhclient.

We might add such a wait to dhclient-script after we 'up' the interface in PREINIT6, see line 707 in
http://pkgs.fedoraproject.org/cgit/dhcp.git/tree/dhclient-script

Comment 3 Jiri Popelka 2014-08-13 14:41:04 UTC
Something like this works for me:

    PREINIT6)
        # ensure interface is up
        ip link set dev ${interface} up
        
        # remove any stale addresses from aborted clients
        ip -6 addr flush dev ${interface} scope global permanent

        for i in $(seq 3); do
            # tentative flag == DAD is still not complete
            tentative=$(ip -6 addr show dev ${interface} scope link | grep tentative)
            if [ -z "${tentative}" ] ; then
                # DAD is over
                exit_with_hooks 0
            fi
            sleep 1
        done

        exit_with_hooks 0
        ;;

Ondrej, could you try modifying PREINIT6 in /usr/sbin/dhclient-script like that and see if it fixes the problem for you?

Comment 4 Ondřej Svoboda 2014-08-13 16:21:17 UTC
I confirm this approach works. It takes ~1.5 seconds for the link-local address to lose the tentative flag and approx. 2 seconds for the global address, "measured" using the following snippet in the same environment as in the original report.

 while true; do date +%X.%N; ip addr show dev veth_45; sleep 0.1; done;
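
The loop above prints timestamps until it is interrupted and leaves reading off the delay to the eye. A minimal sketch of a variant that stops on its own once the link-local address exists and is no longer tentative is shown here. The function name polls_until_ready and its defaults (100 polls, 0.1 s apart) are assumptions made for illustration, and the fractional sleep relies on GNU coreutils.

```shell
#!/bin/sh
# Sketch only: polls_until_ready is a hypothetical helper name.
# Usage: polls_until_ready IFACE [MAX_POLLS] [DELAY]
# Prints how many polls saw no usable link-local address.
# Returns 0 once the address is usable, 1 if MAX_POLLS was reached first.
polls_until_ready() {
    iface=$1
    max=${2:-100}
    delay=${3:-0.1}   # fractional values need GNU coreutils sleep
    n=0
    while [ "$n" -lt "$max" ]; do
        out=$(ip -6 -o addr show dev "$iface" scope link)
        # Usable means: an address is listed and it is not flagged tentative.
        if [ -n "$out" ] && ! printf '%s\n' "$out" | grep -q tentative; then
            echo "$n"
            return 0
        fi
        n=$((n + 1))
        sleep "$delay"
    done
    echo "$n"
    return 1
}
```

Calling `polls_until_ready veth_45` right after bringing the interface up prints a poll count; multiplying it by the delay gives a rough lower bound on the wait, since the time spent running ip itself is not counted.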

To avoid waiting too much, could we shorten the period to less than half a second? If it would not be too resource-intensive (which I doubt), even go down to 0.1 s and have 30–50 cycles (in a similar situation, there is a 5s limit in dhclient-script [1]).

[1] http://pkgs.fedoraproject.org/cgit/dhcp.git/tree/dhclient-script#n563

Comment 5 Jiri Popelka 2014-08-14 09:43:21 UTC
Fixed in rawhide/F21 with:
http://pkgs.fedoraproject.org/cgit/dhcp.git/commit/?h=f21&id=991bd354d956cc2f31ba75689ddcca0021706f0b

I'd rather not push this into F20.

Comment 6 Ondřej Svoboda 2014-08-14 11:38:38 UTC
It does not happen 100% of the time, but even the existence of the link-local address is not guaranteed that early (after 100 ms). We also have to check that the address is there at all:

 linklocal=$(ip -6 addr show dev ${interface} scope link)
 …
 if [ ! -z "${linklocal}" -a -z "${tentative}" ] ; then

Dan, I assume we need this change backported to EL because that is where VDSM is supposed to run. What systems do we want to support DHCPv6 in?

Comment 7 Jiri Popelka 2014-08-14 11:56:44 UTC
Ondrej, what about this ?

for i in $(seq 50); do
    linklocal=$(ip -6 addr show dev ${interface} scope link)
    tentative=$(echo "${linklocal}" | grep tentative)
    [[ -n "${linklocal}" &&  -z "${tentative}" ]] && exit_with_hooks 0
    sleep 0.1
done
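
For use outside dhclient-script (for example while testing by hand), the same check can be wrapped in a function that reports a timeout through its return status instead of calling exit_with_hooks. This is a hedged sketch, not the code that was committed: wait_for_linklocal is a hypothetical name, and the defaults simply mirror the 50 tries and 0.1 s delay used above.

```shell
#!/bin/sh
# Sketch only: wait_for_linklocal is a hypothetical name and is not part
# of dhclient-script.
# Usage: wait_for_linklocal IFACE [TRIES] [DELAY]
# Returns 0 as soon as IFACE has a non-tentative link-local address,
# 1 if that has not happened after TRIES polls.
wait_for_linklocal() {
    iface=$1
    tries=${2:-50}
    delay=${3:-0.1}   # fractional values need GNU coreutils sleep
    i=0
    while [ "$i" -lt "$tries" ]; do
        linklocal=$(ip -6 addr show dev "$iface" scope link)
        case $linklocal in
            '')          ;;           # no link-local address yet
            *tentative*) ;;           # DAD is still running
            *)           return 0 ;;  # address present and usable
        esac
        i=$((i + 1))
        sleep "$delay"
    done
    return 1   # timed out
}
```

A caller can then decide what a timeout means, e.g. `wait_for_linklocal veth_45 || echo "link-local address not ready" >&2` before starting dhclient -6.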

Comment 8 Ondřej Svoboda 2014-08-14 12:11:32 UTC
Jirka, thanks, this is exactly it and it works reliably :-)

Comment 9 Dan Kenigsberg 2014-08-14 12:52:33 UTC
Jiri, can we have this backported to f20, el7 and el6? We would like oVirt's Vdsm to start using dhcp6.

Comment 10 Jiri Popelka 2014-08-14 13:07:19 UTC
(In reply to Dan Kenigsberg from comment #9)
> Jiri, can we have this backported to f20

Sure, dhcp-4.2.7-2.fc20 should do the trick:
http://pkgs.fedoraproject.org/cgit/dhcp.git/commit/?h=f20&id=88bb3be48ee88795f43ba90c4ef1bf3a33a03c96

> el7 and el6? 

You have to clone this bug for these products if you want it to be fixed there.

Comment 11 Fedora Update System 2014-08-15 07:56:52 UTC
dhcp-4.2.7-2.fc20 has been submitted as an update for Fedora 20.
https://admin.fedoraproject.org/updates/FEDORA-2014-8287/dhcp-4.2.7-2.fc20

Comment 12 Fedora Update System 2014-08-16 00:29:50 UTC
Package dhcp-4.2.7-2.fc20:
* should fix your issue,
* was pushed to the Fedora 20 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing dhcp-4.2.7-2.fc20'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2014-8287/dhcp-4.2.7-2.fc20
then log in and leave karma (feedback).

Comment 13 Ondřej Svoboda 2014-08-17 23:34:39 UTC
4.2.7-2.fc20 works fine! :-) I cloned the bug to EL6 and EL7.

Comment 14 Fedora Update System 2014-08-27 01:33:13 UTC
dhcp-4.2.7-2.fc20 has been pushed to the Fedora 20 stable repository.  If problems still persist, please make note of it in this bug report.