Bug 610219 - When dhcp server is moved after failover into partner-down state, it doesn't take over partner's leases after MCLT expiration
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: dhcp
Version: 5.7
Hardware: All
OS: Linux
Priority: urgent
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Jiri Popelka
QA Contact: Release Test Team
URL:
Whiteboard:
Depends On:
Blocks: 621838
Reported: 2010-07-01 19:13 UTC by Martin Osvald 🛹
Modified: 2018-12-09 16:42 UTC (History)

Fixed In Version: dhcp-3.0.5-24.el5
Doc Type: Bug Fix
Doc Text:
Previously, moving the server from the "communications-interrupted" state to the "partner-down" state did not force the server to take over the partner's leases. Consequently, clients could not get an IP address from the pool of the previously terminated DHCP server. With this update, a failover server in "partner-down" state is able to re-allocate leases to clients.
Clone Of:
Environment:
Last Closed: 2011-07-21 09:04:17 UTC
Target Upstream Version:
Embargoed:


Attachments
patch which fixes the bug (2.15 KB, patch)
2010-07-01 19:13 UTC, Martin Osvald 🛹


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2011:1038 0 normal SHIPPED_LIVE dhcp bug fix and enhancement update 2011-07-20 15:43:54 UTC

Description Martin Osvald 🛹 2010-07-01 19:13:42 UTC
Created attachment 428530
patch which fixes the bug

Description of problem:

If an administrator knows that one of the failover partners will be down for some time and wants the other partner to take over all of its leases, the mechanism for doing so does not work. More precisely, moving the server from the communications-interrupted state into the partner-down state does not force dhcpd to take over the partner's leases.


Version-Release number of selected component (if applicable):

RHEL5 dhcp-3.0.5-23.el5 (arch independent)


How reproducible:

Always


Steps to Reproduce:

1. Install RHEL5.5 on three virtual machines and install the dhcp package on two of them (use the "net user" type of network so that the guests can connect outside the host to update the system and download the dhcp package).

2. Run the following commands as root on the host to set up the network for the virtual machines (use a different bridge name if br1 is already in use):

 $ tunctl -t tap1 -u root
 $ tunctl -t tap2 -u root
 $ tunctl -t tap3 -u root
 $ ip link set tap1 up
 $ ip link set tap2 up
 $ ip link set tap3 up
 $ brctl addbr br1
 $ brctl addif br1 tap1
 $ brctl addif br1 tap2
 $ brctl addif br1 tap3
 $ ip link set br1 up
 $ ifconfig br1 192.168.1.1 netmask 255.255.255.0

3. Terminate all the virtual machines and start them again using the following commands:

primary:

$ qemu-kvm -hda /data/vm-images/RHEL5.5-Server-20100322.0-i386-DVD.img_it797933_dhcpmaster -m 1000 -net nic,macaddr=DE:AD:BE:EF:25:2A -net tap,ifname=tap1,script=no,downscript=no -cdrom /data/install/RHEL5.5-Server-20100322.0-i386-DVD.iso -boot c

secondary:

$ qemu-kvm -hda /data/vm-images/RHEL5.5-Server-20100322.0-i386-DVD.img_it797933_dhcpslave -m 1000 -net nic,macaddr=DE:AD:BE:EF:25:2C -net tap,ifname=tap2,script=no,downscript=no -cdrom /data/install/RHEL5.5-Server-20100322.0-i386-DVD.iso -boot c

client:

$ qemu-kvm -hda /data/vm-images/RHEL5.5-Server-20100322.0-i386-DVD.img_it797933_dhcpclient -m 1000 -net nic,macaddr=DE:AD:BE:EF:25:2D -net tap,ifname=tap3,script=no,downscript=no -cdrom /data/install/RHEL5.5-Server-20100322.0-i386-DVD.iso -boot c

4. Manually set up the network on the primary:

$ cat /etc/sysconfig/network-scripts/ifcfg-eth0
DEVICE=eth0
BOOTPROTO=none
ONBOOT=yes
IPADDR=192.168.1.210
NETMASK=255.255.255.0
GATEWAY=192.168.1.1
HWADDR=DE:AD:BE:EF:25:2A
$

5. Manually set up the network on the secondary:

$ cat /etc/sysconfig/network-scripts/ifcfg-eth0
DEVICE=eth0
BOOTPROTO=none
ONBOOT=yes
IPADDR=192.168.1.211
NETMASK=255.255.255.0
GATEWAY=192.168.1.1
HWADDR=DE:AD:BE:EF:25:2C
$

6. You can also set up the network on the client manually, but this is optional: a similar configuration should already be in place, and dhclient will be called directly anyway:

$ cat /etc/sysconfig/network-scripts/ifcfg-eth0
DEVICE=eth0
BOOTPROTO=dhcp
ONBOOT=yes
HWADDR=DE:AD:BE:EF:25:2D
$

7. Restart machines or network scripts

8. Set up dhcpd on the primary. The mclt is intentionally set to a low value to reproduce the issue in a short time. The range of IP addresses is also intentionally short; do not make it any shorter, because dhcpd contains another bug that prevents correct IP balancing when the range contains fewer than 5 addresses, leading to an empty backup queue (I will file that bug soon):

$ cat /etc/dhcpd.conf
ddns-update-style interim;
ignore client-updates;

failover peer "testicek" {
       primary;
       address 192.168.1.210;
       port 519;
       peer address 192.168.1.211;
       peer port 519;
       max-response-delay 60;
       max-unacked-updates 10;
       mclt 60;
       split 128;
       load balance max seconds 3;
}
include "/etc/dhcpd.conf.common";
$

$ cat /etc/dhcpd.conf.common
default-lease-time 3600;
max-lease-time 3600;
ddns-update-style none;
omapi-port 7911;

subnet 192.168.1.0 netmask 255.255.255.0 {
 not authoritative;
pool {
deny dynamic bootp clients;
range 192.168.1.5 192.168.1.10;
option routers 192.168.1.1;
option subnet-mask  255.255.255.0;
failover peer "testicek";
}
}
$

9. Set up dhcpd on the secondary:

$ cat /etc/dhcpd.conf
ddns-update-style interim;
ignore client-updates;

failover peer "testicek" {
       secondary;
       address 192.168.1.211;
       port 519;
       peer address 192.168.1.210;
       peer port 519;
       max-response-delay 60;
       max-unacked-updates 10;
       load balance max seconds 3;
}
include "/etc/dhcpd.conf.common";
$

$ cat /etc/dhcpd.conf.common
default-lease-time 3600;
max-lease-time 3600;
ddns-update-style none;
omapi-port 7911;

subnet 192.168.1.0 netmask 255.255.255.0 {
 not authoritative;
pool {
deny dynamic bootp clients;
range 192.168.1.5 192.168.1.10;
option routers 192.168.1.1;
option subnet-mask  255.255.255.0;
failover peer "testicek";
}
}
$

10. Start the dhcpd servers on both virtual machines (e.g. by executing the dhcpd init script):

 $ /etc/init.d/dhcpd start

11. Run the following on the client machine:

 $ /etc/init.d/network stop
 $ > ~/test_lease_file; ifconfig eth0 down; ifconfig eth0 hw ether de:ad:be:ef:25:a1;
 $ dhclient -d -lf ~/test_lease_file

The above will assign the client an address.

12. End dhclient by pressing Ctrl+c immediately after the client gets an IP address, then terminate dhcpd on the machine that offered and acked the client's lease:

 $ /etc/init.d/dhcpd stop

13. Run dhclient on the client machine 3 times, terminating it with Ctrl+c each time it gets an IP address (note that the MAC address is different every time):

 $ > ~/test_lease_file; ifconfig eth0 down; ifconfig eth0 hw ether de:ad:be:ef:25:a2;
 $ dhclient -d -1 -lf ~/test_lease_file

wait and press Ctrl+c

 $ > ~/test_lease_file; ifconfig eth0 down; ifconfig eth0 hw ether de:ad:be:ef:25:a3;
 $ dhclient -d -1 -lf ~/test_lease_file

wait and press Ctrl+c

 $ > ~/test_lease_file; ifconfig eth0 down; ifconfig eth0 hw ether de:ad:be:ef:25:a4;
 $ dhclient -d -1 -lf ~/test_lease_file

wait and press Ctrl+c

The above commands will take 3 leases from the backup queue, which uses up all the leases assignable by the running dhcpd. If you run dhclient again, you will see the following message in /var/log/messages:

Jul  1 19:19:41 <hostname> dhcpd: DHCPDISCOVER from de:ad:be:ef:25:a5 via eth0: peer holds all free leases

14. Move the running dhcpd into the partner-down state by executing the following script on the server (to compare the behaviour of dhcp-3.0.5 with dhcp-3.1 and 4.1 using the same setup, change 'set local-state = 1' to 'set local-state = 4'):

$ cat change_state.sh 
#!/bin/sh
omshell << EOF
connect
new failover-state
set name = "testicek"
open
set local-state = 1
update
EOF
$

$ . change_state.sh

After running the above script you will see the following message in /var/log/messages:

Jul  1 19:19:48 <hostname> dhcpd: failover peer testicek: I move from communications-interrupted to partner-down

15. Wait one minute for the MCLT to elapse, then run dhclient on the client machine again:

 $ > ~/test_lease_file; ifconfig eth0 down; ifconfig eth0 hw ether de:ad:be:ef:25:a5;
 $ dhclient -d -1 -lf ~/test_lease_file
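
Steps 14 and 15 can be sketched as a few small shell helpers. This is only an illustration: the function names are made up for this example, and the numeric local-state values are the ones stated in step 14 (1 on dhcp-3.0.5, 4 on dhcp-3.1 and 4.1). Any other version is rejected rather than guessed.

```shell
#!/bin/sh
# Helper sketch for steps 14 and 15. All function names are illustrative.

# partner_down_value VERSION
# Print the numeric local-state that means partner-down, as given in
# step 14: 1 on dhcp-3.0.5, 4 on dhcp-3.1 and 4.1.
partner_down_value() {
    case $1 in
        3.0.5*)    echo 1 ;;
        3.1*|4.1*) echo 4 ;;
        *)         echo "unknown dhcp version: $1" >&2; return 1 ;;
    esac
}

# omshell_partner_down PEER VERSION
# Print the omshell input that moves failover peer PEER to partner-down.
omshell_partner_down() {
    value=$(partner_down_value "$2") || return 1
    cat <<EOF
connect
new failover-state
set name = "$1"
open
set local-state = $value
update
EOF
}

# config_mclt CONFIG
# Print the value of the first "mclt N;" statement found in CONFIG.
config_mclt() {
    sed -n 's/^[[:space:]]*mclt[[:space:]][[:space:]]*\([0-9][0-9]*\)[[:space:]]*;.*/\1/p' "$1" | head -n 1
}
```

A possible use on the surviving server is "omshell_partner_down testicek 3.0.5 | omshell", followed by sleeping for the number of seconds printed by config_mclt. In this setup only the primary's /etc/dhcpd.conf contains an mclt statement, so config_mclt has to be pointed at a copy of that file.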


Actual results:

The client doesn't get an IP address from the pool of the previously terminated dhcp server.

Note: In the following results, IP addresses in the range 192.168.1.5 - 192.168.1.7 were load balanced to the primary and 192.168.1.8 - 192.168.1.10 to the secondary (according to 'binding state' in the leases file).
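
The split mentioned in the note can be read from the leases file with a small awk filter. This is a minimal sketch: the function name lease_states is made up, and it assumes the usual leases file layout in which each entry starts with "lease <ip> {" and contains a "binding state <state>;" line. When an address appears more than once, the last entry wins, and lines starting with "next binding state" are ignored.

```shell
#!/bin/sh
# lease_states LEASEFILE
# Print "<ip> <binding state>" for every lease in LEASEFILE, sorted by
# the last octet (enough for the single /24 pool used in this report).
lease_states() {
    awk '
        $1 == "lease" && $3 == "{" { ip = $2 }
        $1 == "binding" && $2 == "state" {
            sub(/;$/, "", $3)      # drop the trailing semicolon
            state[ip] = $3         # later entries override earlier ones
        }
        END { for (ip in state) print ip, state[ip] }
    ' "$1" | sort -t . -k 4,4n
}
```

For example, "lease_states ~/lala" on the secondary lists the state recorded for each address in the lease file used by the runs below.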

primary:

$ dhcpd -d -f -lf ~/lala
Internet Systems Consortium DHCP Server V3.0.5-RedHat
Copyright 2004-2006 Internet Systems Consortium.
All rights reserved.
For info, please visit http://www.isc.org/sw/dhcp/
Wrote 0 leases to leases file.
Listening on LPF/eth0/de:ad:be:ef:25:2a/192.168.1/24
Sending on   LPF/eth0/de:ad:be:ef:25:2a/192.168.1/24
Sending on   Socket/fallback/fallback-net
failover peer testicek: I move from recover to startup
failover peer testicek: peer moves from unknown-state to recover
failover peer testicek: requesting full update from peer
failover peer testicek: I move from startup to recover
Sent update request all message to testicek
failover peer testicek: peer moves from recover to recover
failover peer testicek: requesting full update from peer
Sent update done message to testicek
Update request all from testicek: nothing pending
failover peer testicek: peer update completed.
failover peer testicek: I move from recover to recover-done
failover peer testicek: peer moves from recover to recover-done
failover peer testicek: I move from recover-done to normal
failover peer testicek: peer moves from recover-done to normal
pool 9a2f840 192.168.1/24 total 6  free 6  backup 0  lts -3
pool 9a2f840 192.168.1/24  total 6  free 6  backup 0  lts 3
pool 9a2f840 192.168.1/24 total 6  free 3  backup 3  lts 0
DHCPDISCOVER from de:ad:be:ef:25:a1 via eth0
DHCPOFFER on 192.168.1.7 to de:ad:be:ef:25:a1 via eth0
DHCPREQUEST for 192.168.1.7 (192.168.1.210) from de:ad:be:ef:25:a1 via eth0
DHCPACK on 192.168.1.7 to de:ad:be:ef:25:a1 via eth0
<Ctrl+c>
$

secondary:

$ dhcpd -d -f -lf ~/lala
Internet Systems Consortium DHCP Server V3.0.5-RedHat
Copyright 2004-2006 Internet Systems Consortium.
All rights reserved.
For info, please visit http://www.isc.org/sw/dhcp/
Wrote 0 leases to leases file.
Listening on LPF/eth0/de:ad:be:ef:25:2c/192.168.1/24
Sending on   LPF/eth0/de:ad:be:ef:25:2c/192.168.1/24
Sending on   Socket/fallback/fallback-net
failover peer testicek: I move from recover to startup
failover peer testicek: peer moves from unknown-state to recover
failover peer testicek: requesting full update from peer
failover peer testicek: I move from startup to recover
Sent update request all message to testicek
failover peer testicek: peer moves from recover to recover
failover peer testicek: requesting full update from peer
Sent update done message to testicek
Update request all from testicek: nothing pending
failover peer testicek: peer update completed.
failover peer testicek: I move from recover to recover-done
failover peer testicek: peer moves from recover to recover-done
failover peer testicek: I move from recover-done to normal
failover peer testicek: peer moves from recover-done to normal
pool 90bd800 192.168.1/24 total 6  free 6  backup 0  lts 3
pool response: 3 leases
pool 90bd800 192.168.1/24 total 6  free 3  backup 3  lts 0
DHCPDISCOVER from de:ad:be:ef:25:a1 via eth0: load balance to peer testicek
DHCPREQUEST for 192.168.1.7 (192.168.1.210) from de:ad:be:ef:25:a1 via eth0: lease owned by peer
peer testicek: disconnected
failover peer testicek: I move from normal to communications-interrupted
DHCPDISCOVER from de:ad:be:ef:25:a2 via eth0
DHCPOFFER on 192.168.1.8 to de:ad:be:ef:25:a2 via eth0
DHCPREQUEST for 192.168.1.8 (192.168.1.211) from de:ad:be:ef:25:a2 via eth0
DHCPACK on 192.168.1.8 to de:ad:be:ef:25:a2 via eth0
DHCPDISCOVER from de:ad:be:ef:25:a3 via eth0
DHCPOFFER on 192.168.1.9 to de:ad:be:ef:25:a3 via eth0
DHCPREQUEST for 192.168.1.9 (192.168.1.211) from de:ad:be:ef:25:a3 via eth0
DHCPACK on 192.168.1.9 to de:ad:be:ef:25:a3 via eth0
DHCPDISCOVER from de:ad:be:ef:25:a4 via eth0
DHCPOFFER on 192.168.1.10 to de:ad:be:ef:25:a4 via eth0
DHCPREQUEST for 192.168.1.10 (192.168.1.211) from de:ad:be:ef:25:a4 via eth0
DHCPACK on 192.168.1.10 to de:ad:be:ef:25:a4 via eth0
DHCPDISCOVER from de:ad:be:ef:25:a5 via eth0: peer holds all free leases
DHCPDISCOVER from de:ad:be:ef:25:a5 via eth0: peer holds all free leases
failover peer testicek: I move from communications-interrupted to partner-down
DHCPDISCOVER from de:ad:be:ef:25:a5 via eth0: peer holds all free leases <<<---
DHCPDISCOVER from de:ad:be:ef:25:a5 via eth0: peer holds all free leases <<<---
DHCPDISCOVER from de:ad:be:ef:25:a5 via eth0: peer holds all free leases <<<---
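
The refused requests in a log like the one above can be summarized per client. This is a minimal sketch with a made-up function name; it relies only on the message format shown in this report.

```shell
#!/bin/sh
# starved_clients LOGFILE
# Print "<mac> <count>" for every client whose DHCPDISCOVER was refused
# with "peer holds all free leases".
starved_clients() {
    grep 'peer holds all free leases' "$1" |
        sed -n 's/.*DHCPDISCOVER from \([0-9a-f:]*\) via.*/\1/p' |
        sort | uniq -c | awk '{ print $2, $1 }'
}
```

It can be run against /var/log/messages or against the saved output of "dhcpd -d -f".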

client:

$ > ~/lala; ifconfig eth0 down; ifconfig eth0 hw ether de:ad:be:ef:25:a1;
$ dhclient -d -lf ~/lala
Internet Systems Consortium DHCP Client V3.0.5-RedHat
Copyright 2004-2006 Internet Systems Consortium.
All rights reserved.
For info, please visit http://www.isc.org/sw/dhcp/

Listening on LPF/eth0/de:ad:be:ef:25:a1
Sending on   LPF/eth0/de:ad:be:ef:25:a1
Sending on   Socket/fallback
DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 8
DHCPOFFER from 192.168.1.210
DHCPREQUEST on eth0 to 255.255.255.255 port 67
DHCPACK from 192.168.1.210
bound to 192.168.1.7 -- renewal in 32 seconds.
<Ctrl+c>
$

$ > ~/lala; ifconfig eth0 down; ifconfig eth0 hw ether de:ad:be:ef:25:a2;
$ dhclient -d -lf ~/lala
...<snip>
DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 7
DHCPOFFER from 192.168.1.211
DHCPREQUEST on eth0 to 255.255.255.255 port 67
DHCPACK from 192.168.1.211
bound to 192.168.1.8 -- renewal in 32 seconds.
<Ctrl+c>
$

$ > ~/lala; ifconfig eth0 down; ifconfig eth0 hw ether de:ad:be:ef:25:a3;
$ dhclient -d -lf ~/lala
...<snip>
DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 5
DHCPOFFER from 192.168.1.211
DHCPREQUEST on eth0 to 255.255.255.255 port 67
DHCPACK from 192.168.1.211
bound to 192.168.1.9 -- renewal in 31 seconds.
<Ctrl+c>
$

$ > ~/lala; ifconfig eth0 down; ifconfig eth0 hw ether de:ad:be:ef:25:a4;
$ dhclient -d -lf ~/lala
...<snip>
DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 5
DHCPOFFER from 192.168.1.211
DHCPREQUEST on eth0 to 255.255.255.255 port 67
DHCPACK from 192.168.1.211
bound to 192.168.1.10 -- renewal in 25 seconds.
<Ctrl+c>
$

$ date -u
Thu Jul  1 18:19:13 UTC 2010
$ date -u
Thu Jul  1 18:22:54 UTC 2010
$

$ > ~/lala; ifconfig eth0 down; ifconfig eth0 hw ether de:ad:be:ef:25:a5;
$ dhclient -d -lf ~/lala
...<snip>
DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 8   <<<---
DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 12  <<<---
DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 19  <<<---
<Ctrl+c>
$


Expected results:

The client should be able to get an IP address from the pool of the previously terminated dhcp server.

secondary:

$ dhcpd -d -f -lf ~/lala
...<snip>
failover peer testicek: I move from communications-interrupted to partner-down
DHCPDISCOVER from de:ad:be:ef:25:a5 via eth0
DHCPOFFER on 192.168.1.6 to de:ad:be:ef:25:a5 via eth0
DHCPREQUEST for 192.168.1.6 (192.168.1.211) from de:ad:be:ef:25:a5 via eth0
DHCPACK on 192.168.1.6 to de:ad:be:ef:25:a5 via eth0 <<<---

client:

$ > ~/lala; ifconfig eth0 down; ifconfig eth0 hw ether de:ad:be:ef:25:a5;
$ dhclient -d -lf ~/lala
...<snip>
DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 3
DHCPOFFER from 192.168.1.211
DHCPREQUEST on eth0 to 255.255.255.255 port 67
DHCPACK from 192.168.1.211
bound to 192.168.1.6 -- renewal in 24 seconds.
<Ctrl+c>
$
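
The difference between the actual and the expected server log can be checked mechanically. The following is a minimal sketch with a made-up function name. It only looks for a DHCPOFFER logged after the transition to partner-down and does not check whether the MCLT has already elapsed.

```shell
#!/bin/sh
# offers_after_partner_down LOGFILE
# Exit 0 if a DHCPOFFER is logged after the "to partner-down"
# transition, exit 1 otherwise.
offers_after_partner_down() {
    awk '
        /I move from .* to partner-down/ { down = 1; next }
        down && /DHCPOFFER on /         { found = 1 }
        END { exit !found }
    ' "$1"
}
```

With the logs in this report, the function fails on the secondary's output under "Actual results" and succeeds on the output under "Expected results".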


Additional info:

This bug is caused by the fact that dhcpd (in 3.0.5) doesn't contain code that, in the failover partner-down state, would also assign addresses from the free queue. The attached patch solves the bug. The patch was backported from the upstream dhcp-3.1-ESV (http://ftp.isc.org/isc/dhcp/dhcp-3.1-ESV.tar.gz).
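
The intended behaviour can be illustrated with a simplified model. This is not the attached patch and not dhcpd code. It assumes that the primary allocates from the free queue and the secondary from the backup queue, and that the peer's queue becomes usable only in the partner-down state once more than MCLT seconds have passed. The function name and its arguments are made up for this illustration.

```shell
#!/bin/sh
# may_allocate ROLE STATE QUEUE SECONDS_IN_STATE MCLT
#   ROLE  primary | secondary
#   STATE normal | communications-interrupted | partner-down
#   QUEUE free | backup
# Exit 0 if, in this simplified model, the server may hand out a lease
# from QUEUE; exit 1 otherwise.
may_allocate() {
    role=$1 state=$2 queue=$3 elapsed=$4 mclt=$5

    # A server may always use its own queue.
    case "$role:$queue" in
        primary:free|secondary:backup) return 0 ;;
    esac

    # The peer's queue is available only in partner-down, after MCLT.
    [ "$state" = partner-down ] && [ "$elapsed" -gt "$mclt" ]
}
```

In the scenario of this report, "may_allocate secondary partner-down free 61 60" succeeds, which corresponds to the expected results, while the same call with communications-interrupted fails.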

Comment 9 Tomas Capek 2011-05-31 08:16:50 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
Previously, moving the server from the "communications-interrupted" state to the "partner-down" state did not force the server to take over the partner's leases. Consequently, clients could not get an IP address from the pool of the previously terminated DHCP server. With this update, a failover server in "partner-down" state is able to re-allocate leases to clients.

Comment 10 Jan Stodola 2011-06-06 07:38:57 UTC
Tested with dhcp-3.0.5-29.el5 using reproducer from comment 0.

After stopping the first DHCP server, moving the second one into the partner-down state and waiting until the MCLT had expired, the DHCP client received a new IP address from the range that was originally load-balanced to the first DHCP server.

Moving to VERIFIED.

Comment 11 errata-xmlrpc 2011-07-21 09:04:17 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-1038.html

