Created attachment 428530 [details]
patch which fixes the bug

Description of problem:
If someone knows that one of the failover partners will be down for some time and wants to instruct the other one to take over all of the partner's leases, the mechanism for doing so doesn't work. More precisely, moving the server from the communications-interrupted state into the partner-down state doesn't force dhcpd to take over the partner's leases.

Version-Release number of selected component (if applicable):
RHEL5 dhcp-3.0.5-23.el5 (arch independent)

How reproducible:
Always

Steps to Reproduce:
1. Install RHEL5.5 on three virtual machines and install the dhcp package on two of them (use the "net user" type of network so that the guests can reach outside the host to update the system and download the dhcp package).
2. Run the following commands as root on the host to set up the network for the virtual machines (use a different bridge if br1 is already in use):
   $ tunctl -t tap1 -u root
   $ tunctl -t tap2 -u root
   $ tunctl -t tap3 -u root
   $ ip link set tap1 up
   $ ip link set tap2 up
   $ ip link set tap3 up
   $ brctl addbr br1
   $ brctl addif br1 tap1
   $ brctl addif br1 tap2
   $ brctl addif br1 tap3
   $ ip link set br1 up
   $ ifconfig br1 192.168.1.1 netmask 255.255.255.0
3.
Terminate and start all the virtual machines again using the following commands:
   primary:
   $ qemu-kvm -hda /data/vm-images/RHEL5.5-Server-20100322.0-i386-DVD.img_it797933_dhcpmaster -m 1000 -net nic,macaddr=DE:AD:BE:EF:25:2A -net tap,ifname=tap1,script=no,downscript=no -cdrom /data/install/RHEL5.5-Server-20100322.0-i386-DVD.iso -boot c
   secondary:
   $ qemu-kvm -hda /data/vm-images/RHEL5.5-Server-20100322.0-i386-DVD.img_it797933_dhcpslave -m 1000 -net nic,macaddr=DE:AD:BE:EF:25:2C -net tap,ifname=tap2,script=no,downscript=no -cdrom /data/install/RHEL5.5-Server-20100322.0-i386-DVD.iso -boot c
   client:
   $ qemu-kvm -hda /data/vm-images/RHEL5.5-Server-20100322.0-i386-DVD.img_it797933_dhcpclient -m 1000 -net nic,macaddr=DE:AD:BE:EF:25:2D -net tap,ifname=tap3,script=no,downscript=no -cdrom /data/install/RHEL5.5-Server-20100322.0-i386-DVD.iso -boot c
4. Manually set up the network on the primary:
   $ cat /etc/sysconfig/network-scripts/ifcfg-eth0
   DEVICE=eth0
   BOOTPROTO=none
   ONBOOT=yes
   IPADDR=192.168.1.210
   NETMASK=255.255.255.0
   GATEWAY=192.168.1.1
   HWADDR=DE:AD:BE:EF:25:2A
   $
5. Manually set up the network on the secondary:
   $ cat /etc/sysconfig/network-scripts/ifcfg-eth0
   DEVICE=eth0
   BOOTPROTO=none
   ONBOOT=yes
   IPADDR=192.168.1.211
   NETMASK=255.255.255.0
   GATEWAY=192.168.1.1
   HWADDR=DE:AD:BE:EF:25:2C
   $
6. You can also set up the network on the client manually, but a similar configuration should already be there automatically, so you don't have to; we will call dhclient directly anyway:
   $ cat /etc/sysconfig/network-scripts/ifcfg-eth0
   DEVICE=eth0
   BOOTPROTO=dhcp
   ONBOOT=yes
   HWADDR=DE:AD:BE:EF:25:2D
   $
7. Restart the machines or the network scripts.
8. Set up dhcpd on the primary (mclt is set to a low value intentionally to reproduce the issue in a short time.
The range of IP addresses is also short intentionally. Don't set a shorter range, because dhcpd contains another bug which prevents correct IP balancing when the range contains fewer than 5 addresses, leading to an empty backup queue; I will file this bug soon):
   $ cat /etc/dhcpd.conf
   ddns-update-style interim;
   ignore client-updates;

   failover peer "testicek" {
       primary;
       address 192.168.1.210;
       port 519;
       peer address 192.168.1.211;
       peer port 519;
       max-response-delay 60;
       max-unacked-updates 10;
       mclt 60;
       split 128;
       load balance max seconds 3;
   }

   include "/etc/dhcpd.conf.common";
   $
   $ cat /etc/dhcpd.conf.common
   default-lease-time 3600;
   max-lease-time 3600;
   ddns-update-style none;
   omapi-port 7911;

   subnet 192.168.1.0 netmask 255.255.255.0 {
       not authoritative;
       pool {
           deny dynamic bootp clients;
           range 192.168.1.5 192.168.1.10;
           option routers 192.168.1.1;
           option subnet-mask 255.255.255.0;
           failover peer "testicek";
       }
   }
   $
9. Set up dhcpd on the secondary:
   $ cat /etc/dhcpd.conf
   ddns-update-style interim;
   ignore client-updates;

   failover peer "testicek" {
       secondary;
       address 192.168.1.211;
       port 519;
       peer address 192.168.1.210;
       peer port 519;
       max-response-delay 60;
       max-unacked-updates 10;
       load balance max seconds 3;
   }

   include "/etc/dhcpd.conf.common";
   $
   $ cat /etc/dhcpd.conf.common
   default-lease-time 3600;
   max-lease-time 3600;
   ddns-update-style none;
   omapi-port 7911;

   subnet 192.168.1.0 netmask 255.255.255.0 {
       not authoritative;
       pool {
           deny dynamic bootp clients;
           range 192.168.1.5 192.168.1.10;
           option routers 192.168.1.1;
           option subnet-mask 255.255.255.0;
           failover peer "testicek";
       }
   }
   $
10. Start the dhcpd servers on both virtual machines (e.g. by executing the dhcpd init script):
   $ /etc/init.d/dhcpd start
11. Run the following on the client machine:
   $ /etc/init.d/network stop
   $ > ~/test_lease_file; ifconfig eth0 down; ifconfig eth0 hw ether de:ad:be:ef:25:a1;
   $ dhclient -d -lf ~/test_lease_file
   The above will assign the client an address.
12.
End dhclient by pressing Ctrl+c straight after the client gets an IP address, and terminate dhcpd on the machine which offered and acked the lease for the client:
   $ /etc/init.d/dhcpd stop
13. Run dhclient on the client machine 3 times, and each time dhclient gets an IP address, terminate it with Ctrl+c (note that the MAC address is unique every time):
   $ > ~/test_lease_file; ifconfig eth0 down; ifconfig eth0 hw ether de:ad:be:ef:25:a2;
   $ dhclient -d -1 -lf /var/lib/dhclient/some_name_for_lease_file
   wait and press Ctrl+c
   $ > ~/test_lease_file; ifconfig eth0 down; ifconfig eth0 hw ether de:ad:be:ef:25:a3;
   $ dhclient -d -1 -lf /var/lib/dhclient/some_name_for_lease_file
   wait and press Ctrl+c
   $ > ~/test_lease_file; ifconfig eth0 down; ifconfig eth0 hw ether de:ad:be:ef:25:a4;
   $ dhclient -d -1 -lf /var/lib/dhclient/some_name_for_lease_file
   wait and press Ctrl+c
   The above commands will occupy 3 leases from the backup queue, which uses up all leases assignable by the running dhcpd. If you run dhclient again you will see the following message in /var/log/messages:
   Jul 1 19:19:41 <hostname> dhcpd: DHCPDISCOVER from de:ad:be:ef:25:a5 via eth0: peer holds all free leases
14. Move the running dhcpd into the partner-down state by executing the following script on the server (if you want to compare the behaviour of dhcp-3.0.5 with dhcp-3.1 and 4.1 with the same setup, you will have to change 'set local-state = 1' to 'set local-state = 4'):
   $ cat change_state.sh
   #!/bin/sh
   omshell << EOF
   connect
   new failover-state
   set name = "testicek"
   open
   set local-state = 1
   update
   EOF
   $
   $ . change_state.sh
   After running the above script you will see the following message in /var/log/messages:
   Jul 1 19:19:48 <hostname> dhcpd: failover peer testicek: I move from communications-interrupted to partner-down
15.
Wait one minute for the MCLT to elapse and call dhclient on the client machine again:
   $ > ~/test_lease_file; ifconfig eth0 down; ifconfig eth0 hw ether de:ad:be:ef:25:a5;
   $ dhclient -d -1 -lf /var/lib/dhclient/some_name_for_lease_file

Actual results:
The client doesn't get an IP address from the pool of the previously terminated DHCP server.

Note: In the following results, IP addresses in the range 192.168.1.5 - 192.168.1.7 were load balanced to the primary and 192.168.1.8 - 192.168.1.10 were load balanced to the secondary (according to 'binding state' in the leases file).

primary:
$ dhcpd -d -f -lf ~/lala
Internet Systems Consortium DHCP Server V3.0.5-RedHat
Copyright 2004-2006 Internet Systems Consortium.
All rights reserved.
For info, please visit http://www.isc.org/sw/dhcp/
Wrote 0 leases to leases file.
Listening on LPF/eth0/de:ad:be:ef:25:2a/192.168.1/24
Sending on LPF/eth0/de:ad:be:ef:25:2a/192.168.1/24
Sending on Socket/fallback/fallback-net
failover peer testicek: I move from recover to startup
failover peer testicek: peer moves from unknown-state to recover
failover peer testicek: requesting full update from peer
failover peer testicek: I move from startup to recover
Sent update request all message to testicek
failover peer testicek: peer moves from recover to recover
failover peer testicek: requesting full update from peer
Sent update done message to testicek
Update request all from testicek: nothing pending
failover peer testicek: peer update completed.
failover peer testicek: I move from recover to recover-done
failover peer testicek: peer moves from recover to recover-done
failover peer testicek: I move from recover-done to normal
failover peer testicek: peer moves from recover-done to normal
pool 9a2f840 192.168.1/24 total 6 free 6 backup 0 lts -3
pool 9a2f840 192.168.1/24 total 6 free 6 backup 0 lts 3
pool 9a2f840 192.168.1/24 total 6 free 3 backup 3 lts 0
DHCPDISCOVER from de:ad:be:ef:25:a1 via eth0
DHCPOFFER on 192.168.1.7 to de:ad:be:ef:25:a1 via eth0
DHCPREQUEST for 192.168.1.7 (192.168.1.210) from de:ad:be:ef:25:a1 via eth0
DHCPACK on 192.168.1.7 to de:ad:be:ef:25:a1 via eth0
<Ctrl+c>
$

secondary:
$ dhcpd -d -f -lf ~/lala
Internet Systems Consortium DHCP Server V3.0.5-RedHat
Copyright 2004-2006 Internet Systems Consortium.
All rights reserved.
For info, please visit http://www.isc.org/sw/dhcp/
Wrote 0 leases to leases file.
Listening on LPF/eth0/de:ad:be:ef:25:2c/192.168.1/24
Sending on LPF/eth0/de:ad:be:ef:25:2c/192.168.1/24
Sending on Socket/fallback/fallback-net
failover peer testicek: I move from recover to startup
failover peer testicek: peer moves from unknown-state to recover
failover peer testicek: requesting full update from peer
failover peer testicek: I move from startup to recover
Sent update request all message to testicek
failover peer testicek: peer moves from recover to recover
failover peer testicek: requesting full update from peer
Sent update done message to testicek
Update request all from testicek: nothing pending
failover peer testicek: peer update completed.
failover peer testicek: I move from recover to recover-done
failover peer testicek: peer moves from recover to recover-done
failover peer testicek: I move from recover-done to normal
failover peer testicek: peer moves from recover-done to normal
pool 90bd800 192.168.1/24 total 6 free 6 backup 0 lts 3
pool response: 3 leases
pool 90bd800 192.168.1/24 total 6 free 3 backup 3 lts 0
DHCPDISCOVER from de:ad:be:ef:25:a1 via eth0: load balance to peer testicek
DHCPREQUEST for 192.168.1.7 (192.168.1.210) from de:ad:be:ef:25:a1 via eth0: lease owned by peer
peer testicek: disconnected
failover peer testicek: I move from normal to communications-interrupted
DHCPDISCOVER from de:ad:be:ef:25:a2 via eth0
DHCPOFFER on 192.168.1.8 to de:ad:be:ef:25:a2 via eth0
DHCPREQUEST for 192.168.1.8 (192.168.1.211) from de:ad:be:ef:25:a2 via eth0
DHCPACK on 192.168.1.8 to de:ad:be:ef:25:a2 via eth0
DHCPDISCOVER from de:ad:be:ef:25:a3 via eth0
DHCPOFFER on 192.168.1.9 to de:ad:be:ef:25:a3 via eth0
DHCPREQUEST for 192.168.1.9 (192.168.1.211) from de:ad:be:ef:25:a3 via eth0
DHCPACK on 192.168.1.9 to de:ad:be:ef:25:a3 via eth0
DHCPDISCOVER from de:ad:be:ef:25:a4 via eth0
DHCPOFFER on 192.168.1.10 to de:ad:be:ef:25:a4 via eth0
DHCPREQUEST for 192.168.1.10 (192.168.1.211) from de:ad:be:ef:25:a4 via eth0
DHCPACK on 192.168.1.10 to de:ad:be:ef:25:a4 via eth0
DHCPDISCOVER from de:ad:be:ef:25:a5 via eth0: peer holds all free leases
DHCPDISCOVER from de:ad:be:ef:25:a5 via eth0: peer holds all free leases
failover peer testicek: I move from communications-interrupted to partner-down
DHCPDISCOVER from de:ad:be:ef:25:a5 via eth0: peer holds all free leases <<<---
DHCPDISCOVER from de:ad:be:ef:25:a5 via eth0: peer holds all free leases <<<---
DHCPDISCOVER from de:ad:be:ef:25:a5 via eth0: peer holds all free leases <<<---

client:
$ > ~/lala; ifconfig eth0 down; ifconfig eth0 hw ether de:ad:be:ef:25:a1;
$ dhclient -d -lf ~/lala
Internet Systems Consortium DHCP Client V3.0.5-RedHat
Copyright 2004-2006
Internet Systems Consortium.
All rights reserved.
For info, please visit http://www.isc.org/sw/dhcp/
Listening on LPF/eth0/de:ad:be:ef:25:a1
Sending on LPF/eth0/de:ad:be:ef:25:a1
Sending on Socket/fallback
DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 8
DHCPOFFER from 192.168.1.210
DHCPREQUEST on eth0 to 255.255.255.255 port 67
DHCPACK from 192.168.1.210
bound to 192.168.1.7 -- renewal in 32 seconds.
<Ctrl+c>
$
$ > ~/lala; ifconfig eth0 down; ifconfig eth0 hw ether de:ad:be:ef:25:a2;
$ dhclient -d -lf ~/lala
...<snip>
DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 7
DHCPOFFER from 192.168.1.211
DHCPREQUEST on eth0 to 255.255.255.255 port 67
DHCPACK from 192.168.1.211
bound to 192.168.1.8 -- renewal in 32 seconds.
<Ctrl+c>
$
$ > ~/lala; ifconfig eth0 down; ifconfig eth0 hw ether de:ad:be:ef:25:a3;
$ dhclient -d -lf ~/lala
...<snip>
DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 5
DHCPOFFER from 192.168.1.211
DHCPREQUEST on eth0 to 255.255.255.255 port 67
DHCPACK from 192.168.1.211
bound to 192.168.1.9 -- renewal in 31 seconds.
<Ctrl+c>
$
$ > ~/lala; ifconfig eth0 down; ifconfig eth0 hw ether de:ad:be:ef:25:a4;
$ dhclient -d -lf ~/lala
...<snip>
DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 5
DHCPOFFER from 192.168.1.211
DHCPREQUEST on eth0 to 255.255.255.255 port 67
DHCPACK from 192.168.1.211
bound to 192.168.1.10 -- renewal in 25 seconds.
<Ctrl+c>
$
$ date -u
Thu Jul 1 18:19:13 UTC 2010
$ date -u
Thu Jul 1 18:22:54 UTC 2010
$
$ > ~/lala; ifconfig eth0 down; ifconfig eth0 hw ether de:ad:be:ef:25:a5;
$ dhclient -d -lf ~/lala
...<snip>
DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 8 <<<---
DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 12 <<<---
DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 19 <<<---
<Ctrl+c>
$

Expected results:
The client should be able to get an IP address from the pool of the previously terminated DHCP server.
secondary:
$ dhcpd -d -f -lf ~/lala
...<snip>
failover peer testicek: I move from communications-interrupted to partner-down
DHCPDISCOVER from de:ad:be:ef:25:a5 via eth0
DHCPOFFER on 192.168.1.6 to de:ad:be:ef:25:a5 via eth0
DHCPREQUEST for 192.168.1.6 (192.168.1.211) from de:ad:be:ef:25:a5 via eth0
DHCPACK on 192.168.1.6 to de:ad:be:ef:25:a5 via eth0 <<<---

client:
$ > ~/lala; ifconfig eth0 down; ifconfig eth0 hw ether de:ad:be:ef:25:a5;
$ dhclient -d -lf ~/lala
...<snip>
DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 3
DHCPOFFER from 192.168.1.211
DHCPREQUEST on eth0 to 255.255.255.255 port 67
DHCPACK from 192.168.1.211
bound to 192.168.1.6 -- renewal in 24 seconds.
<Ctrl+c>
$

Additional info:
This bug is caused by the fact that dhcpd 3.0.5 contains no code which, with failover in the partner-down state, would also assign addresses from the free queue. The attached patch fixes the bug. The patch was backported from upstream dhcp-3.1-ESV (http://ftp.isc.org/isc/dhcp/dhcp-3.1-ESV.tar.gz).
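The state change from step 14 can be wrapped in a small helper so that the peer name and the numeric state become parameters. This is only a sketch: the function names omshell_input and set_failover_state are made up for illustration, the omshell commands are the ones from the reproducer, and it assumes, as the reproducer does, that omshell can connect to the local server without a key.

```shell
#!/bin/sh
# Hypothetical helper names; the omshell commands are copied from step 14.

# Print the omshell commands that set local-state for a failover peer.
# $1 = failover peer name, $2 = numeric state
omshell_input() {
    cat <<EOF
connect
new failover-state
set name = "$1"
open
set local-state = $2
update
EOF
}

# Feed the commands to omshell; run this on the DHCP server itself.
set_failover_state() {
    omshell_input "$1" "$2" | omshell
}

# Example for dhcp-3.0.5, where this report uses 1 for partner-down:
#   set_failover_state testicek 1
```

According to step 14, the same request on dhcp-3.1 and 4.1 needs the value 4 instead of 1, so the state number is deliberately left to the caller.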
Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
Previously, moving the server from the "communications-interrupted" state to the "partner-down" state did not force the server to take over the partner's leases. Consequently, clients could not get an IP address from the pool of the previously terminated DHCP server. With this update, a failover server in the "partner-down" state is able to re-allocate leases to clients.
Tested with dhcp-3.0.5-29.el5 using the reproducer from comment 0. After stopping the first DHCP server, moving the second one into the partner-down state and waiting until the MCLT had expired, the DHCP client received a new IP address from the range that was originally load-balanced to the first DHCP server.

Moving to VERIFIED.
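A run like this can also be checked from the server log. The sketch below assumes the log format shown in comment 0 and counts the "peer holds all free leases" refusals logged after the transition to partner-down. The function name refusals_after_partner_down is hypothetical. Refusals logged before the MCLT has elapsed may be counted too, so a non-zero result is a reason to look at the timestamps, not proof of the bug.

```shell
#!/bin/sh
# Count "peer holds all free leases" messages that follow the move to
# partner-down. Reads the files given as arguments, or standard input.
refusals_after_partner_down() {
    awk '
        /I move from communications-interrupted to partner-down/ { seen = 1; next }
        seen && /peer holds all free leases/ { n++ }
        END { print n + 0 }
    ' "$@"
}

# Example: refusals_after_partner_down /var/log/messages
```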
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-1038.html