Bug 735103

Summary: dhcpd: failover: link startup timeout
Product: [Fedora] Fedora
Reporter: Rick Murphy <rmurphy>
Component: bind
Assignee: Adam Tkac <atkac>
Status: CLOSED ERRATA
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high
Docs Contact:
Priority: high
Version: 15
CC: atkac, jpopelka, ovasik
Target Milestone: ---
Target Release: ---
Hardware: All
OS: Unspecified
Whiteboard:
Fixed In Version: dnsperf-1.0.1.0-25.fc16
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2011-09-14 22:30:05 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Rick Murphy 2011-09-01 13:45:53 UTC
Description of problem:
I have recently upgraded my DHCP servers to Fedora 15. They are a failover pair operating between two systems now running Fedora 15. After the upgrade, the failover initialization fails, spitting out the following every 20 seconds:
dhcpd: failover: link startup timeout

This happens on both of the failover peers. I built ISC dhcpd 4.2.2 from source and replaced the dhcpd binary with the newly built one, and the errors ceased. This allowed the server pair to initialize failover properly.

Version-Release number of selected component (if applicable):
isc-dhcpd-4.2.1-P1

How reproducible:
Completely.

Steps to Reproduce:
1. Configure a failover pair according to the manpage
2. Start both dhcp servers
3. View the syslog
  
Actual results:
dhcpd: failover: link startup timeout
Peers do not enter 'normal' communications state

Expected results:
Normal communications.

Additional info:
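To confirm the symptom from a saved syslog, a small script can pull out the "link startup timeout" messages and report the spacing between them. This is only a sketch: the timestamp layout (traditional "Mon DD HH:MM:SS host ..." prefix), the host name dhcp1, the sample lines, and the function name are assumptions for illustration, not taken from the affected systems.

```python
import re
from datetime import datetime

# Message text quoted in this report (the "dhcpd:" tag is left out of the
# match because syslog may add a PID to it).
TIMEOUT_MSG = "failover: link startup timeout"

# Assumed traditional syslog prefix, e.g. "Sep  1 13:40:00 host ...".
STAMP = re.compile(r"^([A-Z][a-z]{2})\s+(\d{1,2}) (\d{2}:\d{2}:\d{2}) ")


def timeout_intervals(lines, year=2011):
    """Return the seconds between consecutive failover timeout messages."""
    times = []
    for line in lines:
        if TIMEOUT_MSG not in line:
            continue
        m = STAMP.match(line)
        if m is None:
            continue  # skip lines without a recognizable timestamp
        stamp = f"{year} {m.group(1)} {m.group(2)} {m.group(3)}"
        times.append(datetime.strptime(stamp, "%Y %b %d %H:%M:%S"))
    return [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]


# Hypothetical log excerpt, made up to exercise the function.
SAMPLE = [
    "Sep  1 13:40:00 dhcp1 dhcpd: failover: link startup timeout",
    "Sep  1 13:40:07 dhcp1 dhcpd: unrelated message",
    "Sep  1 13:40:20 dhcp1 dhcpd: failover: link startup timeout",
    "Sep  1 13:40:40 dhcp1 dhcpd: failover: link startup timeout",
]

if __name__ == "__main__":
    print(timeout_intervals(SAMPLE))
```

A steady run of intervals near 20 seconds would match the behaviour described above; an empty list means fewer than two timeout messages were found.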

Comment 1 Jiri Popelka 2011-09-05 11:11:45 UTC
Thank you for the report.
Would it be possible to rebuild the source RPM we have in Fedora 16?
It is also 4.2.2, but this way all our patches will be applied, so we'll see
whether the problem is in one of our patches or in 4.2.1.

1) Download
http://kojipkgs.fedoraproject.org/packages/dhcp/4.2.2/1.fc16/src/dhcp-4.2.2-1.fc16.src.rpm
2) su -c 'yum install rpmdevtools'
3) rpmdev-setuptree
4) rpmbuild --rebuild dhcp-4.2.2-1.fc16.src.rpm
5) cd ~/rpmbuild/RPMS/<your arch>
6) su -c 'yum --nogpgcheck localupdate *.rpm'
You can always downgrade with 'yum downgrade dhcp'

Alternatively, you can build ISC dhcpd 4.2.1 from source (as you already did with 4.2.2) and try it. That will also show us on which side (ISC or Fedora) the problem lies.

Thanks

Comment 2 Jiri Popelka 2011-09-05 13:13:43 UTC
I was able to reproduce the problem, so you can ignore the previous comment.

Comment 3 Rick Murphy 2011-09-06 11:59:05 UTC
I did attempt to build the fc16 source, but there are other dependencies that keep it from compiling:

In file included from ../includes/omapip/isclib.h:64:0,
                 from ../includes/dhcpd.h:95,
                 from bpf.c:35:
/usr/include/dns/client.h:146:19: error: unknown type name 'dns_client_t'
/usr/include/dns/client.h:149:37: error: unknown type name 'isc_appctx_t'
/usr/include/dns/client.h:151:28: error: unknown type name 'dns_client_t'

I suspect that the bind-lite-devel package needs to be updated as well, but I'll hold off on trying further now that you've reproduced it.

Comment 4 Jiri Popelka 2011-09-06 13:12:00 UTC
I also see this message repeating in the log:
../../../../lib/isc/unix/socket.c:891: epoll_ctl(DEL), 10: Bad file descriptor

This seems serious, because OMAPI (the omshell tool) is also not working; see
http://lists.fedoraproject.org/pipermail/users/2011-August/402745.html

That message comes from BIND, so I'm adding the BIND maintainer to CC.
Adam, does it ring a bell?
The easiest way to reproduce that message is to run 'omshell' and type the 'connect' command.
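The omshell reproduction could be scripted roughly as follows. This is a sketch under assumptions: that omshell accepts commands on standard input, and that the error text reaches omshell's own output (this report only says the message shows up in the log, so the check may need to be pointed at the syslog instead). The function names are made up for illustration.

```python
import shutil
import subprocess

# Two fragments of the BIND socket error quoted in this report.
EPOLL_SIGNATURE = ("epoll_ctl(DEL)", "Bad file descriptor")


def has_epoll_error(text):
    """True if any single line of text contains both signature fragments."""
    return any(
        all(part in line for part in EPOLL_SIGNATURE)
        for line in text.splitlines()
    )


def omshell_connect_output(binary="omshell", timeout=30):
    """Feed 'connect' to omshell on stdin and return stdout plus stderr.

    Returns None if the binary is not installed or the run times out.
    """
    path = shutil.which(binary)
    if path is None:
        return None
    try:
        proc = subprocess.run(
            [path],
            input="connect\n",
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return None
    return proc.stdout + proc.stderr


if __name__ == "__main__":
    output = omshell_connect_output()
    if output is None:
        print("omshell not available or timed out")
    else:
        print("epoll error seen:", has_epoll_error(output))
```

has_epoll_error() works on any text, so the same check can be applied to lines read from the dhcpd log.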

So far it seems that the problem is in Fedora's change (bug #637017) to dhcp, which allows us (since F15) to use the system BIND libraries instead of the bundled BIND libraries from the dhcp sources.
When I build dhcp (F15 branch) without those two patches (rh637017.patch, sharedlib.patch), everything (failover, OMAPI) works as expected.

I'm still investigating it.

Comment 5 Adam Tkac 2011-09-07 16:22:33 UTC
Reassigning to bind; this looks like a bind-libs-lite issue to me.

Comment 6 Fedora Update System 2011-09-09 11:29:42 UTC
dnsperf-1.0.1.0-25.fc16,dhcp-4.2.2-5.fc16,bind-dyndb-ldap-1.0.0-0.2.b1.fc16,bind-9.8.1-2.fc16 has been submitted as an update for Fedora 16.
https://admin.fedoraproject.org/updates/dnsperf-1.0.1.0-25.fc16,dhcp-4.2.2-5.fc16,bind-dyndb-ldap-1.0.0-0.2.b1.fc16,bind-9.8.1-2.fc16

Comment 7 Fedora Update System 2011-09-09 11:31:37 UTC
bind-9.8.1-1.fc15,bind-dyndb-ldap-1.0.0-0.2.b1.fc15,dhcp-4.2.1-11.P1.fc15,dnsperf-1.0.1.0-25.fc15 has been submitted as an update for Fedora 15.
https://admin.fedoraproject.org/updates/bind-9.8.1-1.fc15,bind-dyndb-ldap-1.0.0-0.2.b1.fc15,dhcp-4.2.1-11.P1.fc15,dnsperf-1.0.1.0-25.fc15

Comment 8 Fedora Update System 2011-09-09 15:09:16 UTC
Package dnsperf-1.0.1.0-25.fc16, dhcp-4.2.2-5.fc16, bind-dyndb-ldap-1.0.0-0.2.b1.fc16, bind-9.8.1-2.fc16:
* should fix your issue,
* was pushed to the Fedora 16 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing dnsperf-1.0.1.0-25.fc16 dhcp-4.2.2-5.fc16 bind-dyndb-ldap-1.0.0-0.2.b1.fc16 bind-9.8.1-2.fc16'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/dnsperf-1.0.1.0-25.fc16,dhcp-4.2.2-5.fc16,bind-dyndb-ldap-1.0.0-0.2.b1.fc16,bind-9.8.1-2.fc16
then log in and leave karma (feedback).

Comment 9 Rick Murphy 2011-09-12 02:05:01 UTC
I'm currently running Fedora 15. I've tried the suggested update command and have a "No match for argument .." for each of the suggested updates.
Changing fc16 to fc15 doesn't find any update packages either.

If you'll push these updates to the Fedora 15 testing repository, I'll give them a try. Otherwise, I'll wait until Fedora 16 release and assume the fixes will be incorporated.

Comment 10 Adam Tkac 2011-09-12 09:32:46 UTC
(In reply to comment #9)
> I'm currently running Fedora 15. I've tried the suggested update command and
> have a "No match for argument .." for each of the suggested updates.
> Changing fc16 to fc15 doesn't find any update packages either.
> 
> If you'll push these updates to the Fedora 15 testing repository, I'll give
> them a try. Otherwise, I'll wait until Fedora 16 release and assume the fixes
> will be incorporated.

It takes some time (one day) before all updates are propagated to the mirrors (push -> updates are on the master server -> updates are on the mirrors). Today I was able to fetch the updated bind-* and dhcp-* packages via the command written in comment #7.

Comment 11 Rick Murphy 2011-09-14 13:18:48 UTC
Updates installed and the problem is fixed. Comments left as requested.
Thanks, Adam.

Comment 12 Fedora Update System 2011-09-14 22:29:49 UTC
bind-9.8.1-1.fc15, bind-dyndb-ldap-1.0.0-0.2.b1.fc15, dhcp-4.2.1-11.P1.fc15, dnsperf-1.0.1.0-25.fc15 has been pushed to the Fedora 15 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 13 Fedora Update System 2011-09-23 04:02:23 UTC
dnsperf-1.0.1.0-25.fc16, dhcp-4.2.2-5.fc16, bind-dyndb-ldap-1.0.0-0.2.b1.fc16, bind-9.8.1-2.fc16 has been pushed to the Fedora 16 stable repository.  If problems still persist, please make note of it in this bug report.