Red Hat Bugzilla – Bug 830314
ipa-replica-install named failed to start
Last modified: 2015-05-19 09:40:43 EDT
Description of problem:
The named service sometimes fails to start during ipa-replica-install. I've only seen this when two replicas are being installed at the same time during testing, and even then it was not consistent.
Version-Release number of selected component (if applicable):
How reproducible:
Somewhat, but not predictably as far as I've found yet.
Steps to Reproduce:
1. <setup IPA Master server on RHEL6.3>
2. <setup 2 replicas at the same time>
3. ipa-replica-install -U --setup-dns --forwarder=$DNSFORWARD -w $ADMINPW -p $ADMINPW /dev/shm/replica-info-$hostname_s.$DOMAIN.gpg
Actual results:
[1/8]: adding NS record to the zone
[2/8]: setting up reverse zone
[3/8]: setting up our own record
[4/8]: setting up kerberos principal
[5/8]: setting up named.conf
[6/8]: restarting named
named service failed to start
[7/8]: configuring named to start on boot
[8/8]: changing resolv.conf to point to ourselves
done configuring named.
Expected results:
named successfully starts
Additional info:
2012-06-08T14:47:12Z DEBUG [6/8]: restarting named
2012-06-08T14:47:12Z DEBUG args=/sbin/service named status
2012-06-08T14:47:12Z DEBUG stdout=named is stopped
2012-06-08T14:47:12Z DEBUG stderr=rndc: neither /etc/rndc.conf nor /etc/rndc.key was found
2012-06-08T14:47:12Z DEBUG Saving StateFile to '/var/lib/ipa/sysrestore/sysrestore.state'
2012-06-08T14:47:25Z DEBUG args=/sbin/service named restart
2012-06-08T14:47:25Z DEBUG stdout=Stopping named: [ OK ]
Generating /etc/rndc.key:[ OK ]
Starting named: [FAILED]
2012-06-08T14:47:25Z DEBUG stderr=
2012-06-08T14:47:25Z DEBUG duration: 12 seconds
Jun 8 10:47:25 beast named: bind to LDAP server failed: Timed out
Jun 8 10:47:25 beast named: loading configuration: failure
Jun 8 10:47:25 beast named: exiting (due to fatal error)
Nothing in /var/log/dirsrv/slapd-TESTRELM.COM/errors or access during this timeframe.
Nothing in krb5kdc.log during timeframe either.
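A minimal sketch of how an automated run could flag this specific failure from the syslog excerpt above: an LDAP bind timeout from named followed by a fatal exit. The function name and the optional "[pid]" handling are assumptions for illustration; only the two message strings come from the log lines in this report.

```python
import re

# Message texts copied from the syslog excerpt in this report.
# The optional "[pid]" after "named" is an assumption about other syslog formats.
LDAP_TIMEOUT = re.compile(r"named(\[\d+\])?: bind to LDAP server failed: Timed out")
FATAL_EXIT = re.compile(r"named(\[\d+\])?: exiting \(due to fatal error\)")


def named_failed_on_ldap_timeout(lines):
    """Return True if an LDAP bind timeout is later followed by a fatal exit."""
    saw_timeout = False
    for line in lines:
        if LDAP_TIMEOUT.search(line):
            saw_timeout = True
        elif saw_timeout and FATAL_EXIT.search(line):
            return True
    return False


sample = [
    "Jun 8 10:47:25 beast named: bind to LDAP server failed: Timed out",
    "Jun 8 10:47:25 beast named: loading configuration: failure",
    "Jun 8 10:47:25 beast named: exiting (due to fatal error)",
]
print(named_failed_on_ldap_timeout(sample))
```

A check like this only distinguishes the timeout case from other named start failures; it says nothing about why the directory server was slow.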
FYI, in some cases named crashes in automation too. It does not happen every time, only sometimes.
IMHO the root cause of this problem is somewhere in 389 DS. The directory server is not able to respond to an LDAP query within 10 seconds, so the query times out.
Is this possibly related?
Or do you still think this is DS related?
No, unfortunately it isn't related.
https://fedorahosted.org/bind-dyndb-ldap/ticket/84 is about a crash after a timeout. The timeout itself is caused by DS.
Problem described in this bug is:
timeout -> named is not able to read its configuration because of the timeout -> named exits because it does not know its own configuration.
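As an illustration of the chain above, a test harness could do a coarse readiness check on the directory server before restarting named. This is only a sketch under assumptions: wait_for_port is a hypothetical helper, and the host and port are placeholders (an IPA setup may have named reach the directory server over a local socket rather than TCP). A successful TCP connect shows only that something is listening, not that DS will answer an LDAP bind within named's timeout.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=1.0):
    """Poll until a TCP connection to host:port succeeds or the timeout elapses.

    Hypothetical helper, not part of ipa-replica-install.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # Connection succeeded: close it immediately and report ready.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


# Example use (placeholders, not executed here):
#   if wait_for_port("localhost", 389):
#       run "/sbin/service named restart"
```

Polling with a deadline gives a bounded wait instead of the fixed sleeps that tests often use, but it would not by itself address a directory server that accepts connections and then responds slowly.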
I can't manage to reproduce this bug. I get https://fedorahosted.org/freeipa/ticket/2950 sometimes, but never this one.
I'm trying to reproduce this again. I haven't seen it in a while, but I had put some delays in my tests that seemed to alleviate the problem. I've removed those. I think we may have added other delays in our automation, though, that may prevent this issue from occurring in our automated tests. I may have to try to reproduce it manually instead of using our tests.
I'll see what I can come up with.
This issue is not reproducible any more with the latest bits. Moving to QE to retest.
A week and a half or so ago I tried to reproduce this with no luck. I just retried 3 times with no luck. I'm closing this WORKSFORME since it seems that the issue has gone away. If we run into this again, I will reopen this case.
I do have a test in the ipa-replica-install test suite so we should catch this if it occurs again.