Bug 830314 - ipa-replica-install named failed to start
Status: CLOSED WORKSFORME
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: ipa
Version: 6.3
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assigned To: Rob Crittenden
QA Contact: Namita Soman
Reported: 2012-06-08 16:11 EDT by Scott Poore
Modified: 2015-05-19 09:40 EDT
Doc Type: Bug Fix
Last Closed: 2012-09-24 13:57:12 EDT
Type: Bug
Description Scott Poore 2012-06-08 16:11:15 EDT
Description of problem:
The named service sometimes fails to start during ipa-replica-install.  I have only seen this in testing when two replicas are being installed at the same time, and even then it does not happen consistently.

Version-Release number of selected component (if applicable):
ipa-server-2.2.0-16.el6.x86_64
bind-9.8.2-0.10.rc1.el6.x86_64
bind-dyndb-ldap-1.1.0-0.9.b1.el6.x86_64
389-ds-base-1.2.10.2-15.el6.x86_64

How reproducible:
Intermittent. I have not yet found a way to trigger it predictably.

Steps to Reproduce:
1. <setup IPA Master server on RHEL6.3>
2. <setup 2 replicas at the same time>
3. ipa-replica-install -U --setup-dns --forwarder=$DNSFORWARD -w $ADMINPW -p $ADMINPW /dev/shm/replica-info-$hostname_s.$DOMAIN.gpg
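
Step 2 depends on both installs overlapping in time. A minimal sketch of how a test harness might start them together is shown here. The helper name `run_concurrently` and the idea of wrapping each install in a remote command are assumptions for illustration, not part of the IPA test suite.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_concurrently(commands):
    """Start all commands at roughly the same time; return exit codes in input order."""
    def run(cmd):
        # Each command is an argument list, so no shell quoting is involved.
        return subprocess.run(cmd, capture_output=True, text=True).returncode

    # One worker per command so that none of them waits for another to finish.
    with ThreadPoolExecutor(max_workers=max(1, len(commands))) as pool:
        return list(pool.map(run, commands))


# Hypothetical usage: each entry would be the full command that runs
# ipa-replica-install on one replica (for example through ssh).
# exit_codes = run_concurrently([cmd_for_replica1, cmd_for_replica2])
```

A non-zero exit code is not enough to detect this bug, because the installer continues past the failed named restart; the install log still has to be checked for "named service failed to start".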
  
Actual results:

Configuring named:
  [1/8]: adding NS record to the zone
  [2/8]: setting up reverse zone
  [3/8]: setting up our own record
  [4/8]: setting up kerberos principal
  [5/8]: setting up named.conf
  [6/8]: restarting named
named service failed to start
  [7/8]: configuring named to start on boot
  [8/8]: changing resolv.conf to point to ourselves
done configuring named.

Expected results:

named successfully starts

Additional info:

From /var/log/ipareplica-install.log:
...

2012-06-08T14:47:12Z DEBUG   [6/8]: restarting named
2012-06-08T14:47:12Z DEBUG args=/sbin/service named status 
2012-06-08T14:47:12Z DEBUG stdout=named is stopped

2012-06-08T14:47:12Z DEBUG stderr=rndc: neither /etc/rndc.conf nor /etc/rndc.key was found

2012-06-08T14:47:12Z DEBUG Saving StateFile to '/var/lib/ipa/sysrestore/sysrestore.state'
2012-06-08T14:47:25Z DEBUG args=/sbin/service named restart 
2012-06-08T14:47:25Z DEBUG stdout=Stopping named: [  OK  ]
Generating /etc/rndc.key:[  OK  ]
Starting named: [FAILED]

2012-06-08T14:47:25Z DEBUG stderr=
2012-06-08T14:47:25Z DEBUG   duration: 12 seconds
...

From /var/log/messages:
...
Jun  8 10:47:25 beast named[17512]: bind to LDAP server failed: Timed out
Jun  8 10:47:25 beast named[17512]: loading configuration: failure
Jun  8 10:47:25 beast named[17512]: exiting (due to fatal error)
...

Nothing in /var/log/dirsrv/slapd-TESTRELM.COM/errors or access during this timeframe.

Nothing in krb5kdc.log during timeframe either.
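
For automated runs, the /var/log/messages signature above can be picked out with a small filter. This is only a sketch: the function name is hypothetical, and the pattern is copied from the single log excerpt in this report, so other versions of named or bind-dyndb-ldap may word the message differently.

```python
import re

# Message text taken from the /var/log/messages excerpt in this report.
LDAP_TIMEOUT = re.compile(r"named\[(\d+)\]: bind to LDAP server failed: Timed out")


def find_ldap_bind_timeouts(lines):
    """Return the PIDs of named processes that logged an LDAP bind timeout."""
    pids = []
    for line in lines:
        match = LDAP_TIMEOUT.search(line)
        if match:
            pids.append(int(match.group(1)))
    return pids
```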
Comment 4 Rob Crittenden 2012-06-12 10:08:32 EDT
Upstream ticket:
https://fedorahosted.org/freeipa/ticket/2830
Comment 5 Scott Poore 2012-06-14 11:28:00 EDT
FYI, in some cases named also crashes in automation.  It does not happen every time, only occasionally.
Comment 8 Petr Spacek 2012-06-25 08:02:21 EDT
IMHO the root cause of this problem is somewhere in 389 DS. The directory server is not able to respond to an LDAP query within 10 seconds, so the query times out.
Comment 9 Scott Poore 2012-07-12 12:16:45 EDT
Is this possibly related?

https://fedorahosted.org/bind-dyndb-ldap/ticket/84

Or do you still think this is DS related?
Comment 10 Petr Spacek 2012-07-13 03:40:58 EDT
No, unfortunately it isn't related.

https://fedorahosted.org/bind-dyndb-ldap/ticket/84 is about a crash after a timeout. The timeout itself is caused by DS.

The problem described in this bug is:
timeout -> named is not able to read its configuration because of the timeout -> named exits because it doesn't know its own configuration.
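
Given that chain, one caller-side workaround in test automation is to wait until the directory server answers before restarting named. The sketch below is an assumption about how that could look, not what ipa-replica-install does. The names `wait_until` and `ldap_port_open` are hypothetical, and the probe only checks that TCP port 389 accepts a connection, which is weaker than a completed LDAP bind and may not match the URI that named actually uses.

```python
import socket
import time


def wait_until(probe, timeout=60.0, interval=2.0):
    """Call probe() until it returns True or `timeout` seconds have elapsed."""
    deadline = time.monotonic() + timeout
    while True:
        if probe():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


def ldap_port_open(host="localhost", port=389, connect_timeout=5.0):
    """True if a TCP connection to the LDAP port succeeds (no bind is attempted)."""
    try:
        with socket.create_connection((host, port), timeout=connect_timeout):
            return True
    except OSError:
        return False


# Hypothetical usage before restarting named:
# if not wait_until(ldap_port_open, timeout=60.0):
#     print("directory server did not become reachable in time")
```

This only hides the symptom. If DS really takes longer than 10 seconds to answer, that still needs to be investigated on the DS side.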
Comment 11 Petr Viktorin 2012-08-22 18:13:52 EDT
I can't manage to reproduce this bug. I get https://fedorahosted.org/freeipa/ticket/2950 sometimes, but never this one.
Comment 13 Scott Poore 2012-08-31 11:24:18 EDT
I'm trying to reproduce this again.  I haven't seen it in a while, but I had put some delays in my tests that seemed to alleviate the problem.  I've removed those.  I think we may have added other delays in our automation, though, that may prevent this issue from occurring in our automated tests.  I may have to try to reproduce it manually instead of using our tests.

I'll see what I can come up with.

Thanks
Comment 14 Dmitri Pal 2012-09-24 10:18:15 EDT
This issue is not reproducible any more with the latest bits. Moving to QE to retest.
Comment 15 Scott Poore 2012-09-24 13:57:12 EDT
A week and a half or so ago I tried to reproduce this with no luck.  I just retried 3 times with no luck.  I'm closing this WORKSFORME since it seems that the issue has gone away.  If we run into this again, I will reopen this case.

I do have a test in the ipa-replica-install test suite so we should catch this if it occurs again.
