Bug 1189154

Summary: DNS errors after IPA upgrade due to broken ReplSync
Product: Red Hat Enterprise Linux 7 Reporter: Scott Poore <spoore>
Component: 389-ds-baseAssignee: IPA Maintainers <ipa-maint>
Status: CLOSED ERRATA QA Contact: Namita Soman <nsoman>
Severity: high Docs Contact:
Priority: high    
Version: 7.1CC: drieden, jgalipea, jkurik, lkrispen, mbasti, mkosek, nhosoi, nkinder, pvoborni, pzhukov, rcritten, rmeggins, spoore, sramling, tbordaz, vashirov
Target Milestone: rcKeywords: ZStream
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: 389-ds-base-1.3.4.0-1.el7 Doc Type: Bug Fix
Doc Text:
Cause: In the upgrade, before running the upgrade scripts, the servers should be stopped completely. But the command line "systemctl stop dirsrv.target" for shutting down all installed servers did not guarantee it. Consequence: It caused the upgrade script fail. Fix: In the upgrade, instead of "systemctl stop dirsrv.target", call "systemctl stop" for each server instance, which assures the server's shutdown when the command line returns. Result: There is no more upgrade script failure.
Story Points: ---
Clone Of:
: 1195295 (view as bug list) Environment:
Last Closed: 2015-11-19 11:43:40 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1195295    
Attachments:
Description Flags
setup-ds.pl output none

Description Scott Poore 2015-02-04 15:21:31 UTC
Description of problem:

Testing IPA Upgrades, I'm seeing DNS errors (and later resolution stops working).

I'm seeing a lot of named LDAP errors:

Feb  4 02:59:35 vm-idm-023 named[16726]: LDAP instance 'ipa' is being synchronized, please ignore message 'all zones loaded'
Feb  4 02:59:35 vm-idm-023 named[16726]: LDAP error: Critical extension is unavailable: unable to start SyncRepl session: is RFC 4533 supported by LDAP server?
Feb  4 02:59:35 vm-idm-023 named[16726]: ldap_syncrepl will reconnect in 60 seconds

Also, I just noticed that during the upgrade I see this:

Add failure missing required attribute "objectclass"

Looking in /var/log/ipaupgrade.log:

2015-02-03T20:52:12Z INFO New entry: cn=Content Synchronization,cn=plugins,cn=config
2015-02-03T20:52:12Z DEBUG ---------------------------------------------
2015-02-03T20:52:12Z DEBUG Initial value
2015-02-03T20:52:12Z DEBUG dn: cn=Content Synchronization,cn=plugins,cn=config
2015-02-03T20:52:12Z DEBUG only: set nsslapd-pluginEnabled to 'on', current value []
2015-02-03T20:52:12Z DEBUG only: updated value ['on']
2015-02-03T20:52:12Z DEBUG ---------------------------------------------
2015-02-03T20:52:12Z DEBUG Final value after applying updates
2015-02-03T20:52:12Z DEBUG dn: cn=Content Synchronization,cn=plugins,cn=config
2015-02-03T20:52:12Z DEBUG nsslapd-pluginEnabled:
2015-02-03T20:52:12Z DEBUG      on
2015-02-03T20:52:12Z ERROR Add failure missing required attribute "objectclass"
2015-02-03T20:52:12Z INFO Updating existing entry: cn=kerberos,dc=testrelm,dc=test

Thierry Bordaz found that this is a problem with adding ReplSync support:

"
Both error messages (are related to the same problem). There is a failure to ADD the ReplSync config entry.
[04/Feb/2015:02:22:11 +051800] conn=12 op=65 RESULT err=0 tag=101 nentries=1 etime=0
[04/Feb/2015:02:22:11 +051800] conn=12 op=66 SRCH base="cn=Content Synchronization,cn=plugins,cn=config" scope=0 filter="(objectClass=*)" attrs="aci * attributeTypes objectClasses"
[04/Feb/2015:02:22:11 +051800] conn=12 op=66 RESULT err=32 tag=101 nentries=0 etime=0
[04/Feb/2015:02:22:11 +051800] conn=12 op=67 ADD dn="cn=Content Synchronization,cn=plugins,cn=config"
[04/Feb/2015:02:22:12 +051800] conn=12 op=67 RESULT err=65 tag=105 nentries=0 etime=1

According to the returned message, the added entry was missing the 'objectclass' attribute and so the ADD failed.

The later in ipaupgrade errors were reported related to DNS/named, because it requires ReplSync support. But as ReplSync was not configured/enabled, 389-ds was failing to decode ReplSync controls

[04/Feb/2015:02:59:34 +051800] conn=100 op=3 SRCH base="(null)" scope=2 filter="(|(objectClass=idnsConfigObject)(objectClass=idnsZone)(objectClass=idnsForwardZone)(objectClass=idnsRecord))", failed to decode LDAP controls
[04/Feb/2015:02:59:34 +051800] conn=100 op=3 RESULT err=12 tag=101 nentries=0 etime=0


So this DNS errors are a different bug (not related to schema update). This new bug seems to be that the ADD 'content synchronization' plugin entry was missing objectclass
"

Version-Release number of selected component (if applicable):

ipa-server-4.1.0-16.el7.x86_64
389-ds-base-1.3.3.1-13.el7.x86_64

How reproducible:
often but, not 100% consistent.

Steps to Reproduce:
1.  Setup IPA Master on RHEL 7.0
2.  Setup IPA Replica on RHEL 7.0
3.  Upgrade IPA Master to RHEL 7.1
4.  Upgrade IPA Replica to RHEL 7.1
5.  dig <MASTER>

Actual results:
After upgrade DNS no longer works on one of the IPA servers.  And I see a lot of named ldap errors in /var/log/messages like listed above.

Expected results:
DNS works and resolves hostnames as expected with no ldap errors in log.

Additional info:

Comment 4 Martin Kosek 2015-02-05 15:25:55 UTC
Waiting on Scott, to provide reproduction or more logs for the failure. It should not happen under normal conditions, unless Directory Server upgrade script failed and did not create cn=Content Synchronization,cn=plugins,cn=config.

Comment 5 Scott Poore 2015-02-06 17:35:24 UTC
Created attachment 988985 [details]
setup-ds.pl output

Comment 6 Scott Poore 2015-02-06 21:01:17 UTC
Content Syncronization did appear to be created on the master when upgraded:

[root@vm-idm-003 ~]# ldapsearch -x -D "cn=Directory Manager" -w Secret123 -b "cn=Content Synchronization,cn=plugins,cn=config"
# extended LDIF
#
# LDAPv3
# base <cn=Content Synchronization,cn=plugins,cn=config> with scope subtree
# filter: (objectclass=*)
# requesting: ALL
#

# Content Synchronization, plugins, config
dn: cn=Content Synchronization,cn=plugins,cn=config
objectClass: top
objectClass: nsSlapdPlugin
objectClass: extensibleObject
cn: Content Synchronization
nsslapd-pluginPath: libcontentsync-plugin
nsslapd-pluginInitfunc: sync_init
nsslapd-pluginType: object
nsslapd-pluginEnabled: off
nsslapd-plugin-depends-on-named: Retro Changelog Plugin
nsslapd-pluginId: none
nsslapd-pluginVersion: none
nsslapd-pluginVendor: none
nsslapd-pluginDescription: none

# search result
search: 2
result: 0 Success

# numResponses: 2
# numEntries: 1

What is the error from the ipaupgrade.log though?  Is that not related here?

Comment 9 Martin Bašti 2015-02-12 09:11:44 UTC
Ludwig investigated this bug, and he is pretty confident that it is a race condition between shutdown of ds and running the update script.

Additional info from Ludwig's investigation:

The DS upgrade runs
in offline mode, so it will stop the servers and call setup-ds.pl. If
the server is still running setup-ds exits and aborts upgrade. 

It looks like 389-ds-base rpm handles stopping dirsrv before running the setup-ds.pl script:
/bin/systemctl stop dirsrv.target > $output 2>&1 || :
echo remove pid files . . . > $output 2>&1 || :
/bin/rm -f /var/run/dirsrv*.pid /var/run/dirsrv*.startpid
# do the upgrade
echo upgrading instances . . . > $output 2>&1 || :
/usr/sbin/setup-ds.pl -l $output -u -s General.UpdateMode=offline > $output 2>&1 || : 

in a test on my machine I got:
shutting down all instances . . .
tail: /tmp/setup-389.log: file truncated
remove pid files . . .
tail: /tmp/setup-389.log: file truncated
upgrading instances . . .
tail: /tmp/setup-389.log: file truncated
[15/02/10:16:47:16] - [Setup] Info Error: offline mode selected but the server [slapd-elkris] is still running.
[15/02/10:16:47:16] - [Setup] Fatal Error: could not update the directory server.
[15/02/10:16:47:16] - [Setup] Fatal Exiting . . .

so the server should have been down, but it looks like it wasn't yet

Comment 10 Martin Bašti 2015-02-12 09:58:27 UTC
Could it be the same issue as: https://fedorahosted.org/freeipa/ticket/4709#comment:6 ?

systemctl stop dirsrv.target does not wait until instances are stopped

Comment 11 Scott Poore 2015-02-13 01:29:47 UTC
I guess it's possible that it's related to that freeipa ticket about dirsrv.target not waiting till it's complete.

I was able to successfully get several reproductions today using some slower VMs that I thought I'd seen the problem on more often.  That seemed to work.

Is there a way I can modify dirsrv.target to force it to wait until everything stops before returning?

If so, I can test that case against my slower VMs and see if it stops the error from occurring.

Thanks,
Scott

Comment 12 Noriko Hosoi 2015-02-13 19:07:22 UTC
Hi Scott,

Ludwig provided fixes on 389-ds-base.spec file.  I've applied them with some modifications and created this scratch build.
http://brewweb.devel.redhat.com/brew/taskinfo?taskID=8734627

Could it be possible to run your upgrade test?

Thanks!
--noriko

Comment 13 Scott Poore 2015-02-13 21:12:37 UTC
Running tests now.  I'll see how it goes.  20 runs should hopefully be enough to confirm.  Especially since I saw in in 3/10 from the last batch.  It'll take me some time to get that done though.  I'll let you know when it is.

Thanks,
Scott

Comment 14 Scott Poore 2015-02-16 03:28:34 UTC
I've been having issues with test servers.  I'm still trying to get good clean tests here.  I'll give an update tomorrow to see how my tests are running.

Comment 15 Scott Poore 2015-02-18 16:43:27 UTC
FYI, I've been working through some test and test host issues.  I have however been able to get a lot of good runs so far with no DNS errors that I was seeing here.  I have 4 final test runs going now.  When those finish, if I don't see this error, I think we're good.

I'll update the bug when I have those results.

Thanks,
Scott

Comment 16 Scott Poore 2015-02-18 20:53:28 UTC
Ok, my last job finally finished and I have not seen this error again with the test fix.  I think we can go with that fix.

Let me know if you need me to test something else for this.

Comment 17 Sankar Ramalingam 2015-02-20 13:39:51 UTC
Verification steps provided by Scott is quite straight forward to verify from IPA. We can do that from DS QE team. Please share any other pointers to verify the same using standalone 389-ds-base.

Comment 18 Noriko Hosoi 2015-02-20 17:35:44 UTC
(In reply to Sankar Ramalingam from comment #17)
> Verification steps provided by Scott is quite straight forward to verify
> from IPA. We can do that from DS QE team. Please share any other pointers to
> verify the same using standalone 389-ds-base.

There's only a difference is in the spec file which is being used in the upgrade.  Before the fix, the next operation has started before shutting down the server was not completed.  The fix makes the shutdown wait, then the next operation begins.

We added debug options to the fix (in the 7.1.z build which hasn't been made yet), it might be useful to see the output you can get by doing...
export DEBUGPOSTTRANS=/path/to/upgrade_log_file
export DEBUGPOSTSETUPOPT=ddddd
Run rpm with -U option.
See /path/to/upgrade_log_file once the upgrade is done.

You should be able to see this output in the log file.  It does not prove the shutdown was completed in each systemctl stop, but at least we could say the code was executed.
+    echo stopping instance $inst >> $output 2>&1 || :
+    /bin/systemctl stop $inst >> $output 2>&1 || :

Comment 23 Pavel Zhukov 2015-03-18 08:45:16 UTC
Hit the bug twice (from two *fresh* installation) using 389-ds-base-1.3.3.1-15.el7_1.x86_64 where it's supposed to be fixed

Step to reproduce:
1) Install RHEL7
2) yum install ipa-server-3.3.3-28.el7_0.3.x86_64 bind bind-dyndb-ldap (need IPA3.X because of bug in RHEVM )
3) ipa-server-install --setup-dns (no errors)

dnsrecord-show works find and all records are in place but bind still doesn't resolve them. Plugins is disabled because 289ds was running at the moment of config file was modified

Comment 24 Ludwig 2015-03-18 09:12:47 UTC
do you have the content sync plugin in the dse.ldif or not ?

Comment 25 Pavel Zhukov 2015-03-18 09:23:46 UTC
I had it but it was disabled as well as Retro Changelog Plugin. 
Workaround was to enable both manually.

Comment 26 Ludwig 2015-03-18 09:26:43 UTC
then it is  a different problem, this bug was taht a failing DS upgrade did not add the plugin to the dse.ldif.

If you have it, disabled or not, then some other step in ipa install is failing

Comment 28 Viktor Ashirov 2015-07-21 11:32:25 UTC
Scott, have you seen recently DNS errors in your upgrade tests?

Comment 29 Scott Poore 2015-08-06 18:05:04 UTC
Viktor,

So far we haven't seen this bug again when testing with RHEL7.2 upgrades.

So, I think this can be closed.  If we do run into this again during upgrade testing, we can reopen it.

Let me know if you prefer I close this and I will.

Thanks,
Scott

Comment 31 Scott Poore 2015-08-10 19:13:47 UTC
Verified.

Version ::

389-ds-base-1.3.4.0-9.el7.x86_64
ipa-server-4.2.0-3.el7.x86_64

Results ::

So far we haven't seen these errors again with the patch.

Comment 32 Viktor Ashirov 2015-08-19 08:49:05 UTC
(In reply to Scott Poore from comment #31)
> Verified.
> 
> Version ::
> 
> 389-ds-base-1.3.4.0-9.el7.x86_64
> ipa-server-4.2.0-3.el7.x86_64
> 
> Results ::
> 
> So far we haven't seen these errors again with the patch.

Thank you, Scott!

Comment 33 errata-xmlrpc 2015-11-19 11:43:40 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-2351.html