Bug 1189154
Summary: | DNS errors after IPA upgrade due to broken ReplSync | ||||||
---|---|---|---|---|---|---|---|
Product: | Red Hat Enterprise Linux 7 | Reporter: | Scott Poore <spoore> | ||||
Component: | 389-ds-base | Assignee: | IPA Maintainers <ipa-maint> | ||||
Status: | CLOSED ERRATA | QA Contact: | Namita Soman <nsoman> | ||||
Severity: | high | Docs Contact: | |||||
Priority: | high | ||||||
Version: | 7.1 | CC: | drieden, jgalipea, jkurik, lkrispen, mbasti, mkosek, nhosoi, nkinder, pvoborni, pzhukov, rcritten, rmeggins, spoore, sramling, tbordaz, vashirov | ||||
Target Milestone: | rc | Keywords: | ZStream | ||||
Target Release: | --- | ||||||
Hardware: | Unspecified | ||||||
OS: | Unspecified | ||||||
Whiteboard: | |||||||
Fixed In Version: | 389-ds-base-1.3.4.0-1.el7 | Doc Type: | Bug Fix | ||||
Doc Text: |
Cause: During an upgrade, the servers must be stopped completely before the upgrade scripts run. However, the command "systemctl stop dirsrv.target", which was used to shut down all installed server instances, did not guarantee this.
Consequence: The upgrade script could fail.
Fix: During an upgrade, "systemctl stop" is now called for each server instance instead of "systemctl stop dirsrv.target". This ensures that the server has shut down when the command returns.
Result: The upgrade script no longer fails.
|
Story Points: | --- | ||||
Clone Of: | |||||||
: | 1195295 (view as bug list) | Environment: | |||||
Last Closed: | 2015-11-19 11:43:40 UTC | Type: | Bug | ||||
Regression: | --- | Mount Type: | --- | ||||
Documentation: | --- | CRM: | |||||
Verified Versions: | Category: | --- | |||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
Cloudforms Team: | --- | Target Upstream Version: | |||||
Embargoed: | |||||||
Bug Depends On: | |||||||
Bug Blocks: | 1195295 | ||||||
Attachments: |
|
Description
Scott Poore
2015-02-04 15:21:31 UTC
Waiting on Scott to provide a reproduction or more logs for the failure. This should not happen under normal conditions unless the Directory Server upgrade script failed and did not create cn=Content Synchronization,cn=plugins,cn=config.

Created attachment 988985 [details]
setup-ds.pl output
Content Synchronization did appear to be created on the master when upgraded:

[root@vm-idm-003 ~]# ldapsearch -x -D "cn=Directory Manager" -w Secret123 -b "cn=Content Synchronization,cn=plugins,cn=config"
# extended LDIF
#
# LDAPv3
# base <cn=Content Synchronization,cn=plugins,cn=config> with scope subtree
# filter: (objectclass=*)
# requesting: ALL
#

# Content Synchronization, plugins, config
dn: cn=Content Synchronization,cn=plugins,cn=config
objectClass: top
objectClass: nsSlapdPlugin
objectClass: extensibleObject
cn: Content Synchronization
nsslapd-pluginPath: libcontentsync-plugin
nsslapd-pluginInitfunc: sync_init
nsslapd-pluginType: object
nsslapd-pluginEnabled: off
nsslapd-plugin-depends-on-named: Retro Changelog Plugin
nsslapd-pluginId: none
nsslapd-pluginVersion: none
nsslapd-pluginVendor: none
nsslapd-pluginDescription: none

# search result
search: 2
result: 0 Success

# numResponses: 2
# numEntries: 1

What is the error from the ipaupgrade.log though? Is that not related here?

Ludwig investigated this bug, and he is pretty confident that it is a race condition between the shutdown of DS and running the update script.

Additional info from Ludwig's investigation:

The DS upgrade runs in offline mode, so it will stop the servers and call setup-ds.pl. If the server is still running, setup-ds exits and aborts the upgrade.

It looks like the 389-ds-base rpm handles stopping dirsrv before running the setup-ds.pl script:

/bin/systemctl stop dirsrv.target > $output 2>&1 || :
echo remove pid files . . . > $output 2>&1 || :
/bin/rm -f /var/run/dirsrv*.pid /var/run/dirsrv*.startpid
# do the upgrade
echo upgrading instances . . . > $output 2>&1 || :
/usr/sbin/setup-ds.pl -l $output -u -s General.UpdateMode=offline > $output 2>&1 || :

In a test on my machine I got:

shutting down all instances . . .
tail: /tmp/setup-389.log: file truncated
remove pid files . . .
tail: /tmp/setup-389.log: file truncated
upgrading instances . . .
tail: /tmp/setup-389.log: file truncated
[15/02/10:16:47:16] - [Setup] Info Error: offline mode selected but the server [slapd-elkris] is still running.
[15/02/10:16:47:16] - [Setup] Fatal Error: could not update the directory server.
[15/02/10:16:47:16] - [Setup] Fatal Exiting . . .

So the server should have been down, but it looks like it wasn't yet.

Could it be the same issue as https://fedorahosted.org/freeipa/ticket/4709#comment:6 ? systemctl stop dirsrv.target does not wait until instances are stopped.

I guess it's possible that it's related to that freeipa ticket about dirsrv.target not waiting till it's complete. I was able to successfully get several reproductions today using some slower VMs that I thought I'd seen the problem on more often. That seemed to work.

Is there a way I can modify dirsrv.target to force it to wait until everything stops before returning? If so, I can test that case against my slower VMs and see if it stops the error from occurring.

Thanks,
Scott

Hi Scott,

Ludwig provided fixes to the 389-ds-base.spec file. I've applied them with some modifications and created this scratch build.
http://brewweb.devel.redhat.com/brew/taskinfo?taskID=8734627

Would it be possible to run your upgrade test?
Thanks!
--noriko

Running tests now. I'll see how it goes. 20 runs should hopefully be enough to confirm, especially since I saw it in 3/10 from the last batch. It'll take me some time to get that done though. I'll let you know when it is.

Thanks,
Scott

I've been having issues with test servers. I'm still trying to get good clean tests here. I'll give an update tomorrow to see how my tests are running.

FYI, I've been working through some test and test host issues. I have, however, been able to get a lot of good runs so far with none of the DNS errors that I was seeing here. I have 4 final test runs going now. When those finish, if I don't see this error, I think we're good. I'll update the bug when I have those results.
Thanks,
Scott

Ok, my last job finally finished and I have not seen this error again with the test fix. I think we can go with that fix. Let me know if you need me to test something else for this.

The verification steps provided by Scott are quite straightforward to verify from IPA. We can do that from the DS QE team. Please share any other pointers to verify the same using standalone 389-ds-base.

(In reply to Sankar Ramalingam from comment #17)
> Verification steps provided by Scott is quite straight forward to verify
> from IPA. We can do that from DS QE team. Please share any other pointers to
> verify the same using standalone 389-ds-base.

The only difference is in the spec file that is used in the upgrade. Before the fix, the next operation started before the server shutdown had completed. The fix makes the shutdown wait, and only then does the next operation begin.

We added debug options to the fix (in the 7.1.z build, which hasn't been made yet). It might be useful to see the output you can get by doing:

export DEBUGPOSTTRANS=/path/to/upgrade_log_file
export DEBUGPOSTSETUPOPT=ddddd

Run rpm with the -U option. See /path/to/upgrade_log_file once the upgrade is done. You should be able to see this output in the log file. It does not prove that the shutdown was completed in each systemctl stop, but at least we can say the code was executed.

+ echo stopping instance $inst >> $output 2>&1 || :
+ /bin/systemctl stop $inst >> $output 2>&1 || :

Hit the bug twice (from two *fresh* installations) using 389-ds-base-1.3.3.1-15.el7_1.x86_64, where it's supposed to be fixed.

Steps to reproduce:
1) Install RHEL7
2) yum install ipa-server-3.3.3-28.el7_0.3.x86_64 bind bind-dyndb-ldap (need IPA 3.x because of a bug in RHEVM)
3) ipa-server-install --setup-dns (no errors)

dnsrecord-show works fine and all records are in place, but bind still doesn't resolve them.
The plugins are disabled because 389-ds was running at the moment the config file was modified.

Do you have the content sync plugin in the dse.ldif or not?

I had it, but it was disabled, as was the Retro Changelog Plugin. The workaround was to enable both manually.

Then it is a different problem. This bug was that a failing DS upgrade did not add the plugin to the dse.ldif. If you have it, disabled or not, then some other step in the IPA install is failing.

Scott, have you seen DNS errors recently in your upgrade tests?

Viktor,

So far we haven't seen this bug again when testing with RHEL7.2 upgrades. So, I think this can be closed. If we do run into this again during upgrade testing, we can reopen it. Let me know if you prefer I close this and I will.

Thanks,
Scott

Verified.

Version ::

389-ds-base-1.3.4.0-9.el7.x86_64
ipa-server-4.2.0-3.el7.x86_64

Results ::

So far we haven't seen these errors again with the patch.

(In reply to Scott Poore from comment #31)
> Verified.
>
> Version ::
>
> 389-ds-base-1.3.4.0-9.el7.x86_64
> ipa-server-4.2.0-3.el7.x86_64
>
> Results ::
>
> So far we haven't seen these errors again with the patch.

Thank you, Scott!

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-2351.html |
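For readers who want to see the idea behind the fix in one place, here is a minimal sketch of stopping each instance separately and confirming that its process is gone before an offline upgrade would start. It is not the scriptlet from the 389-ds-base spec file. The helper names wait_for_exit and stop_all_instances, the variables INSTBASE, RUNDIR and STOP_CMD, the dirsrv@NAME.service unit naming, and the slapd-NAME.pid path are all assumptions made for illustration.

```shell
#!/bin/sh
# Sketch only: stop each Directory Server instance separately and confirm
# that its process has exited before an offline upgrade would start.
# INSTBASE, RUNDIR and STOP_CMD are assumptions made for this example.
INSTBASE=${INSTBASE:-/etc/dirsrv}
RUNDIR=${RUNDIR:-/var/run/dirsrv}
STOP_CMD=${STOP_CMD:-/bin/systemctl stop}

# Return 0 once the process named in the pid file is gone, 1 on timeout.
# Arguments: pid file path, timeout in seconds (default 60).
wait_for_exit() {
    pidfile=$1
    timeout=${2:-60}
    [ -f "$pidfile" ] || return 0      # no pid file: nothing to wait for
    target_pid=$(cat "$pidfile")
    [ -n "$target_pid" ] || return 0   # empty pid file: nothing to wait for
    while kill -0 "$target_pid" 2>/dev/null; do
        [ "$timeout" -gt 0 ] || return 1
        sleep 1
        timeout=$((timeout - 1))
    done
    return 0
}

# Stop every instance found under INSTBASE; return 1 if any is still running.
stop_all_instances() {
    rc=0
    for dir in "$INSTBASE"/slapd-*; do
        [ -d "$dir" ] || continue      # the glob matched nothing
        name=${dir##*/slapd-}
        echo "stopping instance $name"
        # Unit name is an assumed convention, not taken from the spec file.
        $STOP_CMD "dirsrv@$name.service" || :
        if ! wait_for_exit "$RUNDIR/slapd-$name.pid" 60; then
            echo "instance $name is still running" >&2
            rc=1
        fi
    done
    return $rc
}
```

Under these assumptions, an upgrade step would call stop_all_instances and run setup-ds.pl in offline mode only if it returns 0. The explicit wait is a belt-and-braces check: the fix described in this bug relies on the per-instance "systemctl stop" itself returning only after the server has shut down.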