Bug 1947762

Summary: IPA server does not start
Product: [Fedora] Fedora
Reporter: Dirk <dirk.streubel>
Component: 389-ds-base
Assignee: thierry bordaz <tbordaz>
Status: CLOSED ERRATA
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 34
CC: abokovoy, awilliam, mreynolds, progier, robatino, spichugi, SpikeFedora, tbordaz, vashirov
Target Milestone: ---
Keywords: TestCaseProvided
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard: openqa sync-to-jira
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2021-07-02 21:54:35 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Bug Depends On:
Bug Blocks: 1956560

Description Dirk 2021-04-09 07:52:57 UTC

On Fedora 34, I installed an update last night. After rebooting the system, the IPA service does not start:

[root@ipa2 ~]# systemctl status ipa
× ipa.service - Identity, Policy, Audit
     Loaded: loaded (/usr/lib/systemd/system/ipa.service; enabled; vendor preset: disabled)
     Active: failed (Result: exit-code) since Fri 2021-04-09 09:17:14 CEST; 5s ago
    Process: 1092 ExecStart=/usr/sbin/ipactl start (code=exited, status=1/FAILURE)
   Main PID: 1092 (code=exited, status=1/FAILURE)
        CPU: 1.444s

Apr 09 09:17:06 ipa2.linux.schnell.er ipactl[1092]: Assuming stale, cleaning and proceeding
Apr 09 09:17:07 ipa2.linux.schnell.er ipactl[1092]: Failed to read data from service file: Failed to get list of services to probe status!
Apr 09 09:17:07 ipa2.linux.schnell.er ipactl[1092]: Configured hostname 'ipa2.linux.schnell.er' does not match any master server in LDAP:
Apr 09 09:17:07 ipa2.linux.schnell.er ipactl[1092]: No master found because of error: no such entry
Apr 09 09:17:07 ipa2.linux.schnell.er ipactl[1092]: Shutting down
Apr 09 09:17:14 ipa2.linux.schnell.er ipactl[1092]: Starting Directory Service
Apr 09 09:17:14 ipa2.linux.schnell.er systemd[1]: ipa.service: Main process exited, code=exited, status=1/FAILURE
Apr 09 09:17:14 ipa2.linux.schnell.er systemd[1]: ipa.service: Failed with result 'exit-code'.
Apr 09 09:17:14 ipa2.linux.schnell.er systemd[1]: Failed to start Identity, Policy, Audit.
Apr 09 09:17:14 ipa2.linux.schnell.er systemd[1]: ipa.service: Consumed 1.444s CPU time.
[root@ipa2 ~]# ipactl stop
Stopping Directory Service
ipa: INFO: The ipactl command was successful

[root@ipa2 ~]# ipactl start
Starting Directory Service
Failed to read data from service file: Failed to get list of services to probe status!
Configured hostname 'ipa2.linux.schnell.er' does not match any master server in LDAP:
No master found because of error: no such entry
Shutting down

After downgrading 389-ds-base-libs and 389-ds-base from 2.0.4-1.fc34 to 2.0.3-3.fc34 everything is working fine.

Comment 1 Adam Williamson 2021-04-09 19:54:10 UTC
We've seen the same problem in openQA testing. We have tests in openQA that deploy a domain controller (and client) as an earlier release, then upgrade to the release under test and check the controller still works; that type of test failed on the 389-ds-base-2.0.4-1.fc34 update, where it tested upgrade from current F33 to F34 with the update included:


the same type of test also failed for F33 to F35 upgrade on today's Rawhide compose, where 389-ds-base-2.0.4-1.fc35 landed:


The test for update from F34 to Rawhide did not fail, I believe because it actually started with 389-ds-base 2.0.4-1.fc34 (looks like that test runs with updates-testing enabled, which it probably shouldn't).

ab wondered if this could be caused by an unclean shutdown during upgrade. The openQA tests upgrade according to the official guidelines: they run `dnf --releasever=XX system-upgrade download` then `dnf system-upgrade reboot`. They make no attempt to trigger any reboot 'manually' after that; they rely on the upgrade process to boot, run the upgrade, and then reboot. If that doesn't happen - i.e. if the system does not eventually return to a login screen without further action on the part of the test system - the test just times out and dies.

Neither the video nor the log of a failed test shows any unclean shutdown. You can download https://openqa.fedoraproject.org/tests/849781/file/role_deploy_domain_controller_check-var_log.tar.gz and verify this; running `journalctl -b-1` on the journal it contains shows the log messages from the upgrade boot, which show the IPA upgrade script attempting to run during the upgrade but failing in this way, followed by a clean completion of the upgrade process and a clean shutdown.

Comment 2 thierry bordaz 2021-05-04 15:53:23 UTC
The problem seems to have been introduced by Issue 4469 - Backend redesign phase 3a (https://directory.fedoraproject.org/docs/389ds/design/backend-redesign-phase3.html).
More specifically, #4469 changed how the entryrdn database is accessed. After the upgrade the entryrdn database looks valid, but the upgraded DS cannot rebuild a DN from it.
To process any request, the server needs to fetch entries by their DN. It uses the entryrdn database for this, but the lookup fails. The reason for the failure has not been identified yet.
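
To make the failure mode concrete, here is a toy sketch of what "rebuilding a DN" from an RDN index means. It is only an illustration under stated assumptions: the record layout, the names (rdn_record, toy_index, rebuild_dn) and the sample entries are hypothetical and do not reflect the real on-disk entryrdn format, which this bug does not describe. The idea shown is that each record links an entry ID to its RDN and its parent, and the DN is assembled by walking up to the suffix. If the index cannot be read, the very first lookup fails and no DN can be built.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical record: entry ID, parent ID (0 = no parent), and RDN. */
struct rdn_record {
    unsigned id;
    unsigned parent_id;
    const char *rdn;
};

/* Hypothetical sample data, loosely modelled on an IPA tree. */
const struct rdn_record toy_index[] = {
    {1, 0, "dc=ipa,dc=test"},
    {2, 1, "cn=accounts"},
    {3, 2, "cn=users"},
    {4, 3, "uid=admin"},
};

/* Linear search for a record by entry ID; NULL if it is not in the index. */
const struct rdn_record *toy_lookup(const struct rdn_record *index,
                                    size_t n, unsigned id)
{
    for (size_t i = 0; i < n; i++) {
        if (index[i].id == id)
            return &index[i];
    }
    return NULL;
}

/* Walk from the entry up to the suffix, appending each RDN.
 * Returns 0 on success, -1 if a link is missing, the chain is
 * suspiciously deep, or the output buffer is too small. */
int rebuild_dn(const struct rdn_record *index, size_t n, unsigned id,
               char *out, size_t outlen)
{
    size_t used = 0;

    assert(out != NULL);
    if (outlen == 0)
        return -1;
    out[0] = '\0';

    for (int hops = 0; id != 0; hops++) {
        if (hops > 64)              /* guard against a cyclic index */
            return -1;
        const struct rdn_record *rec = toy_lookup(index, n, id);
        if (rec == NULL)            /* index unreadable or entry missing */
            return -1;
        int w = snprintf(out + used, outlen - used, "%s%s",
                         used ? "," : "", rec->rdn);
        if (w < 0 || (size_t)w >= outlen - used)
            return -1;
        used += (size_t)w;
        id = rec->parent_id;
    }
    return used ? 0 : -1;
}
```

Calling rebuild_dn(toy_index, 4, 4, buf, sizeof buf) walks from uid=admin up to the suffix, while passing an index of size 0 (standing in for an index that could not be opened) returns -1, which is comparable to the "no such entry" errors above.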

Comment 3 Adam Williamson 2021-05-10 18:52:11 UTC
*** Bug 1956560 has been marked as a duplicate of this bug. ***

Comment 4 Adam Williamson 2021-05-10 18:52:50 UTC
https://bodhi.fedoraproject.org/updates/FEDORA-2021-9f1db7e096 has introduced this to F34 again.

Comment 5 Dirk 2021-05-11 06:12:12 UTC
Just for information: with the latest package, 389-ds-base-2.0.4-2 from the updates-testing repo, I have the same problem. After downgrading to 389-ds-base-2.0.3, everything works fine.

Comment 6 thierry bordaz 2021-05-11 13:10:24 UTC
The root cause of the search failure is that the databases are opened with the wrong file extension: '.db' in 2.0.3 but '.db4' in 2.0.4.
It should not be '.db4', so I guess it is due to an invalid build. The extension should be '.db' for any DB_VERSION of 5.0 or later (taken from db.h). Investigation continues.

Comment 7 Pierre Rogier 2021-05-12 08:17:02 UTC
Indeed, it is a side effect of moving the bdb dependencies out of the backend:
db.h is no longer included in back-ldbm.h,
so DB_VERSION_MAJOR is no longer defined and
#if 1000 * DB_VERSION_MAJOR + 100 * DB_VERSION_MINOR >= 5000 became false (an undefined macro evaluates to 0 in an #if expression).
We should move that check into the db-bdb plugin and define a callback to get the filename suffix from the plugin.
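
The preprocessor behaviour behind this can be reproduced in isolation. The following is a minimal sketch, not the 389-ds-base source: the macro and function names (UNGUARDED_SUFFIX, VERSION_MACROS_VISIBLE, unguarded_suffix, version_macros_visible, self_check) are made up for illustration. It deliberately does not include db.h, so the version macros are undefined, the unguarded #if quietly evaluates 0 >= 5000, and the '.db4' branch is chosen without any compile error.

```c
#include <assert.h>
#include <string.h>

/* This file deliberately does NOT include <db.h>, so DB_VERSION_MAJOR
 * and DB_VERSION_MINOR are undefined, as in the broken build. */

/* Unguarded check: the preprocessor replaces each undefined identifier
 * with 0, so the condition is 0 >= 5000 and the #else branch is taken. */
#if 1000 * DB_VERSION_MAJOR + 100 * DB_VERSION_MINOR >= 5000
#define UNGUARDED_SUFFIX ".db"
#else
#define UNGUARDED_SUFFIX ".db4"
#endif

/* Guarded check: test that the macros exist before relying on them.
 * A real build would more likely use #error in the #else branch; this
 * sketch only records the problem so that the file still compiles. */
#if defined(DB_VERSION_MAJOR) && defined(DB_VERSION_MINOR)
#define VERSION_MACROS_VISIBLE 1
#else
#define VERSION_MACROS_VISIBLE 0
#endif

/* Suffix selected by the unguarded check. */
const char *unguarded_suffix(void)
{
    return UNGUARDED_SUFFIX;
}

/* 1 if the version macros were visible at compile time, else 0. */
int version_macros_visible(void)
{
    return VERSION_MACROS_VISIBLE;
}

/* Confirms the silent fallback when db.h is not included. */
void self_check(void)
{
    assert(strcmp(unguarded_suffix(), ".db4") == 0);
    assert(version_macros_visible() == 0);
}
```

Compiling with GCC's -Wundef option makes the unguarded #if produce a warning, which is one way such a regression could be caught at build time.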

To track the upstream change, I opened issue #4765: database suffix unexpectedly changed from .db to .db4
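
The callback approach suggested in this comment could look roughly like the sketch below. It is an assumption-laden illustration, not the fix that was actually pushed: the struct db_plugin_ops, the functions bdb_get_filename_suffix and index_filename, and the SKETCH_DB_VERSION_* macros are all hypothetical. In a real plugin the version would come from db.h; here it is hard-coded to 5.3 purely so that the example is self-contained.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical plugin interface: the generic backend does not see db.h,
 * so it asks the database plugin for the filename suffix at run time. */
struct db_plugin_ops {
    const char *(*get_filename_suffix)(void);
};

/* Stand-ins for the macros that db.h would provide inside the plugin.
 * The values are assumed for illustration only. */
#define SKETCH_DB_VERSION_MAJOR 5
#define SKETCH_DB_VERSION_MINOR 3

/* Plugin side: the version check lives where the macros are defined. */
static const char *bdb_get_filename_suffix(void)
{
#if 1000 * SKETCH_DB_VERSION_MAJOR + 100 * SKETCH_DB_VERSION_MINOR >= 5000
    return ".db";
#else
    return ".db4";
#endif
}

const struct db_plugin_ops bdb_ops = { bdb_get_filename_suffix };

/* Backend side: build "<index><suffix>" using the plugin callback.
 * Returns 0 on success, -1 if the buffer is too small. */
int index_filename(const struct db_plugin_ops *ops, const char *index,
                   char *out, size_t outlen)
{
    assert(ops != NULL && ops->get_filename_suffix != NULL);
    int w = snprintf(out, outlen, "%s%s", index, ops->get_filename_suffix());
    return (w >= 0 && (size_t)w < outlen) ? 0 : -1;
}
```

With this split, the backend never evaluates the version macros itself, so dropping db.h from a backend header can no longer change the suffix silently.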

Comment 8 thierry bordaz 2021-05-13 13:43:14 UTC
Fix pushed upstream => POST

Comment 9 thierry bordaz 2021-05-20 09:52:44 UTC
An upstream build [1] (389-ds-base-2.0.4-3.fc35) fixes this issue. It contains the fix for [2] on top of 2.0.4-1.

[1] https://koji.fedoraproject.org/koji/buildinfo?buildID=1749065
[2] https://github.com/389ds/389-ds-base/issues/4765

Comment 11 thierry bordaz 2021-05-20 11:28:57 UTC
TESTCASE from frenaud

# dnf update -y
# dnf install -y freeipa-server freeipa-server-dns koji
# mkdir /tmp/pkgs; cd /tmp/pkgs

# koji download-build --arch x86_64 --arch noarch 1733367  <--- this is the failing build 2.0.4-1
# koji download-build --debuginfo --arch x86_64 --arch noarch 1749065  <--- this is the valid build 2.0.4-3

# rpm -qa freeipa-server 389-ds-base

# edit /etc/hosts to add "<ip_add> `hostname`.ipa.test"
# hostname `hostname`.ipa.test
# ipa-server-install --setup-dns --auto-forwarders -a Secret123 -p Secret123 --domain ipa.test --realm IPA.TEST -U

# dnf update -y /tmp/pkgs/*rpm
# rpm -qa freeipa-server 389-ds-base
# ipactl stop
# ipactl start --ignore-service-failures
# ldapsearch -D cn=directory\ manager -w Secret123 -b dc=ipa,dc=test
search: 2
result: 32 No such object

Comment 12 Adam Williamson 2021-05-20 16:10:00 UTC
Please avoid private comments on Fedora bugs unless they really need to be private. I don't see anything in the above two comments that needs to be private, could you please mark them public unless I'm missing something, Thierry?

State should be MODIFIED or ON_QA if the build is done and tagged, yeah. Rawhide hasn't been composing for the last couple of days so I can't confirm the fix until we get a compose.

Comment 14 Adam Williamson 2021-05-20 17:42:28 UTC
Note the bug is filed against F34, but it has not gone into F34 stable. Several updates for F34 have been submitted with the bug but they've all been rejected. Another was submitted just today:


Please do backport the fix to F34 as well, and then we might be able to actually approve an F34 update.

Comment 15 Adam Williamson 2021-07-02 21:54:35 UTC
The fix is now in for both F34 and Rawhide, closing.