Created attachment 1734932 [details] /var/log tarball from the 20201130.n.0 failure In Fedora-Rawhide-20201129.n.0 and later, FreeIPA server deployment fails. At the end of the deployment process, the attempt to start ipa.service fails, apparently due to bind not starting correctly, with these errors: Nov 30 05:14:09 ipa001.domain.local named[33077]: loading DynDB instance 'ipa' driver '/usr/lib64/bind/ldap.so' Nov 30 05:14:09 ipa001.domain.local named[33077]: bind-dyndb-ldap version 11.6 compiled at 00:00:00 Nov 23 2020, compiler 10.2.1 20201112 (Red Hat 10.2.1-8) Nov 30 05:14:09 ipa001.domain.local named[33077]: unable to open directory 'dyndb-ldap', working directory is '/var/named': permission denied Nov 30 05:14:09 ipa001.domain.local named[33077]: LDAP config validation failed for database 'ipa': permission denied Nov 30 05:14:09 ipa001.domain.local named[33077]: dynamic database 'ipa' configuration failed: permission denied Nov 30 05:14:09 ipa001.domain.local named[33077]: loading configuration: permission denied Nov 30 05:14:09 ipa001.domain.local named[33077]: exiting (due to fatal error) Nov 30 05:14:09 ipa001.domain.local systemd[1]: named.service: Control process exited, code=exited, status=1/FAILURE Nov 30 05:14:09 ipa001.domain.local systemd[1]: named.service: Failed with result 'exit-code'. Nov 30 05:14:09 ipa001.domain.local systemd[1]: Failed to start Berkeley Internet Name Domain (DNS). Filing against bind-dyndb-ldap as we seem to be in that code. I don't see any relevant AVCs in the log, so I don't *think* this is SELinux. Proposing as an F34 Beta blocker as a violation of Basic criterion "It must be possible to configure a Fedora Server system installed according to the above criteria as a FreeIPA domain controller, using the official deployment tools provided in the distribution FreeIPA packages" - https://fedoraproject.org/wiki/Basic_Release_Criteria#FreeIPA_server_requirements
Unfortunately, the tarball does not include /var/named and I cannot reproduce this failure on a fresh Rawhide deployment myself.
Can you provide output of those two commands? ls -ldZ /var/named/{,dyndb-ldap} getsebool -a | grep named
Here ya go. ls -ldZ /var/named/{,dyndb-ldap} drwxrwx--T. 6 root named system_u:object_r:named_zone_t:s0 145 Dec 10 17:28 /var/named/ drwxrwx---. 3 root root system_u:object_r:named_zone_t:s0 17 Dec 10 17:27 /var/named/dyndb-ldap getsebool -a | grep named named_tcp_bind_http_port --> off named_write_master_zones --> on
I've got it reproduced in a lab. Something modifies dyndb-ldap folder permissions to root:root: [root@replica1 ~]# rpm -qlv bind-dyndb-ldap |grep named/dyndb-ldap drwxrwx--- 2 root named 0 Nov 23 14:12 /var/named/dyndb-ldap [root@replica1 ~]# ls -ld /var/named/dyndb-ldap drwxrwx---. 3 root root 4096 Dec 11 09:34 /var/named/dyndb-ldap Investigating.
Something in the deployment process changes ownership of /var/named/dyndb-ldap from root:named to root:root. This causes startup of named to fail because bind-dyndb-ldap driver cannot operate on /var/named/dyndb-ldap/* content. https://openqa.fedoraproject.org/tests/739654#step/role_deploy_domain_controller/35 shows it for latest Rawhide compose: ... Dec 13 19:31:59 localhost named[33269]: generating session key for dynamic DNS Dec 13 19:31:59 localhost named[33269]: sizing zone task pool based on 6 zones Dec 13 19:31:59 localhost named[33269]: none:106: 'max-cache-size 90%' - setting to 1759MB (out of 1954MB) Dec 13 19:32:00 localhost named[33269]: set up managed keys zone for view _default, file '/var/named/dynamic/managed-keys.bind' Dec 13 19:32:00 localhost named[33269]: loading DynDB instance 'ipa' driver '/usr/lib64/bind/ldap.so' Dec 13 19:32:00 localhost named[33269]: bind-dyndb-ldap version 11.6 compiled at 00:00:00 Nov 23 2020, compiler 10.2.1 20201112 (Red Hat 10.2.1-8) Dec 13 19:32:00 localhost named[33269]: unable to open directory 'dyndb-ldap', working directory is '/var/named': permission denied Dec 13 19:32:00 localhost named[33269]: LDAP config validation failed for database 'ipa': permission denied Dec 13 19:32:00 localhost named[33269]: dynamic database 'ipa' configuration failed: permission denied Dec 13 19:32:00 localhost named[33269]: loading configuration: permission denied Dec 13 19:32:00 localhost named[33269]: exiting (due to fatal error) Dec 13 19:32:00 localhost systemd[1]: named.service: Control process exited, code=exited, status=1/FAILURE Dec 13 19:32:00 localhost systemd[1]: named.service: Failed with result 'exit-code'. Dec 13 19:32:00 localhost systemd[1]: Failed to start Berkeley Internet Name Domain (DNS). If we look at bind-dyndb-ldap package, it has intended permissions: [root@m2 ~]# rpm -qlv bind-dyndb-ldap|grep var drwxrwx--- 2 root named 0 Nov 23 16:12 /var/named/dyndb-ldap Interesting enough, if I'd re-run ipa-server-install after fixing the ownership back to root:named, everything succeeds. So there is something happening during installation that triggers a change. Since named user account is created during named installation with [root@m2 ~]# rpm -q --scripts bind|grep var/named /usr/sbin/useradd -u 25 -r -N -M -g named -s /sbin/nologin -d /var/named -c Named named >/dev/null 2>&1 || :; the home directory /var/named should not be created or otherwise modified during installation. Yet, we have no explicit handling of /var/named/dyndb-ldap in the installer code. There are places where we create /var/named/dyndb-ldap/ipa but we use Python's os.mkdir() there which should not modify permissions of the parent folder.
So for purposes of figuring out that "why" - I filed this bug as broken since 20201129.n.0. The previous successful compose was 20201119.n.0; the tests also failed in that compose, but on a different error in bind, `initializing DST: no engine`. I'm not sure if that error would have happened before or after this one, i.e. I'm not sure whether 20201119.n.0 *also* contained this bug or not. The tests passed in the compose before that, 20201118.n.0, so we can say the bug wasn't around then.
20201119.n.0 was including FreeIPA 4.9.0rc1 which had a bug in not setting up bind properly. So it may have had the same issue but we never got to it as named was unable to start. We have some idea with https://github.com/freeipa/freeipa/pull/5340, I'll do a build once that fix passes CI.
It passed CI...
I built freeipa-4.9.0-0.6.rc3.fc34 with the patches from PR#5340: https://bodhi.fedoraproject.org/updates/FEDORA-2020-0ff70f500a
I hacked the server deployment test on openQA staging to run with that build, but it still failed in apparently the same way :( https://openqa.stg.fedoraproject.org/tests/988234
I think I found the issue. bind-dyndb-ldap depends on 'named' group and after some refactoring work to support both bind-pkcs11 and plain bind, it lost 'Requires: bind' for plain bind case (rawhide). I added Requires(pre): bind/bind-pkcs11 for both ways and verified that it indeed works and orders the packages properly. https://bodhi.fedoraproject.org/updates/FEDORA-2020-b1076a69a2 provides an updated package.
That fix looks good, I tested it on openQA staging and the tests passed. Should show up fixed on prod with the next Rawhide compose. Thanks.
Is indeed fixed now.