Bug 1902811 - FreeIPA server deployment fails since Fedora-Rawhide-20201129.n.0 with bind "unable to open directory 'dyndb-ldap', working directory is '/var/named': permission denied" error
Summary: FreeIPA server deployment fails since Fedora-Rawhide-20201129.n.0 with bind "...
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Fedora
Classification: Fedora
Component: bind-dyndb-ldap
Version: rawhide
Hardware: All
OS: Linux
unspecified
urgent
Target Milestone: ---
Assignee: Petr Vobornik
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard: openqa
Depends On:
Blocks: F34BetaBlocker
TreeView+ depends on / blocked
 
Reported: 2020-11-30 18:32 UTC by Adam Williamson
Modified: 2021-01-02 19:55 UTC (History)
7 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-01-02 19:55:33 UTC
Type: Bug
Embargoed:


Attachments (Terms of Use)
/var/log tarball from the 20201130.n.0 failure (2.66 MB, application/gzip)
2020-11-30 18:32 UTC, Adam Williamson
no flags Details

Description Adam Williamson 2020-11-30 18:32:52 UTC
Created attachment 1734932 [details]
/var/log tarball from the 20201130.n.0 failure

In Fedora-Rawhide-20201129.n.0 and later, FreeIPA server deployment fails. At the end of the deployment process, the attempt to start ipa.service fails, apparently due to bind not starting correctly, with these errors:

Nov 30 05:14:09 ipa001.domain.local named[33077]: loading DynDB instance 'ipa' driver '/usr/lib64/bind/ldap.so'
Nov 30 05:14:09 ipa001.domain.local named[33077]: bind-dyndb-ldap version 11.6 compiled at 00:00:00 Nov 23 2020, compiler 10.2.1 20201112 (Red Hat 10.2.1-8)
Nov 30 05:14:09 ipa001.domain.local named[33077]: unable to open directory 'dyndb-ldap', working directory is '/var/named': permission denied
Nov 30 05:14:09 ipa001.domain.local named[33077]: LDAP config validation failed for database 'ipa': permission denied
Nov 30 05:14:09 ipa001.domain.local named[33077]: dynamic database 'ipa' configuration failed: permission denied
Nov 30 05:14:09 ipa001.domain.local named[33077]: loading configuration: permission denied
Nov 30 05:14:09 ipa001.domain.local named[33077]: exiting (due to fatal error)
Nov 30 05:14:09 ipa001.domain.local systemd[1]: named.service: Control process exited, code=exited, status=1/FAILURE
Nov 30 05:14:09 ipa001.domain.local systemd[1]: named.service: Failed with result 'exit-code'.
Nov 30 05:14:09 ipa001.domain.local systemd[1]: Failed to start Berkeley Internet Name Domain (DNS).

Filing against bind-dyndb-ldap as we seem to be in that code. I don't see any relevant AVCs in the log, so I don't *think* this is SELinux.

Proposing as an F34 Beta blocker as a violation of Basic criterion "It must be possible to configure a Fedora Server system installed according to the above criteria as a FreeIPA domain controller, using the official deployment tools provided in the distribution FreeIPA packages" - https://fedoraproject.org/wiki/Basic_Release_Criteria#FreeIPA_server_requirements

Comment 1 Alexander Bokovoy 2020-12-01 09:22:23 UTC
Unfortunately, the tarball does not include /var/named and I cannot reproduce this failure on a fresh Rawhide deployment myself.

Comment 2 Petr Menšík 2020-12-10 20:17:45 UTC
Can you provide output of those two commands?

ls -ldZ /var/named/{,dyndb-ldap}
getsebool -a | grep named

Comment 3 Adam Williamson 2020-12-10 22:31:49 UTC
Here ya go.

ls -ldZ /var/named/{,dyndb-ldap}
drwxrwx--T. 6 root named system_u:object_r:named_zone_t:s0 145 Dec 10 17:28 /var/named/
drwxrwx---. 3 root root  system_u:object_r:named_zone_t:s0  17 Dec 10 17:27 /var/named/dyndb-ldap

getsebool -a | grep named
named_tcp_bind_http_port --> off
named_write_master_zones --> on

Comment 4 Alexander Bokovoy 2020-12-11 12:40:38 UTC
I've got it reproduced in a lab. Something modifies dyndb-ldap folder permissions to root:root:

[root@replica1 ~]# rpm -qlv bind-dyndb-ldap |grep named/dyndb-ldap
drwxrwx---    2 root     named                       0 Nov 23 14:12 /var/named/dyndb-ldap

[root@replica1 ~]# ls -ld /var/named/dyndb-ldap
drwxrwx---. 3 root root 4096 Dec 11 09:34 /var/named/dyndb-ldap

Investigating.

Comment 5 Alexander Bokovoy 2020-12-14 09:34:20 UTC
Something in the deployment process changes ownership of
/var/named/dyndb-ldap from root:named to root:root. This causes startup
of named to fail because bind-dyndb-ldap driver cannot operate on
/var/named/dyndb-ldap/* content.

https://openqa.fedoraproject.org/tests/739654#step/role_deploy_domain_controller/35
shows it for latest Rawhide compose:

...
Dec 13 19:31:59 localhost named[33269]: generating session key for dynamic DNS
Dec 13 19:31:59 localhost named[33269]: sizing zone task pool based on 6 zones
Dec 13 19:31:59 localhost named[33269]: none:106: 'max-cache-size 90%' - setting to 1759MB (out of 1954MB)
Dec 13 19:32:00 localhost named[33269]: set up managed keys zone for view _default, file '/var/named/dynamic/managed-keys.bind'
Dec 13 19:32:00 localhost named[33269]: loading DynDB instance 'ipa' driver '/usr/lib64/bind/ldap.so'
Dec 13 19:32:00 localhost named[33269]: bind-dyndb-ldap version 11.6 compiled at 00:00:00 Nov 23 2020, compiler 10.2.1 20201112 (Red Hat 10.2.1-8)
Dec 13 19:32:00 localhost named[33269]: unable to open directory 'dyndb-ldap', working directory is '/var/named': permission denied
Dec 13 19:32:00 localhost named[33269]: LDAP config validation failed for database 'ipa': permission denied
Dec 13 19:32:00 localhost named[33269]: dynamic database 'ipa' configuration failed: permission denied
Dec 13 19:32:00 localhost named[33269]: loading configuration: permission denied
Dec 13 19:32:00 localhost named[33269]: exiting (due to fatal error)
Dec 13 19:32:00 localhost systemd[1]: named.service: Control process exited, code=exited, status=1/FAILURE
Dec 13 19:32:00 localhost systemd[1]: named.service: Failed with result 'exit-code'.
Dec 13 19:32:00 localhost systemd[1]: Failed to start Berkeley Internet Name Domain (DNS).

If we look at bind-dyndb-ldap package, it has intended permissions:

[root@m2 ~]# rpm -qlv bind-dyndb-ldap|grep var
drwxrwx---    2 root     named                       0 Nov 23 16:12 /var/named/dyndb-ldap

Interesting enough, if I'd re-run ipa-server-install after fixing the
ownership back to root:named, everything succeeds. So there is something
happening during installation that triggers a change. Since named user
account is created during named installation with

[root@m2 ~]# rpm -q --scripts bind|grep var/named
  /usr/sbin/useradd  -u 25 -r -N -M -g named -s /sbin/nologin -d /var/named -c Named named >/dev/null 2>&1 || :;

the home directory /var/named should not be created or otherwise
modified during installation.

Yet, we have no explicit handling of /var/named/dyndb-ldap in the installer code. There are places where we create /var/named/dyndb-ldap/ipa but we use Python's os.mkdir() there which should not modify permissions of the parent folder.

Comment 6 Adam Williamson 2020-12-14 18:19:15 UTC
So for purposes of figuring out that "why" - I filed this bug as broken since 20201129.n.0. The previous successful compose was 20201119.n.0; the tests also failed in that compose, but on a different error in bind, `initializing DST: no engine`. I'm not sure if that error would have happened before or after this one, i.e. I'm not sure whether 20201119.n.0 *also* contained this bug or not.

The tests passed in the compose before that, 20201118.n.0, so we can say the bug wasn't around then.

Comment 7 Alexander Bokovoy 2020-12-14 19:57:33 UTC
20201119.n.0 was including FreeIPA 4.9.0rc1 which had a bug in not setting up bind properly. So it may have had the same issue but we never got to it as named was unable to start.

We have some idea with https://github.com/freeipa/freeipa/pull/5340, I'll do a build once that fix passes CI.

Comment 8 Adam Williamson 2020-12-15 17:12:11 UTC
It passed CI...

Comment 9 Alexander Bokovoy 2020-12-16 06:32:37 UTC
I built freeipa-4.9.0-0.6.rc3.fc34 with the patches from PR#5340: https://bodhi.fedoraproject.org/updates/FEDORA-2020-0ff70f500a

Comment 10 Adam Williamson 2020-12-16 18:07:39 UTC
I hacked the server deployment test on openQA staging to run with that build, but it still failed in apparently the same way :( https://openqa.stg.fedoraproject.org/tests/988234

Comment 11 Alexander Bokovoy 2020-12-17 08:58:45 UTC
I think I found the issue. bind-dyndb-ldap depends on 'named' group and after some refactoring work to support both bind-pkcs11 and plain bind, it lost 'Requires: bind' for plain bind case (rawhide).

I added Requires(pre): bind/bind-pkcs11 for both ways and verified that it indeed works and orders the packages properly.

https://bodhi.fedoraproject.org/updates/FEDORA-2020-b1076a69a2 provides an updated package.

Comment 12 Adam Williamson 2020-12-17 20:10:58 UTC
That fix looks good, I tested it on openQA staging and the tests passed. Should show up fixed on prod with the next Rawhide compose. Thanks.

Comment 13 Adam Williamson 2021-01-02 19:55:33 UTC
Is indeed fixed now.


Note You need to log in before you can comment on or make changes to this bug.