Bug 1880628 - FreeIPA server doesn't get along well with systemd-resolved (need to manually disable it)
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: freeipa
Version: 33
Hardware: All
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: IPA Maintainers
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard: openqa AcceptedFreezeException
Depends On:
Blocks: F33BetaFreezeException
Reported: 2020-09-18 20:06 UTC by Adam Williamson
Modified: 2020-10-23 21:26 UTC
CC List: 14 users

Fixed In Version: freeipa-4.8.10-5.fc33
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-10-09 23:56:11 UTC
Type: Bug
Embargoed:




Links
Fedora Pagure freeipa issue 8275 (last updated 2020-09-22 06:59:04 UTC)

Description Adam Williamson 2020-09-18 20:06:34 UTC
In Fedora 33, systemd-resolved is installed and enabled by default (see https://fedoraproject.org/wiki/Changes/systemd-resolved).

FreeIPA servers don't seem to be very compatible with systemd-resolved. I noticed this because the openQA replica tests started failing after the latest systemd update, systemd-246.4-2.fc33, landed in Fedora-33-20200917.n.0. The failure happens during replica deployment, apparently in a check that the 'master' server can resolve the hostname of the replica, which it can't. I think the problem is that the lookup goes to the systemd-resolved local stub resolver rather than to the BIND server that was set up as part of the FreeIPA server deployment. (I think a record for the replica is added to that BIND server as part of the replica deployment process.)

If I tweak the openQA tests to disable systemd-resolved, thus:

systemctl stop systemd-resolved.service
systemctl disable systemd-resolved.service
rm -f /etc/resolv.conf
systemctl restart NetworkManager

before doing the server and replica deployments, the tests pass.
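A minimal shell sketch of the same workaround, made conditional so it only runs when systemd-resolved is enabled (comment 21 describes the openQA tests applying it conditionally). The helper names `run` and `disable_resolved_if_enabled` and the `DRY_RUN` variable are hypothetical; the commands themselves are the ones listed above.

```shell
#!/bin/sh
# Print each command; execute it only when DRY_RUN is empty.
run() {
    echo "+ $*"
    if [ -z "${DRY_RUN:-}" ]; then
        "$@"
    fi
}

# Hypothetical helper: apply the workaround only if resolved is enabled.
disable_resolved_if_enabled() {
    # "systemctl is-enabled --quiet" exits 0 only if the unit is enabled.
    if systemctl is-enabled --quiet systemd-resolved.service; then
        run systemctl stop systemd-resolved.service
        run systemctl disable systemd-resolved.service
        # Drop the symlink to the resolved stub so that NetworkManager
        # can write its own /etc/resolv.conf when it restarts.
        run rm -f /etc/resolv.conf
        run systemctl restart NetworkManager
    fi
}
```

Running `DRY_RUN=1 disable_resolved_if_enabled` first shows what would be done without touching the system.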

Not sure how we want to handle this. Have ipa-server-install and ipa-replica-install disable systemd-resolved if it's running and they were asked to set up bind? Ask people to do it in the instructions? Something else?

Comment 1 Adam Williamson 2020-09-18 20:12:15 UTC
I don't think this quite meets any of the blocker requirements. Notably I think any *other* system in the domain should actually be able to look up other systems in the domain successfully, because the bind server probably does serve external requests; systemd-resolved only runs a local stub resolver. If my idea as to what's going on is correct, it's only things running on *servers* that - depending on exactly how the lookup is done - may not be able to look up other systems in the domain. We also have a caveat for Beta that "moderate workarounds" are okay, and I think "disable systemd-resolved before you deploy" counts as a moderate workaround. We don't have explicit replica requirements in the criteria, though we probably should.

Gonna propose as a Beta FE, though, in case we do come up with something safe enough and fast enough.

Comment 2 Zbigniew Jędrzejewski-Szmek 2020-09-19 11:53:34 UTC
I think freeipa installation should tell resolved about the master server. Maybe drop in a file like

#/etc/systemd/resolved.conf.d/freeipa.conf
[Resolve]
DNS=1.2.3.4
Domains=~example.com

and restart systemd-resolved. This will redirect queries for example.com and its subdomains to the specified servers.

(The D-Bus API currently allows setting DNS servers per link, not globally. I don't think FreeIPA needs to set this dynamically, though.)
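A minimal shell sketch of the drop-in approach suggested above, using the placeholder server address and domain from the example. `write_resolved_dropin` is a hypothetical helper, not something FreeIPA ships; its optional third argument exists only so the output directory can be redirected for testing.

```shell
#!/bin/sh
# Hypothetical helper: write a systemd-resolved drop-in that routes
# queries for one domain to one DNS server.
write_resolved_dropin() {
    server_ip="$1"   # e.g. 1.2.3.4, the IPA DNS server
    domain="$2"      # e.g. example.com
    confdir="${3:-/etc/systemd/resolved.conf.d}"
    mkdir -p "$confdir"
    # The "~" prefix makes this a routing-only domain: lookups for the
    # domain and its subdomains go to $server_ip, but the domain is not
    # added to the search list.
    cat > "$confdir/freeipa.conf" <<EOF
[Resolve]
DNS=$server_ip
Domains=~$domain
EOF
}

# On a real server this would be followed by:
#   systemctl restart systemd-resolved.service
```

This is a static configuration, so it only covers the domains listed at install time.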

Comment 3 Alexander Bokovoy 2020-09-19 18:15:20 UTC
For NetworkManager integration we are doing a static setup too right now, but I'd say it needs a better approach as well:

# cat /etc/NetworkManager/conf.d/zzz-ipa.conf

# auto-generated by IPA installer
[main]
dns=default

[global-dns]
searches=ipa.test

[global-dns-domain-*]
servers=127.0.0.1

Sure, we can extend the installer to generate a similar snippet as a first step.

However, FreeIPA provides a full-featured DNS server with dynamic zone management, which means administrators can add and remove zones at any time, so a static list of redirected domains cannot keep up. Also, in a proper DNS configuration you should not force your resolver client to do these selective redirects for a locally running DNS server.

Comment 4 Adam Williamson 2020-09-21 19:06:13 UTC
Discussed at 2020-09-21 blocker review meeting: https://meetbot-raw.fedoraproject.org/fedora-blocker-review/2020-09-21/f33-blocker-review.2020-09-21-16.00.html . Accepted as a freeze exception issue as it'd be helpful for FreeIPA deployments to include any fix we happen to come up with for this.

Comment 5 Christian Heimes 2020-09-22 06:59:04 UTC
The FreeIPA team has known for months that the introduction of systemd-resolved would cause issues with our DNS feature; see https://pagure.io/freeipa/issue/8275. We could not start testing systemd-resolved integration until systemd-resolved was enabled on F33.

Comment 6 Alexander Bokovoy 2020-09-22 07:14:55 UTC
Note that FESCo ticket https://pagure.io/fesco/issue/2381, which approved systemd-resolved, mentioned no need to coordinate with any component other than authselect. It simply ignored FreeIPA, and everything kept working only because systemd-resolved's symlink creation was broken, which effectively deactivated it. When that symlink was fixed in the systemd 246.4-2 update (the second attempt to fix bug 1873856), FreeIPA broke.

Comment 7 Alexander Bokovoy 2020-09-22 07:29:35 UTC
Investigation:

 - we can reuse NetworkManager's 'dns=systemd-resolved' option to propagate the parameters we already set up for NetworkManager to systemd-resolved; this might be the easiest fix in FreeIPA

 - converting /etc/resolv.conf to a symlink would break the backup of /etc/resolv.conf in IPA code; we need to handle this

 - we need to detect activation of systemd-resolved, perhaps by noting that /etc/resolv.conf is now a symlink
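A shell sketch of the detection heuristic from the last bullet, assuming that an active systemd-resolved shows up as /etc/resolv.conf being a symlink into /run/systemd/resolve/ (on Fedora 33 it typically points at ../run/systemd/resolve/stub-resolv.conf). `uses_systemd_resolved` is a hypothetical name, and this is not necessarily the check FreeIPA ended up using.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given resolv.conf path (default
# /etc/resolv.conf) is a symlink into systemd-resolved's runtime dir.
uses_systemd_resolved() {
    conf="${1:-/etc/resolv.conf}"
    # A regular file means resolved is not managing resolv.conf this way.
    [ -L "$conf" ] || return 1
    target=$(readlink "$conf")
    case "$target" in
        */run/systemd/resolve/*) return 0 ;;  # relative or absolute link
        *) return 1 ;;
    esac
}
```

This is only a heuristic: a symlink does not prove the service is running, and resolved can also run alongside a regular resolv.conf, so a real installer would likely need to check the service state as well.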

Comment 8 Alexander Bokovoy 2020-09-23 08:19:12 UTC
Christian's pull request https://github.com/freeipa/freeipa/pull/5125 implements basic support for systemd-resolved.

Comment 11 Alexander Bokovoy 2020-09-24 11:53:46 UTC
We are planning to release FreeIPA 4.8.10, which will contain these changes, before the end of this week.

Comment 12 Fedora Update System 2020-09-26 09:42:39 UTC
FEDORA-2020-e9e815177e has been submitted as an update to Fedora 33. https://bodhi.fedoraproject.org/updates/FEDORA-2020-e9e815177e

Comment 13 Alexander Bokovoy 2020-09-26 15:02:04 UTC
I think we have fixed the most immediate systemd-resolved issues visible in openQA: base installation of an F33 domain controller succeeds, and upgrade from F32 to F33 succeeds as well. The original code in FreeIPA 4.8.10 to support upgrading from a non-systemd-resolved to a systemd-resolved configuration did not work, but freeipa-4.8.10-2.fc33 fixes it. The freeipa-4.8.10-2.fc33 build, which includes PR https://github.com/freeipa/freeipa/pull/5153, succeeded in openQA where the previous build, freeipa-4.8.10-1.fc33, failed.

OpenQA run results: https://openqa.fedoraproject.org/tests/overview?distri=fedora&version=33&build=Update-FEDORA-2020-e9e815177e

The results contain one warning, for AVCs generated when systemd attempts to remove a self-created 389-ds server certificate on service shutdown; this also happens on F32. It does not prevent FreeIPA from operating.

Comment 14 Fedora Update System 2020-09-27 02:15:53 UTC
FEDORA-2020-e9e815177e has been pushed to the Fedora 33 testing repository.
Shortly, you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2020-e9e815177e`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2020-e9e815177e

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 15 Fedora Update System 2020-09-27 15:23:33 UTC
FEDORA-2020-e9e815177e has been submitted as an update to Fedora 33. https://bodhi.fedoraproject.org/updates/FEDORA-2020-e9e815177e

Comment 16 Fedora Update System 2020-09-28 01:01:29 UTC
FEDORA-2020-e9e815177e has been pushed to the Fedora 33 testing repository.
Shortly, you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2020-e9e815177e`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2020-e9e815177e

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 17 Fedora Update System 2020-09-28 12:55:59 UTC
FEDORA-2020-e9e815177e has been submitted as an update to Fedora 33. https://bodhi.fedoraproject.org/updates/FEDORA-2020-e9e815177e

Comment 18 Fedora Update System 2020-09-29 01:16:27 UTC
FEDORA-2020-e9e815177e has been pushed to the Fedora 33 testing repository.
Shortly, you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2020-e9e815177e`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2020-e9e815177e

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 19 Fedora Update System 2020-09-29 11:15:47 UTC
FEDORA-2020-e9e815177e has been submitted as an update to Fedora 33. https://bodhi.fedoraproject.org/updates/FEDORA-2020-e9e815177e

Comment 20 Fedora Update System 2020-09-30 01:09:55 UTC
FEDORA-2020-e9e815177e has been pushed to the Fedora 33 testing repository.
Shortly, you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2020-e9e815177e`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2020-e9e815177e

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 21 Adam Williamson 2020-10-02 22:21:02 UTC
Note that the openQA results here don't tell you much, because I had already put a workaround for the problem into the tests (if systemd-resolved is enabled, we disable it before doing anything else). To truly test the fix we'd need to disable that workaround and rerun the tests. I'm currently off sick with only my cellphone, but I will try to do this as soon as I can.

Comment 22 Adam Williamson 2020-10-04 17:07:53 UTC
Another thing I haven't checked here: what happens on upgrade. The openQA server upgrade test is passing, but I'm not sure it's a sufficient check of the situation with DNS after an upgrade from F32 to F33. I believe the Change says that upgrading to F33 should enable resolved. We should check if that actually happens when the system is deployed as a FreeIPA server, and if so, whether we need to make the FreeIPA upgrade bits do something when resolved gets enabled across an upgrade...

Comment 23 Adam Williamson 2020-10-04 17:57:06 UTC
With the openQA "disable systemd-resolved" workaround disabled, the replica test run on the update fails in the same way as I initially reported in this bug. So I don't think the fix is working.

Comment 24 Christian Heimes 2020-10-09 11:26:36 UTC
Hi Adam,

Alexander asked me to look at the test script https://openqa.fedoraproject.org/tests/684390/modules/realmd_join_sssd/steps/1/src . I noticed a few things:

* line 25 overwrites /etc/resolv.conf
* lines 28 to 31 modify NM with "nmcli con mod". If I understand the code correctly then it configures NM to use the host's IP address as DNS server, not the primary server IP address.
* lines 42 to 54 disable systemd-resolved on replica
* line 66 + lines 76-78 pass forwarders to installer explicitly

Please change the installation procedure:

* Don't modify /etc/resolv.conf

* on the replica use 172.16.2.100 (primary server) as DNS server. You can also use "resolvectl dns eth0 172.16.2.100" instead of NM to temporary set a default DNS server.

* instead of explicit --forwarder=IP please use --auto-forwarders in all steps. ipa-[server,replica,dns]-install support the option. The feature detects if the system runs with systemd-resolved or /etc/resolv.conf and gets a correct list of forwarders. get_host_dns() may return wrong data if it does not know how to deal with systemd-resolved.

* install replica with explicit IP address (ipa-replica-install --ip-address=$server_ip ...) or create the host entry with IP address on the primary server before you run ipa-replica-install (ipa host-add $replica --ip-address=$replica_ip). Either approach ensures correct DNS records for the replica.
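A shell sketch of the replica-side steps suggested above. The primary server address 172.16.2.106 comes from comment 25; the replica address and the interface name `eth0` are hypothetical placeholders, and other options a real `ipa-replica-install` run needs (credentials, for example) are omitted. Commands are printed, and only executed when `DRY_RUN` is empty.

```shell
#!/bin/sh
server_ip=172.16.2.106    # primary FreeIPA server in the replica tests
replica_ip=172.16.2.107   # hypothetical address of the replica itself
iface=eth0                # hypothetical interface name

# Print each command; execute it only when DRY_RUN is empty.
run() {
    echo "+ $*"
    if [ -z "${DRY_RUN:-}" ]; then
        "$@"
    fi
}

replica_steps() {
    # Point resolved at the primary server for this link instead of
    # editing /etc/resolv.conf (a runtime-only, "temporary" setting).
    run resolvectl dns "$iface" "$server_ip"
    # Let the installer find forwarders itself and register the
    # replica's own address in DNS.
    run ipa-replica-install --setup-dns --auto-forwarders \
        --ip-address="$replica_ip"
    # Alternative from the same comment, run on the primary server
    # before installing the replica:
    #   ipa host-add "$replica_fqdn" --ip-address="$replica_ip"
}
```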

Comment 25 Adam Williamson 2020-10-09 17:12:39 UTC
"* line 25 overwrites /etc/resolv.conf
* lines 28 to 31 modify NM with "nmcli con mod". If I understand the code correctly then it configures NM to use the host's IP address as DNS server, not the primary server IP address."

No, it's the opposite. Both of these are telling the test to use the primary FreeIPA server - the one the replica is replicating - as the DNS server. That's what $server_ip is - the IP of the primary FreeIPA server.

Line 25 may be superfluous now, as I recently changed the approach in lines 28 to 31 (previously we wrote an ifcfg file there and restarted NetworkManager, but that didn't seem to change resolv.conf until reboot, which is why line 25 was there). But I'll have to test on pre-resolved releases whether lines 28 to 31 alone are sufficient now.

"* lines 42 to 54 disable systemd-resolved on replica"

...but only `unless ($upd eq "FEDORA-2020-e9e815177e")`. That's the change I explained in #c23: that's the workaround *for this bug* that I now disabled when testing the update that's trying to fix this bug. When the test runs on update FEDORA-2020-e9e815177e, we don't disable systemd-resolved.

"* line 66 + lines 76-78 pass forwarders to installer explicitly"

Yeah. It's passing the Fedora infra DNS servers, basically. `get_host_dns()` is getting the server IPs from *the VM host* - the openQA worker box itself. I can try it again with --auto-forwarders, but IIRC when I first set this up it didn't work. Anyway, I'm pretty sure that isn't breaking anything, because you can eyeball it and see it's using the right IPs, and it works fine without resolved.

"* Don't modify /etc/resolv.conf"

As I said we can try this, but it either needs to be conditional on using resolved or we need to be sure it works on F31 and F32.

"* on the replica use 172.16.2.100 (primary server) as DNS server. You can also use "resolvectl dns eth0 172.16.2.100" instead of NM to temporary set a default DNS server."

172.16.2.100 is not the right IP. That's the IP of the primary server in *another* test group, the non-replica test group. In the replica tests, the primary server is ipa002.domain.local / 172.16.2.106, and that is the IP we use. (In theory we could reuse the same IP and hostname in both test groups and Open vSwitch would handle it for us with VLANs, but I like to keep them different for clarity.)

"* instead of explicit --forwarder=IP please use --auto-forwarders in all steps. ipa-[server,replica,dns]-install support the option. The feature detects if the system runs with systemd-resolved or /etc/resolv.conf and gets a correct list of forwarders. get_host_dns() may return wrong data if it does not know how to deal with systemd-resolved."

I can try this, but it shouldn't change anything, because the same IPs will be used either way. _post_network_static (which runs earlier) uses the same `get_host_dns()` function to set the resolvers, so --auto-forwarders is just going to pick up the same IPs.

"* install replica with explicit IP address (ipa-replica-install --ip-address=$server_ip ...) or create the host entry with IP address on the primary server before you run ipa-replica-install (ipa host-add $replica --ip-address=$replica_ip). Either approach ensures correct DNS records for the replica."

I can do this, though I don't believe it was in any of the instructions I used to write the tests. How is an admin to know this is necessary? And, of course, it previously worked fine without doing either of these.

Comment 26 Adam Williamson 2020-10-09 18:50:03 UTC
OK, so I re-ran the tests with these changes:

https://pagure.io/fedora-qa/os-autoinst-distri-fedora/c/bffa3d5fcc6f6f8c570b04bb4a56143788349f0e?branch=freeipa-resolved

Note: I already had the "skip the resolved workaround if we're testing FEDORA-2020-e9e815177e" code monkeypatched in live, but it wasn't in a git commit. I added it to this branch so it shows in the diff, but it hasn't really changed. The other changes are new.

With those changes, the tests passed:

https://openqa.fedoraproject.org/tests/overview?distri=fedora&build=Update-FEDORA-2020-e9e815177e&version=33&groupid=2

I checked the logs and verified that we really didn't use the resolved workaround (the string 'resolved' does not appear anywhere in os-autoinst.log), so it seems those changes do make the test pass. I think the change that most likely did the trick is not editing resolv.conf.

I still need to test this branch on an F31 and F32 update to make sure it doesn't cause problems there.

Comment 27 Fedora Update System 2020-10-09 23:56:11 UTC
FEDORA-2020-e9e815177e has been pushed to the Fedora 33 stable repository.
If the problem still persists, please make note of it in this bug report.

Comment 28 Adam Williamson 2020-10-23 21:26:06 UTC
Bug fixed, commonbugs not needed. (For the record, the openQA test change worked fine on F31 and F32.)

