Bug 1197204

Summary: /etc/resolv.conf should be managed by the network daemon not systemd
Product: [Fedora] Fedora
Component: systemd
Version: rawhide
Hardware: Unspecified
OS: Unspecified
Status: CLOSED RAWHIDE
Severity: high
Priority: unspecified
Keywords: Reopened
Reporter: Brian Lane <bcl>
Assignee: systemd-maint
QA Contact: Fedora Extras Quality Assurance <extras-qa>
CC: admiller, awilliam, dshea, johannbg, jsynacek, lnykryn, msekleta, omerusta, s, systemd-maint, vpavlin, walters, zbyszek
Fixed In Version: systemd-220-1.fc23
Doc Type: Bug Fix
Type: Bug
Last Closed: 2015-06-10 17:05:54 UTC

Description Brian Lane 2015-02-27 19:17:48 UTC
On a rawhide boot.iso built with systemd-219-7, systemd creates a broken symlink from /etc/resolv.conf to /run/systemd/resolve/resolv.conf at boot time. As a result, NetworkManager does not write resolv.conf, and DNS resolution breaks.

I don't think systemd should be creating symlinks for things that it doesn't control. By creating the file it is claiming it for itself and other potential users see that and don't touch it. It should only be created when it is actually going to be used (by systemd-networkd?) not by default when it sees it missing.
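
For illustration, the broken state described above can be detected with a check along these lines. This is only a sketch and not part of the original report; the helper name `is_dangling_symlink` is made up for the example.

```shell
# Hypothetical helper: succeeds when the path is a symlink whose target is missing.
is_dangling_symlink() {
    # -L is true for any symlink; -e follows the link and fails if the target is absent.
    [ -L "$1" ] && [ ! -e "$1" ]
}

if is_dangling_symlink /etc/resolv.conf; then
    echo "/etc/resolv.conf is a dangling symlink to $(readlink /etc/resolv.conf)"
fi
```

A program that stats the path through the link, as NetworkManager appears to do here, gets "No such file or directory" in this state even though the symlink itself exists.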

Comment 1 Lennart Poettering 2015-02-28 11:50:39 UTC
I disagree with this.

Generally I think it would be a wise idea to enforce a clear ownership scheme for /etc/resolv.conf: make it a symlink that points to the file managed by your network management solution of choice, and ownership is clear.

Secondly, constantly writing to /etc is a really bad idea anyway, hence your network management solution of choice should write to a file in /run or so, and the version in /etc should be a fixed symlink to it.

But most importantly, and ignoring all of the above: systemd only creates the symlink as a fallback when nothing else has created it. This logic is only ever relevant when /etc is entirely unpopulated at boot. This is safe since networking with a non-existent /etc/resolv.conf doesn't really work.

A normal system will never even see this behaviour as /etc is installed populated. An empty /etc is seen only on specialized "state-less" systems where /etc is entirely emptied out.
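
The fallback behaviour described above can be sketched roughly as follows. This is an illustration only, not systemd's actual code; the function name `create_fallback_symlink` and its root-directory argument are assumptions made for the example. The symlink target is the one used in the tmpfiles.d entry for /etc/resolv.conf.

```shell
# Hypothetical sketch of "create the symlink only when nothing else created it".
# $1 is the root of the tree to operate on (an assumption for this example).
create_fallback_symlink() {
    root=$1
    link="$root/etc/resolv.conf"
    # Leave any existing file or symlink (even a dangling one) alone.
    if [ -e "$link" ] || [ -L "$link" ]; then
        return 0
    fi
    mkdir -p "$root/etc"
    ln -s ../run/systemd/resolve/resolv.conf "$link"
}
```

Note that the link is created whether or not anything will ever write /run/systemd/resolve/resolv.conf, which is how a dangling symlink can result.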

Sorry, but I don't consider this a bug, and will hence close this WONTFIX.

Comment 2 Colin Walters 2015-03-17 21:05:47 UTC
Note this originated in https://bugzilla.redhat.com/show_bug.cgi?id=1116651

Comment 3 Colin Walters 2015-03-17 21:11:44 UTC
Lennart, the problem isn't quite that simple, because in order to support pre-boot configuration in kickstart (via %post), systemd-tmpfiles ends up being run *before* the system is actually booted.

In the %post environment we do want things to look populated for scripts to be able to run, etc.

And in fact, what's invoking this is that `systemd.spec` has a `%post` that runs `systemd-tmpfiles --create >/dev/null 2>&1 || :` among other things.

If we're saying that tmpfiles shouldn't run until the system is actually booted, we could do something like this in systemd.spec's `%post`:

if test -z "${ANACONDA_INSTALL}"; then
  systemd-tmpfiles --create >/dev/null 2>&1 || :
fi

However, this is likely to have large ramifications.

Comment 4 Zbigniew Jędrzejewski-Szmek 2015-03-23 02:29:48 UTC
(In reply to Colin Walters from comment #3)
> If we're saying that tmpfiles shouldn't run until the system is actually
> booted, we could do something like this in systemd.spec's `%post`:
> 
> if test -z "${ANACONDA_INSTALL}"; then
>   systemd-tmpfiles --create >/dev/null 2>&1 || :
> fi
> 
> However, this is likely to have large ramifications.
Yeah, this kind of change would most likely have to be propagated "downwards" to other packages which would suddenly encounter a different fs layout than expected. I'd expect this to be a source of annoying bugs.

We really should solve this between systemd and anaconda, without making any changes visible to any other package.

Comment 5 Colin Walters 2015-04-08 01:38:06 UTC
I'm reopening based on comment #3 and comment #4.

Comment 6 Colin Walters 2015-05-21 14:57:37 UTC
So this is still breaking Atomic in Fedora.

I think systemd-tmpfiles needs to grow some knowledge about whether it's being used in an "installroot", or whether it's actually running on boot.

Things like the resolv.conf handling should then only happen on real boots.

But for this specific case, I think it'd be better still if systemd-networkd took ownership of resolv.conf when it started, rather than via tmpfiles.

Comment 7 Colin Walters 2015-05-21 14:58:03 UTC
Example failed image build: http://koji.fedoraproject.org/koji/taskinfo?taskID=9816824

Comment 8 Zbigniew Jędrzejewski-Szmek 2015-05-21 15:05:23 UTC
(In reply to Colin Walters from comment #6)
> Things like the resolv.conf handling should then only happen on real boots.
OK, what about the following patch:

--- tmpfiles.d/etc.conf.m4
+++ tmpfiles.d/etc.conf.m4
@@ -11,7 +11,7 @@ L /etc/os-release - - - - ../usr/lib/os-release
 L /etc/localtime - - - - ../usr/share/zoneinfo/UTC
 L+ /etc/mtab - - - - ../proc/self/mounts
 m4_ifdef(`ENABLE_RESOLVED',
-L /etc/resolv.conf - - - - ../run/systemd/resolve/resolv.conf
+L! /etc/resolv.conf - - - - ../run/systemd/resolve/resolv.conf
 )m4_dnl
 C /etc/nsswitch.conf - - - -
 m4_ifdef(`HAVE_PAM',

?
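
For context on the patch above: in tmpfiles.d, a `!` after the type marks a line as safe to apply only during boot, and `systemd-tmpfiles` skips such lines unless it is run with `--boot`. A plain `systemd-tmpfiles --create`, as in the `%post` scriptlet, would therefore no longer create the symlink. The sketch below lists the boot-only entries in a tmpfiles.d file; the helper name `boot_only_lines` is made up for this example.

```shell
# Hypothetical helper: print tmpfiles.d entries whose type field carries the '!' modifier.
# Reads the files given as arguments, or standard input if none are given.
boot_only_lines() {
    # Skip comments and blank lines; the first whitespace-separated field is the type.
    awk '!/^[[:space:]]*(#|$)/ && $1 ~ /!/' "$@"
}

# How the two invocations differ:
#   systemd-tmpfiles --create          # '!' lines are skipped
#   systemd-tmpfiles --create --boot   # '!' lines are applied as well
```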

Comment 9 Colin Walters 2015-05-21 15:29:57 UTC
Oh, '!' already exists, I didn't know that.  Looks like it will work to me.  I'll test it now.

Comment 10 Colin Walters 2015-05-21 18:16:24 UTC
Confirmed that patch will fix the bug.

I built a local tree with current systemd and kickstart, verified it failed locally with the same symptom.

Then I built a local systemd with that patch, composed an ostree tree from it, then pointed the rawhide installer at it, and it worked.

Comment 11 Zbigniew Jędrzejewski-Szmek 2015-05-21 18:24:15 UTC
OK, thanks for testing. This should land in rawhide shortly: http://cgit.freedesktop.org/systemd/systemd/commit/?id=6921bf11fa.

Comment 12 David Shea 2015-07-23 19:12:39 UTC
*** Bug 1203482 has been marked as a duplicate of this bug. ***

Comment 13 Adam Williamson 2015-09-18 15:55:09 UTC
This seems to be back in recent Rawhide nightlies. Once again /etc/resolv.conf is a symlink to a systemd location that does not exist, so NetworkManager fails to overwrite it:

Sep 18 15:45:24 localhost NetworkManager[1417]: <info>  DNS: using resolv.conf manager 'none'
Sep 18 15:45:28 localhost NetworkManager[1417]: <warn>  could not commit DNS changes: Could not stat /etc/resolv.conf: No such file or directory

/etc/resolv.conf is a symlink to /run/systemd/resolve/resolv.conf, which does not exist. See https://bugzilla.redhat.com/show_bug.cgi?id=1264364.

Comment 14 Adam Williamson 2015-09-24 23:15:30 UTC
Note, I'm not sure this fix ever did exactly what we thought it did: even on Fedora 23 Beta, systemd-tmpfiles-setup.service, which runs early in boot (before NetworkManager), does try to create /etc/resolv.conf and then immediately tries to remove it(?). But somehow NetworkManager manages to handle this on F23, while on current Rawhide it doesn't. See https://bugzilla.redhat.com/show_bug.cgi?id=1264364 for more details; I'm still looking into it.