Bug 2101527 - resolv.conf missing for %post scripts if systemd-resolved not included in install package set
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Fedora
Classification: Fedora
Component: anaconda
Version: 37
Hardware: All
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: Radek Vykydal
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2022-06-27 17:48 UTC by Jeff Bastian
Modified: 2022-11-01 14:36 UTC (History)

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2022-11-01 14:36:36 UTC
Type: Bug
Embargoed:


Description Jeff Bastian 2022-06-27 17:48:19 UTC
Description of problem:
Kickstart %post scripts that need to resolve hostnames fail because the resolv.conf file is missing from the /mnt/sysimage chroot environment:

[anaconda root@kvm-03-guest03 ~]# ls -l /mnt/sysimage/etc/resolv.conf
ls: cannot access '/mnt/sysimage/etc/resolv.conf': No such file or directory

With Beaker internally at Red Hat, this causes kickstart to go into a loop since the %post script cannot tell Beaker that kickstart has finished.  That is, a %post script runs:
  curl http://beaker:8000/postinstall_done/12345
where 12345 is the recipe ID, and Beaker then adjusts the PXE grub config so the system will boot from disk after kickstart.

The problem is larger than Beaker kickstarts, however.  For example, even trying to add a 3rd party dnf repo and install some extra packages in a %post script will fail since it cannot resolve any hostnames.

This was previously reported in bug 2032085 which was fixed and closed, but it seems to have cropped up again.


Version-Release number of selected component (if applicable):
Fedora rawhide as of 2022-06-27

How reproducible:
always

Steps to Reproduce:
1. kickstart with a %post script that uses DNS, e.g.
%post
ping -c 1 fedoraproject.org
%end

Actual results:
ping cannot resolve fedoraproject.org

Expected results:
DNS hostnames resolve in %post scripts

Additional info:

Comment 2 Jeff Bastian 2022-06-27 18:42:33 UTC
Example serial console output from Fedora-ELN-20220624.3 with sanitized hostnames.  Note, the 'fetch' command is a bash function/wrapper around 'curl'.


+ fetch /usr/local/sbin/anamon http://$LAB_CONTROLLER/beaker/anamon3
+ curl -L --remote-time -o /usr/local/sbin/anamon http://$LAB_CONTROLLER/beaker/anamon3
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
   0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0curl: (6) Could not resolve host: $LAB_CONTROLLER
...
...
...
+ dnf -y install restraint-rhts
beaker-AppStream                                0.0  B/s |   0  B     00:00    
Errors during downloading metadata for repository 'beaker-AppStream':
  - Curl error (6): Couldn't resolve host name for http://$MIRROR/released/fedora/ELN-rawhide/Fedora-ELN-20220624.3/AppStream/x86_64/os/repodata/repomd.xml [Could not resolve host: $MIRROR]
Error: Failed to download metadata for repo 'beaker-AppStream': Cannot download repomd.xml: Cannot download repodata/repomd.xml: All mirrors were tried
...
...
...

Comment 3 Jeff Bastian 2022-06-27 19:22:30 UTC
Note: This might be a duplicate of bug 2074083

Comment 4 Adam Williamson 2022-06-28 15:41:01 UTC
I wonder if this has the same cause as https://bugzilla.redhat.com/show_bug.cgi?id=2100883 . It's not the same as 2074083.

Comment 5 Zbigniew Jędrzejewski-Szmek 2022-06-29 07:03:19 UTC
Hmm, /etc/resolv.conf seems to be created if systemd-resolved.rpm is installed. If the symlink is fully missing,
then it seems the package is not installed. Are the full logs from that failed build available somewhere?
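
A rough way to tell these cases apart from the installer shell is sketched below. The function name `resolv_conf_state` is made up for illustration, and the default root `/mnt/sysimage` is taken from the report. Note that symlink targets are evaluated from wherever the function runs, so an absolute target would be checked against the installer's root rather than the chroot.

```shell
# Hypothetical helper (not part of anaconda or systemd): report what kind of
# resolv.conf, if any, exists under a given root directory.
# Usage: resolv_conf_state [ROOT]   (ROOT defaults to /mnt/sysimage)
resolv_conf_state() {
    rc_path="${1:-/mnt/sysimage}/etc/resolv.conf"
    if [ -L "$rc_path" ]; then
        # -e follows the link, so it fails when the target does not exist
        if [ -e "$rc_path" ]; then
            echo "symlink -> $(readlink "$rc_path")"
        else
            echo "dangling symlink -> $(readlink "$rc_path")"
        fi
    elif [ -e "$rc_path" ]; then
        echo "regular file"
    else
        echo "missing"
    fi
}
```

In the situation from the description, this would be expected to print `missing`.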

Comment 6 Adam Williamson 2022-06-29 15:21:52 UTC
Unless I'm missing something, isn't the %packages in the kickstart for the job odd? It looks like it's just this:

%packages --ignoremissing
# Task requirements will be installed by the harness
# no snippet data for packages
chrony
%end

it doesn't pull in core at all. That looks like it'll only include chrony, anything chrony happens to depend on, and anything anaconda pulls into the installed system as needed for filesystems, bootloader etc.?

Comment 7 Adam Williamson 2022-06-29 15:25:27 UTC
Oh, it seems core is included by default even if not specified, okay. But the problem does indeed seem to be that systemd-resolved is not pulled in. The install logs do not show it at all. That is odd, since it's listed as default in core, at least in comps-f37; I'm not sure what comps ELN uses.

Comment 8 Adam Williamson 2022-06-29 15:31:51 UTC
So, comps-eln.xml.in does not include systemd-resolved, which explains why we're not getting it in an ELN install. Are you sure this is affecting Rawhide?

Obviously, resolv.conf should be handled properly if systemd-resolved isn't installed, so there's a bug here. I'm just trying to pin down the angles.

Comment 9 Jeff Bastian 2022-06-29 20:51:09 UTC
Now that you mention it, Rawhide does seem to be ok. I ran a lot of Beaker jobs trying to debug this and probably confused myself. :) It seems this is limited to Fedora ELN at the moment.

Comment 10 Adam Williamson 2022-06-29 21:11:14 UTC
Yeah. So, I can reproduce on Rawhide if I do an install without systemd-resolved.

Things are completely different in Rawhide from how they were in F35/F36 (and all the stuff discussed/described in https://bugzilla.redhat.com/show_bug.cgi?id=2074083 ): in Rawhide, anaconda has completely ditched the code for doing things to `/etc/resolv.conf` in the installed system. It no longer tries to do anything to it; it leaves it entirely up to the installed packages to handle it.

So in the case of systemd-resolved not being installed, I expect the idea is that NetworkManager should handle it. And NetworkManager *does* handle it...on boot of the installed system. But nothing seems to put one in place for the installer %post phase if systemd-resolved isn't installed.

I think this case was not considered in reviewing the anaconda change: https://github.com/rhinstaller/anaconda/pull/3818 . So we probably need to ask the anaconda team to reconsider and think about how this case should be handled, especially if RHEL does not intend to adopt resolved by default.

Comment 11 Zbigniew Jędrzejewski-Szmek 2022-06-30 12:16:41 UTC
We have the %posttrans scriptlet to handle the symlink in systemd-resolved.rpm. We could add
similar code to e.g. systemd-libs to create something different when systemd-resolved is *not*
enabled. Though I'm not sure if we'd know what to put there.

Comment 12 Adam Williamson 2022-06-30 15:44:09 UTC
I don't think it's really systemd's "job" to handle this if we're not using systemd-resolved. It'd rather be NetworkManager's, but NetworkManager isn't started in this environment.

How does this work with systemd-resolved, actually? Is anaconda explicitly triggering it to set up its resolver config files in the post-install, before-reboot environment somehow, or does it do it 'naturally'?

Comment 13 Radek Vykydal 2022-07-21 09:16:20 UTC
(In reply to Adam Williamson from comment #12)
 
> How does this work with systemd-resolved, actually? Is anaconda explicitly
> triggering it to set up its resolver config files in the post-install,
> before-reboot environment somehow, or does it do it 'naturally'?

systemd-resolved creates the symlink in its post-installation script.

Comment 14 Radek Vykydal 2022-07-21 09:29:43 UTC
I am not able to reproduce the issue with rawhide. If I install with

%packages --ignoremissing
-systemd-resolved
%end

/etc/resolv.conf is missing in the chroot (/mnt/sysimage) but name resolution seems to be working.

I am able to reproduce with ELN.

Comment 15 Radek Vykydal 2022-07-21 11:40:29 UTC
(In reply to Radek Vykydal from comment #14)

> I am able to reproduce with ELN.

A workaround (or documented hint) could be this %post script:

%post --nochroot
if [ ! -e /mnt/sysimage/etc/resolv.conf ]; then
  cp -P /etc/resolv.conf /mnt/sysimage/etc/resolv.conf
fi
%end

That would copy the symlink created by resolved in the installer environment to the chroot.

Theoretically, Anaconda could do that before running %post scripts.
But we would also need to take care to remove /mnt/sysimage/etc/resolv.conf at the end of installation (in another %post script) so that NM can set up /etc/resolv.conf after reboot (I think finding the symlink would prevent NM from doing that).

Which I doubt is something Anaconda should do on its own.

And before figuring out some fragile logic for making /etc/resolv.conf available in the chroot, we should probably clarify why name resolution works in rawhide even without /etc/resolv.conf in the chroot (comment #14).
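
For illustration only, the copy-then-remove idea described above could be sketched as a pair of shell functions. The function names and the marker file are hypothetical and not part of anaconda; the marker is an assumption added here so that cleanup only removes a file the first step created. The extra `-L` test is also an addition: it skips the copy when a dangling symlink already exists in the target.

```shell
# Hypothetical helpers for %post --nochroot sections; names, arguments and
# the marker file are illustrative assumptions, not anaconda features.

# Copy the installer's resolv.conf (preserving a symlink, as cp -P does) into
# the target root if the target has none, and record that we did so.
seed_resolv_conf() {
    sr_root="${1:-/mnt/sysimage}"
    sr_src="${2:-/etc/resolv.conf}"
    sr_marker="${3:-/tmp/resolv.conf.seeded}"
    if [ ! -e "$sr_root/etc/resolv.conf" ] && [ ! -L "$sr_root/etc/resolv.conf" ]; then
        cp -P "$sr_src" "$sr_root/etc/resolv.conf" && : > "$sr_marker"
    fi
}

# Remove the seeded copy again, but only if seed_resolv_conf created it,
# so a resolv.conf provided by an installed package is left alone.
unseed_resolv_conf() {
    sr_root="${1:-/mnt/sysimage}"
    sr_marker="${2:-/tmp/resolv.conf.seeded}"
    if [ -e "$sr_marker" ]; then
        rm -f "$sr_root/etc/resolv.conf" "$sr_marker"
    fi
}
```

Assuming %post sections run in the order they appear and each one runs as a separate script, `seed_resolv_conf` would be called from a `%post --nochroot` section placed before any script that needs DNS, and `unseed_resolv_conf` from a final `%post --nochroot` section, with the function definitions repeated in each section that uses them.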

Comment 16 Adam Williamson 2022-07-21 16:04:51 UTC
If you manually `chroot /mnt/sysimage` after install is complete, that's not the same as how %post scripts are run - I think in that case, the resolved that's running in the main anaconda environment is still available for name resolution.

So, try doing what the reproducer kickstart does: have a %post that tries to use curl. For me, that always reproduces the problem. Have it do e.g. `curl -o /tmp/index.html https://www.google.com` and check if /tmp/index.html gets downloaded properly.
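
If a check without an actual download is preferred, a small probe along these lines could be dropped into a (chrooted) %post section. This is only a sketch: `probe_dns` is a made-up name, and it relies on `getent hosts`, which goes through the system's normal hosts lookup, so a name listed in /etc/hosts would also count as resolved.

```shell
# Hypothetical probe for a %post section: report whether a hostname resolves
# through the regular name-service lookup path.
# Usage: probe_dns [HOSTNAME]   (defaults to fedoraproject.org)
probe_dns() {
    pd_host="${1:-fedoraproject.org}"
    if getent hosts "$pd_host" >/dev/null 2>&1; then
        echo "resolved: $pd_host"
    else
        echo "NOT resolved: $pd_host"
        return 1
    fi
}
```

Calling `probe_dns fedoraproject.org` inside %post and checking the script log should show whether name resolution works the way %post scripts actually run, as opposed to a manual chroot.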

Comment 17 Radek Vykydal 2022-07-22 10:59:16 UTC
(In reply to Adam Williamson from comment #16)
> If you manually `chroot /mnt/sysimage` after install is complete, that's not
> the same as how %post scripts are run - I think in that case, the resolved
> that's running in the main anaconda environment is still available for name
> resolution.
> 
> So, try doing what the reproducer kickstart does: have a %post that tries to
> use curl. For me, that always reproduces the problem. Have it do e.g. `curl
> -o /tmp/index.html https://www.google.com` and check if /tmp/index.html gets
> downloaded properly.

So it must be something about the network / dns setup I am in.

For rawhide with systemd-resolved excluded (and the symlink missing) I am getting this %post script log:


======== calling resolvectl
Global
       Protocols: LLMNR=resolve -mDNS -DNSOverTLS DNSSEC=no/unsupported
resolv.conf mode: stub

Link 2 (ens3)
    Current Scopes: DNS LLMNR/IPv4 LLMNR/IPv6
         Protocols: +DefaultRoute +LLMNR -mDNS -DNSOverTLS DNSSEC=no/unsupported
Current DNS Server: 10.43.136.2
       DNS Servers: 10.43.136.2
        DNS Domain: anaconda.englab.brq.redhat.com
======== calling ping -c 1 fedoraproject.org
PING fedoraproject.org(2620:52:3:1:dead:beef:cafe:fed6 (2620:52:3:1:dead:beef:cafe:fed6)) 56 data bytes
From 2620:52:0:2b88::fe (2620:52:0:2b88::fe) icmp_seq=1 Destination unreachable: Address unreachable

--- fedoraproject.org ping statistics ---
1 packets transmitted, 0 received, +1 errors, 100% packet loss, time 0ms

======== calling ping -4 -c 1 fedoraproject.org
PING  (67.219.144.68) 56(84) bytes of data.
64 bytes from 67.219.144.68 (67.219.144.68): icmp_seq=1 ttl=51 time=108 ms

---  ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 108.185/108.185/108.185/0.000 ms
======== calling curl -o /tmp/index.html https://www.google.com
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
^M  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0^M100 14528    0 14528    0     0  58347      0 --:--:-- --:--:-- --:--:-- 58580


For eln (0624) I am getting this %post script log:


======== calling resolvectl
Global
       Protocols: LLMNR=resolve -mDNS -DNSOverTLS DNSSEC=no/unsupported
resolv.conf mode: stub

Link 2 (ens3)
    Current Scopes: DNS LLMNR/IPv4 LLMNR/IPv6
         Protocols: +DefaultRoute +LLMNR -mDNS -DNSOverTLS DNSSEC=no/unsupported
Current DNS Server: 10.43.136.2
       DNS Servers: 10.43.136.2
        DNS Domain: anaconda.englab.brq.redhat.com
======== calling ping -c 1 fedoraproject.org
ping: fedoraproject.org: Temporary failure in name resolution
======== calling ping -4 -c 1 fedoraproject.org
ping: fedoraproject.org: Temporary failure in name resolution
======== calling curl -o /tmp/index.html https://www.google.com
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
^M  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0curl: (6) Could not resolve host: www.google.com


With the %post script workaround from comment #15 I am getting the same results for eln as for rawhide.

Comment 18 Ben Cotton 2022-08-09 13:19:29 UTC
This bug appears to have been reported against 'rawhide' during the Fedora Linux 37 development cycle.
Changing version to 37.

Comment 19 Radek Vykydal 2022-10-13 06:49:13 UTC
We've documented the issue and workaround in https://github.com/rhinstaller/anaconda/pull/4374.

Comment 20 Vendula Poncova 2022-11-01 14:36:36 UTC
Please follow the workaround from comment 19.

