Description of problem:

Kickstart %post scripts that need to resolve hostnames fail because the resolv.conf file is missing from the /mnt/sysimage chroot environment:

[anaconda root@kvm-03-guest03 ~]# ls -l /mnt/sysimage/etc/resolv.conf
ls: cannot access '/mnt/sysimage/etc/resolv.conf': No such file or directory

With Beaker internally at Red Hat, this causes kickstart to go into a loop since the %post script cannot tell Beaker that kickstart has finished. That is, a %post script runs:

curl http://beaker:8000/postinstall_done/12345

where 12345 is the recipe ID, and Beaker then adjusts the PXE grub config so the system will boot from disk after kickstart.

The problem is larger than Beaker kickstarts, however. For example, even trying to add a 3rd party dnf repo and install some extra packages in a %post script will fail since it cannot resolve any hostnames.

This was previously reported in bug 2032085 which was fixed and closed, but it seems to have cropped up again.

Version-Release number of selected component (if applicable):
Fedora rawhide as of 2022-06-27

How reproducible:
always

Steps to Reproduce:
1. kickstart with a %post script that uses DNS, e.g.

%post
ping -c 1 fedoraproject.org
%end

Actual results:
ping cannot resolve fedoraproject.org

Expected results:
DNS hostnames resolve in %post scripts

Additional info:
Example serial console output from Fedora-ELN-20220624.3 with sanitized hostnames. Note, the 'fetch' command is a bash function/wrapper around 'curl'.

+ fetch /usr/local/sbin/anamon http://$LAB_CONTROLLER/beaker/anamon3
+ curl -L --remote-time -o /usr/local/sbin/anamon http://$LAB_CONTROLLER/beaker/anamon3
  % Total % Received % Xferd Average Speed Time Time Time Current
  Dload Upload Total Spent Left Speed
  0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0curl: (6) Could not resolve host: $LAB_CONTROLLER
...
...
...
+ dnf -y install restraint-rhts
beaker-AppStream 0.0 B/s | 0 B 00:00
Errors during downloading metadata for repository 'beaker-AppStream':
  - Curl error (6): Couldn't resolve host name for http://$MIRROR/released/fedora/ELN-rawhide/Fedora-ELN-20220624.3/AppStream/x86_64/os/repodata/repomd.xml [Could not resolve host: $MIRROR]
Error: Failed to download metadata for repo 'beaker-AppStream': Cannot download repomd.xml: Cannot download repodata/repomd.xml: All mirrors were tried
...
...
...
Note: This might be a duplicate of bug 2074083
I wonder if this has the same cause as https://bugzilla.redhat.com/show_bug.cgi?id=2100883 . It's not the same as 2074083.
Hmm, /etc/resolv.conf seems to be created if systemd-resolved.rpm is installed. If the symlink is fully missing, then it seems the package is not installed. Are the full logs from that failed build available somewhere?
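To tell the cases apart from the installer shell, something like the following could report whether resolv.conf in the installed system is a symlink, a regular file, or missing entirely. This is only a sketch: resolv_conf_state is a hypothetical helper name, and the root directory argument (for example /mnt/sysimage) is an assumption about where the installed system is mounted.

```shell
#!/bin/sh
# Hypothetical helper: report the state of etc/resolv.conf under a given root.
# $1 is the root of the installed system, e.g. /mnt/sysimage (assumption).
resolv_conf_state() {
    path="$1/etc/resolv.conf"
    if [ -L "$path" ]; then
        # A symlink, possibly dangling; show where it points.
        echo "symlink -> $(readlink "$path")"
    elif [ -e "$path" ]; then
        echo "regular file"
    else
        echo "missing"
    fi
}
```

Running `resolv_conf_state /mnt/sysimage` in the failing ELN install would be expected to print "missing", matching the ls output in the description.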
Unless I'm missing something, isn't the %packages in the kickstart for the job odd? It looks like it's just this:

%packages --ignoremissing
# Task requirements will be installed by the harness
# no snippet data for packages
chrony
%end

It doesn't pull in core at all. That looks like it'll only include chrony, anything chrony happens to depend on, and anything anaconda pulls into the installed system as needed for filesystems, bootloader etc.?
Oh, it seems core is included by default even if not specified, okay. But the problem does indeed seem to be that systemd-resolved is not pulled in. The install logs do not show it at all. That is odd, since it's listed as default in core, at least in comps-f37; I'm not sure what comps ELN uses.
So, comps-eln.xml.in does not include systemd-resolved, which explains why we're not getting it in an ELN install. Are you sure this is affecting Rawhide? Obviously, resolv.conf should be handled properly if systemd-resolved isn't installed, so there's a bug here. I'm just trying to pin down the angles.
Now that you mention it, Rawhide does seem to be ok. I ran a lot of Beaker jobs trying to debug this and probably confused myself. :) It seems this is limited to Fedora ELN at the moment.
Yeah. So, I can reproduce on Rawhide if I do an install without systemd-resolved. Things are completely different in Rawhide from how they were in F35/F36 (and all the stuff discussed/described in https://bugzilla.redhat.com/show_bug.cgi?id=2074083 ): in Rawhide, anaconda has just completely ditched the code for doing things to `/etc/resolv.conf` in the installed system. It does not try to do anything to it any more; it leaves it entirely up to the installed packages to handle it. So in the case of systemd-resolved not being installed, I expect the idea is that NetworkManager should handle it. And NetworkManager *does* handle it...on boot of the installed system. But nothing seems to put one in place for the installer %post phase if systemd-resolved isn't installed. I think this case was not considered in reviewing the anaconda change: https://github.com/rhinstaller/anaconda/pull/3818 . So we probably need to ask the anaconda team to reconsider and think about how this case should be handled, especially if RHEL does not intend to adopt resolved by default.
We have the %posttrans scriptlet to handle the symlink in systemd-resolved.rpm. We could add similar code to e.g. systemd-libs to create something different when systemd-resolved is *not* enabled. Though I'm not sure if we'd know what to put there.
I don't think it's really systemd's "job" to handle this if we're not using systemd-resolved. It'd rather be NetworkManager's, but NetworkManager isn't started in this environment. How does this work with systemd-resolved, actually? Is anaconda explicitly triggering it to set up its resolver config files in the post-install, before-reboot environment somehow, or does it do it 'naturally'?
(In reply to Adam Williamson from comment #12)
> How does this work with systemd-resolved, actually? Is anaconda explicitly
> triggering it to set up its resolver config files in the post-install,
> before-reboot environment somehow, or does it do it 'naturally'?

The systemd-resolved package creates the symlink in its post-installation script.
I am not able to reproduce the issue with rawhide. If I install with

%packages --ignoremissing
-systemd-resolved
%end

/etc/resolv.conf is missing in the chroot (/mnt/sysimage) but name resolution seems to be working.

I am able to reproduce with ELN.
(In reply to Radek Vykydal from comment #14)
> I am able to reproduce with ELN.

A workaround (or documented hint) could be this %post script:

%post --nochroot
if [ ! -e /mnt/sysimage/etc/resolv.conf ]; then
    cp -P /etc/resolv.conf /mnt/sysimage/etc/resolv.conf
fi
%end

That would copy the symlink created by resolved in the installer environment to the chroot.

Theoretically, Anaconda could do that before running %post scripts. But we also need to take care to remove /mnt/sysimage/etc/resolv.conf at the end of installation (in another %post script) so that NM can set up /etc/resolv.conf after reboot (I think finding the symlink would prevent NM from doing that). I doubt that is something Anaconda should do on its own.

And before figuring out some fragile logic for making /etc/resolv.conf available in the chroot, we should probably clarify why name resolution works in rawhide even without /etc/resolv.conf in the chroot (comment #14).
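The copy-then-remove idea could be sketched as a pair of shell helpers. This is a rough sketch under stated assumptions, not something Anaconda does: the function names, the marker file name, and the explicit root arguments are all hypothetical, and the extra -L test (to avoid overwriting a dangling symlink that a package may have installed) goes beyond the workaround as written in this comment.

```shell
#!/bin/sh
# Hypothetical marker file recording that resolv.conf was put in place by us.
MARKER_NAME=".resolv.conf.copied-by-ks"

# Copy resolv.conf (as a symlink, if it is one) from the installer root ($1)
# into the installed system root ($2), but only if the target has none.
seed_resolv_conf() {
    src_root=$1
    dest_root=$2
    dest="$dest_root/etc/resolv.conf"
    # -e is false for a dangling symlink, so check -L as well.
    if [ ! -e "$dest" ] && [ ! -L "$dest" ]; then
        cp -P "$src_root/etc/resolv.conf" "$dest" || return 1
        : > "$dest_root/etc/$MARKER_NAME"
    fi
}

# Remove resolv.conf from the installed system root ($1) again, but only
# if the marker shows that seed_resolv_conf put it there.
remove_seeded_resolv_conf() {
    dest_root=$1
    if [ -e "$dest_root/etc/$MARKER_NAME" ]; then
        rm -f "$dest_root/etc/resolv.conf" "$dest_root/etc/$MARKER_NAME"
    fi
}
```

In a kickstart, `seed_resolv_conf / /mnt/sysimage` would go into a `%post --nochroot` section placed before any script that needs DNS, and `remove_seeded_resolv_conf /mnt/sysimage` into a final `%post --nochroot` section. The marker keeps the cleanup from deleting a resolv.conf that an installed package created on its own.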
If you manually `chroot /mnt/sysimage` after install is complete, that's not the same as how %post scripts are run - I think in that case, the resolved that's running in the main anaconda environment is still available for name resolution.

So, try doing what the reproducer kickstart does: have a %post that tries to use curl. For me, that always reproduces the problem. Have it do e.g. `curl -o /tmp/index.html https://www.google.com` and check if /tmp/index.html gets downloaded properly.
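A %post check that avoids depending on a particular web server could look roughly like the following. It is only a sketch: can_resolve and check_dns are hypothetical helper names, and it assumes getent is available in the installed system. It asks the system resolver (through NSS) for the name and logs the result.

```shell
#!/bin/sh
# Return 0 if the name resolves through the system resolver, non-zero otherwise.
can_resolve() {
    getent hosts "$1" >/dev/null 2>&1
}

# Log the result for one host name; failures go to stderr.
check_dns() {
    host=$1
    if can_resolve "$host"; then
        echo "DNS OK: $host"
    else
        echo "DNS FAILED: $host" >&2
        return 1
    fi
}
```

Calling `check_dns fedoraproject.org` from a chrooted %post script should then show in the script log whether the installed system's resolver configuration is usable at that point.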
(In reply to Adam Williamson from comment #16)
> If you manually `chroot /mnt/sysimage` after install is complete, that's not
> the same as how %post scripts are run - I think in that case, the resolved
> that's running in the main anaconda environment is still available for name
> resolution.
>
> So, try doing what the reproducer kickstart does: have a %post that tries to
> use curl. For me, that always reproduces the problem. Have it do e.g. `curl
> -o /tmp/index.html https://www.google.com' and check if /tmp/index.html gets
> downloaded properly.

So it must be something about the network / dns setup I am in.

For rawhide with systemd-resolved excluded (and the symlink missing) I am getting this %post script log:

======== calling resolvectl
Global
  Protocols: LLMNR=resolve -mDNS -DNSOverTLS DNSSEC=no/unsupported
  resolv.conf mode: stub
Link 2 (ens3)
  Current Scopes: DNS LLMNR/IPv4 LLMNR/IPv6
  Protocols: +DefaultRoute +LLMNR -mDNS -DNSOverTLS DNSSEC=no/unsupported
  Current DNS Server: 10.43.136.2
  DNS Servers: 10.43.136.2
  DNS Domain: anaconda.englab.brq.redhat.com
======== calling ping -c 1 fedoraproject.org
PING fedoraproject.org(2620:52:3:1:dead:beef:cafe:fed6 (2620:52:3:1:dead:beef:cafe:fed6)) 56 data bytes
From 2620:52:0:2b88::fe (2620:52:0:2b88::fe) icmp_seq=1 Destination unreachable: Address unreachable
--- fedoraproject.org ping statistics ---
1 packets transmitted, 0 received, +1 errors, 100% packet loss, time 0ms
======== calling ping -4 -c 1 fedoraproject.org
PING (67.219.144.68) 56(84) bytes of data.
64 bytes from 67.219.144.68 (67.219.144.68): icmp_seq=1 ttl=51 time=108 ms
--- ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 108.185/108.185/108.185/0.000 ms
======== calling curl -o /tmp/index.html https://www.google.com
  % Total % Received % Xferd Average Speed Time Time Time Current
  Dload Upload Total Spent Left Speed
^M 0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0^M100 14528 0 14528 0 0 58347 0 --:--:-- --:--:-- --:--:-- 58580

For eln (0624) I am getting this %post script log:

======== calling resolvectl
Global
  Protocols: LLMNR=resolve -mDNS -DNSOverTLS DNSSEC=no/unsupported
  resolv.conf mode: stub
Link 2 (ens3)
  Current Scopes: DNS LLMNR/IPv4 LLMNR/IPv6
  Protocols: +DefaultRoute +LLMNR -mDNS -DNSOverTLS DNSSEC=no/unsupported
  Current DNS Server: 10.43.136.2
  DNS Servers: 10.43.136.2
  DNS Domain: anaconda.englab.brq.redhat.com
======== calling ping -c 1 fedoraproject.org
ping: fedoraproject.org: Temporary failure in name resolution
======== calling ping -4 -c 1 fedoraproject.org
ping: fedoraproject.org: Temporary failure in name resolution
======== calling curl -o /tmp/index.html https://www.google.com
  % Total % Received % Xferd Average Speed Time Time Time Current
  Dload Upload Total Spent Left Speed
^M 0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0curl: (6) Could not resolve host: www.google.com

With the %post script workaround from comment #15 I am getting the same results for eln as for rawhide.
This bug appears to have been reported against 'rawhide' during the Fedora Linux 37 development cycle. Changing version to 37.
We've documented the issue and workaround in https://github.com/rhinstaller/anaconda/pull/4374.
Please follow the workaround from comment 19.