Bug 1710699 - the default /etc/resolv.conf provided by systemd in mock doesn't work immediately
Keywords:
Status: CLOSED EOL
Alias: None
Product: Fedora
Classification: Fedora
Component: systemd
Version: 30
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Assignee: systemd-maint
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-05-16 06:28 UTC by Pavel Raiskup
Modified: 2020-05-26 18:41 UTC (History)

Fixed In Version:
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-05-26 18:41:48 UTC
Type: Bug
Embargoed:



Description Pavel Raiskup 2019-05-16 06:28:18 UTC
When I start a fresh VM (a copr builder) and run the following (for the first time on the VM):

$ mock --enable-network -r fedora-rawhide-x86_64 --shell
...
<mock-chroot> sh-5.0# curl https://copr.fedorainfracloud.org/
curl: (6) Could not resolve host: copr.fedorainfracloud.org
<mock-chroot> sh-5.0# cat /etc/resolv.conf
# This file belongs to man:systemd-resolved(8). Do not edit.
... snip comments ...
nameserver 127.0.0.53
options edns0

After some time (and a previous resolution attempt is required), name
resolution starts to work.

Mock is configured to use systemd nspawn.

Name resolution on the host (the VM) works (configured by NetworkManager):
$ cat /etc/resolv.conf 
# Generated by NetworkManager
search openstacklocal copr-builder-783619145.novalocal
nameserver 140.<snip>
nameserver 66.<snip>

So I suppose that the name resolution done by systemd-resolved is
triggered by the first attempt to resolve some address, and then it
starts working after some time, right?

Comment 1 Pavel Raiskup 2019-05-16 07:32:08 UTC
I'm not sure it matters, but the output from
'systemd-resolve --status' (on the host VM, not in the mock chroot)
doesn't change between the period when name resolution
doesn't work yet and the period when it works fine.

If that matters, when I reinitialize the chroot (or
initialize a different one) the problem doesn't happen
again.

Original problem:
https://pagure.io/copr/copr/issue/747

Comment 2 Zbigniew Jędrzejewski-Szmek 2020-04-16 09:22:53 UTC
Sorry for the slow reply.

Frankly, I have no idea why this would happen. Based on this comment
> After some time (and a previous resolution attempt is required), name
> resolution starts to work.
it seems that systemd-resolved on the host is not reachable when the container is brought
up, and becomes reachable later. I cannot reproduce this. In fact,
mock --enable-network -r fedora-rawhide-x86_64 --shell 'curl https://copr.fedorainfracloud.org/'
always works for me.

> So I suppose that the name resolution done by systemd-resolved is
> triggered by the first attempt to resolve some address, and then it
> starts working after some time, right?

Not really. First, systemd-resolved should already be running on the host, since it is
started in early boot and mock much, much later. Second, if it *were* triggered by some
request, everything should still work, in the sense that the resolution would be a bit
slower, but the answer would be the same. Third, the container is talking to systemd-resolved
over DNS, i.e. sending IP packets, and not local IPC as is done when the nss-resolve module
normally talks to systemd-resolved. And systemd-resolved is not socket-activated on port 53,
so such communication does not trigger a start of systemd-resolved. If it sends any replies,
it must have already been running before (as expected).
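One way to check whether the stub on 127.0.0.53 is reachable, and how long it takes to become reachable, is to probe it directly over DNS in a retry loop. The following is only a sketch: retry_until_ok is a hypothetical helper name, and the example invocation assumes dig (from bind-utils) is installed in the chroot. dig exits non-zero when it gets no reply from the server, so the loop tests reachability of the stub rather than whether the name exists.

```shell
# Retry a command until it succeeds, then report attempts and elapsed time.
# Usage: retry_until_ok MAX_TRIES DELAY_SECONDS COMMAND [ARGS...]
retry_until_ok() {
    max=$1; delay=$2; shift 2
    start=$(date +%s)
    i=1
    while [ "$i" -le "$max" ]; do
        # Discard the command's own output; only its exit status matters here.
        if "$@" >/dev/null 2>&1; then
            echo "ok after $i attempt(s), $(( $(date +%s) - start ))s"
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    echo "still failing after $max attempt(s)"
    return 1
}

# Example (inside the mock shell; assumes dig is available):
#   retry_until_ok 30 2 dig +time=2 +tries=1 @127.0.0.53 copr.fedorainfracloud.org
```

If the probe succeeds on the first attempt while curl still fails, the problem is more likely in the resolver configuration inside the chroot than in reaching the stub.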

Based on all this, I think it is something that filters the packets.

What is the version of systemd (on the host)?
'grep hosts: /etc/nsswitch.conf' (in the container)?
Could you run something like 'sudo tcpdump -i any port 53 or icmp' (on the host) to see if
packets are being sent and received?

Comment 3 Lennart Poettering 2020-04-16 11:23:01 UTC
How precisely does mock set up the environments? Does it copy the host /etc/resolv.conf in? Does it boot the container up (so that PID 1 is systemd and resolved runs)? Does it use network namespacing? Is this in a chroot of some kind, or is this some systemd-nspawn invocation (if the latter, please provide the full systemd-nspawn command line).

Comment 4 Zbigniew Jędrzejewski-Szmek 2020-04-16 11:31:53 UTC
> Does it copy the host /etc/resolv.conf in?
That's what seems to be happening, but a confirmation would be good.

> Does it boot the container up (so that PID 1 is systemd and resolved runs)?
No.

> Does it use network namespacing?
No (with --enable-network, as was used in this case).

> Is this in a chroot of some kind, or is this some systemd-nspawn invocation (if the latter, please provide the full systemd-nspawn command line).

The latter. Something like this:
$ /usr/bin/systemd-nspawn -q -M 4bc76243619a4f579654a738e8314217 -D /var/lib/mock/fedora-rawhide-x86_64/root -a --capability=cap_ipc_lock --bind=/tmp/mock-resolv.kr9rjik1:/etc/resolv.conf --bind=/dev/loop-control --bind=/dev/loop0 --bind=/dev/loop1 --bind=/dev/loop2 --bind=/dev/loop3 --bind=/dev/loop4 --bind=/dev/loop5 --bind=/dev/loop6 --bind=/dev/loop7 --bind=/dev/loop8 --bind=/dev/loop9 --bind=/dev/loop10 --bind=/dev/loop11 --setenv=TERM=vt100 --setenv=SHELL=/bin/bash --setenv=HOME=/builddir --setenv=HOSTNAME=mock --setenv=PATH=/usr/bin:/bin:/usr/sbin:/sbin --setenv=PROMPT_COMMAND=printf "\033]0;<mock-chroot>\007" --setenv=PS1=<mock-chroot> \s-\v\$  --setenv=LANG=en_US.UTF-8 /bin/sh -i -l

Comment 5 Pavel Raiskup 2020-04-16 12:13:25 UTC
> What is the version of systemd (on the host)?
> 'grep hosts: /etc/nsswitch.conf' (in the container)?
> Could you run something like 'sudo tcpdump -i any port 53 or icmp' (on the host) to see if
> packets are being sent and received?

This is a pretty old bug, and I don't have such a broken VM at hand
ATM.  This wasn't easy to play with because the problem appeared only
once, right after booting the VM from the image.  And we no longer have
the same OpenStack environment (we spawn builders in AWS).  I'll
take a look once more, and try to answer those as soon as possible.

I applied this when I was facing the problem:
https://infrastructure.fedoraproject.org/cgit/ansible.git/commit/?id=8963287ddd7d51965b614860b17b2c7d5f5dbe89

> > Does it copy the host /etc/resolv.conf in?
> That's what seems to be happening, but a confirmation would be good.

Yes, we copy it inside if 'use_host_resolv' is True
(--enable-network does that).

Comment 6 Lennart Poettering 2020-04-16 13:29:57 UTC
> The latter. Something like this:
$ /usr/bin/systemd-nspawn -q -M 4bc76243619a4f579654a738e8314217 -D /var/lib/mock/fedora-rawhide-x86_64/root -a --capability=cap_ipc_lock --bind=/tmp/mock-resolv.kr9rjik1:/etc/resolv.conf --bind=/dev/loop-control --bind=/dev/loop0 --bind=/dev/loop1 --bind=/dev/loop2 --bind=/dev/loop3 --bind=/dev/loop4 --bind=/dev/loop5 --bind=/dev/loop6 --bind=/dev/loop7 --bind=/dev/loop8 --bind=/dev/loop9 --bind=/dev/loop10 --bind=/dev/loop11 --setenv=TERM=vt100 --setenv=SHELL=/bin/bash --setenv=HOME=/builddir --setenv=HOSTNAME=mock --setenv=PATH=/usr/bin:/bin:/usr/sbin:/sbin --setenv=PROMPT_COMMAND=printf "\033]0;<mock-chroot>\007" --setenv=PS1=<mock-chroot> \s-\v\$  --setenv=LANG=en_US.UTF-8 /bin/sh -i -l

Hmm, so this bind mounts /etc/resolv.conf from some temporary file on the host in /tmp/mock-resolv.kr9rjik1. What is in there? Is that copied from the host's /etc/resolv.conf? If so it might possibly copy a file with the reference to 127.0.0.53 as DNS server, which is likely not desirable and might be the problem here.
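To see quickly what such a file points at, something like the following could be run against the temporary file on the host or against /etc/resolv.conf inside the chroot. It is a minimal sketch; check_resolv_conf is a hypothetical helper name, not part of mock or systemd.

```shell
# List the nameservers in a resolv.conf-style file and flag loopback ones.
# Usage: check_resolv_conf [FILE]   (FILE defaults to /etc/resolv.conf)
check_resolv_conf() {
    awk '$1 == "nameserver" {
        # 127.0.0.0/8 and ::1 are only reachable inside the same network namespace
        kind = ($2 ~ /^127\./ || $2 == "::1") ? "loopback" : "remote"
        print $2, kind
    }' "${1:-/etc/resolv.conf}"
}
```

For the file shown in the description this would print "127.0.0.53 loopback". A loopback entry is not necessarily wrong when the container shares the host's network namespace, but it does mean resolution depends on the host's stub listener.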

Note that recent systemd-nspawn versions have a switch --resolv-conf= which allows choosing different modes for handling resolv.conf; maybe one of them is useful here?
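As an illustration of that switch, the sketch below only prints a shortened nspawn command line with an explicit mode; it does not run anything. nspawn_cmdline is a hypothetical helper, the root path is taken from the command line quoted above, and copy-host (copy the host's /etc/resolv.conf into the container) is one of the documented modes. Whether mock would then need to drop its own --bind of /etc/resolv.conf is an assumption that has not been checked here.

```shell
# Print (not run) an nspawn command line with an explicit resolv.conf mode.
# Usage: nspawn_cmdline ROOT_DIR MODE
nspawn_cmdline() {
    printf 'systemd-nspawn -q -D %s --resolv-conf=%s /bin/sh -i -l\n' "$1" "$2"
}

nspawn_cmdline /var/lib/mock/fedora-rawhide-x86_64/root copy-host
```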

This command line does not turn on netns, which means the host DNS stub should actually be reachable under 127.0.0.53, but maybe something is wrong there and talking to it doesn't actually work (which I guess is what Zbigniew is already saying). It might be good to track this down with tcpdump, or maybe by setting SYSTEMD_LOG_LEVEL=debug in the systemd-resolved unit file on the host, which should show information about any DNS requests incoming on the DNS stub. That way we can at least see whether the problem is communication from the container to the stub or communication the other way round...
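A drop-in is one way to set that environment variable without editing the shipped unit file. The sketch below assumes the conventional drop-in directory for systemd-resolved.service; write_resolved_debug_dropin and the file name debug.conf are hypothetical choices.

```shell
# Write a drop-in that turns on debug logging for systemd-resolved.
# Usage: write_resolved_debug_dropin [DROPIN_DIR]
write_resolved_debug_dropin() {
    dir=${1:-/etc/systemd/system/systemd-resolved.service.d}
    mkdir -p "$dir" &&
    printf '[Service]\nEnvironment=SYSTEMD_LOG_LEVEL=debug\n' > "$dir/debug.conf"
}

# On the host, as root:
#   write_resolved_debug_dropin
#   systemctl daemon-reload
#   systemctl restart systemd-resolved
#   journalctl -u systemd-resolved -f
```

Restarting systemd-resolved may itself change the behaviour under investigation, so for a problem that only shows up right after first boot the drop-in would probably have to be baked into the VM image beforehand.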

Comment 7 Ben Cotton 2020-04-30 20:12:41 UTC
This message is a reminder that Fedora 30 is nearing its end of life.
Fedora will stop maintaining and issuing updates for Fedora 30 on 2020-05-26.
It is Fedora's policy to close all bug reports from releases that are no longer
maintained. At that time this bug will be closed as EOL if it remains open with a
Fedora 'version' of '30'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 30 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged to change the 'version' to a later Fedora 
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 8 Ben Cotton 2020-05-26 18:41:48 UTC
Fedora 30 changed to end-of-life (EOL) status on 2020-05-26. Fedora 30 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.

