I have a brand new install of Fedora Silverblue 40 on a new laptop. I created a `fedora:f40` **toolbox** with `toolbox create`, but **every dnf or yum** command hangs for a few minutes before completing successfully. I installed `strace` and I can see it hanging on the `/run/systemd/resolve/io.systemd.Resolve` socket:

```
futex(0x7f8a79e7d900, FUTEX_WAKE_PRIVATE, 2147483647) = 0
socket(AF_UNIX, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, 0) = 3
connect(3, {sa_family=AF_UNIX, sun_path="/run/systemd/resolve/io.systemd.Resolve"}, 42) = 0
sendto(3, "{\"method\":\"io.systemd.Resolve.Re"..., 90, MSG_DONTWAIT|MSG_NOSIGNAL, NULL, 0) = 90
brk(0x557406ed9000) = 0x557406ed9000
recvfrom(3, 0x557406e98760, 131080, MSG_DONTWAIT, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
ppoll([{fd=3, events=POLLIN}], 1, {tv_sec=119, tv_nsec=999960000}, NULL, 8
```

After a minute or two it **times out** and **proceeds successfully** to run the command:

```
ppoll([{fd=3, events=POLLIN}], 1, {tv_sec=119, tv_nsec=999960000}, NULL, 8) = 0 (Timeout)
recvfrom(3, 0x557406e98760, 131080, MSG_DONTWAIT, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
close(3) = 0
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=13539, ...}) = 0
mmap(NULL, 13539, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f8a7cddc000
close(3) = 0
openat(AT_FDCWD, "/lib64/libnss_myhostname.so.2", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\0\0\0\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=174416, ...}) = 0
mmap(NULL, 174360, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f8a79e25000
mmap(0x7f8a79e28000, 90112, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x3000) = 0x7f8a79e28000
mmap(0x7f8a79e3e000, 49152, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x19000) = 0x7f8a79e3e000
mmap(0x7f8a79e4a000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x24000) = 0x7f8a79e4a000
close(3) = 0
mprotect(0x7f8a79e4a000, 20480, PROT_READ) = 0
munmap(0x7f8a7cddc000, 13539) = 0
rt_sigprocmask(SIG_BLOCK, [HUP USR1 USR2 PIPE ALRM CHLD TSTP URG VTALRM PROF WINCH IO], [], 8) = 0
uname({sysname="Linux", nodename="toolbox", ...}) = 0
```

Note that:

* This **only happens** with a toolbox based on `f40`; I created (and am currently using) one based on `f39` and it works just fine.
* There are no available updates in the `f40` container.
* `DNS resolution` and `systemd-resolved` **work just fine** both inside and outside the toolbox container.
* I tried disabling SELinux, but it did not help.

```
$ ls -la /run/systemd/resolve/io.systemd.Resolve
srw-rw-rw-. 1 nobody nobody 0 Jun 8 15:59 /run/systemd/resolve/io.systemd.Resolve
```

```
⬢[@toolbox ~]$ resolvectl
Global
           Protocols: LLMNR=resolve -mDNS +DNSOverTLS DNSSEC=yes/supported
    resolv.conf mode: stub
  Current DNS Server: 1.1.1.2#cloudflare-dns.com
         DNS Servers: 1.1.1.2#cloudflare-dns.com 1.0.0.2#cloudflare-dns.com
Fallback DNS Servers: 8.8.8.8#dns.google 8.8.4.4#dns.google

Link 3 (wlp0s20f3)
    Current Scopes: DNS LLMNR/IPv4 LLMNR/IPv6
         Protocols: +DefaultRoute LLMNR=resolve -mDNS +DNSOverTLS DNSSEC=yes/supported
Current DNS Server: 192.168.100.1
       DNS Servers: 192.168.100.1
        DNS Domain: lan

Link 4 (enp85s0)
Current Scopes: none
     Protocols: -DefaultRoute LLMNR=resolve -mDNS +DNSOverTLS DNSSEC=yes/supported

Link 5 (docker0)
Current Scopes: none
     Protocols: -DefaultRoute LLMNR=resolve -mDNS +DNSOverTLS DNSSEC=yes/supported
```

```
$ resolvectl query mirrors.fedoraproject.org
mirrors.fedoraproject.org: 2600:1f14:fad:5c02:7c8a:72d0:1c58:c189 -- link: wlp0s20f3
                           2600:2701:4000:5211:dead:beef:fe:fed3 -- link: wlp0s20f3
                           2604:1580:fe00:0:dead:beef:cafe:fed1 -- link: wlp0s20f3
                           2605:bc80:3010:600:dead:beef:cafe:fed9 -- link: wlp0s20f3
                           2620:52:3:1:dead:beef:cafe:fed6 -- link: wlp0s20f3
                           2620:52:3:1:dead:beef:cafe:fed7 -- link: wlp0s20f3
                           8.43.85.67 -- link: wlp0s20f3
                           8.43.85.73 -- link: wlp0s20f3
                           34.221.3.152 -- link: wlp0s20f3
                           38.145.60.20 -- link: wlp0s20f3
                           38.145.60.21 -- link: wlp0s20f3
                           67.219.144.68 -- link: wlp0s20f3
                           140.211.169.196 -- link: wlp0s20f3
                           152.19.134.142 -- link: wlp0s20f3
                           152.19.134.198 -- link: wlp0s20f3
                           (wildcard.fedoraproject.org)

-- Information acquired via protocol DNS in 144.7ms.
-- Data is authenticated: yes; Data was acquired via local or encrypted transport: yes
-- Data from: network
```

Reproducible: Always

Steps to Reproduce:
1. Install Silverblue F40
2. Create a toolbox from `fedora:f40`
3. Run `dnf update`
4. Wait

Expected Results:
`dnf` commands work.
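For comparison, the difference in NSS lookup order between the two images can be checked from the host with something like this (a sketch; the container names are assumptions based on default `toolbox create` naming):

```
# inspect the hosts lookup order in each container (names are assumptions)
toolbox run -c fedora-toolbox-39 grep '^hosts:' /etc/nsswitch.conf
toolbox run -c fedora-toolbox-40 grep '^hosts:' /etc/nsswitch.conf
```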
Interesting. Just to be sure. A container created from fedora-toolbox:40 on a Fedora 40 host shows this problem, but a container from fedora-toolbox:39 on a Fedora 40 host doesn't show this problem. Right?
That is correct.

I forgot to mention that I found a change to nsswitch.conf that seems to address the problem. The linked issue is not about Toolbx, but it does fix the problem inside the toolbox as well:

https://discussion.fedoraproject.org/t/dnf-and-firefox-take-extreemly-long-to-start-when-vpn-active-on-f40/114604/4
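For reference, a minimal sketch of that kind of workaround, assuming the problematic Fedora 40 order is `files resolve [!UNAVAIL=return] myhostname dns` (note that authselect may overwrite manual edits to this file):

```
# move myhostname back in front of resolve (the Fedora 39 order)
sudo sed -i \
  's/^hosts:.*/hosts: files myhostname resolve [!UNAVAIL=return] dns/' \
  /etc/nsswitch.conf
```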
(In reply to Francesco Ciocchetti from comment #2)
> That is correct.
>
> I forgot to mention that I found a change to nsswitch.conf that seems to
> address the problem. The linked issue is not about Toolbx, but it does fix
> the problem inside the toolbox as well:
>
> https://discussion.fedoraproject.org/t/dnf-and-firefox-take-extreemly-long-to-start-when-vpn-active-on-f40/114604/4

Thanks for digging that up!

I can track the change to the hosts database configuration in /etc/nsswitch.conf to this upstream pull request:
https://github.com/authselect/authselect/pull/366

... which was added to Fedora >= 40 through these commits:
https://src.fedoraproject.org/rpms/authselect/c/714bad65d2a09836ba84911ea5f6c6b011f2c480
https://src.fedoraproject.org/rpms/authselect/c/f411c0ecd9eef866fbe13e710867a3fcebaaf87d

... and was discussed in:
https://bugzilla.redhat.com/show_bug.cgi?id=2257197
Could you confirm that you are also experiencing this problem when using VPNs, as mentioned in: https://discussion.fedoraproject.org/t/dnf-and-firefox-take-extreemly-long-to-start-when-vpn-active-on-f40/114604
Toolbx doesn't touch /etc/nsswitch.conf; the Fedora default is used as-is. Given that this behaviour is observed both on Fedora 40 hosts and in containers made from the fedora-toolbox:40 image, I am confident that this has nothing to do with Toolbx. Reassigning to authselect.
These are unfortunate hacks. I think the primary problem should be solved by good caching instead of by synthesizing non-existent records in the myhostname plugin. That plugin prevents obtaining the working name available on the network itself, because what the myhostname plugin delivers is visible only on the host and is probably not resolvable from the other hosts of that network. It also never contains the full hostname including the domain. But programs doing reverse address queries are typically trying to obtain the name under which other hosts can see the machine.

There are roughly 2 different cases:

1) Showing network traffic with names, where I want to use hostnames to display it in a user-friendly way. myhostname works great for this case.

2) Finding under which name other hosts would resolve me. This is what SMTP servers try, and what `hostname -A` should provide. I.e. I do not want to see a name synthesized by my own host, but the name seen by the network. The myhostname plugin prevents this when used before real resolution plugins like "dns".

I do not think a simple fix is possible with what we have now. The same API is used for both cases, without any possible flag to make a clear indication of what is desired. But the question is why dnf needs reverse resolution in any case; somehow I do not think that should be necessary for any dnf operation.

I think we may want a new getnameinfo(3) flag which would suppress the synthesized names provided by the myhostname plugin, to be able to resolve the real names visible from the network, which things like `hostname -A` or mail server daemons need.

Ideally, the local DNS cache would be tried first and myhostname would get consulted only if the local network DNS did not provide any name or even timed out. To prevent a timeout on each query, it would remember, with a TTL of minutes, that the record does not resolve and fall through to myhostname. Unless we have clear use cases, this is not simple to fix. We could also tune the timeout to a lower value when a name provided by the myhostname plugin is queried, but the NSS plugin architecture does not allow that to be solved in a simple way.
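The two cases can be seen directly with getent by selecting a single NSS source (a sketch; 192.0.2.10 is a placeholder address that would need a real PTR record):

```
# case 1: local, user-friendly names - nss-myhostname synthesizes an answer
getent -s myhostname hosts 127.0.0.2     # returns the local hostname
# case 2: the name the network knows - needs a real reverse (PTR) record
getent -s dns hosts 192.0.2.10           # placeholder; answers only if a PTR exists
```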
A good way to provide feedback is using getent's -s parameter, which allows only selected plugins to be used. On a decent network with good connectivity, it should not take long to query names. If it does, some part of the local DNS is misconfigured. What is the output of this command on that network?

```
for plugin in myhostname resolve dns; do echo "# $plugin"; time getent -s $plugin ahosts $HOSTNAME; done
```

Or better, with the different databases too:

```
for plugin in myhostname resolve dns; do for DB in ahosts ahostsv4 ahostsv6; do echo "# $plugin; $DB"; time getent -s $plugin $DB $HOSTNAME; done; done
```

Sometimes IPv6 queries cause visible timeouts because the local DNS cache is not well configured. Could it be dnsmasq?

But since DNS over TLS is forced on, do all of those servers respond to DoT? Can the servers be queried to see whether they respond?

```
$ dig -d @192.168.100.1 +tls $HOSTNAME
```
Is it possible that resolved's fallback servers need to kick in because the wifi resolver does not support DNS over TLS, and it times out before the fallback server gets tried? Also, enabled DNSSEC might cause issues if DNSSEC-specific queries cause timeouts instead of positive or negative answers. I am not sure how exactly this would be visible in resolvectl output. I mean, a later query via resolvectl is reasonably fast:

-- Information acquired via protocol DNS in 144.7ms.

That is not slow enough to be noticeable.
Thank you Petr. This is very helpful.

(In reply to Petr Menšík from comment #6)
> These are unfortunate hacks. I think the primary problem should be solved
> by good caching instead of by synthesizing non-existent records in the
> myhostname plugin. That plugin prevents obtaining the working name
> available on the network itself, because what the myhostname plugin
> delivers is visible only on the host and is probably not resolvable from
> the other hosts of that network. It also never contains the full hostname
> including the domain.

systemd-resolved will also synthesize a record for the local hostname, so it's probably coming from nss-resolve rather than nss-myhostname. I.e. when systemd-resolved is running, nss-resolve will always return a result; the other NSS modules will not be used and are only there as a fallback for when systemd-resolved is disabled. Therefore, surely nss-myhostname is never used (after the change in bug #2257197) and the synthesized result is not actually coming from nss-myhostname anymore.

A good short-term workaround would be to revert the change from bug #2257197 to get dnf and Firefox working properly again, since that is more important than hostname --fqdn, but that is not a good long-term fix. In the long term, we should probably keep nss-myhostname at the end and move nss-mdns4_minimal instead. But this is easier said than done, because systemd-resolved doesn't seem to handle mDNS properly, bug #1867830.

> I think we may want a new getnameinfo(3) flag which would suppress the
> synthesized names provided by the myhostname plugin, to be able to resolve
> the real names visible from the network, which things like `hostname -A`
> or mail server daemons need.

Hm, there is already NI_NOFQDN though. Could we use that?

> Ideally, the local DNS cache would be tried first and myhostname would get
> consulted only if the local network DNS did not provide any name or even
> timed out. To prevent a timeout on each query, it would remember, with a
> TTL of minutes, that the record does not resolve and fall through to
> myhostname. Unless we have clear use cases, this is not simple to fix. We
> could also tune the timeout to a lower value when a name provided by the
> myhostname plugin is queried, but the NSS plugin architecture does not
> allow that to be solved in a simple way.

I think it's designed on the assumption that the local hostname is more trusted than DNS and should always resolve to the local computer. Changes would probably need to be compatible with that. But maybe something similar could work. We'd need to discuss with upstream. The NSS plugin architecture shouldn't be a problem because everything is happening inside systemd-resolved.
Hmm, I am finding something rotten with `dig +search +showsearch host`, after I set the lan suffix with:

```
sudo resolvectl domain enp1s0 lan
```

When I run dig, it times out instead of giving at least a cached response. It sends the query to the local nameserver over the DoT protocol, which at least my resolver does not support. Could that also be the case for the original reporter?

Confirmed by running tcpdump in the background:

```
sudo tcpdump -n port llmnr or port domain or port domain-s &
```

I get a lot of lines like:

```
09:42:51.900967 IP 192.168.122.19.44602 > 192.168.122.1.domain-s: Flags [S], seq 1017153061, win 32120, options [mss 1460,sackOK,TS val 1259108054 ecr 0,nop,wscale 7,tfo cookiereq,nop,nop], length 0
```

And indeed, it takes long before it fails. It does not behave well, and in particular the status report in resolvectl does not clearly indicate which server is actually responding and which is not. This is caused, I think, by DNSOverTLS being enabled globally on all interfaces, but no status is shown for the "Current DNS Server" indicating whether it responds over the specified protocols or not.

I have put into /etc/systemd/resolved.conf:

```
[Resolve]
# Some examples of DNS servers which may be used for DNS= and FallbackDNS=:
# Cloudflare: 1.1.1.1#cloudflare-dns.com 1.0.0.1#cloudflare-dns.com 2606:4700:4700::1111#cloudflare-dns.com 2606:4700:4700::1001#cloudflare-dns.com
# Google:     8.8.8.8#dns.google 8.8.4.4#dns.google 2001:4860:4860::8888#dns.google 2001:4860:4860::8844#dns.google
# Quad9:      9.9.9.9#dns.quad9.net 149.112.112.112#dns.quad9.net 2620:fe::fe#dns.quad9.net 2620:fe::9#dns.quad9.net
DNS=1.1.1.1#cloudflare-dns.com 1.0.0.1#cloudflare-dns.com
FallbackDNS=8.8.8.8#dns.google 8.8.4.4#dns.google
#Domains=
DNSSEC=yes
DNSOverTLS=yes
```

```
$ time getent ahosts host8

real    0m18.467s
user    0m0.001s
sys     0m0.004s
```

Until I disabled DNSOverTLS for my local device, resolution of hosts without dots did not work at all. Yes, it takes a lot of retries on port 853, which in my case is not protected by a firewall and fails immediately; if that port caused a timeout, the failure would take much longer. When I ran `sudo resolvectl dnsovertls enp1s0 off`, responses became immediate. But I also disabled LLMNR.

```
$ resolvectl
Global
           Protocols: LLMNR=resolve -mDNS +DNSOverTLS DNSSEC=yes/supported
    resolv.conf mode: stub
  Current DNS Server: 1.1.1.1#cloudflare-dns.com
         DNS Servers: 1.1.1.1#cloudflare-dns.com 1.0.0.1#cloudflare-dns.com
Fallback DNS Servers: 8.8.8.8#dns.google 8.8.4.4#dns.google

Link 2 (enp1s0)
    Current Scopes: DNS
         Protocols: +DefaultRoute -LLMNR -mDNS -DNSOverTLS DNSSEC=yes/supported
Current DNS Server: 192.168.122.1
       DNS Servers: 192.168.122.1
        DNS Domain: lan
```
I have also seen unexpected plaintext query names going to the link resolver address. It does not send DNSSEC queries only to the expected global configuration, but sends them also to the 192.168.122.1 server. I expected only *.lan. names would be sent to it, but that does not seem to be the case for systemd-resolved-255.7-1.fc40.x86_64. I know the systemd folks do not use DNSSEC and do not recommend it. I am seeing traffic both on the TLS address 1.1.1.1 and in plaintext to 192.168.122.1, including DNSSEC records for names outside *.lan. I am not sure how this is supposed to work with the configuration from comment 10, but I would have expected different behaviour. This does not seem a good way to configure global DoT servers in combination with a local plaintext server for local-only records, because it leaks queries which should have stayed protected by TLS (IMHO). Sadly, global DNS in the NetworkManager configuration does not allow TLS to be enabled by default. It does not seem that a better configuration is available now.
If privacy is desired, I would recommend turning off LLMNR resolution. It should be possible to disable it per NM profile or globally. That should speed up resolution of non-existent bare names (names without a dot).

```
$ nmcli c show
NAME                UUID                                  TYPE      DEVICE
Wired connection 1  5a99be05-107a-3c54-a21b-d275cca70b0c  ethernet  enp1s0
lo                  6c56b7d0-efa8-4da8-a913-cbfed360efa6  loopback  lo

$ nmcli c edit 5a99be05-107a-3c54-a21b-d275cca70b0c
set connection.dns-over-tls no
set connection.llmnr no
set ipv4.dns-search lan
save
activate
```

And have /etc/systemd/resolved.conf:

```
[Resolve]
# Some examples of DNS servers which may be used for DNS= and FallbackDNS=:
# Cloudflare: 1.1.1.1#cloudflare-dns.com 1.0.0.1#cloudflare-dns.com 2606:4700:4700::1111#cloudflare-dns.com 2606:4700:4700::1001#cloudflare-dns.com
# Google:     8.8.8.8#dns.google 8.8.4.4#dns.google 2001:4860:4860::8888#dns.google 2001:4860:4860::8844#dns.google
# Quad9:      9.9.9.9#dns.quad9.net 149.112.112.112#dns.quad9.net 2620:fe::fe#dns.quad9.net 2620:fe::9#dns.quad9.net
DNS=1.1.1.2#cloudflare-dns.com 1.0.0.2#cloudflare-dns.com
FallbackDNS=8.8.8.8#dns.google 8.8.4.4#dns.google
#Domains=
DNSSEC=yes
DNSOverTLS=yes
LLMNR=no
```

It seems that with DefaultRoute disabled on the link, it will indeed send everything to the global TLS servers, and only *.lan to the local plaintext resolver. I am not sure how this is supposed to be configured permanently in resolved.conf in combination with NetworkManager.
Problem is you have global DNS configured *and* the link-specific DNS configured by NetworkManager. That is almost always wrong because it's going to generate two requests. I think systemd-resolved is stupid to interpret the configuration this way and should really change its behavior somehow, but anyway, suffice to say you should stick to link-specific configuration only (unless you're going to disable NetworkManager).
There are name resolution issues even without a VPN.

In a toolbox, the hostname in /etc/hostname is "toolbox", while the environment variable HOSTNAME is inherited (?) from outside the toolbox container and is set to the real machine's hostname. Is this disparity a problem too?

In the Fedora 40 image, the priority change between myhostname and resolve has the consequence that the container's hostname "toolbox" will not be resolved in a timely manner. This causes name resolution timeouts in any application that attempts to use the hostname in /etc/hostname. The most notable issue is that some X11 applications hang the whole Plasma desktop in Kinoite for a few seconds.

In Fedora 39 toolbox images, myhostname takes precedence and resolves /etc/hostname's value "toolbox", so applications work correctly there.

The command `resolvectl query toolbox` fails in both the Fedora 40 and 39 toolbox images.
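A quick way to see the disparity from inside a toolbox (a sketch of the situation described above):

```
# inside the toolbox container
cat /etc/hostname          # "toolbox"
echo "$HOSTNAME"           # the host machine's hostname, inherited from outside
time getent hosts toolbox  # slow on F40 images: nss-resolve is consulted first
                           # and asks real DNS about a name only this container knows
```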
The problem with `resolvectl query toolbox` might happen because LLMNR is enabled in systemd-resolved, but no such host exists to return a negative reply immediately. Non-existent names take a long time to time out on multicast protocols, at least 3 seconds. If unicast DNS with applied search domains is tried first, the answer should be fast, provided the name with the search domain applied exists in DNS.

The question is whether a unicast query can be fast in cases where DNS over TLS is desired but not responding, be it because the server is misconfigured, overloaded, or broken. I am not sure whether systemd-resolved caches temporary unavailability of a link-local server.

Could the original reporter also provide the output of these commands?

- resolvectl show-server-state
- resolvectl statistics

Unfortunately, the recent response time does not seem to be visible there.
(In reply to Michael Catanzaro from comment #13)
> Problem is you have global DNS configured *and* the link-specific DNS
> configured by NetworkManager. That is almost always wrong because it's
> going to generate two requests. I think systemd-resolved is stupid to
> interpret the configuration this way and should really change its behavior
> somehow, but anyway, suffice to say you should stick to link-specific
> configuration only (unless you're going to disable NetworkManager).

I am afraid that is wrong only because it was never considered a supported situation.

But especially with DNS over TLS directed to public IP addresses, this is exactly what I want: use DNS over TLS for everything, except local-only domains accessible on local link servers, because names like example.lan. cannot be resolved on a global server like Cloudflare or Google DNS.

Unfortunately it is not simple to configure global servers used for everything except the local-only domains on connections. You cannot declare that DNSOverTLS is desired for the global configuration but use a different (default) setting for common interfaces: for example DNSOverTLS=yes for my global setting (where I can be very sure it is supported), but DNSOverTLS=opportunistic for link-provided servers (where it might be supported, but often will not be).

It should work if the NetworkManager default were set to a different value. That should be possible when NetworkManager.conf contains:

```
[connection]
connection.dns-over-tls=1 # means opportunistic
```

That should change the default for link settings in NM, meaning it would provide its own value. But still, I am not sure how to make it -DefaultRoute.
(In reply to Petr Menšík from comment #15)
> Could the original reporter also provide the output of these commands?
>
> - resolvectl show-server-state
> - resolvectl statistics
>
> Unfortunately, the recent response time does not seem to be visible there.

These questions are still unanswered.
Oh sorry, I see Petr just asked for that a couple hours ago. :D
(In reply to Petr Menšík from comment #16)
> I am afraid that is wrong only because it was never considered a supported
> situation.
>
> But especially with DNS over TLS directed to public IP addresses, this is
> exactly what I want.

Yeah, your argument is persuasive. But that's also a tangent, not directly related to this bug report.
It seems my idea of a global DNS over TLS server is described in upstream issue https://github.com/systemd/systemd/issues/33579, which requests different DefaultRoute handling.

Since both systemd-resolved and the myhostname NSS plugin are part of systemd, should this issue be switched to the systemd component? It would notify more maintainers than just Zbigniew.

It seems the issue is not caused by the ordering itself. Especially if not just annoying seconds but minutes are involved, something else is probably responsible; authselect is very likely not where the primary fix belongs. If the reproduction steps work, it should be simple to reproduce with correct settings; I think I provided them in comment #10. A wrong hosts order should add only the default timeout and attempts at most, which is a maximum of 15 seconds for a single resolution. The report does not contain a specific number, but 1 minute is a lot for DNS.

I think it needs both modified configuration AND better marking of unresponsive TLS servers. systemd-resolved as a daemon can afford to wait longer, but it should report a resolution failure to the client sooner.
It's worth noting that there are two separate issues here:

A) Negative lookups being slow to time out (which happens on my system too, with no VPN involved - I'm happy to provide details, but it's just a pretty standard wifi connection to a home network)

B) The hostname value inside a toolbox should resolve, not time out as a negative lookup

B) could be solved by going back to preferring nss-myhostname over nss-resolve, or by making nss-resolve somehow behave correctly in a UTS namespace whose hostname differs from the hostname seen by systemd-resolved.
Correct; it works quite fine on my system because I have disabled systemd-resolved, so myhostname comes before dns.

The hostname resolved from a container is different from what systemd-resolved itself sees. The assumption that resolved caches what myhostname would have to do per-process does not hold in those cases, because it caches, let's say, fedora.lan, while the toolbox is using toolbox.lan. The resolve plugin would need to pass the current hostname as part of the request.
- Created systemd upstream issue: https://github.com/systemd/systemd/issues/33870

But one resolution with broken TLS takes way too long:

```
resolvectl domain eth0 lan
# I know my resolver does not support DNS over TLS
resolvectl dnsovertls eth0 yes
```

- Created systemd upstream issue: https://github.com/systemd/systemd/issues/33871

I do not think this should be fixed anywhere else. Yes, we have two parts, both better fixable on the systemd side IMO.
Should I revert the patch in authselect or should I keep it and we'll wait for systemd-resolved to be fixed?
(In reply to Pavel Březina from comment #23)
> Should I revert the patch in authselect or should I keep it and we'll wait
> for systemd-resolved to be fixed?

I don't think that's going to be quick.
Does the long timeout happen when DNS over TLS is not enabled? That is a non-default configuration *specifically* because we know fallback doesn't work well when DoT is not supported by the DNS server. That's why the change proposal https://fedoraproject.org/wiki/Changes/DNS_Over_TLS failed. I'm asking not because it's OK for this to be broken, but because if it's only broken in a non-default configuration then we probably don't want to make major design changes (authselect moving NSS modules around) as a workaround.
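A quick way to test this, as a sketch (the link name is an assumption taken from the report above):

```
# temporarily disable DoT on the wireless link and retry the slow lookup
resolvectl dnsovertls wlp0s20f3 no
resolvectl flush-caches
time getent ahosts "$HOSTNAME"   # if this is still slow, DoT is not the only culprit
```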
Systemd upstream does not think the problem is on their side. I have also filed an issue against Toolbx; maybe there is something to adjust there: https://github.com/containers/toolbox/issues/1528
*** Bug 2314175 has been marked as a duplicate of this bug. ***
There does not seem to be any action taken so far. What is your guidance? I can revert the patch in authselect, breaking hostname --fqdn again; that seems like a reasonable thing to do instead of experiencing timeouts. I can apply the patch again once this is fixed.
While they are cooking up a long-term solution, might I suggest introducing a "with-domain" option? If this option is given, the currently deployed behavior is engaged, where resolved is asked to resolve the FQDN. Otherwise, and by default, the option is not activated, and all users who do not deploy the package in a domain environment are not sent running into these timeouts. At this point in time, trying to get people to transition to distributions using this RPM package is impossible because of the tail of bugs that this change causes. The alternative is that administrators are required to deploy a third-party managed plugin that fixes this patch.
Unfortunately the bug is waiting on the bug reporter to provide information, but the reporter has disappeared. venanocta, since you're hitting the same issues, can you answer Petr's questions in comment #15 and my question in comment #25, please?

(In reply to Pavel Březina from comment #28)
> There does not seem to be any action taken so far. What is your guidance?

If this is happening with DNS over TLS enabled, then I would do nothing, per my reasoning in comment #25. Otherwise, I suggest my strategy from comment #9:

(In reply to Michael Catanzaro from comment #9)
> A good short-term workaround would be to revert the change from bug
> #2257197 to get dnf and Firefox working properly again, since that is more
> important than hostname --fqdn, but that is not a good long-term fix. In
> the long term, we should probably keep nss-myhostname at the end and move
> nss-mdns4_minimal instead. But this is easier said than done, because
> systemd-resolved doesn't seem to handle mDNS properly, bug #1867830.
Since you asked me to answer your questions, I have spent a little more time digging into the issue and found what might be the core of the problem. But first I want to explain the setup on which I am experiencing the issue.

-- ANSWERS --

DNS over TLS => default = not configured
VPN => not enabled; it actually doesn't make a difference

```
$ resolvectl show-server-state
Server: 10.10.255.254
Type: link
Interface: rnet
Interface Index: 13
Verified feature level: n/a
Possible feature level: TLS+EDNS0+DO
DNSSEC Mode: no
DNSSEC Supported: yes
Maximum UDP fragment size received: 512
Failed UDP attempts: 0
Failed TCP attempts: 0
Seen truncated packet: no
Seen OPT RR getting lost: no
Seen RRSIG RR missing: no
Seen invalid packet: no
Server dropped DO flag: no

Server: 192.168.100.254
Type: link
Interface: enp67s0
Interface Index: 3
Verified feature level: UDP+EDNS0
Possible feature level: UDP+EDNS0
DNSSEC Mode: no
DNSSEC Supported: yes
Maximum UDP fragment size received: 512
Failed UDP attempts: 0
Failed TCP attempts: 0
Seen truncated packet: no
Seen OPT RR getting lost: no
Seen RRSIG RR missing: no
Seen invalid packet: no
Server dropped DO flag: no
```

```
$ resolvectl statistics
Transactions
Current Transactions: 0
Total Transactions: 11602

Cache
Current Cache Size: 40
Cache Hits: 6307
Cache Misses: 6117

Failure Transactions
Total Timeouts: 652
Total Timeouts (Stale Data Served): 0
Total Failure Responses: 120
Total Failure Responses (Stale Data Served): 0

DNSSEC Verdicts
Secure: 0
Insecure: 0
Bogus: 0
Indeterminate: 0
```

-- SETUP --

The PC called 'workstation-linux' is a workstation based on the Threadripper 3960X, connected through a ~3 m 2.5Gb RJ45 connection to a router/firewall based on OPNsense (IPv6 is not configured / turned off). Since the workstation is dual-boot capable, it has 2 hostnames: in Fedora = 'workstation-linux', in Windows = 'workstation-win10'. Additionally, the OPNsense router has a static DHCP record set for the workstation, 'workstation.home.lan'.

Furthermore, the router provides 2 networks:

1) LAN
LAN is provided directly on the native interface.
DNS: Unbound is accepting connections on the native interface
DHCP:
Gateway: 192.168.100.254 (the router IP)
DNS Server: 192.168.100.254 (the router IP)
DNS Domain: home.lan
DNS Search Domain: home.lan, srv.lan (another VLAN with service hosts - not relevant)

2) RNET
RNET is provided on VLAN 10 on the same interface as (1).
DNS: Unbound is NOT accepting connections on VLAN 10
DHCP:
Gateway: 10.10.255.254 (the router IP)
DNS Server: (not defined => interface IP by default = 10.10.255.254)
DNS Domain: sector1.rnet
DNS Search Domain: sector1.rnet, rnet

-- TESTING --

For testing I ran `resolvectl flush-caches` followed by `time resolvectl query workstation-linux`...

a) with only connection (1) enabled in NetworkManager:

```
$ time resolvectl query workstation-linux
workstation-linux: 192.168.100.1 -- link: enp67s0
                   fe80::5303:3bac:39e5:be66%3 -- link: enp67s0

-- Information acquired via protocol DNS in 2.4ms.
-- Data is authenticated: yes; Data was acquired via local or encrypted transport: yes
-- Data from: synthetic

real    0m0,008s
user    0m0,002s
sys     0m0,004s
```

b) with connections (1) & (2) enabled in NetworkManager (RNET (2) has IPv6 disabled in NM!):

```
$ time resolvectl query workstation-linux
workstation-linux: resolve call failed: Connection timed out

real    2m0,062s
user    0m0,001s
sys     0m0,004s
```

-- THOUGHTS --

From my tests it seems that the timeout happens whenever a DNS name is resolved over a connection on which the DNS servers do not respond. After these results appeared, I enabled the DNS server for RNET (VLAN 10) and the issue disappeared, with the following result:

c)

```
$ time resolvectl query workstation-linux
workstation-linux: 10.10.128.6 -- link: rnet
                   (workstation-linux.sector1.rnet)

-- Information acquired via protocol DNS in 1.9ms.
-- Data is authenticated: no; Data was acquired via local or encrypted transport: no
-- Data from: network

real    0m0,007s
user    0m0,000s
sys     0m0,005s
```

If, instead of enabling Unbound, I set the DNS server of the RNET connection in nmcli to 1.1.1.1, the following happens:

d-1)

```
$ time resolvectl query workstation-linux
workstation-linux: resolve call failed: Lookup failed due to system error: No route to host

real    0m16,927s
user    0m0,002s
sys     0m0,004s
```

d-2) after `resolvectl flush-caches` (& time?):

```
$ time resolvectl query workstation-linux
workstation-linux: 192.168.100.1 -- link: enp67s0
                   10.10.128.1 -- link: rnet
                   10.10.128.6 -- link: rnet
                   fe80::5303:3bac:39e5:be66%3 -- link: enp67s0

-- Information acquired via protocol DNS in 68.8ms.
-- Data is authenticated: yes; Data was acquired via local or encrypted transport: yes
-- Data from: synthetic

real    0m0,075s
user    0m0,001s
sys     0m0,005s
```

From what I have seen, this looks like a bug in 'resolve': it should fail fast when it detects that no DNS server is responding on the NM connection that sets the domain entries. Nonetheless, I see the problem that 'resolve' can't really tell the state of a DNS server, since DNS is UDP-based.
Well, it's probably bad for your DNS servers to be nonresponsive; surely that's your main problem. I'll just reiterate my suggestion from comment #9: revert for now, move nss-myhostname back to where it was before, and let 'hostname --fqdn' remain broken until somebody can figure out how to fix this properly.
Ok, I'll revert it for now, I'm fine with it. But please keep in mind that the original order, before systemd-resolved was made the default, was "hosts: files dns myhostname"; hostname --fqdn worked and nobody experienced any timeouts. I have been moving myhostname back and forth since resolved was introduced, every time with the blessing of the systemd developers, and it is buggy in either place. It would be really good if somebody could take ownership of resolved and start actively working on it, or open a change page to remove it from the default configuration.
FEDORA-2024-02a5688338 (authselect-1.5.0-8.fc42) has been submitted as an update to Fedora 42. https://bodhi.fedoraproject.org/updates/FEDORA-2024-02a5688338
This is reverted in:

* rawhide: https://bodhi.fedoraproject.org/updates/FEDORA-2024-02a5688338
* F41: https://bodhi.fedoraproject.org/updates/FEDORA-2024-d7f0d7c65b
* F40: https://bodhi.fedoraproject.org/updates/FEDORA-2024-d7caacc700

So the problem should now be fixed, at the cost of breaking `hostname --fqdn` again. I would like to see this fixed properly, though.
FEDORA-2024-02a5688338 (authselect-1.5.0-8.fc42) has been pushed to the Fedora 42 stable repository. If problem still persists, please make note of it in this bug report.
Reopening since the problem was not solved, just mitigated in authselect.
(In reply to Pavel Březina from comment #35)
> So the problem should now be fixed, at the cost of breaking `hostname --fqdn` again.

It works correctly for me:
```
> rpm -q authselect
authselect-1.5.0-8.fc41.x86_64

> sudo authselect select local -f --nobackup
Profile "local" was selected.

> grep -e ^hosts: /etc/nsswitch.conf
hosts: files myhostname resolve [!UNAVAIL=return] dns

> sudo hostnamectl hostname fedora.example.org

> hostname -s
fedora

> hostname --short
fedora

> hostname -f
fedora.example.org

> hostname --fqdn
fedora.example.org
```
(In reply to Vladislav Grigoryev from comment #38)
> It works correctly for me:
> > sudo hostnamectl hostname fedora.example.org
> > hostname --fqdn
> fedora.example.org

That won't work if you set the hostname to a short name. In that case, --fqdn has to look it up via a reverse DNS lookup, which is however intercepted by myhostname. See: https://bugzilla.redhat.com/show_bug.cgi?id=2257197
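A sketch of the failing case (the hostname is a placeholder):

```
# set a short (non-FQDN) static hostname
sudo hostnamectl hostname fedora
hostname --fqdn
# --fqdn must now reverse-resolve one of the host's addresses to find the
# domain; with nss-myhostname ahead of dns, the synthetic short name comes
# back instead of the FQDN a PTR record in the network's DNS would provide
```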
This message is a reminder that Fedora Linux 40 is nearing its end of life. Fedora will stop maintaining and issuing updates for Fedora Linux 40 on 2025-05-13. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as EOL if it remains open with a 'version' of '40'. Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, change the 'version' to a later Fedora Linux version. Note that the version field may be hidden. Click the "Show advanced fields" button if you do not see it. Thank you for reporting this issue and we are sorry that we were not able to fix it before Fedora Linux 40 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora Linux, you are encouraged to change the 'version' to a later version prior to this bug being closed.
FWIW, this change breaks Postfix startup for me. I have in /etc/postfix/main.cf:

inet_interfaces = foo.example.com

(my host's FQDN), and with the NSS myhostname plugin in charge, `getent ahosts $(hostname)` also returns the link-local IPv6 address of the host, and Postfix cannot deal with it.

I haven't investigated yet whether a) NSS myhostname should return link-local addresses, b) it does so with the relevant interface scope (otherwise meaningless), and c) Postfix is supposed to be able to handle link-local addresses, so it is unclear which behaviour is wrong here.
Actually, getent ahosts does indeed return the scope ID, so b) is answered. Looking at Postfix's code, it seems that Postfix completely ignores address scoping, so it can't really deal with link-local addresses. And seeing a similar problem Debian dealt with regarding ping6 vs. myhostname in 2013, the myhostname author expects the link-local addresses to be reported. So to me it looks like Postfix needs to deal better with link-local addresses, either by ignoring them or by properly handling them.
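For illustration, a minimal way to see the scoped link-local entries that trip Postfix up (a sketch; the interface and hostname are placeholders):

```
# with nss-myhostname answering first, the host's own lookup also yields
# scoped link-local addresses, e.g. fe80::...%eth0
getent ahosts "$(hostname)"
# Postfix resolves the inet_interfaces hostname through the same NSS path,
# so those %scope entries end up in its interface list, which it cannot parse
```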