Bug 2148500 - mDNS .local address is not resolved if 'local' domain exists on the assigned DNS server
Summary: mDNS .local address is not resolved if 'local' domain exists on the assigned ...
Keywords:
Status: CLOSED NEXTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: nss-mdns
Version: 37
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Adam Goode
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2022-11-25 16:58 UTC by Michael K Johnson
Modified: 2023-09-19 04:30 UTC
CC List: 20 users

Fixed In Version: nss-mdns-0.15.1-7.fc38
Clone Of:
Environment:
Last Closed: 2023-03-16 08:20:53 UTC
Type: Bug
Embargoed:


Attachments (Terms of Use)
ipptool --ippserver printer.txt ipp://192.168.23.80:631/ipp/print /usr/share/cups/ipptool/get-printer-attributes.test (1.35 MB, text/plain)
2022-11-25 16:58 UTC, Michael K Johnson
/etc/authselect/nsswitch.conf (703 bytes, text/plain)
2022-11-29 15:02 UTC, Michael K Johnson
requested ping strace (50.87 KB, text/plain)
2022-12-02 02:40 UTC, Michael K Johnson
journalctl output during ping and avahi-browse (11.29 KB, text/plain)
2022-12-02 02:41 UTC, Michael K Johnson
tcpdump -i any -w ping.pcap -s0 port 53 (282 bytes, application/octet-stream)
2022-12-02 20:05 UTC, Michael K Johnson


Links
- Fedora Package Sources: nss-mdns pull-request 10 (last updated 2023-01-16 14:53:39 UTC)
- Fedora Package Sources: nss-mdns pull-request 9 (last updated 2023-01-16 14:07:25 UTC)
- GitHub lathiat/nss-mdns issue 75: "Is 'unicast SOA heuristic' a problem for end users?" (open, last updated 2022-12-06 18:48:08 UTC)
- GitHub lathiat/nss-mdns issue 79: "Why does nss-mdns query the default DNS before avahi?" (open, last updated 2022-12-06 18:48:08 UTC)
- GitHub lathiat/nss-mdns pull 84: "Draft: Change .local domain heuristic" (merged, last updated 2022-12-20 11:56:36 UTC)

Description Michael K Johnson 2022-11-25 16:58:50 UTC
Created attachment 1927454 [details]
ipptool --ippserver printer.txt ipp://192.168.23.80:631/ipp/print /usr/share/cups/ipptool/get-printer-attributes.test

Description of problem:

Using GNOME Settings → Printers → Add Printer... I can add an entry for my HP Color LaserJet MFP M281fdw, but cannot successfully print a test page.

Version-Release number of selected component (if applicable):

cups-2.4.2-5.fc37.x86_64

How reproducible:

Always

Steps to Reproduce:
1. Open GNOME Settings
2. Printers
3. Add Printer...
4. Choose HP_Color_LaserJet_MFP_M281fdw_B1AC20
5. Try to print a test page, or try to print.

Actual results:

Printer doesn't print

Expected results:

Printer works

Additional info:

From journalctl -u cups, I think that this error came from attempting to add the printer manually (vs. using it automatically):

cupsd[1067]: [CGI] Unable to create PPD file: Could not poll sufficient capability info from the printer (ipps://HP%20Color%20LaserJet%20MFP%20M281fdw%20(B1AC20)._ipps._tcp.local/, ipps://NPIB1AC20.local:631/ipp/print) via IPP!
cupsd[1067]: copy_model: empty PPD file
cupsd[1067]: [Client 59] Returning IPP server-error-internal-error for CUPS-Add-Modify-Printer (ipp://localhost/printers/HP-ColorLaserJet-MFP-M278-M281) from localhost.

This error is not surfaced in the UI in any way.

ipptool --ippserver printer.txt ipp://192.168.23.80:631/ipp/print /usr/share/cups/ipptool/get-printer-attributes.test
(printer.txt attached)

Comment 1 Michael K Johnson 2022-11-25 17:03:05 UTC
Also, given that this is an HP printer, hplip package versions:

# rpm -qa | grep hplip
hplip-common-3.22.6-4.fc37.x86_64
hplip-libs-3.22.6-4.fc37.x86_64
hplip-3.22.6-4.fc37.x86_64

Comment 2 Michael K Johnson 2022-11-25 20:08:43 UTC
Bug #1544912 seems to indicate that printing is expected to work with those hplip packages, if I read it right, so my expectation is that this is a real regression, beyond just my recollection that I previously set it up somehow in an earlier version of Fedora... ☺

Comment 3 Zdenek Dohnal 2022-11-28 14:03:17 UTC
Hi Michael,

thank you for reporting the issue!

Are you able to ping the NPIB1AC20.local address?

Does it work if you install the printer manually via lpadmin with the IP address?

$ lpadmin -p <choose_name> -v ipp://192.168.23.80:631/ipp/print -m everywhere -E

Comment 4 Michael K Johnson 2022-11-28 14:33:23 UTC
Aha!

$ ping NPIB1AC20.local
ping: NPIB1AC20.local: Temporary failure in name resolution

I tried this multiple times as "Add Printer..." was probing and finding printers, and it never worked.

This is on a fresh install of F37 as of the last beta release, with updates applied after the release.



# lpadmin -p testprinter -v ipp://192.168.23.80:631/ipp/print -m everywhere -E

I was able to print the test page from Settings successfully to "testprinter" after running this command.

Comment 5 Zdenek Dohnal 2022-11-29 14:02:44 UTC
Thank you for the info! So it is an mDNS resolution problem, which can happen due to a missing package, a stopped service, or an incorrect configuration in /etc/authselect/nsswitch.conf.

Do you have avahi installed and avahi-daemon service running?

Do you have nss-mdns package installed?

Would you mind attaching your /etc/authselect/nsswitch.conf file here as an attachment?

Comment 6 Michael K Johnson 2022-11-29 15:02:21 UTC
Created attachment 1928260 [details]
/etc/authselect/nsswitch.conf

Avahi installed and running; no changes to configuration since the system was installed.

$ rpm -q avahi
avahi-0.8-18.fc37.x86_64

$ systemctl status avahi-daemon
● avahi-daemon.service - Avahi mDNS/DNS-SD Stack
     Loaded: loaded (/usr/lib/systemd/system/avahi-daemon.service; enabled; pre>
     Active: active (running) since Thu 2022-11-24 22:02:56 EST; 4 days ago
TriggeredBy: ● avahi-daemon.socket
   Main PID: 944 (avahi-daemon)
     Status: "avahi-daemon 0.8 starting up."
      Tasks: 2 (limit: 28374)
     Memory: 1.9M
        CPU: 1min 8.536s
     CGroup: /system.slice/avahi-daemon.service
             ├─944 "avahi-daemon: running [gibbie.local]"
             └─969 "avahi-daemon: chroot helper"

Comment 7 Zdenek Dohnal 2022-11-30 11:35:16 UTC
And nss-mdns is installed as well? The nsswitch.conf shows it should be, but I would like you to verify it.

The other possible culprit is the firewall - do you have mDNS allowed in the firewall? IIRC it should be allowed by default on workstations.
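(A quick check is something like 'firewall-cmd --list-services' - 'mdns' should appear in the list for the zone your connection uses.)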

The nsswitch.conf looks the same as mine and it works for me in F37 and F36 - please check whether /etc/nsswitch.conf matches as well and whether it is a symlink to /etc/authselect/nsswitch.conf.

You can check whether the device is shown in the 'avahi-browse -avrt' output - my guess is that it will be, since CUPS is able to pick it up.

Otherwise I would turn on debug mode in avahi-daemon and watch whether it generates any suspicious messages during startup, when you ping the mDNS address, or while calling avahi-browse.

You can enable Avahi debug logging by copying its service file /usr/lib/systemd/system/avahi-daemon.service into /etc/systemd/system, adding '--debug' to the ExecStart and ExecReload lines, running 'systemctl daemon-reload', and restarting avahi-daemon. With this the daemon will start logging into the journal.
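Roughly, the steps are (assuming the stock unit path; adjust if your layout differs):

# cp /usr/lib/systemd/system/avahi-daemon.service /etc/systemd/system/
# <edit /etc/systemd/system/avahi-daemon.service and append --debug to the ExecStart and ExecReload lines>
# systemctl daemon-reload
# systemctl restart avahi-daemon
# journalctl -u avahi-daemon -f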

If none of this helps, I'll reassign this to Avahi for further investigation.

Comment 8 Michael K Johnson 2022-11-30 13:50:25 UTC
$ rpm -q nss-mdns
nss-mdns-0.15.1-6.fc37.x86_64
$ sha1sum /etc/authselect/nsswitch.conf /etc/nsswitch.conf 
c9470430eef92f60c8de351c4a6044bad351cb45  /etc/authselect/nsswitch.conf
c9470430eef92f60c8de351c4a6044bad351cb45  /etc/nsswitch.conf

avahi-browse -avrt output is long but definitely includes the printer.

These are on a local network, not going through a firewall.

Lots of "sendmsg() to ff02::fb failed: Network is unreachable" messages in the debug output.

I have no idea where those are coming from after looking at the output of "ifconfig":

ifconfig | grep inet
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        inet 192.168.23.65  netmask 255.255.255.0  broadcast 192.168.23.255
        inet6 fe80::8433:d7f0:41fb:f3b8  prefixlen 64  scopeid 0x20<link>

Comment 9 Zdenek Dohnal 2022-12-01 08:00:06 UTC
(In reply to Michael K Johnson from comment #8)
> These are on a local network, not going through a firewall.

The firewall is on your machine (unless you turned it off), so IMO the messages go through the firewall, which means mDNS is allowed - otherwise you wouldn't see the devices.

> 
> Lots of "sendmsg() to ff02::fb failed: Network is unreachable" in the debug
> output

It would be great if you could capture the Avahi messages into a file while you ping the mDNS address and call 'avahi-browse -avrt'. You can use 'journalctl -f > all_log' (this captures the whole journal log, just in case another component reports something relevant), run the commands, and then cancel the journalctl command.

I've looked up what 'ff02::fb' is - it is the multicast DNS IPv6 address. I have the same messages in my logs as well - they don't seem to affect mDNS resolution though.

Another thing you can try is running 'strace' on the ping process - it might tell us which resolver is active and where the error comes from. The command is:

$ strace -yy -Y -tt -T -f -s4092 -o ping.strace ping NPIB1AC20.local


Please attach both files - all_log and ping.strace - to the bug.

Comment 10 Michael K Johnson 2022-12-02 02:40:43 UTC
Created attachment 1929234 [details]
requested ping strace

Comment 11 Michael K Johnson 2022-12-02 02:41:24 UTC
Created attachment 1929235 [details]
journalctl output during ping and avahi-browse

Comment 12 Michael K Johnson 2022-12-02 02:44:47 UTC
Testing ping with local firewall disabled:

# systemctl stop firewalld.service
$ ping NPIB1AC20.local
ping: NPIB1AC20.local: Temporary failure in name resolution

Comment 13 Zdenek Dohnal 2022-12-02 10:07:48 UTC
Since Avahi is not my focus, I've checked the strace output and compared it with mine. In my case I see:

180 10712<ping> 06:58:25.771991 socket(AF_UNIX, SOCK_STREAM, 0) = 5<UNIX-STREAM:[111016]> <0.000045>
181 10712<ping> 06:58:25.772366 fcntl(5<UNIX-STREAM:[111016]>, F_GETFD) = 0 <0.000030>
182 10712<ping> 06:58:25.772643 fcntl(5<UNIX-STREAM:[111016]>, F_SETFD, FD_CLOEXEC) = 0 <0.000034>
183 10712<ping> 06:58:25.772923 connect(5<UNIX-STREAM:[111016]>, {sa_family=AF_UNIX, sun_path="/var/run/avahi-daemon/socket"}, 110) = 0 <0.000342>
184 10712<ping> 06:58:25.773458 fcntl(5<UNIX-STREAM:[111016->109159]>, F_GETFL) = 0x2 (flags O_RDWR) <0.000027>
185 10712<ping> 06:58:25.773713 newfstatat(5<UNIX-STREAM:[111016->109159]>, "", {st_mode=S_IFSOCK|0777, st_size=0, ...}, AT_EMPTY_PATH) = 0 <0.000031>
186 10712<ping> 06:58:25.773852 write(5<UNIX-STREAM:[111016->109159]>, "RESOLVE-HOSTNAME-IPV4 NPI42307C.local\n", 38) = 38 <0.000030>
187 10712<ping> 06:58:25.773948 read(5<UNIX-STREAM:[111016->109159]>, "+ 2 0 NPI42307C.local 192.168.0.112\n", 4096) = 36 <0.000429>
188 10712<ping> 06:58:25.774566 close(5<UNIX-STREAM:[111016->109159]>) = 0 <0.000048>

but this action is missing from your strace, even though the nss-mdns library is loaded and mapped here:

682553<ping> 21:38:27.439774 openat(AT_FDCWD</tmp>, "/lib64/libnss_mdns4_minimal.so.2", O_RDONLY|O_CLOEXEC) = 5</usr/lib64/libnss_mdns4_minimal.so.2> <0.000045>
682553<ping> 21:38:27.439941 read(5</usr/lib64/libnss_mdns4_minimal.so.2>, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\0\0\0\0\0\0\0\0@\0\0\0\0\0\0\0\330G\0\0\0\0\0\0\0\0\0\0@\08\0\v\0@\0\36\0\35\0\1\0\0\0\4\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0X\r\0\0\0\0\0\0X\r\0\0\0\0\0\0\0\20\0\0\0\0\0\0\1\0\0\0\5\0\0\0\0\20\0\0\0\0\0\0\0\20\0\0\0\0\0\0\0\20\0\0\0\0\0\0m\23\0\0\0\0\0\0m\23\0\0\0\0\0\0\0\20\0\0\0\0\0\0\1\0\0\0\4\0\0\0\0000\0\0\0\0\0\0\0000\0\0\0\0\0\0\0000\0\0\0\0\0\0@\4\0\0\0\0\0\0@\4\0\0\0\0\0\0\0\20\0\0\0\0\0\0\1\0\0\0\6\0\0\0\310<\0\0\0\0\0\0\310L\0\0\0\0\0\0\310L\0\0\0\0\0\08\3\0\0\0\0\0\0@\3\0\0\0\0\0\0\0\20\0\0\0\0\0\0\2\0\0\0\6\0\0\0\340<\0\0\0\0\0\0\340L\0\0\0\0\0\0\340L\0\0\0\0\0\0\20\2\0\0\0\0\0\0\20\2\0\0\0\0\0\0\10\0\0\0\0\0\0\0\4\0\0\0\4\0\0\0\250\2\0\0\0\0\0\0\250\2\0\0\0\0\0\0\250\2\0\0\0\0\0\0@\0\0\0\0\0\0\0@\0\0\0\0\0\0\0\10\0\0\0\0\0\0\0\4\0\0\0\4\0\0\0\350\2\0\0\0\0\0\0\350\2\0\0\0\0\0\0\350\2\0\0\0\0\0\0\260\0\0\0\0\0\0\0\260\0\0\0\0\0\0\0\4\0\0\0\0\0\0\0S\345td\4\0\0\0\250\2\0\0\0\0\0\0\250\2\0\0\0\0\0\0\250\2\0\0\0\0\0\0@\0\0\0\0\0\0\0@\0\0\0\0\0\0\0\10\0\0\0\0\0\0\0P\345td\4\0\0\0\3100\0\0\0\0\0\0\3100\0\0\0\0\0\0\3100\0\0\0\0\0\0\204\0\0\0\0\0\0\0\204\0\0\0\0\0\0\0\4\0\0\0\0\0\0\0Q\345td\6\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\20\0\0\0\0\0\0\0R\345td\4\0\0\0\310<\0\0\0\0\0\0\310L\0\0\0\0\0\0\310L\0\0\0\0\0\08\3\0\0\0\0\0\08\3\0\0\0\0\0\0\1\0\0\0\0\0\0\0\4\0\0\0000\0\0\0\5\0\0\0GNU\0\2\0\0\300\4\0\0\0\3\0\0\0\0\0\0\0\1\0\1\300\4\0\0\0\t\0\0\0\0\0\0\0\2\0\1\300\4\0\0\0\1\0\0\0\0\0\0\0\4\0\0\0\24\0\0\0\3\0\0\0GNU\0M\263\10\276V@z\335\304\1\7\nG\231Sz,\340jd\4\0\0\0|\0\0\0~\32\376\312FDO\0{\"type\":\"rpm\",\"name\":\"nss-mdns\",\"ver", 832) = 832 <0.000036>
682553<ping> 21:38:27.440127 newfstatat(5</usr/lib64/libnss_mdns4_minimal.so.2>, "", {st_mode=S_IFREG|0755, st_size=20312, ...}, AT_EMPTY_PATH) = 0 <0.000036>
682553<ping> 21:38:27.440280 mmap(NULL, 20488, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 5</usr/lib64/libnss_mdns4_minimal.so.2>, 0) = 0x7fcd7a18c000 <0.000040>
682553<ping> 21:38:27.440406 mmap(0x7fcd7a18d000, 8192, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 5</usr/lib64/libnss_mdns4_minimal.so.2>, 0x1000) = 0x7fcd7a18d000 <0.000047>
682553<ping> 21:38:27.440536 mmap(0x7fcd7a18f000, 4096, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 5</usr/lib64/libnss_mdns4_minimal.so.2>, 0x3000) = 0x7fcd7a18f000 <0.000039>
682553<ping> 21:38:27.440658 mmap(0x7fcd7a190000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 5</usr/lib64/libnss_mdns4_minimal.so.2>, 0x3000) = 0x7fcd7a190000 <0.000041>
682553<ping> 21:38:27.440782 mmap(0x7fcd7a191000, 8, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7fcd7a191000 <0.000036>
682553<ping> 21:38:27.440925 close(5</usr/lib64/libnss_mdns4_minimal.so.2>) = 0 <0.000030>

Do you have /var/run/avahi-daemon/socket on your device? That's the file which has to exist for nss-mdns to work, and _nss_mdns4_minimal_gethostbyname4_r() fails if it is missing.

The other possibility is that glibc (in the gaih_inet() function) is not able to find the nss-mdns module - this would have to be checked in gdb.

Do you have debugging experience with gdb? If you do, run your ping command under it, set breakpoints on gaih_inet and DL_CALL_FCT (with 'b') and start debugging ('r <your_local_name>'). In DL_CALL_FCT, try stepping in with 's' - DL_CALL_FCT() is called in a loop that goes over the loaded modules, and nss-mdns (via _nss_mdns4_minimal_gethostbyname4_r) should be among them.
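A rough sketch of the whole check (the gaih_inet breakpoint assumes glibc debuginfo is installed; the hostname is just the one from this report):

$ ls -l /var/run/avahi-daemon/socket
$ gdb --args ping NPIB1AC20.local
(gdb) b gaih_inet
(gdb) run

and then step with 's' once the NSS module loop is reached, watching for _nss_mdns4_minimal_gethostbyname4_r.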

It really looks like the module is not loaded by glibc at all, so I'm switching to glibc for further help - but please try debugging the ping process as well.

Comment 14 Zdenek Dohnal 2022-12-02 10:08:51 UTC
Switching to glibc - nsswitch.conf looks valid, but _nss_mdns4_minimal_gethostbyname4_r from nss-mdns is not loaded for mDNS resolution. Would you mind looking into it?

Comment 15 Florian Weimer 2022-12-02 10:34:50 UTC
nss-mdns code runs, so it's definitely not a glibc issue:

682553<ping> 21:38:27.441752 socket(AF_INET, SOCK_DGRAM|SOCK_CLOEXEC|SOCK_NONBLOCK, IPPROTO_IP) = 5<UDP:[12666719]> <0.000070>
682553<ping> 21:38:27.442183 setsockopt(5<UDP:[12666719]>, SOL_IP, IP_RECVERR, [1], 4) = 0 <0.000038>
682553<ping> 21:38:27.442494 connect(5<UDP:[12666719]>, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("127.0.0.53")}, 16) = 0 <0.000053>
682553<ping> 21:38:27.442832 poll([{fd=5<UDP:[127.0.0.1:41316->127.0.0.53:53]>, events=POLLOUT}], 1, 0) = 1 ([{fd=5, revents=POLLOUT}]) <0.000070>
682553<ping> 21:38:27.443276 sendto(5<UDP:[127.0.0.1:41316->127.0.0.53:53]>, "\252\245\1 \0\1\0\0\0\0\0\1\5local\0\0\6\0\1\0\0)\4\260\0\0\0\0\0\0", 34, MSG_NOSIGNAL, NULL, 0) = 34 <0.000204>
682553<ping> 21:38:27.443755 poll([{fd=5<UDP:[127.0.0.1:41316->127.0.0.53:53]>, events=POLLIN}], 1, 5000) = 1 ([{fd=5, revents=POLLIN}]) <0.000403>
682553<ping> 21:38:27.444480 recvfrom(5<UDP:[127.0.0.1:41316->127.0.0.53:53]>, "\252\245\201\200\0\1\0\1\0\0\0\1\5local\0\0\6\0\1\300\f\0\6\0\1\0\0\0?\0002\7ns1-etm\3att\3net\0\6nomail\3etm\300+\0\0\0\1\0\t:\200\0\0\16\20\0$\352\0\0\0\3\204\0\0)\377\326\0\0\0\0\0\0", 65535, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("127.0.0.53")}, [28 => 16]) = 96 <0.000088>
682553<ping> 21:38:27.444923 close(5<UDP:[127.0.0.1:41316->127.0.0.53:53]>) = 0 <0.000048>

I think this is the “local” probing mentioned in NEWS:

“
## Mon Jan 22 2018:

[Version 0.11](https://github.com/lathiat/nss-mdns/releases/tag/v0.11)
released. The first release in some time! Highlights:
[…]
* nss-mdns now implements [standard
  heuristics](https://support.apple.com/en-us/HT201275) for
  detecting `.local` unicast resolution and will automatically
  disable resolution when a local server responds to `.local` requests
”

There is a DNS response, so this heuristic kicks in. I'm not sure this is a bug; it looks like it works as intended.

Comment 16 Zdenek Dohnal 2022-12-02 12:03:10 UTC
Hi Florian,

Aha! I thought that part of the log was glibc calling systemd-resolved somewhere I didn't see, but in fact nss-mdns issues a DNS query before it accesses the Avahi socket (src/util.c) - so it is nss-mdns that calls systemd-resolved :D :

result =
    res_nquery(&state, "local", ns_c_in, ns_t_soa, answer, sizeof answer);
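By the way, the same lookup can be reproduced by hand against the systemd-resolved stub (a quick illustrative check, not taken from the strace):

$ dig @127.0.0.53 local. SOA

An NXDOMAIN answer means the heuristic stays out of the way; a real SOA answer means nss-mdns disables .local resolution.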

I guess I see a difference - I have this response and mDNS resolving works:

176 10712<ping> 06:58:25.735883 sendto(5<UDP:[127.0.0.1:49632->127.0.0.53:53]>, "T\217\1 \0\1\0\0\0\0\0\1\5local\0\0\6\0\1\0\0)\4\260\0\0\0\0\0\0", 34, MSG_NOSIGNAL, NULL, 0) = 34 <0.000094>
177 10712<ping> 06:58:25.736033 poll([{fd=5<UDP:[127.0.0.1:49632->127.0.0.53:53]>, events=POLLIN}], 1, 5000) = 1 ([{fd=5, revents=POLLIN}]) <0.035493>
178 10712<ping> 06:58:25.771641 recvfrom(5<UDP:[127.0.0.1:49632->127.0.0.53:53]>, "T\217\201\203\0\1\0\0\0\1\0\1\5local\0\0\6\0\1\0\0\6\0\1\0\0\32\320\0@\1a\froot-servers\3net\0\5nstld\fverisign-grs\3com\0x\207\32\247\0\0\7\10\0\0\3\204\0\t:\200\0\1Q\200\0\0)\377\326\0\0\0\0\0\0", 65535, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("127.0.0.53")}, [28 => 16]) = 109 <0.000041>

which probably means no relevant server was found and the request went up to the root servers, and Michael has:

682553<ping> 21:38:27.443276 sendto(5<UDP:[127.0.0.1:41316->127.0.0.53:53]>, "\252\245\1 \0\1\0\0\0\0\0\1\5local\0\0\6\0\1\0\0)\4\260\0\0\0\0\0\0", 34, MSG_NOSIGNAL, NULL, 0) = 34 <0.000204>
682553<ping> 21:38:27.443755 poll([{fd=5<UDP:[127.0.0.1:41316->127.0.0.53:53]>, events=POLLIN}], 1, 5000) = 1 ([{fd=5, revents=POLLIN}]) <0.000403>
682553<ping> 21:38:27.444480 recvfrom(5<UDP:[127.0.0.1:41316->127.0.0.53:53]>, "\252\245\201\200\0\1\0\1\0\0\0\1\5local\0\0\6\0\1\300\f\0\6\0\1\0\0\0?\0002\7ns1-etm\3att\3net\0\6nomail\3etm\300+\0\0\0\1\0\t:\200\0\0\16\20\0$\352\0\0\0\3\204\0\0)\377\326\0\0\0\0\0\0", 65535, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("127.0.0.53")}, [28 => 16]) = 96 <0.000088>

It looks like Michael has a server nearby which provides SOA records, IIUC called ns1-etm.att.net, together with something called nomail.etm? Michael, do those names ring a bell?

We can check how resolved communicates when it gets this result - Michael, would you mind capturing network packets with tcpdump while running ping?

The command is:

$ sudo tcpdump -i any -w ping.pcap -s0


Thank you in advance! It seems something in your network blocks the mDNS resolution.

Comment 17 Michael K Johnson 2022-12-02 20:05:19 UTC
I have AT&T U-Verse internet, and they try really hard to force typo-squatting, ad-serving DNS. I suspect that is what ns1-etm.att.net is, and given the similarity in names, nomail.etm as well; they do not ring any bells other than as one more bad result of AT&T typo-squatting on DNS to try to make an extra buck off me.

They apparently used to allow you to turn off what they call "DNS error assist", but this no longer appears to be possible.

I am surprised that with systemd-resolved, .local resolution would be delegated upstream. I thought part of the point was not to do that.

> something in your network blocks the mDNS resolution

I'll repeat that the laptop and printer are on the same network segment and I reproduced this with the firewall disabled.

I'll attach the ping.pcap from "tcpdump -i any -w ping.pcap -s0 port 53"

Comment 18 Michael K Johnson 2022-12-02 20:05:59 UTC
Created attachment 1929490 [details]
tcpdump -i any -w ping.pcap -s0 port 53

Comment 19 Zdenek Dohnal 2022-12-05 12:58:25 UTC
OK, the tcpdump gives a clearer picture - the DNS server defining the 'local' domain is 'ns1-etm.att.net', since it serves an SOA record for that domain.

(In reply to Michael K Johnson from comment #17)
> I am surprised that with systemd-resolved, .local resolution would be
> delegated upstream. I thought part of the point was not to do that.

The glibc C API functions for hostname resolution follow the service sequence defined on the 'hosts' line in /etc/nsswitch.conf - they look first into /etc/hosts, then at your local hostname, then nss-mdns, and fourth systemd-resolved. Since the hostname is not defined in the file and is not your own hostname, glibc tries libnss_mdns4_minimal from nss-mdns. That library checks whether there is a DNS server in your network which provides an SOA record for the 'local' domain and, if there is, skips the resolution. systemd-resolved runs last here, because it has the following action in nsswitch.conf:

resolve [!UNAVAIL=return]

which means that as long as systemd-resolved is available, its result (successful or failed) is always returned. systemd-resolved is enabled and running, but has mDNS resolution turned off by default to prevent possible conflicts with Avahi, so it returns a failure and glibc doesn't try the next service.
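For reference, the relevant 'hosts' line in the default Fedora configuration looks roughly like this (the exact line can differ slightly between releases):

hosts: files myhostname mdns4_minimal [NOTFOUND=return] resolve [!UNAVAIL=return] dns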

> 
> > something in your network blocks the mDNS resolution
> 
> I'll repeat that the laptop and printer are on the same network segment and
> I reproduced this with the firewall disabled.

mDNS resolution via nss-mdns is blocked in your network - to be precise, by the 'ns1-etm.att.net' server defining an SOA record for the 'local' domain, which causes nss-mdns to skip the resolution; and mDNS support in systemd-resolved is turned off by default.

So IMO there are two possible solutions:

A) change the DNS server manually for your connection

B) try setting up systemd-resolved to do the resolution - you can follow the steps at https://docs.fedoraproject.org/en-US/quick-docs/cups-useful-tricks/#_how_to_setup_mdns_with_systemd_resolved ; systemd-resolved may not apply the same heuristic as nss-mdns and could give you an answer.
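For the record, that setup generally amounts to something like the following (see the linked page for the exact steps; <your-connection> is a placeholder for your connection name):

# mkdir -p /etc/systemd/resolved.conf.d
# printf '[Resolve]\nMulticastDNS=resolve\n' > /etc/systemd/resolved.conf.d/mdns.conf
# systemctl restart systemd-resolved
$ nmcli connection modify <your-connection> connection.mdns yes
$ nmcli connection up <your-connection>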

Can you try one (or better, both) of them?

Comment 20 Michael K Johnson 2022-12-05 13:40:02 UTC
I have now changed my home firewall to override AT&T's lying DNS server with working ones. In my case, this started with:

# nmcli con mod enp8s0 ipv4.ignore-auto-dns yes
# nmcli con mod enp8s0 ipv4.dns "8.8.8.8 8.8.4.4 1.1.1.1"
# systemctl restart NetworkManager

I also put DNS servers into /etc/systemd/resolved.conf on my firewall (also running Fedora), this time including the IPv6 nameservers:

# grep '^DNS=' /etc/systemd/resolved.conf
DNS=8.8.8.8#dns.google 8.8.4.4#dns.google 2001:4860:4860::8888#dns.google 2001:4860:4860::8844#dns.google 1.1.1.1#cloudflare-dns.com 1.0.0.1#cloudflare-dns.com 2606:4700:4700::1111#cloudflare-dns.com 2606:4700:4700::1001#cloudflare-dns.com
# systemctl restart systemd-resolved

That firewall is the DNS server for my home network (so that it can add internal-only hostnames from /etc/hosts on the firewall).

I no longer see AT&T's typo-squatting DNS server when I intentionally break a domain in the browser on my laptop behind that whole-house firewall, so that's progress.

Then, after that, I followed the instructions in B) on the laptop we've been trying to get working. Immediately afterwards, instead of a nearly instantaneous name resolution failure when I ping NPIB1AC20.local, it took several seconds before failing. A few minutes later, with no further changes, name resolution started to work. So a cache must have been in place somewhere, and it must have timed out between the two attempts.

Therefore, this particular problem looks like it is resolved *for me*.

This wasn't mainly about me. I could have just worked around it with an IP address and moved on.

Working around this problem on my system with a configuration change won't make printing more reliable out of the box for others with this common problem of lying DNS. I'm going to guess that, for example, macOS doesn't depend on working upstream DNS for Rendezvous to work for any purpose, including printing. My wife's Windows laptop has no problems printing with Rendezvous, even with exactly the same AT&T DNS that broke Fedora. I don't understand why MulticastDNS=resolve is not the default install state. Roughly everyone with AT&T internet service — and maybe other typo-squatting ISP DNS services — ends up with broken .local this way. At least back in the days of RHL, we had a strong rule of setting the default configuration to work best out of the box for most people. I would expect this to be the same.

Comment 21 Michael K Johnson 2022-12-05 16:06:58 UTC
On another affected system, running Fedora 36, I discovered that I was still getting SOA records from ns1-etm.att.net even after running "resolvectl flush-caches" on the firewall and on that system itself. Rebooting the (Fedora) firewall, however, resolved the issue, giving me a "No such name" SOA response from a.root-servers.net.

This allowed that second affected system to work without MulticastDNS=resolve, because of the negative entry coming back from good DNS servers. My point about resolving .local via mDNS correctly out of the box still stands. Fedora's default configuration is basically being DoSed by a malicious ISP - it's a configuration that is easily susceptible to a common form of DNS DoS, and a more robust configuration is readily available and in fact documented.

Comment 22 Zdenek Dohnal 2022-12-05 16:39:35 UTC
(In reply to Michael K Johnson from comment #20)
> Therefore, this particular problem looks like it is resolved *for me*.

Good to know it worked!

> Working around this problem on my system with a configuration change won't
> make printing more reliable out of the box for others with this common
> problem of lying DNS.

Since I'm not a maintainer of the mDNS-related packages, I have only limited knowledge about this - and it seems you're right, there is a group of users affected by this and they opened an upstream issue at https://github.com/lathiat/nss-mdns/issues/75 .

The summary after my brief reading is:

- your DNS server really does break compatibility with the RFC, so the problem is caused by a standards-incompatible DNS server
- Windows and macOS might have removed the heuristic, but nobody knows for sure
- upstream would like someone who has the issue to do the investigation - especially into what happens if a record from the DNS server conflicts with an mDNS hostname, and whether there is an administrative way of changing the conflict behavior (this point is from the last comment by Adam, who is both the Fedora nss-mdns maintainer and upstream)

It would be great if you could help with the investigation on the upstream ticket, since you have a standards-incompatible DNS server in your network.

> I don't understand why MulticastDNS=resolve is not the default install state.
> Roughly everyone with AT&T internet service — and maybe other typo-squatting
> ISP DNS services — ends up with broken .local this way. At least back in the
> days of RHL, we had a strong rule of setting the default configuration to
> work best out of the box for most people. I would expect this to be the same.

I applied this rule as well :) - I didn't hit your issue myself, and neither did any of the people who reported different CUPS issues to me while using mDNS hostnames. And, in comparison to systemd-resolved's MulticastDNS=resolve, nss-mdns didn't require any additional configuration (for resolved you have to enable mDNS and LLMNR for your specific connection), so it won on the out-of-the-box side there.

But I can see this is a problem we should tackle if we want out-of-the-box mDNS to work everywhere - IMO the options are:

A) switch to systemd-resolved and sacrifice out-of-the-box functionality - it has to be configured per connection via NetworkManager
B) drop the heuristic from nss-mdns or implement a different algorithm (it looks like macOS does an A query via DNS to find out whether the device really is defined in the zone - but some people may argue this leaks your local mDNS names to the DNS server, which is a privacy violation in their eyes...)
C) add an /etc/mdns.allow file containing '*' and use 'mdns' instead of 'mdns4_minimal' in /etc/nsswitch.conf, which bypasses the heuristic.


From my POV as the printing maintainer, I'm for a functional solution which requires no user intervention in most cases - so right now that is nss-mdns. systemd-resolved mDNS resolution has to be enabled in NetworkManager for every connection, regardless of whether the DNS server in the network violates RFC 6762.

But this is something the nss-mdns maintainer and the people around mDNS in Fedora should decide on their own - I only have the point of view of an mDNS consumer.

Comment 23 Michael K Johnson 2022-12-05 19:02:46 UTC
OK, so I understand that .local doesn't mean only the local machine, but also the local network. So it's reasonable for .local to be served by the local DNS but not by the internet. Which means that a configuration that allows mDNS on top of an SOA record would solve the problem on individual machines.

Now, additionally, I have learned that my firewall system configuration is technically wrong; it should never send .local queries outside the local network. In my case of a firewall running dnsmasq, I should configure dnsmasq not to pass requests for .local to its upstream server. I believe that this configuration suffices:

local=/local/
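(For anyone replicating this: the line can go either directly in /etc/dnsmasq.conf or in a drop-in such as /etc/dnsmasq.d/no-local.conf, assuming the usual conf-dir=/etc/dnsmasq.d is enabled, followed by restarting dnsmasq.)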

I have done that, and the system without MulticastDNS=resolve still correctly resolves NPIB1AC20.local, so I do not appear to be depending on Google/Cloudflare correctly rejecting .local.

This won't help folks who are just connected to (for example) an AT&T modem/access point, of course, but I'm including it here for anyone else with a similar configuration who finds this bug report in the course of their searches...

Comment 24 Petr Menšík 2022-12-06 18:48:08 UTC
Adding links to related upstream issues.

Note that many DNSSEC-validating caches with aggressive caching enabled would avoid sending queries for the .local domain to upstream forwarders, because they know the name does not exist and therefore nothing below it can exist either. I doubt dnsmasq is able to do that; I am sure bind and unbound are.
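In unbound, for example, that behaviour comes down to a single option (an illustrative config snippet, not taken from this report):

server:
    aggressive-nsec: yes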

Comment 25 Petr Menšík 2022-12-20 11:56:37 UTC
My proposal [1] was merged upstream. It changes the logic to always do the multicast lookup, but when a .local SOA is present in normal DNS, it continues to the next NSS plugin (dns or resolve) if the name was not found. That introduces a 5-10 s delay on .local names served by DNS, but they remain resolvable. When a faster lookup in DNS is required, the remaining solution is to remove the mdns* plugin from /etc/nsswitch.conf altogether, because it would not be used in that network anyway.

It might be possible to disable mDNS resolution only in some networks via NetworkManager; that would skip the resolution immediately. avahi-daemon does not support runtime reconfiguration at the moment, however.

1. https://github.com/lathiat/nss-mdns/pull/84

Comment 26 Zdenek Dohnal 2023-01-16 08:44:24 UTC
Hi Adam and Petr,

I've announced the following change for F38 https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org/thread/ZRXYCB2MJHXBTEGZGI6T472KFAJURJMT/ , which will depend on mDNS for automatic printer installation in the default printing stack.

I'm not sure how many people have the same setup as Michael here, but if they do, they will hit the same issue he did for driverless USB printers. I haven't added this issue as a blocker in the Change proposal, since we don't know how many users are affected, but it would be great if Petr's PR were backported to Fedora 38 to avoid problems.

The checkpoint for changes to be testable is 7th February (schedule here - https://fedorapeople.org/groups/schedule/f-38/f-38-key-tasks.html); would it be possible to backport and build the nss-mdns change in Rawhide by 31st January?

Comment 27 Petr Menšík 2023-01-16 14:04:56 UTC
That already happened with PR #9 [1], which is already merged, but it is not yet built in Rawhide. I started a build with version nss-mdns-0.15.1-7.fc38, but it failed, I think for an unrelated reason.

It will make (unicast) queries into the .local domain slower, but it will try to resolve all 2-label names under .local via mDNS, even if a .local SOA exists in DNS. If that is not wanted, the only remaining option is to remove mdns* from /etc/nsswitch.conf, because it would not be used on that network anyway.

If anyone has a .local zone in DNS and is using it for any useful data, it should be moved away soon. A good candidate to use instead is the 'home.arpa' domain [2]. The .local zone is just for multicast resolution and should not be used for anything else.

1. https://src.fedoraproject.org/rpms/nss-mdns/pull-request/9
2. https://www.rfc-editor.org/rfc/rfc8375.html

Comment 28 Petr Menšík 2023-01-16 14:53:39 UTC
Oh, the unit test addition does not include the patch file in the repository. It also needs PR #10.

Comment 29 Petr Menšík 2023-01-16 21:03:15 UTC
Built again with PR #10 merged as well. That means the .local heuristic change is effective in F38+. No backport to stable releases yet.

Comment 30 Zdenek Dohnal 2023-03-07 08:47:12 UTC
Can we get this into stable releases as well? The bug is against F37.

Comment 31 Petr Menšík 2023-03-15 23:04:30 UTC
I am not sure we want this change in already-released stable branches. It changes the previous behaviour and can lead to unwanted surprises. We would sort of go from one regression to (I think) a less severe one, but still a regression. Maybe just documenting it in the update would be enough.

Besides, I do not even have commit rights to the nss-mdns package, so I could not do it even if I wanted to.

Comment 32 Zdenek Dohnal 2023-03-16 08:20:53 UTC
OK, understood. I got confused because I thought you needed those rights even to build the package, but that's not the case - we can at least say it is fixed in the next release.

Comment 33 Red Hat Bugzilla 2023-09-19 04:30:43 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days

