Fedora Account System
Red Hat Associate
Red Hat Customer
For some time I have observed that nslookup queries on my various Fedora systems were returning: nslookup communications error to x.x.x.x#53: timed out and then immediately resolved for the query. In trying to track this down I did 2 fresh installs of Fedora Workstation 38 on my own hypervisor and in the cloud. The problem persists, leading me to believe that there is a bug. Example from fresh install of Fedora 38: $ nslookup > server 8.8.8.8 Default server: 8.8.8.8 Address: 8.8.8.8#53 > set debug > cisco.com ;; communications error to 8.8.8.8#53: timed out Server: 8.8.8.8 Address: 8.8.8.8#53 ------------ QUESTIONS: cisco.com, type = A, class = IN ANSWERS: -> cisco.com internet address = 72.163.4.185 ttl = 281 AUTHORITY RECORDS: ADDITIONAL RECORDS: ------------ Non-authoritative answer: Name: cisco.com Address: 72.163.4.185 ------------ QUESTIONS: cisco.com, type = AAAA, class = IN ANSWERS: -> cisco.com has AAAA address 2001:420:1101:1::185 ttl = 133 AUTHORITY RECORDS: ADDITIONAL RECORDS: ------------ Name: cisco.com Address: 2001:420:1101:1::185 > Reproducible: Always Steps to Reproduce: Install Fedora 38 nslookup set debug server 8.8.8.8 (or any server of your choosing) cisco.com (or any other valid name of your choosing) 8 out of 10 times you'll get the communications error message, then a result. Unable to select bind-utils as the involved package for nslookup. Not in list.
Interesting issue. It seems to be present just in interactive mode of nslookup. When I set those parameters from command line, I have not seen it there. $ nslookup -debug cisco.com 8.8.8.8 Server: 8.8.8.8 Address: 8.8.8.8#53 ------------ QUESTIONS: cisco.com, type = A, class = IN ANSWERS: -> cisco.com internet address = 72.163.4.185 ttl = 247 AUTHORITY RECORDS: ADDITIONAL RECORDS: ------------ Non-authoritative answer: Name: cisco.com Address: 72.163.4.185 ------------ QUESTIONS: cisco.com, type = AAAA, class = IN ANSWERS: -> cisco.com has AAAA address 2001:420:1101:1::185 ttl = 69 AUTHORITY RECORDS: ADDITIONAL RECORDS: ------------ Name: cisco.com Address: 2001:420:1101:1::185 But in interactive mode, even set debug followed by name seems to be enough: $ nslookup > set debug > isc.org ;; communications error to 127.0.0.1#53: timed out Server: 127.0.0.1 Address: 127.0.0.1#53 ------------ QUESTIONS: isc.org, type = A, class = IN ANSWERS: -> isc.org internet address = 149.20.1.66 ttl = 300 AUTHORITY RECORDS: ADDITIONAL RECORDS: ------------ Non-authoritative answer: Name: isc.org Address: 149.20.1.66 ------------ QUESTIONS: isc.org, type = AAAA, class = IN ANSWERS: -> isc.org has AAAA address 2001:4f8:1:f::66 ttl = 300 AUTHORITY RECORDS: ADDITIONAL RECORDS: ------------ Name: isc.org Address: 2001:4f8:1:f::66 I agree it is too fast reply to met timeout, so there is more likely some logic wrong to print timeout message without any timeout happening.
For some reason set debug made interactively results in the first recv_done returns eresult==timeout, always. If I set it on command line, even the first one returns with success. Queried name does not matter, even '.' works too. (gdb) p eresult $3 = ISC_R_TIMEDOUT (gdb) bt #0 recv_done (handle=0x7ffff5081000, eresult=ISC_R_TIMEDOUT, region=<optimized out>, arg=<optimized out>) at ../../../bin/dig/dighost.c:4037 #1 0x00007ffff7a29a30 in isc__nm_async_readcb (worker=worker@entry=0x0, ev0=ev0@entry=0x7ffff5dfe140) at ../../../lib/isc/netmgr/netmgr.c:2885 #2 0x00007ffff7a29b7c in isc__nm_readcb (sock=sock@entry=0x7ffff5027a00, uvreq=<optimized out>, eresult=eresult@entry=ISC_R_TIMEDOUT) at ../../../lib/isc/netmgr/netmgr.c:2858 #3 0x00007ffff7a2aa17 in isc__nmsocket_readtimeout_cb (timer=0x7ffff5027e98) at ../../../lib/isc/netmgr/netmgr.c:2085 #4 0x00007ffff793c1f0 in uv__run_timers (loop=0x7ffff622f010) at src/timer.c:178 #5 uv_run (loop=loop@entry=0x7ffff622f010, mode=mode@entry=UV_RUN_DEFAULT) at src/unix/core.c:393 #6 0x00007ffff7a2e6b4 in nm_thread (worker0=0x7ffff622f000) at ../../../lib/isc/netmgr/netmgr.c:698 #7 0x00007ffff7a64569 in isc__trampoline_run (arg=0x5555555b38b0) at ../../../lib/isc/trampoline.c:189 #8 0x00007ffff74ae12d in start_thread (arg=<optimized out>) at pthread_create.c:442 #9 0x00007ffff752fbc0 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81 (gdb) p eresult $4 = ISC_R_TIMEDOUT (gdb) info locals sockstr = "127.0.0.1#53", '\000' <repeats 50 times> query = 0x7ffff5042000 b = {magic = 0, base = 0x0, length = 0, used = 0, current = 0, active = 0, link = {prev = 0x0, next = 0x0}, mctx = 0x0, autore = false} msg = 0x0 result = <optimized out> n = 0x0 l = 0x7ffff5000000 docancel = false donext = false match = true done_process_opt = false parseflags = <optimized out> id = 0 msgflags = 0 newedns = <optimized out> peer = {type = {sa = {sa_family = 0, sa_data = '\000' <repeats 13 times>}, sin = {sin_family = 0, sin_port = 0, sin_addr = {s_addr = 0}, sin_zero = "\000\000\000\000\000\000\000"}, sin6 = {sin6_family = 0, sin6_port = 0, sin6_flowinfo = 0, sin6_addr = {__in6_u = { __u6_addr8 = '\000' <repeats 15 times>, __u6_addr16 = {0, 0, 0, 0, 0, 0, 0, 0}, __u6_addr32 = {0, 0, 0, 0}}}, sin6_scope_id = 0}, ss = {ss_family = 0, __ss_padding = '\000' <repeats 117 times>, __ss_align = 0}, sunix = {sun_family = 0, sun_path = '\000' <repeats 107 times>}}, length = 0, link = {prev = 0x0, next = 0x0}} next_lookup = <optimized out> cancel_lookup = <optimized out> keep_query = <optimized out> __func__ = "recv_done"
I have tried to capture the reason by using strace, but once I follow also children forks, it never happened to me. I think that means there is some weird race condition, which does not happen when it is slower. $ strace -o /tmp/nslookup.strace -f nslookup
Ah, found reliable reproducer. The set at the beginning does matter. What does matter is just waiting before query is asked. Reproduce with: (sleep 6 && echo localhost) | nslookup The problem is not present in development version 9.19.11, but reliably reproduced on 9.18.14.
Reported upstream: https://gitlab.isc.org/isc-projects/bind9/-/issues/4044
Fixed in release 9.18.17, bug #2223913.