Bug 2192225
| Summary: | nslookup communications error to x.x.x.x#53: timed out | ||
|---|---|---|---|
| Product: | [Fedora] Fedora | Reporter: | dan |
| Component: | bind | Assignee: | Petr Menšík <pemensik> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
| Severity: | medium | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 38 | CC: | anon.amish, dan, dns-sig, mruprich, pemensik, vonsch, zdohnal |
| Target Milestone: | --- | Keywords: | Reproducer, Triaged |
| Target Release: | --- | ||
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2023-09-22 17:22:05 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | 2223913 | ||
| Bug Blocks: | |||
|
Description
dan
2023-04-30 16:30:25 UTC
Interesting issue. It seems to be present just in interactive mode of nslookup. When I set those parameters from command line, I have not seen it there.
$ nslookup -debug cisco.com 8.8.8.8
Server: 8.8.8.8
Address: 8.8.8.8#53
------------
QUESTIONS:
cisco.com, type = A, class = IN
ANSWERS:
-> cisco.com
internet address = 72.163.4.185
ttl = 247
AUTHORITY RECORDS:
ADDITIONAL RECORDS:
------------
Non-authoritative answer:
Name: cisco.com
Address: 72.163.4.185
------------
QUESTIONS:
cisco.com, type = AAAA, class = IN
ANSWERS:
-> cisco.com
has AAAA address 2001:420:1101:1::185
ttl = 69
AUTHORITY RECORDS:
ADDITIONAL RECORDS:
------------
Name: cisco.com
Address: 2001:420:1101:1::185
But in interactive mode, even set debug followed by name seems to be enough:
$ nslookup
> set debug
> isc.org
;; communications error to 127.0.0.1#53: timed out
Server: 127.0.0.1
Address: 127.0.0.1#53
------------
QUESTIONS:
isc.org, type = A, class = IN
ANSWERS:
-> isc.org
internet address = 149.20.1.66
ttl = 300
AUTHORITY RECORDS:
ADDITIONAL RECORDS:
------------
Non-authoritative answer:
Name: isc.org
Address: 149.20.1.66
------------
QUESTIONS:
isc.org, type = AAAA, class = IN
ANSWERS:
-> isc.org
has AAAA address 2001:4f8:1:f::66
ttl = 300
AUTHORITY RECORDS:
ADDITIONAL RECORDS:
------------
Name: isc.org
Address: 2001:4f8:1:f::66
I agree it is too fast reply to met timeout, so there is more likely some logic wrong to print timeout message without any timeout happening.
For some reason set debug made interactively results in the first recv_done returns eresult==timeout, always. If I set it on command line, even the first one returns with success. Queried name does not matter, even '.' works too.
(gdb) p eresult
$3 = ISC_R_TIMEDOUT
(gdb) bt
#0 recv_done (handle=0x7ffff5081000, eresult=ISC_R_TIMEDOUT, region=<optimized out>, arg=<optimized out>) at ../../../bin/dig/dighost.c:4037
#1 0x00007ffff7a29a30 in isc__nm_async_readcb (worker=worker@entry=0x0, ev0=ev0@entry=0x7ffff5dfe140) at ../../../lib/isc/netmgr/netmgr.c:2885
#2 0x00007ffff7a29b7c in isc__nm_readcb (sock=sock@entry=0x7ffff5027a00, uvreq=<optimized out>, eresult=eresult@entry=ISC_R_TIMEDOUT)
at ../../../lib/isc/netmgr/netmgr.c:2858
#3 0x00007ffff7a2aa17 in isc__nmsocket_readtimeout_cb (timer=0x7ffff5027e98) at ../../../lib/isc/netmgr/netmgr.c:2085
#4 0x00007ffff793c1f0 in uv__run_timers (loop=0x7ffff622f010) at src/timer.c:178
#5 uv_run (loop=loop@entry=0x7ffff622f010, mode=mode@entry=UV_RUN_DEFAULT) at src/unix/core.c:393
#6 0x00007ffff7a2e6b4 in nm_thread (worker0=0x7ffff622f000) at ../../../lib/isc/netmgr/netmgr.c:698
#7 0x00007ffff7a64569 in isc__trampoline_run (arg=0x5555555b38b0) at ../../../lib/isc/trampoline.c:189
#8 0x00007ffff74ae12d in start_thread (arg=<optimized out>) at pthread_create.c:442
#9 0x00007ffff752fbc0 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
(gdb) p eresult
$4 = ISC_R_TIMEDOUT
(gdb) info locals
sockstr = "127.0.0.1#53", '\000' <repeats 50 times>
query = 0x7ffff5042000
b = {magic = 0, base = 0x0, length = 0, used = 0, current = 0, active = 0, link = {prev = 0x0, next = 0x0}, mctx = 0x0, autore = false}
msg = 0x0
result = <optimized out>
n = 0x0
l = 0x7ffff5000000
docancel = false
donext = false
match = true
done_process_opt = false
parseflags = <optimized out>
id = 0
msgflags = 0
newedns = <optimized out>
peer = {type = {sa = {sa_family = 0, sa_data = '\000' <repeats 13 times>}, sin = {sin_family = 0, sin_port = 0, sin_addr = {s_addr = 0},
sin_zero = "\000\000\000\000\000\000\000"}, sin6 = {sin6_family = 0, sin6_port = 0, sin6_flowinfo = 0, sin6_addr = {__in6_u = {
__u6_addr8 = '\000' <repeats 15 times>, __u6_addr16 = {0, 0, 0, 0, 0, 0, 0, 0}, __u6_addr32 = {0, 0, 0, 0}}}, sin6_scope_id = 0},
ss = {ss_family = 0, __ss_padding = '\000' <repeats 117 times>, __ss_align = 0}, sunix = {sun_family = 0,
sun_path = '\000' <repeats 107 times>}}, length = 0, link = {prev = 0x0, next = 0x0}}
next_lookup = <optimized out>
cancel_lookup = <optimized out>
keep_query = <optimized out>
__func__ = "recv_done"
I have tried to capture the reason by using strace, but once I follow also children forks, it never happened to me. I think that means there is some weird race condition, which does not happen when it is slower. $ strace -o /tmp/nslookup.strace -f nslookup Ah, found reliable reproducer. The set at the beginning does matter. What does matter is just waiting before query is asked. Reproduce with: (sleep 6 && echo localhost) | nslookup The problem is not present in development version 9.19.11, but reliably reproduced on 9.18.14. Reported upstream: https://gitlab.isc.org/isc-projects/bind9/-/issues/4044 Fixed in release 9.18.17, bug #2223913. |