Bug 161181 - resolver fails to handle truncated UDP replies
Status: CLOSED CURRENTRELEASE
Product: Fedora
Classification: Fedora
Component: glibc
Version: 4
Hardware: i686 Linux
Priority: medium
Severity: medium
Assigned To: Jakub Jelinek
QA Contact: Brian Brock
Duplicates: 165802
Reported: 2005-06-21 02:19 EDT by Tomasz Kepczynski
Modified: 2007-11-30 17:11 EST

Fixed In Version: FC5
Doc Type: Bug Fix
Last Closed: 2006-09-21 22:07:36 EDT

Attachments:
strace -i -s 1024 rdesktop tkepczyx-mobl1.ger.corp.intel.com. (38.56 KB, text/plain)
2005-06-21 02:19 EDT, Tomasz Kepczynski
strace -i -s 1024 ping tkepczyx-mobl1.ger.corp.intel.com (14.81 KB, text/x-log)
2005-08-29 06:40 EDT, Tomasz Kepczynski

Description Tomasz Kepczynski 2005-06-21 02:19:24 EDT
Description of problem:

I am trying to connect to remote host:
gklab-59-001:~> rdesktop tkepczyx-mobl1.ger.corp.intel.com.
ERROR: tkepczyx-mobl1.ger.corp.intel.com.: unable to resolve host

nslookup output:
gklab-59-001:~> nslookup  tkepczyx-mobl1.ger.corp.intel.com.
;; Truncated, retrying in TCP mode.
Server:         172.28.168.7
Address:        172.28.168.7#53

Non-authoritative answer:
Name:   tkepczyx-mobl1.ger.corp.intel.com
Address: 172.28.37.68

dig output:
gklab-59-001:~> dig tkepczyx-mobl1.ger.corp.intel.com.
;; Truncated, retrying in TCP mode.

; <<>> DiG 9.3.1 <<>> tkepczyx-mobl1.ger.corp.intel.com.
;; global options:  printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 17848
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 20, ADDITIONAL: 20

;; QUESTION SECTION:
;tkepczyx-mobl1.ger.corp.intel.com. IN  A

;; ANSWER SECTION:
tkepczyx-mobl1.ger.corp.intel.com. 384 IN A     172.28.37.68

(AUTHORITY section is long and skipped here as I consider this
information sensitive).
I will attach strace.

Version-Release number of selected component (if applicable):
1.4.0, FC4 fully upgraded to latest updates.

How reproducible:
always

Steps to Reproduce:
1.
2.
3.
  
Actual results:
Cannot connect to remote host using its name.

Expected results:
Connect to remote host using its name.

Additional info:
I guess the problem is due to a large response that requires
fallback to DNS over TCP, and this fallback is not handled correctly.
Comment 1 Tomasz Kepczynski 2005-06-21 02:19:24 EDT
Created attachment 115731 [details]
strace -i -s 1024 rdesktop tkepczyx-mobl1.ger.corp.intel.com.
Comment 2 Tomasz Kepczynski 2005-06-21 09:00:59 EDT
I've just tried a simple C++ program:
#include <netdb.h>
#include <cstdio>

char name[] = "tkepczyx-mobl1.ger.corp.intel.com";

int main()
{
  // Resolve the name through the C library resolver.
  struct hostent *he = gethostbyname(name);
  printf("hostent: %p\n", (void *)he);
  if (he == NULL)
    printf("h_errno: %d\n", h_errno);  // set by gethostbyname() on failure
  return 0;
}
which also fails with h_errno = TRY_AGAIN, while dig and nslookup
still work. This points to a problem in the library, so I am reassigning to glibc.
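The test program prints h_errno as a bare number. A minimal sketch of a helper that turns the value into the name of the matching <netdb.h> constant might look like this; h_errno_name is a hypothetical name chosen for illustration and is not part of glibc, while the constants themselves are the ones <netdb.h> defines.

```cpp
#include <netdb.h>
#include <string>

// Hypothetical helper (not part of glibc): map an h_errno value
// to the name of the corresponding <netdb.h> constant.
std::string h_errno_name(int code)
{
  switch (code) {
  case HOST_NOT_FOUND: return "HOST_NOT_FOUND";  // authoritative "no such host"
  case TRY_AGAIN:      return "TRY_AGAIN";       // temporary failure, retry later
  case NO_RECOVERY:    return "NO_RECOVERY";     // non-recoverable server error
  case NO_DATA:        return "NO_DATA";         // name exists, no address record
  default:             return "unknown";
  }
}
```

In the program from this comment, the failure branch could then call printf("h_errno: %s\n", h_errno_name(h_errno).c_str()) to make the TRY_AGAIN result visible without looking up the number.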
Comment 3 David Zeuthen 2005-06-21 13:20:40 EDT
Reassigning to glibc maintainer.
Comment 4 Tomasz Kepczynski 2005-06-21 13:47:02 EDT
A few other hints:
- the host I am trying to reach has 20 NS records associated with it; other
  hosts with fewer NS records (2-3) work fine
- the problem did not exist in FC3 (but I am not 100% sure that there
  were no changes in DNS in the meantime)
- I tried telnet and ssh to the same host with similar result
Comment 5 Jakub Jelinek 2005-06-22 04:27:19 EDT
Can you reproduce it with some publicly accessible DNS?
Comment 6 Tomasz Kepczynski 2005-06-22 07:11:26 EDT
No. But setting up a test zone with one A entry and 20 or so NS entries should do
the trick. I used the following GENERATE statements to save typing:
$GENERATE 1-50 @ NS nameserver${0,2}
$GENERATE 1-50 nameserver${0,2} A 192.168.253.${200}
I actually confirmed the fault with this kind of setup on the x86_64 platform.
I also tried CentOS 4, which ships RHEL's glibc-2.3.4-2.9 recompiled from source,
and it works fine.
I can also add that adding the above lines to my usual setup completely
screwed up my NFS client, which uses hostnames.
Comment 7 Ulrich Drepper 2005-08-21 12:42:29 EDT
*** Bug 165802 has been marked as a duplicate of this bug. ***
Comment 8 Ulrich Drepper 2005-08-21 19:07:26 EDT
I think I fixed this now upstream.  The next rawhide build will probably have it
(look out for this bug number in the rpm changelog).  Once it is available,
consider trying it.
Comment 9 Jakub Jelinek 2005-08-26 05:04:53 EDT
Yeah, glibc-2.3.90-9 and above should fix this.
Comment 10 Tomasz Kepczynski 2005-08-29 06:13:08 EDT
Sorry guys, I've just tried it on glibc-2.3.90-10 i686 and it does not work.
Comment 11 Jakub Jelinek 2005-08-29 06:23:53 EDT
If you are using nscd, have you flushed nscd cache (i.e. nscd -i hosts)?
Or stop nscd before testing.
Then, please attach a new strace -i -s 1024 log.
The one in #1 showed
connect(4, {sa_family=AF_INET, sin_port=htons(53),
sin_addr=inet_addr("172.28.168.7")}, 28) = -1 EINVAL
and similar errors, which is exactly what has been fixed in 2.3.90-9 and above.
Now the last argument to connect in this case will be 16 and it shouldn't fail
with EINVAL.
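For reference, the two lengths mentioned here match the sizes of struct sockaddr_in6 (28 bytes) and struct sockaddr_in (16 bytes) on Linux. The following is only a sketch of picking the connect() length from the address family, not the actual glibc change; sockaddr_len_for is a hypothetical helper name.

```cpp
#include <netinet/in.h>
#include <sys/socket.h>

// Hypothetical helper: the address length to pass to connect()
// for a socket address of the given family.
socklen_t sockaddr_len_for(int family)
{
  switch (family) {
  case AF_INET:  return sizeof(struct sockaddr_in);   // 16 on Linux
  case AF_INET6: return sizeof(struct sockaddr_in6);  // 28 on Linux
  default:       return 0;                            // family not handled here
  }
}
```

A caller would pass sockaddr_len_for(addr->sa_family) as the third argument to connect() instead of a fixed size.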
Comment 12 Tomasz Kepczynski 2005-08-29 06:40:02 EDT
Created attachment 118202 [details]
strace -i -s 1024 ping tkepczyx-mobl1.ger.corp.intel.com
Comment 13 Tomasz Kepczynski 2005-08-29 06:41:43 EDT
Now I am not sure if it is glibc or ping. Telnet and rdesktop somehow work.
Comment 14 Jakub Jelinek 2005-08-29 06:52:44 EDT
In the ping case it might be a SELinux policy issue.  Look at your logs
for audit messages.
Comment 15 Tomasz Kepczynski 2005-08-29 06:56:28 EDT
I guess this may be it:
type=AVC msg=audit(1125311948.892:16533349): avc:  denied  { name_connect } for  pid=4163 comm="ping" dest=53 scontext=user_u:system_r:ping_t tcontext=system_u:object_r:dns_port_t tclass=tcp_socket
type=SYSCALL msg=audit(1125311948.892:16533349): arch=40000003 syscall=102 success=no exit=-13 a0=3 a1=bfed214c a2=b35ff4 a3=b7fb1690 items=0 pid=4163 auid=43270 uid=43270 gid=32602 euid=43270 suid=43270 fsuid=43270 egid=32602 sgid=32602 fsgid=32602 comm="ping" exe="/bin/ping"
Comment 16 Jakub Jelinek 2005-08-29 06:59:56 EDT
Then the glibc bug is fixed.  Whether this is a bug in selinux policy
or whether use of nscd in this case is mandatory is something I'll leave
to the selinux maintainers to decide.
Comment 17 Daniel Walsh 2005-08-29 10:06:40 EDT
Why would ping be trying to tcp connect to port 53?

Comment 18 Tomasz Kepczynski 2005-08-29 10:10:36 EDT
If a UDP resolver query fails, it is retried using TCP. That was the case here:
the UDP query returned a so-called "truncated" result (i.e. more data than a UDP
datagram can contain), so the query was retried over TCP and denied by SELinux.
This probably happened "behind the scenes" in the resolver library.
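To illustrate what "truncated" means on the wire: in the DNS header described in RFC 1035, the TC flag is bit 0x02 of the third byte of the message. Below is a minimal sketch of that check, assuming the raw reply is available as a byte buffer. dns_reply_truncated is a hypothetical name, and this is not how glibc's resolver is actually written.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: report whether a raw DNS reply has the TC
// (truncated) flag set. The fixed DNS header is 12 bytes long; byte 2
// holds QR, Opcode, AA, TC and RD, with TC at mask 0x02.
bool dns_reply_truncated(const std::uint8_t *msg, std::size_t len)
{
  if (msg == nullptr || len < 12)
    return false;               // too short to contain a full header
  return (msg[2] & 0x02) != 0;  // TC set: caller should retry over TCP
}
```

A resolver that sees this flag on a UDP reply would repeat the same query over a TCP connection to port 53, which is the connection SELinux denied for ping.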
Comment 20 Daniel Walsh 2005-08-29 12:00:38 EDT
Ok added 

allow $1 dns_port_t:tcp_socket name_connect;

to the can_ldap macro, which will allow all domains that use DNS to use either UDP
or TCP to resolve.

Dan
Comment 21 John McBride 2005-08-30 00:30:15 EDT
Per request on fedora-list:

I have a fully yum-updated FC3 machine with bind set up as a caching DNS server. This bug
is present in glibc-2.3.5.

# ping en.wikipedia.org
ping: unknown host en.wikipedia.org

I updated the machine temporarily with:

binutils-2.16.91.0.2-4.i386.rpm
glibc-2.3.90-10.i386.rpm
glibc-2.3.90-10.i686.rpm
glibc-common-2.3.90-10.i386.rpm
glibc-devel-2.3.90-10.i386.rpm
glibc-headers-2.3.90-10.i386.rpm

The machine boots and appears stable, the truncation message from bind is still
present, and ping works correctly (as expected).

Reverting the machine to a glibc-2.3.5 setup and the earlier binutils once again
breaks ping.
Comment 22 Daniel Walsh 2005-09-19 16:20:01 EDT
Fixed in selinux-policy-*-1.27.1-2.1
Comment 23 Charlie Bennett 2006-01-26 12:20:59 EST
can someone please backport the glibc fix into FC4?

thanks
Comment 24 Russell Coker 2006-03-16 09:08:02 EST
The SE Linux issue is resolved, so now it's apparently just a glibc issue. 
Comment 25 Tomasz Kepczynski 2006-03-16 14:18:58 EST
I've got a fully updated system on x86_64 (with glibc-2.3.5-10.3 and
selinux-policy-targeted-1.27.1-2.22) and both ping and ssh work fine
with a host that has lots of nameservers configured (as described in #6) and
for which dig reports a retry in TCP mode.
I believe this bug can be closed now.
Comment 26 Charlie Bennett 2006-03-28 10:48:37 EST
What was the upstream bug number for the GLIBC side of this bug?  I see a big
list of upstream BZ numbers in the 2.3.6-1 rev.  Is this fix one of them?

Thanks,
ccb
Comment 27 Bill Nottingham 2006-09-21 22:07:36 EDT
Closing bugs in MODIFIED state from prior Fedora releases. If this bug persists
in a current Fedora release (such as Fedora Core 5 or later), please reopen and
set the version appropriately.
