161181 – resolver fails to handle truncated UDP replies

Summary: resolver fails to handle truncated UDP replies

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	glibc
Sub Component:
Version:	4
Hardware:	i686
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Assignee:	Jakub Jelinek
QA Contact:	Brian Brock
Docs Contact:
URL:
Whiteboard:
Duplicates (1):	165802
Depends On:
Blocks:

Reported:	2005-06-21 06:19 UTC by Tomasz Kepczynski
Modified:	2007-11-30 22:11 UTC
CC List:	4 users
Fixed In Version:	FC5
Clone Of:
Environment:
Last Closed:	2006-09-22 02:07:36 UTC
Type:	---
Embargoed:
Dependent Products:

Attachments
strace -i -s 1024 rdesktop tkepczyx-mobl1.ger.corp.intel.com. (38.56 KB, text/plain) 2005-06-21 06:19 UTC, Tomasz Kepczynski
strace -i -s 1024 ping tkepczyx-mobl1.ger.corp.intel.com (14.81 KB, text/x-log) 2005-08-29 10:40 UTC, Tomasz Kepczynski

Description Tomasz Kepczynski 2005-06-21 06:19:24 UTC

Description of problem:

I am trying to connect to a remote host:
gklab-59-001:~> rdesktop tkepczyx-mobl1.ger.corp.intel.com.
ERROR: tkepczyx-mobl1.ger.corp.intel.com.: unable to resolve host

nslookup output:
gklab-59-001:~> nslookup  tkepczyx-mobl1.ger.corp.intel.com.
;; Truncated, retrying in TCP mode.
Server:         172.28.168.7
Address:        172.28.168.7#53

Non-authoritative answer:
Name:   tkepczyx-mobl1.ger.corp.intel.com
Address: 172.28.37.68

dig output:
gklab-59-001:~> dig tkepczyx-mobl1.ger.corp.intel.com.
;; Truncated, retrying in TCP mode.

; <<>> DiG 9.3.1 <<>> tkepczyx-mobl1.ger.corp.intel.com.
;; global options:  printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 17848
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 20, ADDITIONAL: 20

;; QUESTION SECTION:
;tkepczyx-mobl1.ger.corp.intel.com. IN  A

;; ANSWER SECTION:
tkepczyx-mobl1.ger.corp.intel.com. 384 IN A     172.28.37.68

(AUTHORITY section is long and skipped here as I consider this
information sensitive).
I will attach strace.

Version-Release number of selected component (if applicable):
1.4.0, FC4 fully upgraded to latest updates.

How reproducible:
always

Steps to Reproduce:
1.
2.
3.
  
Actual results:
Cannot connect to remote host using its name.

Expected results:
Connect to remote host using its name.

Additional info:
I guess the problem is that the response is large enough to require a
fallback to DNS over TCP, and the resolver does not handle that fallback
correctly.

Comment 1 Tomasz Kepczynski 2005-06-21 06:19:24 UTC

Created attachment 115731
strace -i -s 1024 rdesktop tkepczyx-mobl1.ger.corp.intel.com.

Comment 2 Tomasz Kepczynski 2005-06-21 13:00:59 UTC

I've just tried a simple C++ program:
#include <netdb.h>
#include <cstdio>

char name[] = "tkepczyx-mobl1.ger.corp.intel.com";

int main()
{
  // gethostbyname() returns NULL on failure and sets h_errno.
  struct hostent *he = gethostbyname(name);
  printf("hostent: %p\n", (void *)he);
  if (he == NULL)
    printf("h_errno: %d\n", h_errno);
  return 0;
}
which also fails with h_errno = TRY_AGAIN, while dig and nslookup
still work. This points to a problem in the library, so I am reassigning
to glibc.
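
For anyone repeating this test, the numeric h_errno is easier to read as a name. The following is only a sketch: h_errno_name is a hypothetical helper, not part of glibc, and it covers just the four classic values from <netdb.h>.

```cpp
#include <netdb.h>
#include <string>

// Hypothetical helper: map an h_errno value left by gethostbyname()
// to the name of the matching <netdb.h> constant.
std::string h_errno_name(int err)
{
  switch (err) {
    case HOST_NOT_FOUND: return "HOST_NOT_FOUND";
    case TRY_AGAIN:      return "TRY_AGAIN";
    case NO_RECOVERY:    return "NO_RECOVERY";
    case NO_DATA:        return "NO_DATA";
    default:             return "unknown";
  }
}
```

In the test program, the failure branch could print h_errno_name(h_errno).c_str() next to the number.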

Comment 3 David Zeuthen 2005-06-21 17:20:40 UTC

Reassigning to glibc maintainer.

Comment 4 Tomasz Kepczynski 2005-06-21 17:47:02 UTC

A few other hints:
- the host I am trying to reach has 20 NS records associated with it;
  other hosts with fewer NS records (2-3) work fine
- the problem did not exist in FC3 (but I am not 100% sure that there
  were no changes in DNS in the meantime)
- I tried telnet and ssh to the same host, with a similar result

Comment 5 Jakub Jelinek 2005-06-22 08:27:19 UTC

Can you reproduce it with some publicly accessible DNS?

Comment 6 Tomasz Kepczynski 2005-06-22 11:11:26 UTC

No. But setting up a test zone with one A entry and 20 or so NS entries should
do the trick. I used the following GENERATE statements to save typing:
$GENERATE 1-50 @ NS nameserver${0,2}
$GENERATE 1-50 nameserver${0,2} A 192.168.253.${200}
I actually confirmed the fault with this kind of setup on the x86_64 platform.
I also tried CentOS 4, which ships RHEL's glibc rebuilt from source
(glibc-2.3.4-2.9), and it works fine.
I can also add that adding the above lines to my usual setup completely
broke my nfs client, which uses hostnames.
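
To make the effect of those two $GENERATE lines concrete, here is a rough sketch of the records they should expand to. It assumes BIND's documented ${offset,width} semantics (offset added to the iterator, zero-padded to the given width). expand_generate is a hypothetical name used only for illustration and it hard-codes these two templates.

```cpp
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper: emulate the two $GENERATE lines above for the
// iterator range first..last and return the resulting zone-file lines.
std::vector<std::string> expand_generate(int first, int last)
{
  std::vector<std::string> out;
  char buf[128];
  for (int i = first; i <= last; ++i) {
    // $GENERATE 1-50 @ NS nameserver${0,2}
    std::snprintf(buf, sizeof buf, "@ NS nameserver%02d", i);
    out.push_back(buf);
    // $GENERATE 1-50 nameserver${0,2} A 192.168.253.${200}
    std::snprintf(buf, sizeof buf, "nameserver%02d A 192.168.253.%d", i, i + 200);
    out.push_back(buf);
  }
  return out;
}
```

For the range 1-50 this gives 50 NS records at the zone apex plus 50 matching A records, which is enough to push a reply past what fits in a plain UDP answer.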

Comment 7 Ulrich Drepper 2005-08-21 16:42:29 UTC

*** Bug 165802 has been marked as a duplicate of this bug. ***

Comment 8 Ulrich Drepper 2005-08-21 23:07:26 UTC

I think I fixed this now upstream.  The next rawhide build will probably have it
(look out for this bug number in the rpm changelog).  Once it is available,
consider trying it.

Comment 9 Jakub Jelinek 2005-08-26 09:04:53 UTC

Yeah, glibc-2.3.90-9 and above should fix this.

Comment 10 Tomasz Kepczynski 2005-08-29 10:13:08 UTC

Sorry guys, I've just tried it with glibc-2.3.90-10 (i686) and it does not work.

Comment 11 Jakub Jelinek 2005-08-29 10:23:53 UTC

If you are using nscd, have you flushed nscd cache (i.e. nscd -i hosts)?
Or stop nscd before testing.
Then, please attach a new strace -i -s 1024 log.
The one in #1 showed
connect(4, {sa_family=AF_INET, sin_port=htons(53),
sin_addr=inet_addr("172.28.168.7")}, 28) = -1 EINVAL
and similar errors, which is exactly what has been fixed in 2.3.90-9 and above.
Now the last argument to connect in this case will be 16 and it shouldn't fail
with EINVAL.
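
As an illustration of the length mismatch only (this is not the actual glibc change): on Linux, 28 is the size of struct sockaddr_in6 and 16 is the size of struct sockaddr_in, so an AF_INET address has to be passed to connect() with the smaller length. sockaddr_len_for is a hypothetical helper name.

```cpp
#include <netinet/in.h>
#include <sys/socket.h>

// Hypothetical helper: return the address length connect() expects for
// a given address family, or 0 for families this sketch does not handle.
socklen_t sockaddr_len_for(int family)
{
  switch (family) {
    case AF_INET:  return sizeof(struct sockaddr_in);   // 16 on Linux
    case AF_INET6: return sizeof(struct sockaddr_in6);  // 28 on Linux
    default:       return 0;
  }
}
```

A caller that keeps the server address in a larger buffer would pass sockaddr_len_for(addr->sa_family) rather than the size of the buffer.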

Comment 12 Tomasz Kepczynski 2005-08-29 10:40:02 UTC

Created attachment 118202
strace -i -s 1024 ping tkepczyx-mobl1.ger.corp.intel.com

Comment 13 Tomasz Kepczynski 2005-08-29 10:41:43 UTC

Now I am not sure if it is glibc or ping. Telnet and rdesktop somehow work.

Comment 14 Jakub Jelinek 2005-08-29 10:52:44 UTC

In the ping case it might be a SELinux policy issue.  Look at your logs
for audit messages.

Comment 15 Tomasz Kepczynski 2005-08-29 10:56:28 UTC

I guess this may be it:
type=AVC msg=audit(1125311948.892:16533349): avc:  denied  { name_connect } for  pid=4163 comm="ping" dest=53 scontext=user_u:system_r:ping_t tcontext=system_u:object_r:dns_port_t tclass=tcp_socket
type=SYSCALL msg=audit(1125311948.892:16533349): arch=40000003 syscall=102 success=no exit=-13 a0=3 a1=bfed214c a2=b35ff4 a3=b7fb1690 items=0 pid=4163 auid=43270 uid=43270 gid=32602 euid=43270 suid=43270 fsuid=43270 egid=32602 sgid=32602 fsgid=32602 comm="ping" exe="/bin/ping"

Comment 16 Jakub Jelinek 2005-08-29 10:59:56 UTC

Then the glibc bug is fixed.  Whether this is a bug in selinux policy
or whether use of nscd in this case is mandatory is something I'll leave
to the selinux maintainers to decide.

Comment 17 Daniel Walsh 2005-08-29 14:06:40 UTC

Why would ping be trying to tcp connect to port 53?

Comment 18 Tomasz Kepczynski 2005-08-29 14:10:36 UTC

If a UDP resolver query fails, it is retried using TCP. That was the case here:
the UDP query returned a so-called "truncated" result (i.e. more data than a
single UDP DNS reply can contain), so the query was retried over TCP and that
retry was denied by SELinux.
This probably happened "behind the scenes" in the resolver library.
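
For reference, a minimal sketch of how a client can recognise a truncated reply: the TC flag is bit 0x02 of the third byte of the 12-byte DNS header (RFC 1035, section 4.1.1). dns_reply_truncated is a hypothetical name and this is not the glibc resolver code.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: report whether a raw DNS reply has the TC
// (truncation) flag set. Flags occupy bytes 2-3 of the header and
// TC is bit 0x02 of byte 2.
bool dns_reply_truncated(const std::uint8_t *msg, std::size_t len)
{
  if (msg == nullptr || len < 12)
    return false;               // too short to hold a DNS header
  return (msg[2] & 0x02) != 0;
}
```

When this returns true for a UDP reply, a resolver is expected to repeat the same query over TCP to port 53, which is the connection the ping_t domain was not allowed to make.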

Comment 20 Daniel Walsh 2005-08-29 16:00:38 UTC

Ok added 

allow $1 dns_port_t:tcp_socket name_connect;

to the can_ldap macro, which will allow all domains that use DNS to use either
UDP or TCP to resolve.

Dan

Comment 21 John McBride 2005-08-30 04:30:15 UTC

Per request on fedora-list:

I have a fully yum-updated FC3 machine with bind set up as a caching DNS
server. This bug is present in glibc-2.3.5.

# ping en.wikipedia.org
ping: unknown host en.wikipedia.org

I updated the machine temporarily with:

binutils-2.16.91.0.2-4.i386.rpm
glibc-2.3.90-10.i386.rpm
glibc-2.3.90-10.i686.rpm
glibc-common-2.3.90-10.i386.rpm
glibc-devel-2.3.90-10.i386.rpm
glibc-headers-2.3.90-10.i386.rpm

The machine boots and appears stable, the truncation message from bind is still
present, and ping works correctly (as expected).

Reverting the machine to a glibc-2.3.5 setup and the earlier binutils once again
breaks ping.

Comment 22 Daniel Walsh 2005-09-19 20:20:01 UTC

Fixed in selinux-policy-*-1.27.1-2.1

Comment 23 Charlie Bennett 2006-01-26 17:20:59 UTC

can someone please backport the glibc fix into FC4?

thanks

Comment 24 Russell Coker 2006-03-16 14:08:02 UTC

The SE Linux issue is resolved, so now it's apparently just a glibc issue.

Comment 25 Tomasz Kepczynski 2006-03-16 19:18:58 UTC

I've got a fully updated system on x86_64 (with glibc-2.3.5-10.3 and
selinux-policy-targeted-1.27.1-2.22), and both ping and ssh work fine
against a host with lots of nameservers configured (as described in #6)
for which dig reports a retry in TCP mode.
I believe this bug can be closed now.

Comment 26 Charlie Bennett 2006-03-28 15:48:37 UTC

What was the upstream bug number for the GLIBC side of this bug?  I see a big
list of upstream BZ numbers in the 2.3.6-1 rev.  Is this fix one of them?

Thanks,
ccb

Comment 27 Bill Nottingham 2006-09-22 02:07:36 UTC

Closing bugs in MODIFIED state from prior Fedora releases. If this bug persists
in a current Fedora release (such as Fedora Core 5 or later), please reopen and
set the version appropriately.
