Bug 15100 - getaddrinfo() error returns TIMEOUTs as NOTKNOWN
Status: CLOSED CURRENTRELEASE
Product: Red Hat Raw Hide
Classification: Retired
Component: glibc
Version: 1.0
Hardware: All
OS: Linux
Priority: high
Severity: high
Assigned To: Jakub Jelinek
Reported: 2000-08-02 05:21 EDT by matti aarnio
Modified: 2016-11-24 10:07 EST (History)
Last Closed: 2005-02-21 20:30:29 EST

Attachments:
testharness to show how getaddrinfo() mistreats lookup timeouts (3.47 KB, text/plain)
2003-06-12 04:14 EDT, matti aarnio
Description matti aarnio 2000-08-02 05:21:15 EDT
File:  sysdeps/posix/getaddrinfo.c

This is an *old* bug, but it affects only people who use the IPv6 APIs.

The  getaddrinfo()  function, as used in glibc 2.1 and later, makes a
grave omission in processing the error statuses returned by the resolver
subfunctions.

Specifically, the  gethosts()  macros and the __gethostbyaddr_r() call
are not followed by careful analysis of  herrno  values.
The  gethosts()  macros are surrounded by code ASSUMING that
the only error status worth mentioning is  EAI_NODATA, and that others
will (with an AF_UNSPEC query) definitely be  EAI_NONAME level errors.

I use the following kind of mapping in these cases (I don't have a copyright
assignment on file with the FSF, so I can't give you a patch..)

    switch (h_errno) {
    case NETDB_INTERNAL:
      return EAI_SYSTEM;
    case HOST_NOT_FOUND:
      return EAI_NONAME;
    case NO_RECOVERY:
      return EAI_FAIL;
    case TRY_AGAIN:
      /* Either bail out, or set a flag, depending on
         whether there are more address families to
         query afterwards.  */
      got_tryagain = 1;
      break;
    case NO_DATA:
    default:
      break;
    }
    /* Continue the queries and yield errors.
       It is IMPORTANT to yield  EAI_AGAIN  in case
       "got_tryagain"  has been set!  */


The end result of the lack of proper timeout processing is
that  getaddrinfo()  will yield error  EAI_NONAME (-2) in cases
where it MUST yield  EAI_AGAIN (-3).   My application environment
(ZMailer MTA) will then treat the result as if there really was a
DNS response code  NXDOMAIN  for the query, and not simply as
"well, it timed out, let's return later"..
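A minimal sketch of the mapping described above, written as a standalone function so it can be compiled and tried outside glibc. It is not glibc code: the name eai_from_herrnos() and the convention that 0 means "this address family produced an answer" are assumptions made for illustration. It also returns EAI_NONAME when every family reported NO_DATA, because EAI_NODATA is not available in every build environment.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <netdb.h>
#include <stddef.h>

/* Hypothetical helper (not part of glibc): fold the resolver status of
   each per-address-family lookup into one getaddrinfo()-style result.
   herrnos[i] == 0 is taken to mean "family i produced an answer".  */
int eai_from_herrnos(const int *herrnos, size_t n)
{
    int got_tryagain = 0;
    int got_answer = 0;

    assert(herrnos != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        switch (herrnos[i]) {
        case 0:
            got_answer = 1;
            break;
        case NETDB_INTERNAL:
            return EAI_SYSTEM;
        case HOST_NOT_FOUND:
            return EAI_NONAME;
        case NO_RECOVERY:
            return EAI_FAIL;
        case TRY_AGAIN:
            /* Remember the timeout; later families may still answer.  */
            got_tryagain = 1;
            break;
        case NO_DATA:
        default:
            break;
        }
    }

    if (got_answer)
        return 0;
    /* A timeout must not be reported as "name not known".  */
    return got_tryagain ? EAI_AGAIN : EAI_NONAME;
}
```

Whether a partial answer should win over a timeout in another family (as it does here) is a design choice of this sketch, not something the report specifies.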


/Matti Aarnio <matti.aarnio@zmailer.org>
Comment 1 Jakub Jelinek 2000-08-25 09:33:04 EDT
Ulrich Drepper added EAI_AGAIN into getaddrinfo in glibc-2.1.92.
Please have a look if his solution is sufficient.
Comment 2 matti aarnio 2000-08-25 09:49:33 EDT
Per the test code I sent to Ulrich, he is getting correct results from a zone
which is guaranteed to time out the request.

Without seeing the new source, the following is just indirect deduction, but the
code is now handling resolver status codes  HOST_NOT_FOUND, TRY_AGAIN and
NO_DATA (plus NETDB_INTERNAL to some degree), while defaulting the result to
appear as  EAI_NONAME  in case an unhandled error occurs -- but then,
NO_RECOVERY and NETDB_INTERNAL are not very easy to handle...  at least not easy
to simulate.

I really would prefer to get appropriate returns for NO_RECOVERY and
NETDB_INTERNAL, but they are not as important in everyday use as TRY_AGAIN.

... but of course they are easy to simulate:  we just overload
__gethostbyname_r_()  with our own
version which keeps returning that error condition.
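The simulation idea can be sketched without touching glibc at all, by passing the lookup routine in as a function pointer and substituting a stub that always fails with a chosen status. Everything here is hypothetical: lookup_fn, failing_lookup(), set_forced_herrno() and resolve_status() are illustration names, and resolve_status() is only a stand-in for the code under test, not getaddrinfo() itself.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <netdb.h>
#include <stddef.h>
#include <sys/socket.h>

/* Hypothetical lookup signature: 0 on success, nonzero on failure with
   the resolver status stored in *herrnop.  */
typedef int (*lookup_fn)(const char *name, int family, int *herrnop);

/* Status that the stub keeps reporting.  */
static int forced_herrno = NO_RECOVERY;

void set_forced_herrno(int value)
{
    forced_herrno = value;
}

/* Stub standing in for the real resolver call: it always fails.  */
int failing_lookup(const char *name, int family, int *herrnop)
{
    (void)name;
    (void)family;
    *herrnop = forced_herrno;
    return -1;
}

/* Stand-in for the code under test: query both families and map the
   resolver status to an EAI_* value.  */
int resolve_status(lookup_fn lookup, const char *name)
{
    static const int families[] = { AF_INET, AF_INET6 };
    int got_tryagain = 0;

    assert(lookup != NULL);

    for (size_t i = 0; i < sizeof families / sizeof families[0]; i++) {
        int herrno = 0;

        if (lookup(name, families[i], &herrno) == 0)
            return 0;

        switch (herrno) {
        case NETDB_INTERNAL:
            return EAI_SYSTEM;
        case NO_RECOVERY:
            return EAI_FAIL;
        case HOST_NOT_FOUND:
            return EAI_NONAME;
        case TRY_AGAIN:
            got_tryagain = 1;
            break;
        default:
            break;
        }
    }
    return got_tryagain ? EAI_AGAIN : EAI_NONAME;
}
```

Calling set_forced_herrno(NETDB_INTERNAL) and then resolve_status(failing_lookup, "example.invalid") exercises the path that is otherwise hard to provoke with a live resolver.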
Comment 3 matti aarnio 2003-06-11 13:51:15 EDT
This was fixed in the past, and has been broken again.
This applies to    glibc-2.3.2-48

Attached you can find test-harness runs with the broken glibc code, and
a bit further down, correct runs with ZMailer's embedded replacement functions
for the incorrect ones:


$ /opt/mail/bin/getmxrr-test.glibc timeout-mx.zmailer.org 
ZMAILER GETMXRR() TEST HARNESS
DNS lookup reply: len=230 rcode=0 qdcount=1 ancount=1 nscount=4 arcount=4 RD=1 TC=0 AA=0 QR=1 RA=1
 -> (3046s) MX[0] pref=0 host=timeout-zone.zmailer.org
  mx[0] mxtype=--(0) host='timeout-zone.zmailer.org'
  getaddrinfo('timeout-zone.zmailer.org','0') (PF_INET) -> r=-2 (Name or service not known), ai=(nil)
  getaddrinfo('timeout-zone.zmailer.org','0') (PF_INET6) -> r=-2 (Name or service not known), ai=(nil)
  getmxrr('timeout-mx.zmailer.org') -> nmx=1, maxpref=66000, realname=''
GETMXRR() rc=69 EX_UNAVAILABLE; mxcount=0
NO SUCCESSFULLY COLLECTED MX DATA, LOOKING FOR A/AAAA DATA:
...... (didn't get any A/AAAA, as is proper) .....

$ /opt/mail/bin/getmxrr-test.static timeout-mx.zmailer.org
ZMAILER GETMXRR() TEST HARNESS
DNS lookup reply: len=230 rcode=0 qdcount=1 ancount=1 nscount=4 arcount=4 RD=1 TC=0 AA=0 QR=1 RA=1
 -> (2829s) MX[0] pref=0 host=timeout-zone.zmailer.org
  mx[0] mxtype=--(0) host='timeout-zone.zmailer.org'
  getaddrinfo('timeout-zone.zmailer.org','0') (PF_INET) -> r=-3 (Temporary failure in name resolution), ai=(nil)
  getaddrinfo('timeout-zone.zmailer.org','0') (PF_INET6) -> r=-3 (Temporary failure in name resolution), ai=(nil)
  getmxrr('timeout-mx.zmailer.org') -> nmx=1, maxpref=66000, realname=''
GETMXRR() rc=100 EX_DEFERALL; mxcount=0
NO SUCCESSFULLY COLLECTED MX DATA, LOOKING FOR A/AAAA DATA:
..... (didn't get any A/AAAA, as is proper) .....


This "tiny" difference bit  vger.kernel.org  in the form of spurious email
rejects, which got fixed very quickly by using ZMailer's own version of
these functions.   VGER runs with glibc-2.2.4-19.3,  which has
this same bug.
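Why the difference between r=-2 and r=-3 matters to a mail transfer agent can be sketched as a small decision function. This is not ZMailer code; the enum and function names are hypothetical, and treating EAI_MEMORY and EAI_SYSTEM as "defer" is an assumption of this sketch.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <netdb.h>

/* Hypothetical delivery decisions for an MTA.  */
enum delivery_action { DELIVER, DEFER, BOUNCE };

/* Map a getaddrinfo() return code to a delivery decision.  */
enum delivery_action action_for_gai_result(int rc)
{
    switch (rc) {
    case 0:
        return DELIVER;
    case EAI_AGAIN:   /* lookup timed out: keep the message queued */
    case EAI_MEMORY:  /* local, presumably transient, condition */
    case EAI_SYSTEM:
        return DEFER;
    default:          /* EAI_NONAME and friends: treated as permanent */
        return BOUNCE;
    }
}

/* Printable name for a decision, for log lines.  */
const char *action_name(enum delivery_action a)
{
    switch (a) {
    case DELIVER: return "deliver";
    case DEFER:   return "defer";
    case BOUNCE:  return "bounce";
    }
    assert(!"unreachable: unknown delivery_action");
    return "?";
}
```

With a mapping like this, a timeout misreported as EAI_NONAME turns a message that should have stayed in the queue into a reject.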
Comment 4 Ulrich Drepper 2003-06-12 03:03:32 EDT
I cannot make out what you claim isn't working.  Attach a test program and
explain the expected behavior or else nothing will happen.
Comment 5 matti aarnio 2003-06-12 04:14:23 EDT
Created attachment 92353
testharness to show how getaddrinfo() mistreats lookup timeouts

compile simply:
  gcc -o getaddrinfo-test getaddrinfo-test.c
and run as:
  ./getaddrinfo-test
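For readers without access to the attachment, a rough sketch of what such a probe can look like follows. This is not the attached getaddrinfo-test.c; the function name probe() and its output format are assumptions, loosely modelled on the "r=... (...)" lines quoted in comment 3.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: run one getaddrinfo() query restricted to the
   given address family, print the outcome, and return the raw code.  */
int probe(const char *host, int family, int flags)
{
    struct addrinfo hints;
    struct addrinfo *res = NULL;
    int rc;

    assert(host != NULL);

    memset(&hints, 0, sizeof hints);
    hints.ai_family = family;
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_flags = flags;

    rc = getaddrinfo(host, NULL, &hints, &res);
    printf("getaddrinfo('%s') family=%d -> r=%d (%s)\n",
           host, family, rc, rc ? gai_strerror(rc) : "success");

    if (rc == 0)
        freeaddrinfo(res);
    return rc;
}
```

Calling probe("timeout-zone.zmailer.org", AF_INET, 0) and probe("timeout-zone.zmailer.org", AF_INET6, 0) would repeat the queries from the report, provided that zone still behaves as it did then; a correct library should report EAI_AGAIN for a lookup that times out.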
Comment 6 Ulrich Drepper 2003-06-12 18:21:04 EDT
I've a patch in the current CVS archive which fixes the problem.  The code was
basically there, just one little bug in the PF_INET handling and a big problem
in the _r lookup functions which went unnoticed.  The patch will appear in one
of the next rawhide glibcs.
Comment 7 Sami Farin 2005-02-17 11:32:18 EST
currently, in glibc-2.3.4-10:
sysdeps/posix/getaddrinfo.c:gaih_inet()
simple_again:
  while (1)
    {
      rc = __gethostbyname2_r (name, family, &th, tmpbuf, tmpbuflen, &h, &herrno);
   ...

now, it would work better if gethostbyname2_r returned != 0 on error,
but it returns also 0 on _success_.
either that loop in getaddrinfo should get fixed or gethostbyname* should get fixed.

also, when name parameter is "fi", gethostbyname2_r gives
herrno=1 (HOST_NOT_FOUND), but it should be 4 (NO_DATA).
for "foo.invalid" it correctly returns 1 for herrno, and also
correctly 2 for tmperror.safari.iki.fi.

Comment 8 Ulrich Drepper 2005-02-21 20:30:29 EST
What is this comment #7 about?  It cannot have anything to do with the original
report.  If you have a bug to report, open a new BZ.  I'll ignore comment #7
since it is completely without context.
