Bug 47086 - getaddrinfo status=2, SERVFAIL, on some addresses over time until named is restarted
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Red Hat Linux
Classification: Retired
Component: bind
Version: 7.1
Hardware: i686
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Bernhard Rosenkraenzer
QA Contact: David Lawrence
URL: mail.wtrlo1.ia.home.com
Whiteboard:
Depends On:
Blocks:
Reported: 2001-07-03 04:58 UTC by Donald Whisnant
Modified: 2007-04-18 16:34 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2001-07-03 17:01:31 UTC
Embargoed:


Attachments:
Debug output of a good resolve (1.01 KB, text/plain)
2001-07-20 15:53 UTC, Daniel Resare
Debug output of a broken resolve (499 bytes, text/plain)
2001-07-20 15:54 UTC, Daniel Resare

Description Donald Whisnant 2001-07-03 04:58:45 UTC
From Bugzilla Helper:
User-Agent: Mozilla/4.76 [en] (X11; U; Linux 2.4.2-2smp i686)

Description of problem:
Everything is configured correctly and running fine.  However, after
running for a while (anywhere from 8-10 hours to 3 or 4 days), named stops
resolving the domain name of my ISP's mail server.  getaddrinfo
returns status code 2, nslookup returns SERVFAIL, and fetchmail stops
retrieving mail.  Other addresses continue to resolve correctly,
so named itself is still running.

To fix the problem, you must stop and restart named.  Once you do, it
starts resolving the name correctly again.

It is as if the ISP's DNS server hiccups (which happens quite
regularly) and named then caches this bad state and never bothers
rechecking their server.

This is new in bind 9.1.0; versions 8.7 - 8.9 in the older
releases never had this problem.



How reproducible:
Always

Steps to Reproduce:
1. Start named
2. Start a program that resolves a DNS name at a regular interval (say every
   5 minutes), such as fetchmail.
3. Let it run until their DNS hiccups (the URL in this form is a prime
   example) -- anywhere from 8-10 hours to 4 or 5 days.
4. Note status=2 from getaddrinfo and SERVFAIL from nslookup and similar
5. Restart named
6. Problem goes away
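
The steps above can be approximated without fetchmail by a small polling
script. This is only a sketch, not what the reporter actually ran: the
function names (probe, monitor, longest_failure_streak), the attempt
count, and the use of port 110 for POP3 are assumptions made for
illustration. It resolves the name through the system resolver at a fixed
interval and reports the longest run of consecutive failures, which helps
tell a one-off upstream failure apart from a lookup that stays broken
until named is restarted.

```python
import socket
import time


def probe(host, port, resolver=socket.getaddrinfo):
    """Resolve host once and return (ok, detail)."""
    try:
        infos = resolver(host, port, 0, socket.SOCK_STREAM)
    except socket.gaierror as exc:
        # getaddrinfo failed; keep the resolver's own error text.
        return False, str(exc)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the address string.
    addresses = sorted({info[4][0] for info in infos})
    return True, ", ".join(addresses)


def monitor(host, port, attempts, interval,
            resolver=socket.getaddrinfo, sleep=time.sleep):
    """Probe host `attempts` times, waiting `interval` seconds between tries."""
    results = []
    for i in range(attempts):
        ok, detail = probe(host, port, resolver)
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        print(f"{stamp} {'OK' if ok else 'FAIL'} {host}: {detail}")
        results.append(ok)
        if i + 1 < attempts:
            sleep(interval)
    return results


def longest_failure_streak(results):
    """Length of the longest run of consecutive failed probes."""
    longest = current = 0
    for ok in results:
        current = 0 if ok else current + 1
        longest = max(longest, current)
    return longest


# Hypothetical usage: poll every 5 minutes for roughly a day.
# outcome = monitor("mail.wtrlo1.ia.home.com", 110, attempts=288, interval=300)
# print("longest failure streak:", longest_failure_streak(outcome))
```

A streak of 1 would match the transient error described in comment 2; a
streak that keeps growing until named is restarted would match the
behaviour reported here.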

	

Actual Results:  With certain domain names, named stops resolving the name
until it is restarted.  Restarting named resets it, and it starts
resolving correctly again.

Expected Results:  Named should not get stuck with a bad state for a domain
name in its cache (which is what appears to be happening).

Additional info:

bind-9.1.0-10

nslookup output (when it stops working):
** server can't find mail.wtrlo1.ia.home.com.: SERVFAIL

fetchmail output (when it stops working):
fetchmail: fetchmail: getaddrinfo(mail.wtrlo1.ia.home.com.pop3)
fetchmail: Query status=2 (SOCKET)

Comment 1 Bernhard Rosenkraenzer 2001-07-03 17:01:28 UTC
I can't reproduce this anywhere.
Please check if this still happens with 9.1.3-0.rc2.2 (from rawhide) and let 
me know if this fixes the problem for you.


Comment 2 Donald Whisnant 2001-07-05 15:25:55 UTC
Bind package 9.1.3-0.rc2.2 from rawhide appears to have solved the problem.  Since upgrading, I've encountered one getaddrinfo status=2 error (which
I'm sure is a problem with their server), but on the next lookup attempt it worked correctly and didn't get stuck with the error like it did with 9.1.0-10.

Thanks.


Comment 3 Daniel Resare 2001-07-20 15:51:48 UTC
I got bitten by this bug, and at least for my setup the problem is serious
enough that I think an erratum is needed. I've tried to debug the problem
further, but bind is not very debug friendly, and since I cannot reproduce the
problem until some hours after a restart, debugging is really difficult. I've
managed to get some logging output (resolve category, severity 3) from named,
which I will attach.

Comment 4 Daniel Resare 2001-07-20 15:53:20 UTC
Created attachment 24312
Debug output of a good resolve

Comment 5 Daniel Resare 2001-07-20 15:54:30 UTC
Created attachment 24313
Debug output of a broken resolve

