Bug 102428

Summary: bind locks up after a day or so
Product: [Retired] Red Hat Linux
Reporter: P Fudd <ofudd>
Component: bind
Assignee: Jason Vas Dias <jvdias>
Status: CLOSED WORKSFORME
QA Contact: Ben Levenson <benl>
Severity: medium
Priority: medium
Version: 9
CC: jvdias
Target Milestone: ---   
Target Release: ---   
Hardware: i386   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2005-06-03 17:42:17 UTC

Description P Fudd 2003-08-14 23:18:32 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.2.1) Gecko/20030225

Description of problem:
named stops responding to queries after about two days.  The server in question
is being used as a NAT firewall for a single laptop computer, prior to
installation in an internet cafe.  This happened on another server with the same
configuration, but I didn't submit a bug at that time, as I didn't think it was
reproducible.  I no longer think that.

Version-Release number of selected component (if applicable):
bind-9.2.1-16

How reproducible:
Sometimes

Steps to Reproduce:
1. service named start
2. sleep 172800
2.5.  surf the net like a demon
3. nslookup www.google.com localhost
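
The steps above could be automated with a polling loop along the following lines. This is only a sketch, not something that was run on the affected server: the function names, the PROBE_CMD variable and the polling interval are hypothetical. It treats the "connection timed out" text shown under Actual Results as the failure signature, and also counts a non-zero exit status from the probe as a failure.

```shell
#!/bin/bash
# Hypothetical helper: poll a resolver until a lookup fails.
# PROBE_CMD is an assumption; it defaults to the nslookup call from step 3
# and can be overridden (for example with a host or dig command).
PROBE_CMD=${PROBE_CMD:-"nslookup www.google.com localhost"}

# Run the probe once. Fail if the command exits non-zero or if its output
# contains the timeout message quoted in the report.
probe_once() {
    local output
    output=$($PROBE_CMD 2>&1) || return 1
    case "$output" in
        *"connection timed out"*) return 1 ;;
    esac
    return 0
}

# Poll every $1 seconds, at most $2 times. Print how many probes succeeded
# before the first failure; return 1 if a failure was seen, 0 otherwise.
poll_until_failure() {
    local interval=$1 max_probes=$2 ok=0
    while [ "$ok" -lt "$max_probes" ]; do
        if ! probe_once; then
            echo "lookup failed after $ok successful probes at $(date -u '+%Y-%m-%d %H:%M:%S UTC')"
            return 1
        fi
        ok=$((ok + 1))
        sleep "$interval"
    done
    echo "no failure in $ok probes"
    return 0
}
```

For example, `poll_until_failure 300 1000` would probe every five minutes for a little over three days and print a timestamp when the first lookup fails, which would narrow down when named stopped answering.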
    

Actual Results:  
;; connection timed out; no servers could be reached


Expected Results:  
Non-authoritative answer:
Name:   www.google.com
Address: 216.239.37.99


Additional info:

Red Hat Linux 9, with updates, with a vanilla Linux kernel 2.4.21.
rpm --verify bind reports that only /etc/rndc.key has changed.
rpm --verify caching-nameserver reports that nothing has changed.
The last syslog entry from named was a normal 'lame server resolving' message.

When I use strace or ltrace on the process, it doesn't print anything, which is
weird.

ps auxww | grep named:
named     1068  0.0  0.0     0    0 ?        Z    Jul27   0:00 [named <defunct>]
named     1064  0.0  1.7 11668 2208 ?        S    Jul27   0:00 /usr/sbin/named -u named
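
The defunct entry in the ps output above can be picked out mechanically. A minimal sketch, assuming the `ps aux` column layout shown here (user in field 1, PID in field 2, STAT in field 8); the function name zombie_pids is hypothetical:

```shell
# Read `ps aux`-style lines on stdin and print the PID of every process
# owned by user $1 whose STAT column (field 8) starts with Z (zombie).
zombie_pids() {
    awk -v user="$1" '$1 == user && $8 ~ /^Z/ { print $2 }'
}

# Usage (not executed here): ps auxww | zombie_pids named
```

Run against the two lines above, this would print only the PID of the `[named <defunct>]` entry.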

pstack 1064:
1064: /usr/sbin/named -u named
(No symbols found)
0x402f86a8: ???? (bffffd20, 402512a0, 0, 40238696, bffffc9c, 40267838) + 120
0x40251824: ???? (3, bffffe34, 8084940, 804fc31, 4000c660, bffffe34) + 10
0x0805d865: ???? (3, bffffe34, bffffe44, 4001582c, 3, 8051ff0)
0x402e5917: ???? (805d780, 3, bffffe34, 807c474, 807c4a4, 4000c660) + 400001d8

lsof prints:
COMMAND  PID  USER   FD   TYPE     DEVICE    SIZE    NODE NAME
named   1064 named  cwd    DIR        3,3    4096 3597441 /var/named
named   1064 named  rtd    DIR        3,3    4096       2 /
named   1064 named  txt    REG        3,3  252928 4873142 /usr/sbin/named
named   1064 named  mem    REG        3,3  103044 2502027 /lib/ld-2.3.2.so
named   1064 named  mem    REG        3,3  263828 3957281 /usr/lib/liblwres.so.1.1.0
named   1064 named  mem    REG        3,3 2679858 3957277 /usr/lib/libdns.so.5.3.0
named   1064 named  mem    REG        3,3  968956 2502091 /lib/libcrypto.so.0.9.7a
named   1064 named  mem    REG        3,3  158083 3957401 /usr/lib/libisccfg.so.0.0.3
named   1064 named  mem    REG        3,3  116506 3957399 /usr/lib/libisccc.so.0.0.1
named   1064 named  mem    REG        3,3  763556 3957279 /usr/lib/libisc.so.4.1.0
named   1064 named  mem    REG        3,3   91604 2502042 /lib/libnsl-2.3.2.so
named   1064 named  mem    REG        3,3  103104 2502056 /lib/libpthread-0.10.so
named   1064 named  mem    REG        3,3 1549556 2502034 /lib/libc-2.3.2.so
named   1064 named  mem    REG        3,3   73756 4627704 /usr/kerberos/lib/libgssapi_krb5.so.2.2
named   1064 named  mem    REG        3,3  385220 4627720 /usr/kerberos/lib/libkrb5.so.3.1
named   1064 named  mem    REG        3,3   63880 4627710 /usr/kerberos/lib/libk5crypto.so.3.0
named   1064 named  mem    REG        3,3    5572 4627698 /usr/kerberos/lib/libcom_err.so.3.0
named   1064 named  mem    REG        3,3   15084 2502038 /lib/libdl-2.3.2.so
named   1064 named  mem    REG        3,3   52616 3957254 /usr/lib/libz.so.1.1.4
named   1064 named  mem    REG        3,3   52472 2502048 /lib/libnss_files-2.3.2.so
named   1064 named    0u   CHR        1,3           98114 /dev/null
named   1064 named    1u   CHR        1,3           98114 /dev/null
named   1064 named    2u   CHR        1,3           98114 /dev/null
named   1064 named    3u  unix 0xc7291b40            1358 socket
named   1064 named    4r  FIFO        0,5            1364 pipe
named   1064 named    5w  FIFO        0,5            1364 pipe
named   1064 named    6r  FIFO        0,5            1365 pipe
named   1064 named    7w  FIFO        0,5            1365 pipe
named   1064 named    8u  IPv4       1379             UDP *:1024 
named   1064 named    9u  IPv4       1373             UDP localhost.localdomain:domain
named   1064 named   10u  IPv4       1374             TCP localhost.localdomain:domain (LISTEN)
named   1064 named   11u  IPv4       1375             UDP cafe2.nomad.zone:domain 
named   1064 named   12u  IPv4       1376             TCP cafe2.nomad.zone:domain (LISTEN)
named   1064 named   13u  IPv4       1377             UDP 10.10.11.1:domain 
named   1064 named   14u  IPv4       1378             TCP 10.10.11.1:domain (LISTEN)
named   1064 named   15u  IPv4       1380             TCP localhost.localdomain:rndc (LISTEN)
named   1064 named   16r   CHR        1,8          100055 /dev/random

Comment 1 Daniel Walsh 2004-03-25 15:39:49 UTC
Could you try one of the later binds from Rawhide?

Comment 2 Jason Vas Dias 2004-08-30 22:37:36 UTC
Cannot be reproduced with bind-9.2.4rc7-10.