Bug 88526

Summary: bind dies without error a couple of times per week
Product: [Retired] Red Hat Linux Reporter: Toni Willberg <toniw>
Component: bind Assignee: Daniel Walsh <dwalsh>
Status: CLOSED CURRENTRELEASE QA Contact: Ben Levenson <benl>
Severity: high Docs Contact:
Priority: medium    
Version: 6.2   
Target Milestone: ---   
Target Release: ---   
Hardware: i386   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Last Closed: 2003-04-11 14:54:50 UTC Type: ---

Description Toni Willberg 2003-04-10 17:40:03 UTC
Description of problem:

 Bind running on this box dies without any
 error in /var/log/messages a couple of times per week.

System:
 bind-9.2.1-0.6x.3
 kernel-2.2.22-6.2.3
 glibc-2.1.3-29
 redhat-release-6.2-1

The server is RH6.2 with all official upgrade rpms, and it's a public DNS server
for dozens of zones. The load on the server is not very high, but it's not just
idling either. I have another server with all the same rpms installed, and bind
never crashes on that one, and both servers serve the same zones.

I'm about to upgrade the server to a more recent RH release, but as RH6.2 is still
widely used, if this is a "real" bug, IMHO it should be tracked down and fixed.

I'd like to know if there are better ways to hunt this bug. I ran the named process
under strace with the following command:
  strace -t -q -T -F -f named -g -s -u named -d 999

Here's a clip of the last lines of the strace output:
>>>
Apr 09 22:56:18.131 fctx 0x8231350: try
Apr 09 22:56:18.132 fctx 0x8231350: cancelqueries
Apr 09 22:56:18.132 fctx 0x8231350: getaddresses
Apr 09 22:56:18.134 expire_v4 set to MIN(2147483647,1050004578) import_rdataset
Apr 09 22:56:18.134 dns_adb_createfind: found A for name 0x81ebc90 in db
Apr 09 22:56:18.135 expire_v4 set to MIN(2147483647,1050004578) import_rdataset
Apr 09 22:56:18.136 dns_adb_createfind: found A for name 0x81ebbd8 in db
Apr 09 22:56:18.137 expire_v4 set to MIN(2147483647,1050004578) import_rdataset
Apr 09 22:56:18.138 dns_adb_createfind: found A for name 0x821aa80 in db
Apr 09 22:56:18.139 expire_v4 set to MIN(2147483647,1050004578) import_rdataset
Apr 09 22:56:18.139 dns_adb_createfind: found A for name 0x821a9c8 in db
Apr 09 22:56:18.140 fctx 0x8231350: query
Apr 09 22:56:18.141 resquery 0x821f038 (fctx 0x8231350): send
Apr 09 22:56:18.141 dispatch 0x80cbac8 response 0x811c890 205.152.16.20#53: attached to task 0x80cc828
Apr 09 22:56:18.143 resquery 0x821f038 (fctx 0x8231350): sent
Apr 09 22:56:18.143 resquery 0x821f038 (fctx 0x8231350): senddone
Apr 09 22:56:18.300 socket 0x80cbc80: dispatch_recv:  event 0x81e4ed8 -> task 0x80cbd58
Apr 09 22:56:18.300 socket 0x80cbc80: internal_recv: task 0x80cbd58 got event 0x80cbcd4
Apr 09 22:56:18.301 socket 0x80cbc80 205.152.16.20#53: packet received correctly
Apr 09 22:56:18.302 dispatch 0x80cbac8: got packet: requests 1, buffers 1, recvs 1
Apr 09 22:56:18.302 dispatch 0x80cbac8: got valid DNS message header, /QR 1, id 30437
Apr 09 22:56:18.303 dispatch 0x80cbac8: search for response in bucket 61: found
Apr 09 22:56:18.303 dispatch 0x80cbac8 response 0x811c890 205.152.16.20#53: [a] Sent event 0x810ed38 buffer 0x8230348 len 4096 to task 0x80cc828
Apr 09 22:56:18.304 sockmgr 0x808b548: watcher got message -3
Apr 09 22:56:18.305 sockmgr 0x808b548: watcher got message -2
Apr 09 22:56:18.305 socket 0x80cbc80: socket_recv: event 0x82155b8 -> task 0x80cbd58
Apr 09 22:56:18.306 resquery 0x821f038 (fctx 0x8231350): response
 <unfinished ...>
22:56:18 --- SIGSEGV (Segmentation fault) ---
22:56:18 +++ killed by SIGSEGV +++
<<<

Comment 1 Daniel Walsh 2003-04-11 14:54:50 UTC
Have you tried the current release of bind (bind-9.2.2-*) to see if this
still happens?

Dan