Bug 51992

Summary: named dies unexpectedly
Product: [Retired] Red Hat Public Beta
Reporter: Jim Morton <jamorton>
Component: bind
Assignee: Bernhard Rosenkraenzer <bero>
Status: CLOSED NOTABUG
QA Contact: David Lawrence <dkl>
Severity: medium
Priority: medium    
Version: roswell   
Hardware: i386   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2001-08-17 20:11:42 UTC

Description Jim Morton 2001-08-17 20:11:37 UTC
From Bugzilla Helper:
User-Agent: Mozilla/4.78 [en] (X11; U; Linux 2.4.6-3.1 i686)

Description of problem:
named, running as a caching server with no local domains, crashes after
operating correctly for a period of time. named does write to
/var/log/messages when it dies (see additional information).

Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
1. Restart named with /etc/rc.d/init.d/named start (as root)
2. Use the internet and wait

Actual Results:  named starts properly and works for a while (minutes
to hours), then dies.

Expected Results:  named starts properly and keeps on ticking.

Additional info:

Here is /var/log/messages for an entire named run sequence.
>>> First, I manually restart named in the morning:

Aug 17 09:57:22 roswell named[1388]: starting BIND 9.1.3 -u named
Aug 17 09:57:22 roswell named[1388]: using 1 CPU
Aug 17 09:57:22 roswell named: named startup succeeded
Aug 17 09:57:23 roswell named[1391]: loading configuration from '/etc/named.conf'
Aug 17 09:57:23 roswell named[1391]: the default for the 'auth-nxdomain' option is now 'no'
Aug 17 09:57:23 roswell named[1391]: no IPv6 interfaces found
Aug 17 09:57:23 roswell named[1391]: listening on IPv4 interface lo, 127.0.0.1#53
Aug 17 09:57:23 roswell named[1391]: listening on IPv4 interface eth0, 63.78.198.18#53
Aug 17 09:57:23 roswell named[1391]: command channel listening on 127.0.0.1#953
Aug 17 09:57:23 roswell named[1391]: running

>>> I leave the system idle for a while, and it seems to power down:

Aug 17 10:53:31 roswell network: Shutting down interface eth0:  succeeded
Aug 17 10:53:33 roswell apmd[634]: System Standby
Aug 17 10:57:23 roswell named[1391]: ifiter_ioctl.c:214: REQUIRE(iter->pos < (unsigned int) iter->ifc.ifc_len) failed
Aug 17 10:57:23 roswell named[1391]: exiting (due to assertion failure)


>>> It is interesting that the time jumps! Perhaps this is BIOS time?
Anyway, I think this is where I woke the system up.


Aug 17 05:44:19 roswell sysctl: net.ipv4.ip_forward = 0
Aug 17 05:44:19 roswell sysctl: net.ipv4.conf.all.rp_filter = 1
Aug 17 05:44:19 roswell sysctl: kernel.sysrq = 0
Aug 17 05:44:19 roswell network: Setting network parameters:  succeeded
Aug 17 05:44:20 roswell network: Bringing up interface lo:  succeeded
Aug 17 05:44:20 roswell kernel: eepro100.c:v1.09j-t 9/29/99 Donald Becker http://cesdis.gsfc.nasa.gov/linux/drivers/eepro100.html
Aug 17 05:44:20 roswell kernel: eepro100.c: $Revision: 1.36 $ 2000/11/17 Modified by Andrey V. Savochkin <saw.com.sg> and others
Aug 17 05:44:20 roswell kernel: PCI: Found IRQ 11 for device 00:12.0
Aug 17 05:44:20 roswell kernel: eth0: OEM i82557/i82558 10/100 Ethernet, 00:08:C7:B9:9C:CC, IRQ 11.
Aug 17 05:44:20 roswell kernel:   Receiver lock-up bug exists -- enabling work-around.
Aug 17 05:44:20 roswell kernel:   Board assembly 702536-006, Physical connectors present: RJ45
Aug 17 05:44:20 roswell kernel:   Primary interface chip i82555 PHY #1.
Aug 17 05:44:20 roswell kernel:   General self-test: passed.
Aug 17 05:44:20 roswell kernel:   Serial sub-system self-test: passed.
Aug 17 05:44:20 roswell kernel:   Internal registers self-test: passed.
Aug 17 05:44:20 roswell kernel:   ROM checksum self-test: passed (0x24c9f043).
Aug 17 05:44:20 roswell kernel:   Receiver lock-up workaround activated.
Aug 17 05:44:22 roswell network: Bringing up interface eth0:  succeeded
Aug 17 05:44:23 roswell netfs: Mounting other filesystems:  succeeded
Aug 17 12:44:24 roswell apmd[634]: Standby Resume after 01:50:51 (-1% unknown) AC power
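
The time jump noted above can be checked mechanically. The following is a minimal sketch, not part of the original report, that scans syslog-style lines for timestamps that move backwards. The function name `find_backward_jumps` and the `year` parameter are made up for illustration (classic syslog lines carry no year), and month-name parsing assumes an English/C locale. For reference, the standby entry at 10:53:33 plus the reported 01:50:51 gives 12:44:24, which matches the resume line, so it is the 05:44 entries that look out of step.

```python
from datetime import datetime

def find_backward_jumps(lines, year=2001):
    """Return (previous, current) timestamp pairs where the log clock moves backwards."""
    jumps = []
    previous = None
    for line in lines:
        try:
            # Classic syslog prefix, e.g. "Aug 17 10:57:23"; the year is assumed.
            current = datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
        except ValueError:
            continue  # wrapped continuation lines and comments carry no timestamp
        if previous is not None and current < previous:
            jumps.append((previous, current))
        previous = current
    return jumps

# A few lines copied from the log above.
sample = [
    "Aug 17 10:57:23 roswell named[1391]: exiting (due to assertion failure)",
    "Aug 17 05:44:19 roswell sysctl: net.ipv4.ip_forward = 0",
    "Aug 17 05:44:23 roswell netfs: Mounting other filesystems:  succeeded",
    "Aug 17 12:44:24 roswell apmd[634]: Standby Resume after 01:50:51 (-1% unknown) AC power",
]

for before, after in find_backward_jumps(sample):
    print(f"clock went back by {before - after}")
```

Only backward steps are reported; the later forward step from 05:44:23 to 12:44:24 is left alone, since forward gaps are normal in any log.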

>>> And I noticed bind was dead, and restarted it (again):

Aug 17 12:45:32 roswell named[1993]: starting BIND 9.1.3 -u named
Aug 17 12:45:32 roswell named[1993]: using 1 CPU
Aug 17 12:45:32 roswell named[1995]: loading configuration from '/etc/named.conf'
Aug 17 12:45:32 roswell named[1995]: the default for the 'auth-nxdomain' option is now 'no'
Aug 17 12:45:32 roswell named[1995]: no IPv6 interfaces found
Aug 17 12:45:32 roswell named[1995]: listening on IPv4 interface lo, 127.0.0.1#53
Aug 17 12:45:32 roswell named[1995]: listening on IPv4 interface eth0, 63.78.198.18#53
Aug 17 12:45:32 roswell named[1995]: command channel listening on 127.0.0.1#953
Aug 17 12:45:32 roswell named[1995]: running
Aug 17 12:45:33 roswell named: named startup succeeded

It could be related to APMD. However, the computer can obviously still
operate in the power-down state, since it makes log entries etc. I will
try to turn off apmd, keep the system from shutting down (or whatever
it's called), and see if named stops dying.
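
To test the APMD suspicion against a longer log, one option is to look for named assertion failures that occur while an interface is marked as shut down. This is a rough sketch under stated assumptions: the function name `crashes_while_interface_down` is hypothetical, and the regular expressions are modelled only on the message formats visible in the log above, not on any documented format.

```python
import re

# Patterns modelled on the log lines in this report (an assumption, not a spec).
IFACE_DOWN = re.compile(r"network: Shutting down interface (\S+):")
IFACE_UP = re.compile(r"network: Bringing up interface (\S+):")
ASSERT_FAIL = re.compile(r"named\[(\d+)\]: \S+: REQUIRE\(")

def crashes_while_interface_down(lines):
    """Return (down_interfaces, named_pid) for each assertion failure seen
    while at least one interface was logged as shut down."""
    down = set()
    hits = []
    for line in lines:
        match = IFACE_DOWN.search(line)
        if match:
            down.add(match.group(1))
            continue
        match = IFACE_UP.search(line)
        if match:
            down.discard(match.group(1))
            continue
        match = ASSERT_FAIL.search(line)
        if match and down:
            hits.append((sorted(down), int(match.group(1))))
    return hits

# Lines copied from the log above (the wrapped assertion line is joined).
sample = [
    "Aug 17 10:53:31 roswell network: Shutting down interface eth0:  succeeded",
    "Aug 17 10:53:33 roswell apmd[634]: System Standby",
    "Aug 17 10:57:23 roswell named[1391]: ifiter_ioctl.c:214: "
    "REQUIRE(iter->pos < (unsigned int) iter->ifc.ifc_len) failed",
]

print(crashes_while_interface_down(sample))
```

A hit only shows that the two events coincide in the log; it does not prove that the interface shutdown caused the assertion.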

Note: I am not quite so deranged as to run a production nameserver on a
desktop with APMD toying with the system, but this is a beta machine and,
for various obscure reasons, I do want it to run its own nameserver for
its own use. I do not think that merely ODD behaviour on the part of a
sysadmin should be precluded by system implementation!   ;^)

Comment 1 Bernhard Rosenkraenzer 2001-08-18 11:18:41 UTC
This is almost certainly a configuration issue: APM can shut down networking
when going into suspend mode (needed for some ultimately crappy notebook
BIOSes). If networking is shut down that way while named is running, that's
the problem.
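
For a machine that does suspend, one possible workaround is to check named after resume and start it again if it has died. The sketch below is illustrative only. It uses the init script path from the report and assumes, without confirmation from this bug, that the script accepts a `status` argument and exits non-zero when named is not running. The function name `ensure_named_running` and the injectable `run` parameter are hypothetical.

```python
import subprocess

INIT_SCRIPT = "/etc/rc.d/init.d/named"  # path taken from the report

def ensure_named_running(run=subprocess.call):
    """Start named if the init script reports it is down.

    Returns True if a start was attempted, False if named was already up.
    `run` takes an argument list and returns an exit code, so it can be
    replaced by a stub when trying this out.
    """
    # Assumption: "status" exits 0 only when named is running.
    if run([INIT_SCRIPT, "status"]) == 0:
        return False
    run([INIT_SCRIPT, "start"])
    return True
```

Calling `ensure_named_running()` as root from whatever hook runs after resume would be one way to use it; how apmd is configured to run such a hook is outside what this report shows.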