Bug 331111 - named: *** POKED TIMER *** - Then server crashes.
named: *** POKED TIMER *** - Then server crashes.
Status: CLOSED NOTABUG
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: bind (Show other bugs)
5.0
i386 Linux
low Severity urgent
: ---
: ---
Assigned To: Adam Tkac
:
Depends On:
Blocks:
  Show dependency treegraph
 
Reported: 2007-10-13 21:48 EDT by Chris Kloiber
Modified: 2013-04-30 19:37 EDT (History)
1 user (show)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2007-11-19 05:06:00 EST
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)

  None (edit)
Description Chris Kloiber 2007-10-13 21:48:05 EDT
Description of problem:
Newly provisioned RHEL5 system running named configured for localhost-only
recursion and official DNS for ~5 zones. Server is hanging several times a day,
with many references to "named: *** POKED TIMER ***" in /var/log/messages.

Google searching turns up this error in many freebsd forums, and the symptoms
are that named is coredumping for some reason, and this often results in pegging
the CPU.

Version-Release number of selected component (if applicable):
bind-9.3.3-9.0.1.el5

How reproducible:
Several times a day.

Steps to Reproduce:
1. Configure bind to server a few zones, and allow-recursion { 127.0.0.1; };
2. Wait a few hours under normal load.
3. Screech! Bang! Thud. Dead server, reboot and repeat.

Hardware is an IBM x305 w/1.7Ghz Celeron, 1G ram, 40Gig IDE.

From /var/log/messages:

Oct 13 18:04:03 pebkac named[1602]: *** POKED TIMER ***
Oct 13 18:04:03 pebkac last message repeated 2 times
Oct 13 18:04:03 pebkac named[1602]: lame server resolving
'135.152.166.200.bl.spamcop.org' (in 'spamcop.org'?): 66.232.32.10#53
Oct 13 18:04:03 pebkac named[1602]: lame server resolving
'135.152.166.200.bl.spamcop.org' (in 'spamcop.org'?): 66.232.1.2#53
Oct 13 18:04:03 pebkac named[1602]: lame server resolving
'135.152.166.200.bl.spamcop.org' (in 'spamcop.org'?): 66.232.1.2#53
Oct 13 18:04:03 pebkac named[1602]: lame server resolving
'135.152.166.200.bl.spamcop.org' (in 'spamcop.org'?): 66.232.32.10#53
Oct 13 18:06:02 pebkac named[1602]: *** POKED TIMER ***
Oct 13 18:09:03 pebkac named[1602]: *** POKED TIMER ***
Oct 13 20:31:04 pebkac named[1602]: lame server resolving
'74.82.39.162.bl.spamcop.org' (in 'spamcop.org'?): 66.232.1.2#53
Oct 13 20:31:04 pebkac named[1602]: lame server resolving
'74.82.39.162.bl.spamcop.org' (in 'spamcop.org'?): 66.232.32.10#53
Oct 13 20:31:04 pebkac named[1602]: lame server resolving 'ns2.ip-pop.com' (in
'ip-pop.com'?): 66.232.32.10#53
Oct 13 20:31:04 pebkac named[1602]: lame server resolving 'ns1.ip-pop.com' (in
'ip-pop.com'?): 66.232.32.10#53
Oct 13 21:18:30 pebkac syslogd 1.4.1: restart.
Oct 13 21:18:30 pebkac kernel: klogd 1.4.1, log source = /proc/kmsg started.
Oct 13 21:18:30 pebkac kernel: Linux version 2.6.18-8.1.14.el5
(brewbuilder@hs20-bc2-2.build.redhat.com) (gcc version 4.1.1 20070105 (Red Hat
4.1.1-52)) #1 SMP Tue Sep 25 11:45:53 EDT 2007
Comment 1 Adam Tkac 2007-10-15 09:33:45 EDT
I think crash isn't caused by "*** POKED TIMER ***" related problems. Best
should be get core dump from crash. Also please provide more info about your system:

1. do you have SELinux enabled
2. how many CPUs do you have?

Thanks, Adam
Comment 2 Chris Kloiber 2007-10-17 11:03:34 EDT
Selinux is disabled, I already told you the hardware is an IBM x305 with a
single Celeron 1.7Ghz. The server is in a remote data center, and I can't
netdump thanks to the manner the network is configured in that datacenter. Is
RHEL5 capable of diskdump to a bog standard IDE disk yet? (Assuming its
panicing, I can't tell from here). I could configure the system to allow
cordumps to be created, but if the CPU gets hung I'm not sure if I'll get anything.

I have since worked around the system crashing by scripting a restart of the
named service every 3 hours. That is not really a fix though.

Here is what the freebsd forums (from 2005) indicated this POKED TIMER issue is
often followed by a CPU getting hung. I would like to know if this is a similar
problem.

Comment 3 Adam Tkac 2007-10-17 13:15:39 EDT
(In reply to comment #2)
> 
> I have since worked around the system crashing by scripting a restart of the
> named service every 3 hours. That is not really a fix though.

Yes, We have to find & fix real problem

> Here is what the freebsd forums (from 2005) indicated this POKED TIMER issue is
> often followed by a CPU getting hung. I would like to know if this is a similar
> problem.

Hm, I've read forums and BIND code and I think this is very specific problem.
Especially when ISC haven't got report related to this. Also if you said named
isn't under heavy load it points to some issue with gettimeofday. Are you able
to test modified package? (I'm going improve timers logging, nothing else)

Adam
Comment 4 Chris Kloiber 2007-10-17 16:34:17 EDT
Yes. I can test modified packages. This is my personal email/web server, no
classic "production" load.
Comment 5 Adam Tkac 2007-10-18 13:55:07 EDT
So please test
http://people.redhat.com/atkac/test_srpms/bind-9.3.3-9.0.1.1.el5_0.src.rpm .
This contains improved "poked timer" message which should show if theory about
gettimeofday is correct.
Comment 6 Chris Kloiber 2007-10-18 14:13:25 EDT
Compiling/installing package now, and I'll disable my 3 hour restart of the
named service as well. Thanks.
Comment 7 Chris Kloiber 2007-10-19 13:09:11 EDT
Server locked up, requiring a reboot again, here's output from /var/log/messages:

Oct 19 11:24:02 pebkac xinetd[1623]: EXIT: ftp status=0 pid=17906 duration=0(sec)
Oct 19 11:25:29 pebkac named[1513]: lame server resolving
'215.112.205.41.bl.spamcop.org' (in 'spamcop.org'?): 66.232.1.2#53
Oct 19 11:25:29 pebkac named[1513]: lame server resolving
'215.112.205.41.bl.spamcop.org' (in 'spamcop.org'?): 66.232.32.10#53
Oct 19 11:25:29 pebkac named[1513]: lame server resolving
'215.112.205.41.bl.spamcop.org' (in 'spamcop.org'?): 66.232.32.10#53
Oct 19 11:25:29 pebkac named[1513]: lame server resolving
'215.112.205.41.bl.spamcop.org' (in 'spamcop.org'?): 66.232.1.2#53
Oct 19 11:26:06 pebkac named[1513]: lame server resolving
'102.174.122.84.bl.spamcop.org' (in 'spamcop.org'?): 66.232.32.10#53
Oct 19 11:26:06 pebkac named[1513]: lame server resolving
'102.174.122.84.bl.spamcop.org' (in 'spamcop.org'?): 66.232.1.2#53
Oct 19 11:26:06 pebkac named[1513]: lame server resolving
'102.174.122.84.bl.spamcop.org' (in 'spamcop.org'?): 66.232.32.10#53
Oct 19 11:26:06 pebkac named[1513]: lame server resolving
'102.174.122.84.bl.spamcop.org' (in 'spamcop.org'?): 66.232.1.2#53
Oct 19 11:26:29 pebkac named[1513]: *** POKED TIMER *** now:
1192807589:94091000, then: 1192807569:565387000, due: 1192807554:565387000
Oct 19 11:26:29 pebkac named[1513]: *** POKED TIMER *** now:
1192807589:94449000, then: 1192807569:565387000, due: 1192807554:565387000
Oct 19 11:26:29 pebkac named[1513]: *** POKED TIMER *** now:
1192807589:94661000, then: 1192807569:565387000, due: 1192807554:565387000
Oct 19 11:26:29 pebkac named[1513]: lame server resolving
'215.112.205.41.bl.spamcop.org' (in 'spamcop.org'?): 66.232.1.2#53
Oct 19 11:26:29 pebkac named[1513]: lame server resolving
'215.112.205.41.bl.spamcop.org' (in 'spamcop.org'?): 66.232.32.10#53
Oct 19 11:26:29 pebkac named[1513]: lame server resolving
'215.112.205.41.bl.spamcop.org' (in 'spamcop.org'?): 66.232.1.2#53
Oct 19 11:26:29 pebkac named[1513]: lame server resolving
'215.112.205.41.bl.spamcop.org' (in 'spamcop.org'?): 66.232.32.10#53
Oct 19 11:27:02 pebkac xinetd[1623]: START: ftp pid=17913 from=66.33.98.93
Oct 19 11:27:02 pebkac proftpd[17913]: p1ea.pebkac.info
(66.33.98.93[66.33.98.93]) - FTP session opened. 
Oct 19 11:27:02 pebkac proftpd[17913]: p1ea.pebkac.info
(66.33.98.93[66.33.98.93]) - FTP session closed. 
Oct 19 11:27:02 pebkac xinetd[1623]: EXIT: ftp status=0 pid=17913 duration=0(sec)
Oct 19 11:30:03 pebkac named[1513]: *** POKED TIMER *** now:
1192807803:763288000, then: 1192807634:95605000, due: 1192807619:95605000
Oct 19 11:30:03 pebkac proftpd[17925]: p1ea.pebkac.info
(66.33.98.93[66.33.98.93]) - FTP session opened. 
Oct 19 11:30:03 pebkac proftpd[17925]: p1ea.pebkac.info
(66.33.98.93[66.33.98.93]) - FTP session closed. 
Oct 19 12:30:11 pebkac syslogd 1.4.1: restart.
Oct 19 12:30:11 pebkac kernel: klogd 1.4.1, log source = /proc/kmsg started.
Oct 19 12:30:11 pebkac kernel: Linux version 2.6.18-8.1.14.el5
(brewbuilder@hs20-bc2-2.build.redhat.com) (gcc version 4.1.1 20070105 (Red Hat
4.1.1-52)) #1 SMP Tue Sep 25 11:45:53 EDT 2007
Comment 8 Chris Kloiber 2007-10-19 14:58:35 EDT
It's crashed again. I'm re-enabling my work-around for now, if you need more
info please let me know.
Comment 9 Adam Tkac 2007-10-23 09:24:36 EDT
I discussed this problem with upstream. Could you please try create empty zone
for "spamcop.org"? If it helps with CPU consuming try check source of queries
from spamcop.org and try stop them (deny queries from it). Also usefull will be
max-cache-size option which should help with memory consuming (set it in bytes,
for example to one half of your memory). Please report me your experience.

Adam
Comment 10 Chris Kloiber 2007-10-24 12:47:05 EDT
I had given up on this nameserver, it was crashing every hour or so. I moved my
official DNS elsewhere and completely disabled named so the server would stay up.

The source of the queries to bl.spamcop.org is my anti-spam RBL checks. Creating
an empty zone would make those checks not work. I have temporarily disabled the
RBL checks at the mail server for testing, it should have the same effect. I
will re-enable bind and see what happens after disabling RBL checks and setting
the max-cache size as suggested. If it stays up a while, I'll begin moving my
official DNS back to this server. Will keep you informed.
Comment 11 Chris Kloiber 2007-10-30 13:20:18 EDT
I'm no longer seeing the poked timer messages, but the server continues to
become unreachable so long as I am running named.
Comment 12 Adam Tkac 2007-11-16 04:46:04 EST
As we discussed in mail problem will be in old BIOS. Could you please verify if
newer BIOS fixes problem?

Thanks, Adam
Comment 13 Chris Kloiber 2007-11-17 11:32:31 EST
It seems to have worked for me, yet others here have reported their systems are
still crashing. I'm willing to close this issue and let others reopen it if they
have more information.
Comment 14 Adam Tkac 2007-11-19 05:06:00 EST
(In reply to comment #13)
> It seems to have worked for me, yet others here have reported their systems are
> still crashing. I'm willing to close this issue and let others reopen it if they
> have more information.

Thanks for feedback. If anyone has this problem please reopen. Closing for now

Note You need to log in before you can comment on or make changes to this bug.