Bug 331111
| Summary: | named: *** POKED TIMER *** - Then server crashes. | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 5 | Reporter: | Chris Kloiber <ckloiber> |
| Component: | bind | Assignee: | Adam Tkac <atkac> |
| Status: | CLOSED NOTABUG | QA Contact: | |
| Severity: | urgent | Docs Contact: | |
| Priority: | low | ||
| Version: | 5.0 | CC: | ovasik |
| Target Milestone: | --- | ||
| Target Release: | --- | ||
| Hardware: | i386 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | Bug Fix | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2007-11-19 10:06:00 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
I think crash isn't caused by "*** POKED TIMER ***" related problems. Best should be get core dump from crash. Also please provide more info about your system: 1. do you have SELinux enabled 2. how many CPUs do you have? Thanks, Adam Selinux is disabled, I already told you the hardware is an IBM x305 with a single Celeron 1.7Ghz. The server is in a remote data center, and I can't netdump thanks to the manner the network is configured in that datacenter. Is RHEL5 capable of diskdump to a bog standard IDE disk yet? (Assuming its panicing, I can't tell from here). I could configure the system to allow cordumps to be created, but if the CPU gets hung I'm not sure if I'll get anything. I have since worked around the system crashing by scripting a restart of the named service every 3 hours. That is not really a fix though. Here is what the freebsd forums (from 2005) indicated this POKED TIMER issue is often followed by a CPU getting hung. I would like to know if this is a similar problem. (In reply to comment #2) > > I have since worked around the system crashing by scripting a restart of the > named service every 3 hours. That is not really a fix though. Yes, We have to find & fix real problem > Here is what the freebsd forums (from 2005) indicated this POKED TIMER issue is > often followed by a CPU getting hung. I would like to know if this is a similar > problem. Hm, I've read forums and BIND code and I think this is very specific problem. Especially when ISC haven't got report related to this. Also if you said named isn't under heavy load it points to some issue with gettimeofday. Are you able to test modified package? (I'm going improve timers logging, nothing else) Adam Yes. I can test modified packages. This is my personal email/web server, no classic "production" load. So please test http://people.redhat.com/atkac/test_srpms/bind-9.3.3-9.0.1.1.el5_0.src.rpm . This contains improved "poked timer" message which should show if theory about gettimeofday is correct. Compiling/installing package now, and I'll disable my 3 hour restart of the named service as well. Thanks. Server locked up, requiring a reboot again, here's output from /var/log/messages: Oct 19 11:24:02 pebkac xinetd[1623]: EXIT: ftp status=0 pid=17906 duration=0(sec) Oct 19 11:25:29 pebkac named[1513]: lame server resolving '215.112.205.41.bl.spamcop.org' (in 'spamcop.org'?): 66.232.1.2#53 Oct 19 11:25:29 pebkac named[1513]: lame server resolving '215.112.205.41.bl.spamcop.org' (in 'spamcop.org'?): 66.232.32.10#53 Oct 19 11:25:29 pebkac named[1513]: lame server resolving '215.112.205.41.bl.spamcop.org' (in 'spamcop.org'?): 66.232.32.10#53 Oct 19 11:25:29 pebkac named[1513]: lame server resolving '215.112.205.41.bl.spamcop.org' (in 'spamcop.org'?): 66.232.1.2#53 Oct 19 11:26:06 pebkac named[1513]: lame server resolving '102.174.122.84.bl.spamcop.org' (in 'spamcop.org'?): 66.232.32.10#53 Oct 19 11:26:06 pebkac named[1513]: lame server resolving '102.174.122.84.bl.spamcop.org' (in 'spamcop.org'?): 66.232.1.2#53 Oct 19 11:26:06 pebkac named[1513]: lame server resolving '102.174.122.84.bl.spamcop.org' (in 'spamcop.org'?): 66.232.32.10#53 Oct 19 11:26:06 pebkac named[1513]: lame server resolving '102.174.122.84.bl.spamcop.org' (in 'spamcop.org'?): 66.232.1.2#53 Oct 19 11:26:29 pebkac named[1513]: *** POKED TIMER *** now: 1192807589:94091000, then: 1192807569:565387000, due: 1192807554:565387000 Oct 19 11:26:29 pebkac named[1513]: *** POKED TIMER *** now: 1192807589:94449000, then: 1192807569:565387000, due: 1192807554:565387000 Oct 19 11:26:29 pebkac named[1513]: *** POKED TIMER *** now: 1192807589:94661000, then: 1192807569:565387000, due: 1192807554:565387000 Oct 19 11:26:29 pebkac named[1513]: lame server resolving '215.112.205.41.bl.spamcop.org' (in 'spamcop.org'?): 66.232.1.2#53 Oct 19 11:26:29 pebkac named[1513]: lame server resolving '215.112.205.41.bl.spamcop.org' (in 'spamcop.org'?): 66.232.32.10#53 Oct 19 11:26:29 pebkac named[1513]: lame server resolving '215.112.205.41.bl.spamcop.org' (in 'spamcop.org'?): 66.232.1.2#53 Oct 19 11:26:29 pebkac named[1513]: lame server resolving '215.112.205.41.bl.spamcop.org' (in 'spamcop.org'?): 66.232.32.10#53 Oct 19 11:27:02 pebkac xinetd[1623]: START: ftp pid=17913 from=66.33.98.93 Oct 19 11:27:02 pebkac proftpd[17913]: p1ea.pebkac.info (66.33.98.93[66.33.98.93]) - FTP session opened. Oct 19 11:27:02 pebkac proftpd[17913]: p1ea.pebkac.info (66.33.98.93[66.33.98.93]) - FTP session closed. Oct 19 11:27:02 pebkac xinetd[1623]: EXIT: ftp status=0 pid=17913 duration=0(sec) Oct 19 11:30:03 pebkac named[1513]: *** POKED TIMER *** now: 1192807803:763288000, then: 1192807634:95605000, due: 1192807619:95605000 Oct 19 11:30:03 pebkac proftpd[17925]: p1ea.pebkac.info (66.33.98.93[66.33.98.93]) - FTP session opened. Oct 19 11:30:03 pebkac proftpd[17925]: p1ea.pebkac.info (66.33.98.93[66.33.98.93]) - FTP session closed. Oct 19 12:30:11 pebkac syslogd 1.4.1: restart. Oct 19 12:30:11 pebkac kernel: klogd 1.4.1, log source = /proc/kmsg started. Oct 19 12:30:11 pebkac kernel: Linux version 2.6.18-8.1.14.el5 (brewbuilder.redhat.com) (gcc version 4.1.1 20070105 (Red Hat 4.1.1-52)) #1 SMP Tue Sep 25 11:45:53 EDT 2007 It's crashed again. I'm re-enabling my work-around for now, if you need more info please let me know. I discussed this problem with upstream. Could you please try create empty zone for "spamcop.org"? If it helps with CPU consuming try check source of queries from spamcop.org and try stop them (deny queries from it). Also usefull will be max-cache-size option which should help with memory consuming (set it in bytes, for example to one half of your memory). Please report me your experience. Adam I had given up on this nameserver, it was crashing every hour or so. I moved my official DNS elsewhere and completely disabled named so the server would stay up. The source of the queries to bl.spamcop.org is my anti-spam RBL checks. Creating an empty zone would make those checks not work. I have temporarily disabled the RBL checks at the mail server for testing, it should have the same effect. I will re-enable bind and see what happens after disabling RBL checks and setting the max-cache size as suggested. If it stays up a while, I'll begin moving my official DNS back to this server. Will keep you informed. I'm no longer seeing the poked timer messages, but the server continues to become unreachable so long as I am running named. As we discussed in mail problem will be in old BIOS. Could you please verify if newer BIOS fixes problem? Thanks, Adam It seems to have worked for me, yet others here have reported their systems are still crashing. I'm willing to close this issue and let others reopen it if they have more information. (In reply to comment #13) > It seems to have worked for me, yet others here have reported their systems are > still crashing. I'm willing to close this issue and let others reopen it if they > have more information. Thanks for feedback. If anyone has this problem please reopen. Closing for now |
Description of problem: Newly provisioned RHEL5 system running named configured for localhost-only recursion and official DNS for ~5 zones. Server is hanging several times a day, with many references to "named: *** POKED TIMER ***" in /var/log/messages. Google searching turns up this error in many freebsd forums, and the symptoms are that named is coredumping for some reason, and this often results in pegging the CPU. Version-Release number of selected component (if applicable): bind-9.3.3-9.0.1.el5 How reproducible: Several times a day. Steps to Reproduce: 1. Configure bind to server a few zones, and allow-recursion { 127.0.0.1; }; 2. Wait a few hours under normal load. 3. Screech! Bang! Thud. Dead server, reboot and repeat. Hardware is an IBM x305 w/1.7Ghz Celeron, 1G ram, 40Gig IDE. From /var/log/messages: Oct 13 18:04:03 pebkac named[1602]: *** POKED TIMER *** Oct 13 18:04:03 pebkac last message repeated 2 times Oct 13 18:04:03 pebkac named[1602]: lame server resolving '135.152.166.200.bl.spamcop.org' (in 'spamcop.org'?): 66.232.32.10#53 Oct 13 18:04:03 pebkac named[1602]: lame server resolving '135.152.166.200.bl.spamcop.org' (in 'spamcop.org'?): 66.232.1.2#53 Oct 13 18:04:03 pebkac named[1602]: lame server resolving '135.152.166.200.bl.spamcop.org' (in 'spamcop.org'?): 66.232.1.2#53 Oct 13 18:04:03 pebkac named[1602]: lame server resolving '135.152.166.200.bl.spamcop.org' (in 'spamcop.org'?): 66.232.32.10#53 Oct 13 18:06:02 pebkac named[1602]: *** POKED TIMER *** Oct 13 18:09:03 pebkac named[1602]: *** POKED TIMER *** Oct 13 20:31:04 pebkac named[1602]: lame server resolving '74.82.39.162.bl.spamcop.org' (in 'spamcop.org'?): 66.232.1.2#53 Oct 13 20:31:04 pebkac named[1602]: lame server resolving '74.82.39.162.bl.spamcop.org' (in 'spamcop.org'?): 66.232.32.10#53 Oct 13 20:31:04 pebkac named[1602]: lame server resolving 'ns2.ip-pop.com' (in 'ip-pop.com'?): 66.232.32.10#53 Oct 13 20:31:04 pebkac named[1602]: lame server resolving 'ns1.ip-pop.com' (in 'ip-pop.com'?): 66.232.32.10#53 Oct 13 21:18:30 pebkac syslogd 1.4.1: restart. Oct 13 21:18:30 pebkac kernel: klogd 1.4.1, log source = /proc/kmsg started. Oct 13 21:18:30 pebkac kernel: Linux version 2.6.18-8.1.14.el5 (brewbuilder.redhat.com) (gcc version 4.1.1 20070105 (Red Hat 4.1.1-52)) #1 SMP Tue Sep 25 11:45:53 EDT 2007