Bug 62291
Summary: | Latest bind 8.2.3 erratum for 6.2 and 7.0 is totally unuseable | ||
---|---|---|---|
Product: | [Retired] Red Hat Linux | Reporter: | Mike A. Harris <mharris> |
Component: | bind | Assignee: | Daniel Walsh <dwalsh> |
Status: | CLOSED WORKSFORME | QA Contact: | Ben Levenson <benl> |
Severity: | medium | Docs Contact: | |
Priority: | high | ||
Version: | 7.0 | CC: | anders |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | i386 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2002-12-18 20:21:58 UTC | Type: | --- |
Description
Mike A. Harris
2002-03-29 08:24:25 UTC
pts/0 root@gw:/# nslookup irc.redhat.com
Server: gw.capslock.lan
Address: 192.168.1.1

*** gw.capslock.lan can't find irc.redhat.com: Server failed

pts/0 root@gw:/# service named restart
Shutting down named: [ OK ]
Starting named: [ OK ]

pts/0 root@gw:/# nslookup irc.redhat.com
Server: gw.capslock.lan
Address: 192.168.1.1

Non-authoritative answer:
Name: irc.openprojects.net
Addresses: 64.28.67.98, 207.106.22.229, 216.53.71.65, 198.186.203.27
Aliases: irc.redhat.com

pts/0 root@gw:/# nslookup irc.redhat.com
Server: gw.capslock.lan
Address: 192.168.1.1

*** gw.capslock.lan can't find irc.redhat.com: Server failed

pts/0 root@gw:/# nslookup www.redhat.com
Server: gw.capslock.lan
Address: 192.168.1.1

Non-authoritative answer:
Name: www.redhat.com
Addresses: 216.148.218.197, 216.148.218.195

pts/0 root@gw:/# nslookup irc.lame.org
Server: gw.capslock.lan
Address: 192.168.1.1

*** gw.capslock.lan can't find irc.lame.org: Server failed

Sometimes I get lookups working for 2 to 3 minutes, maybe as many as 5 to 10 minutes. To reproduce the failure, all that need be done is to leave the machine completely idle for 2-3 minutes and then repeat the last lookup (up-arrow, Enter). If it works one time, it will fail at some point. If I can't get one to fail, I try a different host, and generally one fails right away.

Restarting bind does not guarantee it will work right away either. It might work, or it might fail immediately. Sometimes 2-3 restarts are needed.

I just rebuilt bind 9.1.0 from RHL 7.1 on RHL 7.0 after removing the dependency on tar and replacing tar -j with bzcat piped to tar. Result: same thing.

So it appears this might be more than a bind issue, perhaps a library issue or some such. I don't know enough about bind to debug the issue further, but I've discussed it with a few other people now too, and they're having similar problems. :o/

That didn't make much sense, considering I am debugging the issue further. I'll try to find out more tomorrow.

Any news on this one?
Is this still an open bug, or did you figure out the problem? It was just assigned to me.

Dan

This problem drove me completely nuts, to the point where 3 DNS experts (one of whom was Bryce) couldn't fix it and couldn't determine what the problem could be. Complete bafflement. As such, I just stopped using bind entirely, disabled local DNS, and started using /etc/hosts on all machines, mirrored via cron, the good old-fashioned 1970s way. I pointed DNS to my ISP's servers, and all problems went away for quite some time.

Many, many months later, I began having new DNS problems, in particular in mozilla, and oddly only from certain machines on my network. Frustration once again, and with many of the same symptoms as the problem described here. I was essentially unable to use the Internet properly from those machines, while the rest of my LAN seemed to work fine.

I began suspecting something wrong on my firewall. I investigated the configuration of pretty much everything on my firewall and tested many things, all to no avail. Couldn't find any problems. Then I checked /var/log/messages and scanned it for anything that could even remotely be the culprit of the trouble I was having. Lo and behold:

messages.2.gz:Oct 16 10:28:34 gw kernel: IP_MASQ:ip_masq_new(proto=UDP): could not get free masq entry (free=34844).
messages.2.gz:Oct 16 10:29:39 gw kernel: IP_MASQ:ip_masq_new(proto=UDP): could not get free masq entry (free=34844).
messages.2.gz:Oct 16 10:33:37 gw kernel: IP_MASQ:ip_masq_new(proto=UDP): could not get free masq entry (free=34844).
messages.2.gz:Oct 16 10:34:12 gw kernel: IP_MASQ:ip_masq_new(proto=UDP): could not get free masq entry (free=34846).
messages.2.gz:Oct 16 10:37:39 gw kernel: IP_MASQ:ip_masq_new(proto=UDP): could not get free masq entry (free=34846).
messages.2.gz:Oct 16 10:37:48 gw kernel: IP_MASQ:ip_masq_new(proto=UDP): could not get free masq entry (free=34846).
messages.2.gz:Oct 16 10:39:53 gw kernel: IP_MASQ:ip_masq_new(proto=UDP): could not get free masq entry (free=34848).
messages.2.gz:Oct 16 10:39:55 gw kernel: IP_MASQ:ip_masq_new(proto=UDP): could not get free masq entry (free=34848).
messages.2.gz:Oct 16 10:39:58 gw kernel: IP_MASQ:ip_masq_new(proto=UDP): could not get free masq entry (free=34849).
messages.2.gz:Oct 16 10:40:00 gw kernel: IP_MASQ:ip_masq_new(proto=UDP): could not get free masq entry (free=34848).
messages.2.gz:Oct 16 10:45:36 gw kernel: IP_MASQ:ip_masq_new(proto=UDP): could not get free masq entry (free=34849).
messages.2.gz:Oct 16 10:45:43 gw kernel: IP_MASQ:ip_masq_new(proto=UDP): could not get free masq entry (free=34857).
messages.2.gz:Oct 16 10:46:17 gw kernel: IP_MASQ:ip_masq_new(proto=UDP): could not get free masq entry (free=34860).

IP masquerading was failing for UDP due to a filled masquerade table. But for some odd reason, *only* on *certain* machines. ARRRRGHHHHH!! In other words, a 2.2.x kernel bug (IMHO).

The solution was to reboot the machine. The problem went away for a couple of months and returned, and another reboot solved it again.

I do not know for certain whether this kernel bug/issue was responsible for the bind issue I am reporting here, but it is entirely likely that it was the problem at that point in time as well.

Since nobody else seems to have experienced this problem, I am now considering it a local issue, due to the specifics of my own kernel (which is *cough* homebrew *cough* from stock kernel.org sources). I have deprecated my trusty 486-DX2/66 now, and plan on putting a newer RHL 8.0 capable machine in its place with iptables and a stock Red Hat kernel, rather than the minimalized kernel I had no choice but to use on the 12 MB 486. ;o)

In short, I consider this issue closed due to kernel funkification.

Closing as WORKSFORME now.
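
For anyone chasing similar intermittent lookup failures behind a masquerading gateway, here is a minimal sketch of how the log check described above could be automated. It is not part of the original report. The function name `count_masq_failures` and the regular expression are illustrative assumptions, written only against the message format quoted in this bug; other kernels may word the message differently.

```python
import re
from collections import Counter

# Matches the kernel message quoted in this report. The optional leading
# "file:" part (e.g. "messages.2.gz:") is the prefix grep adds when it
# searches several files. This pattern is an assumption based solely on
# the lines shown in this bug, not on any kernel documentation.
MASQ_RE = re.compile(
    r"^(?:(?P<file>[^:\s]+):)?"
    r"(?P<stamp>\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+"
    r"(?P<host>\S+)\s+kernel:\s+"
    r"IP_MASQ:ip_masq_new\(proto=(?P<proto>\w+)\): "
    r"could not get free masq entry \(free=(?P<free>\d+)\)"
)


def count_masq_failures(lines):
    """Count 'could not get free masq entry' messages per protocol.

    `lines` is any iterable of log lines (hypothetical helper, for
    illustration only). Lines that do not match are ignored.
    """
    counts = Counter()
    for line in lines:
        match = MASQ_RE.match(line.strip())
        if match:
            counts[match.group("proto")] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "messages.2.gz:Oct 16 10:28:34 gw kernel: IP_MASQ:ip_masq_new(proto=UDP): "
        "could not get free masq entry (free=34844).",
        "Oct 16 10:29:39 gw kernel: IP_MASQ:ip_masq_new(proto=UDP): "
        "could not get free masq entry (free=34844).",
        "Oct 16 10:30:00 gw named[123]: some unrelated line",
    ]
    for proto, n in count_masq_failures(sample).items():
        print(f"{proto}: {n} masquerade allocation failure(s)")
```

A non-zero UDP count on the gateway would at least suggest checking the masquerade table before blaming named.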