Bug 155422
Summary: | NetworkManager loses DNS after a short while
---|---
Product: | [Fedora] Fedora
Component: | NetworkManager
Version: | 4
Status: | CLOSED RAWHIDE
Severity: | medium
Priority: | medium
Hardware: | All
OS: | Linux
Reporter: | Christian Schaller <uraeus>
Assignee: | Christopher Aillon <caillon>
CC: | jan.mynarik, jvdias
Doc Type: | Bug Fix
Last Closed: | 2006-02-20 18:52:06 UTC
Bug Blocks: | 136451
Description
Christian Schaller
2005-04-20 07:33:59 UTC
Can you post the output of /var/log/messages right after it stops working? Can you also try to "killall -HUP named" right after it seems to stop, and see if that makes it work again?

I think I might have just run into this...

If you know how, can you attach to the 'named' process using gdb and run a 't a a bt' (thread apply all backtrace) and post the output in here?

eh...

1) "ps aux | grep named"
2) note the PID of named (should be the second column, right after 'root')
3) "gdb attach <pid from step 2>"
4) "t a a bt"
5) paste the output in here
6) "detach"
7) "quit"

Thanks!
Dan

It could also be the case that named was not given the options "forwarders { IP; IP; ...} forward only; " in the global "options { ... };" section and is timing out trying to contact the root nameservers (by default, the resolver timeouts are very long, about 5 minutes). Are you running behind a firewall? Please also attach your named.conf configuration file (the file passed as the '-c' command line option by NetworkManager).

Here is the content of my named.conf file. I will attach a bt as soon as my DNS stops working again. As for a firewall, I am not behind one as such, but as far as I know the combined ADSL/wireless router from 3com functions as a firewall to some degree.

    // Named configuration, generated by NetworkManager
    options {
        directory "/";
        query-source address * port *;
        forward only;
        forwarders { 192.168.2.1; };
        listen-on { 127.0.0.1; };
        pid-file "/var/named/data/NetworkManager-pid-named";
    };
    // Disable rndc
    controls { };

This happened to me again this morning after leaving the box running all night. If you kill -9 the named process, NetworkManager will spawn a new named. However, that new named will be stuck in the same situation, and doesn't resolve at all. Picking a new access point or going to wired seems to kick named enough that it starts resolving again...

Hmm, false alarm on my previous comment.
vpnc had been running for more than 8 hours (and was therefore past the rekeying threshold), so the VPN was not technically "running" and vpnc couldn't contact the nameservers listed in the conf file anyway. That would be why multiple kicks of named didn't work for me this morning.
Dan

I will attach a bt, but I have to point out that the process did not crash, so I guess it will be of little use. While testing after this happened, I also noticed that I was able to ping a few named addresses, but I am not sure what the pattern behind those addresses is, except that I didn't access any of them before DNS 'disappeared'. Still, it seems I am able to reach roughly 1 in 10 addresses.

Created attachment 114219
Backtrace of named
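The forwarder question raised earlier in this report (whether the generated named.conf really contains "forward only;" and a "forwarders { ... };" list) can be checked mechanically. What follows is a minimal Python sketch under stated assumptions, not part of NetworkManager or BIND: the name check_forwarding and the regex-based parsing are made up for illustration, and the parsing is deliberately naive (it ignores block comments, views, and quoted strings that contain "//" or "#"). The sample text is the configuration posted above.

```python
import re

# The named.conf posted in this report, as generated by NetworkManager.
SAMPLE_CONF = """
// Named configuration, generated by NetworkManager
options {
    directory "/";
    query-source address * port *;
    forward only;
    forwarders { 192.168.2.1; };
    listen-on { 127.0.0.1; };
    pid-file "/var/named/data/NetworkManager-pid-named";
};
// Disable rndc
controls { };
"""


def check_forwarding(conf_text):
    """Report whether conf_text sets 'forward only' and which forwarders it lists.

    Hypothetical helper: a naive regex scan, not a real named.conf parser.
    """
    # Drop '//' and '#' line comments so commented-out directives are ignored.
    stripped = re.sub(r"(//|#)[^\n]*", "", conf_text)
    forward_only = re.search(r"\bforward\s+only\s*;", stripped) is not None
    match = re.search(r"\bforwarders\s*\{([^}]*)\}", stripped)
    forwarders = []
    if match:
        # Entries inside the braces are separated by semicolons.
        forwarders = [ip.strip() for ip in match.group(1).split(";") if ip.strip()]
    return {"forward_only": forward_only, "forwarders": forwarders}
```

If forward_only is False or the forwarders list is empty, named would fall back to querying the root nameservers itself, which is the long-timeout scenario described above.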
The backtrace shows named in its normal state - its main thread is in sigsuspend waiting for a SIGHUP (reload) or SIGTERM (shutdown) signal. All the work is done by the other threads, which seem to be in normal states.

Can you for instance look up the localhost address, i.e. do:

    # dig -x 127.0.0.1 @127.0.0.1
    1.0.0.127.in-addr.arpa domain name pointer localhost.

(You should have the rfc1912 localhost zones in your named.conf, i.e. 'zone "0.0.127.in-addr.arpa" { type master; file "localhost.zone"; };' - if not, this could itself be a problem, since named will try to look up 127.0.0.1 and fail for every query - install the 'caching-nameserver' package or remove all your named configuration files and run 'system-config-bind').

If this does not succeed, and you do have the $ROOTDIR/var/named/localhost.zone file (where ROOTDIR is defined in /etc/sysconfig/named), then we definitely do have a named problem. Before reproducing, put named in debug mode - as root:

    # chmod g+w $ROOTDIR/var/named
    # rndc trace 99

named will create a $ROOTDIR/var/named/named.run file. Please compress this file and attach it to this bug, or mail it to us.

If the localhost lookup does succeed, then this is not a named problem - it could be a firewall problem. Do you have a firewall enabled? When you have reproduced the problem, please gather tcpdump information - as root, do:

    # tcpdump -nl -i any -vvv -s 2048 port domain >/tmp/tcpdump.log 2>&1 &

then do some queries which fail, and (if possible) some which succeed, e.g. using the 'dig' command. Then:

    # pkill tcpdump

and attach the /tmp/tcpdump.log file to this bug or send it to us.

jason: NetworkManager starts named from its main thread, and it runs a total of 1 main thread + 1 thread per device.

I have been using NetworkManager as my primary connection method for quite a while with FC4 and I have noticed one thing. This problem only happens at home, never at work.
This makes me believe the problem lies somewhere in the interaction between named and the DNS system inside my 3com ADSL router. I am not sure what my router actually does, but when I get an IP from it my DNS server is set to the router's address, so it might contain its own caching DNS server. And as mentioned earlier, a few addresses seem to keep working when this happens. My current guess is that named and the DNS system inside my router do something together which one of the two doesn't fully comprehend, so the reply I get back is that the DNS name is unknown, instead of the router sending the request further upstream for the real reply.

(In reply to comment #11)
> Been using NetworkManager as my primary connection method for quite a while with
> FC4 and I have noticed one thing. This problem only happens at home, never at
> work. Which makes me believe the problem lies somewhere with the interaction
> between named and the dns system inside my 3com ADSL router. I am not sure what
> my router actually does, but when I get an IP from it my dns is set to be the
> routers, so it might contain its own caching DNS server. And as mentioned
> earlier a few addresses seems to keep working when this happens. My current
> guess would be that somehow named and the dns system inside my router do
> something together which one of the parts don't fully comprehend so the reply I
> get back is that the dns name is unknown instead of the ADSL router system
> sending the request further upstream for the real reply.

This is exactly my experience. I'd like to confirm this behaviour too, with an SMC 7804WBR-B EU ADSL router. It has its own DNS caching too, just like in Christian's case. Tested on Ubuntu Breezy with the NetworkManager CVS version from 2005-07-27 (with named, resolvconf disabled for those curious :-)).
I read about this bug on Christian's blog and just wanted to say it's not Red Hat specific ;-)

Tested using nslookup at home, as I assumed it might be the same issue as reported in bug 165588. Unfortunately the behaviour at home is different from 165588, as nslookup wasn't even able to connect to the router's DNS server directly when this issue occurred.

I too am suffering from the same problem. However, I was previously using wpa_supplicant to connect to my wireless network and dhcpcd to do all the DHCP stuff. Using dhcpcd as a DHCP client seemed to work flawlessly on any wireless network; surely there's a way of scripting a workaround to use that instead? Restarting bind seems to fix my problem, so perhaps it differs slightly from the above?

FYI, at least one case of this problem should be fixed in CVS HEAD, since we now talk to named directly over dbus and don't spawn our own copy of named. This fixes the issue where forwarders don't get cleared when updating DNS information. This, however, was most often seen when switching connections and/or using VPN (i.e., situations in which your nameservers would change at least once).

Ok, tested with 0.5.1 of NetworkManager and the problem persists for me. That said, I guess this bug is a duplicate of my bug 165588. I will test some more tonight to try to figure out whether these two problems are in fact one and the same issue.

After some more testing it seems 0.5.1 has solved this issue for me, so I am now closing this bug report. Thanks for the good work.

bug report still open -- last comment said it would be closed? closing
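For anyone seeing similar symptoms, the comparison discussed in this report (ask the local named on 127.0.0.1, then ask the router's DNS server directly, and see which one fails) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names build_query, parse_rcode, probe and compare are hypothetical, the router address 192.168.2.1 is simply the forwarder from the named.conf posted above, and the probe sends one plain UDP A query without retries or checking the response ID.

```python
import socket
import struct

# Response codes from RFC 1035; anything else is reported numerically.
RCODE_NAMES = {0: "NOERROR", 1: "FORMERR", 2: "SERVFAIL",
               3: "NXDOMAIN", 4: "NOTIMP", 5: "REFUSED"}


def build_query(name, query_id=0x1234):
    """Build a minimal DNS query for an A record (RFC 1035 wire format)."""
    # Header: ID, flags (only RD set), QDCOUNT=1, ANCOUNT=NSCOUNT=ARCOUNT=0.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    labels = name.rstrip(".").split(".")
    # Each label is length-prefixed; a zero byte terminates the name.
    qname = b"".join(bytes([len(l)]) + l.encode("ascii") for l in labels) + b"\x00"
    return header + qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN


def parse_rcode(response):
    """Return the RCODE (low 4 bits of the flags word) of a DNS response."""
    if len(response) < 12:
        raise ValueError("response shorter than a DNS header")
    flags = struct.unpack("!H", response[2:4])[0]
    return flags & 0x000F


def probe(server, name, timeout=3.0):
    """Send one UDP query to server and return a short status string."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(build_query(name), (server, 53))
            data, _addr = sock.recvfrom(512)
        except socket.timeout:
            return "TIMEOUT"
        except OSError as exc:
            return "ERROR: %s" % exc
    if len(data) < 12:
        return "MALFORMED"
    rcode = parse_rcode(data)
    return RCODE_NAMES.get(rcode, "RCODE %d" % rcode)


def compare(name, servers=("127.0.0.1", "192.168.2.1")):
    """Probe each server for name; the default addresses are assumptions."""
    return {server: probe(server, name) for server in servers}
```

Calling compare("example.com") while the problem is occurring would show whether the local named, the router, or both stop answering. A TIMEOUT or NXDOMAIN from the router alone would point at the router's DNS proxy rather than at named.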