Description of problem:
Unbound attempts to prefetch records as they are about to expire. Consider google.com, which has a 300 second TTL. Every 300 seconds unbound expires the record, and the next lookup for it generates ~400 requests/responses. For a mobile 3G user this does not help me; in fact it wastes my quota by repeatedly querying DNS for records I am not using (and because I am on a mobile hotspot, unbound will not use the hotspot's DNS, which would be free; it resolves recursively, which does cost me data).

When a zone is flushed, all records are marked expired, but they are not actually deleted from the cache memory. So when you change network and make the first DNS query, unbound attempts to prefetch *every single* DNS record you have ever visited since your OS booted. When I go from home to mobile data on my laptop, this generated 2976 DNS lookups and responses. Assuming a 1:1 mapping of request to response, wireshark shows a request is 81 bytes and the average response was ~400 bytes. This means my laptop generated 715728 bytes, or roughly 699 KB of traffic. Given I live in Australia, which practically has second-world mobile data, I get about 100 KB/s, so the initial network connect, with no packet loss, takes 7 seconds of DNS queries alone just to repopulate my cache. My telco charges $0.10 per 1 MB (which is cheap for Australia...), which means unbound literally costs me approximately $0.07 every time I connect to my mobile hotspot, which for me is twice daily.

The point of unbound was to make DNS faster. This behaviour on a mobile device is wasteful and costly to the user, and I end up with a non-responsive network for a number of seconds while it runs. Prefetching must be disabled for workstation installs.

How reproducible:
Always
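For reference, the numbers above work out roughly as follows. This is only a back-of-the-envelope sketch using the figures quoted in this report (packet count, sizes, link rate, and price); the KB/MB conversions are approximate.

packets = 2976               # DNS packets captured on reconnect
pairs = packets // 2         # assume a 1:1 request/response mapping
request_bytes = 81           # request size observed in wireshark
response_bytes = 400         # average response size observed

total_bytes = pairs * (request_bytes + response_bytes)
print(total_bytes, "bytes,", round(total_bytes / 1024), "KB")        # 715728 bytes, ~699 KB

link_kb_per_s = 100          # rough effective 3G throughput in KB/s
print(round(total_bytes / 1024 / link_kb_per_s, 1), "seconds of DNS traffic")   # ~7.0 s

aud_per_mb = 0.10            # quoted mobile data price
print(round(total_bytes / 1_000_000 * aud_per_mb, 3), "AUD per reconnect")      # ~0.072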
First of all, prefetching is only enabled for cached entries that are still being asked for. So not pre-fetching google.com will just lead to identical behaviour a few seconds later, except then the user is actively waiting on it because there is no cached entry. Second, there should not be 400 requests just to refresh a google.com A record. That looks like a bug in the interaction of flushing and prefetching (another shot in the foot by split-DNS views and the requirement to flush caches). I'll talk to upstream about this one.
(In reply to Paul Wouters from comment #1)
> First of all, prefetching is only enabled for cached entries that are still
> being asked for.

And how do you determine that something is "still being asked for"?
If it is still being queried for, i.e. based on the number of cache hits.
From upstream:

Unbound does not pre-fetch expired records. They are fetched normally, i.e. when a user asks for them. So the cache flush does not cause lookups, and it does not cause pre-fetches either. When a record is near expiry, a pre-fetch happens (if prefetch is enabled), but that pre-fetch only happens when a user queries for that record. The pre-fetch does not happen if the user does not query for the record.

As for the 400-queries part of the report: unbound will only generate an outbound query (except for trust-anchor updates on schedule) when there is an inbound user query, and that also holds for pre-fetch. So those 400 queries would only happen if the user is performing 400 lookups.

I am trying to deduce what unbound would do in theory. If in practice this is not happening, some bug hunting is required. Does the user have real data or are these his estimates? Perhaps his mobile performs a lot of lookups when it connects to a hotspot, which is possible with, I don't know, a lot of apps.
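To make the described rule concrete, here is a minimal sketch of that behaviour. This is not unbound's actual code; the 10% threshold, the cache layout, and the helper functions are illustrative assumptions only.

import time

PREFETCH_FRACTION = 0.1      # assumed: refresh when <10% of the original TTL remains

def recursive_lookup(name):
    # placeholder for a full recursive resolution; returns a fake 300s record
    return {"rrset": name + ". 300 IN A 192.0.2.1",
            "ttl": 300,
            "expires": time.time() + 300}

def schedule_background_refresh(name):
    # placeholder: a real resolver would queue an asynchronous re-query here
    print("prefetching", name, "before it expires")

def answer_query(cache, name):
    """Serve one client query; only a client query can ever trigger a prefetch."""
    now = time.time()
    entry = cache.get(name)
    if entry is None or entry["expires"] <= now:
        # miss (or expired/flushed entry): a normal lookup, done only because a client asked
        entry = recursive_lookup(name)
        cache[name] = entry
        return entry["rrset"]
    if entry["expires"] - now < PREFETCH_FRACTION * entry["ttl"]:
        # near expiry AND being asked for right now: refresh in the background
        schedule_background_refresh(name)
    return entry["rrset"]

cache = {}
print(answer_query(cache, "www.example.com"))   # miss -> recursive lookup
print(answer_query(cache, "www.example.com"))   # hit, far from expiry -> no prefetch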
Then the issue isn't prefetching: But it's not explaining the symptoms I see. I closed all my applications (Except gedit) and opened wireshark. I made sure to deselect hostname to ip resoultion via dns so as not to polute the data. I then ran "unbound-control flush_zone ." as network manager does on a network change. ~4000 request / responses for DNS were made. I have provided stats before and after below. thread0.num.queries=2761597 thread0.num.cachehits=1848794 thread0.num.cachemiss=912803 thread0.num.prefetch=0 thread0.num.recursivereplies=912803 thread0.requestlist.avg=1.90029 thread0.requestlist.max=370 thread0.requestlist.overwritten=0 thread0.requestlist.exceeded=0 thread0.requestlist.current.all=0 thread0.requestlist.current.user=0 thread0.recursion.time.avg=0.040583 thread0.recursion.time.median=0.0325666 thread1.num.queries=3107851 thread1.num.cachehits=2121638 thread1.num.cachemiss=986213 thread1.num.prefetch=0 thread1.num.recursivereplies=986213 thread1.requestlist.avg=1.9175 thread1.requestlist.max=292 thread1.requestlist.overwritten=0 thread1.requestlist.exceeded=0 thread1.requestlist.current.all=0 thread1.requestlist.current.user=0 thread1.recursion.time.avg=0.041769 thread1.recursion.time.median=0.0322179 total.num.queries=5869448 total.num.cachehits=3970432 total.num.cachemiss=1899016 total.num.prefetch=0 total.num.recursivereplies=1899016 total.requestlist.avg=1.90923 total.requestlist.max=370 total.requestlist.overwritten=0 total.requestlist.exceeded=0 total.requestlist.current.all=0 total.requestlist.current.user=0 total.recursion.time.avg=0.041199 total.recursion.time.median=0.0323922 time.now=1403061635.517116 time.up=69012.793916 time.elapsed=69012.793916 mem.total.sbrk=16527360 mem.cache.rrset=2040582 mem.cache.message=2622140 mem.mod.iterator=16532 mem.mod.validator=245809 histogram.000000.000000.to.000000.000001=677 histogram.000000.000001.to.000000.000002=0 histogram.000000.000002.to.000000.000004=4 histogram.000000.000004.to.000000.000008=87 histogram.000000.000008.to.000000.000016=286 histogram.000000.000016.to.000000.000032=376 histogram.000000.000032.to.000000.000064=457 histogram.000000.000064.to.000000.000128=1407 histogram.000000.000128.to.000000.000256=5171 histogram.000000.000256.to.000000.000512=5931 histogram.000000.000512.to.000000.001024=12765 histogram.000000.001024.to.000000.002048=48677 histogram.000000.002048.to.000000.004096=62429 histogram.000000.004096.to.000000.008192=93367 histogram.000000.008192.to.000000.016384=146763 histogram.000000.016384.to.000000.032768=584974 histogram.000000.032768.to.000000.065536=739025 histogram.000000.065536.to.000000.131072=141874 histogram.000000.131072.to.000000.262144=15671 histogram.000000.262144.to.000000.524288=36335 histogram.000000.524288.to.000001.000000=1514 histogram.000001.000000.to.000002.000000=855 histogram.000002.000000.to.000004.000000=210 histogram.000004.000000.to.000008.000000=114 histogram.000008.000000.to.000016.000000=26 histogram.000016.000000.to.000032.000000=20 histogram.000032.000000.to.000064.000000=1 histogram.000064.000000.to.000128.000000=0 histogram.000128.000000.to.000256.000000=0 histogram.000256.000000.to.000512.000000=0 histogram.000512.000000.to.001024.000000=0 histogram.001024.000000.to.002048.000000=0 histogram.002048.000000.to.004096.000000=0 histogram.004096.000000.to.008192.000000=0 histogram.008192.000000.to.016384.000000=0 histogram.016384.000000.to.032768.000000=0 histogram.032768.000000.to.065536.000000=0 histogram.065536.000000.to.131072.000000=0 
histogram.131072.000000.to.262144.000000=0 histogram.262144.000000.to.524288.000000=0 num.query.type.A=2935965 num.query.type.SOA=8 num.query.type.PTR=56 num.query.type.AAAA=2933218 num.query.type.SRV=201 num.query.class.IN=5869448 num.query.opcode.QUERY=5869448 num.query.tcp=0 num.query.ipv6=0 num.query.flags.QR=0 num.query.flags.AA=0 num.query.flags.TC=0 num.query.flags.RD=5869448 num.query.flags.RA=0 num.query.flags.Z=0 num.query.flags.AD=1 num.query.flags.CD=0 num.query.edns.present=1 num.query.edns.DO=0 num.answer.rcode.NOERROR=15361 num.answer.rcode.SERVFAIL=1897707 num.answer.rcode.NXDOMAIN=3956380 num.answer.rcode.nodata=2225 num.answer.secure=358 num.answer.bogus=1971 num.rrset.bogus=68 unwanted.queries=0 unwanted.replies=8 AFTER thread0.num.queries=2761606 thread0.num.cachehits=1848799 thread0.num.cachemiss=912807 thread0.num.prefetch=0 thread0.num.recursivereplies=912807 thread0.requestlist.avg=1.90028 thread0.requestlist.max=370 thread0.requestlist.overwritten=0 thread0.requestlist.exceeded=0 thread0.requestlist.current.all=0 thread0.requestlist.current.user=0 thread0.recursion.time.avg=0.040586 thread0.recursion.time.median=0.0325667 thread1.num.queries=3107864 thread1.num.cachehits=2121644 thread1.num.cachemiss=986220 thread1.num.prefetch=0 thread1.num.recursivereplies=986220 thread1.requestlist.avg=1.91751 thread1.requestlist.max=292 thread1.requestlist.overwritten=0 thread1.requestlist.exceeded=0 thread1.requestlist.current.all=0 thread1.requestlist.current.user=0 thread1.recursion.time.avg=0.041777 thread1.recursion.time.median=0.0322181 total.num.queries=5869470 total.num.cachehits=3970443 total.num.cachemiss=1899027 total.num.prefetch=0 total.num.recursivereplies=1899027 total.requestlist.avg=1.90923 total.requestlist.max=370 total.requestlist.overwritten=0 total.requestlist.exceeded=0 total.requestlist.current.all=0 total.requestlist.current.user=0 total.recursion.time.avg=0.041205 total.recursion.time.median=0.0323924 time.now=1403061790.716212 time.up=69167.993012 time.elapsed=69167.993012 mem.total.sbrk=16527360 mem.cache.rrset=2041436 mem.cache.message=2624320 mem.mod.iterator=16532 mem.mod.validator=245531 histogram.000000.000000.to.000000.000001=677 histogram.000000.000001.to.000000.000002=0 histogram.000000.000002.to.000000.000004=4 histogram.000000.000004.to.000000.000008=87 histogram.000000.000008.to.000000.000016=286 histogram.000000.000016.to.000000.000032=376 histogram.000000.000032.to.000000.000064=457 histogram.000000.000064.to.000000.000128=1407 histogram.000000.000128.to.000000.000256=5171 histogram.000000.000256.to.000000.000512=5931 histogram.000000.000512.to.000000.001024=12765 histogram.000000.001024.to.000000.002048=48677 histogram.000000.002048.to.000000.004096=62429 histogram.000000.004096.to.000000.008192=93367 histogram.000000.008192.to.000000.016384=146763 histogram.000000.016384.to.000000.032768=584974 histogram.000000.032768.to.000000.065536=739025 histogram.000000.065536.to.000000.131072=141874 histogram.000000.131072.to.000000.262144=15673 histogram.000000.262144.to.000000.524288=36335 histogram.000000.524288.to.000001.000000=1515 histogram.000001.000000.to.000002.000000=862 histogram.000002.000000.to.000004.000000=211 histogram.000004.000000.to.000008.000000=114 histogram.000008.000000.to.000016.000000=26 histogram.000016.000000.to.000032.000000=20 histogram.000032.000000.to.000064.000000=1 histogram.000064.000000.to.000128.000000=0 histogram.000128.000000.to.000256.000000=0 histogram.000256.000000.to.000512.000000=0 
histogram.000512.000000.to.001024.000000=0 histogram.001024.000000.to.002048.000000=0 histogram.002048.000000.to.004096.000000=0 histogram.004096.000000.to.008192.000000=0 histogram.008192.000000.to.016384.000000=0 histogram.016384.000000.to.032768.000000=0 histogram.032768.000000.to.065536.000000=0 histogram.065536.000000.to.131072.000000=0 histogram.131072.000000.to.262144.000000=0 histogram.262144.000000.to.524288.000000=0 num.query.type.A=2935977 num.query.type.SOA=8 num.query.type.PTR=56 num.query.type.AAAA=2933226 num.query.type.SRV=203 num.query.class.IN=5869470 num.query.opcode.QUERY=5869470 num.query.tcp=0 num.query.ipv6=0 num.query.flags.QR=0 num.query.flags.AA=0 num.query.flags.TC=0 num.query.flags.RD=5869470 num.query.flags.RA=0 num.query.flags.Z=0 num.query.flags.AD=1 num.query.flags.CD=0 num.query.edns.present=1 num.query.edns.DO=0 num.answer.rcode.NOERROR=15383 num.answer.rcode.SERVFAIL=1897707 num.answer.rcode.NXDOMAIN=3956380 num.answer.rcode.nodata=2233 num.answer.secure=358 num.answer.bogus=1971 num.rrset.bogus=68 unwanted.queries=0 unwanted.replies=8
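For completeness, here is a rough sketch of how the before/after counters above can be compared, assuming the two `unbound-control stats` dumps were saved to files; the file names are hypothetical.

def load(path):
    stats = {}
    with open(path) as f:
        for line in f:
            key, _, value = line.strip().partition("=")
            if key:
                stats[key] = value
    return stats

before = load("stats-before.txt")   # hypothetical file name
after = load("stats-after.txt")     # hypothetical file name

# print only the counters that changed between the two dumps
for key in sorted(before.keys() & after.keys()):
    if before[key] != after[key]:
        print(key + ":", before[key], "->", after[key])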
Any news on why this happens?
flushing "." means dropping the _entire_ cache. So a single application doing a DNS call will trigger a huge cache fill. It has nothing to do with prefetching. It might even be that unbound itself triggers these, for example by priming the root and fetching the root key and records to see if the buildin root hints needs updating. The test is not a good test at all. If you want to test prefetching, query for a record that you know no one else will query for. with a short ttl then later dump the cache and look for the entry.
So a single lookup to foo.com, after a cache flush of ".", causes unbound to try and refetch every record that it ever held in its cache? That seems very inefficient. Why not just fetch what's needed, when it's requested?

It may not have been prefetching that caused my issue, but the core of my concern is that when you flush the cache, unbound attempts to fetch a large number of records to repopulate itself. I will update the description accordingly, but I think this behaviour should be corrected.
It is not "re-populate" the cache. It is "querying starting from an empty cache". So yes if you flush everything, it needs to go to the root. to COM to ORG to root-nameservers.net to affilias.* and all those thousands of queries that you normally already find in a cache that's been up. You seem to want to flush the cache and have it to This is not a bug. This is the behaviour of a caching resolver that starts up from scratch.
No, I want to flush the cache and not have it proactively attempt to fill itself with every DNS record that I have ever looked at. For example, just because I used foo.example.com a few hours ago doesn't mean that unbound should zealously repopulate the record unless I ask for it again.

I have tested this with the current version and it appears that this overly aggressive behaviour is no longer the case. As a result, I will close the issue.
(In reply to William Brown from comment #11)
> I have tested this with the current version and it appears that this overly
> aggressive behaviour is no longer the case.

Thanks!
This is not about PRO-ACTIVELY filling the cache! If you nuke the entire cache and do one single lookup for fedoraproject.org, you need to:

- query the root nameservers for the NS records
- query the ORG nameservers for .org
- query for afilias-nst.info
- query for info.
- query for afilias-nst.info.

and so on and so forth. You will require hundreds or thousands of queries to get all the domains involved in the NS and DNSKEY records. Please do not notice later on that this is "happening again" and re-open the bug: when you flush the cache, unbound has NO LIST OF ANYTHING you queried before, so it cannot "proactively" be doing anything.
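For illustration, `dig +trace` performs exactly this kind of cold-start walk from the root for a single name. A small wrapper (a sketch, assuming dig is installed locally) that counts how many servers get asked along the way:

import subprocess

# `dig +trace` resolves a name iteratively, starting at the root servers, and
# prints each delegation step -- roughly the work a resolver with an empty
# cache has to do for one single lookup.
out = subprocess.run(["dig", "+trace", "fedoraproject.org"],
                     capture_output=True, text=True, check=True)

# Lines such as ";; Received 525 bytes from 198.41.0.4#53(a.root-servers.net)"
# mark one round trip to one nameserver.
steps = [l for l in out.stdout.splitlines() if l.startswith(";; Received")]
print(len(steps), "separate server responses for one cold-cache lookup")
for line in steps:
    print(line)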