Bug 1089767 - Unbound caches missed entries.
Summary: Unbound caches missed entries.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: dnssec-trigger
Version: 20
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Assignee: Pavel Šimerda (pavlix)
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2014-04-21 23:35 UTC by William Brown
Modified: 2014-09-19 10:07 UTC
CC List: 7 users

Fixed In Version: dnssec-trigger-0.12-13.fc20
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-09-19 10:07:02 UTC
Type: Bug
Embargoed:


Links:
Red Hat Bugzilla 1089910 (status: CLOSED): Unbound cache across networks causes traffic leaking. Last updated 2021-02-22 00:41:40 UTC

Internal Links: 1089910

Description William Brown 2014-04-21 23:35:40 UTC
Description of problem:
Unbound caches missed entries. I discovered this while moving between networks in a split-view DNS setup.



How reproducible:
Always

Steps to Reproduce:
1. Start on a network without access to some record X. Query it with dig @::1 X AAAA
2. Move to a network with access to record X. Add the correct forwarder to unbound.
3. Query the record.
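
For illustration, the steps above might look roughly like this on the command line (the zone name and forwarder address are placeholders, not a real setup, and this assumes unbound-control's forward_add is available):

dig @::1 host.private.zone.local AAAA
  (no data is returned; the miss is now cached)
sudo unbound-control forward_add private.zone.local 192.0.2.53
dig @::1 host.private.zone.local AAAA
  (the cached miss is still returned instead of asking the new forwarder)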

Actual results:
Unbound returns the EXTERNAL view of the DNS, i.e. it returns the cached miss (no data).

Expected results:
Unbound should not cache record misses; it should query the forwarder again. Unbound should flush the cache between networks to avoid split-view DNS issues such as this. (Not all potential forwarders can be advertised via DHCP, etc.)

Comment 1 Paul Wouters 2014-04-22 01:56:52 UTC
Unbound can be used as an ISP-scale resolver as well as in an individual setup on a laptop, so the statement "unbound should not cache misses" is wrong; it could not work that way in general.

What you mean is "when moving between networks, flush the negative cache"

This is currently not possible with unbound. I'll talk to upstream to see if they would be okay with a patch that adds a flush_negative option.

As a workaround, you can try setting the negative cache size to 0:

sudo unbound-control set_option neg-cache-size:0

Be aware there could be bad effects or bad performance as a result of this; I have not tested it. I suspect it will work on a single-user computer. I am not sure whether reducing this option at runtime will cause the cache to be dropped.
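
If that works, the persistent equivalent should be the same option in /etc/unbound/unbound.conf; a sketch I have not verified:

server:
    # memory allotted to the negative cache; 0 as in the runtime workaround above
    neg-cache-size: 0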

As for dropping all the cache during network move, I am still not convinced that is a justifiable default, as we discussed on the fedora-devel list.

Comment 2 William Brown 2014-04-22 02:32:09 UTC
A flush negative option is a good idea. 

I will follow that workaround. Is that option a permanent setting, or do I need to add it to a configuration file to make it permanent?

Why not make this the default for Fedora, especially for workstations/laptops?

How long is a negative cache entry stored for? IMO, it should be "as short as possible" if you insist on caching it ... 

Consider the situation where I have an internal DNS record with a TTL of, say, 1h. This is not unreasonable on my internal network. Is it reasonable for this record to cross a network boundary, leaving my user with a non-functional service for an hour after they leave work? Split-view DNS may not be a "good thing" (I actually dislike it), but I expect it's not going to vanish in my lifetime.

Alternatively, if you want the cache maintained across network boundaries, why not at least flush the negative cache when crossing one? Consider a network that blocks a website, say Facebook, by faking the zone. We don't want that cached as a user moves.

Finally, this is an example of a cross-network move where maintaining a cache, especially a negative cache, is an issue. Just because I can solve this issue does not mean all users can. Remember, we want users to have a good experience of the Fedora product :)

Comment 3 William Brown 2014-04-22 04:39:49 UTC
Additionally, I do not believe BIND9 caches misses, and it is certainly the benchmark "ISP resolver".

Comment 4 Tomáš Hozza 2014-04-22 07:55:14 UTC
(In reply to William Brown from comment #3)
> Additionally, I do not believe BIND9 caches misses, and this is certainly
> the benchmark of "ISP resolver".

BIND9 DOES cache negative answers! For example, see the "max-ncache-ttl" BIND config
option. By default, negative answers are cached for up to 3 hours...

The reason is to save network traffic and increase the performance.

Comment 5 William Brown 2014-04-22 08:17:58 UTC
My bad for that.

Regardless, BIND is generally installed in a single location. My laptop moves. I shouldn't take negative cache entries with me (well, nor cache entries, but that's not the point of this bug).

Flushing the negative cache on interface change is an appropriate solution for this example.

Comment 6 Pavel Šimerda (pavlix) 2014-04-22 08:25:00 UTC
In my opinion, this is not a problem in unbound but an integration point with the rest of the system, including the tool that handles dynamic network configuration changes. I nominate this bug for NOTABUG and advise the original reporter to follow the current development in dnssec-trigger, NetworkManager and other related tools.

Comment 7 Tomáš Hozza 2014-04-22 08:32:07 UTC
(In reply to William Brown from comment #5)
> Flushing the negative cache on interface change is an appropriate solution
> for this example.

You would need some hook for that (e.g. an NM dispatcher script) which would do
such a thing. Until there is the flush_negative command mentioned by Paul,
you can flush the cache completely as a workaround.
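
A minimal sketch of such a hook (untested; the file name is arbitrary and flush_zone on the root is used as the "flush everything" fallback):

#!/bin/sh
# e.g. /etc/NetworkManager/dispatcher.d/50-unbound-flush
# NetworkManager calls dispatcher scripts with $1 = interface, $2 = action.
case "$2" in
    up|dhcp4-change|dhcp6-change)
        # drop the whole cache and any queries still in flight
        /usr/sbin/unbound-control flush_zone . >/dev/null 2>&1
        /usr/sbin/unbound-control flush_requestlist >/dev/null 2>&1
        ;;
esac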

Comment 8 William Brown 2014-04-22 08:46:16 UTC
At the moment, I am happy to flush the cache completely on network changes; it will resolve another issue I have just raised: https://bugzilla.redhat.com/show_bug.cgi?id=1089910

But really, this should be automated, or part of the default configuration.

Here are the outcomes I see

A) Unbound adds a flush_negative and dnssec-trigger/NM calls it on every interface state change
B) dnssec-trigger calls unbound to flush the whole cache between network interface changes.
C) Unbound's default negative cache timeout is lowered or set to 0.

I obviously prefer B, but C is an acceptable default in my mind.

The point of this bugzilla is not so that I can have workarounds for myself, but to raise that other users may run into these issues. Not everyone is as capable as you or I at solving these issues, and they *will* occur. My aim is to make sure that the default experience is a smooth and correct one for *all* users.

Comment 9 Tomáš Hozza 2014-04-22 08:54:18 UTC
(In reply to William Brown from comment #8)
> At the moment, I am happy to flush the cache completely on network changes:
> It will resolve another issue I have just raised. See
> (https://bugzilla.redhat.com/show_bug.cgi?id=1089910)
> 
> But really, this should be automated, or part of the default configuration.
> 
> Here are the outcomes I see
> 
> A) Unbound adds a flush_negative and dnssec-trigger/NM calls it on every
> interface state change
> B) dnssec-trigger calls unbound to flush the whole cache between network
> interface changes.
> C) Unbound's default negative cache timeout is lowered or set to 0.
> 
> I obviously prefer B, but C is an acceptable default in my mind.

I would also prefer B, until there is A.
 
> The point of the bugzilla is not so that I can have work arounds for my
> self, but to raise that other users may run into these issues. Not everyone
> is as capable as you or I at solving these issues, and they *will* occur. My
> aim is to make sure that the default experience is a smooth and correct one
> for *all* users.

The workaround was meant as a temporary solution for you until there is a proper
resolution. We want to come up with a good and smooth experience for everyone, too ;)

Comment 10 Paul Wouters 2014-04-22 15:35:47 UTC
I am okay with B until there is A.

Changed component to dnssec-trigger

Comment 11 Petr Spacek 2014-04-22 16:28:58 UTC
(In reply to William Brown from comment #2)
> How long is a negative cache entry stored for? IMO, it should be "as short
> as possible" if you insist on caching it ... 
William, see http://tools.ietf.org/html/rfc2308 . This setting is controlled by the DNS zone administrator.

Anyway, I agree with William that B seems a reasonable configuration to begin with.

Personally, I think we can try B (full cache-flush after any network change) for a longer time (e.g. first Fedora release with Unbound installed by default?) to get operational experience with Unbound+NM combo and fine-tune it later.

It will be an improvement even if we flush the cache once in a while. Consider that right now we don't have *any* cache.

Comment 12 Pavel Šimerda (pavlix) 2014-04-22 19:36:34 UTC
In the current script included in dnssec-trigger upstream, any actual DNS configuration change results in a flush of the respective subtree. Please look into that code and/or test the latest dnssec-trigger from git (I'll soon come up with a test build), and let's see how it can be extended to fulfill your requirements.
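
(Roughly speaking, the subtree flush amounts to running something like the following per configured forward zone; the domain and server address here are made up:)

unbound-control forward_add corp.example 192.0.2.53
unbound-control flush_zone corp.example
unbound-control flush_requestlist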

I would like to check whether I understand the use case correctly. I assume that the use case is about an unchanged DNS configuration (namely list of global name servers) while the connectivity is different, i.e. we're asking the same nameservers but get access to different views. Do I understand the test (or use case) correctly?

Comment 13 William Brown 2014-04-23 00:17:49 UTC
When you have a test build please let me know, and I will run it / test it for you. 

The use case here is with moving between networks that have different DNS views. The test is:

Network A cannot resolve entries in private.zone.local. The negative cache is populated.

Move to Network B which can resolve entries in private.zone.local.

Fail case: On network B, the negative cache entry is returned.

Pass case: On network B, the correct entry is returned (The cache was flushed). 


The global name servers change from network A to B for the record.

Comment 14 Petr Spacek 2014-04-23 10:51:59 UTC
IMHO the problem here is that DHCP doesn't provide a full list of locally served domains.

I suspect that this will be the case very often so I advocate for full flush (see comment #10).

Comment 15 Pavel Šimerda (pavlix) 2014-04-23 11:38:31 UTC
(In reply to William Brown from comment #13)
> When you have a test build please let me know, and I will run it / test it
> for you.

Great!

> Network A cannot resolve entries in private.zone.local. The negative cache
> is populated. Move to Network B which can resolve entries in private.zone.local.

That's still not enough information, see Petr's interpretation:

(In reply to Petr Spacek from comment #14)
> IMHO the problem here is that DHCP doesn't provide full list of locally
> served domains.

It's important to know whether we are talking about local subtree configuration or global DNS configuration. The output of `unbound-control list_forwards` before and after the change will help a lot. Or details about NetworkManager active connections.

By the way the current upstream version of the script can be seen here:

http://www.nlnetlabs.nl/svn/dnssec-trigger/trunk/dnssec-trigger-script.in

Comment 16 Paul Wouters 2014-04-23 16:17:29 UTC
What William is saying is that there is no such list. They run too many private domains to hand over in DHCP domain options. They are running many zones with internal/external DNS views. Those records need to go away when changing networks.

So list_forwards won't show the affected domains. This is why he needs a flush_negative and/or a total cache flush.

Comment 17 Paul Wouters 2014-04-23 16:28:17 UTC
(In reply to Petr Spacek from comment #14)

> I suspect that this will be the case very often so I advocate for full flush
> (see comment #10).

"Very often" is a subjective term. For support, 15x is very often. Percentage-wise, I would say it is 0.0001%.

We should really _not_ do a full flush when I leave a trusted network to connect to an untrusted (open wifi) network. That just makes us more vulnerable, because we start without any cache. Applications that re-connect during hotspot mode will be sent to the wrong resolved IP if the hotspot messes with DNS (think of Pidgin, for example). So there is really a price to pay when flushing a cache, which is why I would like to try to prevent this from being done by default.

In William's case the problem is that even if his campus wifi is trusted, he needs the entries gone when the user connects to an open wifi later on. It is an exceptional case, which, in a way, I feel should be addressed by using short TTLs on the publisher side of the internal DNS view.

There is no issue returning from open wifi to trusted campus network. As the network is trusted, we will forward all (".") to their assigned nameserver and in that case, we could clearly perform a cache flush - we're in a trusted environment.

Comment 18 Pavel Šimerda (pavlix) 2014-04-23 16:54:06 UTC
If I understood correctly, we're talking about an entity with an unspecified list of domains which are only resolvable inside. That means William would expect a cache without negative items when just connected to the internal network.

But then another person will come and say that they also have an unspecified list of domains, that the domains are resolvable both inside and outside and that the results differ. Then the same reasoning will lead into clearing the whole cache.

This is pretty much a problem of all multiviewed domain names. A short TTL for the local view will not fix the TTL for the negative answer received outside. A short TTL on both views may not be always feasible. As Paul says, flushing the global cache comes with a price, so we might just want to make it configurable. I'm not sure whether we need to distinguish negative items, though.

Comment 19 Paul Wouters 2014-04-23 17:13:29 UTC
I don't see any performance or privacy issue with flushing the negative cache on a network change. It will only negatively impact _real_ non-existing entries, which, well, don't exist anyway, so it won't affect the user.

Whereas flushing my regular cache will impact all my running applications, especially web browsers (although they might cache _anyway_, ignoring TTL, which is something William has to live with, but as he pointed out, he has shift-reload to clear those).

Comment 20 Tomáš Hozza 2014-04-24 10:56:42 UTC
I agree with Paul that flushing negative answers is way better than flushing the whole cache. Also, in a corner case when the user uses dnssec-trigger and there are no reachable DNSSEC-enabled nameservers, the last resort is to disable DNS lookups and use only already-cached records.

By flushing everything on a network change we would lose this option completely.

Paul, did you write to the unbound upstream about the flush_negative option?
I was not able to find anything regarding this. I think we could contribute such a
change if they are not willing to implement it or don't have time right now.
I just want to make sure it will get done at some point and we will not just
end up discussing.

Comment 21 Pavel Šimerda (pavlix) 2014-04-24 11:03:12 UTC
So, the bug is now assigned to myself as I'm going to contact upstream and prepare the patches. But I don't yet see any clear decision on what exactly needs to be done. Let's summarize what we have:

1) The problem seems to occur in IMO pretty rare situations when multiple DNS views are being used and the list of multiview domains cannot be distributed by DHCP or similar means.

2) The original reporter is concerned about a multiview situation where one view doesn't contain names the other view does. Therefore he's specifically concerned about negative caching.

3) Another situation is possible where the multiview would have different data for the same names. Then the issue wouldn't be specific to negative caching.

4) Paul is concerned about clearing out non-negative cache entries but doesn't object to clearing only negative cache entries. Petr would rather flush the whole cache to solve the problem as described in #3.

5) The multiview is typically implemented by disabling DNSSEC for at least one of the cases. Therefore this setup will probably also require turning off DNSSEC for the respective internal connection. But then when we stop using non-DNSSEC resolution, we should stop using the non-DNSSEC cache entries possibly by flushing the whole cache.

So I think the main topic here is #4 and we seem to have a number of options when leaving a network:

a) leave the cache as is

Can be viewed as a security problem when leaving non-DNSSEC connections, can cause multiview problems in all cases.

b) clear all cache entries

Must be IMO done when leaving non-DNSSEC connection. Could be done for all connections, fixing all multiview issues.

c) clear only negative cache entries

Can be viewed as a security problem when leaving non-DNSSEC connections, can cause multiview problems but would fix this specific one.

It sounds pretty clear that Petr would prefer #a for all while Paul and Tomáš would prefer #c for DNSSEC connections. Did I miss something?

(In reply to Tomas Hozza from comment #20)
> Paul, did you write to the unbound upstream about the flush_negative option?
> I was not able to find anything regarding this. I think we could contribute
> such
> change if they are not willing to implement it or don't have time right now.
> I just want to make sure it will get done at some point and we will not just
> end up discussing.

I can handle the upstream request if you wish.

Comment 22 Pavel Šimerda (pavlix) 2014-04-24 11:04:18 UTC
Also note that non-DNSSEC connections are currently not supported at all but I'm afraid William's use case won't work without it.

Comment 23 Tomáš Hozza 2014-04-24 11:39:17 UTC
(In reply to Pavel Šimerda (pavlix) from comment #21)
> So, the bug is now assigned to myself as I'm going to contact upstream and
> prepare the patches. But I don't yet see any clear decision on what exactly
> needs to be done. Let's summarize what we have:
> 
> 1) The problem seems to occur in IMO pretty rare situations when multiple
> DNS views are being used and the list of multiview domains cannot be
> distributed by DHCP or similar means.
> 
> 2) The original reporter is concerned about a multiview situation where one
> view doesn't contain names the other view does. Therefore he's specifically
> concerned about negative caching.
> 
> 3) Another situation is possible where the multiview would have different
> data for the same names. Then the issue wouldn't be specific to negative
> caching.
> 
> 4) Paul is concerned about clearing out non-negative cache entries but
> doesn't object clearing only negative cache entries. Petr would rather flush
> the whole cache to solve the problem as described in #3.
> 
> 5) The multiview is typically implemented by disabling DNSSEC for at least
> one of the cases. Therefore this setup will probably also require turning
> off DNSSEC for the respective internal connection. But then when we stop
> using non-DNSSEC resolution, we should stop using the non-DNSSEC cache
> entries possibly by flushing the whole cache.

Also, how can we distinguish when to turn off DNSSEC and when to turn it on?

> So I think the main topic here is #4 and we seem to have a number of options
> when leaving a network:
> 
> a) leave the cache as is
> 
> Can be viewed as a security problem when leaving non-DNSSEC connections, can
> cause multiview problems in all cases.
> 
> b) clear all cache entries
> 
> Must be IMO done when leaving non-DNSSEC connection. Could be done for all
> connections, fixing all multiview issues.
> 
> c) clear only negative cache entries
> 
> Can be viewed as a security problem when leaving non-DNSSEC connections, can
> cause multiview problems but would fix this specific one.
> 
> It sounds pretty clear that Petr would prefer #a for all while Paul and

I think you've meant Petr would like #b.

> Tomáš would prefer #c for DNSSEC connections. Did I miss something?
> 
> (In reply to Tomas Hozza from comment #20)
> > Paul, did you write to the unbound upstream about the flush_negative option?
> > I was not able to find anything regarding this. I think we could contribute
> > such
> > change if they are not willing to implement it or don't have time right now.
> > I just want to make sure it will get done at some point and we will not just
> > end up discussing.
> 
> I can handle the upstream request if you wish so.

Please do so.

Comment 24 Petr Spacek 2014-04-24 12:15:45 UTC
(In reply to Tomas Hozza from comment #23)
> (In reply to Pavel Šimerda (pavlix) from comment #21)
> > b) clear all cache entries
> > 
> > Must be IMO done when leaving non-DNSSEC connection. Could be done for all
> > connections, fixing all multiview issues.
> > 
> > It sounds pretty clear that Petr would prefer #a for all while Paul and
> 
> I think you've meant Petr would like #b.
Yes, for now I would prefer #b - flush the cache completely after any change in network topology.

Please note that I'm not saying that it should be like that forever. I just don't believe that we need the optimal solution from the beginning. IMHO it is much more important to have DNSSEC validation running on clients ASAP.

From my point of view, the cache is "an" optimization. DNS without DNSSEC validation has to be treated as untrusted in all cases, and cache preservation between networks can't change that. (I agree that it can save some corner cases, but I don't think it justifies this whole effort.)

I prefer functional solution (with non-optimal performance) ASAP instead of theoretically-functional-and-optimal solution later.

Comment 25 Pavel Šimerda (pavlix) 2014-04-24 18:39:18 UTC
(In reply to Petr Spacek from comment #24)
> From my point of view, cache is "a" optimization.

I agree that local DNS caching can be viewed as an optimization and solution #b is a valid option and indeed the best one from the multiview safety point of view.

> I prefer functional solution (with non-optimal performance) ASAP instead of
> theoretically-functional-and-optimal solution later.

+1

Comment 26 William Brown 2014-04-27 23:56:15 UTC
> > > b) clear all cache entries
> > > 
> > > Must be IMO done when leaving non-DNSSEC connection. Could be done for all
> > > connections, fixing all multiview issues.

What's to say that a DNSSEC view doesn't have split-views (there is an RFC somewhere about enabling this.)

A better solution perhaps is to be able to distinguish between flushing DNSSEC-validated entries and non-DNSSEC entries. I don't know enough about DNSSEC split views to know if ...


> > > 
> > > It sounds pretty clear that Petr would prefer #a for all while Paul and
> > 
> > I think you've meant Petr would like #b.
> Yes, for now I would prefer #b - flush the cache completely after any change
> in network topology.
> 

Yes, thank you. This is my preferred outcome also.

> Also note that non-DNSSEC connections are currently not supported at all but
> I'm afraid William's use case won't work without it.

Correct.

Comment 27 Pavel Šimerda (pavlix) 2014-04-28 08:13:29 UTC
(In reply to William Brown from comment #26)
> > > > b) clear all cache entries
> > > > 
> > > > Must be IMO done when leaving non-DNSSEC connection. Could be done for all
> > > > connections, fixing all multiview issues.
> 
> What's to say that a DNSSEC view doesn't have split-views (there is an RFC
> somewhere about enabling this.)
> 
> A better solution perhaps is to be able to make a distinction of flushing
> DNSSEC-entries and non-SECed entries. I don't know enough about DNSSEC split
> view to know if

Thank you for pointing this out.

Paul/Tomáš: Any comments on DNSSEC split view?

> > > I think you've meant Petr would like #b.
> > Yes, for now I would prefer #b - flush the cache completely after any change
> > in network topology.
> > 
> 
> Yes, thank you. This is my perferred outcome also.
> 
> > Also note that non-DNSSEC connections are currently not supported at all but
> > I'm afraid William's use case won't work without it.
> 
> Correct.

Please (others) also see:

https://bugzilla.redhat.com/show_bug.cgi?id=1089910#c4

If we want to address that, we need to flush the cache when leaving any network. This is also the easiest solution that can later be improved. I have no objections against making it configurable and/or improving it to explicitly keep DNSSEC validated information in the cache (needs support in unbound).

Comment 28 Petr Spacek 2014-04-28 10:16:25 UTC
(In reply to Pavel Šimerda (pavlix) from comment #27)
> (In reply to William Brown from comment #26)
> > > > > b) clear all cache entries
> > > > > 
> > > > > Must be IMO done when leaving non-DNSSEC connection. Could be done for all
> > > > > connections, fixing all multiview issues.
> > 
> > What's to say that a DNSSEC view doesn't have split-views
This is factually incorrect. DNSSEC doesn't limit your ability to use 'views', you just need to be more careful about it.

> > (there is an RFC
> > somewhere about enabling this.)
http://tools.ietf.org/html/draft-krishnaswamy-dnsop-dnssec-split-view

This RFC doesn't "enable" anything. It simply describes what you need to do to configure it correctly.

However, careful reading can reveal some cases where VPNs need special handling. (E.g. public/internal versions of a zone are signed/unsigned, or signed with key1/key2, etc.)

...
> If we want to address that, we need to flush the cache when leaving any
> network. This is also the easiest solution that can later be improved. I
I 100 % agree.

> have no objections against making it configurable and/or improving it to
> explicitly keep DNSSEC validated information in the cache (needs support in
> unbound).
Personally, I would not spend time on any optimization now. We will see how it works with the full-flush approach and then we can start to think about optimizations...

Comment 29 Tomáš Hozza 2014-04-28 12:21:18 UTC
I agree with Petr's comment #28. We should come up with a generally usable
'proof of concept' and optimize it later as needed.

So I agree with flushing everything on network connection change for now.

Comment 30 Paul Wouters 2014-04-29 17:22:45 UTC
commit 3125 in upstream unbound implements flush_negative, so should be in the next release.

Comment 31 Pavel Šimerda (pavlix) 2014-04-29 20:11:31 UTC
(In reply to Paul Wouters from comment #30)
> commit 3125 in upstream unbound implements flush_negative, so should be in
> the next release.

Thanks!

Comment 32 Pavel Šimerda (pavlix) 2014-06-06 15:09:40 UTC
This has been (hopefully) fixed in 0.12, now available in rawhide; dnssec-trigger-script now flushes the whole cache on each change of the dynamically configured list of DNS servers. If necessary, please start a separate bug report for issues caused by excessive cache flushing and link to this bug report so that it's taken into account.

Comment 33 Pavel Šimerda (pavlix) 2014-06-06 15:10:48 UTC
As this bug report was filed for Fedora 20, changing to MODIFIED until an update to F20 is released.

Comment 34 William Brown 2014-06-16 03:18:56 UTC
This can still cause issues even with a full flush. I have observed the following race condition.

On a network with slow DHCP:

* Ethernet is connected
* NetworkManager signals that the network is available
* Firefox etc. try to send DNS queries. As the DHCP lease isn't completed yet, unbound caches a negative entry.
* DHCP leasing completes, and DNS is available.
* Certain websites etc. continue to fail because of the cached negative entry.

This causes issues on slow networks, wireless with low signal etc.

Again, the full flush helps, but caching negative entries for clients can have awful effects. Can the default configuration be to minimise the negative cache lifetime, to prevent this issue where possible?

Comment 35 William Brown 2014-06-16 03:25:19 UTC
At the moment, the only way I can see to do this would be to set:

neg-cache-size: 0m

If that is a valid configuration.

Comment 36 William Brown 2014-06-16 04:09:17 UTC
In fact, combined with prefetching, you can very quickly have a cache completely populated with negative entries because of this race condition.

Comment 37 Petr Spacek 2014-06-16 07:08:25 UTC
Or you can simply flush the cache when a new IP address is assigned to the interface ...

Comment 38 Paul Wouters 2014-06-16 16:56:17 UTC
Of course, this problem exists partially _because_ there is a flush of real cached DNS data :/

We should add a hook to run unbound-control flush_negative and flush_requestlist when we gain an IP address.
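
For example (a sketch, assuming an unbound build that already has the new command), the hook could run the following on the up / dhcp4-change / dhcp6-change dispatcher events:

unbound-control flush_negative
unbound-control flush_requestlist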

Comment 39 William Brown 2014-06-16 23:29:54 UTC
(In reply to Petr Spacek from comment #37)
> Or you can simply flush the cache when a new IP address is assigned to the
> interface ...

Yes, I can, and it's what I have been forced to do. But can most "normal users" do this? I have already covered why negative caching is bad, and this again highlights that it should be disabled or kept minimal in duration.

(In reply to Paul Wouters from comment #38)
> Of course, this problem exists partially _because_ there is a flush of real
> cached DNS data :/

But the flush of the real DNS data needs to happen anyway, for all the reasons I have already discussed. 

> 
> We should add a hook to run unbound-control flush_negative and
> flush_requestlist when we gain an IP address.

Sure, but this still needs to run a full flush as already discussed, even if a flush_negative tool becomes available. IMO, doing the flush *after* the IP has been leased is the correct timing, to prevent this issue.

Comment 40 Petr Spacek 2014-06-17 07:48:05 UTC
(In reply to William Brown from comment #39)
> (In reply to Petr Spacek from comment #37)
> > Or you can simply flush cache when new IP address is assigned to the
> > interface ...
> 
> Yes, I can, and it's what I have been forced to do: But can most "normal
> users" do this? I have already covered why negative caching is bad, and this
> again highlights that it should be disabled or minimal in time.

I'm sorry for not being clear. My comment was meant as advice for implementers, not to users :-)

Comment 41 William Brown 2014-06-18 03:29:28 UTC
> I'm sorry for not being clear. My comment was meant as advice for
> implementers, not to users :-)

I apologise for sounding rude; it wasn't my intent. I have just gotten used to a certain attitude, with regard to this topic, that has been directed at me.

Comment 42 Pavel Šimerda (pavlix) 2014-06-18 06:46:12 UTC
(In reply to William Brown from comment #41)
> > I'm sorry for not being clear. My comment was meant as advice for
> > implementers, not to users :-)
> 
> I apologise for sounding rude,

Not at all. I should have taken a couple more seconds to state explicitly that I'm talking about what the script should do. The rawhide version of dnssec-trigger now flushes the whole cache on changes, which should solve the problem. On the other hand, we have a couple of other issues, so we're not yet updating Fedora 20.

Comment 43 Fedora Update System 2014-06-30 15:30:06 UTC
dnssec-trigger-0.12-12.fc20 has been submitted as an update for Fedora 20.
https://admin.fedoraproject.org/updates/dnssec-trigger-0.12-12.fc20

Comment 44 Fedora Update System 2014-07-01 07:21:46 UTC
Package dnssec-trigger-0.12-12.fc20:
* should fix your issue,
* was pushed to the Fedora 20 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing dnssec-trigger-0.12-12.fc20'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2014-7942/dnssec-trigger-0.12-12.fc20
then log in and leave karma (feedback).

Comment 45 Fedora Update System 2014-08-11 11:40:42 UTC
dnssec-trigger-0.12-13.fc20 has been submitted as an update for Fedora 20.
https://admin.fedoraproject.org/updates/dnssec-trigger-0.12-13.fc20

Comment 46 William Brown 2014-08-21 00:59:29 UTC
I must again, re-open this issue.

Unbound should not cache missed entries, i.e. negative caching should be disabled.

Over time I have noticed that dropped packets on wireless, brief internet outages, etc. are being cached by unbound. As a result of these external influences, my browsing/working experience has been affected.

Unbound should not cache negative entries as it is unable to determine the difference between a lost packet and a true negative lookup.

Comment 47 Petr Spacek 2014-08-21 06:31:58 UTC
William, there is a clear distinction (at the DNS protocol level) between a negative reply, "no reply" (a lost answer), and a "server failure" reply. If you are able to reproduce the bug, please file it against the unbound component. It is not dnssec-trigger's business at all.

Comment 48 Tomáš Hozza 2014-08-21 07:34:23 UTC
(In reply to William Brown from comment #46)
> I must again, re-open this issue.
> 
> Unbound should not cache missed entries. IE negative caching should be
> disabled.

Caching negative responses is completely valid. You are welcome to configure unbound not to cache negative responses.

> Over time I have noticed that dropped packets on wireless, brief internet
> outages etc are being cached by unbound. As a result of these external
> influences my browsing / working experience has been affected. 
> 
> Unbound should not cache negative entries as it is unable to determine the
> difference between a lost packet and a true negative lookup.

If this is the case, then I agree that lost answers should not be cached.
In this case please file a bug against unbound as Petr already suggested.

Thank you.

Comment 49 Paul Wouters 2014-08-21 12:36:58 UTC
It would be good if we could reduce the negative cache lifetime for these dnssec-triggerd end-user deployments of unbound. I have also at times noticed that it can cause DNS failures, especially when many packets get dropped at first and unbound marks a chunk of the tree as undeterminable and SERVFAILs on it.

Clearly, negative caching cannot be fully disabled, or else we would run the risk of sending continuous queries for some badly configured zones (e.g. ones with unreachable NS entries).

But this is something upstream should give us a new option for, one that we can set using unbound-control.
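
As a sketch of the kind of knob I mean (the option name below is hypothetical, not an existing unbound setting at the time of writing):

server:
    # hypothetical option: cap the TTL applied to cached negative answers
    cache-max-negative-ttl: 30

We could then lower it at runtime via unbound-control set_option on dnssec-triggerd hosts.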

Comment 50 Pavel Šimerda (pavlix) 2014-08-21 15:22:05 UTC
I think there's a *huge* difference between a servfail and an authoritative negative answer. In my opinion a servfail should not be treated as a cacheable event at all and the only valid reason not to ask again is rate limiting.

Comment 51 Fedora Update System 2014-09-19 10:07:02 UTC
dnssec-trigger-0.12-13.fc20 has been pushed to the Fedora 20 stable repository.  If problems still persist, please make note of it in this bug report.

