1082041 – IPv6: connectivity broken, router advertisements with preferred_lft == zero are ignored, network stack uses deprecated address for outgoing connection

Bug 1082041 - IPv6: connectivity broken, router advertisements with preferred_lft == zero are ignored, network stack uses deprecated address for outgoing connection

Summary: IPv6: connectivity broken, router advertisements with preferred_lft == zero a...

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	NetworkManager
Sub Component:
Version:	20
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	unspecified
Target Milestone:	---
Assignee:	Thomas Haller
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	1083283
TreeView+	depends on / blocked

Reported:	2014-03-28 14:20 UTC by Kai Engert (:kaie) (inactive account)
Modified:	2014-06-15 01:51 UTC (History)
CC List:	4 users (show)
Fixed In Version:	NetworkManager-0.9.9.0-39.git20131003.fc20
Clone Of:
Clones:	1083283 (view as bug list)
Environment:
Last Closed:	2014-06-15 01:51:55 UTC
Type:	Bug
Embargoed:
Dependent Products:

Attachments	(Terms of Use)
patch v1 (1.52 KB, patch) 2014-03-31 21:57 UTC, Kai Engert (:kaie) (inactive account)	no flags	Details \| Diff
View All

Description Kai Engert (:kaie) (inactive account) 2014-03-28 14:20:56 UTC

Introduction:
=============
The ISP that natively connects my home network to IPv6 uses "temporary" prefixes.

This means, each time the router gets restarted, or each time the connections from the router to the ISP gets dropped, the ISP will assign a new, random /56 prefix, with prefix delegation.

I restarted my router multiple times during the past 24 hours, and thus, multiple different ipv6 prefixes had been used during that time.

Even after a restart of a Fedora 20 system, the network stack still learns about the old prefixes (and the previous ipv6 addresses), as shown by various commands such as "ifconfig".

Using 
  tcpdump -i eth0 -s 1500 icmp6 -v
it can be seens, it's the router that still advertises the old prefixes.


Problem:
========
As soon as the Fedora system "learns" about the "old" prefixes (the ones that are no longer valid for outside connectivity), the IPv6 connectivity to the outside world is broken.

ping6 to known hosts fails.

I can temporarily "trick" the network environment to work again, by running multiple
  ifconfig $DEV del $prefix/$size 
commands, to remove all the old prefixes.

Then ping6 works again. However, after a few moments, the Fedora network learns about the prefixes again from advertisements, and ping6 stops working.

(ping6 is just an example. All external ipv6 communication is broken.)


Initial analysis:
=================
The problem cannot be reproduced on other Linux distributions. Despite ifconfig showing a variety of ipv6 addresses related to old prefixes, ping6 works.

Using tcpdump, I was able to see that one of the old IP addresses is being used for the external connection.

I found a potential cause for the problem, which is a difference in the output of command "ip -6 a s".

On Fedora 20, all prefixes, old and current, are shown as
  "scope global dynamic".

However, on a different Linux distribution, all of the old prefixes are shown as "scope deprecated global dynamic".


My guess: The network stack on Fedora 20 fails to learn that some of the announced prefixes are deprecated, and continues to use them.

Detection of ipv6 prefixes classified as "deprecated" should be added in Fedora, with the hope that the network stack would use the non-deprecated ip address for outgoing connections, only, and fix the problem I'm experiencing.

Comment 1 Kai Engert (:kaie) (inactive account) 2014-03-28 16:03:34 UTC

I tried to look at the router advertisement message, which includes the multiple valid prefixes. I'm trying to understand which attribute in the advertisement can be used to decide which of the prefixes are of status "deprecated".

I can see that the old prefixes are advertised with a "prefered time" of "0 s", and the current one is advertised with a nonzero value (I see values between 1200 and 1800 seconds).

Should the code that consumes these announcements be changed to mark a prefix as deprecated, as soon as it is being advertised by a router with a prefered time of zero seconds?

(I'm only guessing, I don't have any real knowledge. This is an attempt to provide helpful information that might enable you to make progress on this issue.)

Comment 2 Kai Engert (:kaie) (inactive account) 2014-03-31 17:51:38 UTC

I've found this article:
http://www.davidc.net/networking/ipv6-source-address-selection-linux

I've learned that the deprecated status is related to the preferred_lft setting of an ipv6 address. If that value is zero, the address is shown as deprecated by the "ip -6 a s" command.

After executing
    ip -6 addr change OLD_ADDR/LEN dev $DEV preferred_lft 0
on all the old addresses,
"ip -6 a s" shows status deprecated for all the old addresses, and I can successfully use ping6.

Moments later I received a router advertisement. It advertised one of the old addresses, with valid time > 0, and pref time == 0.

Nevertheless, despite the announcement of pref time == 0, status "deprecated" disappears from all the old addresses listed by "ip -6 a s". The pref time is set to a value identical to valid time.

I think the handling of the advertised prefered times is broken.

Comment 3 Kai Engert (:kaie) (inactive account) 2014-03-31 19:09:47 UTC

After talking to jbenc, we believe the kernel processing is correct, and NetworkManager handles it incorrectly.

When disabling NetworkManager and configuring 
    sysctl net.ipv6.conf.$(DEV).accept_ra=1
and after receiving the router advertisement,
the output of "ip -6 a s"
correctly shows the old prefix with an preferred_lft==0 and status deprecated.

Comment 4 Kai Engert (:kaie) (inactive account) 2014-03-31 19:13:12 UTC

jbenc also argued that the router shouldn't send out any more advertisements with valid>0 and preferred=0.

However, IMHO it should still be handled correctly by NM, because it works fine on other distributions.

Furthermore, it was argued the router should have routed Fedora's ipv6 connection attempts, that use the old IP addresses for the source address, and it's a bug that the router doesn't.

jbenc pointed to:
http://tools.ietf.org/html/rfc4862#section-5.5.4
and
https://tools.ietf.org/html/rfc4861#section-6.3.4

Comment 5 Jiri Benc 2014-03-31 19:29:12 UTC

(In reply to Kai Engert (:kaie) from comment #4)
> jbenc also argued that the router shouldn't send out any more advertisements
> with valid>0 and preferred=0.

It can certainly send them but it has to be prepared to route the resulting addresses correctly for the rest of their valid lifetime.

> Furthermore, it was argued the router should have routed Fedora's ipv6
> connection attempts, that use the old IP addresses for the source address,
> and it's a bug that the router doesn't.

Correct.

But the addresses should be marked as deprecated anyway, the fact the router is broken is a different issue that should be addressed by its vendor (or the internet provider if it's a misconfiguration on their side, which is possible as well).

Comment 6 Kai Engert (:kaie) (inactive account) 2014-03-31 21:57:33 UTC

Created attachment 881113 [details]
patch v1

This patch seems to work for me on the initial testing. I'll test it more tomorrow.

Comment 7 Kai Engert (:kaie) (inactive account) 2014-04-01 10:46:16 UTC

In addition to the proposed bugfix, I have another question, regarding code in function nm_platform_ip6_address_sync.

It seems the preferred_lft, that was received in the advertisement, is never used by nm_platform_ip6_address_sync.

We have these lines:
        lifetime = subtract_guint32 (known_address->lifetime, shift);
        preferred = subtract_guint32 (known_address->lifetime, shift);

Is that correct? Or should the second line be:
        preferred = subtract_guint32 (known_address->preferred, shift);

Comment 8 Dan Williams 2014-04-01 19:26:51 UTC

Kernel RA code does this:

	valid_lft = ntohl(pinfo->valid);
	prefered_lft = ntohl(pinfo->prefered);

	if (prefered_lft > valid_lft) {
		net_warn_ratelimited("addrconf: prefix option has invalid lifetime\n");
		return;
	}

so yeah, it seems NM doesn't do the right thing here.  Thomas/Pavel, does Kai's change look right?

Comment 9 Kai Engert (:kaie) (inactive account) 2014-04-01 19:37:23 UTC

FYI, the issue mentioned in comment 7 can be seen in the equivalent ipv4 function, too.

Comment 10 Dan Williams 2014-04-01 19:38:24 UTC

I think we should enforce the "preferred <= lifetime" restriction in nm-lndp-rdisc.c, eg:

	address.lifetime = ndp_msg_opt_prefix_valid_time (msg, offset);
	address.preferred = ndp_msg_opt_prefix_preferred_time (msg, offset);
+	if (address.preferred > address.lifetime)
+		address.preferred = address.lifetime;

	fill_address_from_mac (&address.address, lladdr);

Also I think the preferred time in nm-platform.c for both IPv4 and IPv6 shouldn't be padded with the result of the lifetime shift.  It should either get it's own shift, or it should just be left alone entirely.  Thoughts?

Comment 11 Kai Engert (:kaie) (inactive account) 2014-04-01 19:45:51 UTC

FWIW, I had initially attempted to fix the reported issue by making the change suggested in comment 7. But that didn't fix my scenario.

I cannot talk about the shift, but with my limited understanding, I think the scenario with "preferred == 0" should be handled specially, to ensure those addresses get immediately marked as deprecated.

Comment 12 Thomas Haller 2014-04-01 21:49:56 UTC

I am also not sure, what the proper fix to this is.

Do we really need the padding of 5 seconds? And if we pad, what about preferred==0?


What do you think about the commits:

  platform: fix handling of preferred address time
  rdisc: limit the preferred time to address lifetime

on branch th/bgo727382_platform_fix_addr_lifetime?

@Dan, it does also enforce the "preferred <= lifetime" restriction" you suggested.
@Kai, does this branch fix your scenario?

Comment 13 Pavel Šimerda (pavlix) 2014-04-02 07:16:43 UTC

Padding by 5 seconds was I think Dan Williams' requirement to ensure that the configuration items are always handled by NetworkManager first, before the kernel. They should IMO be reconsidered and the reasons properly documented in the code. I tend to think it's useful to pad the valid lifetime and it's IMO optional to pad the preferred lifetime which is handled by the kernel. On the other hand, if NetworkManager wanted to react to the loss of preferred lifetime in time, it might be useful.

Comment 14 Pavel Šimerda (pavlix) 2014-04-02 07:47:17 UTC

(In reply to Kai Engert (:kaie) from comment #7)
> In addition to the proposed bugfix, I have another question, regarding code
> in function nm_platform_ip6_address_sync.
> 
> It seems the preferred_lft, that was received in the advertisement, is never
> used by nm_platform_ip6_address_sync.
> 
> We have these lines:
>         lifetime = subtract_guint32 (known_address->lifetime, shift);
>         preferred = subtract_guint32 (known_address->lifetime, shift);
> 
> Is that correct? Or should the second line be:
>         preferred = subtract_guint32 (known_address->preferred, shift);

Yep, this is probably the bug you are experiencing and should be fixed. The lifetimes should never be mixed (except enforcing the relation as dcbw suggested).

From thaller's branch:

> This is a hack to fix invalid lifetime (valid and prefered) of
> addresses. Platform currently caches libnl objects (struct rtnl_addr),
> which natively have relative lifetimes without start timestamp.

Erm... there's another bug:

0209db41        (Pavel Šimerda  2013-03-27 22:23:24 +0100       877)    address->ifindex = rtnl_addr_get_ifindex (rtnladdr);
0209db41        (Pavel Šimerda  2013-03-27 22:23:24 +0100       878)    address->plen = rtnl_addr_get_prefixlen (rtnladdr);
0535aa44        (Thomas Haller  2013-12-10 19:04:22 +0100       879)    address->timestamp = nm_utils_get_monotonic_timestamp_s ();
f121995f        (Pavel Šimerda  2013-06-29 11:30:11 +0200       880)    address->lifetime = rtnl_addr_get_valid_lifetime (rtnladdr);
f121995f        (Pavel Šimerda  2013-06-29 11:30:11 +0200       881)    address->preferred = rtnl_addr_get_preferred_lifetime (rtnladdr);

The address->timestamp is supposed to be the date and time of the address itself, not from the time where the conversion occurs.

Please look at:

rtnl_addr_get_create_time (struct rtnl_addr *addr)

I don't think it's a good idea to actually modify the cached data at all. In my opinion both lifetimes should be read only and upon read should be adjusted according to the age of the cached object.

The change of the zero lifetime semantics is a big one, so please double check no code uses lifetime=0 preferred=0 for static addresses as it was before.

Comment 15 Thomas Haller 2014-04-02 08:24:09 UTC

(In reply to Pavel Šimerda (pavlix) from comment #14)
> (In reply to Kai Engert (:kaie) from comment #7)
> > In addition to the proposed bugfix, I have another question, regarding code
> > in function nm_platform_ip6_address_sync.
> > 
> > It seems the preferred_lft, that was received in the advertisement, is never
> > used by nm_platform_ip6_address_sync.
> > 
> > We have these lines:
> >         lifetime = subtract_guint32 (known_address->lifetime, shift);
> >         preferred = subtract_guint32 (known_address->lifetime, shift);
> > 
> > Is that correct? Or should the second line be:
> >         preferred = subtract_guint32 (known_address->preferred, shift);
> 
> Yep, this is probably the bug you are experiencing and should be fixed. The
> lifetimes should never be mixed (except enforcing the relation as dcbw
> suggested).
> 
> From thaller's branch:
> 
> > This is a hack to fix invalid lifetime (valid and prefered) of
> > addresses. Platform currently caches libnl objects (struct rtnl_addr),
> > which natively have relative lifetimes without start timestamp.
> 
> Erm... there's another bug:
> 
> 0209db41        (Pavel Šimerda  2013-03-27 22:23:24 +0100       877)   
> address->ifindex = rtnl_addr_get_ifindex (rtnladdr);
> 0209db41        (Pavel Šimerda  2013-03-27 22:23:24 +0100       878)   
> address->plen = rtnl_addr_get_prefixlen (rtnladdr);
> 0535aa44        (Thomas Haller  2013-12-10 19:04:22 +0100       879)   
> address->timestamp = nm_utils_get_monotonic_timestamp_s ();
> f121995f        (Pavel Šimerda  2013-06-29 11:30:11 +0200       880)   
> address->lifetime = rtnl_addr_get_valid_lifetime (rtnladdr);
> f121995f        (Pavel Šimerda  2013-06-29 11:30:11 +0200       881)   
> address->preferred = rtnl_addr_get_preferred_lifetime (rtnladdr);
> 
> The address->timestamp is supposed to be the date and time of the address
> itself, not from the time where the conversion occurs.

I think this is not necessary. The timestamp is there to get absolute times (from the relative lifetime/preferred times). But anyway, I will change that (see below).


> Please look at:
> 
> rtnl_addr_get_create_time (struct rtnl_addr *addr)
> 
> I don't think it's a good idea to actually modify the cached data at all. In
> my opinion both lifetimes should be read only and upon read should be
> adjusted according to the age of the cached object.

In general yes. I couldn't come up with a better solution. But indeed there is a better solution:


rtnl_addr_get_create_time() and rtnl_addr_get_last_update_time() return the absolute timestamps of when the address was created/last_updated. However, preferred and lifetime counts down as time goes by.

Eg. reading the address at timestamp 20, you might get 
create_time = 10, update_time = 20, valid = 100, preferred = 50

reading it again at timestamp 30, you will get:
create_time = 10, update_time = 20, valid = 90, preferred = 40


So, the solution is, the use update_time and clock_gettime() to compute
valid = valid + (click_gettime(CLOCK_BOOTTIME(??)) - update_time)

(or something similar. Patch follows...)




> The change of the zero lifetime semantics is a big one, so please double
> check no code uses lifetime=0 preferred=0 for static addresses as it was
> before.

fully agree on that. But I think the change makes sense (provided, that all components set valid lifetimes).

Comment 16 Pavel Šimerda (pavlix) 2014-04-02 10:08:09 UTC

(In reply to Thomas Haller from comment #15)
> (In reply to Pavel Šimerda (pavlix) from comment #14)
> > From thaller's branch:
> > 
> > > This is a hack to fix invalid lifetime (valid and prefered) of
> > > addresses. Platform currently caches libnl objects (struct rtnl_addr),
> > > which natively have relative lifetimes without start timestamp.
> > 
> > Erm... there's another bug:
> > 
> > address->timestamp = nm_utils_get_monotonic_timestamp_s ();
> > 
> > The address->timestamp is supposed to be the date and time of the address
> > itself, not from the time where the conversion occurs.
> 
> I think this is not necessary.

The meaning of address->timestamp is to provide the libnl object timestamp to anchor the relative times. It's not the only way to do that but I believed it's the easiest way at the time of writing the code and I still do.

> The timestamp is there to get absolute times

An alternative is to store the absolute times themselves but that's more tricky as you have to store INFINITY somehow and it involves more hackery. On the other hand, the lifetimes have the INFINITY value specified by the RFC and they always fit in uint32_t, while the timestamp is an actual time stamp of the object without any special cases.

> > Please look at:
> > 
> > rtnl_addr_get_create_time (struct rtnl_addr *addr)
> > 
> > I don't think it's a good idea to actually modify the cached data at all. In
> > my opinion both lifetimes should be read only and upon read should be
> > adjusted according to the age of the cached object.
> 
> In general yes. I couldn't come up with a better solution. But indeed there
> is a better solution:
> 
> 
> rtnl_addr_get_create_time() and rtnl_addr_get_last_update_time() return the
> absolute timestamps of when the address was created/last_updated. However,
> preferred and lifetime counts down as time goes by.

I believe the former one returns the age of the *address object*, not the *kernel address* and I designed the code around this assumption. Please correct me if I'm wrong.

Therefore during the lifetime of the object, its address lifetimes don't change at all and they can be always interpretted using the object timestamp.

> Eg. reading the address at timestamp 20, you might get 
> create_time = 10, update_time = 20, valid = 100, preferred = 50
> 
> reading it again at timestamp 30, you will get:
> create_time = 10, update_time = 20, valid = 90, preferred = 40h

I believe the objects in the cache are *never modified*, therefore the timestamp and the lifetimes are always consistent. When a kernel change is accepted, the object is replaced with a new object, resulting in a new timestamp but also new values of lifetimes consistent with the new timestamp.

> So, the solution is, the use update_time and clock_gettime() to compute
> valid = valid + (click_gettime(CLOCK_BOOTTIME(??)) - update_time)

In my opinion, the solution is just to fill in the timestamp correctly, which was probably an omission on my side. The rest of the code seems to be correct (except the bug found by kaie) and up to the design I described above.

> > The change of the zero lifetime semantics is a big one, so please double
> > check no code uses lifetime=0 preferred=0 for static addresses as it was
> > before.
> 
> fully agree on that. But I think the change makes sense (provided, that all
> components set valid lifetimes).

I have no objections.

Comment 17 Pavel Šimerda (pavlix) 2014-04-02 10:19:42 UTC

(In reply to Pavel Šimerda (pavlix) from comment #16)
> > > The change of the zero lifetime semantics is a big one, so please double
> > > check no code uses lifetime=0 preferred=0 for static addresses as it was
> > > before.
> > 
> > fully agree on that. But I think the change makes sense (provided, that all
> > components set valid lifetimes).
> 
> I have no objections.

Just a note, don't take it as an opinion on how it should be. The original reason to treat valid=0 preferred=0 as infinity was to allow nm_platform_*_address_sync() to handle lists of addresses where lifetimes are omitted (because the objects are cleared, they default to zero).

Comment 18 Thomas Haller 2014-04-02 15:43:07 UTC

(In reply to Pavel Šimerda (pavlix) from comment #16)
> (In reply to Thomas Haller from comment #15)
> > (In reply to Pavel Šimerda (pavlix) from comment #14)
> > > From thaller's branch:
> > > 
> > > > This is a hack to fix invalid lifetime (valid and prefered) of
> > > > addresses. Platform currently caches libnl objects (struct rtnl_addr),
> > > > which natively have relative lifetimes without start timestamp.
> > > 
> > > Erm... there's another bug:
> > > 
> > > address->timestamp = nm_utils_get_monotonic_timestamp_s ();
> > > 
> > > The address->timestamp is supposed to be the date and time of the address
> > > itself, not from the time where the conversion occurs.
> > 
> > I think this is not necessary.
> 
> The meaning of address->timestamp is to provide the libnl object timestamp
> to anchor the relative times. It's not the only way to do that but I
> believed it's the easiest way at the time of writing the code and I still do.

Yes, I meant exactly that. Sorry for imprecise wording.

> > The timestamp is there to get absolute times
> 
> An alternative is to store the absolute times themselves but that's more
> tricky as you have to store INFINITY somehow and it involves more hackery.
> On the other hand, the lifetimes have the INFINITY value specified by the
> RFC and they always fit in uint32_t, while the timestamp is an actual time
> stamp of the object without any special cases.
>
> > > Please look at:
> > > 
> > > rtnl_addr_get_create_time (struct rtnl_addr *addr)
> > > 
> > > I don't think it's a good idea to actually modify the cached data at all. In
> > > my opinion both lifetimes should be read only and upon read should be
> > > adjusted according to the age of the cached object.
> > 
> > In general yes. I couldn't come up with a better solution. But indeed there
> > is a better solution:
> > 
> > 
> > rtnl_addr_get_create_time() and rtnl_addr_get_last_update_time() return the
> > absolute timestamps of when the address was created/last_updated. However,
> > preferred and lifetime counts down as time goes by.
> 
> I believe the former one returns the age of the *address object*, not the
> *kernel address* and I designed the code around this assumption. Please
> correct me if I'm wrong.
> 
> Therefore during the lifetime of the object, its address lifetimes don't
> change at all and they can be always interpretted using the object timestamp.
> 
> > Eg. reading the address at timestamp 20, you might get 
> > create_time = 10, update_time = 20, valid = 100, preferred = 50
> > 
> > reading it again at timestamp 30, you will get:
> > create_time = 10, update_time = 20, valid = 90, preferred = 40h
> 
> I believe the objects in the cache are *never modified*, therefore the
> timestamp and the lifetimes are always consistent. When a kernel change is
> accepted, the object is replaced with a new object, resulting in a new
> timestamp but also new values of lifetimes consistent with the new timestamp.
> 
> > So, the solution is, the use update_time and clock_gettime() to compute
> > valid = valid + (click_gettime(CLOCK_BOOTTIME(??)) - update_time)
> 
> In my opinion, the solution is just to fill in the timestamp correctly,
> which was probably an omission on my side. The rest of the code seems to be
> correct (except the bug found by kaie) and up to the design I described
> above.


Originally, you set the address->timestamp to rtnl_addr_get_create_time().
Then I found something ill, and changed it to "now" (see http://cgit.freedesktop.org/NetworkManager/NetworkManager/commit/?id=177c767320578c490ff94e723115d830d1753ae1
http://cgit.freedesktop.org/NetworkManager/NetworkManager/commit/?id=0535aa44db419eae77c77a07012644e732690ac1 )

However, this change was still wrong, because the valid/preferred times in the libnl object are not anchored by create_time() or last_update_time(). Instead the are counting from when the kernel sends the message.

So my previous change fixed it when receiving a new address, but not when accessing an older address from the cache.


Repushed the branch. Now it should be correct...

Comment 19 Kai Engert (:kaie) (inactive account) 2014-04-02 17:18:51 UTC

thaller gave me RPMs for fedora 20, and I confirm that code fixes my ipv6 issue. Thanks for your work on it!

(we need to fix an apparent openvpn regression, but I'll work with thaller on that by email)

Comment 20 Thomas Haller 2014-04-02 20:40:42 UTC

(In reply to Kai Engert (:kaie) from comment #19)
> thaller gave me RPMs for fedora 20, and I confirm that code fixes my ipv6
> issue. Thanks for your work on it!
> 
> (we need to fix an apparent openvpn regression, but I'll work with thaller
> on that by email)

I think, openvpn was broken to to the change

> platform: fix handling of preferred address time
>
>    Note, that this change also makes the assumption, that a lifetime of
>    zero means that the address is expired -- contrary to before, where it
>    meant that it is a permanent address. But I think, the lifetime should
>    always be set to a positive value.

I reverted that part and restored the previous behavior.

Branch repushed.

Comment 21 Pavel Šimerda (pavlix) 2014-04-02 23:51:41 UTC

I'm still not sure whether I got the reason why the create_time shouldn't be used. After all we're not talking about subsecond precision, here. And I don't have the time to read the code carefully. But overall it looks good and Kai already confirmed it works for him.

Comment 22 Kai Engert (:kaie) (inactive account) 2014-04-03 08:12:46 UTC

Test results for NetworkManager-0.9.9.1-7861.9ae328b966:

Yes, the vpn route issue is fixed, the routes work, and I can access the vpn.

Unfortunately, it no longer fixes the original issue. The preferred times are no longer set to zero. For the expired prefixes, valid and preferred are set to the identical values again. None of the addresses in "ip -6 a s" is shown as deprecated. (Although I can use ping6 ipv6-host, I'm worried it might have randomly selected a working outgoing address?)

Also, I would like to mention another change of behaviour, which I saw with both versions you gave to me (previous one and this one), when compared to the stock f20 package. I have set all my wired connections to manual (not automatic), requiring me to manually select one, prior to establishing a wired connection. With stock f20 package, after starting up NetworkManager, the "disconnected" applet is shown in notification area. With the packages you gave to me, the status icon automatically changes to the connected state. Sometimes it only does so because of the "virbr0" interface and reports that one as connected in a notification. But at other times, I have also seen it bring up one of the wired connections I have configured as manual.

Comment 23 Kai Engert (:kaie) (inactive account) 2014-04-03 08:21:35 UTC

Test results for NetworkManager-0.9.9.1-7861.9ae328b966:

(a)
Yes, the vpn route issue is fixed, the routes work, and I can access the vpn.

(b)
Unfortunately, it no longer fixes the original issue. The preferred times are no longer set to zero. For the expired prefixes, valid and preferred are set to the identical values again. None of the addresses in "ip -6 a s" is shown as deprecated. Using ping6 worked initially, but by the time I was ready to submit this bugzilla comment, ping6 stopped working, and I assume a nonworking local ipv6 address was selected as the source address. This also had the effect that I was unable to open web pages, I had to change networking to be able to submit this comment.

(c)
Also, I would like to mention another change of behaviour, which I saw with both versions you gave to me (previous one and this one), when compared to the original f20 package. I have set all my wired connections to manual (not automatic) as I use static ipv4 at home, requiring me to manually select one of the configured connections, prior to being able to use a wired connection. With original f20 package, after starting up NetworkManager, the "disconnected" applet is shown in notification area (as expected). With the packages you gave to me, the status icon automatically changes to the connected state (unexpected). Sometimes it only does so because of the "virbr0" interface and reports that one interface as connected in a popup notification. But at other times, I have also seen it bring up one of the wired connections I have configured as manual.

Comment 24 Kai Engert (:kaie) (inactive account) 2014-04-03 08:22:23 UTC

(Sorry for the double post. The second post is clearer. When I submitted the first post, the bugzilla page didn't load.)

Comment 25 Thomas Haller 2014-04-03 11:52:31 UTC

(In reply to Pavel Šimerda (pavlix) from comment #21)
> I'm still not sure whether I got the reason why the create_time shouldn't be
> used. After all we're not talking about subsecond precision, here. And I
> don't have the time to read the code carefully. But overall it looks good
> and Kai already confirmed it works for him.

It has nothing to do with sub-second precision.


To show the problem that commit "platform: fix preferred and valid lifetimes in platform cache" fixes:


create a non-permanent address:
# ip addr add 172.16.166.166/24 dev dummy valid_lft 50 preferred_lft 30


if you show this address immediately after creation with libnl/nl-addr-list you get:

# nl-addr-list --details
172.16.166.166/24 inet dev dummy scope global
   label dummy
   valid-lifetime 50s preferred-lifetime 30s
   created boot-time+5h 1m 19s 720msec last-updated boot-time+5h 1m 19s 720msec

(as expected).



But if you query the address again 10 seconds later, you get:

# nl-addr-list --details
172.16.166.166/24 inet dev dummy scope global
   label dummy
   valid-lifetime 40s preferred-lifetime 20s
   created boot-time+5h 1m 19s 720msec last-updated boot-time+5h 1m 19s 720msec


Note that the created/last-updated timestamps do not change, but the valid/preferred times count down -- because they count from *now* and are not anchored by either created/last-updated. If we would cache this libnl object without special handling, we don't know later on when the address will expire (because we don't know, when we received/cached it).

The "special handling" is  "_normalize_timestamps (struct rtnl_addr *rtnladdr)" which anchoring the valid/preferred times on last-updated.

Comment 26 Thomas Haller 2014-04-03 12:13:28 UTC

(In reply to Kai Engert (:kaie) from comment #23)
> Test results for NetworkManager-0.9.9.1-7861.9ae328b966:

Thanks for testing!!


> (a)
> Yes, the vpn route issue is fixed, the routes work, and I can access the vpn.

great.



> (b)
> Unfortunately, it no longer fixes the original issue. The preferred times
> are no longer set to zero. For the expired prefixes, valid and preferred are
> set to the identical values again. None of the addresses in "ip -6 a s" is
> shown as deprecated. Using ping6 worked initially, but by the time I was
> ready to submit this bugzilla comment, ping6 stopped working, and I assume a
> nonworking local ipv6 address was selected as the source address. This also
> had the effect that I was unable to open web pages, I had to change
> networking to be able to submit this comment.

Oh yes. My recent change broke this again. I pushed a new fixup! commit on th/bgo727382_platform_fix_addr_lifetime. With this, it should work again.


> (c)
> Also, I would like to mention another change of behaviour, which I saw with
> both versions you gave to me (previous one and this one), when compared to
> the original f20 package. I have set all my wired connections to manual (not
> automatic) as I use static ipv4 at home, requiring me to manually select one
> of the configured connections, prior to being able to use a wired
> connection. With original f20 package, after starting up NetworkManager, the
> "disconnected" applet is shown in notification area (as expected). With the
> packages you gave to me, the status icon automatically changes to the
> connected state (unexpected). Sometimes it only does so because of the
> "virbr0" interface and reports that one interface as connected in a popup
> notification. But at other times, I have also seen it bring up one of the
> wired connections I have configured as manual.

The NM version in F20 is several months old and I gave you a test version based on master. So, I would expect some differences.

Mote, that when NM starts, it tries to take over the currently configured interface without changing it. Sometimes that means, it creates a new connection in memory. Maybe now NM sees the configuration of your interfaces, and deduces that one of your connection matches (and is already active). This would be intended behaviour. This is not the same as auto-activation or "activating" a connection, because in that moment, NM does change the interface configuration.

Or it could be a bug in the UI client that you are using. Is it the gnome applet? Or nm-applet? (or KDE/nm-plasma?).
Or it could be that NM changes the client/DBUS API (which it shouldn't).

Anyway, let's try to sort this out via IRC (and possibly open a new BZ).


Thank you for all your help!!

Comment 27 Kai Engert (:kaie) (inactive account) 2014-04-03 18:51:49 UTC

(In reply to Thomas Haller from comment #26)
> I pushed a new fixup! commit on
> th/bgo727382_platform_fix_addr_lifetime. With this, it should work again.

I've built 268abf33ad70d69394c4c0f6fdf3e748ffb9696b and confirm it works for me. IPv6 addresses are correctly deprecated, and VPN works, too.


> The NM version in F20 is several months old and I gave you a test version
> based on master. So, I would expect some differences.

ok, this wasn't a real problem on my side, I simply wanted to let you know.


> Mote, that when NM starts, it tries to take over the currently configured
> interface without changing it. Sometimes that means, it creates a new
> connection in memory. Maybe now NM sees the configuration of your
> interfaces, and deduces that one of your connection matches (and is already
> active). This would be intended behaviour. This is not the same as
> auto-activation or "activating" a connection, because in that moment, NM
> does change the interface configuration.
> 
> Or it could be a bug in the UI client that you are using. Is it the gnome
> applet? Or nm-applet? (or KDE/nm-plasma?).
> Or it could be that NM changes the client/DBUS API (which it shouldn't).

I'm running Cinnamon and the running nm-applet is from
network-manager-applet-0.9.9.0-9.git20140123.fc20.x86_64

Thanks for the explanation of "taking over current interface".
I'll reboot and see how it behaves on clean startup.

Comment 28 Kai Engert (:kaie) (inactive account) 2014-04-03 19:07:38 UTC

(In reply to Kai Engert (:kaie) from comment #27)
> Thanks for the explanation of "taking over current interface".
> I'll reboot and see how it behaves on clean startup.

It look good, it remained disconnected on boot, as expected :)
Thanks

Comment 29 Thomas Haller 2014-06-09 09:32:02 UTC

Note: merged upstream as http://cgit.freedesktop.org/NetworkManager/NetworkManager/commit/?id=d90b9ff2c855bf4350070957a72b267f7edb7898

Comment 30 Fedora Update System 2014-06-12 14:08:54 UTC

NetworkManager-0.9.9.0-39.git20131003.fc20 has been submitted as an update for Fedora 20.
https://admin.fedoraproject.org/updates/NetworkManager-0.9.9.0-39.git20131003.fc20

Comment 31 Fedora Update System 2014-06-13 05:30:01 UTC

Package NetworkManager-0.9.9.0-39.git20131003.fc20:
* should fix your issue,
* was pushed to the Fedora 20 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing NetworkManager-0.9.9.0-39.git20131003.fc20'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2014-7329/NetworkManager-0.9.9.0-39.git20131003.fc20
then log in and leave karma (feedback).

Comment 32 Fedora Update System 2014-06-15 01:51:55 UTC

NetworkManager-0.9.9.0-39.git20131003.fc20 has been pushed to the Fedora 20 stable repository.  If problems still persist, please make note of it in this bug report.

Note You need to log in before you can comment on or make changes to this bug.