Bug 1066705 - Add vxlan interface support
Add vxlan interface support
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: NetworkManager (Show other bugs)
7.0
Unspecified Unspecified
high Severity medium
: rc
: ---
Assigned To: Dan Winship
Desktop QE
:
Depends On:
Blocks:
  Show dependency treegraph
 
Reported: 2014-02-18 18:05 EST by Dan Williams
Modified: 2015-03-05 08:49 EST (History)
6 users (show)

See Also:
Fixed In Version: NetworkManager-0.995.0.0-1.el7
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2015-03-05 08:49:26 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:


Attachments (Terms of Use)
[PATCH 1/1] fixup! platform, devices: add support for vxlan devices (11.67 KB, patch)
2014-03-03 09:23 EST, Thomas Haller
no flags Details | Diff

  None (edit)
Description Dan Williams 2014-02-18 18:05:22 EST
Now that the upstream kernel has finished IPv6 support for VxLAN interfaces, we should resurrect danw's vxlan work from last year, finish it, and push it to master.  VxLAN is used extensively in enterprise environments and we should support it natively.
Comment 1 Dan Winship 2014-02-19 15:04:57 EST
pushed danw/vxlan

Not much is changed from the earlier version. IPv6 support was added in the kernel by adding new "local6" and "group6" properties in addition to the (IPv4-only) "local" and "group" properties (and at least one of the two pairs will always be unset). NMPlatform exposes both properties, but NMDeviceVxlan merges them together into the (string-valued) "local" and "group" properties. I wasn't sure if maybe I should expose a "family" property too?

The only other kernel addition since the original patch is the "port" property. In the interest of being more self-documenting than the kernel API, "port" is exposed as "dst_port", and the older "port_min" and "port_max" properties are now renamed to "src_port_min" and "src_port_max".
Comment 2 Thomas Haller 2014-02-20 09:45:40 EST
Is this compile time check really necessary? Even if at compile time the feature is available/not-available in the headers, users could boot another kernel with the feature disabled/enabled. NM should in both cases be able to function properly and provide VxLAN support, whenever the kernel can.

Also, if a kernel update brings support to a distribution, the NM package has to be rebuild as well.

Maybe you should just copy the IFLA_VXLAN_* enum from the linux/if_link.h (and name them NM_IFLA_VXLAN_*). Those numeric values won't change (because the kernel doesn't break user space) and libnl3 is agnostic to those, because we parse the netlink message ourselves.





I would coerce the gboolean values to 0 or 1.

-    props->l3miss = nla_get_u8 (tb[IFLA_VXLAN_L3MISS]);
+    props->l3miss = !!nla_get_u8 (tb[IFLA_VXLAN_L3MISS]);



Also, let vxlan_get_properties() write a debug() message if nle!=0, so that we can figure out whats going on. Notice, there is a nm_log_warn() in update_properties(), but at that point, there is no indication what went wrong, because the nle is not available. Or how about giving a GError parameter to nm_platform_vxlan_get_properties() and logging that?




I would at the very beginning of vxlan_info_data_parser() "memset (props, 0, sizeof (*props)), to ensure that we always set all fields to 0/NULL.

And then get rid of the lines such as
    else
        memset (&props->group6, 0, sizeof (props->group6));




-#define NM_DEVICE_VXLAN_SRC_PORT_MIN "src_port_min"
-#define NM_DEVICE_VXLAN_SRC_PORT_MAX "src_port_max"
+#define NM_DEVICE_VXLAN_SRC_PORT_MIN "src-port-min"
+#define NM_DEVICE_VXLAN_SRC_PORT_MAX "src-port-max"



- Copyright 2013 Red Hat, Inc.



In new gobjects, could we adhere to the pattern with obj_properties? See "src/nm-ip6-config.c". The advantage is, that when raising a notify signal in update_properties(), you don't have to do a lookup by string. And also, if we later need to get hold onto the GParamSpec instance to get meta information, we can do it more easily.



Use nm_utils_inet[46]_ntop instead of inet_ntop.


Is it correct, that the group/group6/local/local6 properties collide to one NM_DEVICE_VXLAN_GROUP/NM_DEVICE_VXLAN_LOCAL? Are they mutually exclusive? vxlan_info_data_parser() does not enforce that...


-         else if (priv->props.group6.s6_addr[0])
+         else if (!IN6_IS_ADDR_UNSPECIFIED (&priv->props.group6.s6_addr[0]))
and likewise for local6.




NMDeviceVxlan keeps a weak pointer to the parent device. If at the moment of update_properties() the parent device does not exist (e.g. because it was not announced before being visible via UDEV), this will stay NULL. NMDeviceVxlan has to watch nm_manager() to check, whether the parent might be created. Similarly, using a weak pointer only helps to know, when the device gets disposed. But we want to get notified, when the device gets removed from nm_manager_get_device_by_ifindex() -- which might not be the same.
Comment 3 Dan Winship 2014-02-20 10:27:34 EST
(In reply to Thomas Haller from comment #2)
> Is this compile time check really necessary?

As you point out, it could be worked around, and there are arguments for doing so. However, being able to see the properties of vxlan devices over D-Bus is not a Must Have Killer Feature in the general case, and F20 and RHEL7 already have new enough kernels anyway, so it didn't seem worth it to do any extra work to support those cases.

> In new gobjects, could we adhere to the pattern with obj_properties?

> Use nm_utils_inet[46]_ntop instead of inet_ntop.

(The explanation for both of these is that the code was actually written a year ago...)

> Is it correct, that the group/group6/local/local6 properties collide to one
> NM_DEVICE_VXLAN_GROUP/NM_DEVICE_VXLAN_LOCAL? Are they mutually exclusive?
> vxlan_info_data_parser() does not enforce that...

They are mutually exclusive; the device can only have a single local address and a single multicast group address. But the original properties were only IPv4-address-sized, so when they added IPv6 support, they had to add new properties. The kernel enforces the fact that only one of them can be set.
Comment 4 Thomas Haller 2014-02-20 11:18:00 EST
(In reply to Dan Winship from comment #3)
> (In reply to Thomas Haller from comment #2)
> > Is this compile time check really necessary?
> 
> As you point out, it could be worked around, and there are arguments for
> doing so. However, being able to see the properties of vxlan devices over
> D-Bus is not a Must Have Killer Feature in the general case, and F20 and
> RHEL7 already have new enough kernels anyway, so it didn't seem worth it to
> do any extra work to support those cases.

The workaround would be to copy&paste 15 lines from linux/if_link.h, saving the entire "#if HAVE_VXLAN". It will probably use less lines of code.
Comment 5 Dan Winship 2014-02-20 14:59:18 EST
(In reply to Thomas Haller from comment #2)
> Maybe you should just copy the IFLA_VXLAN_* enum from the linux/if_link.h
> (and name them NM_IFLA_VXLAN_*).

I copied the enum but redid them as #defines using the same names rather than an enum with different names, so that when we later require kernel 3.11+, we can just remove the defines and not need to change anything else.

> I would coerce the gboolean values to 0 or 1.
> Also, let vxlan_get_properties() write a debug() message if nle!=0

Done and done, though I went with warning() rather than debug() since it's not supposed to happen...

Also, I added two more commits making the other device types in NMLinuxPlatform consistent with that, and another with a drive-by bug fix

> In new gobjects, could we adhere to the pattern with obj_properties?

I didn't do this, but we can update them all later.

> NMDeviceVxlan keeps a weak pointer to the parent device. If at the moment of
> update_properties() the parent device does not exist (e.g. because it was
> not announced before being visible via UDEV), this will stay NULL.

The parent has to exist before the vxlan does, so the only way this could happen would be if udev saw the creation of both the parent and the vxlan, but then announced the vxlan before the parent. I guess this technically could happen, but it seems improbable. And if that happened with a VLAN, we'd completely fail to create the device, so there's a larger issue here...

> Similarly, using a weak pointer only helps to know, when the
> device gets disposed. But we want to get notified, when the device gets
> removed from nm_manager_get_device_by_ifindex() -- which might not be the
> same.

It should be the same; if it wasn't, then the device would continue to be exported via D-Bus even after the manager thought it didn't exist any more.


Everything else should be fixed. Branch re-pushed
Comment 6 Thomas Haller 2014-02-20 15:21:42 EST
(In reply to Dan Winship from comment #5)

> > In new gobjects, could we adhere to the pattern with obj_properties?
> 
> I didn't do this, but we can update them all later.

why not now? Either now or never. It would be useful already now, seeing all those notify() calls.


> > NMDeviceVxlan keeps a weak pointer to the parent device. If at the moment of
> > update_properties() the parent device does not exist (e.g. because it was
> > not announced before being visible via UDEV), this will stay NULL.
> 
> The parent has to exist before the vxlan does, so the only way this could
> happen would be if udev saw the creation of both the parent and the vxlan,
> but then announced the vxlan before the parent. I guess this technically
> could happen, but it seems improbable. And if that happened with a VLAN,
> we'd completely fail to create the device, so there's a larger issue here...

When you startup NM, it will receive via netlink quickly the parent and the vxlan. The vxlan is_anounceable immediately, because  link_is_software(). The parent (e.g. em1), is only announcable later, when the UDev event arrives. 


> > Similarly, using a weak pointer only helps to know, when the
> > device gets disposed. But we want to get notified, when the device gets
> > removed from nm_manager_get_device_by_ifindex() -- which might not be the
> > same.
> 
> It should be the same; if it wasn't, then the device would continue to be
> exported via D-Bus even after the manager thought it didn't exist any more.

I don't agree. We should not make such assumptions, it's not uncommon to take references and it might be hard to verify, that the reference is returned when the device is removed form the manager. If that is a problem with exporting via DBUS, that is then ~another~ problem.




Also, I think link_is_software() to return TRUE for Vxlan.
Comment 7 Dan Williams 2014-02-21 13:44:13 EST
I think the first 3 trivial patches could be merged to git master immediately.
Comment 8 Dan Williams 2014-02-21 13:53:26 EST
I don't have a strong feeling on the _NOTIFY stuff, but if we do that, we should put it somewhere common so we don't have to redefine it in every file.  Yeah, it's only one line, but...

The udev thing probably will be an issue; so it probably will have to watch for added/removed signals for the parent or something.

The device reference thing *could* be a larger problem; in general we don't have many cross-references between devices, and that's typically what we want.  Into the future though, we'll likely have more (with parent/child relationships) so we'll have to be a lot more careful about this.

The dcbw/dev-plugins branch has some stuff for a device 'removed' signal, which we should start using more extensively, and also un-export the object when we get that signal.  This means that child devices could listen for that signal and update their own properties too.
Comment 9 Dan Winship 2014-02-26 13:56:11 EST
(In reply to Thomas Haller from comment #6)
> > > In new gobjects, could we adhere to the pattern with obj_properties?
> > 
> > I didn't do this, but we can update them all later.
> 
> why not now?

Because I don't want to make nm-device-vxlan different from all the other generic devices, and I don't want to port all of the rest of them over unnecessarily at this point in the release cycle.

> > > NMDeviceVxlan keeps a weak pointer to the parent device. If at the moment of
> > > update_properties() the parent device does not exist (e.g. because it was
> > > not announced before being visible via UDEV), this will stay NULL.

OK, I've improved the situation here, though still with one small bug. But I think it's good enough for now (particularly since the bug is not likely to be relevant at startup anyway).

danw/trackdevice has the start of the larger fix, but it clearly has a lot of potential for breaking things.

> Also, I think link_is_software() to return TRUE for Vxlan.

It does, that's handled automatically by having it in the right place in the NMLinkType enumeration.
Comment 10 Dan Williams 2014-02-28 13:22:53 EST
> core: make nm_manager_get() not ref the manager

I've got basically the same thing on dcbw/dev-plugins as http://cgit.freedesktop.org/NetworkManager/NetworkManager/commit/?h=dcbw/dev-plugins&id=dd9cf51525e9c1ab8e3d8479677fe3a3270ba84d and we should just cherry-pick one of these to git master immediately.  I don't care which, take the one you like best.

But in reality, the branch looks fine to me now, so if Thomas is OK with it, just merge your bits and I'll drop my manager ref patch from dcbw/dev-plugins and rebase.
Comment 11 Thomas Haller 2014-03-03 09:16:41 EST
>> devices: fix up parent/peer tracking in some virtual devices

NMDeviceVeth::dispose() should set priv->peer = NULL after removing the weak pointer. Also, this is relatively bad bug with potential crash. You could split it to its own commit (in case we have to backport it).



>> platform, devices: add support for vxlan devices

+static const struct nla_policy vxlan_info_policy[IFLA_VXLAN_MAX + 1] = {
+»···[IFLA_VXLAN_ID]»»···= { .type = NLA_U32 },

Maybe replace the tabs before '=' by spaces?




(In reply to Dan Winship from comment #9)
> > Also, I think link_is_software() to return TRUE for Vxlan.
> 
> It does, that's handled automatically by having it in the right place in the
> NMLinkType enumeration.

Are you sure about this? I am talking about link_is_software() in src/platform/nm-linux-platform.c .
Comment 12 Thomas Haller 2014-03-03 09:23:27 EST
Created attachment 869942 [details]
[PATCH 1/1] fixup! platform, devices: add support for vxlan devices

I don't mind about the use of the _NOTIFY macro (or whether it is defined in a header file), but I think it's a waste to do lookups by string names that can easily avoided.

Also, I don't suggest, that all old files should be updated. So, there will always be the case that newer files use a better(?) style/function/best-practice that older.

If we agree, that this pattern with g_object_class_install_properties() / obj_properties is useful, then it should be done for new objects.
Comment 13 Dan Winship 2014-03-03 10:21:02 EST
(In reply to Thomas Haller from comment #11)
> NMDeviceVeth::dispose() should set priv->peer = NULL after removing the weak
> pointer. Also, this is relatively bad bug with potential crash. You could
> split it to its own commit (in case we have to backport it).

fixed and squashed; NMDeviceVeth doesn't exist in 0.9.8, so there's no possibility of a backport.

> +static const struct nla_policy vxlan_info_policy[IFLA_VXLAN_MAX + 1] = {
> +»···[IFLA_VXLAN_ID]»»···= { .type = NLA_U32 },
> 
> Maybe replace the tabs before '=' by spaces?

fixed, and added another commit fixing the same bug in the other nla_policy declarations.

> (In reply to Dan Winship from comment #9)
> Are you sure about this? I am talking about link_is_software() in
> src/platform/nm-linux-platform.c .

Ugh. We really need to fix that code. Fixed by adding "vxlan" to the list for now.


Repushed, without your fixup. Let's fix up all the classes in a month or so.

(Oh, also, I replaced my "don't ref the NMManager singleton" commit with dcbw's from dcbw/dev-plugins)
Comment 14 Thomas Haller 2014-03-03 11:03:48 EST
If anything, I would modify the commit message

> platform: fix some tabs-vs-spaces
> trivial/platform: fix some tabs-vs-spaces

or something like that.



Otherwise I got nothing left to say. Good
Comment 15 Jirka Klimes 2014-03-04 08:15:41 EST
danw/vxlan branch looks right to me now. I was going through and didn't spot any problem.
Comment 16 Dan Winship 2014-03-06 09:51:00 EST
pushed to master
Comment 17 Ludek Smid 2014-06-26 07:16:40 EDT
This request was not resolved in Red Hat Enterprise Linux 7.0.

Contact your manager or support representative in case you need
to escalate this bug.
Comment 22 errata-xmlrpc 2015-03-05 08:49:26 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-0311.html

Note You need to log in before you can comment on or make changes to this bug.