Bug 103469

Summary: Wrong source MAC w/ bridge originating ICMPv6 messages
Product: [Retired] Red Hat Linux Beta
Component: radvd
Version: beta1
Hardware: All
OS: Linux
Status: CLOSED RAWHIDE
Severity: medium
Priority: medium
Reporter: David Woodhouse <dwmw2>
Assignee: Elliot Lee <sopwith>
QA Contact: Ben Levenson <benl>
CC: pekkas
Doc Type: Bug Fix
Last Closed: 2003-09-19 20:06:06 UTC
Attachments:
Description                                                  Flags
patch to warn about zero link-layer address                  none
new patch for linux to warn about zero link-layer address    none
the latest patch                                             none

Description David Woodhouse 2003-08-31 15:53:12 UTC
radvd advertises itself as a router with MAC address 00:00:00:00:00:00 out of
all physical devices which are part of an Ethernet bridge, rather than using the
real MAC address of each physical device.

14:07:03.327200 0:0:0:0:0:0 33:33:0:0:0:1 ip6 110: fe80::200:ff:fe00:0 >
ff02::1: icmp6: router advertisement

Steps to reproduce: 

ifdown eth1 
brctl addbr br0 
brctl addif br0 eth1 
cd /etc/sysconfig/network-scripts
cat ifcfg-eth1 | sed s/eth1/br0/g > ifcfg-br0
ifup br0
perl -pi -e s/eth1/br0/ /etc/radvd.conf
service radvd restart

... or something like that. In fact it was triggered with BNEP interfaces as
part of a bridge, since we _have_ to handle them that way for the benefit of dhcpd.

Comment 1 Pekka Savola 2003-09-01 10:13:02 UTC
This seems to be a kernel problem, so changing the component.  Radvd code
doesn't fiddle with MAC addresses at all, and it works on routed networks:

# tcpdump -e -n -vvv -s 1500 icmp6
tcpdump: listening on eth0
13:11:59.546764 0:0:f8:8:3e:d 33:33:0:0:0:1 ip6 110: fe80::200:f8ff:fe08:3e0d >
ff02::1: icmp6: router advertisement(chlim=64, router_ltime=300,
reachable_time=0, retrans_time=0)(prefix info: LA valid_ltime=2592000,
preffered_ltime=604800, prefix=2001:708:10:10::/64)(src lladdr: 0:0:f8:8:3e:d)
(len 56, hlim 255)

Comment 2 David Woodhouse 2003-09-01 10:49:53 UTC
What does the 'AdvSourceLLAddress' option do? I see it adds the MAC address to
the outgoing packet -- what does this achieve? Then there are _three_ copies of
the MAC address on the outgoing packet; one in the actual MAC header, one
encoded in the IPv6 source address, and one in the payload... only the latter of
which is correct (and then only because I hardcoded it in my copy of radvd for
the pan0 interface).
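
For reference, the copy "encoded in the IPv6 source address" follows the usual
modified EUI-64 rule for Ethernet: flip the universal/local bit of the first
byte and insert ff:fe in the middle. The sketch below is only an illustration
(the function name mac_to_link_local is made up here, it is not from radvd or
the kernel), but it reproduces both link-local addresses seen in the tcpdump
output in this report.

```c
#include <stdint.h>

/* Build fe80::/64 plus the modified EUI-64 of a 48-bit MAC address.
 * Hypothetical helper for illustration; not part of radvd. */
void mac_to_link_local(const uint8_t mac[6], uint8_t addr[16])
{
    int i;

    for (i = 0; i < 16; i++)
        addr[i] = 0;
    addr[0]  = 0xfe;            /* link-local prefix fe80::/64 */
    addr[1]  = 0x80;
    addr[8]  = mac[0] ^ 0x02;   /* flip the universal/local bit */
    addr[9]  = mac[1];
    addr[10] = mac[2];
    addr[11] = 0xff;            /* ff:fe inserted in the middle */
    addr[12] = 0xfe;
    addr[13] = mac[3];
    addr[14] = mac[4];
    addr[15] = mac[5];
}
```

With an all-zero MAC this gives fe80::200:ff:fe00:0, which is the source
address in the capture from the original description.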

Which of the three MAC addresses would a client look at? Would I be able to fix
this by adding a configuration option to radvd for the MAC address to use in
outgoing adverts?

Comment 3 David Miller 2003-09-01 10:58:27 UTC
David, while you're editing the radvd sources, can you see
what call it makes into the kernel to obtain the MAC address
of an interface in the first place?

This would allow me to analyze what bridging might be doing
wrong faster.

Thanks.


Comment 4 David Woodhouse 2003-09-01 11:03:02 UTC
In device-linux.c, in setup_deviceinfo(), it uses ioctl(SIOCGIFHWADDR).

It only adds this to the packet payload though -- the MAC addresses used in the
packet itself, for the hardware header and the IPv6 source address, come from
elsewhere. If you set AdvSourceLLAddress to 'off' in the configuration, radvd
won't use the MAC address it obtains this way at all.



Comment 5 Pekka Savola 2003-09-01 11:09:13 UTC
First, AdvSourceLLAddress adds the source L2 address to the packets it sends; in
the above example, "(src lladdr: 0:0:f8:8:3e:d)".

The reason why the option is in the specification is that omitting src lladdr
could be used in some load-balancing scenarios; then all the hosts must request
the IPv6 -> L2 address mapping themselves through neighbor discovery procedures.
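
To make the "src lladdr" part concrete: in Neighbor Discovery the source
link-layer address option has type 1 and a length counted in units of 8 octets,
so for Ethernet it is 8 bytes in total. The sketch below only illustrates that
layout; the names SRC_LLADDR_OPT_TYPE and build_src_lladdr_opt are made up for
this example and are not radvd's.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SRC_LLADDR_OPT_TYPE 1   /* source link-layer address option */

/* Write the 8-byte option for a 48-bit Ethernet address into buf.
 * Returns the number of bytes written. Hypothetical helper. */
size_t build_src_lladdr_opt(const uint8_t mac[6], uint8_t buf[8])
{
    buf[0] = SRC_LLADDR_OPT_TYPE;   /* option type */
    buf[1] = 1;                     /* length, in units of 8 octets */
    memcpy(&buf[2], mac, 6);        /* the link-layer address itself */
    return 8;
}
```

With AdvSourceLLAddress off, this option is simply left out of the
advertisement and hosts resolve the router's address themselves.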

The problem you're having occurs because you want your system to act as both a
bridge between the two links, *and* the router for the subnet consisting of the
two links.

I believe this is a difficult problem to address, sure, but not something
specific to radvd.

You might get around this specific problem by configuring radvd to perform
identical advertisements on both the physical interfaces (instead of the bridge
pseudointerface).  Or even different ones, even though that would be confusing.
 Not sure if you've tested this already?

This problem seems analogous to IPv4 + ARP + bridging scenario: 
 1) configuring IP address range like 10.0.0.1/255.255.255.0 on the logical
bridge interface consisting of at least 2 physical interfaces,
 2) sending a packet to 10.0.0.255, and
 3) observing which interfaces the Ethernet broadcast goes to, and using which
source MAC addresses.

(My hunch for the "correct" behaviour would be: the router should send two
Ethernet/L2 broadcasts, one on each physical interface, each with the specific
physical interface's source address.  But I'm not sure..)

Perhaps you could try to test something similar to see how it goes?  Depending
on that behaviour, perhaps one could figure out whether there's something to be
fixed in the kernel's IPv6 bridge + router logic?


Comment 6 David Woodhouse 2003-09-01 11:47:50 UTC
I can probably get around this specific problem by setting AdvSourceLLAddress to
off. I've observed with tcpdump that neighbour solicitation _does_ work, and
after neighbour solicitation I see _correctly_ addressed packets from the
client... until the next radv goes out.

So if I configure radvd to omit the lladdr from its outgoing packets, the client
should rely solely on neighbour solicitation? That should work for me.

In the case of an Ethernet bridge, surely any client on any physical interface
(which is part of the bridge) can use _any_ MAC address to send packets to the
router? So it should be sufficient for radvd to send out only a single L2
broadcast, which will go out on each physical interface, with _any_ of the valid
MAC addresses?

In fact, it seems that SIOCGIFHWADDR _will_ return a valid MAC address if there
are any devices attached to the bridge at the time; it only returns all zeroes
when there are none. So changing radvd to do SIOCGIFHWADDR each time it sends a
packet, rather than only once at startup, also ought to work.






Comment 7 David Woodhouse 2003-09-01 11:53:56 UTC
Btw, the situation here is that there are _no_ devices on the bridge most of the
time -- a 'pan0' bridge is created, and dhcpd is configured to listen on it.

As and when Bluetooth BNEP clients arrive, their individual bnep%d interface is
added to the bridge... and hence dhcpd works without having to muck around with
its configuration and restart it.

It looks like in the case where there is at least one physical device on the
bridge when radvd starts, it would have worked. Perhaps we can just
automatically turn off AdvSourceLLAddress if the MAC address is all zeroes or
otherwise obviously invalid?

Comment 8 David Miller 2003-09-01 12:03:12 UTC
I like better your idea to make radvd probe for the hw address
when it sends the packets, not just once at startup.

This is sounding more and more like a radvd issue, and less and
less like a kernel issue.  Do you guys mind if I change the component
back to radvd? :-)



Comment 9 David Woodhouse 2003-09-01 12:20:58 UTC
Just adding a call to setup_deviceinfo(sock, iface) inside the
        if (iface->AdvSourceLLAddress && iface->if_hwaddr_len != -1)
at around line 140 of send.c seems to do the right thing. Will test it when I
get home.

Suspect there's no need even to check for failure -- it should never happen and
we'll just revert to the current behaviour then anyway; using the MAC address we
had before.
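
A rough sketch of the "refetch, and fall back to the old address on failure"
behaviour described here. It is not the radvd patch: struct iface_state and
update_cached_hwaddr are hypothetical stand-ins, and the result of the lookup
is passed in as arguments so the logic does not depend on any particular
kernel call.

```c
#include <string.h>

#define HWADDR_BYTES 6

/* Hypothetical stand-in for radvd's per-interface state. */
struct iface_state {
    char name[16];
    unsigned char hwaddr[HWADDR_BYTES];
    int adv_source_lladdr;   /* mirrors the AdvSourceLLAddress setting */
};

/* Apply the result of a fresh lookup just before a packet is built.
 * fetch_rc is the lookup's return code (0 means success) and fresh is the
 * address it produced. On failure the cached address is left untouched.
 * Returns 1 if the cache was updated, 0 otherwise. */
int update_cached_hwaddr(struct iface_state *iface, int fetch_rc,
                         const unsigned char fresh[HWADDR_BYTES])
{
    if (!iface->adv_source_lladdr)
        return 0;                 /* option disabled: nothing to refresh */
    if (fetch_rc != 0)
        return 0;                 /* lookup failed: keep the old address */
    memcpy(iface->hwaddr, fresh, HWADDR_BYTES);
    return 1;
}
```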


Comment 10 Pekka Savola 2003-09-01 12:33:08 UTC
Well, I would mind ... 

That is, I fail to see why this is a radvd problem.  This seems more like a
problem in the kernel (if it doesn't return correct HW addresses or send the
ICMP packets out on all the interfaces, maybe it does) or the initscripts used
to set up the networking for those Bluetooth etc. devices (why are they starting
radvd on an empty bridge interface?) or their plug-on/off features (why are they
not signalling the daemons that there has been a change in topology?).

Has anyone tested how sending packets behaves with IPv4, like I described?  But
perhaps that's irrelevant to this discussion..

Regarding the MAC addresses, there are some scenarios where it might be
important to use the correct ones (there are mechanisms which are under
specification mapping MAC and IPv6 identifier together; there are mechanisms in
Ethernet switches which police allowed MAC addresses, etc.), but in the general
case, I guess you can use any one of them -- the main point is that the
broadcast packet must be sent on all the physical interfaces.

As for the fixes, 

1) having to do an IOCTL (and lots of other things) every time radvd sends a
packet just to cope with a very weird scenario?  Uh, no. 

2) Turning off AdvSourceLLAddress with a warning could be considered if an empty
HWADDR is obtained from the kernel.  Breaks the principle of least surprise a
bit though..

3) Or, we could just require that initscripts restart radvd every time the
bridge becomes empty or becomes populated again.  Might be feasible if these
changes are done via hotplug.

It looks to me that the third might be viable.  As a matter of fact, we already
do similar stuff with 6to4 scripts.  Sending HUP to radvd every time there is a
critical change should make it reconsider its IP addresses and HW addresses
(among other things).

I guess it boils down to the argument on whether the daemons should continuously
monitor the changes in the environment they run at (IP addresses, interfaces,
etc.), or whether they should expect them to be OK after startup.

Comment 11 David Miller 2003-09-01 12:48:16 UTC
All I care about is whether David's specific bug is a kernel component issue
any more, which I think it is not.

I would rather this be assigned to the person who would work on it,
which if it isn't a kernel issue would not be me.

So please Pekka, pick the component you think is appropriate.


Comment 12 David Woodhouse 2003-09-01 13:07:16 UTC
The kernel _is_ returning correct HW addresses, and is sending out the packets
on all interfaces. What's happening is this:

Bridge is set up with no devices attached.
radvd queries for MAC address of bridge.
kernel returns 00:00:00:00:00:00.
radvd sends packets out advertising 00:00:00:00:00:00 to... no devices.
A device is added. Kernel would now return _its_ MAC address if asked.
radvd continues to send packets out advertising old all-zero MAC address.

I suppose I could change the BNEP dev-up script to 'killall -HUP radvd' after
adding the new bnep%d device to the bridge, if that's what's considered most
appropriate. Since nobody else really uses bridges with _no_ devices attached,
that would probably be sufficient to avoid the problem in practice.



Comment 13 Pekka Savola 2003-09-01 15:28:00 UTC
Ok, I think the fix belongs in BNEP initscripts (whichever component that is),
one can just do /sbin/service radvd reload.

However, I also think patching radvd so it'll warn about all-zeroes HWADDR when
starting up would be appropriate.

I'll change this PR back to radvd component, but perhaps something relating to
BNEP would be more appropriate.

Comment 14 Pekka Savola 2003-09-01 16:32:23 UTC
Created attachment 94119 [details]
patch to warn about zero link-layer address

Please test whether the attached patch logs a warning when you start radvd
without the link-layer address.  I'm a bit dubious whether my hack to detect
zero lladdr was successful or not :-).

Comment 15 David Woodhouse 2003-09-01 22:19:23 UTC
Yeah -- I'm dubious about it too :)

Casting your char * to an int and then comparing it with zero is only going to
be true if it was a NULL pointer.
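
To spell out the difference, here is a small illustration. It is not the code
from the attachment, and both function names are made up for this example: the
first function tests the pointer value, the second tests the bytes it points
at.

```c
#include <stdint.h>
#include <string.h>

/* Buggy: looks at the pointer value itself, so it is true only for a
 * NULL pointer, whatever the address bytes contain. */
int zero_check_pointer(const unsigned char *addr)
{
    return (uintptr_t)addr == 0;
}

/* Intended: compares the six bytes the pointer refers to. */
int zero_check_bytes(const unsigned char *addr)
{
    static const unsigned char zero[6] = {0};

    return memcmp(addr, zero, sizeof(zero)) == 0;
}
```

For a valid buffer holding 00:00:00:00:00:00 the first returns 0 and the
second returns 1.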

Sending SIGHUP in the initscripts whenever they add a device to a bridge works
fine. As does 'AdvSourceLLAddress off;' for the bridge in radvd.conf. As does
the one-liner to make it refetch the MAC address each time it sends a radv.

Comment 16 Pekka Savola 2003-09-02 06:29:05 UTC
I debugged this a bit on RHL9 system w/ an empty bridge at br0.

For me, the ioctl returns the same MAC address as executing the ioctl on eth0
does.

(I added a log() call before the lines of my patch, to print out iface->hwaddr,
expecting to get something like zero for br0, but didn't.)

What's different?  What does it look like for you?

(Note: /sbin/ifconfig br0 gives all-zero lladdr though)

Comment 17 David Woodhouse 2003-09-02 10:04:07 UTC
Bizarre. On the Cambridge kernel I see all-zero MAC address from ifconfig and
from SIOCGIFHWADDR, until I've added a device to the bridge.

What does 'brctl show br0' say?

Or are you really printing iface->hwaddr rather than *(iface->hwaddr)? This
gives me all zeroes...

        if (iface->if_hwaddr_len == 48) {
                dlog(LOG_DEBUG, 3,
                     "MAC address of %s is %02x:%02x:%02x:%02x:%02x:%02x",
                     iface->Name,
                     iface->if_hwaddr[0], iface->if_hwaddr[1],
                     iface->if_hwaddr[2], iface->if_hwaddr[3],
                     iface->if_hwaddr[4], iface->if_hwaddr[5]);
        }


Comment 18 Pekka Savola 2003-09-02 13:11:38 UTC
Created attachment 94140 [details]
new patch for linux to warn about zero link-layer address

Indeed, I was checking it badly, not printing out the real MAC address.

See if this works for you.  Works for me at least.  This is what I thought of
doing in the first place, but looked like too heavy-weight a solution.  Check
it out.

Comment 19 David Woodhouse 2003-09-02 13:21:30 UTC
Now consider the if_hwaddr_len == -1 case :)

Couldn't you just loop over elements of iface->hwaddr[] checking for zero one
byte at a time?

Also -- is an all-zeroes LL address invalid on _all_ hardware types?

TBH I suspect the better answer was to reread the MAC address each time we send
-- it really isn't much more than a single ioctl().
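
A sketch of the byte-at-a-time check, including the "length is -1" case. This
is an illustration, not the committed patch: lladdr_is_all_zero is a
hypothetical name, and it assumes the length is given in bits (as in the
if_hwaddr_len == 48 test earlier in this thread), with -1 meaning no known
address.

```c
#include <stddef.h>

/* Returns 1 if the address is known and every byte is zero, else 0.
 * len_bits is the address length in bits, or -1 when the link type has
 * no known link-layer address. Hypothetical helper. */
int lladdr_is_all_zero(const unsigned char *addr, int len_bits)
{
    int i, nbytes;

    if (addr == NULL || len_bits <= 0)
        return 0;                   /* no address to judge */
    nbytes = (len_bits + 7) / 8;    /* round up to whole bytes */
    for (i = 0; i < nbytes; i++)
        if (addr[i] != 0)
            return 0;
    return 1;
}
```

Whether all zeroes is actually invalid for every hardware type is a separate
question; this only detects the condition.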

Comment 20 Elliot Lee 2003-09-05 17:52:41 UTC
It sounds like you all are way ahead of me as far as understanding the problem. Just tell 
me what patch to apply where...

Comment 21 Pekka Savola 2003-09-11 19:11:23 UTC
Created attachment 94424 [details]
the latest patch

I've committed the attached patch to radvd CVS after hearing no objections on
the radvd development list.  You might want to use it, or a subset of it (e.g.
the device-linux part), in your radvd packaging -- as we probably won't release
the next version of radvd any time soon.

Comment 22 David Woodhouse 2003-09-16 05:21:20 UTC
Latest patch looks sane, thanks.

Also, the initscripts patch against Bug #104421 causes radvd to be sent a HUP
each time a device is added to, or removed from, a bridge.

I note radvd refuses to start at all, if an interface it wants to see is
missing. Perhaps it should only bitch a little and continue, ignoring that
interface?

Also, it could warn about there not being a _route_ out an interface to match
what it's advertising in that direction.

Finally, the init script could probably just send a HUP for 'reload' instead of
stopping and starting again.


Comment 23 Elliot Lee 2003-09-19 20:06:06 UTC
Fix should be in rawhide soon