Bug 103469
Summary: | Wrong source MAC w/ bridge originating ICMPv6 messages
---|---
Product: | [Retired] Red Hat Linux Beta
Reporter: | David Woodhouse <dwmw2>
Component: | radvd
Assignee: | Elliot Lee <sopwith>
Status: | CLOSED RAWHIDE
QA Contact: | Ben Levenson <benl>
Severity: | medium
Priority: | medium
Version: | beta1
CC: | pekkas
Hardware: | All
OS: | Linux
Doc Type: | Bug Fix
Last Closed: | 2003-09-19 20:06:06 UTC

Description
David Woodhouse
2003-08-31 15:53:12 UTC
This seems to be a kernel problem, so changing the component. Radvd code doesn't fiddle with MAC addresses at all, and it works on routed networks:

# tcpdump -e -n -vvv -s 1500 icmp6
tcpdump: listening on eth0
13:11:59.546764 0:0:f8:8:3e:d 33:33:0:0:0:1 ip6 110: fe80::200:f8ff:fe08:3e0d > ff02::1: icmp6: router advertisement(chlim=64, router_ltime=300, reachable_time=0, retrans_time=0)(prefix info: LA valid_ltime=2592000, preffered_ltime=604800, prefix=2001:708:10:10::/64)(src lladdr: 0:0:f8:8:3e:d) (len 56, hlim 255)

What does the 'AdvSourceLLAddress' option do? I see it adds the MAC address to the outgoing packet -- what does this achieve? Then there are _three_ copies of the MAC address on the outgoing packet: one in the actual MAC header, one encoded in the IPv6 source address, and one in the payload... only the last of which is correct (and then only because I hardcoded it in my copy of radvd for the pan0 interface). Which of the three MAC addresses would a client look at? Would I be able to fix this by adding a configuration option to radvd for the MAC address to use in outgoing adverts?

David, while you're editing the radvd sources, can you see what call it makes into the kernel to obtain the MAC address of an interface in the first place? This would allow me to analyze what bridging might be doing wrong faster. Thanks.

In device-linux.c, in setup_deviceinfo(), it uses ioctl(SIOCGIFHWADDR). It only adds this to the packet payload though -- the MAC address used in the packet itself, for the hardware header and the IPv6 source address, comes from elsewhere. If you set AdvSourceLLAddress to 'off' in the configuration, radvd won't use the MAC address it obtains this way at all.

First, AdvSourceLLAddress adds the source L2 address to the packets it sends; in the above example, "(src lladdr: 0:0:f8:8:3e:d)".
The reason why the option is in the specification is that omitting src lladdr could be used in some load-balancing scenarios; then all the hosts must request the IPv6 -> L2 address mapping themselves through neighbor discovery procedures.

The problem you're having occurs because you want your system to act both as a bridge between the two links, *and* as the router for the subnet consisting of the two links. I believe this is a difficult problem to address, sure, but not something specific to radvd.

You might get around this specific problem by configuring radvd to perform identical advertisements on both the physical interfaces (instead of the bridge pseudointerface). Or even different ones, even though that would be confusing. Not sure if you've tested this already?

This problem seems analogous to an IPv4 + ARP + bridging scenario:
1) configuring an IP address range like 10.0.0.1/255.255.255.0 on the logical bridge interface consisting of at least 2 physical interfaces,
2) sending a packet to 10.0.0.255, and
3) observing which interfaces the Ethernet broadcast goes to, and using which source MAC addresses.

(My hunch for the "correct" behaviour would be: the router should send two Ethernet/L2 broadcasts, one on each physical interface, each with the specific physical interface's source address. But I'm not sure..)

Perhaps you could try to test something similar to see how it goes? Depending on that behaviour, perhaps one could figure out whether there's something to be fixed in the kernel's IPv6 bridge + router logic?

I can probably get around this specific problem by setting AdvSourceLLAddress to off. I've observed with tcpdump that neighbour solicitation _does_ work, and after neighbour solicitation I see _correctly_ addressed packets from the client... until the next radv goes out. So if I configure radvd to omit the lladdr from its outgoing packets, the client should rely solely on neighbour solicitation? That should work for me.
In the case of an Ethernet-bridge, surely any client on any physical interface (which is part of the bridge) can use _any_ MAC address to send packets to the router? So it should be sufficient for radvd to send out only a single L2 broadcast, which will go out on each physical interface, with _any_ of the valid MAC addresses?

In fact, it seems that SIOCGIFHWADDR _will_ return a valid MAC address if there are any devices attached to the bridge at the time; it only returns all zeroes when there are none. So changing radvd to do SIOCGIFHWADDR each time it sends a packet, rather than only once at startup, also ought to work.

Btw, the situation here is that there are _no_ devices on the bridge most of the time -- a 'pan0' bridge is created, and dhcpd is configured to listen on it. As and when Bluetooth BNEP clients arrive, their individual bnep%d interface is added to the bridge... and hence dhcpd works without having to muck around with its configuration and restart it. It looks like in the case where there is at least one physical device on the bridge when radvd starts, it would have worked.

Perhaps we can just automatically turn off AdvSourceLLAddress if the MAC address is all zeroes or otherwise obviously invalid? I like your idea better: make radvd probe for the hw address when it sends the packets, not just once at startup. This is sounding more and more like a radvd issue, and less and less like a kernel issue. Do you guys mind if I change the component back to radvd? :-)

Just adding a call to setup_deviceinfo(sock, iface) inside the if (iface->AdvSourceLLAddress && iface->if_hwaddr_len != -1) at around line 140 of send.c seems to do the right thing. Will test it when I get home. Suspect there's no need even to check for failure -- it should never happen and we'll just revert to the current behaviour then anyway, using the MAC address we had before.

Well, I would mind ... That is, I fail to see why this is a radvd problem.
This seems more like a problem in the kernel (if it doesn't return correct HW addresses or send the ICMP packets out on all the interfaces, maybe it does) or the initscripts used to set up the networking for those Bluetooth etc. devices (why are they starting radvd on an empty bridge interface?) or their plug-on/off features (why are they not signalling the daemons that there has been a change in topology?).

Has anyone tested how sending packets behaves with IPv4, like I described? But perhaps that's irrelevant to this discussion..

Regarding the MAC addresses, there are some scenarios where it might be important to use the correct ones (there are mechanisms currently being specified that map MAC and IPv6 identifiers together; there are mechanisms in Ethernet switches which police allowed MAC addresses, etc.), but in the general case, I guess you can use any one of them -- the main point is that the broadcast packet must be sent on all the physical interfaces.

As for the fixes,
1) having to do an ioctl (and lots of other things) every time radvd sends a packet just to cope with a very weird scenario? Uh, no.
2) Turning off AdvSourceLLAddress with a warning could be considered if an empty HWADDR is obtained from the kernel. Breaks the principle of least surprise a bit though..
3) Or, we could just require that initscripts restart radvd every time the bridge becomes empty or becomes populated again. Might be feasible if these changes are done via hotplug.

It looks to me that the third might be viable. As a matter of fact, we already do similar stuff with 6to4 scripts. Sending HUP to radvd every time there is a critical change should make it reconsider its IP addresses and HW addresses (among other things).

I guess it boils down to the argument on whether the daemons should continuously monitor the changes in the environment they run in (IP addresses, interfaces, etc.), or whether they should expect them to be OK after startup.
All I care about is whether David's specific bug is a kernel component issue any more, which I think it is not. I would rather this be assigned to the person who would work on it, which if it isn't a kernel issue would not be me. So please Pekka, pick the component you think is appropriate.

The kernel _is_ returning correct HW addresses, and is sending out the packets on all interfaces. What's happening is this:
Bridge is set up with no devices attached.
radvd queries for MAC address of bridge.
kernel returns 00:00:00:00:00:00.
radvd sends packets out advertising 00:00:00:00:00:00 to... no devices.
A device is added. Kernel would now return _its_ MAC address if asked.
radvd continues to send packets out advertising old all-zero MAC address.

I suppose I could change the BNEP dev-up script to 'killall -HUP radvd' after adding the new bnep%d device to the bridge, if that's what's considered most appropriate. Since nobody else really uses bridges with _no_ devices attached, that would probably be sufficient to avoid the problem in practice.

Ok, I think the fix belongs in BNEP initscripts (whichever component that is), one can just do /sbin/service radvd reload. However, I also think patching radvd so it'll warn about all-zeroes HWADDR when starting up would be appropriate. I'll change this PR back to radvd component, but perhaps something relating to BNEP would be more appropriate.

Created attachment 94119 [details]
patch to warn about zero link-layer address
Please test whether the attached patch logs a warning when you start radvd
without the link-layer address. I'm a bit dubious whether my hack to detect
zero lladdr was successful or not :-).

Yeah -- I'm dubious about it too :) Casting your char * to an int and then comparing it with zero is only going to be true if it was a NULL pointer.

Sending SIGHUP in the initscripts whenever they add a device to a bridge works fine. As does 'AdvSourceLLAddress off;' for the bridge in radvd.conf. As does the one-liner to make it refetch the MAC address each time it sends a radv.

I debugged this a bit on an RHL9 system w/ an empty bridge at br0. For me, the ioctl returns the same MAC address as executing the ioctl on eth0. (I added a log() call before the lines of my patch, to print out iface->hwaddr, expecting to get something like zero for br0, but didn't.) What's different? What does it look like for you? (Note: /sbin/ifconfig br0 gives all-zero lladdr though)

Bizarre. On the Cambridge kernel I see all-zero MAC address from ifconfig and from SIOCGIFHWADDR, until I've added a device to the bridge. What does 'brctl show br0' say? Or are you really printing iface->hwaddr rather than *(iface->hwaddr)? This gives me all zeroes...

    if (iface->if_hwaddr_len == 48) {
        dlog(LOG_DEBUG, 3, "MAC address of %s is %02x:%02x:%02x:%02x:%02x:%02x",
             iface->Name, iface->if_hwaddr[0], iface->if_hwaddr[1],
             iface->if_hwaddr[2], iface->if_hwaddr[3],
             iface->if_hwaddr[4], iface->if_hwaddr[5]);
    }

Created attachment 94140 [details]
new patch for linux to warn about zero link-layer address
Indeed, I was checking it badly, not printing out the real MAC address.
See if this works for you. Works for me at least. This is what I thought of
doing in the first place, but it looked like too heavy-weight a solution.
Check it out.

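
The pitfall raised in this exchange (comparing a pointer cast against zero tests the pointer, not the address bytes) can be avoided with a byte-wise check. The following is a minimal sketch, not the code from any of the attached patches. The helper name lladdr_is_all_zero is hypothetical, and it assumes the length is given in bits, as the if_hwaddr_len == 48 test quoted above suggests, with -1 meaning the length is unknown.

```c
#include <stddef.h>

/*
 * Hypothetical helper, not taken from the attached patches.
 * `len_bits` is assumed to be the address length in bits (48 for
 * Ethernet) or -1 when unknown. Returns 1 only when the length is
 * known and every byte of the address is zero; returns 0 otherwise.
 */
int lladdr_is_all_zero(const unsigned char *addr, int len_bits)
{
    size_t i, nbytes;

    if (addr == NULL || len_bits <= 0)
        return 0;   /* no usable address, so nothing to judge */

    nbytes = ((size_t)len_bits + 7) / 8;   /* round up to whole bytes */
    for (i = 0; i < nbytes; i++) {
        if (addr[i] != 0)
            return 0;
    }
    return 1;
}
```

Whether an all-zero address is invalid on every hardware type is left open in this thread, so a caller would more safely use the result to log a warning than to change behaviour.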
Now consider the if_hwaddr_len == -1 case :) Couldn't you just loop over elements of iface->hwaddr[] checking for zero one byte at a time? Also -- is an all-zeroes LL address invalid on _all_ hardware types? TBH I suspect the better answer was to reread the MAC address each time we send -- it really isn't much more than a single ioctl().

It sounds like you all are way ahead of me as far as understanding the problem. Just tell me what patch to apply where...

Created attachment 94424 [details]
the latest patch
I've committed the attached patch to radvd CVS after hearing no
objections on the radvd development list. You might want to use it, or a
subset of it (e.g. the device-linux part) in your radvd packaging -- as we
probably won't release the next version of radvd any time soon.

Latest patch looks sane, thanks. Also, initscripts patch against Bug #104421 causes radvd to be sent a HUP each time a device is added to, or removed from a bridge.

I note radvd refuses to start at all, if an interface it wants to see is missing. Perhaps it should only bitch a little and continue, ignoring that interface? Also, it could warn about there not being a _route_ out an interface to match what it's advertising in that direction. Finally, the init script could probably just send a HUP for 'reload' instead of stopping and starting again.

Fix should be in rawhide soon