Bug 1487930
| Summary: | lldpad not working for some ports, when LLDP frames are sent in a vlan | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Michael Burman <mburman> |
| Component: | lldpad | Assignee: | Aaron Conole <aconole> |
| Status: | CLOSED NOTABUG | QA Contact: | qe-baseos-daemons |
| Severity: | high | Docs Contact: | |
| Priority: | high | ||
| Version: | 7.4 | CC: | bugs, cleech, danken, dholler, hsowa, jeharris, jnikolak, loberman, mburman, mleitner, revers, rkhan, sassmann, tcarlin, thaller |
| Target Milestone: | pre-dev-freeze | Keywords: | Reopened |
| Target Release: | --- | ||
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2020-10-23 14:52:21 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | Network | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | |||
| Bug Blocks: | 1458501, 1477205 | ||
| Attachments: | |||
|
Description
Michael Burman
2017-09-03 12:25:43 UTC
Created attachment 1321507 [details]
record Lldptool not working
It seems to be lldpad that is not working well, regardless of oVirt. Chris, can you or someone else from lldpad help oVirt QE pinpoint when and why lldpad seems to ignore incoming LLDP data?
Created attachment 1322636 [details]
process trace log
Created attachment 1322637 [details]
messages log
journalctl provides the error:
Sep 06 10:29:59 vega01.qa.lab.tlv.redhat.com lldpad[17055]: recvfrom(Event interface): No buffer space available
The problem occurs even before the "recvfrom(Event interface): No buffer space available" is logged.
Created attachment 1322694 [details]
strace -s 1000 -tt -f -ff -o log /usr/sbin/lldpad -t for around 180 seconds
Rashid suspects a kernel driver issue. Burman, can you specify the make+model and driver+version of interfaces on which you see the problem?
Created attachment 1322701 [details]
strace -s 1000 -tt -f -ff -o log /usr/sbin/lldpad -t with tcpdump after around 1 minute
After 1 minute, run tcpdump like this:
tcpdump -i enp1s0f1 ether proto 0x88cc -c 1
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on enp1s0f1, link-type EN10MB (Ethernet), capture size 262144 bytes
18:04:48.936744 LLDP, length 301: rack03-sw03-lab4.tlv.redhat.com
1 packet captured
1 packet received by filter
0 packets dropped by kernel
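For readers wondering where the switch name in the tcpdump line above comes from, here is a minimal Python sketch. It is not lldpad's code. It assumes a raw, untagged Ethernet frame given as bytes; `lldp_system_name`, `_tlv` and the hand-built sample frame are hypothetical, and only the TLV layout (7-bit type, 9-bit length, type 5 = System Name) follows IEEE 802.1AB.

```python
import struct

LLDP_ETHERTYPE = 0x88CC      # same value as in the tcpdump filter
TLV_END = 0                  # End of LLDPDU
TLV_SYSTEM_NAME = 5          # System Name TLV


def lldp_system_name(frame: bytes):
    """Return the System Name from an untagged LLDP frame, or None."""
    if len(frame) < 14:
        return None
    (ethertype,) = struct.unpack("!H", frame[12:14])
    if ethertype != LLDP_ETHERTYPE:
        return None
    offset = 14
    while offset + 2 <= len(frame):
        # Each TLV starts with 16 bits: 7 bits of type, 9 bits of length.
        (header,) = struct.unpack("!H", frame[offset:offset + 2])
        tlv_type = header >> 9
        tlv_len = header & 0x01FF
        value = frame[offset + 2:offset + 2 + tlv_len]
        if tlv_type == TLV_END:
            break
        if tlv_type == TLV_SYSTEM_NAME:
            return value.decode("utf-8", errors="replace")
        offset += 2 + tlv_len
    return None


def _tlv(tlv_type: int, value: bytes) -> bytes:
    """Encode one TLV (hypothetical helper for building the sample)."""
    return struct.pack("!H", (tlv_type << 9) | len(value)) + value


# Hand-built example frame; the MAC addresses and port name are made up.
sample = (
    bytes.fromhex("0180c200000e")                         # LLDP multicast
    + bytes.fromhex("001122334455")                       # source MAC
    + struct.pack("!H", LLDP_ETHERTYPE)
    + _tlv(1, b"\x04" + bytes.fromhex("001122334455"))    # Chassis ID
    + _tlv(2, b"\x05" + b"Gi1/0/1")                       # Port ID
    + _tlv(3, struct.pack("!H", 120))                     # TTL
    + _tlv(TLV_SYSTEM_NAME, b"rack03-sw03-lab4.tlv.redhat.com")
    + _tlv(TLV_END, b"")
)
print(lldp_system_name(sample))
```

The sketch only handles untagged frames, so a frame carrying an 802.1Q tag would return None here.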
Can you also please show the content of /proc/net/dev_mcast (while tcpdump is not running and no LLDP messages are received)?
Created attachment 1322894 [details]
cat /proc/net/dev_mcast > dev_mcast.log
Created attachment 1322896 [details]
lspci -vv
Setting back the needinfo requested by danken in comment #4.
Maybe helpful:
ethtool -i enp2s0f1
driver: igb
version: 5.4.0-k
firmware-version: 1.52.0
expansion-rom-version:
bus-info: 0000:02:00.1
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: no
Could you please provide an SOS report from the machine in question? Thanks
First thoughts, *NOT YET VERIFIED, BUT HIGHLY LIKELY*:
First of all, it looks like the Cisco switch is sending LLDP packets encapsulated in VLAN 1, which is probably the native tag on the port. This is a known problem of Cisco devices. Because VLAN 1 is not enabled on the Linux box, we didn't tell the igb card to receive VLAN 1, so the frames are filtered. When we start tcpdump, we actually disable the VLAN filters and suddenly start to receive all traffic. Because lldpad doesn't check PACKET_AUXDATA and the VLAN offload information, it then wrongly accepts the LLDP frame, which is actually in VLAN 1, as if it had arrived untagged on the native wire.
I would suggest trying to fix the Cisco box. Secondly, we need to patch lldpad. The third option is to simply create a VLAN with id 1. Please try to subscribe to VLAN 1 and check whether it works instead of tcpdump:
ip link add link enp2s0f1 name enp2s0f1.1 type vlan id 1
ip link set up dev enp2s0f1.1
Created attachment 1323732 [details]
tcpdump -i enp1s0f1 ether proto 0x88cc -c 1 -w vega04-enp1s0f1.pcap
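The difference between an untagged LLDP frame and one encapsulated in VLAN 1 can be made concrete with a small Python sketch. It is not lldpad's code: `classify_lldp` is a hypothetical helper that inspects a frame as it appears on the wire (as in a pcap file), whereas a packet socket on a NIC with VLAN offload would find the tag in PACKET_AUXDATA rather than in the frame bytes. The sample frames are made up.

```python
import struct

ETH_P_8021Q = 0x8100   # 802.1Q tag protocol identifier
ETH_P_LLDP = 0x88CC


def classify_lldp(frame: bytes):
    """Return ("untagged", None), ("tagged", vlan_id) or ("not-lldp", None)."""
    if len(frame) < 14:
        return ("not-lldp", None)
    (ethertype,) = struct.unpack("!H", frame[12:14])
    if ethertype == ETH_P_LLDP:
        return ("untagged", None)
    if ethertype == ETH_P_8021Q and len(frame) >= 18:
        # Tag control information, then the encapsulated EtherType.
        tci, inner = struct.unpack("!HH", frame[14:18])
        if inner == ETH_P_LLDP:
            return ("tagged", tci & 0x0FFF)   # low 12 bits are the VLAN ID
    return ("not-lldp", None)


# Destination and source MAC addresses (source is made up).
macs = bytes.fromhex("0180c200000e") + bytes.fromhex("001122334455")
untagged = macs + struct.pack("!H", ETH_P_LLDP) + b"\x00\x00"
in_vlan1 = macs + struct.pack("!HHH", ETH_P_8021Q, 1, ETH_P_LLDP) + b"\x00\x00"

print(classify_lldp(untagged))   # ('untagged', None)
print(classify_lldp(in_vlan1))   # ('tagged', 1)
```

A daemon that wanted deterministic behaviour could, for example, accept only the "untagged" case and document that choice.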
Thank you very much Hannes Frederic! It looks like
ip link add link enp1s0f1 name enp1s0f1.1 type vlan id 1
ip link set up dev enp1s0f1.1
makes lldpad receive the missing LLDP packets. The dump vega04-enp1s0f1.pcap also shows the VLAN id in the 802.1Q header.
Hi,
(In reply to Dominik Holler from comment #21)
> Thank you very much Hannes Frederic!
> Looks like
>
> ip link add link enp1s0f1 name enp1s0f1.1 type vlan id 1
> ip link set up dev enp1s0f1.1
>
> making lldpad receiving the missing LLDP packets.
> Also dump vega04-enp1s0f1.pcap shows vlan id in 802.1Q header, too.
So we can't do anything in Linux to help you here. The problem is that some devices send LLDP frames in the native VLAN and some always in VLAN 1. You have to guess here or tell customers that the equipment is broken. What I would propose is that we fix lldpd/lldpad so that it consistently ignores LLDP packets if they are encapsulated, thus eliminating this inconsistent behaviour. What do you think? If we want to keep the current logic, we can probably close this bug. Thanks!
Hannes, do you have a reference to this Cisco "misunderstood feature"? Do they always use VLAN id 1, or do they sometimes use another value? I'm asking because I am thinking about another "fix" for lldpad: when enabling rx on a NIC, let it also unfilter VLAN 1 for that NIC. It smells like caving in under the pressure of faulty hardware, but it sounds more practical than your suggestion (which would make lldpad useless with a whole class of prevalent devices). Is it possible/reasonable in your opinion?
Hi Dan, I saw this feature thing happening myself already. I would guess it is pretty common.
Quick googling shows some hits:
https://networkengineering.stackexchange.com/questions/9526/behavior-of-lldp-packets-through-tagged-port
https://kb.juniper.net/InfoCenter/index?page=content&id=KB23996
Also mentioned here with lldp:
https://github.com/vincentbernat/lldpd/blob/master/README.md
The last link also mentions that LLDP packets can be tagged with the native VLAN number. So I fear it is a bit more complicated. My proposal would make the behaviour of lldpad deterministic: ignore tagged frames, and document this behaviour so that admins/scripts can set up the appropriate VLANs accordingly. Unfortunately this is not a plug and play installation anymore. Also, adding VLAN 1 to the host has obvious side effects, like starting to export all IP addresses from the Linux system into an accidental VLAN 1. So the VLAN 1 solution is also pretty scary to me. I don't know if we can just unfilter VLANs; maybe we should think about that.
Thanks, Hannes
Is it feasible to receive Ethernet frames tagged with VLAN 1 in lldpad without creating a VLAN interface like eth0.1?
The problem is receiving such frames. If you don't add such a VLAN to the system, the packet is considered unwanted by the NIC and is discarded automatically. What tcpdump (and similar tools) do to receive them even without the VLAN is to put the interface in promiscuous mode, which disables this filter, but we don't want to do that long term in production, as it has security implications and causes additional load.
> The problem is receiving such frames. If you don't add such vlan to
> the system,
I wonder if there is another way than adding an interface with a full IP stack.
> the packet is considered unwanted by the NIC and is
> discarded/dropped automatically.
Since libnl has some layer 2 functionality, maybe libnl is able to configure the layer 2 filters in a way that frames tagged as VLAN 1 and with the LLDP EtherType are received.
> What tcpdump (and similars) do to
> receive them even without the vlan in there is to put the interface
> in promisc mode, which disables such filter, but we don't want to do
> that for long term/production as that has security implications and
> cause additional load.
Sure.
It's orthogonal to libnl. libnl can set up the filters it wants, but if the NIC isn't instructed to receive such packets, it won't pass them to the kernel and libnl won't see them. Consider it as two different levels of filters. And there are only these two ways to configure the NIC to accept them: either add the VLAN or enable promiscuous mode, and neither is suitable. Ah, we could also add a tc rule to strip the VLAN header when it is VLAN 1; then we wouldn't have to add VLAN 1 to the system, but we would still hit the issue of instructing the NIC to receive such frames.
> It's orthogonal to libnl. libnl can set up the filters it want, but
> if the NIC isn't instructed to receive such packets, it won't send
> them to the kernel and libnl won't see them. Consider it as 2
> different levels of filters.
I expected that libnl configures the filtering in the kernel via netlink.
> And there is only these two ways to
> configure the NIC to accept it: either add the VLAN or enable promisc
> mode, but both doesn't suit.
Maybe there are multiple ways to add VLANs. http://www.fser.info/?q=node/20 proposes an ioctl-based idea. Can this approach help here?
> http://www.fser.info/?q=node/20 purposes an ioctl based idea.
> Can this approach help here?
I see, ioctl(socket, SIOCSIFVLAN, &args) creates a full VLAN interface.
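The "two levels of filters" argument from the comments above can be illustrated with a toy Python model. This is only a sketch under simplifying assumptions, not how igb or the kernel are implemented: `ToyNic` and `delivered_to_daemon` are hypothetical names, untagged frames are assumed to always pass the hardware filter, and the model ignores that a tagged frame would normally be handed to the VLAN device.

```python
from dataclasses import dataclass, field

ETH_P_LLDP = 0x88CC


@dataclass
class ToyNic:
    """Toy model of a NIC with a hardware VLAN filter (not a real driver)."""
    allowed_vlans: set = field(default_factory=set)
    promisc: bool = False

    def passes_hw_filter(self, vlan_id):
        # Untagged frames (vlan_id is None) always pass in this model.
        if vlan_id is None or self.promisc:
            return True
        return vlan_id in self.allowed_vlans


def delivered_to_daemon(nic, vlan_id, ethertype):
    """Level 1: NIC VLAN filter. Level 2: the daemon's EtherType match."""
    if not nic.passes_hw_filter(vlan_id):
        return False          # dropped before the kernel ever sees it
    return ethertype == ETH_P_LLDP


cases = [
    ("plain NIC", ToyNic()),
    ("VLAN 1 added", ToyNic(allowed_vlans={1})),
    ("promiscuous", ToyNic(promisc=True)),
]
for name, nic in cases:
    # An LLDP frame tagged with VLAN 1, as sent by the switch in this report.
    print(name, delivered_to_daemon(nic, 1, ETH_P_LLDP))
```

In this model, no socket-level filter can help the plain NIC, because the frame never gets past the first level.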
Bottom line is that there is not much we can do to fix it. Hosts will not be able to hear and report LLDP data if the switch sends it on VLAN 1. A user can work around that by configuring VLAN 1 or a bridge on the NIC, or otherwise forcing it into promiscuous mode.
Isn't there a disconnect between saying "not much to do to fix it" and defining the issue as NOTABUG?
NOTABUG on an lldpad bugzilla only means that this is not a bug in lldpad. It does not mean that the report is wrong or that there is no bug SOMEWHERE. I believe that the bug is real, but it is in the switch, and it is fixed in new Cisco switches.
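For anyone scripting the VLAN 1 workaround mentioned in this report, the following Python sketch builds the same two ip commands that were tested earlier in the thread. `vlan_workaround_commands` is a hypothetical helper; it only constructs the argument lists and does not run them, since running them needs root and, as noted in the discussion, adding VLAN 1 can have side effects on the host.

```python
IFNAMSIZ = 16  # Linux interface names hold at most IFNAMSIZ - 1 characters


def vlan_workaround_commands(nic: str, vlan_id: int = 1):
    """Build (but do not run) the ip commands for the VLAN workaround."""
    if not 1 <= vlan_id <= 4094:
        raise ValueError("VLAN id must be in 1..4094")
    vlan_dev = f"{nic}.{vlan_id}"
    if len(vlan_dev) >= IFNAMSIZ:
        raise ValueError(f"interface name too long: {vlan_dev}")
    return [
        ["ip", "link", "add", "link", nic, "name", vlan_dev,
         "type", "vlan", "id", str(vlan_id)],
        ["ip", "link", "set", "up", "dev", vlan_dev],
    ]


for argv in vlan_workaround_commands("enp1s0f1"):
    print(" ".join(argv))
```

The returned lists could be passed to `subprocess.run` by a privileged script; that step is left out here on purpose.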