Bug 1487930 - lldpad not working for some ports, when LLDP frames are sent in a vlan
Summary: lldpad not working for some ports, when LLDP frames are sent in a vlan
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: lldpad
Version: 7.4
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: pre-dev-freeze
Target Release: ---
Assignee: Aaron Conole
QA Contact: qe-baseos-daemons
URL:
Whiteboard:
Depends On:
Blocks: 1458501 1477205
Reported: 2017-09-03 12:25 UTC by Michael Burman
Modified: 2021-12-10 12:50 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-10-23 14:52:21 UTC
Target Upstream Version:
Embargoed:


Attachments
record Lldptool not working (3.80 MB, application/x-gzip)
2017-09-03 12:34 UTC, Michael Burman
process trace log (6.88 KB, text/plain)
2017-09-06 12:03 UTC, Michael Burman
messages log (1.89 MB, text/plain)
2017-09-06 12:03 UTC, Michael Burman
strace -s 1000 -tt -f -ff -o log /usr/sbin/lldpad -t for around 180 seconds (569.89 KB, text/plain)
2017-09-06 14:55 UTC, Dominik Holler
strace -s 1000 -tt -f -ff -o log /usr/sbin/lldpad -t with tcpdump after around 1 minute (586.28 KB, text/plain)
2017-09-06 15:17 UTC, Dominik Holler
cat /proc/net/dev_mcast > dev_mcast.log (2.11 KB, text/plain)
2017-09-07 05:44 UTC, Dominik Holler
lspci -vv (35.35 KB, text/plain)
2017-09-07 06:00 UTC, Michael Burman
tcpdump -i enp1s0f1 ether proto 0x88cc -c 1 -w vega04-enp1s0f1.pcap (355 bytes, application/octet-stream)
2017-09-08 12:43 UTC, Dominik Holler

Description Michael Burman 2017-09-03 12:25:43 UTC
Description of problem:
LLDP sometimes does not work on some ports.

For example, the port enp2s0f1 is reported as "enabled": true via vdsm-client ->

cat <<EOF | vdsm-client -f - Host getLldp
> {
>     "filter": {}
> }
> EOF

}, 
    "enp2s0f1": {
        "enabled": true, 
        "tlvs": []

- But when we run lldptool on this port, we get nothing back - 

[root@vega04 ~]# lldptool get-tlv -n -i enp2s0f1

- Only after we run tcpdump on this port does lldp start working for it - 

[root@vega04 ~]# tcpdump  -i enp2s0f1 ether proto 0x88cc -c 1
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on enp2s0f1, link-type EN10MB (Ethernet), capture size 262144 bytes
15:12:52.121062 LLDP, length 297: rack03-sw02-lab4.tlv.redhat.com
1 packet captured
1 packet received by filter
0 packets dropped by kernel

[root@vega04 ~]# lldptool get-tlv -n -i enp2s0f1
Chassis ID TLV
        MAC: 18:ef:63:a1:75:10
Port ID TLV
        Local: Gi0/16
Time to Live TLV
        120
System Name TLV
        rack03-sw02-lab4.tlv.redhat.com
System Description TLV
        Cisco IOS Software, C3560 Software (C3560-ADVIPSERVICESK9-M), Version 12.2(44)SE6, RELEASE SOFTWARE (fc1)
Copyright (c) 1986-2009 by Cisco Systems, Inc.
Compiled Mon 09-Mar-09 17:42 by gereddy
Port Description TLV
        GigabitEthernet0/16
System Capabilities TLV
        System capabilities:  Bridge, Router
        Enabled capabilities: Bridge
Port VLAN ID TLV
        PVID: 161
MAC/PHY Configuration Status TLV
        Auto-negotiation supported and enabled
        PMD auto-negotiation capabilities: 0xc036
        MAU type: 1000 BaseTFD
End of LLDPDU TLV
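
For readers unfamiliar with the output above: each entry is one TLV from the received LLDPDU. A minimal sketch of how such a payload splits into TLVs follows, assuming the standard IEEE 802.1AB layout (7-bit type, 9-bit length, then the value). The helper build_tlv and the sample bytes are made up for illustration and only mimic a few of the values shown above; this is not lldpad's own parser.

```python
import struct

# Standard LLDP TLV type numbers (IEEE 802.1AB).
TLV_NAMES = {
    0: "End of LLDPDU",
    1: "Chassis ID",
    2: "Port ID",
    3: "Time to Live",
    4: "Port Description",
    5: "System Name",
    6: "System Description",
    7: "System Capabilities",
    8: "Management Address",
    127: "Organizationally Specific",
}


def iter_tlvs(lldpdu: bytes):
    """Yield (type, value) pairs from an LLDPDU (the bytes after the EtherType)."""
    offset = 0
    while offset + 2 <= len(lldpdu):
        # 16-bit header in network byte order: 7 bits type, 9 bits length.
        (header,) = struct.unpack_from("!H", lldpdu, offset)
        tlv_type = header >> 9
        length = header & 0x01FF
        value = lldpdu[offset + 2 : offset + 2 + length]
        if len(value) < length:
            raise ValueError("truncated TLV")
        yield tlv_type, value
        if tlv_type == 0:  # End of LLDPDU terminates the list
            break
        offset += 2 + length


def build_tlv(tlv_type: int, value: bytes) -> bytes:
    """Hypothetical helper, used only to construct the sample below."""
    return struct.pack("!H", (tlv_type << 9) | len(value)) + value


# Illustrative sample: chassis ID subtype 4 (MAC), port ID subtype 7 (local), TTL 120.
sample = (
    build_tlv(1, b"\x04" + bytes.fromhex("18ef63a17510"))
    + build_tlv(2, b"\x07" + b"Gi0/16")
    + build_tlv(3, struct.pack("!H", 120))
    + build_tlv(0, b"")
)

decoded = [(TLV_NAMES.get(t, "Unknown"), v) for t, v in iter_tlvs(sample)]
```

An empty "tlvs": [] list, as reported by vdsm-client, would correspond to no LLDPDU having reached the daemon at all, rather than to a parsing problem.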

Version-Release number of selected component (if applicable):
vdsm-4.20.3-8.gitd4eb30e.el7.centos.x86_64
4.2.0-0.0.master.20170901193740.git7900511.el7.centos

How reproducible:
100% on some ports and hardware:
Supermicro X9SCD
01:00.0 Ethernet controller: Intel Corporation 82580 Gigabit Network Connection (rev 01)
01:00.1 Ethernet controller: Intel Corporation 82580 Gigabit Network Connection (rev 01)
02:00.0 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)
02:00.1 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)
02:00.2 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)
02:00.3 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)

Steps to Reproduce:
1. Run lldptool on a port - lldptool get-tlv -n -i enp2s0f1
2. Run vdsm-client Host getLldp - 
cat <<EOF | vdsm-client -f - Host getLldp
> {
>     "filter": {}
> }
> EOF
3. Run tcpdump on the port - tcpdump  -i enp2s0f1 ether proto 0x88cc -c 1
4. Run lldptool again on the port - lldptool get-tlv -n -i enp2s0f1

Actual results:
1. Get nothing back
2. "enabled": true, 
        "tlvs": []
3. We get response
4. We now finally get a response from lldptool for the port

Attaching a recording to describe it better

Comment 1 Michael Burman 2017-09-03 12:34:31 UTC
Created attachment 1321507 [details]
record Lldptool not working

Comment 2 Dan Kenigsberg 2017-09-03 19:28:57 UTC
It seems to be lldpad that is not working well, regardless of oVirt.

Comment 4 Dan Kenigsberg 2017-09-03 19:31:14 UTC
Chris, can you or someone else from lldpad help oVirt QE pinpoint when and why lldpad seems to ignore incoming LLDP data?

Comment 5 Michael Burman 2017-09-06 12:03:01 UTC
Created attachment 1322636 [details]
process trace log

Comment 6 Michael Burman 2017-09-06 12:03:31 UTC
Created attachment 1322637 [details]
messages log

Comment 7 Dominik Holler 2017-09-06 12:05:31 UTC
journalctl provides the error:
Sep 06 10:29:59 vega01.qa.lab.tlv.redhat.com lldpad[17055]: recvfrom(Event interface): No buffer space available

Comment 8 Dominik Holler 2017-09-06 14:15:46 UTC
The problem occurs even before the "recvfrom(Event interface): No buffer space available" is logged.

Comment 9 Dominik Holler 2017-09-06 14:55:56 UTC
Created attachment 1322694 [details]
strace -s 1000 -tt -f -ff -o log /usr/sbin/lldpad -t for around 180 seconds

Comment 10 Dan Kenigsberg 2017-09-06 15:05:29 UTC
Rashid suspects a kernel driver issue. Burman, can you specify the make+model and driver+version of interfaces on which you see the problem?

Comment 11 Dominik Holler 2017-09-06 15:17:19 UTC
Created attachment 1322701 [details]
strace -s 1000 -tt -f -ff -o log /usr/sbin/lldpad -t with tcpdump after around 1 minute

After 1 minute, run tcpdump like this:
tcpdump  -i enp1s0f1  ether proto 0x88cc -c 1
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on enp1s0f1, link-type EN10MB (Ethernet), capture size 262144 bytes
18:04:48.936744 LLDP, length 301: rack03-sw03-lab4.tlv.redhat.com
1 packet captured
1 packet received by filter
0 packets dropped by kernel

Comment 12 Thomas Haller 2017-09-06 15:56:26 UTC
Can you also please show the content of /proc/net/dev_mcast (while tcpdump is not running and no LLDP messages are received)?
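
When reading that file, one thing worth checking is whether the interface is subscribed to the LLDP nearest-bridge group address 01:80:c2:00:00:0e. That this is the entry being asked about is an assumption, not something stated in this comment. A rough sketch of such a check, assuming the usual five-column layout of /proc/net/dev_mcast (ifindex, name, users, global users, address as bare hex); the sample text is made up and is not taken from the attached log:

```python
# LLDP nearest-bridge group address, written the way dev_mcast prints it.
LLDP_NEAREST_BRIDGE = "0180c200000e"


def multicast_addresses(dev_mcast_text: str, ifname: str) -> set:
    """Return the multicast MACs listed for ifname in dev_mcast-style text."""
    addrs = set()
    for line in dev_mcast_text.splitlines():
        fields = line.split()
        if len(fields) != 5:
            continue  # skip blank or unexpected lines
        _ifindex, name, _users, _global_users, addr = fields
        if name == ifname:
            addrs.add(addr.lower())
    return addrs


# Hypothetical sample, for illustration only.
sample = """\
3    enp2s0f1        1     0     01005e000001
3    enp2s0f1        1     0     0180c200000e
4    enp2s0f0        1     0     01005e000001
"""

subscribed = LLDP_NEAREST_BRIDGE in multicast_addresses(sample, "enp2s0f1")
```

On a real host the text would come from open("/proc/net/dev_mcast").read().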

Comment 13 Dominik Holler 2017-09-07 05:44:35 UTC
Created attachment 1322894 [details]
cat  /proc/net/dev_mcast > dev_mcast.log

Comment 14 Michael Burman 2017-09-07 06:00:34 UTC
Created attachment 1322896 [details]
lspci -vv

Comment 15 Michael Burman 2017-09-07 06:05:17 UTC
Setting back the needinfo requested by danken in comment #4

Comment 16 Dominik Holler 2017-09-07 07:12:13 UTC
Maybe helpful:

ethtool -i enp2s0f1
driver: igb
version: 5.4.0-k
firmware-version: 1.52.0
expansion-rom-version: 
bus-info: 0000:02:00.1
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: no

Comment 17 Thomas Haller 2017-09-07 16:34:22 UTC
Could you please provide an SOS report from the machine in question? Thanks

Comment 18 Hannes Frederic Sowa 2017-09-08 12:16:37 UTC
First thoughts but *I HAVEN'T VERIFIED THAT, BUT HIGHLY LIKELY*:

First of all, it looks like the Cisco is sending LLDP packets encapsulated in VLAN1, which is probably the native tag on the port. This is a known problem of Cisco devices.

Because vlan1 is not enabled on the linux box, we didn't tell the igb card to receive vlan1, thus it is filtered.

When we start tcpdump, we actually disable vlan filters and suddenly we also start to receive all traffic.

Because lldpad doesn't check PACKET_AUXDATA and the vlan offload information, it wrongly accepts the LLDP frame as if it had arrived on the native wire, although it is actually in vlan 1.

I would suggest trying to fix the Cisco box. Secondly, we need to patch lldpad. A third option is to simply create a vlan with id 1.
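
To illustrate the PACKET_AUXDATA point: with VLAN offload the tag is stripped from the frame data and is only visible in the auxiliary data that an AF_PACKET socket delivers as a control message once PACKET_AUXDATA is enabled. Below is a minimal sketch of decoding that structure, assuming the struct tpacket_auxdata layout and the TP_STATUS_VLAN_VALID flag from linux/if_packet.h. It does not open a socket, the sample bytes are constructed by hand, and it is not the lldpad patch itself.

```python
import struct

# struct tpacket_auxdata: tp_status, tp_len, tp_snaplen (u32 each),
# tp_mac, tp_net, tp_vlan_tci, tp_vlan_tpid (u16 each), native byte order.
AUXDATA_FMT = "IIIHHHH"
TP_STATUS_VLAN_VALID = 1 << 4


def vlan_id_from_auxdata(cmsg_data: bytes):
    """Return the VLAN ID recorded in the auxdata, or None if the frame was untagged."""
    if len(cmsg_data) < struct.calcsize(AUXDATA_FMT):
        raise ValueError("auxdata too short")
    tp_status, _len, _snaplen, _mac, _net, tci, _tpid = struct.unpack_from(
        AUXDATA_FMT, cmsg_data
    )
    # Older kernels do not set the flag, so a non-zero TCI is also taken as a tag.
    if tci == 0 and not (tp_status & TP_STATUS_VLAN_VALID):
        return None
    return tci & 0x0FFF  # the low 12 bits of the TCI are the VLAN ID


# Hand-built samples: one frame tagged with vlan 1, one untagged.
tagged_aux = struct.pack(AUXDATA_FMT, TP_STATUS_VLAN_VALID, 301, 301, 0, 14, 1, 0x8100)
untagged_aux = struct.pack(AUXDATA_FMT, 0, 301, 301, 0, 14, 0, 0)
```

A daemon that wanted deterministic behaviour could drop (or explicitly accept) any LLDP frame for which this returns a VLAN ID instead of None.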

Comment 19 Hannes Frederic Sowa 2017-09-08 12:35:52 UTC
Please try to subscribe to vlan 1 and check if it works instead of tcpdump:

ip link add link enp2s0f1 name enp2s0f1.1 type vlan id 1
ip link set up dev enp2s0f1.1

Comment 20 Dominik Holler 2017-09-08 12:43:28 UTC
Created attachment 1323732 [details]
tcpdump -i enp1s0f1 ether proto 0x88cc -c 1 -w vega04-enp1s0f1.pcap

Comment 21 Dominik Holler 2017-09-08 12:51:15 UTC
Thank you very much Hannes Frederic!
Looks like 

ip link add link enp1s0f1  name enp1s0f1.1 type vlan id 1
ip link set up dev enp1s0f1.1

makes lldpad receive the missing LLDP packets.
Also, the dump vega04-enp1s0f1.pcap shows the vlan id in the 802.1Q header.
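
For anyone who wants to repeat that check on a raw frame: a small sketch that looks for an 802.1Q header in front of the LLDP EtherType. The EtherType values 0x8100 and 0x88cc are standard; the sample frames are built by hand (the source MAC and payload are placeholders) and are not the contents of the attached pcap.

```python
import struct

ETH_P_8021Q = 0x8100  # 802.1Q VLAN tag
ETH_P_LLDP = 0x88CC   # LLDP


def lldp_vlan(frame: bytes):
    """Return (is_lldp, vlan_id); vlan_id is None for an untagged frame."""
    if len(frame) < 14:
        raise ValueError("frame shorter than an Ethernet header")
    # Bytes 0-5 destination MAC, 6-11 source MAC, 12-13 EtherType.
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    vlan_id = None
    if ethertype == ETH_P_8021Q:
        if len(frame) < 18:
            raise ValueError("truncated 802.1Q header")
        # The tag carries a 16-bit TCI followed by the real EtherType.
        tci, ethertype = struct.unpack_from("!HH", frame, 14)
        vlan_id = tci & 0x0FFF
    return ethertype == ETH_P_LLDP, vlan_id


# Placeholder addresses and payload, for illustration only.
dst = bytes.fromhex("0180c200000e")
src = bytes.fromhex("18ef63a17510")
payload = b"\x00\x00"  # a lone End of LLDPDU TLV

untagged = dst + src + struct.pack("!H", ETH_P_LLDP) + payload
tagged_vlan1 = dst + src + struct.pack("!HHH", ETH_P_8021Q, 1, ETH_P_LLDP) + payload
```

Here lldp_vlan(tagged_vlan1) reports the frame as LLDP in vlan 1, which is the case described in this bug, while lldp_vlan(untagged) reports LLDP with no vlan.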

Comment 22 Hannes Frederic Sowa 2017-09-08 12:55:56 UTC
Hi,

(In reply to Dominik Holler from comment #21)
> Thank you very much Hannes Frederic!
> Looks like 
> 
> ip link add link enp1s0f1  name enp1s0f1.1 type vlan id 1
> ip link set up dev enp1s0f1.1
> 
> makes lldpad receive the missing LLDP packets.
> Also, the dump vega04-enp1s0f1.pcap shows the vlan id in the 802.1Q header.

So we can't do anything in Linux to help you here. The problem is that some devices send LLDP frames in the native vlan, some always in vlan 1. You have to guess here or tell customers that the equipment is broken.

What I would propose is that we can fix lldpd/lldpad in a way that it consistently doesn't recognize the LLDP packets if they are encapsulated, thus eliminating this inconsistent behaviour. What do you think?

If we want to keep the current logic, we can probably close this bug.

Thanks!

Comment 23 Dan Kenigsberg 2017-09-08 21:34:11 UTC
Hannes, do you have a reference to this Cisco "misunderstood feature"? Do they always use vlan id 1, or do they sometimes use another value?

I'm asking because I am thinking about another "fix" for lldpad: when enabling rx on a nic, let it also unfilter vlan 1 for that nic. It smells like caving in under the pressure of faulty hardware, but it sounds more practical than your suggestion (which would make lldpad useless with a whole class of prevalent devices). Is it possible/reasonable in your opinion?

Comment 24 Hannes Frederic Sowa 2017-09-13 08:40:03 UTC
Hi Dan,

I have already seen this "feature" happen myself. I would guess it is pretty common. Quick googling shows some hits:

https://networkengineering.stackexchange.com/questions/9526/behavior-of-lldp-packets-through-tagged-port
https://kb.juniper.net/InfoCenter/index?page=content&id=KB23996

Also mentioned here with lldp: https://github.com/vincentbernat/lldpd/blob/master/README.md

In the last link it is also mentioned that LLDP packets can be tagged with the native vlan number again. So I fear it is a bit more complicated.

My proposal would make the behaviour of lldpad deterministic: it would ignore tagged frames, and we would document this behaviour so that admins/scripts can set up the appropriate vlans accordingly. Unfortunately this is not a plug and play installation anymore.

Also adding vlan1 to the host has obvious side effects, like starting to export all IP addresses from the Linux system into an accidental vlan 1. So the vlan 1 solution is also pretty scary for me.

I don't know if we can just unfilter vlans, maybe we should think about that.

Thanks,
Hannes

Comment 25 Dominik Holler 2017-09-13 16:04:06 UTC
Is it feasible for lldpad to receive Ethernet frames tagged with VLAN 1 without creating a VLAN interface like eth0.1?

Comment 26 Marcelo Ricardo Leitner 2017-09-13 18:48:24 UTC
The problem is receiving such frames. If you don't add such a vlan to the system, the packet is considered unwanted by the NIC and is discarded/dropped automatically. What tcpdump (and similar tools) do to receive them even without the vlan in place is to put the interface in promisc mode, which disables that filter, but we don't want to do that long term or in production, as it has security implications and causes additional load.

Comment 27 Dominik Holler 2017-09-13 19:09:23 UTC
> The problem is receiving such frames. If you don't add such vlan to
> the system, 

I wonder if there is another way than adding an interface with a full IP stack.

> the packet is considered unwanted by the NIC and is
> discarded/dropped automatically. 

Since libnl has some layer 2 functionality, maybe libnl is able to configure the layer 2 filters in a way that frames tagged as VLAN 1 and with LLDP EtherType are received.

> What tcpdump (and similar tools) do to
> receive them even without the vlan in place is to put the interface
> in promisc mode, which disables that filter, but we don't want to do
> that long term or in production, as it has security implications and
> causes additional load.

Sure.

Comment 28 Marcelo Ricardo Leitner 2017-09-13 19:24:36 UTC
It's orthogonal to libnl. libnl can set up the filters it wants, but if the NIC isn't instructed to receive such packets, it won't send them to the kernel and libnl won't see them. Consider it as 2 different levels of filters. And there are only these two ways to configure the NIC to accept it: either add the VLAN or enable promisc mode, but neither is suitable.
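
The two levels can be pictured with a deliberately simplified model. This is only a sketch of the reasoning in this comment, not of how igb or the kernel implement filtering, and both function names are made up for illustration.

```python
ETH_P_LLDP = 0x88CC


def nic_passes(frame_vlan, configured_vlans, promisc):
    """Level 1: the NIC's hardware VLAN filter (simplified)."""
    if promisc:
        return True  # promiscuous mode disables the filter
    if frame_vlan is None:
        return True  # untagged frames are not subject to the VLAN filter
    return frame_vlan in configured_vlans


def socket_sees_lldp(frame_vlan, ethertype, configured_vlans, promisc):
    """Level 2: a software filter on the EtherType, which only runs on
    frames that the NIC handed to the kernel in the first place."""
    if not nic_passes(frame_vlan, configured_vlans, promisc):
        return False
    return ethertype == ETH_P_LLDP


# LLDP tagged with vlan 1, no vlan configured, no promisc: dropped by the NIC.
dropped = socket_sees_lldp(1, ETH_P_LLDP, set(), False)
# Same frame after "ip link add ... type vlan id 1": delivered.
with_vlan = socket_sees_lldp(1, ETH_P_LLDP, {1}, False)
# Same frame while tcpdump holds the interface in promisc mode: delivered.
with_promisc = socket_sees_lldp(1, ETH_P_LLDP, set(), True)
```

No software-level filter can change the first result, because the second level never sees the frame.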

Comment 29 Marcelo Ricardo Leitner 2017-09-13 19:27:10 UTC
Ah, we could also add a tc rule to strip the vlan header in case it is vlan1, then we wouldn't have to add the vlan1 to the system, but we still hit this issue with instructing the NIC to receive such frames.

Comment 30 Dominik Holler 2017-09-13 21:41:58 UTC
> It's orthogonal to libnl. libnl can set up the filters it wants, but
> if the NIC isn't instructed to receive such packets, it won't send
> them to the kernel and libnl won't see them. Consider it as 2
> different levels of filters.

I expected that libnl configures the filtering in the kernel via
netlink.

> And there are only these two ways to
> configure the NIC to accept it: either add the VLAN or enable promisc
> mode, but neither is suitable.

Maybe there are multiple ways to add VLANs.
http://www.fser.info/?q=node/20 proposes an ioctl-based idea.
Can this approach help here?

Comment 31 Dominik Holler 2017-09-14 07:11:49 UTC
> http://www.fser.info/?q=node/20 proposes an ioctl-based idea.
> Can this approach help here?

I see, ioctl(socket, SIOCSIFVLAN, &args) creates a full VLAN interface.

Comment 32 Dan Kenigsberg 2018-02-10 14:47:38 UTC
Bottom line is that we do not have much to do to fix it.

Hosts would not be able to hear and report lldp data if the switch sends it on vlan 1. A user can work around that by configuring vlan1 or a bridge on the nic, or otherwise force it into promiscuous mode.

Comment 33 Jeremy Harris 2018-03-13 10:51:27 UTC
Isn't there a disconnect between saying "not much to do to fix it" and
defining the issue as NOTABUG ?

Comment 34 Dan Kenigsberg 2018-04-29 14:09:52 UTC
NOTABUG on an lldpad bugzilla only means that this is not a bug in lldpad. It does not mean that the report is wrong or that there is no bug SOMEWHERE. I believe that the bug is real, but is in the switch, and it is fixed in new Cisco switches.

