Bug 1928312 - RHV RHEL 8 Host can not be deployed on interface with error lldpad[2891]: set_ieee_hw: nlconnect failed abort hardware set, Unspecific failure
Keywords:
Status: CLOSED DUPLICATE of bug 1928753
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: vdsm
Version: 4.4.3
Hardware: Unspecified
OS: Unspecified
urgent
high
Target Milestone: ---
: ---
Assignee: Ales Musil
QA Contact: Michael Burman
URL:
Whiteboard:
Depends On: 1915458 1928753 1997064
Blocks:
Reported: 2021-02-12 23:49 UTC by hhaberma
Modified: 2022-03-16 08:21 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1928753 (view as bug list)
Environment:
Last Closed: 2022-03-15 17:03:01 UTC
oVirt Team: Network
Target Upstream Version:
Embargoed:


Attachments
lateset sosreport (17.96 MB, application/x-xz)
2021-02-12 23:49 UTC, hhaberma


Links
System ID Private Priority Status Summary Last Updated
Red Hat Bugzilla 1623904 1 medium CLOSED lldpad is not compatible to NICs with internal LLDP processing 2021-12-10 12:50:20 UTC
Red Hat Knowledge Base (Solution) 3619501 0 None None None 2021-05-10 19:43:55 UTC
Red Hat Knowledge Base (Solution) 5880951 0 None None None 2021-05-10 19:41:29 UTC
Red Hat Knowledge Base (Solution) 6287681 0 None None None 2021-08-27 16:04:54 UTC

Description hhaberma 2021-02-12 23:49:14 UTC
Created attachment 1756726 [details]
lateset sosreport

Description of problem:

The customer is attempting to deploy a RHV 4.4 HE environment. After running "dnf remove cockpit-ovirt-dashboard -y" and rebooting, the customer loses network access.


Version-Release number of selected component (if applicable):

NAME="Red Hat Enterprise Linux"
VERSION="8.3 (Ootpa)"
ovirt-hosted-engine-setup-2.4.9-2.el8ev.noarch              
ovirt-host-4.4.1-4.el8ev.x86_64                             
vdsm-4.40.40-1.el8ev.x86_64
lldpad-1.0.1-13.git036e314.el8.x86_64

How reproducible:
Always

Steps to Reproduce:

1. Host
Network Adapter Intel
driver: ice
version: 0.8.2-k
firmware-version: 2.10 0x80004341 1.2789.0

2. LLDP is enabled for all interfaces on this Juniper ex4550 switch 

3. Update the installed packages with dnf per the installation instructions:

https://access.redhat.com/documentation/en-us/red_hat_virtualization/4.4/html/installing_red_hat_virtualization_as_a_self-hosted_engine_using_the_cockpit_web_interface/installing_the_self-hosted_engine_deployment_host_she_cockpit_deploy#Installing_Cockpit_on_Linux_Hosts_SHE_deployment_host

4.2.1. Enabling the Red Hat Enterprise Linux host Repositories

Ensure that all packages currently installed are up to date:

# dnf upgrade --nobest
Reboot the machine.

3. Check the messages log and verify:

~~~
Feb  9 22:34:44 rhev02 lldpad[2891]: get_dcbx_hw: enp67s0f1 nlconnect failed abort get ieee, Unspecific failure
Feb  9 22:34:44 rhev02 lldpad[2891]: set_ieee_hw: enp67s0f0 nlconnect failed abort hardware set, Unspecific failure
Feb  9 22:34:44 rhev02 lldpad[2891]: set_ieee_hw: enp67s0f0 nlconnect failed abort hardware set, Unspecific failure
Feb  9 22:34:44 rhev02 lldpad[2891]: set_ieee_hw: enp67s0f0 nlconnect failed abort hardware set, Unspecific failure
~~~
and the repeating error:
~~~
Non-Contiguous TCs - Disabling DCB
~~~
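To see at a glance which interfaces are affected, the lldpad failures can be tallied per function and interface. The following is a minimal sketch, not part of the original report: the function name `count_nlconnect_failures` and the regular expression are illustrative assumptions, and the sample lines are copied from the log excerpt above.

```python
import re
from collections import Counter

# Matches lldpad lines of the form seen in this report, e.g.
#   ... lldpad[2891]: set_ieee_hw: enp67s0f0 nlconnect failed abort hardware set, ...
LLDPAD_FAILURE = re.compile(
    r"lldpad\[\d+\]: (?P<func>\w+): (?P<iface>\S+) nlconnect failed"
)


def count_nlconnect_failures(lines):
    """Return a Counter keyed by (lldpad function, interface name)."""
    counts = Counter()
    for line in lines:
        match = LLDPAD_FAILURE.search(line)
        if match:
            counts[(match["func"], match["iface"])] += 1
    return counts


# Sample input: the four lines quoted in the report.
sample = [
    "Feb  9 22:34:44 rhev02 lldpad[2891]: get_dcbx_hw: enp67s0f1 nlconnect failed abort get ieee, Unspecific failure",
    "Feb  9 22:34:44 rhev02 lldpad[2891]: set_ieee_hw: enp67s0f0 nlconnect failed abort hardware set, Unspecific failure",
    "Feb  9 22:34:44 rhev02 lldpad[2891]: set_ieee_hw: enp67s0f0 nlconnect failed abort hardware set, Unspecific failure",
    "Feb  9 22:34:44 rhev02 lldpad[2891]: set_ieee_hw: enp67s0f0 nlconnect failed abort hardware set, Unspecific failure",
]

counts = count_nlconnect_failures(sample)
for (func, iface), n in sorted(counts.items()):
    print(f"{iface} {func}: {n}")
```

On a real host the same function could be fed the lines of /var/log/messages instead of the inline sample.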

4. Proceed with the host installation step to install the dashboard:

4.2.2. Installing Cockpit on Red Hat Enterprise Linux hosts
Install the dashboard packages:

# dnf install cockpit-ovirt-dashboard

5. Test ssh connection to host.

6. Verify lldpad service.

* lldpad.service - Link Layer Discovery Protocol Agent Daemon.
   Loaded: loaded (/usr/lib/systemd/system/lldpad.service; enabled; vendor preset: enabled)
   Active: active (running) since Thu 2021-02-11 16:49:56 CST; 17h ago
 Main PID: 2883 (lldpad)
    Tasks: 1 (limit: 3298746)
   Memory: 6.6M
   CGroup: /system.slice/lldpad.service
           `-2883 /usr/sbin/lldpad -t

Feb 11 16:57:04 rhev02 lldpad[2883]: l2_packet_send - send: Network is down
Feb 11 16:57:21 rhev02 lldpad[2883]: recvfrom(Event interface): No buffer space available
Feb 11 16:57:33 rhev02 lldpad[2883]: l2_packet_send - send: Network is down
Feb 11 16:57:33 rhev02 lldpad[2883]: recvfrom(Event interface): No buffer space available
Unit lvm2-activation-early.service could not be found.
Unit lvm2-activation.service could not be found.
Feb 11 16:57:58 rhev02 lldpad[2883]: l2_packet_send - send: Network is down
Feb 12 10:40:20 rhev02 lldpad[2883]: l2_packet_send - send: Network is down
Feb 12 10:41:07 rhev02 lldpad[2883]: recvfrom(Event interface): No buffer space available
Feb 12 10:41:28 rhev02 lldpad[2883]: l2_packet_send - send: Network is down
Feb 12 10:41:28 rhev02 lldpad[2883]: recvfrom(Event interface): No buffer space available
Feb 12 10:41:56 rhev02 lldpad[2883]: l2_packet_send - send: Network is down

Actual results:

The customer lost communication with the host and seems to have hit a known lldpad issue from RHEL 7: https://bugzilla.redhat.com/show_bug.cgi?id=1554110. The only way to restore communication and resume the deployment was to do the following:

[1] Disabled LLDP at runtime on the interface. This does not survive a reboot; adminStatus defaults back to 'rx'.

# lldptool set-lldp -i enp67s0f0 adminStatus=disabled

[2] The customer disabled LLDP at the switch for both interfaces using the Intel NIC, and no longer sees the non-contiguous TCs error or the lldpad errors after reboot.
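Because the runtime setting has to be reapplied per interface, the lldptool invocation from workaround [1] can be generated for each affected port. This is only a sketch under stated assumptions: `lldp_disable_command` is a hypothetical helper, the interface names are taken from the log excerpt in this report, and the code builds the command lines without executing them.

```python
import shlex


def lldp_disable_command(iface):
    """Build the argv that disables LLDP on one interface at runtime."""
    return ["lldptool", "set-lldp", "-i", iface, "adminStatus=disabled"]


# Interface names as they appear in the lldpad log lines of this report.
interfaces = ["enp67s0f0", "enp67s0f1"]

commands = [lldp_disable_command(iface) for iface in interfaces]
for argv in commands:
    # Print only; run as root (e.g. via subprocess.run) to apply.
    print(shlex.join(argv))
```

As described in this report, the setting applied this way reverts to 'rx' after a reboot on RHEL 8, so the commands would need to be rerun after each boot.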


Expected results:

Something seems to have changed with lldpad in RHEL 8. In RHEL 7 I was able to disable LLDP, but in RHEL 8 the setting reverts to adminStatus=rx. There now appears to be a strict dependency between vdsmd and lldpad.

Would it be practical to still have this feature? 
What if the customer could not disable lldp?
Is there a purpose now in RHEL 8 not to disable?
Or, is there a method for this that I missed?

I realize there might also be a compatibility issue with the Juniper switch, but my main concern is why we cannot disable this.


Additional info:

Comment 2 Martin Perina 2021-02-15 13:32:57 UTC
Cloned the bug to RHEL and raised priority to high, because the workaround, which worked on RHEL 7, doesn't work on RHEL 8.

Comment 3 Dominik Holler 2021-03-04 10:38:25 UTC
Please note that there is also bug 1623904 and bug 1636254. The fix of the last one added the enable_lldp option in vdsm.conf to avoid vdsm using lldp.
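
(A rough sketch, not taken from the referenced bugs, of how a host's configuration could be checked for the enable_lldp option mentioned above. The `[vars]` section name, the value `false`, and the default of "enabled when absent" are all assumptions here; the comment above names only the option and the file. `lldp_enabled` is a hypothetical helper.)

```python
import configparser

# Hypothetical vdsm.conf fragment. The [vars] section and the "false"
# value are assumptions; only the option name comes from the bug comment.
EXAMPLE_CONF = """
[vars]
enable_lldp = false
"""


def lldp_enabled(conf_text, section="vars"):
    """Return whether enable_lldp is on in the given INI text."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    # Assumption: treat LLDP as enabled when the option is not set.
    return parser.getboolean(section, "enable_lldp", fallback=True)


enabled = lldp_enabled(EXAMPLE_CONF)
print("vdsm LLDP enabled:", enabled)
```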

Comment 9 Michal Skrivanek 2021-05-03 16:45:40 UTC
*** Bug 1954629 has been marked as a duplicate of this bug. ***

Comment 13 Marina Kalinin 2021-08-27 16:51:22 UTC
Wondering if we really need to keep this bug open?
We have the RHEL8 bzs to address the issue: bz#1928753, bz#1915458, bz#1997064.
And we have RHV bz to confirm the workaround to be working - bz#1939262.

If we still want to reconsider the workaround after the RHEL BZs are fixed, maybe we should close the WA bz instead and continue the discussion here? Or do you believe both bugs have a place?

Comment 15 Martin Perina 2022-03-15 17:03:01 UTC
The RHV team has no influence to move this forward; it depends on the upstream fix being delivered as part of RHEL 8, and there is no implementation work on the RHV side. Closing as a duplicate of the RHEL bug.

*** This bug has been marked as a duplicate of bug 1928753 ***

