Bug 1928312

Summary: RHV RHEL 8 Host can not be deployed on interface with error lldpad[2891]: set_ieee_hw: nlconnect failed abort hardware set, Unspecific failure
Product: Red Hat Enterprise Virtualization Manager
Reporter: hhaberma
Component: vdsm
Assignee: Ales Musil <amusil>
Status: CLOSED DUPLICATE
QA Contact: Michael Burman <mburman>
Severity: high
Docs Contact:
Priority: urgent
Version: 4.4.3
CC: ahadas, amusil, bugs, dholler, lsurette, mburman, michal.skrivanek, mkalinin, mperina, pelauter, rcunha, sasundar, srevivo, ycui
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1928753
Environment:
Last Closed: 2022-03-15 17:03:01 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Network
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1915458, 1928753, 1997064
Bug Blocks:
Attachments:
Description Flags
lateset sosreport none

Description hhaberma 2021-02-12 23:49:14 UTC
Created attachment 1756726
lateset sosreport

Description of problem:

The customer is attempting to deploy a RHV 4.4 HE environment. After running "dnf remove cockpit-ovirt-dashboard -y" and rebooting, the customer loses network access:


Version-Release number of selected component (if applicable):

NAME="Red Hat Enterprise Linux"
VERSION="8.3 (Ootpa)"
ovirt-hosted-engine-setup-2.4.9-2.el8ev.noarch              
ovirt-host-4.4.1-4.el8ev.x86_64                             
vdsm-4.40.40-1.el8ev.x86_64
lldpad-1.0.1-13.git036e314.el8.x86_64

How reproducible:
Always

Steps to Reproduce:

1. Host
Network Adapter Intel
driver: ice
version: 0.8.2-k
firmware-version: 2.10 0x80004341 1.2789.0

2. LLDP is enabled for all interfaces on this Juniper ex4550 switch 

3. Install/update the packages with dnf per the instructions:

https://access.redhat.com/documentation/en-us/red_hat_virtualization/4.4/html/installing_red_hat_virtualization_as_a_self-hosted_engine_using_the_cockpit_web_interface/installing_the_self-hosted_engine_deployment_host_she_cockpit_deploy#Installing_Cockpit_on_Linux_Hosts_SHE_deployment_host

4.2.1. Enabling the Red Hat Enterprise Linux host Repositories

Ensure that all packages currently installed are up to date:

# dnf upgrade --nobest
Reboot the machine.

3. Check the messages log and verify:

~~~
Feb  9 22:34:44 rhev02 lldpad[2891]: get_dcbx_hw: enp67s0f1 nlconnect failed abort get ieee, Unspecific failure
Feb  9 22:34:44 rhev02 lldpad[2891]: set_ieee_hw: enp67s0f0 nlconnect failed abort hardware set, Unspecific failure
Feb  9 22:34:44 rhev02 lldpad[2891]: set_ieee_hw: enp67s0f0 nlconnect failed abort hardware set, Unspecific failure
Feb  9 22:34:44 rhev02 lldpad[2891]: set_ieee_hw: enp67s0f0 nlconnect failed abort hardware set, Unspecific failure
~~~
and the repeating error:
~~~
Non-Contiguous TCs - Disabling DCB
~~~

4. Proceed with host installation step to install dashboard

4.2.2. Installing Cockpit on Red Hat Enterprise Linux hosts
Install the dashboard packages:

# dnf install cockpit-ovirt-dashboard

5. Test ssh connection to host.

6. Verify lldpad service.

* lldpad.service - Link Layer Discovery Protocol Agent Daemon.
   Loaded: loaded (/usr/lib/systemd/system/lldpad.service; enabled; vendor preset: enabled)
   Active: active (running) since Thu 2021-02-11 16:49:56 CST; 17h ago
 Main PID: 2883 (lldpad)
    Tasks: 1 (limit: 3298746)
   Memory: 6.6M
   CGroup: /system.slice/lldpad.service
           `-2883 /usr/sbin/lldpad -t

Feb 11 16:57:04 rhev02 lldpad[2883]: l2_packet_send - send: Network is down
Feb 11 16:57:21 rhev02 lldpad[2883]: recvfrom(Event interface): No buffer space available
Feb 11 16:57:33 rhev02 lldpad[2883]: l2_packet_send - send: Network is down
Feb 11 16:57:33 rhev02 lldpad[2883]: recvfrom(Event interface): No buffer space available
Unit lvm2-activation-early.service could not be found.
Unit lvm2-activation.service could not be found.
Feb 11 16:57:58 rhev02 lldpad[2883]: l2_packet_send - send: Network is down
Feb 12 10:40:20 rhev02 lldpad[2883]: l2_packet_send - send: Network is down
Feb 12 10:41:07 rhev02 lldpad[2883]: recvfrom(Event interface): No buffer space available
Feb 12 10:41:28 rhev02 lldpad[2883]: l2_packet_send - send: Network is down
Feb 12 10:41:28 rhev02 lldpad[2883]: recvfrom(Event interface): No buffer space available
Feb 12 10:41:56 rhev02 lldpad[2883]: l2_packet_send - send: Network is down
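
For triage, the lldpad messages from steps 3 and 6 can be tallied per function and interface. The following is only a rough sketch, assuming the syslog line format shown in the excerpts above; the function name summarize_lldpad and both regular expressions are made up for illustration and are not part of lldpad or vdsm.

```python
import re
from collections import Counter

# Matches syslog-style lines such as:
#   Feb  9 22:34:44 rhev02 lldpad[2891]: set_ieee_hw: enp67s0f0 nlconnect failed ...
LLDPAD_LINE = re.compile(r"lldpad\[\d+\]:\s+(?P<msg>.*)$")
NLCONNECT = re.compile(r"^(?P<func>\w+): (?P<iface>\S+) nlconnect failed")


def summarize_lldpad(lines):
    """Count nlconnect failures per (function, interface) and all other lldpad messages."""
    nlconnect = Counter()
    other = Counter()
    for line in lines:
        found = LLDPAD_LINE.search(line)
        if not found:
            continue  # not an lldpad message
        msg = found.group("msg").strip()
        failure = NLCONNECT.match(msg)
        if failure:
            nlconnect[(failure.group("func"), failure.group("iface"))] += 1
        else:
            other[msg] += 1
    return nlconnect, other


# Sample lines copied from the excerpts in this report.
sample = [
    "Feb  9 22:34:44 rhev02 lldpad[2891]: get_dcbx_hw: enp67s0f1 nlconnect failed abort get ieee, Unspecific failure",
    "Feb  9 22:34:44 rhev02 lldpad[2891]: set_ieee_hw: enp67s0f0 nlconnect failed abort hardware set, Unspecific failure",
    "Feb  9 22:34:44 rhev02 lldpad[2891]: set_ieee_hw: enp67s0f0 nlconnect failed abort hardware set, Unspecific failure",
    "Feb  9 22:34:44 rhev02 lldpad[2891]: set_ieee_hw: enp67s0f0 nlconnect failed abort hardware set, Unspecific failure",
    "Feb 11 16:57:04 rhev02 lldpad[2883]: l2_packet_send - send: Network is down",
    "Feb 11 16:57:21 rhev02 lldpad[2883]: recvfrom(Event interface): No buffer space available",
]

nlconnect, other = summarize_lldpad(sample)
for (func, iface), count in sorted(nlconnect.items()):
    print(f"{func} {iface}: {count} nlconnect failure(s)")
for msg, count in sorted(other.items()):
    print(f"{count} x {msg}")
```

In practice the lines would come from /var/log/messages or the sosreport rather than from a hard-coded list.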

Actual results:

The customer lost communication with the host and seems to have hit a known lldpad issue from RHEL 7: https://bugzilla.redhat.com/show_bug.cgi?id=1554110. The only way to restore communication and resume the deployment was the following:

[1] Disabled LLDP at runtime on the interface. This does not survive a reboot; adminStatus defaults back to 'rx'.

# lldptool set-lldp -i enp67s0f0 adminStatus=disabled

[2] Customer disabled LLDP at the switch for both interfaces using the Intel NIC.

After that, the customer no longer sees the "Non-Contiguous TCs" error or the lldpad errors after a reboot.
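
Because the runtime setting from [1] is lost on reboot, it has to be reapplied for every affected interface. A minimal sketch of building those commands is shown here; the helper name lldp_disable_commands is hypothetical, the interface names are the ones from the logs above, and nothing is executed by the snippet itself.

```python
import shlex


def lldp_disable_commands(interfaces, admin_status="disabled"):
    """Build one `lldptool set-lldp` invocation per interface (nothing is run here)."""
    return [
        ["lldptool", "set-lldp", "-i", iface, f"adminStatus={admin_status}"]
        for iface in interfaces
    ]


# Interface names taken from the log excerpts in this report.
commands = lldp_disable_commands(["enp67s0f0", "enp67s0f1"])
for cmd in commands:
    # Print a copy-pasteable shell form of each command.
    print(shlex.join(cmd))
```

Each list could be passed to subprocess.run(cmd, check=True) as root to apply it. This only automates the workaround and does not explain why adminStatus reverts to 'rx'.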


Expected results:

It seems there were some changes to lldpad in RHEL 8. In RHEL 7 I was able to disable LLDP on the interface, but in RHEL 8 this no longer works: the interface goes back to adminStatus=rx. There now appears to be a strict dependency between vdsmd and lldpad.

Would it be practical to still have this feature? 
What if the customer could not disable lldp?
Is there a reason it can no longer be disabled in RHEL 8?
Or, is there a method for this that I missed?

I realize there might also be a compatibility issue with the Juniper switch, but my main concern is why we cannot disable this.


Additional info:

Comment 2 Martin Perina 2021-02-15 13:32:57 UTC
Cloned the bug to RHEL and raised priority to high, because the workaround, which worked on RHEL7, doesn't work on RHEL 8

Comment 3 Dominik Holler 2021-03-04 10:38:25 UTC
Please note that there is also bug 1623904 and bug 1636254. The fix of the last one added the enable_lldp option in vdsm.conf to avoid vdsm using lldp.
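
A possible shape for that setting, written as a sketch only: the option name enable_lldp comes from the comment above, while the [vars] section name is an assumption that should be checked against the vdsm.conf shipped with the installed vdsm version.

```python
import configparser
import io

# ASSUMPTION: the option lives in the [vars] section of vdsm.conf; the comment
# above names only the option (enable_lldp) and the file.
fragment = configparser.ConfigParser()
fragment["vars"] = {"enable_lldp": "false"}

# Render the fragment as it would appear in the config file.
buf = io.StringIO()
fragment.write(buf)
text = buf.getvalue()
print(text)

# Read it back the way a boolean option is normally parsed.
parsed = configparser.ConfigParser()
parsed.read_string(text)
lldp_enabled = parsed.getboolean("vars", "enable_lldp")
print("LLDP enabled by vdsm:", lldp_enabled)
```

Whether this setting alone avoids the network loss described in this bug is not confirmed here; bz#1939262 tracks verification of the workaround.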

Comment 9 Michal Skrivanek 2021-05-03 16:45:40 UTC
*** Bug 1954629 has been marked as a duplicate of this bug. ***

Comment 13 Marina Kalinin 2021-08-27 16:51:22 UTC
Wondering if we really need to keep this bug open?
We have the RHEL8 bzs to address the issue: bz#1928753, bz#1915458, bz#1997064.
And we have RHV bz to confirm the workaround to be working - bz#1939262.

If we still want to reconsider the workaround after the RHEL BZs are fixed, maybe we should close the WA bz instead and continue the discussion here? Or do you believe both bugs have a place?

Comment 15 Martin Perina 2022-03-15 17:03:01 UTC
The RHV team has no influence to move this forward. It depends entirely on an upstream fix being delivered as part of RHEL 8, and there is no implementation work on the RHV side. Closing as a duplicate of the RHEL bug.

*** This bug has been marked as a duplicate of bug 1928753 ***