Created attachment 1756726
latest sosreport

Description of problem:
Customer is attempting to deploy a RHV 4.4 HE environment. After running "dnf remove cockpit-ovirt-dashboard -y" and rebooting, the customer loses network access.

Version-Release number of selected component (if applicable):
NAME="Red Hat Enterprise Linux"
VERSION="8.3 (Ootpa)"
ovirt-hosted-engine-setup-2.4.9-2.el8ev.noarch
ovirt-host-4.4.1-4.el8ev.x86_64
vdsm-4.40.40-1.el8ev.x86_64
lldpad-1.0.1-13.git036e314.el8.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Host network adapter: Intel
   driver: ice
   version: 0.8.2-k
   firmware-version: 2.10 0x80004341 1.2789.0

2. LLDP is enabled for all interfaces on this Juniper ex4550 switch.

3. Install/update the dnf packages per the instructions:
   https://access.redhat.com/documentation/en-us/red_hat_virtualization/4.4/html/installing_red_hat_virtualization_as_a_self-hosted_engine_using_the_cockpit_web_interface/installing_the_self-hosted_engine_deployment_host_she_cockpit_deploy#Installing_Cockpit_on_Linux_Hosts_SHE_deployment_host

   4.2.1. Enabling the Red Hat Enterprise Linux host Repositories
   Ensure that all packages currently installed are up to date:
   # dnf upgrade --nobest
   Reboot the machine.

4. Check the messages log and verify:
~~~
Feb 9 22:34:44 rhev02 lldpad[2891]: get_dcbx_hw: enp67s0f1 nlconnect failed abort get ieee, Unspecific failure
Feb 9 22:34:44 rhev02 lldpad[2891]: set_ieee_hw: enp67s0f0 nlconnect failed abort hardware set, Unspecific failure
Feb 9 22:34:44 rhev02 lldpad[2891]: set_ieee_hw: enp67s0f0 nlconnect failed abort hardware set, Unspecific failure
Feb 9 22:34:44 rhev02 lldpad[2891]: set_ieee_hw: enp67s0f0 nlconnect failed abort hardware set, Unspecific failure
~~~
   and the repeating error:
~~~
Non-Contiguous TCs - Disabling DCB
~~~

5. Proceed with the host installation step to install the dashboard:
   4.2.2. Installing Cockpit on Red Hat Enterprise Linux hosts
   Install the dashboard packages:
   # dnf install cockpit-ovirt-dashboard

6. Test the ssh connection to the host.

7. Verify the lldpad service:
   * lldpad.service - Link Layer Discovery Protocol Agent Daemon.
      Loaded: loaded (/usr/lib/systemd/system/lldpad.service; enabled; vendor preset: enabled)
      Active: active (running) since Thu 2021-02-11 16:49:56 CST; 17h ago
    Main PID: 2883 (lldpad)
       Tasks: 1 (limit: 3298746)
      Memory: 6.6M
      CGroup: /system.slice/lldpad.service
              `-2883 /usr/sbin/lldpad -t

   Feb 11 16:57:04 rhev02 lldpad[2883]: l2_packet_send - send: Network is down
   Feb 11 16:57:21 rhev02 lldpad[2883]: recvfrom(Event interface): No buffer space available
   Feb 11 16:57:33 rhev02 lldpad[2883]: l2_packet_send - send: Network is down
   Feb 11 16:57:33 rhev02 lldpad[2883]: recvfrom(Event interface): No buffer space available
   Unit lvm2-activation-early.service could not be found.
   Unit lvm2-activation.service could not be found.
   Feb 11 16:57:58 rhev02 lldpad[2883]: l2_packet_send - send: Network is down
   Feb 12 10:40:20 rhev02 lldpad[2883]: l2_packet_send - send: Network is down
   Feb 12 10:41:07 rhev02 lldpad[2883]: recvfrom(Event interface): No buffer space available
   Feb 12 10:41:28 rhev02 lldpad[2883]: l2_packet_send - send: Network is down
   Feb 12 10:41:28 rhev02 lldpad[2883]: recvfrom(Event interface): No buffer space available
   Feb 12 10:41:56 rhev02 lldpad[2883]: l2_packet_send - send: Network is down

Actual results:
Customer lost communication to the host and appears to have hit a known lldpad issue from RHEL 7: https://bugzilla.redhat.com/show_bug.cgi?id=1554110.

The only ways to restore communication and resume the deployment were the following:

[1] Disable LLDP on the interface at runtime. This does not survive a reboot; the setting defaults back to 'rx'.
    # lldptool set-lldp -i enp67s0f0 adminStatus=disabled

[2] Customer disabled LLDP at the switch for both interfaces that use the Intel NIC. After a reboot, neither the non-contiguous TCs error nor the lldpad errors are seen.

Expected results:
It seems lldpad changed in RHEL 8. In RHEL 7 I was able to disable LLDP on an interface, but in RHEL 8 the setting goes back to adminStatus=rx. There now appears to be a strict dependency between vdsmd and lldpad. Would it be practical to keep the ability to disable it? What if the customer could not disable LLDP at the switch? Is there a reason LLDP cannot be disabled in RHEL 8, or is there a method for this that I missed? I realize there might also be a compatibility issue with the Juniper switch, but my main concern is why we cannot disable this.

Additional info:
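For reference, the runtime workaround from [1] can be scripted roughly as below. This is only a sketch of what was run by hand: the function names and the DRY_RUN switch are made up for illustration, and the interface names are the ones from this host. It relies on the `lldptool set-lldp` and `lldptool get-lldp` subcommands shipped with lldpad. As described above, on RHEL 8 the setting goes back to 'rx' after a reboot, so this is not a persistent fix.

```shell
#!/bin/sh
# Sketch: disable LLDP per interface through lldpad's lldptool.
# DRY_RUN=1 (hypothetical switch for this example) prints the commands
# instead of executing them, so the script can be reviewed safely.

run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

disable_lldp() {
    for iface in "$@"; do
        # Turn off LLDP rx/tx on this interface (runtime only).
        run lldptool set-lldp -i "$iface" adminStatus=disabled
        # Read the value back to confirm what lldpad reports.
        run lldptool get-lldp -i "$iface" adminStatus
    done
}
```

Example: `DRY_RUN=1; disable_lldp enp67s0f0 enp67s0f1` prints the four lldptool commands; without DRY_RUN they are executed against the running lldpad.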
Cloned the bug to RHEL and raised the priority to high, because the workaround that worked on RHEL 7 does not work on RHEL 8.
Please note that there are also bug 1623904 and bug 1636254. The fix for the latter added the enable_lldp option to vdsm.conf, which stops vdsm from using LLDP.
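A rough sketch of setting that option follows. Only the option name enable_lldp comes from the bug above; the rest is assumption: that the option belongs to the [vars] section, that vdsm reads drop-in files from /etc/vdsm/vdsm.conf.d, and the file name 99-disable-lldp.conf, which is made up here. Please check against the vdsm.conf shipped on the host before using it.

```shell
#!/bin/sh
# Sketch: write a vdsm drop-in that turns off vdsm's use of LLDP.
# Assumptions (not confirmed in this bug): the [vars] section, the
# drop-in directory, and the hypothetical file name below.

write_lldp_dropin() {
    conf_dir="${1:-/etc/vdsm/vdsm.conf.d}"
    mkdir -p "$conf_dir" || return 1
    cat > "$conf_dir/99-disable-lldp.conf" <<'EOF'
[vars]
enable_lldp = false
EOF
}
```

Calling `write_lldp_dropin` with no argument writes to the assumed default directory; passing a directory writes there instead, which is handy for checking the result first. vdsmd presumably has to be restarted before the change is picked up.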
*** Bug 1954629 has been marked as a duplicate of this bug. ***
Do we really need to keep this bug open? We have the RHEL 8 bzs to address the issue: bz#1928753, bz#1915458, bz#1997064. And we have an RHV bz to confirm that the workaround works: bz#1939262. If we still want to reconsider the workaround after the RHEL bzs are fixed, maybe we should close the workaround bz instead and continue the discussion here? Or do you believe both bugs have a place?
The RHV team has no influence on moving this forward. It depends entirely on the upstream fix and on that fix being delivered as part of RHEL 8; there is no implementation work on the RHV side. Closing as a duplicate of the RHEL bug.

*** This bug has been marked as a duplicate of bug 1928753 ***