2016645 – NIC with vmxnet3 driver is failing constantly OCP 4.8

Summary: NIC with vmxnet3 driver is failing constantly OCP 4.8

Keywords:
Status:	CLOSED NOTABUG
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Networking
Sub Component:
Version:	4.8
Hardware:	x86_64
OS:	Linux
Priority:	high
Severity:	high
Target Milestone:	---
Target Release:	---
Assignee:	Jaime Caamaño Ruiz
QA Contact:	zhaozhanqi
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2021-10-22 14:00 UTC by peter ducai
Modified:	2021-12-20 16:03 UTC
CC List:	1 user
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2021-12-20 16:03:16 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Bugzilla	1990663	1	urgent	CLOSED	[Assisted-4.8 ][SaaS][vsphere] cluster deployment failed when use OpenShiftSDN and network adapter vmxnet3	2022-01-11 22:31:35 UTC
Red Hat Bugzilla	2009786	1	None	None	None	2024-12-20 21:16:38 UTC

Description peter ducai 2021-10-22 14:00:51 UTC

Description of problem:

I have several unrelated cases in which I noticed the same recurring error: the main network interface goes down and up continuously. Some of the nodes only show degraded performance, others are NotReady with SDN not working, and some (where other factors could also be the cause) have CRI-O active but no containers running. See the linked BZs, as this could be connected to those issues.



[   11.127593] IPv6: ADDRCONF(NETDEV_UP): ens192: link is not ready
[   11.148526] vmxnet3 0000:0b:00.0 ens192: intr type 3, mode 0, 9 vectors allocated
[   11.149748] vmxnet3 0000:0b:00.0 ens192: NIC Link is Up 10000 Mbps
[   13.018341] overlayfs: unrecognized mount option "volatile" or missing value
[   13.196570] cgroup: cgroup: disabling cgroup2 socket matching due to net_prio or net_cls activation


[1026417.754112] device vethf33b74a6 left promiscuous mode
[1027137.877703] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
[1027137.896319] IPv6: ADDRCONF(NETDEV_UP): veth26f55cf2: link is not ready
[1027137.896538] IPv6: ADDRCONF(NETDEV_CHANGE): veth26f55cf2: link becomes ready
[1027137.896825] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[1027137.979585] device veth26f55cf2 entered promiscuous mode
[1033751.214059] TCP: request_sock_TCP: Possible SYN flooding on port 6443. Sending cookies.  Check SNMP counters.


Version-Release number of selected component (if applicable):
4.8.9

How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

#03058569 

There is also an issue where the node is stuck with old kubelet values.
Running sudo touch /run/machine-config-daemon-force did not help.
A reboot did not help either.
There are zero containers in CRI-O, in addition to the vmxnet3 NIC issues mentioned above (could those be preventing containers from running?).
The customer is using OVN.

Comment 10 Jaime Caamaño Ruiz 2021-12-20 16:03:16 UTC

I am going to close this bug due to lack of progress. 

There is no definitive proof in the logs provided that the interface is going down.
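
A minimal sketch of how such a check could be done, assuming dmesg output is available as text: the Python snippet below counts vmxnet3 link state messages per interface. The names count_link_events and SAMPLE are hypothetical, and the "NIC Link is Down" wording is an assumption that mirrors the "NIC Link is Up" message seen in the excerpt from the description. On that excerpt it finds one Up event for ens192 at boot and no Down events. The later NETDEV_UP / NETDEV_CHANGE lines refer to veth26f55cf2 and eth0, which look like pod veth pair interfaces rather than the node NIC.

```python
import re
from collections import Counter

# Matches kernel log lines such as:
#   [   11.149748] vmxnet3 0000:0b:00.0 ens192: NIC Link is Up 10000 Mbps
# Assumption: a link loss is logged with the same shape as "NIC Link is Down".
LINK_RE = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\]\s+vmxnet3\s+\S+\s+"
    r"(?P<iface>\S+):\s+NIC Link is (?P<state>Up|Down)"
)


def count_link_events(dmesg_text):
    """Count vmxnet3 link messages, keyed by (interface, state)."""
    events = Counter()
    for line in dmesg_text.splitlines():
        match = LINK_RE.match(line.strip())
        if match:
            events[(match.group("iface"), match.group("state"))] += 1
    return events


# Lines copied from the dmesg excerpt in the description.
SAMPLE = """\
[   11.127593] IPv6: ADDRCONF(NETDEV_UP): ens192: link is not ready
[   11.148526] vmxnet3 0000:0b:00.0 ens192: intr type 3, mode 0, 9 vectors allocated
[   11.149748] vmxnet3 0000:0b:00.0 ens192: NIC Link is Up 10000 Mbps
[1027137.877703] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
[1027137.896319] IPv6: ADDRCONF(NETDEV_UP): veth26f55cf2: link is not ready
[1027137.896538] IPv6: ADDRCONF(NETDEV_CHANGE): veth26f55cf2: link becomes ready
[1027137.896825] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
"""

if __name__ == "__main__":
    for (iface, state), count in sorted(count_link_events(SAMPLE).items()):
        print(f"{iface}: {state} x{count}")
```

A flapping node NIC would be expected to show repeated Up and Down counts for the same interface (ens192 here). A single Up at boot does not indicate that.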

Please verify that the remedy for https://bugzilla.redhat.com/show_bug.cgi?id=1987108 is correctly applied, or that a version containing the fix is in use.

Please re-open if you think this is a mistake.
