Bug 1987108 - Networking issue with vSphere clusters running HW14 and later [NEEDINFO]
Summary: Networking issue with vSphere clusters running HW14 and later
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Machine Config Operator
Version: 4.8
Hardware: x86_64
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 4.9.0
Assignee: MCO Team
QA Contact: Rio Liu
URL:
Whiteboard: EmergencyRequest UpdateRecommendation...
Duplicates: 1993153 1993723 1996577 1997292 (view as bug list)
Depends On:
Blocks: 1723620 1987166 1998106
 
Reported: 2021-07-28 22:03 UTC by Yash Chouksey
Modified: 2022-01-26 14:27 UTC (History)
CC List: 59 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1987166 1998106 (view as bug list)
Environment:
Last Closed: 2021-10-29 15:20:17 UTC
Target Upstream Version:
aygarg: needinfo? (bbennett)
sdodson: needinfo? (aconstan)


Attachments


Links
System ID Private Priority Status Summary Last Updated
Github openshift machine-config-operator pull 2734 0 None None None 2021-08-26 11:54:30 UTC
Red Hat Issue Tracker INSIGHTOCP-434 0 None None None 2021-08-19 19:05:24 UTC
Red Hat Knowledge Base (Solution) 6279691 0 None None None 2021-08-25 07:17:49 UTC
Red Hat Product Errata RHSA-2021:3759 0 None None None 2021-11-01 01:36:37 UTC

Comment 1 Michal Fojtik 2021-07-28 22:08:57 UTC
** A NOTE ABOUT USING URGENT **

This BZ has been set to urgent severity and priority. When a BZ is marked urgent priority, engineers are asked to stop whatever they are doing and put everything else on hold.
Please be prepared to have reasonable justification ready to discuss, and ensure both your own and engineering management are aware of and agree that this BZ is urgent. Keep in mind, urgent bugs are very expensive and have maximal management visibility.

NOTE: This bug was automatically assigned to an engineering manager with the severity reset to *unspecified* until the emergency is vetted and confirmed. Please do not manually override the severity.

Comment 4 Stefan Schimanski 2021-07-29 06:33:31 UTC
kube-apiserver has no connectivity to the aggregated apiservers, e.g. from master 1:

2021-07-28T13:36:45.938423139Z E0728 13:36:45.938066      20 controller.go:116] loading OpenAPI spec for "v1.route.openshift.io" failed with: failed to retrieve openAPI spec, http error: ResponseCode: 503, Body: error trying to reach service: dial tcp 10.130.0.6:8443: connect: no route to host

Similar messages appear in the other kube-apiserver instances, and (!) for different aggregated apiservers (metrics, oauth, ...).

Looks like pod networking is down.

At the same time, the openshift-apiserver (the one I checked) is up and happy (it provides route.openshift.io among other APIs).

Moving to networking.

Comment 5 Antonio Ojea 2021-07-29 08:21:08 UTC
more ./quay-io-openshift-origin-must-gather-sha256-e5e5166f37d7bd043f25276ad450f7aa57d96604e8c1a6c26ab42a9253689079/cluster-scoped-resources/config.openshift.io/infrastructures.yaml
---
apiVersion: config.openshift.io/v1
items:
- apiVersion: config.openshift.io/v1
  kind: Infrastructure
  metadata:
    creationTimestamp: "2021-07-28T12:52:57Z"
    generation: 1
    name: cluster
    resourceVersion: "659"
    uid: e58515fa-5dfc-4399-ae0d-1ac422d8792e
  spec:
    cloudConfig:
      key: config
      name: cloud-provider-config
    platformSpec:
      type: VSphere
  status:
    apiServerInternalURI: https://api-int.marineprod.scotland.gov.uk:6443
    apiServerURL: https://api.marineprod.scotland.gov.uk:6443

It seems that the internal URL used to expose the apiserver, https://api-int.marineprod.scotland.gov.uk:6443, is not reachable, causing a cascade of network failures.

This URL resolves to 192.168.24.116

> Get \"https://[api-int.marineprod.scotland.gov.uk]:6443/api/v1/namespaces/openshift-kube-controller-manager/pods/installer-8-3master.marineprod.scotland.gov.uk?timeout=1m0s\": dial tcp 192.168.24.116:6443

Can you verify that the URL is working correctly?
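One way to check, sketched here with the hostname from the status block above (run from a master or the bastion; `dig` and `curl` assumed available):

```shell
# Does the internal API hostname resolve (expected: 192.168.24.116)?
dig +short api-int.marineprod.scotland.gov.uk

# Is the endpoint reachable at the TCP/HTTP level? -k skips certificate
# verification, since only connectivity matters here.
curl -k --connect-timeout 5 https://api-int.marineprod.scotland.gov.uk:6443/readyz
```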

Comment 8 aygarg 2021-08-13 14:28:27 UTC
Hello Team,

One of our customers is hitting the same issue: after upgrading the cluster from 4.7.21 to 4.8.3 (running on vSphere as a disconnected UPI install), the openshift-apiserver operator is degraded with the following errors.

oc get co openshift-apiserver -o yaml
apiVersion: config.openshift.io/v1
kind: ClusterOperator
metadata:
  annotations:
    exclude.release.openshift.io/internal-openshift-hosted: "true"
  creationTimestamp: "2021-06-24T10:34:19Z"
  generation: 1
  name: openshift-apiserver
  resourceVersion: "20095132"
  uid: 8aff4f02-b91e-495a-93bb-2cc3b0f88045
spec: {}
status:
  conditions:
  - lastTransitionTime: "2021-08-03T19:10:58Z"
    message: All is well
    reason: AsExpected
    status: "False"
    type: Degraded
  - lastTransitionTime: "2021-08-06T11:35:15Z"
    message: 'APIServerDeploymentProgressing: deployment/apiserver.openshift-apiserver:
      0/3 pods have been updated to the latest generation'
    reason: APIServerDeployment_PodsUpdating
    status: "True"
    type: Progressing
  - lastTransitionTime: "2021-08-05T17:02:44Z"
    message: |-
      APIServicesAvailable: "apps.openshift.io.v1" is not ready: 503 (the server is currently unable to handle the request)
      APIServicesAvailable: "authorization.openshift.io.v1" is not ready: 503 (the server is currently unable to handle the request)
      APIServicesAvailable: "build.openshift.io.v1" is not ready: 503 (the server is currently unable to handle the request)
      APIServicesAvailable: "image.openshift.io.v1" is not ready: 503 (the server is currently unable to handle the request)
      APIServicesAvailable: "project.openshift.io.v1" is not ready: 503 (the server is currently unable to handle the request)
      APIServicesAvailable: "quota.openshift.io.v1" is not ready: 503 (the server is currently unable to handle the request)
      APIServicesAvailable: "route.openshift.io.v1" is not ready: 503 (the server is currently unable to handle the request)
      APIServicesAvailable: "security.openshift.io.v1" is not ready: 503 (the server is currently unable to handle the request)
      APIServicesAvailable: "template.openshift.io.v1" is not ready: 503 (the server is currently unable to handle the request)
    reason: APIServices_Error
    status: "False"
    type: Available
  - lastTransitionTime: "2021-06-24T10:36:24Z"
    message: All is well
    reason: AsExpected
    status: "True"
    type: Upgradeable
  extension: null
  relatedObjects:
  - group: operator.openshift.io
    name: cluster
    resource: openshiftapiservers
  - group: ""
    name: openshift-config
    resource: namespaces
  - group: ""
    name: openshift-config-managed
    resource: namespaces
  - group: ""
    name: openshift-apiserver-operator
    resource: namespaces
  - group: ""
    name: openshift-apiserver
    resource: namespaces
  - group: ""
    name: openshift-etcd-operator
    resource: namespaces
  - group: ""
    name: host-etcd-2
    namespace: openshift-etcd
    resource: endpoints
  - group: controlplane.operator.openshift.io
    name: ""
    namespace: openshift-apiserver
    resource: podnetworkconnectivitychecks
  - group: apiregistration.k8s.io
    name: v1.apps.openshift.io
    resource: apiservices
  - group: apiregistration.k8s.io
    name: v1.authorization.openshift.io
    resource: apiservices
  - group: apiregistration.k8s.io
    name: v1.build.openshift.io
    resource: apiservices
  - group: apiregistration.k8s.io
    name: v1.image.openshift.io
    resource: apiservices
  - group: apiregistration.k8s.io
    name: v1.project.openshift.io
    resource: apiservices
  - group: apiregistration.k8s.io
    name: v1.quota.openshift.io
    resource: apiservices
  - group: apiregistration.k8s.io
    name: v1.route.openshift.io
    resource: apiservices
  - group: apiregistration.k8s.io
    name: v1.security.openshift.io
    resource: apiservices
  - group: apiregistration.k8s.io
    name: v1.template.openshift.io
    resource: apiservices
  versions:
  - name: operator
    version: 4.8.3
  - name: openshift-apiserver
    version: 4.8.3                   

We tried the following workaround but no luck.
--> https://access.redhat.com/solutions/5896081

The endpoints of the openshift-apiserver pods are not accessible over port 8443 across nodes; i.e. from master1, only the endpoint of the openshift-apiserver pod running on that same node was accessible. The cluster upgrade had completed fully before this issue appeared. I will be attaching the must-gather.
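For reference, the cross-node check described above can be scripted roughly like this (a sketch; the pod IP 10.130.0.6 is taken from the error earlier in this bug and will differ per cluster):

```shell
# List openshift-apiserver pod IPs and the nodes they run on
oc -n openshift-apiserver get pods -o wide

# From a chosen node, probe a pod endpoint on 8443. A healthy overlay
# answers (possibly with a TLS/HTTP error); a broken one gives
# "no route to host" or a timeout.
oc debug node/master1 -- chroot /host curl -k --connect-timeout 5 https://10.130.0.6:8443/healthz
```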

Comment 23 aygarg 2021-08-23 12:37:11 UTC
Hello Antonio,

The customer disabled the offloading on the primary NIC for all the nodes but still, the issue persists.


Regards,
Ayush Garg

Comment 25 Joseph Callen 2021-08-24 15:33:09 UTC
Hi Ronak,

Do you have any idea why we are required to disable `tx-checksum-ip-generic`? This looks similar to the previous VMXNET3 issue:
https://bugzilla.redhat.com/show_bug.cgi?id=1941714

Comment 32 Joseph Callen 2021-08-25 13:38:09 UTC
***
*** Every customer attached to this case needs to open an immediate support case with VMware ***
***

We _need_ the following:

- vSphere version with build numbers
- Switch type
- Virtual machine hardware version
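For anyone gathering these, govc can pull the first and third item (a sketch; assumes govc is installed with GOVC_URL and credentials exported, and the VM name is a placeholder). The switch type generally has to come from the vSphere admin or the vCenter UI.

```shell
# vSphere/ESXi version with build numbers
govc about

# Virtual machine hardware version (reported as e.g. vmx-14)
govc vm.info my-cluster-master-0
```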

Comment 37 Simon Reber 2021-08-25 14:22:52 UTC
*** Bug 1997292 has been marked as a duplicate of this bug. ***

Comment 38 Lalatendu Mohanty 2021-08-25 14:25:57 UTC
We're asking the following questions to evaluate whether or not this bug warrants blocking an upgrade edge from either the previous X.Y or X.Y.Z. The ultimate goal is to avoid delivering an update which introduces new risk or reduces cluster functionality in any way. Sample answers are provided to give more context and the UpgradeBlocker flag has been added to this bug. It will be removed if the assessment indicates that this should not block upgrade edges. The expectation is that the assignee answers these questions.

Who is impacted?  If we have to block upgrade edges based on this issue, which edges would need blocking?
  example: Customers upgrading from 4.y.Z to 4.y+1.z running on GCP with thousands of namespaces, approximately 5% of the subscribed fleet
  example: All customers upgrading from 4.y.z to 4.y+1.z fail approximately 10% of the time
What is the impact?  Is it serious enough to warrant blocking edges?
  example: Up to 2 minute disruption in edge routing
  example: Up to 90seconds of API downtime
  example: etcd loses quorum and you have to restore from backup
How involved is remediation (even moderately serious impacts might be acceptable if they are easy to mitigate)?
  example: Issue resolves itself after five minutes
  example: Admin uses oc to fix things
  example: Admin must SSH to hosts, restore from backups, or other non standard admin activities
Is this a regression (if all previous versions were also vulnerable, updating to the new, vulnerable version does not increase exposure)?
  example: No, it’s always been like this we just never noticed
  example: Yes, from 4.y.z to 4.y+1.z Or 4.y.z to 4.y.z+1

Comment 43 Scott Dodson 2021-08-25 16:05:51 UTC
Who is impacted?
  OpenShift 4.7.24+ and 4.8 clusters running atop vSphere HW14, new installs and upgrades to the affected versions
What is the impact?  Is it serious enough to warrant blocking edges?
  SDN Packet loss resulting in service unavailability.
How involved is remediation (even moderately serious impacts might be acceptable if they are easy to mitigate)?
  Unknown final remediation but a workaround of disabling tx-checksum-ip-generic has been shown to improve the situation
Is this a regression (if all previous versions were also vulnerable, updating to the new, vulnerable version does not increase exposure)?
  Yes, 4.7.24+ and 4.8.2+ are known to be affected

This is an initial assessment which will be updated when we have more information.

Comment 45 daniel 2021-08-25 19:29:56 UTC
So, just to confirm:
with the VM HW version set to 15 (see c#40 for details), after upgrading my cluster from 4.7.21 to 4.7.24 (it actually did not finish completely), I hit the following issue:
# oc get co |grep -v "True        False         False"
NAME                                       VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                             4.7.24    False       True          True       3h23m
console                                    4.7.24    False       False         True       3h27m
monitoring                                 4.7.24    False       True          True       3h21m
openshift-apiserver                        4.7.24    False       False         False      3h25m
operator-lifecycle-manager-packageserver   4.7.24    False       True          False      3h22m
[root@bastion mg]#

which did not change even after waiting nearly 4 hours. Commands took ages, and creating a new project timed out with
~~~
Error from server (ServiceUnavailable): the server is currently unable to handle the request (get projectrequests.project.openshift.io)
~~~

The setting was:

~~~
[root@bastion mg]# for NODE in master0 master1 master2 worker0 worker1 worker2 worker3 worker4 worker5; do echo ${NODE};ssh -o StrictHostKeyChecking=no core@${NODE} sudo ethtool -k ens192 |egrep 'tx-checksum-ip-generic';done
master0
	tx-checksum-ip-generic: on
master1
	tx-checksum-ip-generic: on
master2
	tx-checksum-ip-generic: on
worker0
	tx-checksum-ip-generic: on
worker1
	tx-checksum-ip-generic: on
worker2
	tx-checksum-ip-generic: on
worker3
	tx-checksum-ip-generic: on
worker4
	tx-checksum-ip-generic: on
worker5
	tx-checksum-ip-generic: on
[root@bastion mg]# 
~~~

(I have a must-gather and a sosreport from one master, ping me if needed)

As soon as I change this to off

~~~
[root@bastion mg]# for NODE in master0 master1 master2 worker0 worker1 worker2 worker3 worker4 worker5; do echo ${NODE};ssh -o StrictHostKeyChecking=no core@${NODE} sudo ethtool -K ens192 tx-checksum-ip-generic off;done
master0
Actual changes:
tx-checksum-ip-generic: off
tx-tcp-segmentation: off [not requested]
tx-tcp6-segmentation: off [not requested]
master1
Actual changes:
tx-checksum-ip-generic: off
tx-tcp-segmentation: off [not requested]
tx-tcp6-segmentation: off [not requested]
master2
Actual changes:
tx-checksum-ip-generic: off
tx-tcp-segmentation: off [not requested]
tx-tcp6-segmentation: off [not requested]
worker0
Actual changes:
tx-checksum-ip-generic: off
tx-tcp-segmentation: off [not requested]
tx-tcp6-segmentation: off [not requested]
worker1
Actual changes:
tx-checksum-ip-generic: off
tx-tcp-segmentation: off [not requested]
tx-tcp6-segmentation: off [not requested]
worker2
Actual changes:
tx-checksum-ip-generic: off
tx-tcp-segmentation: off [not requested]
tx-tcp6-segmentation: off [not requested]
worker3
Actual changes:
tx-checksum-ip-generic: off
tx-tcp-segmentation: off [not requested]
tx-tcp6-segmentation: off [not requested]
worker4
Actual changes:
tx-checksum-ip-generic: off
tx-tcp-segmentation: off [not requested]
tx-tcp6-segmentation: off [not requested]
worker5
Actual changes:
tx-checksum-ip-generic: off
tx-tcp-segmentation: off [not requested]
tx-tcp6-segmentation: off [not requested]
[root@bastion mg]# 

~~~

everything is almost instantly fine again and I can create new projects:

~~~
[root@bastion]# oc get co |grep -v "True        False         False"
NAME                                       VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE
[root@bastion]# oc get co 
NAME                                       VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                             4.7.24    True        False         False      3m21s
baremetal                                  4.7.24    True        False         False      9h
cloud-credential                           4.7.24    True        False         False      10h
cluster-autoscaler                         4.7.24    True        False         False      9h
config-operator                            4.7.24    True        False         False      9h
console                                    4.7.24    True        False         False      3m33s
csi-snapshot-controller                    4.7.24    True        False         False      7h26m
dns                                        4.7.24    True        False         False      9h
etcd                                       4.7.24    True        False         False      9h
image-registry                             4.7.24    True        False         False      9h
ingress                                    4.7.24    True        False         False      9h
insights                                   4.7.24    True        False         False      9h
kube-apiserver                             4.7.24    True        False         False      9h
kube-controller-manager                    4.7.24    True        False         False      9h
kube-scheduler                             4.7.24    True        False         False      9h
kube-storage-version-migrator              4.7.24    True        False         False      7h36m
machine-api                                4.7.24    True        False         False      9h
machine-approver                           4.7.24    True        False         False      9h
machine-config                             4.7.24    True        False         False      6h49m
marketplace                                4.7.24    True        False         False      77s
monitoring                                 4.7.24    True        False         False      3m4s
network                                    4.7.24    True        False         False      9h
node-tuning                                4.7.24    True        False         False      8h
openshift-apiserver                        4.7.24    True        False         False      3m51s
openshift-controller-manager               4.7.24    True        False         False      8h
openshift-samples                          4.7.24    True        False         False      8h
operator-lifecycle-manager                 4.7.24    True        False         False      9h
operator-lifecycle-manager-catalog         4.7.24    True        False         False      9h
operator-lifecycle-manager-packageserver   4.7.24    True        False         False      3m50s
service-ca                                 4.7.24    True        False         False      9h
storage                                    4.7.24    True        False         False      7h27m
[root@bastion]# 
~~~

Comment 46 Ronak Doshi 2021-08-25 19:38:46 UTC
(In reply to Joseph Callen from comment #25)
> Hi Ronak,
> 
> Do you have any idea why we are being required to disable
> `tx-checksum-ip-generic`. This looks similar to the previous VMXNET3 issue.
> https://bugzilla.redhat.com/show_bug.cgi?id=1941714

What exactly is the setup and what is the issue?

Couple of questions:
Are tunnels being used? If yes, what tunneling protocol is being used?
What is the destination port being used for the tunnel?
Does the vmxnet3 driver have the fix from PR 1941714?

Thanks,
Ronak

Comment 47 W. Trevor King 2021-08-25 20:27:45 UTC
Based on the impact statement in comment 43, we have stopped recommending folks update from versions that are not impacted to versions that are impacted [1].

[1]: https://github.com/openshift/cincinnati-graph-data/pull/1008

Comment 49 Joseph Callen 2021-08-25 22:06:10 UTC
(In reply to Ronak Doshi from comment #46)
> (In reply to Joseph Callen from comment #25)
> > Hi Ronak,
> > 
> > Do you have any idea why we are being required to disable
> > `tx-checksum-ip-generic`. This looks similar to the previous VMXNET3 issue.
> > https://bugzilla.redhat.com/show_bug.cgi?id=1941714
> 
> What exactly is the setup and what is the issue?

Similar to the last UDP issue. Standard and Distributed vSwitch(es). NSX-T is not affected - still the same CI setup using VMC.

> 
> Couple of questions:
> Are tunnels being used? If yes, what tunneling protocol is being used?
Yes, either VXLAN or GENEVE

> What is the destination port being used for the tunnel?
No idea - SDN folks on this BZ, please respond

> Does the vmxnet3 driver have the fix from PR 1941714?
Yes, and even if it didn't, we have a workaround in place to disable the previous issues with UDP.

> 
> Thanks,
> Ronak

Based on Cathy's response about the changes in 8.4, could it be caused by:
https://github.com/torvalds/linux/commit/8a7f280f29a80f6e0798f5d6e07c5dd8726620fe#diff-db4c3dfb5fede7bacdecc2e2c486cb29369c21885ffa6ccb6cd4220c37b0fa75
or 
https://github.com/torvalds/linux/commit/1dac3b1bc66dc68dbb0c9f43adac71a7d0a0331a#diff-db4c3dfb5fede7bacdecc2e2c486cb29369c21885ffa6ccb6cd4220c37b0fa75

Ronak, can you see private comments? If not, pasting from a previous comment:

[root@inf14:~] vsish -e cat /net/portsets/$(net-stats -l |grep master  |awk '{print $4}')/ports/$(net-stats -l |grep master  |awk '{print $1}')/vmxnet3/txSummary
stats of a vmxnet3 vNIC tx queue {
   generation:1424
   pkts tx ok:12564827
   bytes tx ok:6111084746
   TSO pkts tx ok:786793
   TSO bytes tx ok:4352028717
   unicast pkts tx ok:12564748
   unicast bytes tx ok:6111081428
   multicast pkts tx ok:0
   multicast bytes tx ok:0
   broadcast pkts tx ok:79
   broadcast bytes tx ok:3318
   pkts tx failure:0
   pkts discarded:341556 <-------------------- *******
   error when copying hdrs:0
   tso header errors:0
   pkt allocation failures:0
   # of times a tx queue is stopped:0
   failed to map some guest buffers:0
   tx completion failure due to stale enableGen:0
   giant tso pkts requiring more than 1 pkt handle:0
   failed to split a giant tso pkt:0
   giant non-tso pkts requiring more than 1 pkt handle:0
   failed to create a pkt from more than 1 pkt handle:0
   encap (outer) header errors:341556 <------------------------------******
   encap (inner) tso header errors:0
}

Comment 50 Ronak Doshi 2021-08-25 22:26:44 UTC
(In reply to Joseph Callen from comment #49)
> (In reply to Ronak Doshi from comment #46)
> > (In reply to Joseph Callen from comment #25)
> > > Hi Ronak,
> > > 
> > > Do you have any idea why we are being required to disable
> > > `tx-checksum-ip-generic`. This looks similar to the previous VMXNET3 issue.
> > > https://bugzilla.redhat.com/show_bug.cgi?id=1941714
> > 
> > What exactly is the setup and what is the issue?
> 
> Similar to the last udp issue. Standard and Distributed vSwitch(s). NSX-T
> not effected - still the same CI setup using VMC.
> 
> > 
> > Couple of questions:
> > Are tunnels being used? If yes, what tunneling protocol is being used?
> Yes, either VXLAN or GENEVE
> 
> > What is the destination port being used for the tunnel?
> No idea - SDN folks on this BZ, please respond
> 
> > Does the vmxnet3 driver have the fix from PR 1941714?
> Yes and even if it didn't we have a workaround in place to disable the
> previous issues with udp.
> 
> > 
> > Thanks,
> > Ronak
> 
> Based on Cathy's response of changes 8.4 could it be caused by:
> https://github.com/torvalds/linux/commit/
> 8a7f280f29a80f6e0798f5d6e07c5dd8726620fe#diff-
> db4c3dfb5fede7bacdecc2e2c486cb29369c21885ffa6ccb6cd4220c37b0fa75
> or 
> https://github.com/torvalds/linux/commit/
> 1dac3b1bc66dc68dbb0c9f43adac71a7d0a0331a#diff-
> db4c3dfb5fede7bacdecc2e2c486cb29369c21885ffa6ccb6cd4220c37b0fa75
> 
> Ronak, can you see private comments? If not pasting from previous comment
> 
> [root@inf14:~] vsish -e cat /net/portsets/$(net-stats -l |grep master  |awk
> '{print $4}')/ports/$(net-stats -l |grep master  |awk '{print
> $1}')/vmxnet3/txSummary
> stats of a vmxnet3 vNIC tx queue {
>    generation:1424
>    pkts tx ok:12564827
>    bytes tx ok:6111084746
>    TSO pkts tx ok:786793
>    TSO bytes tx ok:4352028717
>    unicast pkts tx ok:12564748
>    unicast bytes tx ok:6111081428
>    multicast pkts tx ok:0
>    multicast bytes tx ok:0
>    broadcast pkts tx ok:79
>    broadcast bytes tx ok:3318
>    pkts tx failure:0
>    pkts discarded:341556 <-------------------- *******
>    error when copying hdrs:0
>    tso header errors:0
>    pkt allocation failures:0
>    # of times a tx queue is stopped:0
>    failed to map some guest buffers:0
>    tx completion failure due to stale enableGen:0
>    giant tso pkts requiring more than 1 pkt handle:0
>    failed to split a giant tso pkt:0
>    giant non-tso pkts requiring more than 1 pkt handle:0
>    failed to create a pkt from more than 1 pkt handle:0
>    encap (outer) header errors:341556 <------------------------------******
>    encap (inner) tso header errors:0
> }

I cannot see private comments.

Based on the counters, it seems something was not as expected in the encapsulation header. In the previous UDP issue, it was the destination port. So I would like to know what destination port is used here.

> > Does the vmxnet3 driver have the fix from PR 1941714?
> Yes and even if it didn't we have a workaround in place to disable the
> previous issues with udp.
Btw, if I remember correctly, the fix in the previous PR was that tunnel offloads were disabled. If so, then how are tunnel offloads enabled here? Shouldn't they be disabled?

Also, is NSX-T installed here?

Thanks,
Ronak

Comment 51 Ronak Doshi 2021-08-25 22:28:50 UTC
Also, a packet capture (with the --ng option) as done in PR 1941714 would be helpful and appreciated.

Comment 52 Joseph Callen 2021-08-25 22:39:25 UTC
Hi Daniel,
Can you answer Ronak's questions regarding your OCP and vSphere cluster specifics?
Can you also provide `ethtool -k ens192` and `uname -a` for that master?
Thanks!

Comment 53 Scott Dodson 2021-08-26 00:46:41 UTC
In all cases the kernel is 4.18.0-305.10.2.el8_4 or 4.18.0-305.12.1.el8_4.
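The kernel version on every node can be listed in one command (a sketch, using the standard nodeInfo field):

```shell
oc get nodes -o custom-columns=NAME:.metadata.name,KERNEL:.status.nodeInfo.kernelVersion
```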

Comment 54 David J. M. Karlsen 2021-08-26 00:50:51 UTC
@sdodson@redhat.com isn't that a given for the OCP version, since the MCO handles the masters?
In our case we'll have workers on RHEL 7.x (3.10.0-1160.36.2.el7.x86_64) while the masters are on 4.18.0-305.10.2.el8_4.x86_64 RHCOS.

Comment 55 Scott Dodson 2021-08-26 01:33:33 UTC
(In reply to David J. M. Karlsen from comment #54)
> @sdodson@redhat.com isn't that given the OCP version as MCO handles the
> masters?
> In our case we'll have workers on RHEL 7.x  3.10.0-1160.36.2.el7.x86_64
> while masters on 4.18.0-305.10.2.el8_4.x86_64 RHCOS.

Sure, but nowhere else in this bug has it been mentioned that RHEL7 workers are involved. I think we'd probably want a separate bug to track that variant, as it may require a RHEL7 kernel fix in the end. We'll also want to verify that the problem exists between two RHEL7 workers, and not just between the RHCOS control plane and RHEL7 workers.

Comment 58 daniel 2021-08-26 07:41:06 UTC
Thanks Jatan,

just to make my data complete:


# cat etc/os-release 
NAME="Red Hat Enterprise Linux"
VERSION="8.4 (Ootpa)"
ID="rhel"
ID_LIKE="fedora"
VERSION_ID="8.4"
PLATFORM_ID="platform:el8"
PRETTY_NAME="Red Hat Enterprise Linux 8.4 (Ootpa)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:redhat:enterprise_linux:8.4:GA"
HOME_URL="https://www.redhat.com/"
DOCUMENTATION_URL="https://access.redhat.com/documentation/red_hat_enterprise_linux/8/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"

REDHAT_BUGZILLA_PRODUCT="Red Hat Enterprise Linux 8"
REDHAT_BUGZILLA_PRODUCT_VERSION=8.4
REDHAT_SUPPORT_PRODUCT="Red Hat Enterprise Linux"
REDHAT_SUPPORT_PRODUCT_VERSION="8.4"
# 
# cat uname 
Linux master1.ocp4-csa.coe.muc.redhat.com 4.18.0-305.10.2.el8_4.x86_64 #1 SMP Mon Jul 12 04:43:18 EDT 2021 x86_64 x86_64 x86_64 GNU/Linux

#cat ethtool_-k_ens192
Features for ens192:
rx-checksumming: on
tx-checksumming: on
	tx-checksum-ipv4: off [fixed]
	tx-checksum-ip-generic: on
	tx-checksum-ipv6: off [fixed]
	tx-checksum-fcoe-crc: off [fixed]
	tx-checksum-sctp: off [fixed]
scatter-gather: on
	tx-scatter-gather: on
	tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: on
	tx-tcp-segmentation: on
	tx-tcp-ecn-segmentation: off [fixed]
	tx-tcp-mangleid-segmentation: off
	tx-tcp6-segmentation: on
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off
rx-vlan-offload: on
tx-vlan-offload: on
ntuple-filters: off [fixed]
receive-hashing: on
highdma: on
rx-vlan-filter: on [fixed]
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: off [fixed]
tx-gre-csum-segmentation: off [fixed]
tx-ipxip4-segmentation: off [fixed]
tx-ipxip6-segmentation: off [fixed]
tx-udp_tnl-segmentation: off
tx-udp_tnl-csum-segmentation: off
tx-gso-partial: off [fixed]
tx-tunnel-remcsum-segmentation: off [fixed]
tx-sctp-segmentation: off [fixed]
tx-esp-segmentation: off [fixed]
tx-udp-segmentation: off [fixed]
tx-gso-list: off [fixed]
rx-gro-list: off
tls-hw-rx-offload: off [fixed]
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off [fixed]
rx-fcs: off [fixed]
rx-all: off [fixed]
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]
l2-fwd-offload: off [fixed]
hw-tc-offload: off [fixed]
esp-hw-offload: off [fixed]
esp-tx-csum-hw-offload: off [fixed]
rx-udp_tunnel-port-offload: off [fixed]
tls-hw-tx-offload: off [fixed]
rx-gro-hw: off [fixed]
tls-hw-record: off [fixed]


I've got a must-gather and a sos-report from this cluster I am going to attach

Comment 62 daniel 2021-08-26 14:18:28 UTC
(In reply to Ronak Doshi from comment #51)
> Also, packet capture (with --ng option) as done in PR 1941714 would be
> helpful and appreciated.

I have captured data as described in https://bugzilla.redhat.com/show_bug.cgi?id=1941714#c10
after I moved all masters to one ESXi host.

I am happy to provide the data; however, it is too big to attach to this BZ.

If something else should be captured, please let me know and be as specific as possible, as I am neither a VMware admin nor a networking guy ;)

Comment 64 Ronak Doshi 2021-08-26 17:16:56 UTC
(In reply to daniel from comment #62)
> (In reply to Ronak Doshi from comment #51)
> > Also, packet capture (with --ng option) as done in PR 1941714 would be
> > helpful and appreciated.
> 
> I have captured data like so
> https://bugzilla.redhat.com/show_bug.cgi?id=1941714#c10
> after I moved all masters to one esxi
> 
> and I am happy to provide the data, however it is too big to attach it to
> this bz
> 
> If something esle should be captured, pls let me know and be as specific as
> possible as I am neither a VMWare admin nor a NW guy ;)

Based on comment 58,

tx-udp_tnl-segmentation: off
tx-udp_tnl-csum-segmentation: off

The overlay offloads are disabled. This means the stack will calculate the inner header checksums. So I am not able to understand how the packets are requesting offloads.
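For background on what `tx-checksum-ip-generic` hands off to the vNIC: it is the standard Internet one's-complement checksum (RFC 1071) over the relevant header, which the device fills in instead of the kernel; the "encap (outer) header errors" counter above suggests that step is going wrong for encapsulated packets. A minimal sketch of the checksum itself, for illustration only:

```python
def internet_checksum(data: bytes) -> int:
    """Compute the 16-bit one's-complement Internet checksum (RFC 1071)."""
    if len(data) % 2:  # pad odd-length input with a zero byte
        data += b"\x00"
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # sum 16-bit big-endian words
    while total >> 16:                         # fold carries back in
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF                     # one's complement

# Worked example from RFC 1071: the checksum of these 8 bytes is 0x220d
print(hex(internet_checksum(bytes([0x00, 0x01, 0xF2, 0x03, 0xF4, 0xF5, 0xF6, 0xF7]))))  # → 0x220d
```

A receiver recomputes the same sum over header plus checksum and discards the packet unless the result verifies, which is how a mis-filled outer header turns into drops.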

The stats shared in comment 49 are for ens192 right?
[root@inf14:~] vsish -e cat /net/portsets/$(net-stats -l |grep master  |awk '{print $4}')/ports/$(net-stats -l |grep master  |awk '{print $1}')/vmxnet3/txSummary
stats of a vmxnet3 vNIC tx queue {
...
   pkts discarded:341556 <------------------------------******
   encap (outer) header errors:341556 <------------------------------******
...
}

If so, could you capture packets on ens192 using tcpdump inside the VM when you see the issue?
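A guest-side capture along those lines might look like the following (a sketch; 4789 and 6081 are the IANA default ports for VXLAN and Geneve respectively - whether the SDN actually uses them is exactly the open question above):

```shell
# Capture overlay traffic on the uplink while reproducing the issue
tcpdump -i ens192 -w /tmp/ens192-overlay.pcap 'udp port 4789 or udp port 6081'
```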

Thanks,
Ronak

Comment 71 Scott Dodson 2021-08-27 15:55:00 UTC
This bug is progressing toward closure via a workaround deployed in OpenShift. We've opened bug 1998572 to track the kernel fix so that we may remove the workaround in a future OpenShift version, letting OpenShift use the default offload feature set provided by the vmxnet3 driver.

Comment 73 weiguo fan 2021-08-30 06:51:56 UTC
(In reply to Scott Dodson from comment #71)
Dear Red Hat,

> This bug is progressing toward closure via a workaround deployed in
> OpenShift, we've opened Bug 1998572 to track kernel fix so that we may
> remove the workaround in future OpenShift versions enabling OpenShift to
> make use of default offload feature set provided by vmxnet3 driver.

We think this should be backported to OCP 4.7 and OCP 4.8 after bug 1998572 is fixed.
Does Red Hat have a plan for that?
Is there a ticket to track that?

Regards.

Comment 74 Antonio Ojea 2021-08-30 07:41:10 UTC
*** Bug 1993153 has been marked as a duplicate of this bug. ***

Comment 75 Scott Dodson 2021-08-30 14:48:22 UTC
(In reply to weiguo fan from comment #73)
> (In reply to Scott Dodson from comment #71)
> Dear Red Hat,
> 
> > This bug is progressing toward closure via a workaround deployed in
> > OpenShift, we've opened Bug 1998572 to track kernel fix so that we may
> > remove the workaround in future OpenShift versions enabling OpenShift to
> > make use of default offload feature set provided by vmxnet3 driver.
> 
> we think it should backport to OCP4.7 and OCP4.8 after bug 1998572 is fixed.
> Does Red Hat have plan for that?
> Is there any ticket to track for that?
> 
> Regards.

The workaround has already been backported to 4.8 and 4.7. When a kernel fix becomes available that removes the need for these workarounds we will confirm that it fixes the problem in all relevant versions of OpenShift and the workaround will be removed after that.

Comment 76 weiguo fan 2021-08-31 02:18:34 UTC
> The workaround has already been backported to 4.8 and 4.7. When a kernel fix
> becomes available that removes the need for these workarounds we will
> confirm that it fixes the problem in all relevant versions of OpenShift and
> the workaround will be removed after that.

Thanks for the information, Scott.

Could you kindly let us know the 4.8 and 4.7 versions in which the workaround is included?

Regards.

Comment 77 W. Trevor King 2021-08-31 04:57:11 UTC
The 4.8 workaround is being tracked in bug 1998106.  The 4.7 workaround is being tracked in bug 1998112.  Both are likely to go out with the next supported release in their respective z streams, but neither has been released yet.

Comment 79 Michelle Krejci 2021-09-02 17:31:17 UTC
*** Bug 1993723 has been marked as a duplicate of this bug. ***

Comment 80 Scott Dodson 2021-09-02 19:05:12 UTC
The workaround for those who cannot immediately upgrade is to disable tx-checksum-ip-generic on vmxnet3 interfaces, e.g.:

ethtool -K ens192 tx-checksum-ip-generic off
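Note that an `ethtool -K` change does not survive a reboot. One way to persist it, sketched here as a NetworkManager dispatcher script (an illustration, not the exact mechanism the shipped MCO workaround uses; the path, file name, and interface name are assumptions):

```shell
#!/bin/sh
# Hypothetical script, saved executable as e.g.
# /etc/NetworkManager/dispatcher.d/99-disable-tx-checksum-ip-generic
# NetworkManager invokes dispatcher scripts with the interface name in
# $1 and the event in $2; re-apply the workaround whenever the NIC
# comes up.
IFACE="$1"
ACTION="$2"
if [ "$ACTION" = "up" ] && [ "$IFACE" = "ens192" ]; then
    ethtool -K "$IFACE" tx-checksum-ip-generic off
fi
```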

Comment 82 Khizer Naeem 2021-09-14 11:29:51 UTC
*** Bug 1996577 has been marked as a duplicate of this bug. ***

Comment 83 errata-xmlrpc 2021-11-01 01:36:35 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:3759

