Bug 1873114

Field | Value
---|---
Summary | Nodes go into NotReady state (VMware)
Product | OpenShift Container Platform
Component | Node
Sub component | Kubelet
Version | 4.5
Hardware | x86_64
OS | Linux
Reporter | Juan Manuel Parrilla Madrid <jparrill>
Assignee | Ryan Phillips <rphillips>
QA Contact | Sunil Choudhary <schoudha>
Status | CLOSED ERRATA
Severity | urgent
Priority | urgent
Keywords | Bugfix
Target Release | 4.7.0
Flags | rcarrier: needinfo-
Clones | 1873949, 1901208 (view as bug list)
Bug Depends On | 1857446
Bug Blocks | 1901208
Type | Bug
Last Closed | 2021-02-24 15:16:22 UTC
CC | aarapov, abhinkum, abraj, achakrat, adeshpan, agawand, amsingh, andcosta, aos-bugs, aprajapa, asheth, asm, aygarg, bleanhar, ccoleman, ChetRHosey, cpassare, dbenoit, deparker, dphillip, emachado, fjaspe, igreen, jmalde, jokerman, kahara, kangell, kechung, lszaszki, mchebbi, mdhanve, mdunnett, mfuruta, mnunes, mrobson, nagrawal, namato, naygupta, ngirard, nijoshi, oarribas, openshift-bugs-escalate, pchavan, pkhaire, prdeshpa, rbryant, rcarrier, rdave, rdiscala, rheinzma, rpalathi, rphillips, rupatel, ryan.phillips, sagopina, sbhavsar, smaudet, sople, sparpate, srengan, ssonigra, tschelle, tstellar, tsweeney, vjaypurk, weinliu

Doc Type: Bug Fix

Doc Text:

Cause: HTTP/2 transports did not have the correct options attached to their connections to provide timeout logic. VMware network interfaces (and other scenarios) would blip for a few seconds, causing connections to fail silently.

Consequence: Connections lingered around, causing other related failures; for instance, nodes not being detected as down, and API calls using stale connections and failing.

Fix: Proper timeouts were added for <= 4.7. Upstream has a more permanent fix which will roll out in >= 4.8.

Result: HTTP/2 connections within the system are more reliable, and side effects are mitigated.
Description — Juan Manuel Parrilla Madrid, 2020-08-27 12:16:13 UTC
I forgot to mention: if you restart the kubelet, the node comes alive again (from the OCP perspective) and starts reporting its status, but eventually goes into the NotReady state again.

Seth Jennings (comment #2):

This looks like a golang issue:
https://github.com/kubernetes/kubernetes/issues/87615
https://github.com/golang/go/issues/40201

Russell Bryant (comment #3):

(In reply to Seth Jennings from comment #2)
> This looks like a golang issue
> https://github.com/kubernetes/kubernetes/issues/87615
> https://github.com/golang/go/issues/40201

Thanks. I checked the kernel log on a host that was seeing this issue and observed a lot of this in the log, which seems to support this being the same issue as the one you linked.

[84057.832697] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
[84057.861232] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[84057.988240] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
[84058.010442] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[84129.485448] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
[84129.526182] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[84219.127096] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
[84219.156378] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[84350.926777] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
[84350.963246] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[106613.643101] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
[106613.677715] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[106614.409979] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
[106614.430376] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[106614.783140] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
[106614.824764] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[106615.445239] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
[106615.473290] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[106793.931631] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
[106793.968889] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready

Follow-up:

(In reply to Russell Bryant from comment #3)
> [quoted kernel log trimmed; see comment #3 above]

... but it turns out that's not actually the network interface in use. The beginning of the "use of closed network connection" errors is no longer in the journal, so I don't think I can pinpoint what happened to trigger it. Hopefully we can catch more logs around the time this starts next time, since this seems easy to reproduce in this lab for some reason.

Created attachment 1712946 [details]
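The flapping pattern in the kernel log above (repeated NETDEV_UP / NETDEV_CHANGE pairs for the same interface) is the signature worth counting when triaging whether a node's NIC is blipping. A hypothetical helper (not part of any OpenShift tooling) that tallies link flaps per interface from dmesg-style lines:

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// linkFlapPattern matches kernel lines of the form:
//   [84057.832697] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
var linkFlapPattern = regexp.MustCompile(`ADDRCONF\(NETDEV_UP\): (\S+): link is not ready`)

// countLinkFlaps tallies NETDEV_UP "link is not ready" events per interface,
// a rough proxy for how often each NIC blipped during the captured window.
func countLinkFlaps(log string) map[string]int {
	flaps := make(map[string]int)
	for _, line := range strings.Split(log, "\n") {
		if m := linkFlapPattern.FindStringSubmatch(line); m != nil {
			flaps[m[1]]++
		}
	}
	return flaps
}

func main() {
	sample := `[84057.832697] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
[84057.861232] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[84057.988240] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready`
	fmt.Println(countLinkFlaps(sample)["eth0"]) // prints 2
}
```

As comment #3's follow-up notes, a flapping interface is not proof by itself: the flapping NIC turned out not to be the one in use, so the count only narrows the search.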
Hub Kubelet Master-0

Adding a log file from the affected node where we restart the kubelet and it fails again (look for "Unauthorized").
I'm setting this to a high priority/severity since, when this occurs, a node is in a very broken state until the kubelet gets restarted.

If necessary, the environment could be redeployed to reproduce the state and try to fix the bug. As you know, maintaining this environment in its current state (with the error reproduction and so on) will block us in some ways, so we can keep the environment as it is for ~8 more business days; after that date we will repurpose it for another focus. Getting it back into a state that replicates the issue would cost us ~3-5 business days, just to let you know ;)

*** Bug 1872734 has been marked as a duplicate of this bug. ***

*** Bug 1875236 has been marked as a duplicate of this bug. ***

Russell Bryant works on/with the bare-metal team, where NIC resets are more likely to occur than on other platforms. He should be able to recreate the issue.

For testing whether the upstream fix addresses this issue: https://github.com/openshift/kubernetes/pull/398

*** Bug 1873949 has been marked as a duplicate of this bug. ***

*** Bug 1893511 has been marked as a duplicate of this bug. ***

*** Bug 1899109 has been marked as a duplicate of this bug. ***

Hello, this bug is to get the fix backported to OCP 4.5. Could you please confirm the fix release date for OCP 4.5? Thanks in advance.

The BZ for OCP 4.5 is [1], and it's already fixed in OCP 4.5.27.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1907938

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633

*** Bug 1934422 has been marked as a duplicate of this bug. ***

*** Bug 1953678 has been marked as a duplicate of this bug. ***

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 365 days.