gathered data url: shorturl.at/anF36

Hello,

The cluster started at OCP 4.5.3 with RHEL 7.8 worker nodes. It was upgraded to OCP 4.5.17 and then to OCP 4.6.17 without upgrading the worker nodes. On the upgrade to OCP 4.7.0, the process stalled at 29/31 ClusterOperators, which is when we began to suspect the worker nodes.

[mchebbi@fedora 02880221]$ omg get co | awk '!/4.7.0.*True.*False.* False/{print}'
NAME                            VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                  4.7.0     False       True          True       4d
console                         4.7.0     False       True          True       2d
dns                             4.6.17    True        False         True       4d
ingress                         4.7.0     False       True          True       3d
kube-storage-version-migrator   4.7.0     False       False         False      2d
machine-config                  4.6.17    True        False         False      3d
monitoring                      4.7.0     False       True          True       3m10s
network                         4.7.0     False       True          True       3d

[mchebbi@fedora 02880221]$ omg get nodes
NAME                                                          STATUS   ROLES    AGE   VERSION
armstrong-master1.armstrong.scale-ocp.tuc.stglabs.ibm.com     Ready    master   19d   v1.19.0+e405995
armstrong-master2.armstrong.scale-ocp.tuc.stglabs.ibm.com     Ready    master   19d   v1.19.0+e405995
armstrong-master3.armstrong.scale-ocp.tuc.stglabs.ibm.com     Ready    master   19d   v1.19.0+e405995
armstrong-compute4.armstrong.scale-ocp.tuc.stglabs.ibm.com    Ready    worker   17d   v1.20.0+ba45583
armstrong-compute9.armstrong.scale-ocp.tuc.stglabs.ibm.com    Ready    worker   17d   v1.20.0+ba45583
armstrong-compute7.armstrong.scale-ocp.tuc.stglabs.ibm.com    Ready    worker   17d   v1.20.0+ba45583
armstrong-compute5.armstrong.scale-ocp.tuc.stglabs.ibm.com    Ready    worker   17d   v1.20.0+ba45583
armstrong-compute8.armstrong.scale-ocp.tuc.stglabs.ibm.com    Ready    worker   17d   v1.20.0+ba45583
armstrong-compute1.armstrong.scale-ocp.tuc.stglabs.ibm.com    Ready    worker   17d   v1.20.0+ba45583
armstrong-compute6.armstrong.scale-ocp.tuc.stglabs.ibm.com    Ready    worker   17d   v1.20.0+ba45583
armstrong-compute2.armstrong.scale-ocp.tuc.stglabs.ibm.com    Ready    worker   17d   v1.20.0+ba45583

[mchebbi@fedora 02880221]$ omg get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version             True        True          1m14s   Unable to apply 4.7.0: the cluster operator kube-storage-version-migrator has not yet successfully rolled out

[mchebbi@fedora 02880221]$ omg get mcp
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
worker   rendered-worker-6c9fded112206b678702e275af3d315d   False     True       False      8              0                   0                     0                      19d
master   rendered-master-8859d06803f06500f6cc324d4a0da142   True      False      False      3              3                   3                     0                      19d

====================================================================================================

kind: ClusterVersion
status:
  availableUpdates: null
  conditions:
  - lastTransitionTime: '2020-05-11T22:25:12Z'
    message: Done applying 4.6.17
    status: 'True'
    type: Available
  - lastTransitionTime: '2021-03-01T16:10:17Z'
    message: Cluster operator kube-storage-version-migrator is not available
    reason: ClusterOperatorNotAvailable
    status: 'True'
    type: Failing
  - lastTransitionTime: '2021-02-23T19:32:51Z'
    message: 'Unable to apply 4.7.0: the cluster operator kube-storage-version-migrator
      has not yet successfully rolled out'
    reason: ClusterOperatorNotAvailable
    status: 'True'
    type: Progressing
  - lastTransitionTime: '2021-02-28T17:45:18Z'
    status: 'True'
    type: RetrievedUpdates

====================================================================================================

[mchebbi@fedora 02880221]$ omg logs machine-config-controller-78f848949d-j56gg -n openshift-machine-config-operator
2021-02-26T05:47:08.621885708Z E0226 05:47:08.621811 1 render_controller.go:460] Error updating MachineConfigPool worker: Operation cannot be fulfilled on machineconfigpools.machineconfiguration.openshift.io "worker": the object has been modified; please apply your changes to the latest version and try again
2021-02-26T05:47:08.621885708Z I0226 05:47:08.621847 1 render_controller.go:377] Error syncing machineconfigpool worker: Operation cannot be fulfilled on machineconfigpools.machineconfiguration.openshift.io "worker": the object has been modified; please apply your changes to the latest version and try again
2021-02-26T05:48:14.006754288Z I0226 05:48:14.006654 1 node_controller.go:419] Pool worker: node armstrong-compute8.armstrong.scale-ocp.tuc.stglabs.ibm.com: Reporting unready: node armstrong-compute8.armstrong.scale-ocp.tuc.stglabs.ibm.com is reporting NotReady=False
2021-02-26T05:48:14.250008768Z I0226 05:48:14.249945 1 node_controller.go:419] Pool worker: node armstrong-compute8.armstrong.scale-ocp.tuc.stglabs.ibm.com: Reporting ready

====================================================================================================

$ omg logs machine-config-daemon-5678p -c machine-config-daemon -n openshift-machine-config-operator
2021-03-01T09:12:41.157100204-07:00 E0301 16:12:41.157048 5387 reflector.go:127] k8s.io/client-go/informers/factory.go:134: Failed to watch *v1.Node: failed to list *v1.Node: Get "https://172.30.0.1:443/api/v1/nodes?limit=500&resourceVersion=0": dial tcp 172.30.0.1:443: i/o timeout
2021-03-01T09:13:02.864606563-07:00 I0301 16:13:02.864505 5387 trace.go:205] Trace[1494593888]: "Reflector ListAndWatch" name:github.com/openshift/machine-config-operator/pkg/generated/informers/externalversions/factory.go:101 (01-Mar-2021 16:12:32.863) (total time: 30000ms):
2021-03-01T09:13:02.864606563-07:00 Trace[1494593888]: [30.000930122s] [30.000930122s] END
2021-03-01T09:13:02.864606563-07:00 E0301 16:13:02.864549 5387 reflector.go:127] github.com/openshift/machine-config-operator/pkg/generated/informers/externalversions/factory.go:101: Failed to watch *v1.MachineConfig: failed to list *v1.MachineConfig: Get "https://172.30.0.1:443/apis/machineconfiguration.openshift.io/v1/machineconfigs?limit=500&resourceVersion=0": dial tcp 172.30.0.1:443: i/o timeout

====================================================================================================

$ omg logs machine-config-daemon-gr9lp -c machine-config-daemon -n openshift-machine-config-operator
2021-03-01T09:12:56.649503782-07:00 Trace[17984558]: [30.001042548s] [30.001042548s] END
2021-03-01T09:12:56.649503782-07:00 E0301 16:12:56.649465 4835 reflector.go:127] k8s.io/client-go/informers/factory.go:134: Failed to watch *v1.Node: failed to list *v1.Node: Get "https://172.30.0.1:443/api/v1/nodes?limit=500&resourceVersion=0": dial tcp 172.30.0.1:443: i/o timeout
2021-03-01T09:13:20.774154502-07:00 I0301 16:13:20.774071 4835 trace.go:205] Trace[1944204424]: "Reflector ListAndWatch" name:github.com/openshift/machine-config-operator/pkg/generated/informers/externalversions/factory.go:101 (01-Mar-2021 16:12:50.773) (total time: 30000ms):
2021-03-01T09:13:20.774154502-07:00 Trace[1944204424]: [30.000942709s] [30.000942709s] END
2021-03-01T09:13:20.774154502-07:00 E0301 16:13:20.774135 4835 reflector.go:127] github.com/openshift/machine-config-operator/pkg/generated/informers/externalversions/factory.go:101: Failed to watch *v1.MachineConfig: failed to list *v1.MachineConfig: Get "https://172.30.0.1:443/apis/machineconfiguration.openshift.io/v1/machineconfigs?limit=500&resourceVersion=0": dial tcp 172.30.0.1:443: i/o timeout

====================================================================================================

$ omg logs machine-config-daemon-hl7vr -c machine-config-daemon -n openshift-machine-config-operator
2021-03-01T09:12:08.300418327-07:00 E0301 16:12:08.300343 5627 reflector.go:127] github.com/openshift/machine-config-operator/pkg/generated/informers/externalversions/factory.go:101: Failed to watch *v1.MachineConfig: failed to list *v1.MachineConfig: Get "https://172.30.0.1:443/apis/machineconfiguration.openshift.io/v1/machineconfigs?limit=500&resourceVersion=0": dial tcp 172.30.0.1:443: i/o timeout
2021-03-01T09:13:12.202071179-07:00 I0301 16:13:12.202004 5627 trace.go:205] Trace[286360012]: "Reflector ListAndWatch" name:k8s.io/client-go/informers/factory.go:134 (01-Mar-2021 16:12:42.200) (total time: 30001ms):
2021-03-01T09:13:12.202071179-07:00 Trace[286360012]: [30.001272833s] [30.001272833s] END
2021-03-01T09:13:12.202071179-07:00 E0301 16:13:12.202050 5627 reflector.go:127] k8s.io/client-go/informers/factory.go:134: Failed to watch *v1.Node: failed to list *v1.Node: Get "https://172.30.0.1:443/api/v1/nodes?limit=500&resourceVersion=0": dial tcp 172.30.0.1:443: i/o timeout

====================================================================================================

$ omg logs machine-config-daemon-nxt4k -c machine-config-daemon -n openshift-machine-config-operator
2021-03-01T09:11:40.319647125-07:00 E0301 16:11:40.319581 5154 reflector.go:127] github.com/openshift/machine-config-operator/pkg/generated/informers/externalversions/factory.go:101: Failed to watch *v1.MachineConfig: failed to list *v1.MachineConfig: Get "https://172.30.0.1:443/apis/machineconfiguration.openshift.io/v1/machineconfigs?limit=500&resourceVersion=0": dial tcp 172.30.0.1:443: i/o timeout
2021-03-01T09:12:44.233967824-07:00 I0301 16:12:44.233895 5154 trace.go:205] Trace[286360012]: "Reflector ListAndWatch" name:k8s.io/client-go/informers/factory.go:134 (01-Mar-2021 16:12:14.232) (total time: 30001ms):
2021-03-01T09:12:44.233967824-07:00 Trace[286360012]: [30.001194859s] [30.001194859s] END
2021-03-01T09:12:44.233967824-07:00 E0301 16:12:44.233941 5154 reflector.go:127] k8s.io/client-go/informers/factory.go:134: Failed to watch *v1.Node: failed to list *v1.Node: Get "https://172.30.0.1:443/api/v1/nodes?limit=500&resourceVersion=0": dial tcp 172.30.0.1:443: i/o timeout
2021-03-01T09:13:05.901916947-07:00 I0301 16:13:05.901818 5154 trace.go:205] Trace[1494593888]: "Reflector ListAndWatch" name:github.com/openshift/machine-config-operator/pkg/generated/informers/externalversions/factory.go:101 (01-Mar-2021 16:12:35.900) (total time: 30000ms):
2021-03-01T09:13:05.901916947-07:00 Trace[1494593888]: [30.000897633s] [30.000897633s] END
2021-03-01T09:13:05.901916947-07:00 E0301 16:13:05.901862 5154 reflector.go:127] github.com/openshift/machine-config-operator/pkg/generated/informers/externalversions/factory.go:101: Failed to watch *v1.MachineConfig: failed to list *v1.MachineConfig: Get "https://172.30.0.1:443/apis/machineconfiguration.openshift.io/v1/machineconfigs?limit=500&resourceVersion=0": dial tcp 172.30.0.1:443: i/o timeout

====================================================================================================

$ omg logs machine-config-daemon-rqdqp -c machine-config-daemon -n openshift-machine-config-operator
2021-03-01T09:11:37.138756188-07:00 E0301 16:11:37.138682 5394 reflector.go:127] github.com/openshift/machine-config-operator/pkg/generated/informers/externalversions/factory.go:101: Failed to watch *v1.MachineConfig: failed to list *v1.MachineConfig: Get "https://172.30.0.1:443/apis/machineconfiguration.openshift.io/v1/machineconfigs?limit=500&resourceVersion=0": dial tcp 172.30.0.1:443: i/o timeout
2021-03-01T09:12:41.012896234-07:00 I0301 16:12:41.012793 5394 trace.go:205] Trace[286360012]: "Reflector ListAndWatch" name:k8s.io/client-go/informers/factory.go:134 (01-Mar-2021 16:12:11.010) (total time: 30001ms):
2021-03-01T09:12:41.012896234-07:00 Trace[286360012]: [30.001861719s] [30.001861719s] END
2021-03-01T09:12:41.012896234-07:00 E0301 16:12:41.012871 5394 reflector.go:127] k8s.io/client-go/informers/factory.go:134: Failed to watch *v1.Node: failed to list *v1.Node: Get "https://172.30.0.1:443/api/v1/nodes?limit=500&resourceVersion=0": dial tcp 172.30.0.1:443: i/o timeout
2021-03-01T09:13:02.721063984-07:00 I0301 16:13:02.720995 5394 trace.go:205] Trace[1494593888]: "Reflector ListAndWatch" name:github.com/openshift/machine-config-operator/pkg/generated/informers/externalversions/factory.go:101 (01-Mar-2021 16:12:32.720) (total time: 30000ms):
2021-03-01T09:13:02.721063984-07:00 Trace[1494593888]: [30.000916976s] [30.000916976s] END
2021-03-01T09:13:02.721063984-07:00 E0301 16:13:02.721039 5394 reflector.go:127] github.com/openshift/machine-config-operator/pkg/generated/informers/externalversions/factory.go:101: Failed to watch *v1.MachineConfig: failed to list *v1.MachineConfig: Get "https://172.30.0.1:443/apis/machineconfiguration.openshift.io/v1/machineconfigs?limit=500&resourceVersion=0": dial tcp 172.30.0.1:443: i/o timeout

====================================================================================================

$ omg logs machine-config-daemon-x8jct -c machine-config-daemon -n openshift-machine-config-operator
2021-03-01T09:11:38.902182585-07:00 E0301 16:11:38.902133 5498 reflector.go:127] k8s.io/client-go/informers/factory.go:134: Failed to watch *v1.Node: failed to list *v1.Node: Get "https://172.30.0.1:443/api/v1/nodes?limit=500&resourceVersion=0": dial tcp 172.30.0.1:443: i/o timeout
2021-03-01T09:12:02.998255159-07:00 I0301 16:12:02.998171 5498 trace.go:205] Trace[1944204424]: "Reflector ListAndWatch" name:github.com/openshift/machine-config-operator/pkg/generated/informers/externalversions/factory.go:101 (01-Mar-2021 16:11:32.997) (total time: 30000ms):
2021-03-01T09:12:02.998255159-07:00 Trace[1944204424]: [30.000821765s] [30.000821765s] END
2021-03-01T09:12:02.998255159-07:00 E0301 16:12:02.998217 5498 reflector.go:127] github.com/openshift/machine-config-operator/pkg/generated/informers/externalversions/factory.go:101: Failed to watch *v1.MachineConfig: failed to list *v1.MachineConfig: Get "https://172.30.0.1:443/apis/machineconfiguration.openshift.io/v1/machineconfigs?limit=500&resourceVersion=0": dial tcp 172.30.0.1:443: i/o timeout
2021-03-01T09:13:06.896690723-07:00 I0301 16:13:06.896620 5498 trace.go:205] Trace[286360012]: "Reflector ListAndWatch" name:k8s.io/client-go/informers/factory.go:134 (01-Mar-2021 16:12:36.895) (total time: 30001ms):
2021-03-01T09:13:06.896690723-07:00 Trace[286360012]: [30.001096678s] [30.001096678s] END
2021-03-01T09:13:06.896690723-07:00 E0301 16:13:06.896666 5498 reflector.go:127] k8s.io/client-go/informers/factory.go:134: Failed to watch *v1.Node: failed to list *v1.Node: Get "https://172.30.0.1:443/api/v1/nodes?limit=500&resourceVersion=0": dial tcp 172.30.0.1:443: i/o timeout
2021-03-01T09:13:28.581359027-07:00 I0301 16:13:28.581300 5498 trace.go:205] Trace[1494593888]: "Reflector ListAndWatch" name:github.com/openshift/machine-config-operator/pkg/generated/informers/externalversions/factory.go:101 (01-Mar-2021 16:12:58.579) (total time: 30001ms):
2021-03-01T09:13:28.581359027-07:00 Trace[1494593888]: [30.001700434s] [30.001700434s] END
2021-03-01T09:13:28.581424994-07:00 E0301 16:13:28.581341 5498 reflector.go:127] github.com/openshift/machine-config-operator/pkg/generated/informers/externalversions/factory.go:101: Failed to watch *v1.MachineConfig: failed to list *v1.MachineConfig: Get "https://172.30.0.1:443/apis/machineconfiguration.openshift.io/v1/machineconfigs?limit=500&resourceVersion=0": dial tcp 172.30.0.1:443: i/o timeout

====================================================================================================

[mchebbi@fedora 02880221]$ omg get pods -n openshift-kube-storage-version-migrator
NAME                       READY   STATUS    RESTARTS   AGE
migrator-9d6c8f546-qxb2t   0/1     Pending   0          3d

status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: '2021-02-26T06:10:11Z'
    status: 'True'
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: '2021-02-26T06:10:11Z'
    message: 'containers with unready status: [migrator]'
    reason: ContainersNotReady
    status: 'False'
    type: Ready

[mchebbi@fedora 02880221]$ omg logs kube-storage-version-migrator-operator-84db77494d-ps9kc -n openshift-kube-storage-version-migrator-operator
2021-02-27T05:08:26.570320753Z I0227 05:08:26.570216 1 status_controller.go:172] clusteroperator/kube-storage-version-migrator diff {"status":{"conditions":[{"lastTransitionTime":"2020-06-16T17:16:14Z","message":"All is well","reason":"AsExpected","status":"False","type":"Degraded"},{"lastTransitionTime":"2021-02-27T05:08:26Z","message":"All is well","reason":"AsExpected","status":"False","type":"Progressing"},{"lastTransitionTime":"2021-02-25T19:42:07Z","message":"Available: deployment/migrator.openshift-kube-storage-version-migrator: no replicas are available","reason":"_NoMigratorPod","status":"False","type":"Available"},{"lastTransitionTime":"2020-05-09T22:44:57Z","reason":"NoData","status":"Unknown","type":"Upgradeable"}]}}
2021-02-27T05:08:26.579152789Z I0227 05:08:26.579060 1 event.go:282] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"openshift-kube-storage-version-migrator-operator", Name:"kube-storage-version-migrator-operator", UID:"97c0127b-e40f-405e-b140-2a4a2f2f585d", APIVersion:"apps/v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'OperatorStatusChanged' Status for clusteroperator/kube-storage-version-migrator changed: Degraded message changed from "TargetDegraded: \"kube-storage-version-migrator/namespace.yaml\" (string): etcdserver: leader changed\nTargetDegraded: " to "All is well",Progressing changed from True to False ("All is well")
====================================================================================================

I have asked the customer to delete the kube-storage-version-migrator-operator pod and its ReplicaSets. This triggers the update to progress again, but it gets stuck at the same place:

[root@armstrong-inf ~]# date; oc get clusterversion
Wed Mar 3 09:58:59 MST 2021
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.17    True        True          7d21h   Working towards 4.7.0: 200 of 668 done (29% complete)

[root@armstrong-inf ~]# date; oc get clusterversion
Wed Mar 3 10:05:03 MST 2021
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.17    True        True          7d21h   Unable to apply 4.7.0: the cluster operator kube-storage-version-migrator has not yet successfully rolled out
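For the record, the workaround above amounted to something like the following (a sketch; the operator pod name is the one from the must-gather, and its hash will differ after each restart):

~~~
# Sketch of the workaround: deleting the ReplicaSets and the operator pod
# makes the Deployment controller recreate them, which restarts the
# operator's sync loop. Deleting a ReplicaSet already removes its pod, so
# the explicit pod delete is belt-and-braces.
oc -n openshift-kube-storage-version-migrator-operator get deploy,rs,pods
oc -n openshift-kube-storage-version-migrator-operator delete rs --all
oc -n openshift-kube-storage-version-migrator-operator delete pod kube-storage-version-migrator-operator-84db77494d-ps9kc
~~~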
The kube-storage-version-migrator operator is the smallest of your problems. It's at the far end of the root-cause chain.

    dial tcp 172.30.0.1:443: i/o timeout

This suggests that networking is broken.
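For context, 172.30.0.1:443 is the ClusterIP of the default `kubernetes` service, i.e. the API server as reached through the service network, so those machine-config-daemon timeouts mean clients on the affected nodes cannot reach the API server via its service IP (which depends on the SDN's per-node programming). A minimal connectivity check (a sketch; the node name is an example taken from the node list above, and any HTTP response, even 403, proves the path works, while a hang reproduces the i/o timeout):

~~~
# Sketch: test whether the API server's service IP is reachable from a node
# whose pods log the i/o timeout. "oc debug node/..." runs a host pod there.
oc debug node/armstrong-compute1.armstrong.scale-ocp.tuc.stglabs.ibm.com \
  -- chroot /host curl -k -m 5 https://172.30.0.1:443/version
~~~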
(In reply to Stefan Schimanski from comment #1)
> Kube-storage-migration-operator is the smallest of your problems. It's on
> the far end of the root cause chain.
>
> dial tcp 172.30.0.1:443: i/o timeout
>
> This suggests that networking is broken.

Thanks, Stefan, for your feedback. Could you tell me how to fix this issue? Thanks in advance for your help.
Just adding a bit more info about Moez's case.

openshift-sdn:
~~~
NAME                   READY   STATUS    RESTARTS   AGE
ovs-455qw              1/1     Running   0          5d
ovs-5dntd              0/1     Running   1544       6d
ovs-8h26z              0/1     Running   1544       5d
ovs-crxx4              0/1     Running   1546       6d
ovs-lft66              0/1     Running   1545       6d
ovs-lwbl9              0/1     Running   1545       7d
ovs-nd5bw              0/1     Running   1544       6d
ovs-nwz7k              0/1     Running   1545       6d
ovs-s5rcz              1/1     Running   0          5d
ovs-trvqd              1/1     Running   0          5d
ovs-v2png              0/1     Running   1544       7d
sdn-2ddbk              2/2     Running   0          7d
sdn-72w2m              1/2     Running   1289       7d
sdn-c2d6h              1/2     Running   1289       7d
sdn-controller-kw9w2   1/1     Running   0          7d
sdn-controller-qj6hd   1/1     Running   0          7d
sdn-controller-rzt4f   1/1     Running   0          7d
sdn-g9phv              1/2     Running   1288       7d
sdn-gsptx              1/2     Running   1289       7d
sdn-h2m76              2/2     Running   0          7d
sdn-m5pht              1/2     Running   1288       7d
sdn-r7bsz              2/2     Running   0          7d
sdn-sg5ml              1/2     Running   1288       7d
sdn-vbj7l              1/2     Running   1288       7d
sdn-wsmvs              1/2     Running   1289       7d
~~~

SDN pods failure message:
"""
2021-03-03T10:08:36.955546539-07:00 I0303 17:08:36.955476 14072 healthcheck.go:42] waiting for OVS to start: dial unix /var/run/openvswitch/db.sock: connect: no such file or directory
2021-03-03T10:08:36.955546539-07:00 F0303 17:08:36.955499 14072 cmd.go:111] Failed to start sdn: node SDN setup failed: timed out waiting for the condition
"""

OVS pods error:
"""
id: openvswitch: no such user
"""

This issue appears to be the one reported in ticket [0], which was linked to a systemd bug ([1]). I have asked the CU to restart one of the failing nodes to see if that solves the issue.

[0] https://bugzilla.redhat.com/show_bug.cgi?id=1887040
[1] https://bugzilla.redhat.com/show_bug.cgi?id=1888017
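For completeness, a sketch of the check behind that request (the node name is an example from the node list above): if systemd-sysusers hit the bug in [1], the openvswitch account will be missing from the host, and a reboot should let it be recreated.

~~~
# Sketch: confirm the "id: openvswitch: no such user" symptom on a failing node.
oc debug node/armstrong-compute5.armstrong.scale-ocp.tuc.stglabs.ibm.com \
  -- chroot /host getent passwd openvswitch
# No output / non-zero exit means the account is missing. Drain the node,
# reboot it from its console, then uncordon and re-check the ovs/sdn pods:
oc adm drain armstrong-compute5.armstrong.scale-ocp.tuc.stglabs.ibm.com --ignore-daemonsets
oc adm uncordon armstrong-compute5.armstrong.scale-ocp.tuc.stglabs.ibm.com
~~~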
*** This bug has been marked as a duplicate of bug 1907353 ***