Bug 1826463 - TestContainerRuntimeConfigPidsLimit test frequently failing
Summary: TestContainerRuntimeConfigPidsLimit test frequently failing
Keywords:
Status: CLOSED DEFERRED
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Machine Config Operator
Version: 4.5
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: high
Target Milestone: ---
Target Release: 4.5.0
Assignee: Peter Hunt
QA Contact: Michael Nguyen
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-04-21 18:09 UTC by Kirsten Garrison
Modified: 2020-04-29 12:42 UTC
CC List: 3 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-04-29 12:42:07 UTC
Target Upstream Version:
Embargoed:



Description Kirsten Garrison 2020-04-21 18:09:30 UTC
Description of problem:
As of last evening, TestContainerRuntimeConfigPidsLimit started failing across PRs in the MCO repo.

Version-Release number of selected component (if applicable):
4.5

How reproducible:
Frequently; see the example CI runs linked below.

I0421 14:16:43.378036       1 status.go:84] Pool node-pids-limit-00fa9209-0569-4914-903b-ecad037596d2: All nodes are updated with rendered-node-pids-limit-00fa9209-0569-4914-903b-ecad037596d2-634eaf836ce89aff6462d23da0346981
E0421 14:23:12.707088       1 render_controller.go:253] error finding pools for machineconfig: could not find any MachineConfigPool set for MachineConfig mc-pids-limit-00fa9209-0569-4914-903b-ecad037596d2 with labels: map[machineconfiguration.openshift.io/role:node-pids-limit-00fa9209-0569-4914-903b-ecad037596d2]
E0421 14:23:12.761894       1 render_controller.go:253] error finding pools for machineconfig: no MachineConfigPool found for MachineConfig rendered-node-pids-limit-00fa9209-0569-4914-903b-ecad037596d2-37c6d1d18b03faa3c537d237a672dba2 because it has no labels
E0421 14:23:12.764990       1 render_controller.go:253] error finding pools for machineconfig: no MachineConfigPool found for MachineConfig rendered-node-pids-limit-00fa9209-0569-4914-903b-ecad037596d2-634eaf836ce89aff6462d23da0346981 because it has no labels
I0421 14:23:18.130009       1 render_controller.go:497] Generated machineconfig rendered-worker-9d2efc5bce4e84a0546bd8a44f8f9965 from 6 configs: [{MachineConfig  00-worker  machineconfiguration.openshift.io/v1  } {MachineConfig  01-worker-container-runtime  machineconfiguration.openshift.io/v1  } {MachineConfig  01-worker-kubelet  machineconfiguration.openshift.io/v1  } {MachineConfig  99-worker-3f894b33-60fa-4a57-81c3-704cdfedfb7b-registries  machineconfiguration.openshift.io/v1  } {MachineConfig  99-worker-ssh  machineconfiguration.openshift.io/v1  } {MachineConfig  add-a-file-874e00b5-e0b9-4dd9-bd28-0f27ec2f9708  machineconfiguration.openshift.io/v1  }]
I0421 14:23:18.143079       1 render_controller.go:516] Pool worker: now targeting: rendered-worker-9d2efc5bce4e84a0546bd8a44f8f9965
I0421 14:43:24.161981       1 render_controller.go:497] Generated machineconfig rendered-worker-675914441b553b8d9dbb9547260b51b6 from 7 configs: [{MachineConfig  00-worker  machineconfiguration.openshift.io/v1  } {MachineConfig  01-worker-container-runtime  machineconfiguration.openshift.io/v1  } {MachineConfig  01-worker-kubelet  machineconfiguration.openshift.io/v1  } {MachineConfig  99-worker-3f894b33-60fa-4a57-81c3-704cdfedfb7b-registries  machineconfiguration.openshift.io/v1  } {MachineConfig  99-worker-ssh  machineconfiguration.openshift.io/v1  } {MachineConfig  add-a-file-874e00b5-e0b9-4dd9-bd28-0f27ec2f9708  machineconfiguration.openshift.io/v1  } {MachineConfig  sshkeys-worker-8c1a5281-4d92-43c6-bd62-f7c5b77d66b7  machineconfiguration.openshift.io/v1  }]
I0421 14:43:24.173947       1 render_controller.go:516] Pool worker: now targeting: rendered-worker-675914441b553b8d9dbb9547260b51b6
E0421 14:56:20.034287       1 render_controller.go:216] error finding pools for machineconfig: could not find any MachineConfigPool set for MachineConfig 99-node-pids-limit-00fa9209-0569-4914-903b-ecad037596d2-7eeb753b-3a6a-4fb5-aee8-a5089c26fdbe-containerruntime with labels: map[machineconfiguration.openshift.io/role:node-pids-limit-00fa9209-0569-4914-903b-ecad037596d2]
I0421 15:03:30.195606       1 render_controller.go:497] Generated machineconfig rendered-worker-dda5263d5acf90f62166be65dbf62dbd from 8 configs: [{MachineConfig  00-worker  machineconfiguration.openshift.io/v1  } {MachineConfig  01-worker-container-runtime  machineconfiguration.openshift.io/v1  } {MachineConfig  01-worker-kubelet  machineconfiguration.openshift.io/v1  } {MachineConfig  99-worker-3f894b33-60fa-4a57-81c3-704cdfedfb7b-registries  machineconfiguration.openshift.io/v1  } {MachineConfig  99-worker-ssh  machineconfiguration.openshift.io/v1  } {MachineConfig  add-a-file-874e00b5-e0b9-4dd9-bd28-0f27ec2f9708  machineconfiguration.openshift.io/v1  } {MachineConfig  kargs-8a131c63-fa29-4ca4-a9e8-6a383da4ed17  machineconfiguration.openshift.io/v1  } {MachineConfig  sshkeys-worker-8c1a5281-4d92-43c6-bd62-f7c5b77d66b7  machineconfiguration.openshift.io/v1  }]
I0421 15:03:30.209304       1 render_controller.go:516] Pool worker: now targeting: rendered-worker-dda5263d5acf90f62166be65dbf62dbd
I0421 15:23:36.222171       1 render_controller.go:497] Generated machineconfig rendered-worker-84e8ff69f53b20985f5531045c857815 from 9 configs: [{MachineConfig  00-worker  machineconfiguration.openshift.io/v1  } {MachineConfig  01-worker-container-runtime  machineconfiguration.openshift.io/v1  } {MachineConfig  01-worker-kubelet  machineconfiguration.openshift.io/v1  } {MachineConfig  99-worker-3f894b33-60fa-4a57-81c3-704cdfedfb7b-registries  machineconfiguration.openshift.io/v1  } {MachineConfig  99-worker-ssh  machineconfiguration.openshift.io/v1  } {MachineConfig  add-a-file-874e00b5-e0b9-4dd9-bd28-0f27ec2f9708  machineconfiguration.openshift.io/v1  } {MachineConfig  kargs-8a131c63-fa29-4ca4-a9e8-6a383da4ed17  machineconfiguration.openshift.io/v1  } {MachineConfig  kerneltype-71a0d6a1-0657-4006-aecd-2dbeff6665f7  machineconfiguration.openshift.io/v1  } {MachineConfig  sshkeys-worker-8c1a5281-4d92-43c6-bd62-f7c5b77d66b7  machineconfiguration.openshift.io/v1  }]
I0421 15:23:36.235619       1 render_controller.go:516] Pool worker: now targeting: rendered-worker-84e8ff69f53b20985f5531045c857815
E0421 15:36:01.672378       1 render_controller.go:216] error finding pools for machineconfig: could not find any MachineConfigPool set for MachineConfig 99-node-pids-limit-00fa9209-0569-4914-903b-ecad037596d2-7eeb753b-3a6a-4fb5-aee8-a5089c26fdbe-containerruntime with labels: map[machineconfiguration.openshift.io/role:node-pids-limit-00fa9209-0569-4914-903b-ecad037596d2]
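
For context on the "error finding pools for machineconfig" messages above: the render controller matches each MachineConfig's machineconfiguration.openshift.io/role label against the machineConfigSelector of the existing MachineConfigPools, and these errors mean no pool currently matches that label. The e2e test creates a temporary pool for the PIDs-limit MachineConfigs roughly like the sketch below (a hedged illustration only; the exact selector and node label the test uses may differ):

```
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  # temporary pool created by the test; name is illustrative
  name: node-pids-limit-<uuid>
spec:
  machineConfigSelector:
    matchExpressions:
      # pick up the base worker configs plus the test's custom role
      - key: machineconfiguration.openshift.io/role
        operator: In
        values: [worker, node-pids-limit-<uuid>]
  nodeSelector:
    matchLabels:
      # node label the test applies to one worker node (illustrative)
      node-role.kubernetes.io/node-pids-limit-<uuid>: ""
```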

Example runs: 
https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/pr-logs/pull/openshift_machine-config-operator/1649/pull-ci-openshift-machine-config-operator-master-e2e-gcp-op/1929

https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/pr-logs/pull/openshift_machine-config-operator/1659/pull-ci-openshift-machine-config-operator-master-e2e-gcp-op/1930

https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/pr-logs/pull/openshift_machine-config-operator/1474/pull-ci-openshift-machine-config-operator-master-e2e-gcp-op/1927

https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/pr-logs/pull/openshift_machine-config-operator/1474/pull-ci-openshift-machine-config-operator-master-e2e-gcp-op/1925

Comment 1 Kirsten Garrison 2020-04-21 18:12:01 UTC
Topmost failure seen on CI run homepage:

 --- FAIL: TestContainerRuntimeConfigPidsLimit (2609.90s)
    utils_test.go:60: Pool node-pids-limit-55df0de7-fbf2-4d45-b46b-d3a00cf3d600 has rendered config mc-pids-limit-55df0de7-fbf2-4d45-b46b-d3a00cf3d600 with rendered-node-pids-limit-55df0de7-fbf2-4d45-b46b-d3a00cf3d600-8af41a11d07a4a871ca61c3410da5427 (waited 6.00957333s)
    utils_test.go:82: Pool node-pids-limit-55df0de7-fbf2-4d45-b46b-d3a00cf3d600 has completed rendered-node-pids-limit-55df0de7-fbf2-4d45-b46b-d3a00cf3d600-8af41a11d07a4a871ca61c3410da5427 (waited 2m14.007560508s)
    utils_test.go:60: Pool node-pids-limit-55df0de7-fbf2-4d45-b46b-d3a00cf3d600 has rendered config 99-node-pids-limit-55df0de7-fbf2-4d45-b46b-d3a00cf3d600-d7a28ec4-491b-4759-b841-0c256b760067-containerruntime with rendered-node-pids-limit-55df0de7-fbf2-4d45-b46b-d3a00cf3d600-2cf2fc59855924e9fd231eb4cef9fc18 (waited 2.015521109s)
    utils_test.go:82: Pool node-pids-limit-55df0de7-fbf2-4d45-b46b-d3a00cf3d600 has completed rendered-node-pids-limit-55df0de7-fbf2-4d45-b46b-d3a00cf3d600-2cf2fc59855924e9fd231eb4cef9fc18 (waited 54.006398508s)
    ctrcfg_test.go:99: Deleted ContainerRuntimeConfig ctrcfg-pids-limit-55df0de7-fbf2-4d45-b46b-d3a00cf3d600
    utils_test.go:60: Pool node-pids-limit-55df0de7-fbf2-4d45-b46b-d3a00cf3d600 has rendered config mc-pids-limit-55df0de7-fbf2-4d45-b46b-d3a00cf3d600 with rendered-node-pids-limit-55df0de7-fbf2-4d45-b46b-d3a00cf3d600-8af41a11d07a4a871ca61c3410da5427 (waited 7.247007ms)
    utils_test.go:36: 
        	Error Trace:	utils_test.go:36
        	            				ctrcfg_test.go:105
        	            				ctrcfg_test.go:21
        	Error:      	Expected nil, but got: pool node-pids-limit-55df0de7-fbf2-4d45-b46b-d3a00cf3d600 didn't report rendered-node-pids-limit-55df0de7-fbf2-4d45-b46b-d3a00cf3d600-8af41a11d07a4a871ca61c3410da5427 to updated (waited 20m0.009915796s): timed out waiting for the condition

Comment 2 Kirsten Garrison 2020-04-21 18:52:06 UTC
In an old passing run:

utils_test.go:60: Pool node-pids-limit-38a610f4-78bd-4813-8d1c-a0d636f44af4 has rendered config mc-pids-limit-38a610f4-78bd-4813-8d1c-a0d636f44af4 with rendered-node-pids-limit-38a610f4-78bd-4813-8d1c-a0d636f44af4-c8fdd7eabb731fced9b7a47d657ae6c1 (waited 6.011953781s)
    utils_test.go:82: Pool node-pids-limit-38a610f4-78bd-4813-8d1c-a0d636f44af4 has completed rendered-node-pids-limit-38a610f4-78bd-4813-8d1c-a0d636f44af4-c8fdd7eabb731fced9b7a47d657ae6c1 (waited 1m40.006386104s)
    utils_test.go:60: Pool node-pids-limit-38a610f4-78bd-4813-8d1c-a0d636f44af4 has rendered config 99-node-pids-limit-38a610f4-78bd-4813-8d1c-a0d636f44af4-96d2d0c8-3828-4577-85b6-dd9c8ee26c59-containerruntime with rendered-node-pids-limit-38a610f4-78bd-4813-8d1c-a0d636f44af4-f79292b27668b87d0a12ebed841d9ccb (waited 2.020746925s)
    utils_test.go:82: Pool node-pids-limit-38a610f4-78bd-4813-8d1c-a0d636f44af4 has completed rendered-node-pids-limit-38a610f4-78bd-4813-8d1c-a0d636f44af4-f79292b27668b87d0a12ebed841d9ccb (waited 54.012257612s)
    ctrcfg_test.go:99: Deleted ContainerRuntimeConfig ctrcfg-pids-limit-38a610f4-78bd-4813-8d1c-a0d636f44af4
    utils_test.go:60: Pool node-pids-limit-38a610f4-78bd-4813-8d1c-a0d636f44af4 has rendered config mc-pids-limit-38a610f4-78bd-4813-8d1c-a0d636f44af4 with rendered-node-pids-limit-38a610f4-78bd-4813-8d1c-a0d636f44af4-c8fdd7eabb731fced9b7a47d657ae6c1 (waited 6.773856ms)
    utils_test.go:82: Pool node-pids-limit-38a610f4-78bd-4813-8d1c-a0d636f44af4 has completed rendered-node-pids-limit-38a610f4-78bd-4813-8d1c-a0d636f44af4-c8fdd7eabb731fced9b7a47d657ae6c1 (waited 50.010949018s)
    utils_test.go:82: Pool worker has completed rendered-worker-c8fdd7eabb731fced9b7a47d657ae6c1 (waited 50.011878179s)

we have this mc-pids-limit and also 99-pids-limit that are both making new rendered-node-pids-limit configs... is this correct?

Comment 3 Kirsten Garrison 2020-04-21 18:53:39 UTC
To clarify the above:

ctrcfg-pids-limit-xxx -> mc-pids-limit/99-pids-limit -> rendered-node-pids-limit-xxx

I don't think we need both? Or why do we have two?
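
For reference, the ctrcfg the test creates is a ContainerRuntimeConfig along these lines (a minimal sketch; the name, pool selector label, and pidsLimit value are illustrative, not necessarily the exact ones the test uses). The container runtime config controller generates the 99-...-containerruntime MachineConfig from it, while mc-pids-limit appears to be a MachineConfig the test creates directly to seed the custom pool, which would explain why both show up in the rendered config:

```
apiVersion: machineconfiguration.openshift.io/v1
kind: ContainerRuntimeConfig
metadata:
  name: ctrcfg-pids-limit
spec:
  machineConfigPoolSelector:
    matchLabels:
      # must match a label on the target MachineConfigPool (illustrative)
      custom-crio: pids-limit
  containerRuntimeConfig:
    pidsLimit: 2048
```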

Comment 4 Kirsten Garrison 2020-04-21 20:28:17 UTC
In the failed runs we seem to lose a worker node:

                },
                "degradedMachineCount": 0,
                "machineCount": 3,
                "observedGeneration": 6,
                "readyMachineCount": 0,
                "unavailableMachineCount": 1,
                "updatedMachineCount": 0
            }

I'm going through the worker journals now and am seeing quite a few sdn/ovs/nto failures during the drain/reboot to apply the rendered config. For example:
```
Apr 21 10:10:43.587097 ci-op-wg85w-w-d-sbn7r.c.openshift-gce-devel-ci.internal hyperkube[1593]: I0421 10:10:43.587025    1593 status_manager.go:570] Patch status for pod "sdn-dd2qz_openshift-sdn(2b78e6b7-0035-4aa5-9bc9-82659e308899)" with "{\"metadata\":{\"uid\":\"2b78e6b7-0035-4aa5-9bc9-82659e308899\"},\"status\":{\"containerStatuses\":[{\"containerID\":\"cri-o://c3018fedd27ecf4c92d196c6fe2d5ec947e51b99a53546f65e51e372ff95afec\",\"image\":\"registry.svc.ci.openshift.org/ci-op-x3q91yfw/stable@sha256:7c23c0e8ecd2689f09c4c0d651a4c1d2dbf9275a0ef42bfd80d9034f5d5e3668\",\"imageID\":\"registry.svc.ci.openshift.org/ci-op-x3q91yfw/stable@sha256:7c23c0e8ecd2689f09c4c0d651a4c1d2dbf9275a0ef42bfd80d9034f5d5e3668\",\"lastState\":{\"terminated\":{\"containerID\":\"cri-o://e4052528899875c1212dc497d04cec2f958cc3c05f43e6152827b4f45147c4b9\",\"exitCode\":255,\"finishedAt\":\"2020-04-21T10:10:28Z\",\"message\":\"I0421 10:09:28.039335    2015 node.go:148] Initializing SDN node \\\"ci-op-wg85w-w-d-sbn7r.c.openshift-gce-devel-ci.internal\\\" (10.0.32.4) of type \\\"redhat/openshift-ovs-networkpolicy\\\"\\nI0421 10:09:28.090242    2015 cmd.go:159] Starting node networking (unknown)\\nF0421 10:10:28.266902    2015 cmd.go:111] Failed to start sdn: node SDN setup failed: timed out waiting for the condition\\n\",\"reason\":\"Error\",\"startedAt\":\"2020-04-21T10:09:27Z\"}},\"name\":\"sdn\",\"ready\":false,\"restartCount\":3,\"started\":true,\"state\":{\"running\":{\"startedAt\":\"2020-04-21T10:10:43Z\"}}}]}}"
Apr 21 10:10:43.587259 ci-op-wg85w-w-d-sbn7r.c.openshift-gce-devel-ci.internal hyperkube[1593]: I0421 10:10:43.587078    1593 status_manager.go:578] Status for pod "sdn-dd2qz_openshift-sdn(2b78e6b7-0035-4aa5-9bc9-82659e308899)" updated successfully: (5, {Phase:Running Conditions:[{Type:Initialized Status:True LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2020-04-21 09:52:43 +0000 UTC Reason: Message:} {Type:Ready Status:False LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2020-04-21 10:09:26 +0000 UTC Reason:ContainersNotReady Message:containers with unready status: [sdn]} {Type:ContainersReady Status:False LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2020-04-21 10:09:26 +0000 UTC Reason:ContainersNotReady Message:containers with unready status: [sdn]} {Type:PodScheduled Status:True LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2020-04-21 09:52:40 +0000 UTC Reason: Message:}] Message: Reason: NominatedNodeName: HostIP:10.0.32.4 PodIP:10.0.32.4 PodIPs:[{IP:10.0.32.4}] StartTime:2020-04-21 09:52:43 +0000 UTC InitContainerStatuses:[] ContainerStatuses:[{Name:sdn State:{Waiting:nil Running:&ContainerStateRunning{StartedAt:2020-04-21 10:10:43 +0000 UTC,} Terminated:nil} LastTerminationState:{Waiting:nil Running:nil Terminated:&ContainerStateTerminated{ExitCode:255,Signal:0,Reason:Error,Message:I0421 10:09:28.039335    2015 node.go:148] Initializing SDN node "ci-op-wg85w-w-d-sbn7r.c.openshift-gce-devel-ci.internal" (10.0.32.4) of type "redhat/openshift-ovs-networkpolicy"
Apr 21 10:10:43.587259 ci-op-wg85w-w-d-sbn7r.c.openshift-gce-devel-ci.internal hyperkube[1593]: I0421 10:09:28.090242    2015 cmd.go:159] Starting node networking (unknown)
Apr 21 10:10:43.587259 ci-op-wg85w-w-d-sbn7r.c.openshift-gce-devel-ci.internal hyperkube[1593]: F0421 10:10:28.266902    2015 cmd.go:111] Failed to start sdn: node SDN setup failed: timed out waiting for the condition
Apr 21 10:10:43.587259 ci-op-wg85w-w-d-sbn7r.c.openshift-gce-devel-ci.internal hyperkube[1593]: ,StartedAt:2020-04-21 10:09:27 +0000 UTC,FinishedAt:2020-04-21 10:10:28 +0000 
```

Comment 5 Peter Hunt 2020-04-21 20:36:19 UTC
Kirsten,

Do you have access to the rendered crio.conf on the nodes that upgraded, and can you post it here?

I don't have time left today to launch a cluster and check. If you can't get it, I'll be able to get it tomorrow
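
For reference, the setting to look for in the rendered crio.conf is pids_limit under the [crio.runtime] section, something like the fragment below (the value is just an example; the actual limit and the exact file the MCO writes it to on this release may differ):

```
[crio.runtime]
# set from the ContainerRuntimeConfig's pidsLimit; 2048 is only an example value
pids_limit = 2048
```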

Comment 6 Kirsten Garrison 2020-04-21 21:58:00 UTC
These are CI runs, so everything would be in the artifacts folders of each run (if it exists).

Comment 7 Kirsten Garrison 2020-04-21 23:44:30 UTC
Another thing I see is some sdn errors:
```
Apr 21 10:23:13.233143 ci-op-wg85w-w-d-sbn7r.c.openshift-gce-devel-ci.internal hyperkube[1593]: I0421 10:20:05.480540    9367 cmd.go:159] Starting node networking (unknown)
Apr 21 10:23:13.233143 ci-op-wg85w-w-d-sbn7r.c.openshift-gce-devel-ci.internal hyperkube[1593]: F0421 10:21:05.563116    9367 cmd.go:111] Failed to start sdn: node SDN setup failed: timed out waiting for the condition
Apr 21 10:23:13.233143 ci-op-wg85w-w-d-sbn7r.c.openshift-gce-devel-ci.internal hyperkube[1593]: ,StartedAt:2020-04-21 10:20:05 +0000 UTC,FinishedAt:2020-04-21 10:21:05 +0000 
```

```
Apr 21 10:27:07.413522 ci-op-wg85w-w-d-sbn7r.c.openshift-gce-devel-ci.internal hyperkube[1593]: E0421 10:27:07.413056    1593 pod_workers.go:191] Error syncing pod 2b78e6b7-0035-4aa5-9bc9-82659e308899 ("sdn-dd2qz_openshift-sdn(2b78e6b7-0035-4aa5-9bc9-82659e308899)"), skipping: failed to "StartContainer" for "sdn" with CrashLoopBackOff: "back-off 5m0s restarting failed container=sdn pod=sdn-dd2qz_openshift-sdn(2b78e6b7-0035-4aa5-9bc9-82659e308899)"
Apr 21 10:27:07.413522 ci-op-wg85w-w-d-sbn7r.c.openshift-gce-devel-ci.internal hyperkube[1593]: I0421 10:27:07.413086    1593 event.go:278] Event(v1.ObjectReference{Kind:"Pod", Namespace:"openshift-sdn", Name:"sdn-dd2qz", UID:"2b78e6b7-0035-4aa5-9bc9-82659e308899", APIVersion:"v1", ResourceVersion:"29701", FieldPath:"spec.containers{sdn}"}): type: 'Warning' reason: 'BackOff' Back-off restarting failed container
Apr 21 10:27:07.433004 ci-op-wg85w-w-d-sbn7r.c.openshift-gce-devel-ci.internal hyperkube[1593]: I0421 10:27:07.432896    1593 status_manager.go:570] Patch status for pod "sdn-dd2qz_openshift-sdn(2b78e6b7-0035-4aa5-9bc9-82659e308899)" with "{\"metadata\":{\"uid\":\"2b78e6b7-0035-4aa5-9bc9-82659e308899\"},\"status\":{\"containerStatuses\":[{\"containerID\":\"cri-o://1bd5e1cfae85762e44ff687816bb989e6494dbc3e64a92889966910594b275fa\",\"image\":\"registry.svc.ci.openshift.org/ci-op-x3q91yfw/stable@sha256:7c23c0e8ecd2689f09c4c0d651a4c1d2dbf9275a0ef42bfd80d9034f5d5e3668\",\"imageID\":\"registry.svc.ci.openshift.org/ci-op-x3q91yfw/stable@sha256:7c23c0e8ecd2689f09c4c0d651a4c1d2dbf9275a0ef42bfd80d9034f5d5e3668\",\"lastState\":{\"terminated\":{\"containerID\":\"cri-o://446e86e16b7a073a902a1ae494e7aa579ddb49d06cda9f2a96682acb2ee284fd\",\"exitCode\":255,\"finishedAt\":\"2020-04-21T10:21:05Z\",\"message\":\"I0421 10:20:05.474357    9367 node.go:148] Initializing SDN node \\\"ci-op-wg85w-w-d-sbn7r.c.openshift-gce-devel-ci.internal\\\" (10.0.32.4) of type \\\"redhat/openshift-ovs-networkpolicy\\\"\\nI0421 10:20:05.480540    9367 cmd.go:159] Starting node networking (unknown)\\nF0421 10:21:05.563116    9367 cmd.go:111] Failed to start sdn: node SDN setup failed: timed out waiting for the condition\\n\",\"reason\":\"Error\",\"startedAt\":\"2020-04-21T10:20:05Z\"}},\"name\":\"sdn\",\"ready\":false,\"restartCount\":8,\"started\":false,\"state\":{\"terminated\":{\"containerID\":\"cri-o://1bd5e1cfae85762e44ff687816bb989e6494dbc3e64a92889966910594b275fa\",\"exitCode\":255,\"finishedAt\":\"2020-04-21T10:27:06Z\",\"message\":\"I0421 10:26:06.469058   12442 node.go:148] Initializing SDN node \\\"ci-op-wg85w-w-d-sbn7r.c.openshift-gce-devel-ci.internal\\\" (10.0.32.4) of type \\\"redhat/openshift-ovs-networkpolicy\\\"\\nI0421 10:26:06.476353   12442 cmd.go:159] Starting node networking (unknown)\\nF0421 10:27:06.558969   12442 cmd.go:111] Failed to start sdn: node SDN setup failed: timed out waiting for the condition\\n\",\"reason\":\"Error\",\"startedAt\":\"2020-04-21T10:26:06Z\"}}}]}}"
```

Comment 9 Kirsten Garrison 2020-04-21 23:57:16 UTC
Spoke with Ryan and he thinks this might actually be: https://bugzilla.redhat.com/show_bug.cgi?id=1802534

I'm going to keep this open until we confirm and use it to track....

Comment 10 Antonio Murdaca 2020-04-28 21:24:14 UTC
This hasn't failed this week so whatever was causing it doesn't appear to be the case anymore. (and if I searched the CI wrong, please reopen)

Comment 11 Antonio Murdaca 2020-04-28 21:24:43 UTC
(In reply to Antonio Murdaca from comment #10)
> This hasn't failed this week so whatever was causing it doesn't appear to be
> the case anymore. (and if I searched the CI wrong, please reopen)

Also, that test has since been dropped in favor of another one that isn't failing, but hopefully we'll get it back.

