Description of problem:
https://prow.k8s.io/view/gcs/origin-ci-test/logs/canary-openshift-ocp-installer-e2e-azure-serial-4.2/21
https://prow.k8s.io/view/gcs/origin-ci-test/logs/canary-openshift-ocp-installer-e2e-azure-serial-4.2/20
https://prow.k8s.io/view/gcs/origin-ci-test/logs/canary-openshift-ocp-installer-e2e-azure-serial-4.2/19
... etc.

fail [k8s.io/kubernetes/test/e2e/apps/daemon_set.go:425]: Expected
    <int>: 2
to equal
    <int>: 0

Version-Release number of selected component (if applicable):
4.2 jobs

How reproducible:
Always
None of the linked jobs failed with the error mentioned above; they suffered other problems tracked in other BZs.
It actually did fail with that error if you scroll through the logs. It doesn't seem to be the DaemonSet's fault. The tests detect the number of schedulable nodes when the e2e suite starts (the next kube version does this on demand), but that number was <2 when the e2e started although there are 3 worker nodes. That suggests the e2e either doesn't wait for nodes to be schedulable, or the nodes were tainted at the time; I don't see either in the end state, so I'd assume it was transient (at startup).
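To illustrate what "detect the number of schedulable nodes when the e2e starts" means in practice, here is a minimal, hypothetical sketch (not the actual origin/kubernetes test code) of a suite-level count taken once with client-go. A worker that is still tainted or unschedulable at that instant is simply never counted, so a test that later requires at least 2 nodes can fail even though 3 workers exist. The helper name countSchedulableNodes, the kubeconfig path, and the context-aware List signature of a recent client-go are assumptions.

package main

import (
	"context"
	"fmt"

	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

// countSchedulableNodes is a hypothetical helper: it counts nodes that are
// Ready, not marked unschedulable, and carry no NoSchedule/NoExecute taints.
func countSchedulableNodes(ctx context.Context, c kubernetes.Interface) (int, error) {
	nodes, err := c.CoreV1().Nodes().List(ctx, metav1.ListOptions{})
	if err != nil {
		return 0, err
	}
	count := 0
	for _, n := range nodes.Items {
		if n.Spec.Unschedulable {
			continue
		}
		tainted := false
		for _, t := range n.Spec.Taints {
			if t.Effect == v1.TaintEffectNoSchedule || t.Effect == v1.TaintEffectNoExecute {
				tainted = true
				break
			}
		}
		if tainted {
			continue
		}
		ready := false
		for _, cond := range n.Status.Conditions {
			if cond.Type == v1.NodeReady && cond.Status == v1.ConditionTrue {
				ready = true
				break
			}
		}
		if ready {
			count++
		}
	}
	return count, nil
}

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", "/tmp/admin.kubeconfig")
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)
	// Taken once at suite start: a worker that only becomes schedulable a few
	// seconds later is never reflected in this cached value.
	n, err := countSchedulableNodes(context.TODO(), client)
	if err != nil {
		panic(err)
	}
	fmt.Printf("schedulable nodes at suite start: %d\n", n)
}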
Fair, but I can't find this issue in the previous 2 weeks in https://search.svc.ci.openshift.org/, so I'm lowering the severity and moving this to 4.3.
I wouldn't trust search that much; it seems broken - check the dates on https://prow.k8s.io/job-history/origin-ci-test/logs/canary-openshift-ocp-installer-e2e-azure-serial-4.2. I am not sure which team this belongs to, though - I guess Node (to make sure their operator stays degraded until the nodes are schedulable, since the tests wait for it not to be degraded).
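To make the "tests wait for it not to be degraded" part concrete, below is a rough, hypothetical sketch of a pre-test gate that polls ClusterOperator conditions until none reports Degraded=True. It assumes the github.com/openshift/client-go config clientset and a context-aware List call; the 10-minute timeout and gating on all operators are illustrative choices, not existing CI behaviour.

package main

import (
	"context"
	"fmt"
	"time"

	configv1 "github.com/openshift/api/config/v1"
	configclient "github.com/openshift/client-go/config/clientset/versioned"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/wait"
	"k8s.io/client-go/tools/clientcmd"
)

// anyOperatorDegraded returns true if any ClusterOperator reports Degraded=True.
func anyOperatorDegraded(ctx context.Context, c configclient.Interface) (bool, error) {
	operators, err := c.ConfigV1().ClusterOperators().List(ctx, metav1.ListOptions{})
	if err != nil {
		return false, err
	}
	for _, co := range operators.Items {
		for _, cond := range co.Status.Conditions {
			if cond.Type == configv1.OperatorDegraded && cond.Status == configv1.ConditionTrue {
				return true, nil
			}
		}
	}
	return false, nil
}

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", "/tmp/admin.kubeconfig")
	if err != nil {
		panic(err)
	}
	client := configclient.NewForConfigOrDie(cfg)
	// Assumed gate: do not start the e2e suite while any operator is degraded.
	err = wait.PollImmediate(10*time.Second, 10*time.Minute, func() (bool, error) {
		degraded, err := anyOperatorDegraded(context.TODO(), client)
		if err != nil {
			return false, nil // retry on transient API errors
		}
		return !degraded, nil
	})
	if err != nil {
		panic(fmt.Errorf("cluster operators did not settle: %w", err))
	}
	fmt.Println("no ClusterOperator is degraded; safe to start e2e")
}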
This job just started running on 8/16 and this test has never passed on this platform. Not sure how it can be a release blocker in that case.
https://testgrid.k8s.io/redhat-openshift-release-informing#redhat-canary-openshift-ocp-installer-e2e-azure-serial-4.2&sort-by-flakiness=

It isn't a flake; it is just not working at all and the test appears to be fragile in design. Tomas, any chance we could pick the on-demand behavior back?

At the very least, the monitoring operator should not be able to become ready until the workers are ready as the monitoring stack runs on workers, not masters. So it is strange that the e2e tests appear to start before all the workers are ready.

Regardless, there is no "Node operator". The closest thing is the MCO.

Posting test results here for convenience:

Aug 28 11:01:53.051: INFO: >>> kubeConfig: /tmp/admin.kubeconfig
Aug 28 11:01:53.329: INFO: >>> kubeConfig: /tmp/admin.kubeconfig
Aug 28 11:01:53.331: INFO: Waiting up to 30m0s for all (but 100) nodes to be schedulable
Aug 28 11:01:53.450: INFO: Waiting up to 10m0s for all pods (need at least 0) in namespace 'kube-system' to be running and ready
Aug 28 11:01:53.593: INFO: 0 / 0 pods in namespace 'kube-system' are running and ready (0 seconds elapsed)
Aug 28 11:01:53.593: INFO: expected 0 pod replicas in namespace 'kube-system', 0 are Running and Ready.
Aug 28 11:01:53.593: INFO: Waiting up to 5m0s for all daemonsets in namespace 'kube-system' to start
Aug 28 11:01:53.650: INFO: e2e test version: v1.14.0+e791baf
Aug 28 11:01:53.689: INFO: kube-apiserver version: v1.14.0+db14b69
[BeforeEach] [Top Level]
/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/test/extended/util/test.go:73
[BeforeEach] [sig-apps] Daemon set [Serial]
/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/test/e2e/framework/framework.go:149
STEP: Creating a kubernetes client
Aug 28 11:01:53.692: INFO: >>> kubeConfig: /tmp/admin.kubeconfig
STEP: Building a namespace api object, basename daemonsets
Aug 28 11:01:53.917: INFO: About to run a Kube e2e test, ensuring namespace is privileged
Aug 28 11:01:54.467: INFO: No PodSecurityPolicies found; assuming PodSecurityPolicy is disabled.
STEP: Waiting for a default service account to be provisioned in namespace
[BeforeEach] [sig-apps] Daemon set [Serial]
/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/test/e2e/apps/daemon_set.go:102
[It] should rollback without unnecessary restarts [Conformance] [Suite:openshift/conformance/serial/minimal] [Suite:k8s]
/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/test/e2e/framework/framework.go:692
Aug 28 11:01:54.768: INFO: Conformance test suite needs a cluster with at least 2 nodes.
Aug 28 11:01:54.768: INFO: Create a RollingUpdate DaemonSet
Aug 28 11:01:54.819: INFO: Check that daemon pods launch on every node of the cluster
Aug 28 11:01:54.866: INFO: DaemonSet pods can't tolerate node ci-op-hdkv5gfr-3a8ca-8cvhp-master-0 with taints [{Key:node-role.kubernetes.io/master Value: Effect:NoSchedule TimeAdded:<nil>}], skip checking this node
Aug 28 11:01:54.866: INFO: DaemonSet pods can't tolerate node ci-op-hdkv5gfr-3a8ca-8cvhp-master-1 with taints [{Key:node-role.kubernetes.io/master Value: Effect:NoSchedule TimeAdded:<nil>}], skip checking this node
Aug 28 11:01:54.866: INFO: DaemonSet pods can't tolerate node ci-op-hdkv5gfr-3a8ca-8cvhp-master-2 with taints [{Key:node-role.kubernetes.io/master Value: Effect:NoSchedule TimeAdded:<nil>}], skip checking this node
Aug 28 11:01:54.910: INFO: Number of nodes with available pods: 0
Aug 28 11:01:54.910: INFO: Node ci-op-hdkv5gfr-3a8ca-8cvhp-worker-centralus1-rm9bl is running more than one daemon pod
Aug 28 11:01:55.994: INFO: DaemonSet pods can't tolerate node ci-op-hdkv5gfr-3a8ca-8cvhp-master-0 with taints [{Key:node-role.kubernetes.io/master Value: Effect:NoSchedule TimeAdded:<nil>}], skip checking this node
Aug 28 11:01:55.994: INFO: DaemonSet pods can't tolerate node ci-op-hdkv5gfr-3a8ca-8cvhp-master-1 with taints [{Key:node-role.kubernetes.io/master Value: Effect:NoSchedule TimeAdded:<nil>}], skip checking this node
Aug 28 11:01:55.994: INFO: DaemonSet pods can't tolerate node ci-op-hdkv5gfr-3a8ca-8cvhp-master-2 with taints [{Key:node-role.kubernetes.io/master Value: Effect:NoSchedule TimeAdded:<nil>}], skip checking this node
Aug 28 11:01:56.038: INFO: Number of nodes with available pods: 0
Aug 28 11:01:56.038: INFO: Node ci-op-hdkv5gfr-3a8ca-8cvhp-worker-centralus1-rm9bl is running more than one daemon pod
Aug 28 11:01:57.029: INFO: DaemonSet pods can't tolerate node ci-op-hdkv5gfr-3a8ca-8cvhp-master-0 with taints [{Key:node-role.kubernetes.io/master Value: Effect:NoSchedule TimeAdded:<nil>}], skip checking this node
Aug 28 11:01:57.029: INFO: DaemonSet pods can't tolerate node ci-op-hdkv5gfr-3a8ca-8cvhp-master-1 with taints [{Key:node-role.kubernetes.io/master Value: Effect:NoSchedule TimeAdded:<nil>}], skip checking this node
Aug 28 11:01:57.029: INFO: DaemonSet pods can't tolerate node ci-op-hdkv5gfr-3a8ca-8cvhp-master-2 with taints [{Key:node-role.kubernetes.io/master Value: Effect:NoSchedule TimeAdded:<nil>}], skip checking this node
Aug 28 11:01:57.072: INFO: Number of nodes with available pods: 0
Aug 28 11:01:57.072: INFO: Node ci-op-hdkv5gfr-3a8ca-8cvhp-worker-centralus1-rm9bl is running more than one daemon pod
Aug 28 11:01:58.033: INFO: DaemonSet pods can't tolerate node ci-op-hdkv5gfr-3a8ca-8cvhp-master-0 with taints [{Key:node-role.kubernetes.io/master Value: Effect:NoSchedule TimeAdded:<nil>}], skip checking this node
Aug 28 11:01:58.033: INFO: DaemonSet pods can't tolerate node ci-op-hdkv5gfr-3a8ca-8cvhp-master-1 with taints [{Key:node-role.kubernetes.io/master Value: Effect:NoSchedule TimeAdded:<nil>}], skip checking this node
Aug 28 11:01:58.033: INFO: DaemonSet pods can't tolerate node ci-op-hdkv5gfr-3a8ca-8cvhp-master-2 with taints [{Key:node-role.kubernetes.io/master Value: Effect:NoSchedule TimeAdded:<nil>}], skip checking this node
Aug 28 11:01:58.076: INFO: Number of nodes with available pods: 1
Aug 28 11:01:58.076: INFO: Node ci-op-hdkv5gfr-3a8ca-8cvhp-worker-centralus2-qpf86 is running more than one daemon pod
Aug 28 11:01:58.991: INFO: DaemonSet pods can't tolerate node ci-op-hdkv5gfr-3a8ca-8cvhp-master-0 with taints [{Key:node-role.kubernetes.io/master Value: Effect:NoSchedule TimeAdded:<nil>}], skip checking this node
Aug 28 11:01:58.991: INFO: DaemonSet pods can't tolerate node ci-op-hdkv5gfr-3a8ca-8cvhp-master-1 with taints [{Key:node-role.kubernetes.io/master Value: Effect:NoSchedule TimeAdded:<nil>}], skip checking this node
Aug 28 11:01:58.991: INFO: DaemonSet pods can't tolerate node ci-op-hdkv5gfr-3a8ca-8cvhp-master-2 with taints [{Key:node-role.kubernetes.io/master Value: Effect:NoSchedule TimeAdded:<nil>}], skip checking this node
Aug 28 11:01:59.047: INFO: Number of nodes with available pods: 3
Aug 28 11:01:59.047: INFO: Number of running nodes: 3, number of available pods: 3
Aug 28 11:01:59.047: INFO: Update the DaemonSet to trigger a rollout
Aug 28 11:01:59.536: INFO: Updating DaemonSet daemon-set
[AfterEach] [sig-apps] Daemon set [Serial]
/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/test/e2e/apps/daemon_set.go:68
STEP: Deleting DaemonSet "daemon-set"
STEP: deleting DaemonSet.extensions daemon-set in namespace e2e-daemonsets-531, will wait for the garbage collector to delete the pods
Aug 28 11:02:03.877: INFO: Deleting DaemonSet.extensions daemon-set took: 59.944832ms
Aug 28 11:02:04.077: INFO: Terminating DaemonSet.extensions daemon-set pods took: 200.340876ms
Aug 28 11:02:10.119: INFO: Number of nodes with available pods: 0
Aug 28 11:02:10.119: INFO: Number of running nodes: 0, number of available pods: 0
Aug 28 11:02:10.163: INFO: daemonset: {"kind":"DaemonSetList","apiVersion":"apps/v1","metadata":{"selfLink":"/apis/apps/v1/namespaces/e2e-daemonsets-531/daemonsets","resourceVersion":"64509"},"items":null}
Aug 28 11:02:10.205: INFO: pods: {"kind":"PodList","apiVersion":"v1","metadata":{"selfLink":"/api/v1/namespaces/e2e-daemonsets-531/pods","resourceVersion":"64509"},"items":null}
[AfterEach] [sig-apps] Daemon set [Serial]
/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/test/e2e/framework/framework.go:150
STEP: Collecting events from namespace "e2e-daemonsets-531".
STEP: Found 24 events.
Aug 28 11:02:10.458: INFO: At 2019-08-28 11:01:54 +0000 UTC - event for daemon-set: {daemonset-controller } SuccessfulCreate: Created pod: daemon-set-bcw5z
Aug 28 11:02:10.458: INFO: At 2019-08-28 11:01:54 +0000 UTC - event for daemon-set: {daemonset-controller } SuccessfulCreate: Created pod: daemon-set-wns7g
Aug 28 11:02:10.458: INFO: At 2019-08-28 11:01:54 +0000 UTC - event for daemon-set: {daemonset-controller } SuccessfulCreate: Created pod: daemon-set-mtb2b
Aug 28 11:02:10.458: INFO: At 2019-08-28 11:01:54 +0000 UTC - event for daemon-set-bcw5z: {default-scheduler } Scheduled: Successfully assigned e2e-daemonsets-531/daemon-set-bcw5z to ci-op-hdkv5gfr-3a8ca-8cvhp-worker-centralus1-rm9bl
Aug 28 11:02:10.458: INFO: At 2019-08-28 11:01:54 +0000 UTC - event for daemon-set-mtb2b: {default-scheduler } Scheduled: Successfully assigned e2e-daemonsets-531/daemon-set-mtb2b to ci-op-hdkv5gfr-3a8ca-8cvhp-worker-centralus2-qpf86
Aug 28 11:02:10.458: INFO: At 2019-08-28 11:01:54 +0000 UTC - event for daemon-set-wns7g: {default-scheduler } Scheduled: Successfully assigned e2e-daemonsets-531/daemon-set-wns7g to ci-op-hdkv5gfr-3a8ca-8cvhp-worker-centralus3-gv9zd
Aug 28 11:02:10.458: INFO: At 2019-08-28 11:01:56 +0000 UTC - event for daemon-set-bcw5z: {kubelet ci-op-hdkv5gfr-3a8ca-8cvhp-worker-centralus1-rm9bl} Started: Started container app
Aug 28 11:02:10.458: INFO: At 2019-08-28 11:01:56 +0000 UTC - event for daemon-set-bcw5z: {kubelet ci-op-hdkv5gfr-3a8ca-8cvhp-worker-centralus1-rm9bl} Pulled: Container image "docker.io/library/nginx:1.14-alpine" already present on machine
Aug 28 11:02:10.458: INFO: At 2019-08-28 11:01:56 +0000 UTC - event for daemon-set-bcw5z: {kubelet ci-op-hdkv5gfr-3a8ca-8cvhp-worker-centralus1-rm9bl} Created: Created container app
Aug 28 11:02:10.458: INFO: At 2019-08-28 11:01:56 +0000 UTC - event for daemon-set-wns7g: {kubelet ci-op-hdkv5gfr-3a8ca-8cvhp-worker-centralus3-gv9zd} Pulled: Container image "docker.io/library/nginx:1.14-alpine" already present on machine
Aug 28 11:02:10.458: INFO: At 2019-08-28 11:01:57 +0000 UTC - event for daemon-set-mtb2b: {kubelet ci-op-hdkv5gfr-3a8ca-8cvhp-worker-centralus2-qpf86} Pulled: Container image "docker.io/library/nginx:1.14-alpine" already present on machine
Aug 28 11:02:10.458: INFO: At 2019-08-28 11:01:57 +0000 UTC - event for daemon-set-mtb2b: {kubelet ci-op-hdkv5gfr-3a8ca-8cvhp-worker-centralus2-qpf86} Created: Created container app
Aug 28 11:02:10.458: INFO: At 2019-08-28 11:01:57 +0000 UTC - event for daemon-set-wns7g: {kubelet ci-op-hdkv5gfr-3a8ca-8cvhp-worker-centralus3-gv9zd} Started: Started container app
Aug 28 11:02:10.458: INFO: At 2019-08-28 11:01:57 +0000 UTC - event for daemon-set-wns7g: {kubelet ci-op-hdkv5gfr-3a8ca-8cvhp-worker-centralus3-gv9zd} Created: Created container app
Aug 28 11:02:10.458: INFO: At 2019-08-28 11:01:58 +0000 UTC - event for daemon-set-mtb2b: {kubelet ci-op-hdkv5gfr-3a8ca-8cvhp-worker-centralus2-qpf86} Started: Started container app
Aug 28 11:02:10.458: INFO: At 2019-08-28 11:01:59 +0000 UTC - event for daemon-set: {daemonset-controller } SuccessfulDelete: Deleted pod: daemon-set-mtb2b
Aug 28 11:02:10.458: INFO: At 2019-08-28 11:02:00 +0000 UTC - event for daemon-set-mtb2b: {kubelet ci-op-hdkv5gfr-3a8ca-8cvhp-worker-centralus2-qpf86} Killing: Stopping container app
Aug 28 11:02:10.458: INFO: At 2019-08-28 11:02:02 +0000 UTC - event for daemon-set: {daemonset-controller } SuccessfulCreate: Created pod: daemon-set-7jts9
Aug 28 11:02:10.458: INFO: At 2019-08-28 11:02:02 +0000 UTC - event for daemon-set-7jts9: {default-scheduler } Scheduled: Successfully assigned e2e-daemonsets-531/daemon-set-7jts9 to ci-op-hdkv5gfr-3a8ca-8cvhp-worker-centralus2-qpf86
Aug 28 11:02:10.458: INFO: At 2019-08-28 11:02:04 +0000 UTC - event for daemon-set-bcw5z: {kubelet ci-op-hdkv5gfr-3a8ca-8cvhp-worker-centralus1-rm9bl} Killing: Stopping container app
Aug 28 11:02:10.458: INFO: At 2019-08-28 11:02:04 +0000 UTC - event for daemon-set-wns7g: {kubelet ci-op-hdkv5gfr-3a8ca-8cvhp-worker-centralus3-gv9zd} Killing: Stopping container app
Aug 28 11:02:10.458: INFO: At 2019-08-28 11:02:05 +0000 UTC - event for daemon-set-7jts9: {kubelet ci-op-hdkv5gfr-3a8ca-8cvhp-worker-centralus2-qpf86} Pulling: Pulling image "foo:non-existent"
Aug 28 11:02:10.458: INFO: At 2019-08-28 11:02:06 +0000 UTC - event for daemon-set-7jts9: {kubelet ci-op-hdkv5gfr-3a8ca-8cvhp-worker-centralus2-qpf86} Failed: Error: ErrImagePull
Aug 28 11:02:10.458: INFO: At 2019-08-28 11:02:06 +0000 UTC - event for daemon-set-7jts9: {kubelet ci-op-hdkv5gfr-3a8ca-8cvhp-worker-centralus2-qpf86} Failed: Failed to pull image "foo:non-existent": rpc error: code = Unknown desc = Error reading manifest non-existent in docker.io/library/foo: errors: denied: requested access to the resource is denied unauthorized: authentication required
Aug 28 11:02:10.500: INFO: POD NODE PHASE GRACE CONDITIONS
Aug 28 11:02:10.500: INFO:
Aug 28 11:02:10.544: INFO: skipping dumping cluster info - cluster too large
Aug 28 11:02:10.544: INFO: Waiting up to 3m0s for all (but 100) nodes to be ready
STEP: Destroying namespace "e2e-daemonsets-531" for this suite.
Aug 28 11:02:18.724: INFO: Waiting up to 30s for server preferred namespaced resources to be successfully discovered
Aug 28 11:02:22.665: INFO: namespace e2e-daemonsets-531 deletion completed in 12.077397749s
Aug 28 11:02:22.672: INFO: Running AfterSuite actions on all nodes
Aug 28 11:02:22.673: INFO: Running AfterSuite actions on node 1

fail [k8s.io/kubernetes/test/e2e/apps/daemon_set.go:425]: Expected
    <int>: 2
to equal
    <int>: 0
canary-openshift-ocp-installer-e2e-azure-serial-4.2 is in the release-informing category in testgrid, not the release-blocking category. Deferring to 4.3 by category.
Hit the same issue in https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/canary-openshift-ocp-installer-e2e-azure-serial-4.2/89 ->
[sig-apps] Daemon set [Serial] should rollback without unnecessary restarts [Conformance] [Suite:openshift/conformance/serial/minimal] [Suite:k8s]
This test is now run against 4.2 nightly as part of azure-serial, and is now being considered a blocking failure for 4.2 (branching for 4.3). Moving this back to 4.2 and moving to urgent. Please reach out to me or nstielau if you think this should not be the case.
> This test is now run against 4.2 nightly as part of azure-serial, and is now being considered a blocking failure for 4.2

This cannot be decided one week before final freeze :-/ azure-serial is also in the release-informing category in testgrid:
https://testgrid.k8s.io/redhat-openshift-release-informing#redhat-canary-openshift-ocp-installer-e2e-azure-serial-4.2&sort-by-flakiness=

We will not block the release for this e2e test failure. It fails because the test is fragile, not because there is something functionally wrong. This is also not a Node team bug. Sending back to kube-controller-manager.

> The tests detect the number of schedulable nodes when the e2e suite starts (the next kube version does this on demand), but that number was <2 when the e2e started although there are 3 worker nodes.

Tomas, can you pick back the on-demand behaviour you mentioned?
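For reference, the "on-demand behaviour" would roughly mean the DaemonSet test recomputing the schedulable-node count at the moment it runs, instead of trusting a value cached at suite start. A hedged sketch, building on the hypothetical countSchedulableNodes helper from the earlier comment (so it is not standalone); the function name is made up and the >= 2 requirement simply mirrors the conformance message in the log:

// Hypothetical on-demand variant: ask for the schedulable-node count right
// before the test needs it, instead of reusing a value captured at suite start.
func nodeCountForDaemonSetTest(ctx context.Context, c kubernetes.Interface) (int, error) {
	n, err := countSchedulableNodes(ctx, c) // fresh node list at test time
	if err != nil {
		return 0, err
	}
	if n < 2 {
		// A real test would skip or retry here rather than assert against a
		// stale count and fail with "Expected <int>: 2 to equal <int>: 0".
		return n, fmt.Errorf("need at least 2 schedulable nodes, found %d", n)
	}
	return n, nil
}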
This test has always been release blocking. We added Azure. It fails on Azure because of "takes longer to start up". I agree fixing the test to be more resilient is good. The remaining question is "why is machine-config for Azure reporting ready before all worker pools created on install are schedulable?". That decision needs to be made for all platforms (it's either a product requirement or not). That question must be decided in 4.2, either way, because then we can use it to enforce other rules.
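To make the machine-config question inspectable, here is a hypothetical diagnostic sketch that reads the worker MachineConfigPool status with the dynamic client and prints readyMachineCount vs machineCount; one could compare that against the schedulable-node count from the earlier sketch to see whether MCO reports ready before the workers are actually schedulable. The pool name "worker", the kubeconfig path, and the context-aware Get signature are assumptions.

package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", "/tmp/admin.kubeconfig")
	if err != nil {
		panic(err)
	}
	dyn, err := dynamic.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}
	gvr := schema.GroupVersionResource{
		Group:    "machineconfiguration.openshift.io",
		Version:  "v1",
		Resource: "machineconfigpools",
	}
	// MachineConfigPools are cluster-scoped; assume the default "worker" pool.
	pool, err := dyn.Resource(gvr).Get(context.TODO(), "worker", metav1.GetOptions{})
	if err != nil {
		panic(err)
	}
	total, _, _ := unstructured.NestedInt64(pool.Object, "status", "machineCount")
	ready, _, _ := unstructured.NestedInt64(pool.Object, "status", "readyMachineCount")
	// Cross-check this against the schedulable-node count from the earlier
	// sketch to see when "ready" diverges from "schedulable".
	fmt.Printf("worker pool: %d/%d machines ready\n", ready, total)
}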
I picked the PR, but it doesn't fix the issue, only limits the occurrence rate. If Ginkgo starts that e2e soon enough, it detects the 0 schedulable nodes anyway. It really looks to me like the issue at hand is starting e2e on a cluster that is not ready yet.
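A hedged sketch of what "not starting e2e on a cluster that is not ready yet" could look like: a pre-suite gate that polls until the expected number of worker nodes is schedulable. It builds on the hypothetical countSchedulableNodes helper from the earlier comment and assumes the k8s.io/apimachinery/pkg/util/wait and time imports; the 15-minute timeout and the expectedWorkers parameter are illustrative, not existing CI behaviour.

// Hypothetical pre-suite gate: block until at least expectedWorkers nodes are
// schedulable, so Ginkgo never starts the [Serial] suite against 0 workers.
func waitForSchedulableWorkers(ctx context.Context, c kubernetes.Interface, expectedWorkers int) error {
	return wait.PollImmediate(10*time.Second, 15*time.Minute, func() (bool, error) {
		n, err := countSchedulableNodes(ctx, c)
		if err != nil {
			return false, nil // treat transient API errors as "not ready yet"
		}
		return n >= expectedWorkers, nil
	})
}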
"starting e2e on a cluster that is not ready yet" doesn't fully explain why this is different on Azure. Can we work with RHCOS team to look at the startup time and what they delay might be?
> "starting e2e on a cluster that is not ready yet" doesn't fully explain why this is different on Azure. Can we work with RHCOS team to look at the startup time and what they delay might be? Sure, can you file a new BZ for MCO team? This is already linked to the e2e test fix (+ different teams).
Confirmed with the latest jobs, the issue has been fixed:
https://prow.k8s.io/view/gcs/origin-ci-test/logs/canary-openshift-ocp-installer-e2e-azure-serial-4.2/127
https://prow.k8s.io/view/gcs/origin-ci-test/logs/canary-openshift-ocp-installer-e2e-azure-serial-4.2/124
https://prow.k8s.io/view/gcs/origin-ci-test/logs/canary-openshift-ocp-installer-e2e-azure-serial-4.2/125
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:2922