Bug 1738691

Summary: GCP: e2e test failing: [sig-scheduling] Multi-AZ Cluster Volumes [sig-storage] should only be allowed to provision PDs in zones where nodes exist [Suite:openshift/conformance/parallel] [Suite:k8s]
Product: OpenShift Container Platform
Reporter: Cesar Wong <cewong>
Component: Storage
Assignee: Fabio Bertinatto <fbertina>
Status: CLOSED ERRATA
QA Contact: Liang Xia <lxia>
Severity: medium
Priority: unspecified
Version: 4.2.0
CC: aos-bugs, aos-storage-staff, fbertina
Target Milestone: ---
Target Release: 4.2.0
Hardware: Unspecified
OS: Unspecified
Doc Type: If docs needed, set a value
Last Closed: 2019-10-16 06:35:15 UTC
Type: Bug

Description Cesar Wong 2019-08-07 21:28:05 UTC
Description of problem:

The following test fails when running e2e conformance on a GCP cluster:
[sig-scheduling] Multi-AZ Cluster Volumes [sig-storage] should only be allowed to provision PDs in zones where nodes exist [Suite:openshift/conformance/parallel] [Suite:k8s]

Version-Release number of selected component (if applicable):
4.2

How reproducible:
Always

Steps to Reproduce:
1. Run e2e conformance against a GCP cluster

Actual results:
Test fails

Expected results:
Test passes

This test assumes that a PVC will be bound shortly after creation. In 4.2 the claim waits for a first consumer before binding (WaitForFirstConsumer), so it stays Pending and the test's wait for the Bound phase never succeeds.
https://github.com/openshift/origin/blob/c1e9f01c0d94615ad729e786ed9eb6e063834cd6/vendor/k8s.io/kubernetes/test/e2e/scheduling/ubernetes_lite_volumes.go#L153
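
The WaitForFirstConsumer events later in the log show why the claims never bind: the default StorageClass delays binding until a pod consumes the claim. Below is a minimal sketch of how to confirm the binding mode from a client; it assumes a context-aware client-go release, reuses the kubeconfig path from the log above, and only reads cluster state.

package main

import (
	"context"
	"fmt"

	storagev1 "k8s.io/api/storage/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Kubeconfig path as used by the e2e run in this report.
	cfg, err := clientcmd.BuildConfigFromFlags("", "/tmp/cluster/admin.kubeconfig")
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	scs, err := client.StorageV1().StorageClasses().List(context.TODO(), metav1.ListOptions{})
	if err != nil {
		panic(err)
	}
	for _, sc := range scs.Items {
		// The default class carries this well-known annotation.
		if sc.Annotations["storageclass.kubernetes.io/is-default-class"] != "true" {
			continue
		}
		mode := storagev1.VolumeBindingImmediate
		if sc.VolumeBindingMode != nil {
			mode = *sc.VolumeBindingMode
		}
		fmt.Printf("default StorageClass %q uses volumeBindingMode=%s\n", sc.Name, mode)
		if mode == storagev1.VolumeBindingWaitForFirstConsumer {
			fmt.Println("PVCs stay Pending until a pod consumes them; waiting for phase Bound right after creation will time out")
		}
	}
}

With WaitForFirstConsumer in effect, the wait for phase Bound at ubernetes_lite_volumes.go:153 is expected to time out, which matches the failure below.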

Test Log:

Aug  6 19:10:57.317: INFO: Fetching cloud provider for "gce"
I0806 19:10:57.317515    9701 gce.go:877] Using DefaultTokenSource &oauth2.reuseTokenSource{new:jwt.jwtSource{ctx:(*context.emptyCtx)(0xc0000ee018), conf:(*jwt.Config)(0xc0037ac800)}, mu:sync.Mutex{state:0, sema:0x0}, t:(*oauth2.Token)(nil)}
I0806 19:10:57.383320    9701 gce.go:877] Using DefaultTokenSource &oauth2.reuseTokenSource{new:jwt.jwtSource{ctx:(*context.emptyCtx)(0xc0000ee018), conf:(*jwt.Config)(0xc0037ad380)}, mu:sync.Mutex{state:0, sema:0x0}, t:(*oauth2.Token)(nil)}
I0806 19:10:57.405844    9701 gce.go:877] Using DefaultTokenSource &oauth2.reuseTokenSource{new:jwt.jwtSource{ctx:(*context.emptyCtx)(0xc0000ee018), conf:(*jwt.Config)(0xc0006af480)}, mu:sync.Mutex{state:0, sema:0x0}, t:(*oauth2.Token)(nil)}
W0806 19:10:57.431006    9701 gce.go:475] No network name or URL specified.
Aug  6 19:10:58.053: INFO: lookupDiskImageSources: gcloud error with [[]string{"instance-groups", "list-instances", "", "--format=get(instance)"}]; err:exit status 1
Aug  6 19:10:58.053: INFO:  > ERROR: (gcloud.compute.instance-groups.list-instances) could not parse resource []
Aug  6 19:10:58.053: INFO:  > 
Aug  6 19:10:58.053: INFO: Cluster image sources lookup failed: exit status 1

Aug  6 19:10:58.053: INFO: >>> kubeConfig: /tmp/cluster/admin.kubeconfig
Aug  6 19:10:58.055: INFO: Waiting up to 30m0s for all (but 100) nodes to be schedulable
Aug  6 19:10:58.085: INFO: Waiting up to 10m0s for all pods (need at least 0) in namespace 'kube-system' to be running and ready
Aug  6 19:10:58.114: INFO: 0 / 0 pods in namespace 'kube-system' are running and ready (0 seconds elapsed)
Aug  6 19:10:58.114: INFO: expected 0 pod replicas in namespace 'kube-system', 0 are Running and Ready.
Aug  6 19:10:58.114: INFO: Waiting up to 5m0s for all daemonsets in namespace 'kube-system' to start
Aug  6 19:10:58.126: INFO: e2e test version: v1.14.0+8e63b6d
Aug  6 19:10:58.130: INFO: kube-apiserver version: v1.14.0+8e63b6d
[BeforeEach] [Top Level]
  /go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/test/extended/util/test.go:73
[BeforeEach] [sig-scheduling] Multi-AZ Cluster Volumes [sig-storage]
  /go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/test/e2e/framework/framework.go:149
STEP: Creating a kubernetes client
Aug  6 19:10:58.134: INFO: >>> kubeConfig: /tmp/cluster/admin.kubeconfig
STEP: Building a namespace api object, basename multi-az
Aug  6 19:10:58.220: INFO: About to run a Kube e2e test, ensuring namespace is privileged
Aug  6 19:10:58.359: INFO: No PodSecurityPolicies found; assuming PodSecurityPolicy is disabled.
STEP: Waiting for a default service account to be provisioned in namespace
[BeforeEach] [sig-scheduling] Multi-AZ Cluster Volumes [sig-storage]
  /go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/test/e2e/scheduling/ubernetes_lite_volumes.go:40
STEP: Checking for multi-zone cluster.  Zone count = 3
[It] should only be allowed to provision PDs in zones where nodes exist [Suite:openshift/conformance/parallel] [Suite:k8s]
  /go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/test/e2e/scheduling/ubernetes_lite_volumes.go:55
Aug  6 19:10:58.550: INFO: Expected zones: map[us-east1-c:{}]
STEP: starting a compute instance in unused zone: us-east1-b

STEP: Creating zoneCount+1 PVCs and making sure PDs are only provisioned in zones with nodes
Aug  6 19:11:14.795: INFO: Waiting up to 5m0s for PersistentVolumeClaims [pvc-1] to have phase Bound
Aug  6 19:11:14.799: INFO: PersistentVolumeClaim pvc-1 found but phase is Pending instead of Bound.
[repeats many times...]
Aug  6 19:16:13.767: INFO: PersistentVolumeClaim pvc-1 found but phase is Pending instead of Bound.
Aug  6 19:16:15.768: INFO: deleting claim "e2e-multi-az-4442"/"pvc-4"
Aug  6 19:16:15.777: INFO: deleting claim "e2e-multi-az-4442"/"pvc-3"
Aug  6 19:16:15.791: INFO: deleting claim "e2e-multi-az-4442"/"pvc-2"
Aug  6 19:16:15.806: INFO: deleting claim "e2e-multi-az-4442"/"pvc-1"
Aug  6 19:16:15.813: INFO: Deleting compute resource: compute-ec549e39-b87d-11e9-9b9c-0a58ac1053a9
[AfterEach] [sig-scheduling] Multi-AZ Cluster Volumes [sig-storage]
  /go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/test/e2e/framework/framework.go:150
STEP: Collecting events from namespace "e2e-multi-az-4442".
STEP: Found 4 events.
Aug  6 19:17:24.480: INFO: At 2019-08-06 19:11:14 +0000 UTC - event for pvc-1: {persistentvolume-controller } WaitForFirstConsumer: waiting for first consumer to be created before binding
Aug  6 19:17:24.480: INFO: At 2019-08-06 19:11:14 +0000 UTC - event for pvc-2: {persistentvolume-controller } WaitForFirstConsumer: waiting for first consumer to be created before binding
Aug  6 19:17:24.480: INFO: At 2019-08-06 19:11:14 +0000 UTC - event for pvc-3: {persistentvolume-controller } WaitForFirstConsumer: waiting for first consumer to be created before binding
Aug  6 19:17:24.480: INFO: At 2019-08-06 19:11:14 +0000 UTC - event for pvc-4: {persistentvolume-controller } WaitForFirstConsumer: waiting for first consumer to be created before binding
Aug  6 19:17:24.484: INFO: POD  NODE  PHASE  GRACE  CONDITIONS
Aug  6 19:17:24.484: INFO: 
Aug  6 19:17:24.490: INFO: skipping dumping cluster info - cluster too large
Aug  6 19:17:24.490: INFO: Waiting up to 3m0s for all (but 100) nodes to be ready
STEP: Destroying namespace "e2e-multi-az-4442" for this suite.
Aug  6 19:17:32.541: INFO: Waiting up to 30s for server preferred namespaced resources to be successfully discovered
Aug  6 19:17:34.542: INFO: namespace e2e-multi-az-4442 deletion completed in 10.043619716s
Aug  6 19:17:34.547: INFO: Running AfterSuite actions on all nodes
Aug  6 19:17:34.547: INFO: Running AfterSuite actions on node 1
fail [k8s.io/kubernetes/test/e2e/scheduling/ubernetes_lite_volumes.go:154]: Unexpected error:
    <*errors.errorString | 0xc001a37c60>: {
        s: "PersistentVolumeClaims [pvc-1] not all in phase Bound within 5m0s",
    }
    PersistentVolumeClaims [pvc-1] not all in phase Bound within 5m0s
occurred

Aug 06 19:12:04.754 W ns/openshift-insights pod/insights-operator-8c5df947f-9hq62 node/cewong-tdbqf-m-0.c.openshift-dev-installer.internal graceful deletion within 30s
Aug 06 19:12:04.783 I ns/openshift-insights deployment/insights-operator Scaled down replica set insights-operator-8c5df947f to 0 (4 times)
Aug 06 19:12:04.787 I ns/openshift-insights replicaset/insights-operator-8c5df947f Deleted pod: insights-operator-8c5df947f-9hq62
Aug 06 19:12:04.787 I ns/openshift-insights pod/insights-operator-8c5df947f-9hq62 Stopping container operator
Aug 06 19:12:05.390 E ns/openshift-insights pod/insights-operator-8c5df947f-9hq62 node/cewong-tdbqf-m-0.c.openshift-dev-installer.internal container=operator container exited with code 2 (Error): ding config/clusteroperator/operator-lifecycle-manager with fingerprint=\nI0806 19:09:11.915061       1 diskrecorder.go:63] Recording config/clusteroperator/operator-lifecycle-manager-catalog with fingerprint=\nI0806 19:09:11.915077       1 diskrecorder.go:63] Recording config/clusteroperator/operator-lifecycle-manager-packageserver with fingerprint=\nI0806 19:09:11.915095       1 diskrecorder.go:63] Recording config/clusteroperator/service-ca with fingerprint=\nI0806 19:09:11.915112       1 diskrecorder.go:63] Recording config/clusteroperator/service-catalog-apiserver with fingerprint=\nI0806 19:09:11.915128       1 diskrecorder.go:63] Recording config/clusteroperator/service-catalog-controller-manager with fingerprint=\nI0806 19:09:11.915144       1 diskrecorder.go:63] Recording config/clusteroperator/storage with fingerprint=\nI0806 19:09:11.919463       1 diskrecorder.go:63] Recording config/version with fingerprint=\nI0806 19:09:11.919695       1 diskrecorder.go:63] Recording config/id with fingerprint=\nI0806 19:09:11.943720       1 diskrecorder.go:63] Recording config/infrastructure with fingerprint=\nI0806 19:09:11.963660       1 diskrecorder.go:63] Recording config/network with fingerprint=\nI0806 19:09:11.984436       1 diskrecorder.go:63] Recording config/authentication with fingerprint=\nI0806 19:09:12.006254       1 diskrecorder.go:63] Recording config/featuregate with fingerprint=\nI0806 19:09:12.029434       1 diskrecorder.go:63] Recording config/oauth with fingerprint=\nI0806 19:09:12.052077       1 diskrecorder.go:63] Recording config/ingress with fingerprint=\nI0806 19:09:12.052222       1 periodic.go:126] Periodic gather config completed in 150ms\nI0806 19:09:12.052245       1 controllerstatus.go:34] name=periodic-config healthy=true reason= message=\nI0806 19:11:11.913854       1 status.go:263] The operator is healthy\nI0806 19:11:11.913930       1 status.go:338] No status update necessary, objects are identical\nI0806 19:11:50.835809       1 insightsuploader.go:117] Nothing to report since 0001-01-01T00:00:00Z\n
Aug 06 19:12:12.128 I ns/openshift-insights deployment/insights-operator Scaled up replica set insights-operator-8c5df947f to 1 (5 times)
Aug 06 19:12:12.135 I ns/openshift-insights pod/insights-operator-8c5df947f-cgdm5 node/ created
Aug 06 19:12:12.169 I ns/openshift-insights replicaset/insights-operator-8c5df947f Created pod: insights-operator-8c5df947f-cgdm5
Aug 06 19:12:12.169 I ns/openshift-insights pod/insights-operator-8c5df947f-cgdm5 Successfully assigned openshift-insights/insights-operator-8c5df947f-cgdm5 to cewong-tdbqf-m-1.c.openshift-dev-installer.internal
Aug 06 19:12:12.836 W ns/openshift-insights pod/insights-operator-8c5df947f-9hq62 node/cewong-tdbqf-m-0.c.openshift-dev-installer.internal deleted
Aug 06 19:12:14.477 I ns/openshift-insights pod/insights-operator-8c5df947f-cgdm5 Container image "registry.svc.ci.openshift.org/origin/4.2-2019-08-06-182258@sha256:effd694f2075fc1166158b56c0bddd281961a8dd39e0e548c7e54fad32706bd1" already present on machine
Aug 06 19:12:14.771 I ns/openshift-insights pod/insights-operator-8c5df947f-cgdm5 Created container operator
Aug 06 19:12:14.805 I ns/openshift-insights pod/insights-operator-8c5df947f-cgdm5 Started container operator
Aug 06 19:14:40.951 W clusteroperator/monitoring changed Upgradeable to False: Rollout of the monitoring stack is in progress. Please wait until it finishes.
Aug 06 19:15:06.427 W clusteroperator/monitoring changed Upgradeable to True
Aug 06 19:15:06.471 W clusteroperator/monitoring changed Upgradeable to False: Rollout of the monitoring stack is in progress. Please wait until it finishes.
Aug 06 19:15:14.378 W ns/openshift-insights pod/insights-operator-8c5df947f-cgdm5 node/cewong-tdbqf-m-1.c.openshift-dev-installer.internal graceful deletion within 30s
Aug 06 19:15:14.378 I ns/openshift-insights deployment/insights-operator Scaled down replica set insights-operator-8c5df947f to 0 (5 times)
Aug 06 19:15:14.399 I ns/openshift-insights replicaset/insights-operator-8c5df947f Deleted pod: insights-operator-8c5df947f-cgdm5
Aug 06 19:15:14.404 I ns/openshift-insights pod/insights-operator-8c5df947f-cgdm5 Stopping container operator
Aug 06 19:15:15.519 E ns/openshift-insights pod/insights-operator-8c5df947f-cgdm5 node/cewong-tdbqf-m-1.c.openshift-dev-installer.internal container=operator container exited with code 2 (Error): roperator/openshift-samples with fingerprint=\nI0806 19:12:18.640638       1 diskrecorder.go:63] Recording config/clusteroperator/operator-lifecycle-manager with fingerprint=\nI0806 19:12:18.640665       1 diskrecorder.go:63] Recording config/clusteroperator/operator-lifecycle-manager-catalog with fingerprint=\nI0806 19:12:18.640693       1 diskrecorder.go:63] Recording config/clusteroperator/operator-lifecycle-manager-packageserver with fingerprint=\nI0806 19:12:18.640724       1 diskrecorder.go:63] Recording config/clusteroperator/service-ca with fingerprint=\nI0806 19:12:18.640767       1 diskrecorder.go:63] Recording config/clusteroperator/service-catalog-apiserver with fingerprint=\nI0806 19:12:18.640797       1 diskrecorder.go:63] Recording config/clusteroperator/service-catalog-controller-manager with fingerprint=\nI0806 19:12:18.640826       1 diskrecorder.go:63] Recording config/clusteroperator/storage with fingerprint=\nI0806 19:12:18.649213       1 diskrecorder.go:63] Recording config/version with fingerprint=\nI0806 19:12:18.650361       1 diskrecorder.go:63] Recording config/id with fingerprint=\nI0806 19:12:18.659407       1 diskrecorder.go:63] Recording config/infrastructure with fingerprint=\nI0806 19:12:18.672270       1 diskrecorder.go:63] Recording config/network with fingerprint=\nI0806 19:12:18.679180       1 diskrecorder.go:63] Recording config/authentication with fingerprint=\nI0806 19:12:18.689542       1 diskrecorder.go:63] Recording config/featuregate with fingerprint=\nI0806 19:12:18.696269       1 diskrecorder.go:63] Recording config/oauth with fingerprint=\nI0806 19:12:18.703443       1 diskrecorder.go:63] Recording config/ingress with fingerprint=\nI0806 19:12:18.703611       1 periodic.go:126] Periodic gather config completed in 87ms\nI0806 19:12:18.703637       1 controllerstatus.go:34] name=periodic-config healthy=true reason= message=\nI0806 19:14:18.630089       1 status.go:263] The operator is healthy\nI0806 19:14:18.630155       1 status.go:338] No status update necessary, objects are identical\n
Aug 06 19:15:16.510 W ns/openshift-insights pod/insights-operator-8c5df947f-cgdm5 node/cewong-tdbqf-m-1.c.openshift-dev-installer.internal deleted
Aug 06 19:15:16.548 I ns/openshift-insights deployment/insights-operator Scaled up replica set insights-operator-cdc85dd6 to 1 (4 times)
Aug 06 19:15:16.560 I ns/openshift-insights pod/insights-operator-cdc85dd6-sstp7 node/ created
Aug 06 19:15:16.579 I ns/openshift-insights replicaset/insights-operator-cdc85dd6 Created pod: insights-operator-cdc85dd6-sstp7
Aug 06 19:15:16.587 I ns/openshift-insights pod/insights-operator-cdc85dd6-sstp7 Successfully assigned openshift-insights/insights-operator-cdc85dd6-sstp7 to cewong-tdbqf-m-2.c.openshift-dev-installer.internal
Aug 06 19:15:18.451 I ns/openshift-insights pod/insights-operator-cdc85dd6-sstp7 Container image "registry.svc.ci.openshift.org/origin/4.2-2019-08-06-182258@sha256:ab4ac966521b45029e7b9a2d64ee7ae8f2151103a824d17ff138e1ea02a08cdb" already present on machine
Aug 06 19:15:18.683 I ns/openshift-insights pod/insights-operator-cdc85dd6-sstp7 Created container operator
Aug 06 19:15:18.714 I ns/openshift-insights pod/insights-operator-cdc85dd6-sstp7 Started container operator
Aug 06 19:15:21.178 W ns/openshift-insights pod/insights-operator-cdc85dd6-sstp7 node/cewong-tdbqf-m-2.c.openshift-dev-installer.internal graceful deletion within 30s
Aug 06 19:15:21.179 I ns/openshift-insights deployment/insights-operator Scaled down replica set insights-operator-cdc85dd6 to 0 (5 times)
Aug 06 19:15:21.207 I ns/openshift-insights replicaset/insights-operator-cdc85dd6 Deleted pod: insights-operator-cdc85dd6-sstp7
Aug 06 19:15:21.215 I ns/openshift-insights pod/insights-operator-cdc85dd6-sstp7 Stopping container operator
Aug 06 19:15:22.027 E ns/openshift-insights pod/insights-operator-cdc85dd6-sstp7 node/cewong-tdbqf-m-2.c.openshift-dev-installer.internal container=operator container exited with code 2 (Error): W0806 19:15:18.738034       1 cmd.go:164] Using insecure, self-signed certificates\nI0806 19:15:18.738357       1 crypto.go:514] Generating new CA for openshift-insights-operator-signer@1565118918 cert, and key in /tmp/serving-cert-872531595/serving-signer.crt, /tmp/serving-cert-872531595/serving-signer.key\nI0806 19:15:20.039973       1 observer_polling.go:40] Starting from specified content for file "/etc/insights-operator/server.yaml"\nI0806 19:15:20.040007       1 observer_polling.go:46] Adding reactor for file "/var/run/secrets/serving-cert/tls.crt"\nI0806 19:15:20.040028       1 observer_polling.go:46] Adding reactor for file "/var/run/secrets/serving-cert/tls.key"\nI0806 19:15:20.040405       1 observer_polling.go:106] Starting file observer\nW0806 19:15:20.084696       1 builder.go:181] unable to get owner reference (falling back to namespace): replicasets.apps "insights-operator-cdc85dd6" is forbidden: User "system:serviceaccount:openshift-insights:operator" cannot get resource "replicasets" in API group "apps" in the namespace "openshift-insights"\n
Aug 06 19:15:29.690 W ns/openshift-insights pod/insights-operator-cdc85dd6-sstp7 node/cewong-tdbqf-m-2.c.openshift-dev-installer.internal deleted
Aug 06 19:15:29.748 I ns/openshift-insights pod/insights-operator-8c5df947f-c745d node/ created
Aug 06 19:15:29.765 I ns/openshift-insights deployment/insights-operator Scaled up replica set insights-operator-8c5df947f to 1 (6 times)
Aug 06 19:15:29.782 I ns/openshift-insights replicaset/insights-operator-8c5df947f Created pod: insights-operator-8c5df947f-c745d
Aug 06 19:15:29.785 I ns/openshift-insights pod/insights-operator-8c5df947f-c745d Successfully assigned openshift-insights/insights-operator-8c5df947f-c745d to cewong-tdbqf-m-1.c.openshift-dev-installer.internal
Aug 06 19:15:32.063 W clusteroperator/monitoring changed Upgradeable to True
Aug 06 19:15:32.373 I ns/openshift-insights pod/insights-operator-8c5df947f-c745d Container image "registry.svc.ci.openshift.org/origin/4.2-2019-08-06-182258@sha256:effd694f2075fc1166158b56c0bddd281961a8dd39e0e548c7e54fad32706bd1" already present on machine
Aug 06 19:15:32.740 I ns/openshift-insights pod/insights-operator-8c5df947f-c745d Created container operator
Aug 06 19:15:32.789 I ns/openshift-insights pod/insights-operator-8c5df947f-c745d Started container operator
Aug 06 19:17:08.630 I ns/openshift-vertical-pod-autoscaler verticalpodautoscalercontroller/default Updated VerticalPodAutoscalerController deployment: openshift-vertical-pod-autoscaler/vpa-recommender-default (14 times)


Additional info:

Comment 5 Liang Xia 2019-08-15 08:25:17 UTC
The fix disables this e2e test on GCP, so not much work is needed from the QE side.
Since the PR is merged, I'm moving the bug to verified.
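
For reference, the generic Ginkgo pattern for skipping such a spec at runtime looks like the sketch below. This is not the merged openshift/origin change (which excludes the test from the GCP conformance run); usesWaitForFirstConsumer is a hypothetical helper that would inspect the default StorageClass as in the earlier sketch.

package e2edemo

import (
	"github.com/onsi/ginkgo"
)

// usesWaitForFirstConsumer is a hypothetical stand-in for a real check of the
// default StorageClass volumeBindingMode.
func usesWaitForFirstConsumer() bool { return true }

var _ = ginkgo.Describe("Multi-AZ Cluster Volumes (demo)", func() {
	ginkgo.It("should only be allowed to provision PDs in zones where nodes exist", func() {
		if usesWaitForFirstConsumer() {
			ginkgo.Skip("default StorageClass uses WaitForFirstConsumer; PVCs do not bind until a pod consumes them")
		}
		// ... the original provisioning assertions would run here ...
	})
})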

Comment 6 errata-xmlrpc 2019-10-16 06:35:15 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2922