Bug 1838730
Summary: | [azure-disk] azure e2e fail with failed scheduling errors
---|---
Product: | OpenShift Container Platform
Reporter: | Hemant Kumar <hekumar>
Component: | Storage
Assignee: | Christian Huffman <chuffman>
Storage sub component: | Kubernetes
QA Contact: | Wei Duan <wduan>
Status: | CLOSED ERRATA
Severity: | medium
Priority: | unspecified
CC: | aos-bugs, ffranz, jsafrane, wduan
Version: | 4.5
Flags: | wduan: needinfo-
Target Release: | 4.6.0
Hardware: | Unspecified
OS: | Unspecified
Doc Type: | No Doc Update
Clones: | 1861382
Last Closed: | 2020-10-27 16:00:29 UTC
Type: | Bug
Bug Blocks: | 1861382
Description
Hemant Kumar
2020-05-21 16:43:38 UTC
These tests are, I think, failing because the volume is being provisioned in a zone where there is no worker node. This happens because the cluster has 3 master nodes and only 2 worker nodes. I think we had a similar problem in AWS for a bit, and it caused flakes.

The internal Azure tests have passed with this change. I've submitted an upstream PR [1] to include this in k8s.

[1] https://github.com/kubernetes/kubernetes/pull/91642

Some cases still fail with volume node affinity conflict; need to check whether there is another issue.

https://prow.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-azure-4.5/1284319209928527872

STEP: Found 3 events.
Jul 18 03:54:08.144: INFO: At 2020-07-18 03:48:55 +0000 UTC - event for azure-diskph482: {persistentvolume-controller } ProvisioningSucceeded: Successfully provisioned volume pvc-9521d714-f364-48a1-baed-95a40db5b33f using kubernetes.io/azure-disk
Jul 18 03:54:08.144: INFO: At 2020-07-18 03:48:57 +0000 UTC - event for exec-volume-test-dynamicpv-xwb5: {default-scheduler } FailedScheduling: 0/6 nodes are available: 2 node(s) had volume node affinity conflict, 3 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate.

https://prow.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-azure-4.5/1285226062380273664

STEP: Found 4 events.
Jul 20 16:10:38.369: INFO: At 0001-01-01 00:00:00 +0000 UTC - event for exec-volume-test-dynamicpv-7fcc: {default-scheduler } FailedScheduling: 0/6 nodes are available: 2 node(s) had volume node affinity conflict, 3 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate.
Jul 20 16:10:38.369: INFO: At 0001-01-01 00:00:00 +0000 UTC - event for exec-volume-test-dynamicpv-7fcc: {default-scheduler } FailedScheduling: 0/6 nodes are available: 2 node(s) had volume node affinity conflict, 3 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate.
Jul 20 16:10:38.369: INFO: At 0001-01-01 00:00:00 +0000 UTC - event for exec-volume-test-dynamicpv-7fcc: {default-scheduler } FailedScheduling: skip schedule deleting pod: e2e-volume-9629/exec-volume-test-dynamicpv-7fcc
Jul 20 16:10:38.369: INFO: At 2020-07-20 16:05:24 +0000 UTC - event for azure-diskdm6s9: {persistentvolume-controller } ProvisioningSucceeded: Successfully provisioned volume pvc-21d39c8e-0eeb-4251-9789-dfe97a0ce40e using kubernetes.io/azure-disk

Both of the links provided are for 4.5; however, this change has not been backported to 4.5 yet. It only exists in 4.6 at this time. If we don't see any failures in 4.6, then we can proceed to backport it. Is it possible to check whether these failures exist in 4.6, which contains the change?

Hi Huffman, my mistake. I did not see these failures in 4.6. Status changed.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196
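The failure mode from the description (a disk provisioned in a zone that has no worker node) can be illustrated with a small model of the scheduler's node filtering. This is a simplified sketch, not kube-scheduler code: the zone names, node names, and the `filter_nodes`/`summarize` helpers are hypothetical, and it assumes the 3-master/2-worker layout from the description with the two workers placed in only two of three zones.

```python
from collections import Counter
from dataclasses import dataclass

TAINT_REASON = "had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate"
AFFINITY_REASON = "had volume node affinity conflict"


@dataclass(frozen=True)
class Node:
    name: str
    zone: str
    master: bool  # True if the node carries the master taint the pod does not tolerate


# Hypothetical topology: 3 masters over 3 zones, 2 workers in only 2 of those zones.
CLUSTER = [
    Node("master-0", "1", True),
    Node("master-1", "2", True),
    Node("master-2", "3", True),
    Node("worker-0", "1", False),
    Node("worker-1", "2", False),
]


def filter_nodes(nodes, volume_zone):
    """Return (feasible node names, Counter of failure reasons).

    Simplification: each node is reported under the first check it fails,
    the taint check first, then the volume's zone (node affinity) check.
    """
    feasible, reasons = [], Counter()
    for node in nodes:
        if node.master:
            reasons[TAINT_REASON] += 1
        elif node.zone != volume_zone:
            reasons[AFFINITY_REASON] += 1
        else:
            feasible.append(node.name)
    return feasible, reasons


def summarize(nodes, volume_zone):
    """Build a FailedScheduling-style message (wording approximated, not exact)."""
    feasible, reasons = filter_nodes(nodes, volume_zone)
    parts = sorted(f"{count} node(s) {reason}" for reason, count in reasons.items())
    return f"{len(feasible)}/{len(nodes)} nodes are available: " + ", ".join(parts) + "."


# A disk provisioned in zone "3" has no worker in that zone to attach it.
print(summarize(CLUSTER, "3"))
# A disk provisioned in zone "1" can be used by worker-0.
print(filter_nodes(CLUSTER, "1")[0])
```

In this model, a disk in zone "3" yields `0/5 nodes are available: 2 node(s) had volume node affinity conflict, 3 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate.`, the same pattern as the CI events above (the CI clusters report six nodes in total, so the first count differs).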