Bug 1814363 - Test flake: [sig-scheduling] Multi-AZ Clusters should spread the pods of a service across zones [Suite:openshift/conformance/parallel] [Suite:k8s]
Keywords:
Status: CLOSED DEFERRED
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: kube-scheduler
Version: 4.3.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 4.3.z
Assignee: Clayton Coleman
QA Contact: RamaKasturi
URL:
Whiteboard:
Depends On: 1806594 1814360
Blocks:
Reported: 2020-03-17 17:31 UTC by Clayton Coleman
Modified: 2020-05-04 10:12 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1814360
Environment:
Last Closed: 2020-05-04 10:12:57 UTC
Target Upstream Version:
Embargoed:



Description Clayton Coleman 2020-03-17 17:31:54 UTC
+++ This bug was initially created as a clone of Bug #1814360 +++

+++ This bug was initially created as a clone of Bug #1806594 +++

Description of problem:

Top flake in the 4.3 blocking job grid:

[sig-scheduling] Multi-AZ Clusters should spread the pods of a service across zones [Suite:openshift/conformance/parallel] [Suite:k8s]

https://testgrid.k8s.io/redhat-openshift-ocp-release-4.3-blocking#release-openshift-origin-installer-e2e-gcp-4.3

https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-gcp-4.3/1434
https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-gcp-4.3/1452

Not observed recently in 4.1, 4.2, or 4.4 branches. Hard to tell if that's significant.

This is not to be confused with https://bugzilla.redhat.com/show_bug.cgi?id=1760193, which is a similar test but for replication controllers.


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

--- Additional comment from Eric Paris on 2020-03-03 14:26:03 EST ---

This bug sets Target Release equal to a z-stream but has no bug in the 'Depends On' field. As such this is not a valid bug state and the target release is being unset.

Any bug targeting 4.1.z must have a bug targeting 4.2 in 'Depends On.'
Similarly, any bug targeting 4.2.z must have a bug with Target Release of 4.3 in 'Depends On.'

--- Additional comment from zhou ying on 2020-03-06 05:51:04 EST ---

Hi Maciej:

Since we want to defer this to 4.5, do we need to clone it to 4.4 and 4.3?

--- Additional comment from Maciej Szulik on 2020-03-06 06:25:40 EST ---

(In reply to zhou ying from comment #2)
> Hi Maciej:
> 
> Since we want to defer this to 4.5, do we need to clone it to 4.4 and 4.3?

Not until we figure out the root cause and a fix for it. Only then will we consider backports.

--- Additional comment from Sebastian Soto on 2020-03-16 17:13:28 EDT ---

This issue caused a 4.5 CI build failure today: https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-gcp-serial-4.5/645

--- Additional comment from Clayton Coleman on 2020-03-17 13:27:50 EDT ---

This fails in about 1 of 4 runs on AWS because the AWS clusters only use two zones.

Workaround in 4.5 (should be backported to at least 4.3 to clean up flakes): https://github.com/openshift/origin/pull/24709

The upstream issue https://github.com/kubernetes/kubernetes/issues/89178 is that the test is wrong: it assumes all zones of all nodes are schedulable. Fixing that requires significant rework of the upstream tests, so a fix won't be available anytime soon. The tests need to take as input the set of nodes that can be considered for the test.
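
To illustrate the point about passing in the nodes to consider, here is a minimal Go sketch. It is not the upstream e2e code or the workaround in the PR above. The `Node` type and the `Schedulable`, `ZonesOf`, and `SpreadEvenly` helpers are hypothetical names, and the evenness rule (per-zone pod counts differ by at most one) is an assumption made for this example.

```go
package main

import (
	"fmt"
	"sort"
)

// Node is a hypothetical, simplified stand-in for a cluster node.
type Node struct {
	Name        string
	Zone        string
	Schedulable bool
}

// Schedulable returns only the nodes that can accept the test's pods.
func Schedulable(nodes []Node) []Node {
	var out []Node
	for _, n := range nodes {
		if n.Schedulable {
			out = append(out, n)
		}
	}
	return out
}

// ZonesOf returns the sorted set of zones covered by the given nodes.
// The caller decides which nodes are candidates for the test.
func ZonesOf(nodes []Node) []string {
	seen := map[string]bool{}
	for _, n := range nodes {
		if n.Zone != "" {
			seen[n.Zone] = true
		}
	}
	zones := make([]string, 0, len(seen))
	for z := range seen {
		zones = append(zones, z)
	}
	sort.Strings(zones)
	return zones
}

// SpreadEvenly reports whether the pods (given by the zone each landed in)
// are spread over the expected zones with per-zone counts differing by at
// most one. This criterion is an assumption of the sketch.
func SpreadEvenly(podZones, zones []string) bool {
	if len(zones) == 0 {
		return false
	}
	counts := make(map[string]int, len(zones))
	for _, z := range zones {
		counts[z] = 0
	}
	for _, z := range podZones {
		if _, ok := counts[z]; !ok {
			return false // pod landed outside the expected zones
		}
		counts[z]++
	}
	lo, hi := len(podZones), 0
	for _, c := range counts {
		if c < lo {
			lo = c
		}
		if c > hi {
			hi = c
		}
	}
	return hi-lo <= 1
}

func main() {
	nodes := []Node{
		{Name: "master-0", Zone: "zone-c", Schedulable: false},
		{Name: "worker-0", Zone: "zone-a", Schedulable: true},
		{Name: "worker-1", Zone: "zone-b", Schedulable: true},
	}
	// The scheduler can only place pods in zone-a and zone-b.
	pods := []string{"zone-a", "zone-b", "zone-a", "zone-b"}

	fmt.Println("all zones:        ", SpreadEvenly(pods, ZonesOf(nodes)))
	fmt.Println("schedulable zones:", SpreadEvenly(pods, ZonesOf(Schedulable(nodes))))
}
```

In this sketch, counting zones from every node expects pods in a zone where nothing can be scheduled, so the check fails even though the placement is as even as it can be. Counting zones only from the nodes passed in as candidates avoids that.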

--- Additional comment from Clayton Coleman on 2020-03-17 13:28:45 EDT ---

Comment 1 Maciej Szulik 2020-05-04 10:12:57 UTC
I'm going to close this for now. We don't have a clear resolution in master yet; once we do, we'll reconsider backports.

