Bug 1814360

Summary: Test flake: [sig-scheduling] Multi-AZ Clusters should spread the pods of a service across zones [Suite:openshift/conformance/parallel] [Suite:k8s]
Product: OpenShift Container Platform
Reporter: Clayton Coleman <ccoleman>
Component: kube-scheduler
Assignee: Clayton Coleman <ccoleman>
Status: CLOSED DEFERRED
QA Contact: RamaKasturi <knarra>
Severity: medium
Docs Contact:
Priority: unspecified
Version: 4.3.0
CC: aos-bugs, ccoleman, dmace, jhou, knarra, maszulik, mfojtik, ssoto, wking, yinzhou
Target Milestone: ---   
Target Release: 4.4.z   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1806594
: 1814363
Environment:
Last Closed: 2020-05-04 10:12:47 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1806594    
Bug Blocks: 1814363    

Description Clayton Coleman 2020-03-17 17:29:29 UTC
+++ This bug was initially created as a clone of Bug #1806594 +++

Description of problem:

Top flake in the 4.3 blocking job grid:

[sig-scheduling] Multi-AZ Clusters should spread the pods of a service across zones [Suite:openshift/conformance/parallel] [Suite:k8s]

https://testgrid.k8s.io/redhat-openshift-ocp-release-4.3-blocking#release-openshift-origin-installer-e2e-gcp-4.3

https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-gcp-4.3/1434
https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-gcp-4.3/1452

Not observed recently in 4.1, 4.2, or 4.4 branches. Hard to tell if that's significant.

This is not to be confused with https://bugzilla.redhat.com/show_bug.cgi?id=1760193, which is a similar test but for replication controllers.


Additional info:

--- Additional comment from Eric Paris on 2020-03-03 14:26:03 EST ---

This bug sets Target Release equal to a z-stream but has no bug in the 'Depends On' field. As such this is not a valid bug state and the target release is being unset.

Any bug targeting 4.1.z must have a bug targeting 4.2 in 'Depends On.'
Similarly, any bug targeting 4.2.z must have a bug with Target Release of 4.3 in 'Depends On.'

--- Additional comment from zhou ying on 2020-03-06 05:51:04 EST ---

Hi Maciej:

Since we want to defer this to 4.5, do we need to clone it to 4.4 and 4.3?

--- Additional comment from Maciej Szulik on 2020-03-06 06:25:40 EST ---

(In reply to zhou ying from comment #2)
> Hi Maciej:
> 
> Since we want to defer this to 4.5, do we need to clone it to 4.4 and 4.3?

Not until we figure out the root cause and a fix for it. Only then will we consider backports.

--- Additional comment from Sebastian Soto on 2020-03-16 17:13:28 EDT ---

This issue caused a 4.5 CI build failure today: https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-gcp-serial-4.5/645

--- Additional comment from Clayton Coleman on 2020-03-17 13:27:50 EDT ---

This fails about 1 in 4 runs on AWS because AWS only does two zones.

Workaround in 4.5 (should be backported to at least 4.3 to clean up flakes): https://github.com/openshift/origin/pull/24709

The upstream issue (https://github.com/kubernetes/kubernetes/issues/89178) is that the test is wrong: it assumes all zones of all nodes are schedulable. Fixing that requires significant rework of the upstream tests, so a fix won't be available anytime soon. The tests need to take as input the set of nodes that can be considered for the test.
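
A rough sketch of that idea, in Python rather than the Go of the real e2e suite. This is not the upstream test code: the `Node` type, the `zone_skew` helper, and the node and zone names are all hypothetical and exist only for illustration. It shows why counting zones from every node reports an uneven spread when one zone has no schedulable nodes, and how passing in only the nodes that may be considered avoids that.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    # Hypothetical stand-in for a cluster node; not a Kubernetes API type.
    name: str
    zone: str
    schedulable: bool


def zone_skew(pod_node_names, candidate_nodes):
    """Return max minus min pod count across the zones of candidate_nodes.

    Only zones that contain at least one candidate node are counted, so the
    caller decides which nodes (and therefore which zones) the check covers.
    """
    zone_of = {n.name: n.zone for n in candidate_nodes}
    if not zone_of:
        raise ValueError("no candidate nodes to check against")
    counts = {zone: 0 for zone in zone_of.values()}
    for name in pod_node_names:
        if name not in zone_of:
            raise ValueError(f"pod scheduled on non-candidate node {name!r}")
        counts[zone_of[name]] += 1
    return max(counts.values()) - min(counts.values())


# Illustrative cluster: zone-c only contains a node that cannot take pods.
nodes = [
    Node("worker-a", "zone-a", True),
    Node("worker-b", "zone-b", True),
    Node("master-c", "zone-c", False),
]
# Four service pods, placed evenly on the two schedulable workers.
pods = ["worker-a", "worker-b", "worker-a", "worker-b"]

# Counting every zone makes the empty zone-c look like a bad spread.
skew_all_nodes = zone_skew(pods, nodes)
# Counting only the nodes that may be considered shows an even spread.
skew_schedulable = zone_skew(pods, [n for n in nodes if n.schedulable])
```

With the sample data, the first call sees an empty third zone and reports a skew, while the second call only looks at the two zones that can actually run pods.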

--- Additional comment from Clayton Coleman on 2020-03-17 13:28:45 EDT ---

Comment 1 Clayton Coleman 2020-03-17 17:32:20 UTC
*** Bug 1814359 has been marked as a duplicate of this bug. ***

Comment 3 Maciej Szulik 2020-05-04 10:12:47 UTC
I'm going to close this for now. We don't have a clear resolution in master yet; once we do, we'll reconsider backports.