Bug 1694182

Summary: [rebase] Pod readiness gate test is failing
Product: OpenShift Container Platform
Reporter: Clayton Coleman <ccoleman>
Component: Node
Assignee: Seth Jennings <sjenning>
Status: CLOSED ERRATA
QA Contact: Weinan Liu <weinliu>
Severity: high
Docs Contact:
Priority: unspecified
Version: 4.1.0
CC: aos-bugs, gblomqui, jokerman, mmccomas, sjenning
Target Milestone: ---
Target Release: 4.1.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2019-06-04 10:46:34 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:

Description Clayton Coleman 2019-03-29 17:27:31 UTC
fail [k8s.io/kubernetes/test/e2e/common/pods.go:737]: Expected error:
    <*errors.errorString | 0xc42029b580>: {
        s: "timed out waiting for the condition",
    }
    timed out waiting for the condition
not to have occurred

Is the readiness flag gate even on?  If not, why is this test running?  If it is on, please:

a. verify it should be on
b. ensure the test isn't flaky

Setting severity to high because we need to know why the gate is on or whether it should be off. Once that's resolved it can be dropped to medium, but it still impacts CI with a 1/12 flake rate.

https://openshift-gce-devel.appspot.com/build/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-4.0/6254#openshift-tests-k8sio-pods-should-support-pod-readiness-gates-nodefeaturepodreadinessgate-suiteopenshiftconformanceparallel-suitek8s

Comment 1 Seth Jennings 2019-04-04 15:29:55 UTC
https://github.com/kubernetes/kubernetes/pull/69303

That PR introduced a change in both the kubelet and the e2e tests. The current version skew between the e2e (1.13) and the kubelet in RHCOS (1.12) is causing this failure. Once the kubelet is 1.13 in RHCOS (which it already is, but the pivot takes it back to 1.12 as of yesterday), this will go away.
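
For illustration only, the skew described here can be checked mechanically. The following is a minimal sketch, assuming version strings shaped like "1.13" or "v1.12.4+abc"; the helper names `parse_major_minor` and `minor_skew` are hypothetical and are not part of Kubernetes, origin, or the e2e suite.

```python
import re
from typing import Tuple


def parse_major_minor(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from a string such as 'v1.12.4+abc' or '1.13'."""
    match = re.match(r"v?(\d+)\.(\d+)", version.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def minor_skew(e2e_version: str, kubelet_version: str) -> int:
    """Return how many minor releases the e2e suite is ahead of the kubelet."""
    e2e_major, e2e_minor = parse_major_minor(e2e_version)
    kubelet_major, kubelet_minor = parse_major_minor(kubelet_version)
    if e2e_major != kubelet_major:
        raise ValueError("major versions differ; minor skew is not meaningful")
    return e2e_minor - kubelet_minor


if __name__ == "__main__":
    # Versions taken from this comment: e2e at 1.13, kubelet in RHCOS at 1.12.
    skew = minor_skew("1.13", "1.12")
    print(f"e2e is {skew} minor release(s) ahead of the kubelet")
```

A nonzero result would flag a run where a test that depends on a change made in both the kubelet and the e2e (as in the PR above) may fail for skew reasons rather than because of a real regression.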

Comment 2 Seth Jennings 2019-04-04 16:52:35 UTC
ART is pushing the new OS container that has the 1.13-based hyperkube right now. Once this is done, we can deploy/upgrade a cluster and verify this is fixed.

Comment 3 Seth Jennings 2019-04-04 18:30:57 UTC
Moving this to POST as a high level indicator that the fix is merged and verification is pending.  Don't want to dump this on QE.  If it works, I'll just close as this was a transient issue caused by rebase version skew.

Comment 4 Seth Jennings 2019-04-04 18:35:38 UTC
PR to re-enable test
https://github.com/openshift/origin/pull/22486

Comment 5 Seth Jennings 2019-04-04 22:59:39 UTC
This is a NodeConformance test. Confirmed blocker.

Comment 6 Seth Jennings 2019-04-09 17:39:20 UTC
Origin CI release build 4.0.0-0.alpha-2019-04-09-164546 moved machine-os-content to a 1.13 base:
https://origin-release.svc.ci.openshift.org/releasestream/4.0.0-0.alpha/release/4.0.0-0.alpha-2019-04-09-164546

Comment 9 Weinan Liu 2019-04-22 10:00:01 UTC
(In reply to Seth Jennings from comment #3)
> Moving this to POST as a high level indicator that the fix is merged and
> verification is pending.  Don't want to dump this on QE.  If it works, I'll
> just close as this was a transient issue caused by rebase version skew.

Hi Seth,
Do you still need QE involved in the verification? If not and you have already verified the rebase issue, would you mind pushing it to VERIFIED?
Thanks!

Comment 11 errata-xmlrpc 2019-06-04 10:46:34 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0758