Bug 1910113 - periodic-ci-openshift-release-master-ocp-4.5-ci-e2e-44-stable-to-45-ci is never passing [NEEDINFO]
Summary: periodic-ci-openshift-release-master-ocp-4.5-ci-e2e-44-stable-to-45-ci is nev...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: kube-apiserver
Version: 4.4
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: low
Target Milestone: ---
Target Release: 4.8.0
Assignee: Ben Parees
QA Contact: Ke Wang
URL:
Whiteboard: LifecycleReset
Depends On:
Blocks:
Reported: 2020-12-22 17:28 UTC by Ben Parees
Modified: 2021-07-27 22:36 UTC (History)
CC List: 5 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-07-27 22:35:34 UTC
Target Upstream Version:
Embargoed:
bparees: needinfo?


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift release pull 18640 | 0 | None | open | Bug 1910113: disable 4.5 rollback jobs as they are permfailing due to kubelet limitations | 2021-05-17 17:21:46 UTC
Red Hat Product Errata | RHSA-2021:2438 | 0 | None | None | None | 2021-07-27 22:36:03 UTC

Description Ben Parees 2020-12-22 17:28:09 UTC
The job has not passed in the last 56+ runs:
https://testgrid.k8s.io/redhat-openshift-ocp-release-4.5-informing#periodic-ci-openshift-release-master-ocp-4.5-ci-e2e-44-stable-to-45-ci

sample failure:
https://prow.ci.openshift.org/view/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-ocp-4.5-ci-e2e-44-stable-to-45-ci/1341352877758615552

reason for blaming apiserver:
 error: error sending request: Post https://api.ci-op-20ht1bp8-7b0c0.origin-ci-int-aws.dev.rhcloud.com:6443/api/v1/namespaces/e2e-test-prometheus-6cknn/pods/execpodsbp2m/exec?command=%2Fbin%2Fsh&command=-x&command=-c&command=curl+-s+-k+-H+%27Authorization%3A+Bearer+eyJhbGciOiJSUzI1NiIsImtpZCI6ImU3eEZ3ZXFVNGNXdGZtZHVJSjl3TUdydVJNYmc1aFpBZTUxaEw2cFRrSUEifQ.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJvcGVuc2hpZnQtbW9uaXRvcmluZyIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJwcm9tZXRoZXVzLWFkYXB0ZXItdG9rZW4tNnFma3oiLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC5uYW1lIjoicHJvbWV0aGV1cy1hZGFwdGVyIiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZXJ2aWNlLWFjY291bnQudWlkIjoiNjlkYmM0NjctNmU3ZS00NzEzLTk3ZDgtOTZlNmZjYzVjNTg3Iiwic3ViIjoic3lzdGVtOnNlcnZpY2VhY2NvdW50Om9wZW5zaGlmdC1tb25pdG9yaW5nOnByb21ldGhldXMtYWRhcHRlciJ9.Y_XiSnY1EuDtlH-TmBioU4RWG2rsGLmCNWfG4oeWysThTVREHJqxRCX0tnpDa8-qnBJVPvGtZqxR3E9jA_Xox99oWtKxpJrxuG23-xu9OMvjWN-kPEy-PDd4HcXlV22VmlSiZo4Nzhh8MfZawC_x6GmYanX7d35MaZ0Px5s3cBf4nQTNzbxh5U5bK8rQDJJmT_cyWn_slwJ9d7EZ7IMqOg3GJ43UEzwKtTwqgYlds0eXDr4FNppfHgsBZaOY0dPyrMsDe2Lva7l_FcKnWhYQiLa8Yd7xSh0RIBVUG5n1GqzEIwHAX_gAoCmtcNNKvMZNZ7EI1xngsU39DUB-NAgV7g%27+%22https%3A%2F%2Fprometheus-k8s.openshift-monitoring.svc%3A9091%2Fapi%2Fv1%2Fquery%3Fquery%3DALERTS%257Balertname%2521~%2522Watchdog%257CAlertmanagerReceiversNotConfigured%257CPrometheusRemoteWriteDesiredShards%2522%252Calertstate%253D%2522firing%2522%252Cseverity%2521%253D%2522info%2522%257D%2B%253E%253D%2B1%22&container=agnhost-pause&stderr=true&stdout=true: dial tcp 52.21.157.92:6443: connect: connection refused
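The request URL above is hard to read because the Prometheus query inside it is URL-encoded twice. The following is a minimal sketch (not part of the original report) that decodes the last curl argument to show what the test was trying to ask Prometheus. The function name `decode_prometheus_query` and the constant `ENCODED_ARG` are hypothetical names introduced for illustration; the encoded string is copied from the error above.

```python
from urllib.parse import parse_qs, unquote_plus, urlsplit

# Last argument of the curl command, copied verbatim from the failing
# exec request URL in the error message above.
ENCODED_ARG = (
    "%22https%3A%2F%2Fprometheus-k8s.openshift-monitoring.svc%3A9091"
    "%2Fapi%2Fv1%2Fquery%3Fquery%3DALERTS%257Balertname%2521~%2522Watchdog"
    "%257CAlertmanagerReceiversNotConfigured"
    "%257CPrometheusRemoteWriteDesiredShards%2522%252Calertstate%253D"
    "%2522firing%2522%252Cseverity%2521%253D%2522info%2522%257D"
    "%2B%253E%253D%2B1%22"
)


def decode_prometheus_query(encoded_arg):
    """Return (inner_url, promql) for a doubly URL-encoded curl argument.

    Hypothetical helper for illustration only.
    """
    # First pass: undo the encoding applied to the exec "command" parameter,
    # then drop the shell quotes that wrapped the URL.
    inner_url = unquote_plus(encoded_arg).strip('"')
    # Second pass: parse_qs decodes the query string of the inner URL.
    promql = parse_qs(urlsplit(inner_url).query)["query"][0]
    return inner_url, promql


inner_url, promql = decode_prometheus_query(ENCODED_ARG)
print(inner_url.split("?")[0])
print(promql)
```

Decoded, the test was checking for firing alerts other than Watchdog, AlertmanagerReceiversNotConfigured, and PrometheusRemoteWriteDesiredShards with a severity other than info. Judging from the error text, the connection was refused while dialing the apiserver on port 6443, so the failure occurred on the exec request itself rather than in the Prometheus query.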

Other tests also show connection refused errors to the apiserver (yes, I realize this may also mean a networking issue; feel free to reassign as needed).

There are also many "the server could not find the requested resource" errors reported from tests.

These failures look pretty consistent across the couple of recent runs I examined.

Comment 1 Stefan Schimanski 2021-01-04 10:55:33 UTC
There has been no green run since at least August; compare https://prow.ci.openshift.org/job-history/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ocp-4.5-ci-e2e-44-stable-to-45-ci?buildId=1300221839577976832. Hence, this is certainly not a recent regression.

Comment 6 Michal Fojtik 2021-02-11 16:29:02 UTC
This bug hasn't had any activity in the last 30 days. Maybe the problem got resolved, was a duplicate of something else, or became less pressing for some reason - or maybe it's still relevant but just hasn't been looked at yet. As such, we're marking this bug as "LifecycleStale" and decreasing the severity/priority. If you have further information on the current state of the bug, please update it, otherwise this bug can be closed in about 7 days. The information can be, for example, that the problem still occurs, that you still want the feature, that more information is needed, or that the bug is (for whatever reason) no longer relevant. Additionally, you can add LifecycleFrozen into Keywords if you think this bug should never be marked as stale. Please consult with bug assignee before you do that.

Comment 7 Stefan Schimanski 2021-05-17 16:22:23 UTC
> Stefan, if I'm understanding correctly, this is a 4.5 to 4.4 downgrade test job and you're saying that downgrade between those levels will never work due to kubelet issues? If so, should we simply disable/delete the job in question?

Feel free to do so. Looking into this further is not a priority.

Comment 8 Ben Parees 2021-05-17 17:14:57 UTC
Please don't close this bug until you disable the job (if that is indeed the only viable solution).

Closing the bug while leaving the issue around just means someone else is going to end up investigating it again and opening a new bug. (It also means we continue to waste CI resources running a worthless CI job.)


Current 4.5 rollback jobs that are failing:
https://testgrid.k8s.io/redhat-openshift-ocp-release-4.5-informing#periodic-ci-openshift-release-master-ci-4.5-upgrade-from-stable-4.4-e2e-aws-upgrade-rollback
https://testgrid.k8s.io/redhat-openshift-ocp-release-4.5-informing#periodic-ci-openshift-release-master-ci-4.5-e2e-aws-upgrade-rollback

Yes, I realize 4.5 is nearly EOL, so it's not worth significant effort here. Disabling the job may well be the right answer at this point. But I'd like to at least encourage the practice that we don't just close bugs when stuff is broken without taking any action at all to mitigate the breakage.

There was also substantial discussion around this bug (https://bugzilla.redhat.com/show_bug.cgi?id=1947477#c5) about what our obligations/expectations are for rollback support in the face of one-way steps.

Comment 9 Michal Fojtik 2021-05-17 17:16:51 UTC
The LifecycleStale keyword was removed because the bug got commented on recently.
The bug assignee was notified.

Comment 13 errata-xmlrpc 2021-07-27 22:35:34 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438

