Bug 1685185
Summary: | API servers reject traffic before being removed as an endpoint | |
---|---|---|---
Product: | OpenShift Container Platform | Reporter: | Adam Kaplan <adam.kaplan>
Component: | Master | Assignee: | Stefan Schimanski <sttts>
Status: | CLOSED ERRATA | QA Contact: | Xingxing Xia <xxia>
Severity: | high | Docs Contact: |
Priority: | unspecified | |
Version: | 4.1.0 | CC: | aos-bugs, jokerman, mfojtik, mmccomas, sttts, yinzhou
Target Milestone: | --- | |
Target Release: | 4.1.0 | |
Hardware: | Unspecified | |
OS: | Unspecified | |
Whiteboard: | | |
Fixed In Version: | | Doc Type: | If docs needed, set a value
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2019-06-04 10:44:55 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Bug Depends On: | 1686509 | |
Bug Blocks: | | |
Attachments: | | |

Description
Adam Kaplan
2019-03-04 14:55:21 UTC
This is probably a combination of https://github.com/openshift/cluster-openshift-apiserver-operator/pull/154 and the sometimes unexpected lag of endpoint convergence (we saw more than 30 seconds in very bad cases).

With https://github.com/openshift/cluster-openshift-apiserver-operator/pull/154 merged, let's try to re-test this.

Today I tried the bug's step "1. Add a blocked registry to the cluster image configuration (see attachment for sample YAML)", which blocked quay.io. My environment's worker nodes then became SchedulingDisabled; after I uncordoned them, they later turned NotReady, and the environment is now broken. I will build another environment and try the comment 0 steps again (without quay.io as a blocked registry).

(In reply to Xingxing Xia from comment #5)
> Today tried bug's "1. Add a blocked registry to the cluster image
> configuration (see attachment for sample YAML)" which blocked quay.io, then
> my env worker nodes become scheduledisabled, after I uncordon, later they
> turned to not ready, the env is broken.

Found that https://bugzilla.redhat.com/show_bug.cgi?id=1686509 reports the same issue.

Tested with the latest payload, 4.0.0-0.nightly-2019-03-22-002648, which already includes the above fix, https://github.com/openshift/cluster-openshift-apiserver-operator/pull/154 .

In terminal T1, ssh to a master and run:
    $ tail -f /var/log/openshift-apiserver/audit.log
    # It outputs a constant stream of many requests every second
In terminal T2:
    $ watch -n 1 oc get ep,po -n openshift-apiserver
In terminal T3:
    $ oc delete po --all -n openshift-apiserver

After T3's command was issued, T1 and T2 showed the following: in T2, the endpoints and pods disappeared immediately and at the same time, and the output stream in T1 stopped at that same moment and stayed stopped until T2's endpoints and pods came back. From this perspective, the issue is no longer reproduced.

BTW, there is also https://github.com/openshift/cluster-kube-apiserver-operator/pull/352 for the case of a slowly converging SDN environment.
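The T2 observation above relies on watching two things change in the same second. A timestamped variant can make a gap between them easier to spot in a saved log. The following is only a minimal sketch, not part of the original verification: the function name `sample`, the `NAMESPACE` variable, the output format, and the JSONPath expression are all assumptions chosen for illustration.

```shell
#!/bin/sh
# Sketch only: one timestamped sample of endpoint addresses and pods.
# NAMESPACE, sample(), and the output format are hypothetical names.
NAMESPACE="${NAMESPACE:-openshift-apiserver}"

sample() {
    # Count ready endpoint IPs across all Endpoints objects in the namespace.
    # The JSONPath expression is an assumption about the Endpoints layout
    # (subsets[].addresses[].ip); it prints a space-separated list of IPs.
    addrs=$(oc get ep -n "$NAMESPACE" \
        -o jsonpath='{.items[*].subsets[*].addresses[*].ip}' | wc -w | tr -d ' ')

    # Count pods by name. This includes pods that are still terminating,
    # so it is coarser than reading the STATUS column by eye.
    pods=$(oc get po -n "$NAMESPACE" -o name | wc -l | tr -d ' ')

    printf '%s endpoint_addresses=%s pods=%s\n' \
        "$(date -u +%H:%M:%S)" "$addrs" "$pods"
}
```

Running `while sample; do sleep 1; done` in T2 in place of the `watch` command would then print one line per second. Samples where the pod count has dropped but `endpoint_addresses` has not would point at the endpoint-convergence lag discussed earlier in this report.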
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:0758