Bug 1879232
| Summary: | API request timed out | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Santiago Maudet <smaudet> |
| Component: | openshift-apiserver | Assignee: | Lukasz Szaszkiewicz <lszaszki> |
| Status: | CLOSED NOTABUG | QA Contact: | Xingxing Xia <xxia> |
| Severity: | urgent | Docs Contact: | |
| Priority: | urgent | | |
| Version: | 4.5 | CC: | aconstan, akuriyan, alchan, amsingh, aos-bugs, braander, bshirren, cpassare, dyocum, jcoscia, jcrumple, jseunghw, ktadimar, lszaszki, malonso, mfojtik, morgan.peterman, ngirard, oarribas, openshift-bugs-escalate, rdiscala, rgregory, rvanderp, shsaxena, sttts, surya |
| Target Milestone: | --- | Keywords: | Reopened |
| Target Release: | 4.7.0 | | |
| Hardware: | All | | |
| OS: | All | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2020-12-04 10:33:40 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Santiago Maudet 2020-09-15 17:59:48 UTC
We need the must-gather logs shortly after the incident. Looking into the customer case, I see a crash-looping pod and failing SubjectAccessReviews that time out or show messages like:

```
2020-09-14T19:16:04.140061553Z E0914 19:16:04.140000 1 upgradeaware.go:327] Error proxying data from backend to client: write tcp 10.26.17.45:6443->10.26.17.55:33292: write: broken pipe
```

suggesting that the network connection is cut off. Without further logs, this pretty much looks like a problem of the service network (SDN) on the masters. Please provide:

- kubelet logs
- logs of failing pods
- logs of everything around SDN on the masters

Moving to SDN to investigate network health on the masters.

Hi team, Workaround 2 from KCS https://access.redhat.com/solutions/5448851 solved the issue. My case is closed now.

Comment #49 (Brendan Shirren): Issue still affecting v4.5.16 and crippling master nodes after hours/days:

```
155360 I1112 02:04:12.571661 1 log.go:172] http2: panic serving 10.6.14.84:51786: killing connection/stream because serving request timed out and response had been started
```

Comment #50 (Lukasz Szaszkiewicz):

> (In reply to Brendan Shirren from comment #49)
>
> Issue still affecting v4.5.16 and crippling master nodes after hours/days
>
> 155360 I1112 02:04:12.571661 1 log.go:172] http2: panic serving 10.6.14.84:51786: killing connection/stream because serving request timed out and response had been started

Hi, could you gather must-gather for analysis?

> (In reply to Lukasz Szaszkiewicz from comment #50)
>
> Hi, could you gather must-gather for analysis?

CU just provided the must-gather in the case. It can be pulled on supportshell now and is at the following Hydra link: https://attachments.access.redhat.com/hydra/rest/cases/02794589/attachments/8612c8b6-22cb-48f5-8029-e2551d196b3d?usePresignedUrl=true
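Since must-gather data is requested twice in this thread, a minimal sketch of collecting it and checking for the timeout panic may be helpful. The standard collection command is `oc adm must-gather`, which needs cluster-admin access and is therefore only shown as a comment; the `apiserver.log` file below is a fabricated stand-in reproducing the panic line quoted in comment #49.

```shell
# On a live cluster, the requested bundle would be collected with
# (commented out here because it requires cluster-admin access):
#   oc adm must-gather --dest-dir=./must-gather

# Fabricated stand-in for a collected apiserver log, containing the
# exact panic line reported in comment #49.
cat > apiserver.log <<'EOF'
155360 I1112 02:04:12.571661 1 log.go:172] http2: panic serving 10.6.14.84:51786: killing connection/stream because serving request timed out and response had been started
EOF

# Count occurrences of the timeout-panic signature.
grep -c "serving request timed out" apiserver.log  # prints: 1
```

Against a real must-gather bundle, the same `grep` would be run recursively (`grep -r`) over the extracted directory to confirm whether the panic is still occurring.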