Bug 1894916

Summary: [4.6] Panic output due to timeouts in openshift-apiserver
Product: OpenShift Container Platform Reporter: Simon Pasquier <spasquie>
Component: openshift-apiserver Assignee: Lukasz Szaszkiewicz <lszaszki>
Status: CLOSED ERRATA QA Contact: Xingxing Xia <xxia>
Severity: medium Docs Contact:
Priority: medium    
Version: 4.6 CC: aos-bugs, lszaszki, mfojtik, skumari, slaznick, sttts, xxia
Target Milestone: --- Keywords: UpcomingSprint
Target Release: 4.6.z   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of: 1885644
: 1894918 Environment: Undiagnosed panic detected in pod
Last Closed: 2021-02-08 13:50:51 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1885644    
Bug Blocks: 1894918    

Description Simon Pasquier 2020-11-05 12:31:41 UTC
+++ This bug was initially created as a clone of Bug #1885644 +++

The test "Undiagnosed panic detected in pod" is failing frequently in CI; see the search results:
https://search.ci.openshift.org/?maxAge=168h&context=1&type=bug%2Bjunit&name=&maxMatches=5&maxBytes=20971520&groupBy=job&search=Undiagnosed+panic+detected+in+pod


Several OVN upgrade jobs from 4.5 to 4.6 failed.

One of the failing jobs: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-ovn-upgrade-4.5-stable-to-4.6-ci/1313309253259235328

Error message:
pods/openshift-apiserver_apiserver-7d87777d99-lx9f4_openshift-apiserver.log.gz:E1006 04:09:10.949495       1 runtime.go:78] Observed a panic: &errors.errorString{s:"killing connection/stream because serving request timed out and response had been started"} (killing connection/stream because serving request timed out and response had been started)
pods/openshift-apiserver_apiserver-7d87777d99-lx9f4_openshift-apiserver.log.gz:E1006 04:09:11.910858       1 runtime.go:78] Observed a panic: &errors.errorString{s:"killing connection/stream because serving request timed out and response had been started"} (killing connection/stream because serving request timed out and response had been started)
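
When triaging logs like the ones above, it can help to separate these timeout-induced entries from any other "Observed a panic" lines. The following is a minimal sketch, not the CI test's actual implementation: the function name `classify_panic_line` and the returned dictionary layout are hypothetical, and the regular expression only assumes the line prefix format visible in the two log lines quoted in this report.

```python
import re

# Message copied verbatim from the log lines quoted in this report.
TIMEOUT_MESSAGE = (
    "killing connection/stream because serving request timed out "
    "and response had been started"
)

# Matches the prefix seen in the report, for example:
# "E1006 04:09:10.949495       1 runtime.go:78] Observed a panic: ..."
PANIC_LINE = re.compile(
    r"(?P<severity>[IWEF])(?P<date>\d{4}) (?P<time>\d{2}:\d{2}:\d{2}\.\d+)\s+"
    r"\d+ (?P<source>[\w.]+:\d+)\] Observed a panic: (?P<detail>.*)$"
)


def classify_panic_line(line):
    """Return details for an 'Observed a panic' line, or None for other lines.

    The 'kind' field is 'timeout' when the panic text contains the request
    timeout message, and 'other' for any other observed panic.
    """
    match = PANIC_LINE.search(line)
    if match is None:
        return None
    kind = "timeout" if TIMEOUT_MESSAGE in match.group("detail") else "other"
    return {
        "time": match.group("time"),
        "source": match.group("source"),
        "kind": kind,
    }
```

Feeding each line of a pod log through `classify_panic_line` and counting the results by `kind` gives a quick view of whether a job failed only because of request timeouts or because of a different panic.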

--- Additional comment from Stefan Schimanski on 2020-10-07 15:09:02 UTC ---

This is not a classical panic. We have a fix in kube-apiserver to make it prettier.

--- Additional comment from Lukasz Szaszkiewicz on 2020-10-23 08:08:57 UTC ---

I have opened a WIP PR, https://github.com/kubernetes/kubernetes/pull/95002, but haven't had time to finish it.

Comment 2 Lukasz Szaszkiewicz 2020-11-16 09:43:43 UTC
The upstream PR merged last week. I'm in the process of backporting it to earlier versions.

Comment 3 Lukasz Szaszkiewicz 2020-12-04 10:56:31 UTC
Once https://bugzilla.redhat.com/show_bug.cgi?id=1885644 is verified, I am going to apply the fix to 4.6.

Comment 4 Xingxing Xia 2020-12-10 10:42:32 UTC
Following the pre-merge test process (issue DPTP-660), I launched an env with cluster-bot:
launch openshift/kubernetes#480,openshift/oauth-apiserver#33,openshift/oauth-server#63,openshift/openshift-apiserver#161
I then tested the env with the steps in my comment on bug 1885644#c8 and got the same result, so this bug is pre-merge verified.
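
To keep track of which PRs must merge before the bug can move to VERIFIED, the comma-separated argument passed to `launch` can be expanded into PR links. This is a rough sketch only: the helper name `pull_request_urls` is hypothetical, and it assumes each entry has the `org/repo#number` form shown above and that the repositories are hosted on GitHub.

```python
import re

# One entry of the launch argument, e.g. "openshift/kubernetes#480".
PR_REF = re.compile(r"^(?P<org>[\w.-]+)/(?P<repo>[\w.-]+)#(?P<number>\d+)$")


def pull_request_urls(launch_argument):
    """Expand 'org/repo#N,org/repo#M' into a list of GitHub pull request URLs."""
    urls = []
    for ref in launch_argument.split(","):
        match = PR_REF.match(ref.strip())
        if match is None:
            # Fail loudly rather than silently skipping a malformed entry.
            raise ValueError(f"not an org/repo#number reference: {ref!r}")
        urls.append(
            "https://github.com/{org}/{repo}/pull/{number}".format(**match.groupdict())
        )
    return urls


if __name__ == "__main__":
    argument = (
        "openshift/kubernetes#480,openshift/oauth-apiserver#33,"
        "openshift/oauth-server#63,openshift/openshift-apiserver#161"
    )
    for url in pull_request_urls(argument):
        print(url)
```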

Comment 5 Lukasz Szaszkiewicz 2020-12-10 11:23:31 UTC
(In reply to Xingxing Xia from comment #4)
> Following pre-merge test process - issue DPTP-660, launched env with
> cluster-bot:
> launch openshift/kubernetes#480,openshift/oauth-apiserver#33,openshift/oauth-server#63,openshift/openshift-apiserver#161
> Then test the env with the steps in my comment of bug 1885644#c8 . Got same
> result. So this bug is pre-merge-verified.

Thanks, please remember to move it to VERIFIED once all mentioned PRs are merged.
Otherwise, some PRs might get stuck on an invalid BZ error.

Comment 6 Xingxing Xia 2020-12-10 14:10:44 UTC
Lukasz, no worries. As already witnessed, the robot implemented in issue DPTP-660 automatically moves a bug whose PRs have merged from ON_QA to VERIFIED, as long as the process in the DPTP-660 description is strictly followed. What we did above follows that process exactly.

Comment 7 Lukasz Szaszkiewicz 2021-01-15 10:16:41 UTC
PRs are in the merge queue.

Comment 8 Lukasz Szaszkiewicz 2021-01-28 13:39:55 UTC
Xingxing, all required PRs have just been merged. PTAL. Thanks.

Comment 12 errata-xmlrpc 2021-02-08 13:50:51 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.6.16 security and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:0308