1958329 – pick 97428: add more context to log after a request times out

Bug 1958329 - pick 97428: add more context to log after a request times out

Summary: pick 97428: add more context to log after a request times out

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	kube-apiserver
Sub Component:
Version:	4.8
Hardware:	Unspecified
OS:	Unspecified
Priority:	high
Severity:	high
Target Milestone:	---
Target Release:	4.8.0
Assignee:	Abu Kashem
QA Contact:	Ke Wang
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+	depends on / blocked

Reported:	2021-05-07 16:16 UTC by Abu Kashem
Modified:	2021-07-27 23:07 UTC (History)
CC List:	5 users (show)
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2021-07-27 23:07:25 UTC
Target Upstream Version:
Embargoed:

Attachments	(Terms of Use)
apiserver_request_post_timeout_total record. (54.07 KB, image/png) 2021-06-17 10:58 UTC, Ke Wang	no flags	Details
View All

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Github	openshift kubernetes pull 737	0	None	open	Bug 1958329: UPSTREAM: 97428: add more context to log after a request times out	2021-05-07 16:56:41 UTC
Red Hat Product Errata	RHSA-2021:2438	0	None	None	None	2021-07-27 23:07:42 UTC

Description Abu Kashem 2021-05-07 16:16:46 UTC

Pick https://github.com/kubernetes/kubernetes/pull/97428

We are seeing "net/http Abort Handler" message originating from the aggregator after the request had timed out. We want to have more context in the log so we have some idea where these are coming from.

Comment 2 Ke Wang 2021-06-04 08:56:55 UTC

Per PR description(https://github.com/kubernetes/kubernetes/pull/97428), This PR logs the result:

    with request verb and path
    with time elapsed since the timeout

Verification steps as below,

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.8.0-0.nightly-2021-06-03-221810   True        False         4h4m    Cluster version is 4.8.0-0.nightly-2021-06-03-221810

$ oc debug node/<master-node>
sh-4.4# cd /var/log/kube-apiserver
sh-4.4# grep ' timeout\.go\:.*elapsed'  termination.log
..
E0604 04:28:00.677751      18 timeout.go:135] post-timeout activity - time-elapsed: 4.33764ms, GET "/api/v1/namespaces/openshift-kube-scheduler/configmapskube-scheduler" result: <nil>
...
E0604 04:29:48.893162      18 timeout.go:135] post-timeout activity - time-elapsed: 2.807µs, POST "/api/v1/namespaces/openshift-apiserver/events" result: <nil>

From above results, the timeout logs recorded with request verb and path and time elapsed since the timeout, as expected, so move the bug VERIFIED.

Comment 3 Abu Kashem 2021-06-04 13:54:28 UTC

kewang,

did you have a chance to verify that the metric counter added in this PR gets recorded?

Comment 5 Ke Wang 2021-06-17 10:58:00 UTC

Created attachment 1791782 [details]
apiserver_request_post_timeout_total record.

apiserver_request_post_timeout_total record in metrics.

Comment 6 Abu Kashem 2021-06-17 13:35:13 UTC

the metric looks good, thanks!

Comment 8 errata-xmlrpc 2021-07-27 23:07:25 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438

Note You need to log in before you can comment on or make changes to this bug.