Pick https://github.com/kubernetes/kubernetes/pull/97428 We are seeing "net/http Abort Handler" message originating from the aggregator after the request had timed out. We want to have more context in the log so we have some idea where these are coming from.
Per PR description(https://github.com/kubernetes/kubernetes/pull/97428), This PR logs the result: with request verb and path with time elapsed since the timeout Verification steps as below, $ oc get clusterversion NAME VERSION AVAILABLE PROGRESSING SINCE STATUS version 4.8.0-0.nightly-2021-06-03-221810 True False 4h4m Cluster version is 4.8.0-0.nightly-2021-06-03-221810 $ oc debug node/<master-node> sh-4.4# cd /var/log/kube-apiserver sh-4.4# grep ' timeout\.go\:.*elapsed' termination.log .. E0604 04:28:00.677751 18 timeout.go:135] post-timeout activity - time-elapsed: 4.33764ms, GET "/api/v1/namespaces/openshift-kube-scheduler/configmapskube-scheduler" result: <nil> ... E0604 04:29:48.893162 18 timeout.go:135] post-timeout activity - time-elapsed: 2.807µs, POST "/api/v1/namespaces/openshift-apiserver/events" result: <nil> From above results, the timeout logs recorded with request verb and path and time elapsed since the timeout, as expected, so move the bug VERIFIED.
kewang, did you have a chance to verify that the metric counter added in this PR gets recorded?
Created attachment 1791782 [details] apiserver_request_post_timeout_total record. apiserver_request_post_timeout_total record in metrics.
the metric looks good, thanks!
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:2438