Bug 1924870

Summary: pick upstream pr#96901: plumb context with request deadline
Product: OpenShift Container Platform Reporter: Abu Kashem <akashem>
Component: kube-apiserver Assignee: Abu Kashem <akashem>
Status: CLOSED ERRATA QA Contact: Xingxing Xia <xxia>
Severity: high Docs Contact:
Priority: high    
Version: 4.7 CC: aos-bugs, kewang, mfojtik, wlewis, xxia
Target Milestone: ---   
Target Release: 4.7.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2021-02-24 15:58:08 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Abu Kashem 2021-02-03 19:16:35 UTC
Pick upstream PR: https://github.com/kubernetes/kubernetes/pull/96901

We want this in 4.7: when debugging customer escalations, we run into scenarios where request activity lasts longer than 60s. This PR wires the request context with an appropriate timeout immediately after the request is received. This ensures that the authentication, authorization, and aggregation filters all use a deadline-bound request context.

This lays the groundwork for better management of request deadlines. We will inspect the authentication and authorization filters and the aggregation layer to ensure that the wired context is used, so in the future we may pick more fixes along these lines.

Comment 1 Wally 2021-02-04 16:59:35 UTC
Not a 4.7 blocker, but something we'd like to merge before code freeze.

Comment 3 Xingxing Xia 2021-02-09 14:09:33 UTC
Tested in 4.7.0-0.nightly-2021-02-09-024347:
When timeout is invalid, got 400:
$ curl -XDELETE -ksSH "Authorization: Bearer $TOKEN" https://...:6443/api/v1/namespaces/xxia-proj/pods/node-hello-854495b46-2vmkd?timeout=aaaa
{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {
    
  },
  "status": "Failure",
  "message": "invalid timeout specified in the request URL - time: invalid duration \"aaaa\"",
  "reason": "BadRequest",
  "code": 400
}

When timeout is valid, i.e. ?timeout=, ?timeout=${NUM}s, or ?timeout=0s, the requests show no regression. (Per the PR's test code, a full check would verify whether hasDeadlineExpected is set and what deadlineExpected is, but there seems to be no way to test that via e2e. Since the PR's unit tests cover this, I will not spend more time working out an e2e test.)

Comment 6 errata-xmlrpc 2021-02-24 15:58:08 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0
security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633