Bug 1874583
| Summary: | apiserver tries and fails to log an event when shutting down | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Dan Winship <danw> |
| Component: | kube-apiserver | Assignee: | Stefan Schimanski <sttts> |
| Status: | CLOSED ERRATA | QA Contact: | Xingxing Xia <xxia> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | ||
| Version: | 4.6 | CC: | aos-bugs, kewang, mfojtik, nstielau, sttts, xxia |
| Target Milestone: | --- | ||
| Target Release: | 4.7.0 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | No Doc Update | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2021-02-24 15:17:02 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | |||
| Bug Blocks: | 1885619 | ||
Event validation logic changed recently in upstream https://github.com/kubernetes/kubernetes/commit/75612c17461fe81fb969768e12ca52bfb22f745d#diff-b08aadf812d8dba261a29b585894d1e0 For the events we are creating, validation function should bail out at this point. https://github.com/kubernetes/kubernetes/commit/75612c17461fe81fb969768e12ca52bfb22f745d#diff-b08aadf812d8dba261a29b585894d1e0R46 But as group and version are not set for this request, it slips into the new validation logic and fails here https://github.com/kubernetes/kubernetes/commit/75612c17461fe81fb969768e12ca52bfb22f745d#diff-b08aadf812d8dba261a29b585894d1e0R53 But even after fixing the validation, I still see "grpc error: code = Unavailable desc = transport is closing" errors. Changes I tried are here. https://github.com/openshift/kubernetes/pull/336/files I was trying to see the events in a integration test but instead found grpc errors after the above fix. First, checked the not-fixed 4.6.1 env, can reproduce the issue:
ssh to master, check:
[root@xxia1028bm-qdtfs-control-plane-0 ~]# crictl ps --name='kube-apiserver$'
CONTAINER IMAGE CREATED STATE NAME
ATTEMPT POD ID
7985fb1b750fd ee1ad20f0063bb88b3bffd9cfd0ceddeca5ef14d9fb7e1b033fac4d8e61bc140 58 minutes ago Running kube-apis
erver 0 558ca25d06f43
[root@xxia1028bm-qdtfs-control-plane-0 ~]# crictl logs -f 7985fb1b750fd &> 7985fb1b750fd_kube-apiserver.log &
[root@xxia1028bm-qdtfs-control-plane-0 ~]# crictl stop 7985fb1b750fd # make kube-apiserver shutdown
[root@xxia1028bm-qdtfs-control-plane-0 ~]# vi 7985fb1b750fd_kube-apiserver.log # reproduced 'failed to create event ... Invalid value: "": needs to be unset'
...
I1028 03:49:13.236470 1 main.go:176] Received signal terminated. Forwarding to sub-process "hyperkube".
I1028 03:49:13.237278 18 controller.go:181] Shutting down kubernetes service endpoint reconciler
I1028 03:49:13.237289 18 genericapiserver.go:667] Event(v1.ObjectReference{Kind:"Pod", Namespace:"openshift-kube-apiserver", Name:"kube-apiserver-xxia1028bm-qdtfs-control-plane-0", UID:"", APIVersion:"v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'TerminationStart' Received signal to terminate, becoming unready, but keeping serving
W1028 03:49:13.237647 18 genericapiserver.go:670] failed to create event openshift-kube-apiserver/kube-apiserver-xxia1028bm-qdtfs-control-plane-0.16420b66c03dcb8f: Event "kube-apiserver-xxia1028bm-qdtfs-control-plane-0.16420b66c03dcb8f" is invalid: [eventTime: Required value, source: Invalid value: "": needs to be unset]
...
In latest 4.7.0-0.nightly-2020-10-27-051128 env, repeat above steps, the shutdown kube-apiserver container logs do not reproduce it:
...
I1028 02:57:06.183558 18 genericapiserver.go:667] Event(v1.ObjectReference{Kind:"Pod", Namespace:"openshift-kube-apiserver", Name:"kube-apiserver-ip-10-0-156-192.ap-northeast-1.compute.internal", UID:"", APIVersion:"v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'TerminationStart' Received signal to terminate, becoming unready, but keeping serving
I1028 02:57:06.184397 1 main.go:176] Received signal terminated. Forwarding to sub-process "hyperkube".
I1028 02:57:06.286774 1 main.go:198] Termination finished with exit code 1
I1028 02:57:06.286868 1 main.go:151] Deleting termination lock file "/var/log/kube-apiserver/.terminating"
...
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2020:5633 |
Seen in kube-apiserver logs (in a failed IPv6 bare metal run but I don't think that part is relevant): I0831 23:50:21.751294 1 secure_serving.go:241] Stopped listening on [::]:6443 I0831 23:50:21.751373 1 genericapiserver.go:665] Event(v1.ObjectReference{Kind:"Namespace", Namespace:"default", Name:"kube-system", UID:"", APIVersion:"v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'TerminationGracefulTerminationFinished' All pending requests processed W0831 23:50:21.751444 1 genericapiserver.go:668] failed to create event default/kube-system.16307f49968807b6: Event "kube-system.16307f49968807b6" is invalid: [eventTime: Required value, source: Invalid value: "": needs to be unset] dunno if this failure mode might affect other events too...