Bug 1874583 - apiserver tries and fails to log an event when shutting down
Summary: apiserver tries and fails to log an event when shutting down
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: kube-apiserver
Version: 4.6
Hardware: Unspecified
OS: Unspecified
medium
medium
Target Milestone: ---
: 4.7.0
Assignee: Stefan Schimanski
QA Contact: Xingxing Xia
URL:
Whiteboard:
Depends On:
Blocks: 1885619
TreeView+ depends on / blocked
 
Reported: 2020-09-01 16:57 UTC by Dan Winship
Modified: 2021-02-24 15:17 UTC (History)
6 users (show)

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-02-24 15:17:02 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Github openshift kubernetes pull 336 0 None closed Bug 1874583: fix kube-apiserver termination event(s) validation failures 2021-01-14 14:31:19 UTC
Red Hat Product Errata RHSA-2020:5633 0 None None None 2021-02-24 15:17:26 UTC

Description Dan Winship 2020-09-01 16:57:06 UTC
Seen in kube-apiserver logs (in a failed IPv6 bare metal run but I don't think that part is relevant):

I0831 23:50:21.751294       1 secure_serving.go:241] Stopped listening on [::]:6443
I0831 23:50:21.751373       1 genericapiserver.go:665] Event(v1.ObjectReference{Kind:"Namespace", Namespace:"default", Name:"kube-system", UID:"", APIVersion:"v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'TerminationGracefulTerminationFinished' All pending requests processed
W0831 23:50:21.751444       1 genericapiserver.go:668] failed to create event default/kube-system.16307f49968807b6: Event "kube-system.16307f49968807b6" is invalid: [eventTime: Required value, source: Invalid value: "": needs to be unset]

dunno if this failure mode might affect other events too...

Comment 1 Venkata Siva Teja Areti 2020-09-09 17:46:47 UTC
Event validation logic changed recently in upstream

https://github.com/kubernetes/kubernetes/commit/75612c17461fe81fb969768e12ca52bfb22f745d#diff-b08aadf812d8dba261a29b585894d1e0

For the events we are creating, validation function should bail out at this point. https://github.com/kubernetes/kubernetes/commit/75612c17461fe81fb969768e12ca52bfb22f745d#diff-b08aadf812d8dba261a29b585894d1e0R46

But as group and version are not set for this request, it slips into the new validation logic and fails here https://github.com/kubernetes/kubernetes/commit/75612c17461fe81fb969768e12ca52bfb22f745d#diff-b08aadf812d8dba261a29b585894d1e0R53

But even after fixing the validation, I still see "grpc error: code = Unavailable desc = transport is closing" errors.

Changes I tried are here.

https://github.com/openshift/kubernetes/pull/336/files

I was trying to see the events in a integration test but instead found grpc errors after the above fix.

Comment 8 Xingxing Xia 2020-10-28 06:31:48 UTC
First, checked the not-fixed 4.6.1 env, can reproduce the issue:
ssh to master, check:
[root@xxia1028bm-qdtfs-control-plane-0 ~]# crictl ps --name='kube-apiserver$'                                                           
CONTAINER           IMAGE                                                              CREATED             STATE               NAME     
           ATTEMPT             POD ID                                                                                                  
7985fb1b750fd       ee1ad20f0063bb88b3bffd9cfd0ceddeca5ef14d9fb7e1b033fac4d8e61bc140   58 minutes ago      Running             kube-apis
erver      0                   558ca25d06f43

[root@xxia1028bm-qdtfs-control-plane-0 ~]# crictl logs -f 7985fb1b750fd &> 7985fb1b750fd_kube-apiserver.log &

[root@xxia1028bm-qdtfs-control-plane-0 ~]# crictl stop 7985fb1b750fd # make kube-apiserver shutdown
[root@xxia1028bm-qdtfs-control-plane-0 ~]# vi 7985fb1b750fd_kube-apiserver.log # reproduced 'failed to create event ... Invalid value: "": needs to be unset'
...
I1028 03:49:13.236470       1 main.go:176] Received signal terminated. Forwarding to sub-process "hyperkube".
I1028 03:49:13.237278      18 controller.go:181] Shutting down kubernetes service endpoint reconciler
I1028 03:49:13.237289      18 genericapiserver.go:667] Event(v1.ObjectReference{Kind:"Pod", Namespace:"openshift-kube-apiserver", Name:"kube-apiserver-xxia1028bm-qdtfs-control-plane-0", UID:"", APIVersion:"v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'TerminationStart' Received signal to terminate, becoming unready, but keeping serving
W1028 03:49:13.237647      18 genericapiserver.go:670] failed to create event openshift-kube-apiserver/kube-apiserver-xxia1028bm-qdtfs-control-plane-0.16420b66c03dcb8f: Event "kube-apiserver-xxia1028bm-qdtfs-control-plane-0.16420b66c03dcb8f" is invalid: [eventTime: Required value, source: Invalid value: "": needs to be unset]
...

In latest 4.7.0-0.nightly-2020-10-27-051128 env, repeat above steps, the shutdown kube-apiserver container logs do not reproduce it:
...
I1028 02:57:06.183558      18 genericapiserver.go:667] Event(v1.ObjectReference{Kind:"Pod", Namespace:"openshift-kube-apiserver", Name:"kube-apiserver-ip-10-0-156-192.ap-northeast-1.compute.internal", UID:"", APIVersion:"v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'TerminationStart' Received signal to terminate, becoming unready, but keeping serving
I1028 02:57:06.184397       1 main.go:176] Received signal terminated. Forwarding to sub-process "hyperkube".
I1028 02:57:06.286774       1 main.go:198] Termination finished with exit code 1
I1028 02:57:06.286868       1 main.go:151] Deleting termination lock file "/var/log/kube-apiserver/.terminating"
...

Comment 12 errata-xmlrpc 2021-02-24 15:17:02 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633


Note You need to log in before you can comment on or make changes to this bug.