Bug 1923976 - Confusing KAS/OAS/OauthAS spam: loopyWriter.run returning. connection error: desc = "transport is closing"
Summary: Confusing KAS/OAS/OauthAS spam: loopyWriter.run returning. connection error: ...
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Etcd
Version: 4.7
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: low
Target Milestone: ---
Assignee: Nobody
QA Contact: Sandeep
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-02-02 11:09 UTC by Xingxing Xia
Modified: 2023-09-15 01:32 UTC
CC List: 13 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-05-11 10:18:38 UTC
Target Upstream Version:
Embargoed:




Links:
Red Hat Knowledge Base (Solution) 5931541 (last updated 2021-04-08 00:38:46 UTC)

Description Xingxing Xia 2021-02-02 11:09:53 UTC
Description of problem:
KAS/OAS/OauthAS logs are spammed with: loopyWriter.run returning. connection error: desc = "transport is closing"
These messages do not seem to indicate any problem we care about, so they only confuse users.

Version-Release number of selected component (if applicable):
4.7.0-0.nightly-2021-01-31-031653

How reproducible:
*Always*
The spam can be seen in any environment at hand, so no must-gather is attached.

Steps to Reproduce:
1. $ oc logs --timestamps -n openshift-kube-apiserver -c kube-apiserver kube-apiserver-ip-10-143-1-199.ec2.internal > logs/kube-apiserver-ip-10-143-1-199.ec2.internal.log
$ oc logs --timestamps -n openshift-apiserver -c openshift-apiserver apiserver-6cc8bf5c8f-7gr7n > logs/apiserver-6cc8bf5c8f-7gr7n.log
$ oc logs --timestamps -n openshift-oauth-apiserver oauth-apiserver-6bd4fbd859-l2f8b > logs/oauth-apiserver-6bd4fbd859-l2f8b.log

2.
$ grep 'loopyWriter.run returning. connection error: desc = "transport is closing"' logs/kube-apiserver-ip-10-143-1-199.ec2.internal.log | wc -l
253
$ grep 'loopyWriter.run returning. connection error: desc = "transport is closing"' logs/apiserver-6cc8bf5c8f-7gr7n.log | wc -l
827
$ grep 'loopyWriter.run returning. connection error: desc = "transport is closing"' logs/oauth-apiserver-6bd4fbd859-l2f8b.log | wc -l
804

Take OAS for example:
$ grep 'loopyWriter.run returning. connection error: desc = "transport is closing"' logs/apiserver-6cc8bf5c8f-7gr7n.log | grep "^2021-02-02T09:50"
2021-02-02T09:50:02.522899544Z I0202 09:50:02.522863       1 controlbuf.go:508] transport: loopyWriter.run returning. connection error: desc = "transport is closing"
2021-02-02T09:50:05.128339529Z I0202 09:50:05.128313       1 controlbuf.go:508] transport: loopyWriter.run returning. connection error: desc = "transport is closing"
2021-02-02T09:50:13.314228703Z I0202 09:50:13.314204       1 controlbuf.go:508] transport: loopyWriter.run returning. connection error: desc = "transport is closing"
2021-02-02T09:50:35.511715474Z I0202 09:50:35.511680       1 controlbuf.go:508] transport: loopyWriter.run returning. connection error: desc = "transport is closing"
2021-02-02T09:50:45.454702141Z I0202 09:50:45.454672       1 controlbuf.go:508] transport: loopyWriter.run returning. connection error: desc = "transport is closing"
2021-02-02T09:50:47.907317981Z I0202 09:50:47.907288       1 controlbuf.go:508] transport: loopyWriter.run returning. connection error: desc = "transport is closing"

Actual results:
2. KAS/OAS/OauthAS spam these logs, which confuse users

Expected results:
2. No such confusing spam

Additional info:
Bug 1844448#c0 once showed such logs too, but they are no longer seen:
$ oc logs --timestamps -n openshift-etcd-operator etcd-operator-fb8cbf695-hlzf2 > logs/etcd-operator-fb8cbf695-hlzf2.log
$ grep 'loopyWriter.run.*connection error' logs/etcd-operator-fb8cbf695-hlzf2.log # none

They are also not seen in the etcd container logs.
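
For reference, a quick way to count the message across all kube-apiserver pods at once (just a sketch; it assumes the static pod names start with "kube-apiserver-" as above, and the namespace and grep pattern are taken from the steps in this report):
$ for p in $(oc get pods -n openshift-kube-apiserver -o name | grep '^pod/kube-apiserver-'); do
    echo -n "$p: "
    oc logs -n openshift-kube-apiserver -c kube-apiserver "$p" | grep -c 'loopyWriter.run returning. connection error: desc = "transport is closing"'
  done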

Comment 1 Stefan Schimanski 2021-02-02 11:19:02 UTC
This is https://github.com/grpc/grpc-go#the-rpc-failed-with-error-code--unavailable-desc--transport-is-closing, a behaviour of the grpc library used in the etcd client. Moving to etcd.
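
A quick way to confirm these lines are informational rather than warnings or errors (assuming the log was collected with --timestamps as in the reproduction steps, so the klog severity letter starts the second field):
$ grep 'loopyWriter.run returning' logs/apiserver-6cc8bf5c8f-7gr7n.log | awk '{print substr($2,1,1)}' | sort | uniq -c
All hits should show up under "I" (info).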

Comment 3 Michal Fojtik 2021-03-07 20:26:55 UTC
This bug hasn't had any activity in the last 30 days. Maybe the problem got resolved, was a duplicate of something else, or became less pressing for some reason - or maybe it's still relevant but just hasn't been looked at yet. As such, we're marking this bug as "LifecycleStale" and decreasing the severity/priority. If you have further information on the current state of the bug, please update it, otherwise this bug can be closed in about 7 days. The information can be, for example, that the problem still occurs, that you still want the feature, that more information is needed, or that the bug is (for whatever reason) no longer relevant. Additionally, you can add LifecycleFrozen into Keywords if you think this bug should never be marked as stale. Please consult with bug assignee before you do that.

Comment 4 Fatima 2021-04-06 08:21:58 UTC
We have a customer facing this for kube-apiserver pods. One of the pods shows:

~~~
$ grep 'loopyWriter.run returning. connection error: desc = "transport is closing"' kas1.logs | wc -l
13583
~~~

where kas1.logs was collected with "oc logs <podname> -c kube-apiserver > kas1.logs" for one of the masters. This master is showing more than 60% CPU consumption, and this pod in particular is consuming more CPU than the others.

If any information is required, please let me know and I'll update the bug.
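
As a side note, one way to check whether the message rate lines up with the CPU spike is to bucket the occurrences per minute from the default klog timestamps (a sketch against the kas1.logs file described above, which was collected without --timestamps):
~~~
$ grep 'loopyWriter.run returning. connection error' kas1.logs | awk '{print $1, substr($2,1,5)}' | sort | uniq -c | sort -rn | head
~~~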

Comment 5 Fatima 2021-04-08 00:39:29 UTC
Dear team, 

Any updates here?

Thanks.

Comment 6 Fatima 2021-04-16 05:38:03 UTC
Dear team, 

Any updates here?

Thanks.

Comment 13 Sam Batschelet 2021-06-18 20:46:45 UTC
This is fixed in k8s 1.22 https://github.com/kubernetes/kubernetes/pull/102896

Comment 14 Sam Batschelet 2021-06-18 20:47:14 UTC
We will keep the BZ open until it lands.
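
For readers checking whether a given cluster already carries the Kubernetes 1.22 rebase (OpenShift 4.9 and later), something like the following should be enough:
$ oc version
$ oc get clusterversion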

Comment 18 Sandeep 2021-07-07 06:25:20 UTC
Steps followed:

1. Install cluster version 4.9.0-0.nightly-2021-07-06-025559
2. Check the apiserver logs; the message loopyWriter.run returning. connection error: desc = "transport is closing" is still seen.

The same issue is also seen in Server Version: 4.7.0-0.nightly-2021-07-06-050841
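
The retest check presumably boils down to the same grep as in the original report, e.g. (pod name is a placeholder):
$ oc logs -n openshift-apiserver -c openshift-apiserver apiserver-<pod-id> | grep -c 'loopyWriter.run returning. connection error: desc = "transport is closing"'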

Comment 20 Sam Batschelet 2021-07-07 09:55:47 UTC
The 1.22 rebase has not landed in OpenShift yet, so we can retest when that happens.

Comment 21 Mridul Markandey 2021-07-12 19:14:38 UTC
We have a customer facing the same issue: the kube-apiserver is consuming most of the CPU resources, and the kube-apiserver pod logs show this output:
~~~
ccResolverWrapper: sending update to cc: {[{https://localhost:2379  <nil> 0 <nil>}] <nil> <nil>}
2021-07-09T06:36:50.297990740Z I0709 06:36:50.297885      17 clientconn.go:948] ClientConn switching balancer to "pick_first"
2021-07-09T06:36:50.300944216Z I0709 06:36:50.298272      17 balancer_conn_wrappers.go:78] pickfirstBalancer: HandleSubConnStateChange: 0xc02705cd60, {CONNECTING <nil>}
2021-07-09T06:36:50.318796017Z I0709 06:36:50.313817      17 balancer_conn_wrappers.go:78] pickfirstBalancer: HandleSubConnStateChange: 0xc02705cd60, {READY <nil>}
2021-07-09T06:36:50.318796017Z I0709 06:36:50.315980      17 controlbuf.go:508] transport: loopyWriter.run returning. connection error: desc = "transport is closing"
2021-07-09T06:36:52.532112913Z W0709 06:36:52.531817      17 admission.go:98] ClusterQuotaAdmission received non-meta object *unstructured.Unstructured
2021-07-09T06:36:52.582434039Z W0709 06:36:52.582254      17 admission.go:98] ClusterQuotaAdmission received non-meta object *unstructured.Unstructured
2021-07-09T06:36:52.731282776Z W0709 06:36:52.731187      17 admission.go:98] ClusterQuotaAdmission received non-meta object *unstructured.Unstructured
2021-07-09T06:36:52.931948178Z W0709 06:36:52.931846      17 admission.go:98] ClusterQuotaAdmission received non-meta object *unstructured.Unstructured
2021-07-09T06:36:53.131199569Z W0709 06:36:53.131091      17 admission.go:98] ClusterQuotaAdmission received non-meta object *unstructured.Unstructured
2021-07-09T06:36:53.330927116Z W0709 06:36:53.330835      17 admission.go:98] ClusterQuotaAdmission received non-meta object *unstructured.Unstructured
2021-07-09T06:36:53.531801993Z W0709 06:36:53.531689      17 admission.go:98] ClusterQuotaAdmission received non-meta object *unstructured.Unstructured
~~~
Can you suggest a workaround for this? If more information is required, please let me know and I'll update the bug.

Comment 22 Mridul Markandey 2021-07-14 13:35:00 UTC
Hello Team, 

Any update on this? 

Thanks

Comment 23 Sam Batschelet 2021-07-14 13:47:10 UTC
There is no workaround for the etcd client balancer gRPC logs in the apiserver; we are going to improve this in 4.9.

>2021-07-09T06:36:50.297990740Z I0709 06:36:50.297885      17 clientconn.go:948] ClientConn switching balancer to "pick_first"
>2021-07-09T06:36:50.300944216Z I0709 06:36:50.298272      17 balancer_conn_wrappers.go:78] pickfirstBalancer: HandleSubConnStateChange: 0xc02705cd60, {CONNECTING <nil>}
>2021-07-09T06:36:50.318796017Z I0709 06:36:50.313817      17 balancer_conn_wrappers.go:78] pickfirstBalancer: HandleSubConnStateChange: 0xc02705cd60, {READY <nil>}
>2021-07-09T06:36:50.318796017Z I0709 06:36:50.315980      17 controlbuf.go:508] transport: loopyWriter.run returning. connection error: desc = "transport is closing"

This is not the root cause of the CPU usage; if you have issues around that, you should open a new bug against the apiserver.
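
Not a workaround for the logging itself, but when reading the logs the gRPC balancer noise quoted above can be filtered out with something like (pod name is a placeholder):
~~~
$ oc logs -n openshift-kube-apiserver -c kube-apiserver <podname> | grep -vE 'loopyWriter.run returning|ClientConn switching balancer|pickfirstBalancer: HandleSubConnStateChange|ccResolverWrapper: sending update'
~~~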

Comment 36 Red Hat Bugzilla 2023-09-15 01:32:36 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 365 days

