Bug 1928874 - Log spam etcd: rejected connection from "LOCAL_IP:POD" (error "EOF", ServerName "")
Keywords:
Status: CLOSED DUPLICATE of bug 1946607
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Etcd
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: medium
Target Milestone: ---
Assignee: Sam Batschelet
QA Contact: ge liu
URL:
Whiteboard: LifecycleStale
Depends On:
Blocks:
 
Reported: 2021-02-15 18:13 UTC by Ryan Howe
Modified: 2023-09-15 01:01 UTC
CC: 5 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-05-07 14:43:52 UTC
Target Upstream Version:
Embargoed:




Links:
Red Hat Knowledge Base (Solution) 5801821 (last updated 2021-02-15 18:20:30 UTC)

Description Ryan Howe 2021-02-15 18:13:50 UTC
Description of problem:

etcd logs the following message every 5 seconds, with the connection originating from the node's local IP:

 embed: rejected connection from "192.168.0.14:37100" (error "EOF", ServerName "")

Version-Release number of selected component (if applicable):
4.6.13

How reproducible:
100%

Steps to Reproduce:
1. Observed in every 4.6 environment.

Actual results:
 Log spam only; no functional issue.

Expected results:
 No spam in logs. 

Additional info:

The issue is caused by the tcpSocket readiness probe on the etcd container in the pod:


    readinessProbe:
      failureThreshold: 3
      initialDelaySeconds: 3
      periodSeconds: 5
      successThreshold: 1
      tcpSocket:
        port: 2380
      timeoutSeconds: 5
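
For reference, a minimal Go sketch of what a tcpSocket probe effectively does on the wire, assuming the kubelet check simply dials and closes the connection (the IP and timeout below mirror the example values in this report and are illustrative only): a plain TCP connect with no TLS ClientHello, which etcd's TLS peer listener reports as "EOF" with an empty ServerName.

```go
// Sketch of a bare TCP "readiness" check against the etcd peer port.
// Assumption: kubelet's tcpSocket probe does essentially this — dial,
// then close without sending any bytes.
package main

import (
	"fmt"
	"net"
	"time"
)

func main() {
	// 2380 is the TLS peer port targeted by the readiness probe above.
	conn, err := net.DialTimeout("tcp", "10.0.88.152:2380", 5*time.Second)
	if err != nil {
		fmt.Println("probe failed:", err)
		return
	}
	// Closing immediately, before any TLS handshake, is what etcd logs as:
	//   embed: rejected connection from "<ip>:<port>" (error "EOF", ServerName "")
	conn.Close()
	fmt.Println("probe succeeded (TCP connect only)")
}
```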


The same log entry can be generated by running the following from the host:


 # curl telnet://10.0.88.152:2380
  
The log entry appears every 5 seconds, originating from the IP address of the host where the pod is running.


```
2021-02-14 15:46:57.761552 I | etcdserver/api/etcdhttp: /health OK (status code 200)
2021-02-14 15:47:00.316848 I | embed: rejected connection from "10.0.88.152:57670" (error "EOF", ServerName "")
2021-02-14 15:47:02.756808 I | etcdserver/api/etcdhttp: /health OK (status code 200)
2021-02-14 15:47:05.316775 I | embed: rejected connection from "10.0.88.152:57720" (error "EOF", ServerName "")
2021-02-14 15:47:07.735943 I | etcdserver/api/etcdhttp: /health OK (status code 200)
2021-02-14 15:47:10.316650 I | embed: rejected connection from "10.0.88.152:57782" (error "EOF", ServerName "")
2021-02-14 15:47:12.742267 I | etcdserver/api/etcdhttp: /health OK (status code 200)
2021-02-14 15:47:15.316686 I | embed: rejected connection from "10.0.88.152:57860" (error "EOF", ServerName "")
```

Comment 1 Michal Fojtik 2021-03-17 18:20:21 UTC
This bug hasn't had any activity in the last 30 days. Maybe the problem got resolved, was a duplicate of something else, or became less pressing for some reason - or maybe it's still relevant but just hasn't been looked at yet. As such, we're marking this bug as "LifecycleStale" and decreasing the severity/priority. If you have further information on the current state of the bug, please update it, otherwise this bug can be closed in about 7 days. The information can be, for example, that the problem still occurs, that you still want the feature, that more information is needed, or that the bug is (for whatever reason) no longer relevant. Additionally, you can add LifecycleFrozen into Keywords if you think this bug should never be marked as stale. Please consult with bug assignee before you do that.

Comment 2 Sam Batschelet 2021-03-17 18:43:57 UTC
Ryan, sorry this has lingered. This should be looked at: these are tcpSocket readiness probes failing TLS auth from the kubelet. While they are benign, they are bug food. I will talk with the node team about options.

https://github.com/openshift/cluster-etcd-operator/blob/master/bindata/etcd/pod.yaml#L165
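
One hypothetical alternative, sketched below (not necessarily the approach taken in bug 1946607; the certificate paths and the 2379 client port are assumptions for illustration only), would be a readiness check that completes a mutual-TLS handshake and queries the health endpoint, rather than doing a bare TCP connect that etcd can only log as a rejected connection.

```go
// Hypothetical mTLS readiness check — a sketch, not the shipped fix.
// Cert paths and the client port are illustrative assumptions.
package main

import (
	"crypto/tls"
	"crypto/x509"
	"fmt"
	"io"
	"net/http"
	"os"
	"time"
)

func main() {
	// Assumed certificate locations; a real pod would mount its own paths.
	cert, err := tls.LoadX509KeyPair("/etc/etcd/client.crt", "/etc/etcd/client.key")
	if err != nil {
		fmt.Println("load client cert:", err)
		os.Exit(1)
	}
	caPEM, err := os.ReadFile("/etc/etcd/ca.crt")
	if err != nil {
		fmt.Println("load CA:", err)
		os.Exit(1)
	}
	pool := x509.NewCertPool()
	pool.AppendCertsFromPEM(caPEM)

	client := &http.Client{
		Timeout: 5 * time.Second,
		Transport: &http.Transport{
			TLSClientConfig: &tls.Config{
				Certificates: []tls.Certificate{cert},
				RootCAs:      pool,
			},
		},
	}
	// Query the health endpoint over a full TLS handshake instead of
	// opening and dropping a raw TCP connection.
	resp, err := client.Get("https://10.0.88.152:2379/health")
	if err != nil {
		fmt.Println("health check failed:", err)
		os.Exit(1)
	}
	defer resp.Body.Close()
	body, _ := io.ReadAll(resp.Body)
	fmt.Printf("status=%d body=%s\n", resp.StatusCode, body)
	if resp.StatusCode != http.StatusOK {
		os.Exit(1)
	}
}
```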

Comment 3 Michal Fojtik 2021-03-17 19:20:27 UTC
The LifecycleStale keyword was removed because the needinfo? flag was reset and the bug got commented on recently.
The bug assignee was notified.

Comment 5 W. Trevor King 2021-04-06 21:58:01 UTC
4.5 has been in the maintenance phase since November [1], and this medium-severity, benign (per comment 2) issue doesn't rise to the level needed to qualify for a maintenance-phase backport [2].  4.6 has been in the maintenance phase since March [1], so the same applies to it.  Maybe this would get a backport to 4.7, once there's a fix.  But since it's just cosmetic, it might not be worth any backports.  If it's actually causing anyone trouble, it would be nice to hear more details about how.

[1]: https://access.redhat.com/support/policy/updates/openshift#dates
[2]: https://access.redhat.com/support/policy/updates/openshift#ocp4_phases

Comment 6 Michal Fojtik 2021-05-06 22:14:28 UTC
This bug hasn't had any activity in the last 30 days. Maybe the problem got resolved, was a duplicate of something else, or became less pressing for some reason - or maybe it's still relevant but just hasn't been looked at yet. As such, we're marking this bug as "LifecycleStale" and decreasing the severity/priority. If you have further information on the current state of the bug, please update it, otherwise this bug can be closed in about 7 days. The information can be, for example, that the problem still occurs, that you still want the feature, that more information is needed, or that the bug is (for whatever reason) no longer relevant. Additionally, you can add LifecycleFrozen into Keywords if you think this bug should never be marked as stale. Please consult with bug assignee before you do that.

Comment 7 Sam Batschelet 2021-05-07 14:43:52 UTC
> 4.5 has been in the maintenance phase since November [1], and this medium-severity, benign (per comment 2) issue doesn't rise to the level needed to qualify for a maintenance-phase backport [2].  4.6 has been in the maintenance phase since March [1], so the same applies to it.  Maybe this would get a backport to 4.7, once there's a fix.  But since it's just cosmetic, it might not be worth any backports.  If it's actually causing anyone trouble, it would be nice to hear more details about how.

One thing is that the probe is not really providing a clear ready signal, so I do believe that in cases like upgrade we expose ourselves to a window where we could inadvertently cause transient quorum loss during a static pod revision. The static pod controller waits for ready status before it schedules the installer pod for the next instance. If that timing is off, you can see how two etcd members could be down at the same time; see the sketch below.
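
A small sketch of the quorum arithmetic behind that concern (member counts are illustrative): with 3 members, quorum is 2, so even a brief window with 2 members down at once loses quorum.

```go
// Illustration of the raft quorum arithmetic for a 3-member etcd cluster.
package main

import "fmt"

// quorum returns the majority needed for a cluster of the given size.
func quorum(members int) int { return members/2 + 1 }

func main() {
	members := 3
	down := 2 // e.g. one member mid-revision plus one that was never actually ready
	up := members - down
	fmt.Printf("members=%d quorum=%d up=%d quorum held: %v\n",
		members, quorum(members), up, up >= quorum(members))
	// Output: members=3 quorum=2 up=1 quorum held: false
}
```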

*** This bug has been marked as a duplicate of bug 1946607 ***

Comment 9 Red Hat Bugzilla 2023-09-15 01:01:17 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days

