1468420 – LoadBalancerRR: Removing log spam

Bug 1468420 - LoadBalancerRR: Removing log spam

Summary: LoadBalancerRR: Removing log spam

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Networking
Sub Component:
Version:	3.6.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	medium
Target Milestone:	---
Target Release:	3.7.0
Assignee:	Dan Williams
QA Contact:	Meng Bo
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+	depends on / blocked

Reported:	2017-07-07 03:16 UTC by Eric Paris
Modified:	2017-11-28 22:00 UTC (History)
CC List:	4 users (show)
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:	Cause: We logged useless information, often, and at log level 2. Consequence: The logs fill with junk. Fix: Reduce the times we log to those when there are actual changes. Result: Far fewer messages, and only when they are relevant.
Clone Of:
Environment:
Last Closed:	2017-11-28 22:00:15 UTC
Target Upstream Version:
Embargoed:

Attachments	(Terms of Use)

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHSA-2017:3188	0	normal	SHIPPED_LIVE	Moderate: Red Hat OpenShift Container Platform 3.7 security, bug, and enhancement update	2017-11-29 02:34:54 UTC

Description Eric Paris 2017-07-07 03:16:45 UTC

Over 2 minutes I see log messages like:

LoadBalancerRR: Removing endpoints for ops-health-monitoring/pull-07062050z-ie:8080-tcp

11 times. This leaves me wondering what the message really means. Why does it need to be removed every 10 seconds? Shouldn't it be removed 1 time and be gone? Why do we keep printing this over and over?

If this is not actually a problem or telling me anything interesting I consider it spam and should be at a higher level than V(2). But I want to understand what it is actually telling me and fix any bug that it might be indicating before we just hide it.

Comment 2 Dan Williams 2017-07-07 16:01:17 UTC

(In reply to Eric Paris from comment #0)
> Over 2 minutes I see log messages like:
> 
> LoadBalancerRR: Removing endpoints for
> ops-health-monitoring/pull-07062050z-ie:8080-tcp
> 
> 11 times. This leaves me wondering what the message really means. Why does
> it need to be removed every 10 seconds? Shouldn't it be removed 1 time and
> be gone? Why do we keep printing this over and over?
> 
> If this is not actually a problem or telling me anything interesting I
> consider it spam and should be at a higher level than V(2). But I want to
> understand what it is actually telling me and fix any bug that it might be
> indicating before we just hide it.

It's not telling you anything interesting unless you want to debug the proxy.  We should hide it.  It likely gets printed every time there's an endpoints or service update.

Comment 3 Eric Paris 2017-07-07 20:09:44 UTC

If it is really normal to print that we are removing the SAME thing every 10 seconds that doesn't belong at V(2). Now if we actually add or remove a service that probably should be at V(2). Tagging for 3.7.0 to get it off any 3.6 block list mess.

Comment 4 Dan Williams 2017-07-07 20:49:55 UTC

Upstream PR: https://github.com/kubernetes/kubernetes/pull/48635

Note our Origin PR will be different, since upstream has changed.  But it's literally just:

diff --git a/vendor/k8s.io/kubernetes/pkg/proxy/userspace/roundrobin.go b/vendor/k8s.io/kubernetes/pkg/proxy/userspace/roundrobin.go
index 5d58c3c..f81635b 100644
--- a/vendor/k8s.io/kubernetes/pkg/proxy/userspace/roundrobin.go
+++ b/vendor/k8s.io/kubernetes/pkg/proxy/userspace/roundrobin.go
@@ -299,10 +299,12 @@ func (lb *LoadBalancerRR) OnEndpointsUpdate(allEndpoints []*api.Endpoints) {
        // Remove endpoints missing from the update.
        for k := range lb.services {
                if _, exists := registeredEndpoints[k]; !exists {
-                       glog.V(2).Infof("LoadBalancerRR: Removing endpoints for %s", k)
                        // Reset but don't delete.
                        state := lb.services[k]
-                       state.endpoints = []string{}
+                       if len(state.endpoints) > 0 {
+                               glog.V(2).Infof("LoadBalancerRR: Removing endpoints for %s", k)
+                               state.endpoints = []string{}
+                       }
                        state.index = 0
                        state.affinity.affinityMap = map[string]*affinityState{}
                }

Comment 5 Dan Williams 2017-07-12 15:29:15 UTC

Origin PR: https://github.com/openshift/origin/pull/15166

Comment 6 Meng Bo 2017-07-14 03:23:36 UTC

ose/cb1f3d352b1ed29291086221c06ae07c7f8bb5cf has been merged in to v3.6.144.

Move the bug to ON_QA for testing.

Comment 7 Yan Du 2017-07-18 03:29:30 UTC

Test on latest OCP env, there is no such spam log shown in node log.
openshift v3.6.152.0
kubernetes v1.6.1+5115d708d7

Comment 10 errata-xmlrpc 2017-11-28 22:00:15 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:3188

Note You need to log in before you can comment on or make changes to this bug.