Bug 1468420

Summary:	LoadBalancerRR: Removing log spam
Product:	OpenShift Container Platform	Reporter:	Eric Paris <eparis>
Component:	Networking	Assignee:	Dan Williams <dcbw>
Status:	CLOSED ERRATA	QA Contact:	Meng Bo <bmeng>
Severity:	medium	Docs Contact:
Priority:	unspecified
Version:	3.6.0	CC:	aos-bugs, bbennett, danw, yadu
Target Milestone:	---
Target Release:	3.7.0
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:	Cause: We logged useless information, often, and at log level 2. Consequence: The logs fill with junk. Fix: Reduce the times we log to those when there are actual changes. Result: Far fewer messages, and only when they are relevant.	Story Points:	---
Clone Of:		Environment:
Last Closed:	2017-11-28 22:00:15 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Description Eric Paris 2017-07-07 03:16:45 UTC

Over 2 minutes I see log messages like:

LoadBalancerRR: Removing endpoints for ops-health-monitoring/pull-07062050z-ie:8080-tcp

11 times. This leaves me wondering what the message really means. Why does it need to be removed every 10 seconds? Shouldn't it be removed 1 time and be gone? Why do we keep printing this over and over?

If this is not actually a problem or telling me anything interesting I consider it spam and should be at a higher level than V(2). But I want to understand what it is actually telling me and fix any bug that it might be indicating before we just hide it.

Comment 2 Dan Williams 2017-07-07 16:01:17 UTC

(In reply to Eric Paris from comment #0)
> Over 2 minutes I see log messages like:
> 
> LoadBalancerRR: Removing endpoints for
> ops-health-monitoring/pull-07062050z-ie:8080-tcp
> 
> 11 times. This leaves me wondering what the message really means. Why does
> it need to be removed every 10 seconds? Shouldn't it be removed 1 time and
> be gone? Why do we keep printing this over and over?
> 
> If this is not actually a problem or telling me anything interesting I
> consider it spam and should be at a higher level than V(2). But I want to
> understand what it is actually telling me and fix any bug that it might be
> indicating before we just hide it.

It's not telling you anything interesting unless you want to debug the proxy.  We should hide it.  It likely gets printed every time there's an endpoints or service update.

Comment 3 Eric Paris 2017-07-07 20:09:44 UTC

If it is really normal to print that we are removing the SAME thing every 10 seconds that doesn't belong at V(2). Now if we actually add or remove a service that probably should be at V(2). Tagging for 3.7.0 to get it off any 3.6 block list mess.

Comment 4 Dan Williams 2017-07-07 20:49:55 UTC

Upstream PR: https://github.com/kubernetes/kubernetes/pull/48635

Note our Origin PR will be different, since upstream has changed.  But it's literally just:

diff --git a/vendor/k8s.io/kubernetes/pkg/proxy/userspace/roundrobin.go b/vendor/k8s.io/kubernetes/pkg/proxy/userspace/roundrobin.go
index 5d58c3c..f81635b 100644
--- a/vendor/k8s.io/kubernetes/pkg/proxy/userspace/roundrobin.go
+++ b/vendor/k8s.io/kubernetes/pkg/proxy/userspace/roundrobin.go
@@ -299,10 +299,12 @@ func (lb *LoadBalancerRR) OnEndpointsUpdate(allEndpoints []*api.Endpoints) {
        // Remove endpoints missing from the update.
        for k := range lb.services {
                if _, exists := registeredEndpoints[k]; !exists {
-                       glog.V(2).Infof("LoadBalancerRR: Removing endpoints for %s", k)
                        // Reset but don't delete.
                        state := lb.services[k]
-                       state.endpoints = []string{}
+                       if len(state.endpoints) > 0 {
+                               glog.V(2).Infof("LoadBalancerRR: Removing endpoints for %s", k)
+                               state.endpoints = []string{}
+                       }
                        state.index = 0
                        state.affinity.affinityMap = map[string]*affinityState{}
                }

Comment 5 Dan Williams 2017-07-12 15:29:15 UTC

Origin PR: https://github.com/openshift/origin/pull/15166

Comment 6 Meng Bo 2017-07-14 03:23:36 UTC

ose/cb1f3d352b1ed29291086221c06ae07c7f8bb5cf has been merged in to v3.6.144.

Move the bug to ON_QA for testing.

Comment 7 Yan Du 2017-07-18 03:29:30 UTC

Test on latest OCP env, there is no such spam log shown in node log.
openshift v3.6.152.0
kubernetes v1.6.1+5115d708d7

Comment 10 errata-xmlrpc 2017-11-28 22:00:15 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:3188