Bug 1468420

Summary: LoadBalancerRR: Removing log spam
Product: OpenShift Container Platform Reporter: Eric Paris <eparis>
Component: NetworkingAssignee: Dan Williams <dcbw>
Status: CLOSED ERRATA QA Contact: Meng Bo <bmeng>
Severity: medium Docs Contact:
Priority: unspecified    
Version: 3.6.0CC: aos-bugs, bbennett, danw, yadu
Target Milestone: ---   
Target Release: 3.7.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Cause: We logged useless information, often, and at log level 2. Consequence: The logs fill with junk. Fix: Reduce the times we log to those when there are actual changes. Result: Far fewer messages, and only when they are relevant.
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-11-28 22:00:15 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:

Description Eric Paris 2017-07-07 03:16:45 UTC
Over 2 minutes I see log messages like:

LoadBalancerRR: Removing endpoints for ops-health-monitoring/pull-07062050z-ie:8080-tcp

11 times. This leaves me wondering what the message really means. Why does it need to be removed every 10 seconds? Shouldn't it be removed 1 time and be gone? Why do we keep printing this over and over?

If this is not actually a problem or telling me anything interesting I consider it spam and should be at a higher level than V(2). But I want to understand what it is actually telling me and fix any bug that it might be indicating before we just hide it.

Comment 2 Dan Williams 2017-07-07 16:01:17 UTC
(In reply to Eric Paris from comment #0)
> Over 2 minutes I see log messages like:
> 
> LoadBalancerRR: Removing endpoints for
> ops-health-monitoring/pull-07062050z-ie:8080-tcp
> 
> 11 times. This leaves me wondering what the message really means. Why does
> it need to be removed every 10 seconds? Shouldn't it be removed 1 time and
> be gone? Why do we keep printing this over and over?
> 
> If this is not actually a problem or telling me anything interesting I
> consider it spam and should be at a higher level than V(2). But I want to
> understand what it is actually telling me and fix any bug that it might be
> indicating before we just hide it.

It's not telling you anything interesting unless you want to debug the proxy.  We should hide it.  It likely gets printed every time there's an endpoints or service update.

Comment 3 Eric Paris 2017-07-07 20:09:44 UTC
If it is really normal to print that we are removing the SAME thing every 10 seconds that doesn't belong at V(2). Now if we actually add or remove a service that probably should be at V(2). Tagging for 3.7.0 to get it off any 3.6 block list mess.

Comment 4 Dan Williams 2017-07-07 20:49:55 UTC
Upstream PR: https://github.com/kubernetes/kubernetes/pull/48635

Note our Origin PR will be different, since upstream has changed.  But it's literally just:

diff --git a/vendor/k8s.io/kubernetes/pkg/proxy/userspace/roundrobin.go b/vendor/k8s.io/kubernetes/pkg/proxy/userspace/roundrobin.go
index 5d58c3c..f81635b 100644
--- a/vendor/k8s.io/kubernetes/pkg/proxy/userspace/roundrobin.go
+++ b/vendor/k8s.io/kubernetes/pkg/proxy/userspace/roundrobin.go
@@ -299,10 +299,12 @@ func (lb *LoadBalancerRR) OnEndpointsUpdate(allEndpoints []*api.Endpoints) {
        // Remove endpoints missing from the update.
        for k := range lb.services {
                if _, exists := registeredEndpoints[k]; !exists {
-                       glog.V(2).Infof("LoadBalancerRR: Removing endpoints for %s", k)
                        // Reset but don't delete.
                        state := lb.services[k]
-                       state.endpoints = []string{}
+                       if len(state.endpoints) > 0 {
+                               glog.V(2).Infof("LoadBalancerRR: Removing endpoints for %s", k)
+                               state.endpoints = []string{}
+                       }
                        state.index = 0
                        state.affinity.affinityMap = map[string]*affinityState{}
                }

Comment 5 Dan Williams 2017-07-12 15:29:15 UTC
Origin PR: https://github.com/openshift/origin/pull/15166

Comment 6 Meng Bo 2017-07-14 03:23:36 UTC
ose/cb1f3d352b1ed29291086221c06ae07c7f8bb5cf has been merged in to v3.6.144.

Move the bug to ON_QA for testing.

Comment 7 Yan Du 2017-07-18 03:29:30 UTC
Test on latest OCP env, there is no such spam log shown in node log.
openshift v3.6.152.0
kubernetes v1.6.1+5115d708d7

Comment 10 errata-xmlrpc 2017-11-28 22:00:15 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:3188