Bug 1468420
Summary: | LoadBalancerRR: Removing log spam | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Eric Paris <eparis> |
Component: | Networking | Assignee: | Dan Williams <dcbw> |
Status: | CLOSED ERRATA | QA Contact: | Meng Bo <bmeng> |
Severity: | medium | Docs Contact: | |
Priority: | unspecified | ||
Version: | 3.6.0 | CC: | aos-bugs, bbennett, danw, yadu |
Target Milestone: | --- | ||
Target Release: | 3.7.0 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | | Doc Type: | Bug Fix
Doc Text: | Cause: We logged useless information, often, and at log level 2. Consequence: The logs fill with junk. Fix: Reduce the times we log to those when there are actual changes. Result: Far fewer messages, and only when they are relevant. | |
Story Points: | --- |
Clone Of: | | Environment: |
Last Closed: | 2017-11-28 22:00:15 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description

Eric Paris 2017-07-07 03:16:45 UTC

(In reply to Eric Paris from comment #0)
> Over 2 minutes I see log messages like:
>
>     LoadBalancerRR: Removing endpoints for ops-health-monitoring/pull-07062050z-ie:8080-tcp
>
> 11 times. This leaves me wondering what the message really means. Why does
> it need to be removed every 10 seconds? Shouldn't it be removed once and be
> gone? Why do we keep printing this over and over?
>
> If this is not actually a problem or telling me anything interesting, I
> consider it spam and it should be at a higher level than V(2). But I want to
> understand what it is actually telling me, and fix any bug that it might be
> indicating, before we just hide it.

It's not telling you anything interesting unless you want to debug the proxy. We should hide it. It likely gets printed every time there is an endpoints or service update.

If it is really normal to print that we are removing the SAME thing every 10 seconds, that doesn't belong at V(2). Now, if we actually add or remove a service, that probably should be at V(2).

Tagging for 3.7.0 to get it off any 3.6 block list mess.

Upstream PR: https://github.com/kubernetes/kubernetes/pull/48635

Note our Origin PR will be different, since upstream has changed. But it's literally just:

```diff
diff --git a/vendor/k8s.io/kubernetes/pkg/proxy/userspace/roundrobin.go b/vendor/k8s.io/kubernetes/pkg/proxy/userspace/roundrobin.go
index 5d58c3c..f81635b 100644
--- a/vendor/k8s.io/kubernetes/pkg/proxy/userspace/roundrobin.go
+++ b/vendor/k8s.io/kubernetes/pkg/proxy/userspace/roundrobin.go
@@ -299,10 +299,12 @@ func (lb *LoadBalancerRR) OnEndpointsUpdate(allEndpoints []*api.Endpoints) {
 	// Remove endpoints missing from the update.
 	for k := range lb.services {
 		if _, exists := registeredEndpoints[k]; !exists {
-			glog.V(2).Infof("LoadBalancerRR: Removing endpoints for %s", k)
 			// Reset but don't delete.
 			state := lb.services[k]
-			state.endpoints = []string{}
+			if len(state.endpoints) > 0 {
+				glog.V(2).Infof("LoadBalancerRR: Removing endpoints for %s", k)
+				state.endpoints = []string{}
+			}
 			state.index = 0
 			state.affinity.affinityMap = map[string]*affinityState{}
 		}
```

ose/cb1f3d352b1ed29291086221c06ae07c7f8bb5cf has been merged into v3.6.144. Moving the bug to ON_QA for testing.

Tested on the latest OCP env; no such spam log is shown in the node log.

openshift v3.6.152.0
kubernetes v1.6.1+5115d708d7

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:3188
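The effect of the guard in the patch above, log only when the endpoints list actually transitions from non-empty to empty, can be sketched as a small standalone Go program. The names `balancerState`, `resetEndpoints`, and `demo` are illustrative stand-ins, not the upstream kube-proxy types:

```go
package main

import "fmt"

// balancerState is a simplified stand-in for the proxy's per-service state
// (the real type lives in pkg/proxy/userspace/roundrobin.go).
type balancerState struct {
	endpoints []string
}

// resetEndpoints clears a stale service's endpoints, logging only when
// something actually changes. Repeated calls on an already-empty state
// are silent, which is the behavior the fix introduces.
func resetEndpoints(name string, state *balancerState, logf func(string, ...interface{})) {
	if len(state.endpoints) > 0 {
		logf("LoadBalancerRR: Removing endpoints for %s", name)
		state.endpoints = nil
	}
}

// demo simulates three consecutive sync loops for a service that has been
// deleted, and returns how many log lines were emitted.
func demo() int {
	var logged []string
	logf := func(format string, args ...interface{}) {
		logged = append(logged, fmt.Sprintf(format, args...))
	}
	state := &balancerState{endpoints: []string{"10.0.0.1:8080"}}
	for i := 0; i < 3; i++ {
		resetEndpoints("ops-health-monitoring/demo", state, logf)
	}
	return len(logged)
}

func main() {
	// With the guard, only the first sync logs; before the fix this would
	// have printed on every sync, roughly every 10 seconds.
	fmt.Println("messages logged:", demo())
}
```

Placing the log call inside the guard ties the message to the actual state transition, so the periodic no-op syncs that triggered the spam stay quiet.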