Bug 1484095

Summary: REST request log spam is back in OCP 3.7
Product: OpenShift Container Platform
Reporter: Mike Fiedler <mifiedle>
Component: Node
Assignee: Avesh Agarwal <avagarwa>
Status: CLOSED ERRATA
QA Contact: Mike Fiedler <mifiedle>
Severity: medium
Docs Contact:
Priority: unspecified
Version: 3.7.0
CC: aos-bugs, eparis, erich, jokerman, mifiedle, mmccomas, sjenning, xtian
Target Milestone: ---
Target Release: 3.7.0
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Cloned to: 1531557, 1531558 (view as bug list)
Environment:
Last Closed: 2017-11-28 22:07:41 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1531557, 1531558

Description Mike Fiedler 2017-08-22 16:50:22 UTC
Description of problem:

This looks like a regression introduced with the 1.7 rebase. It is the same issue as described in https://bugzilla.redhat.com/show_bug.cgi?id=1414813.

At loglevel 2 the master logs fill up pretty quickly with REST request logging:

Aug 22 16:45:50 ip-172-31-49-195 atomic-openshift-master-api: I0822 16:45:50.821651   17961 wrap.go:42] GET /apis/rbac.authorization.k8s.io/v1beta1/clusterrolebindings?resourceVersion=2687&timeoutSeconds=488&watch=true: (8m8.000625503s) 200 [[openshift/v1.7.0+695f48a16f (linux/amd64) kubernetes/d2e5420] 172.31.47.62:48066]
Aug 22 16:45:50 ip-172-31-49-195 atomic-openshift-master-api: I0822 16:45:50.822431   17961 rest.go:349] Starting watch for /apis/rbac.authorization.k8s.io/v1beta1/clusterrolebindings, rv=2687 labels= fields= timeout=6m51s
Aug 22 16:45:52 ip-172-31-49-195 atomic-openshift-master-api: I0822 16:45:52.275308   17961 wrap.go:42] GET /api/v1/nodes/ip-172-31-10-189.us-west-2.compute.internal?resourceVersion=0: (766.418µs) 200 [[openshift/v1.7.0+695f48a16f (linux/amd64) kubernetes/d2e5420] 172.31.10.189:54774]
Aug 22 16:45:52 ip-172-31-49-195 atomic-openshift-master-api: I0822 16:45:52.285035   17961 wrap.go:42] PATCH /api/v1/nodes/ip-172-31-10-189.us-west-2.compute.internal/status: (2.746248ms) 200 [[openshift/v1.7.0+695f48a16f (linux/amd64) kubernetes/d2e5420] 172.31.10.189:54774]
Aug 22 16:45:52 ip-172-31-49-195 atomic-openshift-master-api: I0822 16:45:52.318163   17961 wrap.go:42] GET /api/v1/nodes/ip-172-31-8-167.us-west-2.compute.internal?resourceVersion=0: (727.992µs) 200 [[openshift/v1.7.0+695f48a16f (linux/amd64) kubernetes/d2e5420] 172.31.8.167:58272]
Aug 22 16:45:52 ip-172-31-49-195 atomic-openshift-master-api: I0822 16:45:52.328059   17961 wrap.go:42] PATCH /api/v1/nodes/ip-172-31-8-167.us-west-2.compute.internal/status: (2.93939ms) 200 [[openshift/v1.7.0+695f48a16f (linux/amd64) kubernetes/d2e5420] 172.31.8.167:58272]
Aug 22 16:45:52 ip-172-31-49-195 atomic-openshift-master-api: I0822 16:45:52.946031   17961 wrap.go:42] GET /api/v1/namespaces/kube-system/secrets/pod-garbage-collector-token-0djvb: (1.344436ms) 200 [[openshift/v1.7.0+695f48a16f (linux/amd64) kubernetes/d2e5420] 172.31.47.62:48066]
Aug 22 16:45:52 ip-172-31-49-195 atomic-openshift-master-api: I0822 16:45:52.947217   17961 wrap.go:42] GET /api/v1/namespaces/kube-system/serviceaccounts/pod-garbage-collector: (936.412µs) 200 [[openshift/v1.7.0+695f48a16f (linux/amd64) kubernetes/d2e5420] 172.31.47.62:48066]
Aug 22 16:45:52 ip-172-31-49-195 atomic-openshift-master-api: I0822 16:45:52.948753   17961 wrap.go:42] GET /api/v1/nodes: (4.554035ms) 200 [[openshift/v1.7.0+695f48a16f (linux/amd64) kubernetes/d2e5420/system:serviceaccount:kube-system:pod-garbage-collector] 172.31.47.62:48106]
Aug 22 16:45:53 ip-172-31-49-195 atomic-openshift-master-api: I0822 16:45:53.725343   17961 wrap.go:42] GET /api/v1/namespaces/kube-system/configmaps/openshift-master-controllers: (1.146946ms) 200 [[openshift/v1.7.0+695f48a16f (linux/amd64) kubernetes/d2e5420] 172.31.47.62:48096]
Aug 22 16:45:53 ip-172-31-49-195 atomic-openshift-master-api: I0822 16:45:53.727686   17961 wrap.go:42] PUT /api/v1/namespaces/kube-system/configmaps/openshift-master-controllers: (1.714487ms) 200 [[openshift/v1.7.0+695f48a16f (linux/amd64) kubernetes/d2e5420] 172.31.47.62:48096]
Aug 22 16:45:53 ip-172-31-49-195 atomic-openshift-master-api: I0822 16:45:53.986797   17961 wrap.go:42] GET /api/v1/namespaces/default: (1.089537ms) 200 [[openshift/v1.7.0+695f48a16f (linux/amd64) kubernetes/d2e5420] 172.31.47.62:48066]
Aug 22 16:45:53 ip-172-31-49-195 atomic-openshift-master-api: I0822 16:45:53.988311   17961 wrap.go:42] GET /api/v1/namespaces/default/services/kubernetes: (828.645µs) 200 [[openshift/v1.7.0+695f48a16f (linux/amd64) kubernetes/d2e5420] 172.31.47.62:48066]
Aug 22 16:45:54 ip-172-31-49-195 atomic-openshift-master-api: I0822 16:45:54.399458   17961 wrap.go:42] GET /api/v1/endpoints?resourceVersion=4318&timeoutSeconds=494&watch=true: (8m14.000595824s) 200 [[openshift/v1.7.0+695f48a16f (linux/amd64) kubernetes/d2e5420] 172.31.47.62:48086]
Aug 22 16:45:54 ip-172-31-49-195 atomic-openshift-master-api: I0822 16:45:54.400427   17961 rest.go:349] Starting watch for /api/v1/endpoints, rv=4318 labels= fields= timeout=8m22s
Aug 22 16:45:54 ip-172-31-49-195 atomic-openshift-master-api: I0822 16:45:54.796436   17961 wrap.go:42] GET /api/v1/nodes/ip-172-31-47-62.us-west-2.compute.internal?resourceVersion=0: (738.639µs) 200 [[openshift/v1.7.0+695f48a16f (linux/amd64) kubernetes/d2e5420] 172.31.47.62:48086]
Aug 22 16:45:54 ip-172-31-49-195 atomic-openshift-master-api: I0822 16:45:54.806339   17961 wrap.go:42] PATCH /api/v1/nodes/ip-172-31-47-62.us-west-2.compute.internal/status: (2.715475ms) 200 [[openshift/v1.7.0+695f48a16f (linux/amd64) kubernetes/d2e5420] 172.31.47.62:48086]
Aug 22 16:45:55 ip-172-31-49-195 atomic-openshift-master-api: I0822 16:45:55.162356   17961 wrap.go:42] GET /api/v1/resourcequotas?resourceVersion=2687&timeoutSeconds=320&watch=true: (5m20.00067913s) 200 [[openshift/v1.7.0+695f48a16f (linux/amd64) kubernetes/d2e5420] 172.31.47.62:48096]



Version-Release number of selected component (if applicable):

oc v3.7.0-0.104.0
kubernetes v1.7.0+695f48a16f
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://ip-172-31-47-62.us-west-2.compute.internal:8443
openshift v3.7.0-0.104.0
kubernetes v1.7.0+695f48a16f



How reproducible:  Always

Comment 1 Seth Jennings 2017-08-23 15:31:22 UTC
Avesh, this is a 3.7 regression of bz1414813.  Can you create a pick for origin master?

Also, we need to get the upstream PR merged so this doesn't happen on every rebase: https://github.com/kubernetes/kubernetes/pull/40933

Comment 2 Avesh Agarwal 2017-08-23 15:55:09 UTC
(In reply to Seth Jennings from comment #1)
> Avesh, this is a 3.7 regression of bz1414813.  Can you create a pick for
> origin master?

Ok I will create it soon.

> 
> Also, we need to get the upstream PR merged so this doesn't happen on every
> rebase https://github.com/kubernetes/kubernetes/pull/40933

Please put your weight behind the PR; any help supporting it is more than welcome.

Comment 5 Seth Jennings 2017-09-12 18:50:24 UTC
Mike, is it acceptable for you to decrease the overall loglevel to 1 and use --logspec to selectively increase the log level on a per-package basis?

Comment 6 Mike Fiedler 2017-09-12 20:12:47 UTC
For almost every problem we (QE) provide logs for, dev wants at least loglevel 2.  The types of problems we find in scalability and performance testing are often difficult or impossible to recreate, so the first-failure logs are critical to debugging.  I think having Online, customers, and QE run at loglevel 1 is going to result in a lot of frustration for support and development.

ping @eparis and @erich

Comment 7 Seth Jennings 2017-09-12 20:46:27 UTC
Another option here is to just carry the patch.  That PR is already open:

https://github.com/openshift/origin/pull/15934

Just need to agree to the carry and inform deads to let the PR through.

Both Erics, what do you think?

Comment 8 Avesh Agarwal 2017-09-12 21:00:22 UTC
(In reply to Mike Fiedler from comment #6)
> Almost every problem we (QE) provide logs for, dev wants at least loglevel
> 2.  The types of problems we find in scalability and performance testing are
> often difficult or impossible to recreate and the first failure logs are
> critical to debugging.   I think having online, customers and QE run at
> loglevel  1 is going to result in a lot of frustration for support and
> development.
> 
> ping @eparis and @erich

The plan is to keep the default loglevel at 2 and, using logspec, set only that file (the http request log) to level 1.

Comment 9 Avesh Agarwal 2017-09-12 21:03:37 UTC
So it'd look like this:

--loglevel=2 --logspec='httplog=1'
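
A minimal sketch of how those flags could be applied, assuming the master API service reads its OPTIONS from a sysconfig file as on a typical OCP 3.7 host (the unit name comes from the log excerpt above; the exact path and unit name may differ per install):

# Hypothetical sketch: pass the flags from comment 9 to the master API service.
# File assumed to be /etc/sysconfig/atomic-openshift-master-api
OPTIONS="--loglevel=2 --logspec=httplog=1"

# Restart the API server so the new flags take effect.
systemctl restart atomic-openshift-master-api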

Comment 10 Mike Fiedler 2017-09-13 00:38:47 UTC
+1 to comment 9 if it works

Comment 11 Eric Paris 2017-09-13 13:31:59 UTC
Online/RH managed clusters run at v2.

I can't speak for QE clusters...

Comment 12 Eric Paris 2017-09-14 16:01:18 UTC
I don't care if it is a CARRY. These calls need to be gone by default at V(2) in OpenShift.

Comment 13 Seth Jennings 2017-09-15 14:51:33 UTC
Moved to POST as the carry is already open:
https://github.com/openshift/origin/pull/15934

Eric just approved it.

Comment 15 Mike Fiedler 2017-09-27 12:28:47 UTC
Verified on 3.7.0-0.127.0.   Logs are usable again ;-)
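
A rough way to check the result, assuming journald and the unit name seen in the original log excerpt (adjust for the actual environment):

# Count REST request ("wrap.go") log lines from the master API over the last
# 10 minutes; after the fix this should stay near zero at the default loglevel 2.
journalctl -u atomic-openshift-master-api --since "10 minutes ago" | grep -c 'wrap.go'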

Comment 18 errata-xmlrpc 2017-11-28 22:07:41 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:3188