Bug 1484095 - REST request log spam is back in OCP 3.7
Summary: REST request log spam is back in OCP 3.7
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node
Version: 3.7.0
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 3.7.0
Assignee: Avesh Agarwal
QA Contact: Mike Fiedler
URL:
Whiteboard:
Depends On:
Blocks: 1531557 1531558
 
Reported: 2017-08-22 16:50 UTC by Mike Fiedler
Modified: 2018-01-05 13:54 UTC
CC List: 8 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1531557 1531558 (view as bug list)
Environment:
Last Closed: 2017-11-28 22:07:41 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2017:3188 0 normal SHIPPED_LIVE Moderate: Red Hat OpenShift Container Platform 3.7 security, bug, and enhancement update 2017-11-29 02:34:54 UTC

Description Mike Fiedler 2017-08-22 16:50:22 UTC
Description of problem:

This looks like a regression introduced with the Kubernetes 1.7 rebase. It is the same issue described in https://bugzilla.redhat.com/show_bug.cgi?id=1414813.

At loglevel 2 the master logs fill up pretty quickly with REST request logging:

Aug 22 16:45:50 ip-172-31-49-195 atomic-openshift-master-api: I0822 16:45:50.821651   17961 wrap.go:42] GET /apis/rbac.authorization.k8s.io/v1beta1/clusterrolebindings?resourceVersion=2687&timeoutSeconds=488&watch=true: (8m8.000625503s) 200 [[openshift/v1.7.0+695f48a16f (linux/amd64) kubernetes/d2e5420] 172.31.47.62:48066]
Aug 22 16:45:50 ip-172-31-49-195 atomic-openshift-master-api: I0822 16:45:50.822431   17961 rest.go:349] Starting watch for /apis/rbac.authorization.k8s.io/v1beta1/clusterrolebindings, rv=2687 labels= fields= timeout=6m51s
Aug 22 16:45:52 ip-172-31-49-195 atomic-openshift-master-api: I0822 16:45:52.275308   17961 wrap.go:42] GET /api/v1/nodes/ip-172-31-10-189.us-west-2.compute.internal?resourceVersion=0: (766.418µs) 200 [[openshift/v1.7.0+695f48a16f (linux/amd64) kubernetes/d2e5420] 172.31.10.189:54774]
Aug 22 16:45:52 ip-172-31-49-195 atomic-openshift-master-api: I0822 16:45:52.285035   17961 wrap.go:42] PATCH /api/v1/nodes/ip-172-31-10-189.us-west-2.compute.internal/status: (2.746248ms) 200 [[openshift/v1.7.0+695f48a16f (linux/amd64) kubernetes/d2e5420] 172.31.10.189:54774]
Aug 22 16:45:52 ip-172-31-49-195 atomic-openshift-master-api: I0822 16:45:52.318163   17961 wrap.go:42] GET /api/v1/nodes/ip-172-31-8-167.us-west-2.compute.internal?resourceVersion=0: (727.992µs) 200 [[openshift/v1.7.0+695f48a16f (linux/amd64) kubernetes/d2e5420] 172.31.8.167:58272]
Aug 22 16:45:52 ip-172-31-49-195 atomic-openshift-master-api: I0822 16:45:52.328059   17961 wrap.go:42] PATCH /api/v1/nodes/ip-172-31-8-167.us-west-2.compute.internal/status: (2.93939ms) 200 [[openshift/v1.7.0+695f48a16f (linux/amd64) kubernetes/d2e5420] 172.31.8.167:58272]
Aug 22 16:45:52 ip-172-31-49-195 atomic-openshift-master-api: I0822 16:45:52.946031   17961 wrap.go:42] GET /api/v1/namespaces/kube-system/secrets/pod-garbage-collector-token-0djvb: (1.344436ms) 200 [[openshift/v1.7.0+695f48a16f (linux/amd64) kubernetes/d2e5420] 172.31.47.62:48066]
Aug 22 16:45:52 ip-172-31-49-195 atomic-openshift-master-api: I0822 16:45:52.947217   17961 wrap.go:42] GET /api/v1/namespaces/kube-system/serviceaccounts/pod-garbage-collector: (936.412µs) 200 [[openshift/v1.7.0+695f48a16f (linux/amd64) kubernetes/d2e5420] 172.31.47.62:48066]
Aug 22 16:45:52 ip-172-31-49-195 atomic-openshift-master-api: I0822 16:45:52.948753   17961 wrap.go:42] GET /api/v1/nodes: (4.554035ms) 200 [[openshift/v1.7.0+695f48a16f (linux/amd64) kubernetes/d2e5420/system:serviceaccount:kube-system:pod-garbage-collector] 172.31.47.62:48106]
Aug 22 16:45:53 ip-172-31-49-195 atomic-openshift-master-api: I0822 16:45:53.725343   17961 wrap.go:42] GET /api/v1/namespaces/kube-system/configmaps/openshift-master-controllers: (1.146946ms) 200 [[openshift/v1.7.0+695f48a16f (linux/amd64) kubernetes/d2e5420] 172.31.47.62:48096]
Aug 22 16:45:53 ip-172-31-49-195 atomic-openshift-master-api: I0822 16:45:53.727686   17961 wrap.go:42] PUT /api/v1/namespaces/kube-system/configmaps/openshift-master-controllers: (1.714487ms) 200 [[openshift/v1.7.0+695f48a16f (linux/amd64) kubernetes/d2e5420] 172.31.47.62:48096]
Aug 22 16:45:53 ip-172-31-49-195 atomic-openshift-master-api: I0822 16:45:53.986797   17961 wrap.go:42] GET /api/v1/namespaces/default: (1.089537ms) 200 [[openshift/v1.7.0+695f48a16f (linux/amd64) kubernetes/d2e5420] 172.31.47.62:48066]
Aug 22 16:45:53 ip-172-31-49-195 atomic-openshift-master-api: I0822 16:45:53.988311   17961 wrap.go:42] GET /api/v1/namespaces/default/services/kubernetes: (828.645µs) 200 [[openshift/v1.7.0+695f48a16f (linux/amd64) kubernetes/d2e5420] 172.31.47.62:48066]
Aug 22 16:45:54 ip-172-31-49-195 atomic-openshift-master-api: I0822 16:45:54.399458   17961 wrap.go:42] GET /api/v1/endpoints?resourceVersion=4318&timeoutSeconds=494&watch=true: (8m14.000595824s) 200 [[openshift/v1.7.0+695f48a16f (linux/amd64) kubernetes/d2e5420] 172.31.47.62:48086]
Aug 22 16:45:54 ip-172-31-49-195 atomic-openshift-master-api: I0822 16:45:54.400427   17961 rest.go:349] Starting watch for /api/v1/endpoints, rv=4318 labels= fields= timeout=8m22s
Aug 22 16:45:54 ip-172-31-49-195 atomic-openshift-master-api: I0822 16:45:54.796436   17961 wrap.go:42] GET /api/v1/nodes/ip-172-31-47-62.us-west-2.compute.internal?resourceVersion=0: (738.639µs) 200 [[openshift/v1.7.0+695f48a16f (linux/amd64) kubernetes/d2e5420] 172.31.47.62:48086]
Aug 22 16:45:54 ip-172-31-49-195 atomic-openshift-master-api: I0822 16:45:54.806339   17961 wrap.go:42] PATCH /api/v1/nodes/ip-172-31-47-62.us-west-2.compute.internal/status: (2.715475ms) 200 [[openshift/v1.7.0+695f48a16f (linux/amd64) kubernetes/d2e5420] 172.31.47.62:48086]
Aug 22 16:45:55 ip-172-31-49-195 atomic-openshift-master-api: I0822 16:45:55.162356   17961 wrap.go:42] GET /api/v1/resourcequotas?resourceVersion=2687&timeoutSeconds=320&watch=true: (5m20.00067913s) 200 [[openshift/v1.7.0+695f48a16f (linux/amd64) kubernetes/d2e5420] 172.31.47.62:48096]



Version-Release number of selected component (if applicable):

oc v3.7.0-0.104.0
kubernetes v1.7.0+695f48a16f
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://ip-172-31-47-62.us-west-2.compute.internal:8443
openshift v3.7.0-0.104.0
kubernetes v1.7.0+695f48a16f



How reproducible:  Always
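
A rough way to gauge the volume of these entries on a master (the journald unit name is taken from the excerpt above; the time window is arbitrary):

journalctl -u atomic-openshift-master-api --since "5 min ago" | grep -c 'wrap.go'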

Comment 1 Seth Jennings 2017-08-23 15:31:22 UTC
Avesh, this is a 3.7 regression of bz1414813.  Can you create a pick for origin master?

Also, we need to get the upstream PR merged so this doesn't happen on every rebase: https://github.com/kubernetes/kubernetes/pull/40933

Comment 2 Avesh Agarwal 2017-08-23 15:55:09 UTC
(In reply to Seth Jennings from comment #1)
> Avesh, this is a 3.7 regression of bz1414813.  Can you create a pick for
> origin master?

Ok I will create it soon.

> 
> Also, we need to get the upstream PR merged so this doesn't happen on every
> rebase https://github.com/kubernetes/kubernetes/pull/40933

Please add your weight to the PR; any help supporting it is more than welcome.

Comment 5 Seth Jennings 2017-09-12 18:50:24 UTC
Mike, is it acceptable for you to decrease the overall loglevel to 1 and use --logspec to selectively increase the log level on a per-package basis?
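
For example, something along these lines (the pattern here is only a placeholder; logspec takes glog vmodule-style pattern=level pairs):

--loglevel=1 --logspec='<package-or-file-pattern>=2'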

Comment 6 Mike Fiedler 2017-09-12 20:12:47 UTC
For almost every problem we (QE) provide logs for, dev wants at least loglevel 2.  The types of problems we find in scalability and performance testing are often difficult or impossible to recreate, so the first-failure logs are critical to debugging.  I think having Online, customers, and QE run at loglevel 1 is going to result in a lot of frustration for support and development.

ping @eparis and @erich

Comment 7 Seth Jennings 2017-09-12 20:46:27 UTC
Another option here is to just carry the patch.  That PR is already open:

https://github.com/openshift/origin/pull/15934

Just need to agree to the carry and inform deads to let the PR through.

Both Erics, what do you think?

Comment 8 Avesh Agarwal 2017-09-12 21:00:22 UTC
(In reply to Mike Fiedler from comment #6)
> Almost every problem we (QE) provide logs for, dev wants at least loglevel
> 2.  The types of problems we find in scalability and performance testing are
> often difficult or impossible to recreate and the first failure logs are
> critical to debugging.   I think having online, customers and QE run at
> loglevel  1 is going to result in a lot of frustration for support and
> development.
> 
> ping @eparis and @erich

The plan is to keep the default loglevel at 2 and, using logspec, set only that file (the HTTP request log) to level 1.

Comment 9 Avesh Agarwal 2017-09-12 21:03:37 UTC
So it'd look like this:

--loglevel=2 --logspec='httplog=1'
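
On an RPM-based install those flags would presumably go into the master API service's options, roughly like this (the file path and variable name are assumptions, not taken from this bug):

# /etc/sysconfig/atomic-openshift-master-api
OPTIONS='--loglevel=2 --logspec=httplog=1'

followed by a restart of the atomic-openshift-master-api service.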

Comment 10 Mike Fiedler 2017-09-13 00:38:47 UTC
+1 to comment 9 if it works

Comment 11 Eric Paris 2017-09-13 13:31:59 UTC
Online/RH managed clusters run at v2.

I can't speak for QE clusters...

Comment 12 Eric Paris 2017-09-14 16:01:18 UTC
I don't care if it is a CARRY. These calls need to be gone by default at V(2) in OpenShift.

Comment 13 Seth Jennings 2017-09-15 14:51:33 UTC
Moved to POST as the carry is already open:
https://github.com/openshift/origin/pull/15934

Eric just approved it.

Comment 15 Mike Fiedler 2017-09-27 12:28:47 UTC
Verified on 3.7.0-0.127.0.   Logs are usable again ;-)

Comment 18 errata-xmlrpc 2017-11-28 22:07:41 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:3188

