Bug 1903999 - Httplog response code is always zero [NEEDINFO]
Summary: Httplog response code is always zero
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: kube-apiserver
Version: 4.7
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: high
Target Milestone: ---
Target Release: 4.7.0
Assignee: Lukasz Szaszkiewicz
QA Contact: Ke Wang
URL:
Whiteboard: LifecycleReset
Depends On:
Blocks:
 
Reported: 2020-12-03 11:05 UTC by Stefan Schimanski
Modified: 2021-02-24 15:37 UTC (History)
CC List: 3 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Previously, httplog did not report the HTTP response code for requests on paths like /healthz or /readyz (used by the LB). As a result, it was impossible to check the status of such a request in the logs. Now the HTTP status code is properly reported in the logs, so we know what response was sent back to the LB.
Clone Of:
Environment:
Last Closed: 2021-02-24 15:37:28 UTC
Target Upstream Version:
Embargoed:
mfojtik: needinfo?


Attachments: none


Links:
- GitHub openshift/kubernetes pull 494 (closed): Bug 1903999: Httplog response code is always zero (last updated 2021-01-15 15:17:32 UTC)
- Red Hat Product Errata RHSA-2020:5633 (last updated 2021-02-24 15:37:55 UTC)

Description Stefan Schimanski 2020-12-03 11:05:32 UTC
Description of problem:

The HTTP response code reported by httplog is always zero, for example:

I1201 01:06:54.612229      19 httplog.go:94] "HTTP" verb="GET" URI="/api/v1/namespaces/openshift-monitoring/configmaps?allowWatchBookmarks=true&fieldSelector=metadata.name%3Dprometheus-k8s-rulefiles-0&resourceVersion=32579&timeout=5m10s&timeoutSeconds=310&watch=true" latency="26.42360465s" userAgent="kubelet/v1.19.2+ad738ba (linux/amd64) kubernetes/ad738ba" srcIP="10.0.202.49:25022" resp=0
I1201 00:50:09.052969      18 httplog.go:94] "HTTP" verb="GET" URI="/readyz" latency="9.85141ms" userAgent="ELB-HealthChecker/2.0" srcIP="10.0.56.74:18849" resp=0

This is a real problem because we need this value to easily prove that the LB health check gets a non-200 during termination.
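To illustrate where a zero can come from, here is a minimal Go sketch of a status-logging wrapper in the spirit of httplog (illustrative only, not the actual k8s.io/apiserver code): if the handler writes the body without an explicit WriteHeader call, net/http sends an implicit 200 that the wrapper never observes, so the recorded status stays at its zero value.

package main

import (
	"fmt"
	"log"
	"net/http"
)

// statusRecorder wraps http.ResponseWriter and remembers the status code it sees.
type statusRecorder struct {
	http.ResponseWriter
	status int // stays 0 unless WriteHeader goes through the wrapper
}

func (r *statusRecorder) WriteHeader(code int) {
	r.status = code
	r.ResponseWriter.WriteHeader(code)
}

// withHTTPLog logs the verb, URI and recorded status once the handler returns.
func withHTTPLog(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, req *http.Request) {
		rec := &statusRecorder{ResponseWriter: w}
		next.ServeHTTP(rec, req)
		// If the handler wrote the body without calling WriteHeader explicitly,
		// net/http sends an implicit 200 that bypasses the wrapper, and
		// rec.status is still 0 here, which is exactly what "resp=0" looks like.
		fmt.Printf("HTTP verb=%q URI=%q resp=%d\n", req.Method, req.RequestURI, rec.status)
	})
}

func main() {
	mux := http.NewServeMux()
	mux.HandleFunc("/readyz", func(w http.ResponseWriter, _ *http.Request) {
		w.Write([]byte("ok")) // no explicit WriteHeader: the log line shows resp=0
	})
	log.Fatal(http.ListenAndServe(":8080", withHTTPLog(mux)))
}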

Version-Release number of selected component (if applicable):

4.7 for sure; probably a bug we have had for a long time.

How reproducible:

Always. Check the audit termination logs, where httplog output is printed independently of the verbosity level.

Actual results:

resp=0

Expected results:

resp=200, resp=503, ...

Comment 1 Lukasz Szaszkiewicz 2020-12-03 11:27:44 UTC
Stefan noticed that we fail to set the response code only for /readyz and watch requests.
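For reference, the usual shape of a fix for this kind of gap (a hypothetical sketch, not the code from the linked PR) is to also intercept Write and fall back to recording 200 there, since net/http sends an implicit 200 on the first Write when WriteHeader was never called:

package recorder

import "net/http"

// statusRecorder remembers the response code written through it.
type statusRecorder struct {
	http.ResponseWriter
	status int
}

func (r *statusRecorder) WriteHeader(code int) {
	r.status = code
	r.ResponseWriter.WriteHeader(code)
}

func (r *statusRecorder) Write(b []byte) (int, error) {
	if r.status == 0 {
		// net/http sends an implicit 200 on the first Write without a prior
		// WriteHeader; mirror that so the log no longer reports resp=0.
		r.status = http.StatusOK
	}
	return r.ResponseWriter.Write(b)
}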

Comment 2 Michal Fojtik 2021-01-02 11:58:24 UTC
This bug hasn't had any activity in the last 30 days. Maybe the problem got resolved, was a duplicate of something else, or became less pressing for some reason - or maybe it's still relevant but just hasn't been looked at yet. As such, we're marking this bug as "LifecycleStale" and decreasing the severity/priority. If you have further information on the current state of the bug, please update it, otherwise this bug can be closed in about 7 days. The information can be, for example, that the problem still occurs, that you still want the feature, that more information is needed, or that the bug is (for whatever reason) no longer relevant. Additionally, you can add LifecycleFrozen into Keywords if you think this bug should never be marked as stale. Please consult with bug assignee before you do that.

Comment 3 Lukasz Szaszkiewicz 2021-01-11 11:15:27 UTC
A PR is open and ready for review.

Comment 4 Michal Fojtik 2021-01-11 11:38:56 UTC
The LifecycleStale keyword was removed because the bug got commented on recently.
The bug assignee was notified.

Comment 6 Ke Wang 2021-01-18 07:50:03 UTC
$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.7.0-0.nightly-2021-01-17-211555   True        False         57m     Cluster version is 4.7.0-0.nightly-2021-01-17-211555

$ oc debug node/<master node>
...
sh-4.4# cd /var/log/kube-apiserver
sh-4.4# grep 'httplog.go.*readyz.*resp=500' termination.log
...
I0118 06:37:30.966346      18 httplog.go:94] "HTTP" verb="GET" URI="/readyz" latency="2.843191ms" userAgent="ELB-HealthChecker/2.0" srcIP="10.0.52.66:52533" resp=500
I0118 06:37:31.287344      18 httplog.go:94] "HTTP" verb="GET" URI="/readyz" latency="3.58769ms" userAgent="ELB-HealthChecker/2.0" srcIP="10.0.188.240:18384" resp=500
I0118 06:37:31.376421      18 httplog.go:94] "HTTP" verb="GET" URI="/readyz" latency="2.861812ms" userAgent="ELB-HealthChecker/2.0" srcIP="10.0.188.240:41642" resp=500
I0118 06:37:31.692545      18 httplog.go:94] "HTTP" verb="GET" URI="/readyz" latency="3.182211ms" userAgent="ELB-HealthChecker/2.0" srcIP="10.0.146.166:50504" resp=500
...

As the PR notes, for health checks on paths like /readyz an HTTP 500 is considered a normal response that signals the LB to stop sending traffic to that path. The response code is now reported correctly, so moving the bug to VERIFIED.
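For anyone reading this later, the behaviour verified above follows this pattern (a minimal sketch, not kube-apiserver's actual /readyz wiring): once graceful termination starts, the readiness endpoint returns 500 so the LB health checker stops routing new traffic to that instance.

package main

import (
	"log"
	"net/http"
	"sync/atomic"
)

// shuttingDown is flipped to true when graceful termination starts
// (e.g. from a SIGTERM handler, omitted here for brevity).
var shuttingDown atomic.Bool

func readyz(w http.ResponseWriter, _ *http.Request) {
	if shuttingDown.Load() {
		// Shows up as resp=500 in httplog; this is the expected signal
		// telling the LB to stop routing traffic during termination.
		http.Error(w, "shutting down", http.StatusInternalServerError)
		return
	}
	w.Write([]byte("ok"))
}

func main() {
	http.HandleFunc("/readyz", readyz)
	log.Fatal(http.ListenAndServe(":8080", nil))
}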

Comment 9 errata-xmlrpc 2021-02-24 15:37:28 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633

