Bug 1506736

Summary: Misleading error messages in Heapster log when namespace deleted - Hawkular returns 500 to Heapster
Product: OpenShift Container Platform Reporter: Mike Fiedler <mifiedle>
Component: HawkularAssignee: Jan Martiska <jmartisk>
Status: CLOSED ERRATA QA Contact: Mike Fiedler <mifiedle>
Severity: low Docs Contact:
Priority: low    
Version: 3.7.0CC: aos-bugs, jmartisk, jsanda, juzhao, lmeyer, mifiedle, mwringe, rvargasp, wsun
Target Milestone: ---   
Target Release: 3.11.z   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2019-02-20 14:11:01 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Mike Fiedler 2017-10-26 17:01:03 UTC
Description of problem:

The error messages in the Hawkular and Heapster logs when a namespace gets deleted aren't the greatest.

Hawkular shows:

2017-10-26 16:30:48,888 WARN  [org.hawkular.openshift.auth.org.hawkular.openshift.namespace.NamespaceListener] (default I/O-11) Error getting project metadata. Code 404: Not Found
2017-10-26 16:30:48,889 INFO  [org.hawkular.openshift.auth.org.hawkular.openshift.namespace.NamespaceHandler] (default I/O-11) Could not determine a namespace id for namespace. Cannot process request. Returning an INTERNAL_SERVER_ERROR.

The 404 WARN is fine, but is there a better way to communicate a deleted/non-existent namespace than a 500?

Corresponding heapster log is:

E1026 16:30:49.334428       1 client.go:338] Hawkular returned status code 500, error message: Reply could not be parsed: invalid character '<' looking for beginning of value

Probably should not be an E message and not a parse error.


Version-Release number of selected component (if applicable): 3.7.0-0.176.0


How reproducible: Always when deleting namespaces while metrics are collected


Steps to Reproduce:
1.   Delete namespaces/projects while metrics are collected and monitor the hawkular and heapster pod logs
Expected results:


Additional info:

Comment 1 Matt Wringe 2017-10-26 18:32:22 UTC
This issue only happens if you are a cluster admin or reader and you are requesting metrics for a project which does not exist.

Hawkular Metrics see that OpenShift has authorised this user but then cannot find the corresponding project and throws the error message.

Its being tracked with https://issues.jboss.org/browse/HWKMETRICS-741 but its considered a low priority issue.

Comment 2 Matt Wringe 2017-10-26 18:37:06 UTC
Actually, wait, the Heapster issue is something else. In this case Heapster is trying to write metrics to Hawkular Metrics for a project which no longer exists.

Is this something you have seen before? Or only after trying Micke test container?

Comment 3 Mike Fiedler 2017-10-26 18:49:02 UTC
I've seen it before with non-patched 3.7.   Not sure about 3.6.   I think it is definitely when heapster tries to write metrics for the deleted project the first time.   It does not continue indefinitely.  Maybe a timing window while project is in Terminating state.  Low priority in any case.

Comment 4 John Sanda 2018-05-29 19:39:17 UTC
Mike,

The changes for HWMETRICS-741 went into OCP 3.9. Do you know if this is still a problem?

Comment 5 Mike Fiedler 2018-05-29 21:28:32 UTC
In OpenShift 3.10.0-0.53.0 I see a pair of messages in the heapster log for each deleted namespace:

W0529 21:23:05.204029       1 namespace_based_enricher.go:67] Namespace doesn't exist: clusterproject16                                                                                                                                                                                   
E0529 21:23:05.673293       1 client.go:254] Hawkular returned status code 404, error message: Reply could not be parsed: invalid character '<' looking for beginning of value 

In the Hawkular log I just see:

2018-05-29 21:23:05,370 WARN  [org.hawkular.openshift.auth.org.hawkular.openshift.namespace.NamespaceListener] (default I/O-8) Error getting project metadata. Code 404: Not Found

I no longer see the 500.

John, What do you think?  All that will be done here?  If so, go ahead and put this ON_QA.   I think the parse error in the Heapster log is ugly, but not a big deal.

Comment 6 John Sanda 2018-05-30 02:30:27 UTC
I agree that the parser error is ugly. Let's leave it open for now. There are some other tickets we have to tackle first, but the changes involved to clean this up some should be minor and low risk.

Comment 7 Jan Martiska 2018-11-06 15:03:07 UTC
John, I will take this if you don't mind.

Comment 8 Jan Martiska 2018-11-07 13:41:09 UTC
PR for 0.31.x which is used in OpenShift 3.11: https://github.com/hawkular/hawkular-metrics/pull/994 
If needed, I can backport to older versions too...

Comment 12 Wei Sun 2018-12-05 09:52:19 UTC
Hi Mike
Please help check if this bug could be verified.
Thanks!

Comment 13 Mike Fiedler 2018-12-05 15:58:19 UTC
On OCP 3.11.51 I still see the messages:


W1205 15:55:05.073712       1 namespace_based_enricher.go:67] Namespace doesn't exist: clusterproject7
E1205 15:55:05.268897       1 client.go:254] Hawkular returned status code 404, error message: Reply could not be parsed: invalid character '<' looking for beginning of value                                                                                                           

The hawkular version in this build is:

Hawkular Metrics version: 0.31.0.Final-redhat-1+72a93bbbf5

@jmartisk Do you expect the fix to be in ^^ level of hawkular?

Comment 15 Jan Martiska 2018-12-06 07:20:46 UTC
No, this is now merged in 0.31 branch but hasn't been part of any actual release yet, I suppose it will be in 0.31.1.Final. Not sure if MODIFIED is the right state for this as it seems to trigger some automatic transitions?! Should this be in POST?

Comment 16 Mike Fiedler 2018-12-06 12:52:47 UTC
I think MODIFIED is correct since the PR has merged.   POST is usually when the PR has been created but not yet merged.

Comment 18 Junqi Zhao 2018-12-21 05:36:26 UTC
On OCP 3.11.59 I still see the messages:
E1221 05:17:35.775110       1 client.go:254] Hawkular returned status code 404, error message: Reply could not be parsed: invalid character '<' looking for beginning of value
W1221 05:18:35.073278       1 namespace_based_enricher.go:67] Namespace doesn't exist: p6

all metrics images version: v3.11.59-2
Hawkular Metrics version: 0.31.0.Final-redhat-1

Move back to MODIFIED

I suggest to remove this defect from errata if we don't want to fix it on time, since it is not a critical issue.

Comment 20 Jan Martiska 2019-01-17 08:22:59 UTC
This is fixed in image brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/metrics-hawkular-metrics:v3.11.70 (Hawkular Metrics 0.31.1.Final-redhat-1)

Comment 21 Mike Fiedler 2019-01-18 16:14:49 UTC
Marking verified on hawkular v0.31.0.Final-redhat-1+72a93bbbf5 as installed from OpenShift 3.11.69

INFO  2019-01-18 14:42:06,591 [main] org.hawkular.metrics.schema.Installer:logVersion:110 - Hawkular Metrics Schema Installer v0.31.0.Final-redhat-1+72a93bbbf5

registry.reg-aws.openshift.com:443/openshift3/metrics-cassandra           v3.11.69            7d57819ca4f9        44 hours ago        596 MB
registry.reg-aws.openshift.com:443/openshift3/metrics-heapster            v3.11.69            486ca24c0e16        44 hours ago        284 MB
registry.reg-aws.openshift.com:443/openshift3/metrics-hawkular-metrics    v3.11.69            d5689ada5ca7        44 hours ago        1.6 GB


After namespace deletion, heapster logs now contain a reasonable pair of messages:

W0118 16:08:35.062221       1 namespace_based_enricher.go:67] Namespace doesn't exist: clusterproject10
E0118 16:08:35.668171       1 client.go:254] Hawkular returned status code 404, error message: Could not determine a namespace id for namespace clusterproject10

Comment 23 errata-xmlrpc 2019-02-20 14:11:01 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0326