Description of problem:
The error messages in the Hawkular and Heapster logs when a namespace gets deleted aren't the greatest.
2017-10-26 16:30:48,888 WARN [org.hawkular.openshift.auth.org.hawkular.openshift.namespace.NamespaceListener] (default I/O-11) Error getting project metadata. Code 404: Not Found
2017-10-26 16:30:48,889 INFO [org.hawkular.openshift.auth.org.hawkular.openshift.namespace.NamespaceHandler] (default I/O-11) Could not determine a namespace id for namespace. Cannot process request. Returning an INTERNAL_SERVER_ERROR.
The 404 WARN is fine, but is there a better way to communicate a deleted/non-existent namespace than a 500?
Corresponding heapster log is:
E1026 16:30:49.334428 1 client.go:338] Hawkular returned status code 500, error message: Reply could not be parsed: invalid character '<' looking for beginning of value
Probably should not be an E message and not a parse error.
Version-Release number of selected component (if applicable): 3.7.0-0.176.0
How reproducible: Always when deleting namespaces while metrics are collected
Steps to Reproduce:
1. Delete namespaces/projects while metrics are collected and monitor the hawkular and heapster pod logs
This issue only happens if you are a cluster admin or reader and you are requesting metrics for a project which does not exist.
Hawkular Metrics see that OpenShift has authorised this user but then cannot find the corresponding project and throws the error message.
Its being tracked with https://issues.jboss.org/browse/HWKMETRICS-741 but its considered a low priority issue.
Actually, wait, the Heapster issue is something else. In this case Heapster is trying to write metrics to Hawkular Metrics for a project which no longer exists.
Is this something you have seen before? Or only after trying Micke test container?
I've seen it before with non-patched 3.7. Not sure about 3.6. I think it is definitely when heapster tries to write metrics for the deleted project the first time. It does not continue indefinitely. Maybe a timing window while project is in Terminating state. Low priority in any case.
The changes for HWMETRICS-741 went into OCP 3.9. Do you know if this is still a problem?
In OpenShift 3.10.0-0.53.0 I see a pair of messages in the heapster log for each deleted namespace:
W0529 21:23:05.204029 1 namespace_based_enricher.go:67] Namespace doesn't exist: clusterproject16
E0529 21:23:05.673293 1 client.go:254] Hawkular returned status code 404, error message: Reply could not be parsed: invalid character '<' looking for beginning of value
In the Hawkular log I just see:
2018-05-29 21:23:05,370 WARN [org.hawkular.openshift.auth.org.hawkular.openshift.namespace.NamespaceListener] (default I/O-8) Error getting project metadata. Code 404: Not Found
I no longer see the 500.
John, What do you think? All that will be done here? If so, go ahead and put this ON_QA. I think the parse error in the Heapster log is ugly, but not a big deal.
I agree that the parser error is ugly. Let's leave it open for now. There are some other tickets we have to tackle first, but the changes involved to clean this up some should be minor and low risk.
John, I will take this if you don't mind.
PR for 0.31.x which is used in OpenShift 3.11: https://github.com/hawkular/hawkular-metrics/pull/994
If needed, I can backport to older versions too...
Please help check if this bug could be verified.
On OCP 3.11.51 I still see the messages:
W1205 15:55:05.073712 1 namespace_based_enricher.go:67] Namespace doesn't exist: clusterproject7
E1205 15:55:05.268897 1 client.go:254] Hawkular returned status code 404, error message: Reply could not be parsed: invalid character '<' looking for beginning of value
The hawkular version in this build is:
Hawkular Metrics version: 0.31.0.Final-redhat-1+72a93bbbf5
@jmartisk Do you expect the fix to be in ^^ level of hawkular?
No, this is now merged in 0.31 branch but hasn't been part of any actual release yet, I suppose it will be in 0.31.1.Final. Not sure if MODIFIED is the right state for this as it seems to trigger some automatic transitions?! Should this be in POST?
I think MODIFIED is correct since the PR has merged. POST is usually when the PR has been created but not yet merged.
On OCP 3.11.59 I still see the messages:
E1221 05:17:35.775110 1 client.go:254] Hawkular returned status code 404, error message: Reply could not be parsed: invalid character '<' looking for beginning of value
W1221 05:18:35.073278 1 namespace_based_enricher.go:67] Namespace doesn't exist: p6
all metrics images version: v3.11.59-2
Hawkular Metrics version: 0.31.0.Final-redhat-1
Move back to MODIFIED
I suggest to remove this defect from errata if we don't want to fix it on time, since it is not a critical issue.
This is fixed in image brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/metrics-hawkular-metrics:v3.11.70 (Hawkular Metrics 0.31.1.Final-redhat-1)
Marking verified on hawkular v0.31.0.Final-redhat-1+72a93bbbf5 as installed from OpenShift 3.11.69
INFO 2019-01-18 14:42:06,591 [main] org.hawkular.metrics.schema.Installer:logVersion:110 - Hawkular Metrics Schema Installer v0.31.0.Final-redhat-1+72a93bbbf5
registry.reg-aws.openshift.com:443/openshift3/metrics-cassandra v3.11.69 7d57819ca4f9 44 hours ago 596 MB
registry.reg-aws.openshift.com:443/openshift3/metrics-heapster v3.11.69 486ca24c0e16 44 hours ago 284 MB
registry.reg-aws.openshift.com:443/openshift3/metrics-hawkular-metrics v3.11.69 d5689ada5ca7 44 hours ago 1.6 GB
After namespace deletion, heapster logs now contain a reasonable pair of messages:
W0118 16:08:35.062221 1 namespace_based_enricher.go:67] Namespace doesn't exist: clusterproject10
E0118 16:08:35.668171 1 client.go:254] Hawkular returned status code 404, error message: Could not determine a namespace id for namespace clusterproject10
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.