Bug 1465220 - NPE for DropWizardReporter every minute in hawkular-metrics logs
Summary: NPE for DropWizardReporter every minute in hawkular-metrics logs
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Hawkular
Version: 3.6.0
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Matt Wringe
QA Contact: Junqi Zhao
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2017-06-27 02:13 UTC by Mike Fiedler
Modified: 2017-08-16 19:51 UTC

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-08-10 05:28:56 UTC
Target Upstream Version:
Embargoed:


Attachments
Hawkular-metrics pod log (53.02 KB, application/x-gzip)
2017-06-27 02:15 UTC, Mike Fiedler
Issue is fixed, hawkular metrics pod log (79.04 KB, text/plain)
2017-07-10 09:25 UTC, Junqi Zhao


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHEA-2017:1716 0 normal SHIPPED_LIVE Red Hat OpenShift Container Platform 3.6 RPM Release Advisory 2017-08-10 09:02:50 UTC

Description Mike Fiedler 2017-06-27 02:13:03 UTC
Description of problem:

Every minute I am seeing the following NPE in the hawkular-metrics log. If it is not in fact a real error, it still makes it difficult to debug other issues.

2017-06-27 02:05:09,237 ERROR [com.codahale.metrics.ScheduledReporter] (metrics-hawkular-metrics-reporter-1-thread-1) RuntimeException thrown from DropWizardReporter#report. Exception was suppressed.: java.lang.NullPointerException
        at org.hawkular.metrics.core.dropwizard.MetricNameService.createMetricName(MetricNameService.java:77)
        at org.hawkular.metrics.core.dropwizard.DropWizardReporter.getMetricId(DropWizardReporter.java:194)
        at org.hawkular.metrics.core.dropwizard.DropWizardReporter.lambda$report$2(DropWizardReporter.java:103)
        at java.util.Collections$UnmodifiableMap$UnmodifiableEntrySet.lambda$entryConsumer$0(Collections.java:1575)
        at java.lang.Iterable.forEach(Iterable.java:75)
        at java.util.Collections$UnmodifiableMap$UnmodifiableEntrySet.forEach(Collections.java:1580)
        at org.hawkular.metrics.core.dropwizard.DropWizardReporter.report(DropWizardReporter.java:102)
        at com.codahale.metrics.ScheduledReporter.report(ScheduledReporter.java:162)
        at com.codahale.metrics.ScheduledReporter$1.run(ScheduledReporter.java:117)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:748)
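
The "Exception was suppressed" wording and the once-a-minute repetition are consistent with a scheduled task that catches the RuntimeException from each run, logs it, and keeps its schedule. A minimal sketch of that pattern using only the JDK follows. The class and method names are hypothetical stand-ins, not Hawkular or Dropwizard code, and the null argument is simply an assumed way to force an NPE.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a scheduled reporter; not the actual reporter code.
class SuppressedReportSketch {

    // Assumed failure mode: dereferencing a null metric name.
    static void report(String metricName) {
        metricName.length(); // throws NullPointerException when metricName is null
    }

    // Schedules a failing report and returns how many failures were caught
    // and suppressed during the first `cycles` runs.
    static int runFailingReport(int cycles, long periodMillis) {
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        AtomicInteger suppressed = new AtomicInteger();
        CountDownLatch remaining = new CountDownLatch(cycles);
        executor.scheduleAtFixedRate(() -> {
            if (remaining.getCount() == 0) {
                return; // requested number of cycles already observed
            }
            try {
                report(null);
            } catch (RuntimeException e) {
                // Catching here keeps the schedule alive, so the error repeats every period.
                suppressed.incrementAndGet();
                System.err.println("RuntimeException thrown from report. Exception was suppressed.: " + e);
            } finally {
                remaining.countDown();
            }
        }, 0, periodMillis, TimeUnit.MILLISECONDS);
        try {
            remaining.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            executor.shutdownNow();
        }
        return suppressed.get();
    }

    public static void main(String[] args) {
        System.out.println("suppressed failures: " + runFailingReport(3, 100));
    }
}
```

Without the catch block, `scheduleAtFixedRate` would cancel all later runs after the first exception, so the log would show the error once rather than every minute.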



Version-Release number of selected component (if applicable): hawkular-metrics 3.6.122

registry.ops.openshift.com/openshift3/metrics-hawkular-metrics    v3.6.122            992323342eea 


How reproducible: Always


Steps to Reproduce:
1.  Deploy metrics (inventory below)
2.  oc logs <hawkular-metrics-pod>

Actual results:

Log is full of NPEs with the above stack
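
To put a number on this, the log saved from `oc logs` can be scanned for the error line and grouped by minute. The following is a minimal sketch, assuming the log has been written to a local file and that every entry starts with a `yyyy-MM-dd HH:mm:ss,SSS` timestamp as in the excerpt above. The class name and the file argument are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper for counting the suppressed-exception entries in a saved pod log.
class NpeLogCounter {

    // Marker text copied from the ERROR line in this report.
    static final String MARKER = "RuntimeException thrown from DropWizardReporter#report";

    // Groups matching lines by the "yyyy-MM-dd HH:mm" prefix of their timestamp.
    static Map<String, Integer> countPerMinute(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            if (line.length() >= 16 && line.contains(MARKER)) {
                counts.merge(line.substring(0, 16), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // args[0] is assumed to be the path of a file produced by: oc logs <hawkular-metrics-pod>
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        countPerMinute(lines).forEach((minute, n) -> System.out.println(minute + "  " + n));
    }
}
```

A clean log, as expected for normal operation, would produce no output.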


Expected results:

Clean logs for normal operation


Additional info:

[oo_first_master]
192.1.0.8

[oo_first_master:vars]
openshift_deployment_type=openshift-enterprise
openshift_release=v3.6.0

openshift_metrics_install_metrics=true
openshift_metrics_hawkular_hostname=hawkular-metrics.0615-yzo.qe.rhcloud.com
openshift_metrics_project=openshift-infra
openshift_metrics_image_prefix=registry.ops.openshift.com/openshift3/
openshift_metrics_image_version=v3.6.122
openshift_metrics_cassandra_replicas=1
openshift_metrics_hawkular_replicas=1
openshift_metrics_cassandra_storage_type=pv
openshift_metrics_cassandra_pvc_size=395Gi

Comment 1 Mike Fiedler 2017-06-27 02:14:08 UTC
I found upstream issue https://issues.jboss.org/browse/HWKMETRICS-577?_sscc=t for another NPE in DropWizardReporter but the stack (and trigger) are different.

Comment 2 Mike Fiedler 2017-06-27 02:15:08 UTC
Created attachment 1292141
Hawkular-metrics pod log

Comment 3 John Sanda 2017-06-28 14:40:39 UTC
Lowering priority and bumping release since this does not impact any functionality used in OpenShift, and it can be disabled.

Comment 4 John Sanda 2017-06-28 15:45:18 UTC
Retargeting for 3.6 because this does impact some new, needed functionality for monitoring the health of hawkular-metrics.
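
The report does not say how the NPE affects that functionality. One general Java behaviour that may be relevant, given the `forEach` frames in the stack trace, is that an exception thrown inside a `forEach` lambda aborts the iteration, so entries after the failing one are never processed. The sketch below illustrates only that behaviour. All names are hypothetical, the missing-prefix trigger is an assumption, and the per-entry guard is not the upstream fix.

```java
import java.util.List;
import java.util.Map;

// Hypothetical illustration of forEach aborting on the first exception.
class ForEachAbortSketch {

    // Assumed failure mode: no prefix registered for the key, so get() returns null.
    static String createName(Map<String, String> prefixes, String key) {
        return prefixes.get(key).concat("." + key); // NPE when the prefix is missing
    }

    // One bad entry aborts the loop; later entries are never written.
    static void reportAll(Map<String, Long> gauges, Map<String, String> prefixes, List<String> written) {
        gauges.forEach((key, value) -> written.add(createName(prefixes, key) + "=" + value));
    }

    // Per-entry guard: a bad entry is skipped and the rest are still written.
    // Returns the number of skipped entries.
    static int reportEach(Map<String, Long> gauges, Map<String, String> prefixes, List<String> written) {
        int skipped = 0;
        for (Map.Entry<String, Long> entry : gauges.entrySet()) {
            try {
                written.add(createName(prefixes, entry.getKey()) + "=" + entry.getValue());
            } catch (NullPointerException e) {
                skipped++;
            }
        }
        return skipped;
    }
}
```

With gauges `a`, `b`, `c` in insertion order and a prefix missing only for `b`, `reportAll` throws after writing `a`, while `reportEach` writes `a` and `c` and reports one skipped entry.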

Comment 5 John Sanda 2017-06-29 00:36:44 UTC
This is fixed upstream in https://issues.jboss.org/browse/HWKMETRICS-682 .

Comment 7 Mike Fiedler 2017-07-06 14:01:44 UTC
This issue still exists on hawkular-metrics 3.6.136.

Moving this back to ASSIGNED for mwringe until a new image is built with the upstream fix.

Comment 8 John Sanda 2017-07-06 14:36:38 UTC
(In reply to Mike Fiedler from comment #7)
> This issue still exists on hawkular-metrics 3.6.136.
> 
> Moving this back to ASSIGNED for mwringe until a new image is built with the
> upstream fix.

I am not sure if 3.6.136 has the necessary changes. The changes went into Hawkular Metrics 0.27.1, which was published to the JBoss Nexus repo on Monday afternoon.

Comment 11 Junqi Zhao 2017-07-10 09:18:07 UTC
Tested with the latest images (v3.6.140-1) using an NFS PV; did not find the NPE in the hawkular-cassandra pod logs.

@Mike, please see my inventory file below. We don't use [oo_first_master] now, and we usually deploy metrics with the following commands:

# cd /usr/share/ansible/openshift-ansible/
# ansible-playbook -vvv -i ${INVENTORY_FILE}   playbooks/byo/openshift-cluster/openshift-metrics.yml

**************************************************************************
Images from brew
metrics-hawkular-metrics   v3.6.140-1          3a5bebd0476a        2 hours ago         1.293 GB
metrics-cassandra          v3.6.140-1          9644ec21e399        2 hours ago         573.2 MB
metrics-heapster           v3.6.140-1          5549c67d8607        2 hours ago         274.4 MB

Inventory file:

[OSEv3:children]
masters

[masters]
${MASTER} openshift_public_hostname=${MASTER}

[OSEv3:vars]
ansible_ssh_user=root
ansible_ssh_private_key_file="~/libra.pem"
deployment_type=openshift-enterprise


# Metrics
openshift_metrics_install_metrics=true
openshift_metrics_hawkular_hostname=hawkular-metrics.${SUB_DOMAIN}
openshift_metrics_project=openshift-infra
openshift_metrics_image_prefix=${IMAGE_PREFIX}
openshift_metrics_image_version=v3.6
openshift_metrics_cassandra_replicas=1
openshift_metrics_hawkular_replicas=1
openshift_metrics_cassandra_storage_type=pv
openshift_metrics_cassandra_pvc_size=10Gi

Comment 12 Junqi Zhao 2017-07-10 09:25:12 UTC
Created attachment 1295746
Issue is fixed, hawkular metrics pod log

Comment 14 errata-xmlrpc 2017-08-10 05:28:56 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:1716

