Description of problem:
Every minute I am seeing the following NPE in the hawkular-metrics log. It makes it difficult to debug other issues if this is not in fact a real error.

2017-06-27 02:05:09,237 ERROR [com.codahale.metrics.ScheduledReporter] (metrics-hawkular-metrics-reporter-1-thread-1) RuntimeException thrown from DropWizardReporter#report. Exception was suppressed.: java.lang.NullPointerException
    at org.hawkular.metrics.core.dropwizard.MetricNameService.createMetricName(MetricNameService.java:77)
    at org.hawkular.metrics.core.dropwizard.DropWizardReporter.getMetricId(DropWizardReporter.java:194)
    at org.hawkular.metrics.core.dropwizard.DropWizardReporter.lambda$report$2(DropWizardReporter.java:103)
    at java.util.Collections$UnmodifiableMap$UnmodifiableEntrySet.lambda$entryConsumer$0(Collections.java:1575)
    at java.lang.Iterable.forEach(Iterable.java:75)
    at java.util.Collections$UnmodifiableMap$UnmodifiableEntrySet.forEach(Collections.java:1580)
    at org.hawkular.metrics.core.dropwizard.DropWizardReporter.report(DropWizardReporter.java:102)
    at com.codahale.metrics.ScheduledReporter.report(ScheduledReporter.java:162)
    at com.codahale.metrics.ScheduledReporter$1.run(ScheduledReporter.java:117)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:748)

Version-Release number of selected component (if applicable):
hawkular-metrics 3.6.122
registry.ops.openshift.com/openshift3/metrics-hawkular-metrics   v3.6.122   992323342eea

How reproducible:
Always

Steps to Reproduce:
1. Deploy metrics (inventory below)
2. oc logs <hawkular-metrics-pod>

Actual results:
Log is full of NPEs with the above stack

Expected results:
Clean logs for normal operation

Additional info:
[oo_first_master]
192.1.0.8

[oo_first_master:vars]
openshift_deployment_type=openshift-enterprise
openshift_release=v3.6.0
openshift_metrics_install_metrics=true
openshift_metrics_hawkular_hostname=hawkular-metrics.0615-yzo.qe.rhcloud.com
openshift_metrics_project=openshift-infra
openshift_metrics_image_prefix=registry.ops.openshift.com/openshift3/
openshift_metrics_image_version=v3.6.122
openshift_metrics_cassandra_replicas=1
openshift_metrics_hawkular_replicas=1
openshift_metrics_cassandra_storage_type=pv
openshift_metrics_cassandra_pvc_size=395Gi
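The stack above shows MetricNameService.createMetricName dereferencing something that is null on every scheduled report run. As a rough illustration only (class, field, and method names below are hypothetical, not the actual Hawkular Metrics code), the failure pattern and a defensive variant look like:

```java
import java.util.HashMap;
import java.util.Map;

public class ReporterNpeSketch {
    // Hypothetical: a host name that is never initialized, so it stays null
    static String hostname;

    // Mirrors the failing pattern: dereferences hostname without a null check
    static String createMetricName(String base) {
        return hostname.toLowerCase() + "." + base;   // NPE when hostname is null
    }

    // Defensive variant: fall back to a placeholder instead of failing
    static String createMetricNameSafe(String base) {
        String host = (hostname != null) ? hostname : "unknown-host";
        return host.toLowerCase() + "." + base;
    }

    public static void main(String[] args) {
        Map<String, Long> gauges = new HashMap<>();
        gauges.put("RawDataPoints", 42L);

        boolean threw = false;
        try {
            // Same shape as the reporter: iterate registered metrics and
            // build a name for each one
            gauges.forEach((name, v) -> createMetricName(name));
        } catch (NullPointerException e) {
            threw = true;   // the failure mode seen once per report interval
        }
        System.out.println("unguarded threw NPE: " + threw);          // true
        System.out.println(createMetricNameSafe("RawDataPoints"));    // unknown-host.RawDataPoints
    }
}
```

Because the reporter runs on a fixed schedule, an uninitialized value like this produces exactly the once-per-minute error cadence reported here.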
I found upstream issue https://issues.jboss.org/browse/HWKMETRICS-577?_sscc=t for another NPE in DropWizardReporter but the stack (and trigger) are different.
Created attachment 1292141 [details] Hawkular-metrics pod log
Lowering priority and bumping release since this does not impact any functionality used in OpenShift, and it can be disabled.
Retargeting for 3.6 because this does impact some new, needed functionality for monitoring the health of hawkular-metrics.
This is fixed upstream in https://issues.jboss.org/browse/HWKMETRICS-682 .
This issue still exists on hawkular-metrics 3.6.136. Moving this back to ASSIGNED for mwringe until a new image is built with the upstream fix.
(In reply to Mike Fiedler from comment #7)
> This issue still exists on hawkular-metrics 3.6.136.
>
> Moving this back to ASSIGNED for mwringe until a new image is build with the
> upstream fix.

I am not sure if 3.6.136 has the necessary changes. The changes went into Hawkular Metrics 0.27.1, which was published to the JBoss Nexus repo on Monday afternoon.
Tested with the latest images (v3.6.140-1), using an NFS PV; did not find the NPE in the hawkular-cassandra pod logs.

@Mike, please see my inventory file. We don't use [oo_first_master] now, and we usually deploy metrics with the following commands:

# cd /usr/share/ansible/openshift-ansible/
# ansible-playbook -vvv -i ${INVENTORY_FILE} playbooks/byo/openshift-cluster/openshift-metrics.yml

**************************************************************************
Images from brew:
metrics-hawkular-metrics   v3.6.140-1   3a5bebd0476a   2 hours ago   1.293 GB
metrics-cassandra          v3.6.140-1   9644ec21e399   2 hours ago   573.2 MB
metrics-heapster           v3.6.140-1   5549c67d8607   2 hours ago   274.4 MB

Inventory file:
[OSEv3:children]
masters

[masters]
${MASTER} openshift_public_hostname=${MASTER}

[OSEv3:vars]
ansible_ssh_user=root
ansible_ssh_private_key_file="~/libra.pem"
deployment_type=openshift-enterprise

# Metrics
openshift_metrics_install_metrics=true
openshift_metrics_hawkular_hostname=hawkular-metrics.${SUB_DOMAIN}
openshift_metrics_project=openshift-infra
openshift_metrics_image_prefix=${IMAGE_PREFIX}
openshift_metrics_image_version=v3.6
openshift_metrics_cassandra_replicas=1
openshift_metrics_hawkular_replicas=1
openshift_metrics_cassandra_storage_type=pv
openshift_metrics_cassandra_pvc_size=10Gi
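For anyone re-verifying, one way to check for the regression is to count the NPE lines in a saved pod log (captured with `oc logs <hawkular-metrics-pod> > hawkular.log`). The two sample log lines below are faked so the snippet runs standalone:

```shell
# Fake two of the recurring error lines so the check can run without a cluster;
# in a real verification, hawkular.log would come from `oc logs`.
printf '%s\n' \
  '2017-06-27 02:05:09,237 ERROR RuntimeException thrown from DropWizardReporter#report. Exception was suppressed.: java.lang.NullPointerException' \
  '2017-06-27 02:06:09,240 ERROR RuntimeException thrown from DropWizardReporter#report. Exception was suppressed.: java.lang.NullPointerException' \
  > hawkular.log

# Count occurrences; a clean run on a fixed image should report 0
grep -c 'java.lang.NullPointerException' hawkular.log   # prints 2 here
```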
Created attachment 1295746 [details] Issue is fixed, hawkular metrics pod log
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2017:1716