Bug 1939691 - Smart Gateway can crash when Ceilometer metrics are received
Summary: Smart Gateway can crash when Ceilometer metrics are received
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Service Telemetry Framework
Classification: Red Hat
Component: smart-gateway-container
Version: 1.2
Hardware: x86_64
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: z1
Target Release: 1.2 (STF)
Assignee: Leif Madsen
QA Contact: Leonid Natapov
URL:
Whiteboard:
Depends On:
Blocks: 1944496
Reported: 2021-03-16 20:06 UTC by Leif Madsen
Modified: 2021-05-16 13:57 UTC
CC List: 11 users

Fixed In Version: smart-gateway-operator-bundle-container-2.2.3-3 smart-gateway-container-2.1.3-6
Doc Type: Bug Fix
Doc Text:
Before this update, Smart Gateway crashed when the value at one of the expected keys was nil or of an unexpected type, because the code used a type assertion that assumed the incoming data was always of the correct type. With this update, Smart Gateway processes data points only when the values are of type 'string', and no longer crashes when the incoming data is of an invalid type.
Clone Of:
Environment:
Last Closed: 2021-04-22 18:43:34 UTC
Target Upstream Version:
Embargoed:
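The Doc Text describes the fix as accepting values only when they are strings. The sketch below shows that pattern in Go using the two-value ("comma-ok") type assertion, which reports failure instead of panicking. It assumes labels arrive as a map[string]interface{}; the actual CeilometerMetric structure in ceilometer.go is not shown in this report, and labelsFromPayload is a hypothetical helper, not smart-gateway code.

```go
package main

import "fmt"

// labelsFromPayload is a hypothetical helper (not the smart-gateway code).
// It copies only string-typed values into the label set and returns the
// keys it skipped, instead of panicking on an unchecked type assertion.
func labelsFromPayload(payload map[string]interface{}, keys []string) (map[string]string, []string) {
	labels := make(map[string]string, len(keys))
	var skipped []string
	for _, k := range keys {
		// Two-value assertion: ok is false for a missing key, a nil value,
		// or any non-string type, so this line can never panic.
		if s, ok := payload[k].(string); ok {
			labels[k] = s
		} else {
			skipped = append(skipped, k)
		}
	}
	return labels, skipped
}

func main() {
	payload := map[string]interface{}{
		"resource_id":  "instance-0001",
		"project_id":   nil,  // nil value, like the one in the reported panic
		"counter_unit": 42.0, // unexpected (non-string) type
	}
	labels, skipped := labelsFromPayload(payload, []string{"resource_id", "project_id", "counter_unit"})
	fmt.Println(labels)  // map[resource_id:instance-0001]
	fmt.Println(skipped) // [project_id counter_unit]
}
```

Skipping only the offending label is one possible design choice; the shipped fix may instead drop the whole data point, which this report does not specify.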


Links
System | ID | Private | Priority | Status | Summary | Last Updated
GitHub infrawatch smart-gateway pull | 100 | 0 | None | open | Ceilometer metric panic on label retreival hotfix | 2021-03-16 22:20:27 UTC
Red Hat Product Errata | RHBA-2021:1340 | 0 | None | None | None | 2021-04-22 18:43:37 UTC

Internal Links: 1944496

Description Leif Madsen 2021-03-16 20:06:33 UTC
Description of problem:

When smart-gateway receives certain Ceilometer metrics, it can crash with a Go panic because a value held in an interface{} is nil instead of the expected string.

Version-Release number of selected component (if applicable):

stf/smart-gateway-rhel8:2.1.1

Additional info:

$ oc logs default-cloud1-ceil-meter-smartgateway-58458c896b-bgrrs
2021/03/16 18:55:44 AMQP1.0 ceilometer listen address configured at default-interconnect.service-telemetry.svc.cluster.local:5673/anycast/ceilometer/metering.sample
2021/03/16 18:55:44 Metric server at : 0.0.0.0:8081
2021/03/16 18:55:46 HTTP server is ready....
2021/03/16 18:55:46 Listening for AMQP1.0 messages
panic: interface conversion: interface {} is nil, not string

goroutine 86 [running]:
github.com/infrawatch/smart-gateway/internal/pkg/metrics/incoming.(*CeilometerMetric).GetLabels(0xc0002b0120, 0xc00002a560)
	/remote-source/app/internal/pkg/metrics/incoming/ceilometer.go:210 +0x652
github.com/infrawatch/smart-gateway/internal/pkg/tsdb.NewPrometheusMetric(0xc0002b0101, 0xb6bb67, 0xa, 0xc55880, 0xc0002b0120, 0x0, 0xaec040, 0x1, 0x7f82ae5d8328, 0xc0000afda0)
	/remote-source/app/internal/pkg/tsdb/prometheus.go:111 +0x7b2
github.com/infrawatch/smart-gateway/internal/pkg/cacheutil.(*ShardedIncomingDataCache).FlushPrometheusMetric(0xc00000ef60, 0x1, 0xc000070060, 0x0)
	/remote-source/app/internal/pkg/cacheutil/processcache.go:38 +0x164
github.com/infrawatch/smart-gateway/internal/pkg/metrics.(*cacheHandler).Collect(0xc000098500, 0xc000070060)
	/remote-source/app/internal/pkg/metrics/metrics.go:66 +0x25f
github.com/prometheus/client_golang/prometheus.(*Registry).Gather.func1()
	/remote-source/deps/gomod/pkg/mod/github.com/prometheus/client_golang.0/prometheus/registry.go:430 +0x19d
created by github.com/prometheus/client_golang/prometheus.(*Registry).Gather
	/remote-source/deps/gomod/pkg/mod/github.com/prometheus/client_golang.0/prometheus/registry.go:522 +0xe36
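The panic message is what the Go runtime reports when a single-value type assertion to string is applied to a nil interface{} value. The sketch below reproduces that failure in isolation. It assumes the value is read from a map[string]interface{} (the actual code at ceilometer.go:210 is not included in this report), and the label helper is hypothetical. The deferred recover converts the panic into an error only so the demo can print it.

```go
package main

import "fmt"

// label is a hypothetical helper that reads a value with an unchecked,
// single-value type assertion, the pattern implied by the panic message.
func label(m map[string]interface{}, key string) (s string, err error) {
	defer func() {
		// Turn the runtime panic into an error for demonstration only.
		if r := recover(); r != nil {
			err = fmt.Errorf("%v", r)
		}
	}()
	// Panics if m[key] is nil (or the key is missing) or not a string.
	return m[key].(string), nil
}

func main() {
	m := map[string]interface{}{"resource_id": nil}
	_, err := label(m, "resource_id")
	fmt.Println(err) // interface conversion: interface {} is nil, not string
}
```

In the reported trace nothing recovers the panic, so the whole process exits. Using the two-value form, v, ok := x.(string), avoids the panic entirely.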

Comment 1 Paul Leimer 2021-03-16 22:21:43 UTC
PR is up

Comment 4 Jose Castillo Lema 2021-03-19 00:34:47 UTC
The hotfix deployment worked perfectly.
For the record, the only difference from the proposed hotfix process was:
 - our environment runs smart-gateway-operator.v2.1.2 instead of smart-gateway-operator.v2.2.1

Thanks a lot

Comment 10 Paul Leimer 2021-03-31 15:25:53 UTC
Note: this must be released for STF 1.1.

Comment 21 errata-xmlrpc 2021-04-22 18:43:34 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Release of components for Service Telemetry Framework - Container Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:1340

