Bug 1939691

Summary: Smart Gateway can crash when Ceilometer metrics are received
Product: Service Telemetry Framework
Reporter: Leif Madsen <lmadsen>
Component: smart-gateway-container
Assignee: Leif Madsen <lmadsen>
Status: CLOSED ERRATA
QA Contact: Leonid Natapov <lnatapov>
Severity: urgent
Docs Contact:
Priority: urgent
Version: 1.2
CC: alolivei, jlema, joflynn, kahara, knoha, mfuruta, pkilambi, pleimer, rhayakaw, spower, tsabatuc
Target Milestone: z1
Keywords: Triaged
Target Release: 1.2 (STF)
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version: smart-gateway-operator-bundle-container-2.2.3-3 smart-gateway-container-2.1.3-6
Doc Type: Bug Fix
Doc Text:
Before this update, Smart Gateway crashed if a nil or unexpected type value existed at one of the keys because of the type cast assumption that the incoming data was the correct type. With this update, Smart Gateway processes data points if they are of type 'string', and no longer crashes if the incoming data is an invalid type.
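
The fix described in the Doc Text matches Go's two-value ("comma ok") type assertion. The following is a minimal sketch of that idea, not the actual smart-gateway code: the `labelsFrom` helper and the key names are hypothetical and chosen only for illustration.

```go
package main

import "fmt"

// labelsFrom is a hypothetical helper, not the smart-gateway implementation.
// It copies only string-typed values into the label map and skips nil,
// missing, or unexpectedly typed values instead of panicking.
func labelsFrom(fields map[string]interface{}, keys []string) map[string]string {
	labels := make(map[string]string)
	for _, k := range keys {
		// The two-value form never panics: ok is false when the value
		// is nil, absent, or not a string.
		if s, ok := fields[k].(string); ok {
			labels[k] = s
		}
	}
	return labels
}

func main() {
	// Example payload with assumed key names.
	fields := map[string]interface{}{
		"project_id": "abc123", // valid string, kept
		"user_id":    nil,      // nil value, skipped
		"counter":    42,       // wrong type, skipped
	}
	keys := []string{"project_id", "user_id", "counter", "missing"}
	fmt.Println(labelsFrom(fields, keys))
}
```

Only the string-typed entry ends up in the result. The other keys are dropped silently here; a real implementation might prefer to log them.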
Story Points: ---
Clone Of:
Environment:
Last Closed: 2021-04-22 18:43:34 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1944496

Description Leif Madsen 2021-03-16 20:06:33 UTC
Description of problem:

When smart-gateway receives certain Ceilometer metrics, it can crash because a value stored in a Go interface is nil (or otherwise not a string) where the code asserts a string.

Version-Release number of selected component (if applicable):

stf/smart-gateway-rhel8:2.1.1

Additional info:

$ oc logs default-cloud1-ceil-meter-smartgateway-58458c896b-bgrrs
2021/03/16 18:55:44 AMQP1.0 ceilometer listen address configured at default-interconnect.service-telemetry.svc.cluster.local:5673/anycast/ceilometer/metering.sample
2021/03/16 18:55:44 Metric server at : 0.0.0.0:8081
2021/03/16 18:55:46 HTTP server is ready....
2021/03/16 18:55:46 Listening for AMQP1.0 messages
panic: interface conversion: interface {} is nil, not string
goroutine 86 [running]:
github.com/infrawatch/smart-gateway/internal/pkg/metrics/incoming.(*CeilometerMetric).GetLabels(0xc0002b0120, 0xc00002a560)
/remote-source/app/internal/pkg/metrics/incoming/ceilometer.go:210 +0x652
github.com/infrawatch/smart-gateway/internal/pkg/tsdb.NewPrometheusMetric(0xc0002b0101, 0xb6bb67, 0xa, 0xc55880, 0xc0002b0120, 0x0, 0xaec040, 0x1, 0x7f82ae5d8328, 0xc0000afda0)
/remote-source/app/internal/pkg/tsdb/prometheus.go:111 +0x7b2
github.com/infrawatch/smart-gateway/internal/pkg/cacheutil.(*ShardedIncomingDataCache).FlushPrometheusMetric(0xc00000ef60, 0x1, 0xc000070060, 0x0)
/remote-source/app/internal/pkg/cacheutil/processcache.go:38 +0x164
github.com/infrawatch/smart-gateway/internal/pkg/metrics.(*cacheHandler).Collect(0xc000098500, 0xc000070060)
/remote-source/app/internal/pkg/metrics/metrics.go:66 +0x25f
github.com/prometheus/client_golang/prometheus.(*Registry).Gather.func1()
/remote-source/deps/gomod/pkg/mod/github.com/prometheus/client_golang.0/prometheus/registry.go:430 +0x19d
created by github.com/prometheus/client_golang/prometheus.(*Registry).Gather
/remote-source/deps/gomod/pkg/mod/github.com/prometheus/client_golang.0/prometheus/registry.go:522 +0xe36
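
For reference, a panic of this kind can be reproduced outside smart-gateway with a single-value type assertion on a nil interface value. This is a minimal sketch under stated assumptions: `unsafeString`, `tryUnsafe`, and the `user_id` key are hypothetical names, and the code only illustrates the failure mode. It does not mirror what `ceilometer.go` actually does at line 210.

```go
package main

import "fmt"

// unsafeString uses the single-value assertion form, which panics
// when v is nil or holds anything other than a string.
func unsafeString(v interface{}) string {
	return v.(string)
}

// tryUnsafe calls unsafeString and converts a panic into an error so
// the failure can be inspected without crashing the process.
func tryUnsafe(v interface{}) (s string, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("%v", r)
		}
	}()
	return unsafeString(v), nil
}

func main() {
	// Assumed payload: the key exists but its value is nil.
	fields := map[string]interface{}{"user_id": nil}

	_, err := tryUnsafe(fields["user_id"])
	fmt.Println("recovered:", err)
}
```

The recovered error should have the same form as the panic line in the log above. Without the `recover`, the process would exit, which is what happened to the Smart Gateway pod.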

Comment 1 Paul Leimer 2021-03-16 22:21:43 UTC
PR is up

Comment 4 Jose Castillo Lema 2021-03-19 00:34:47 UTC
The hot fix deployment worked perfectly.
For the record, the only difference from the proposed hot fix process was:
 - instead of smart-gateway-operator.v2.2.1, our environment has smart-gateway-operator.v2.1.2

Thanks a lot

Comment 10 Paul Leimer 2021-03-31 15:25:53 UTC
Note: this must be released for STF 1.1

Comment 21 errata-xmlrpc 2021-04-22 18:43:34 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Release of components for Service Telemetry Framework - Container Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:1340