Bug 2051615 - [STF 1.4] sg-core fails handling some messages due to some invalid escape char
Summary: [STF 1.4] sg-core fails handling some messages due to some invalid escape char
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Service Telemetry Framework
Classification: Red Hat
Component: sg-core-container
Version: 1.4
Hardware: Unspecified
OS: Linux
Priority: high
Severity: high
Target Milestone: z1
Target Release: 1.4 (STF)
Assignee: Martin Magr
QA Contact: Leonid Natapov
Docs Contact: Joanne O'Flynn
URL:
Whiteboard:
Depends On: 2016460
Blocks: 2053681
Reported: 2022-02-07 15:51 UTC by Leif Madsen
Modified: 2022-02-21 13:50 UTC (History)

Fixed In Version: sg-core-container-4.1.1-1
Doc Type: Bug Fix
Doc Text:
In some cases, sg-core did not handle Ceilometer metrics properly, so those metrics were not stored in Prometheus. In this release, metric processing is more robust. Although sg-core now supports larger messages from Ceilometer, an additional change is required to pass the larger messages through the sg-bridge ring buffer. The changes required to fully support this functionality are being tracked in RHBZ#2053681.
Clone Of: 2016460
Environment:
Last Closed: 2022-02-21 13:50:38 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | infrawatch sg-core pull 83 | 0 | None | Merged | Increase reading buffer size on large messages (#74) | 2022-02-07 15:56:49 UTC
Red Hat Issue Tracker | STF-967 | 0 | None | None | None | 2022-02-07 15:54:12 UTC
Red Hat Product Errata | RHSA-2022:0585 | 0 | None | None | None | 2022-02-21 13:50:55 UTC

Description Leif Madsen 2022-02-07 15:51:05 UTC
+++ This bug was initially created as a clone of Bug #2016460 +++

Description of problem:
STF 1.3 is configured to monitor multiple OSP 16 clouds with the out-of-the-box configuration (i.e. by following the official documentation [1]).

The sg-core container of the ceil-meter Smart Gateway regularly fails on incoming messages with the following errors:

> $ oc logs -f default-tst-ceil-meter-smartgateway-5698bb44dc-4z4vs
> [...]
> 2021-10-21 08:45:20 [DEBUG] failed handling message [error: ceilometer.OsloSchema.Request: OsloMessage: readEscapedChar: invalid escape char after \, error found in #10 byte of ...|ephemeral\|..., bigger context ...|us\": 1, \"ram\": 1024, \"disk\": 40, \"ephemeral\|..., handler: ceilometer-metrics[socket]]
> 2021-10-21 08:45:20 [DEBUG] failed handling message [error: ceilometer.OsloSchema.Request: OsloMessage: readStringSlowPath: unexpected end of input, error found in #10 byte of ...|"vcpus\": |..., bigger context ...|": \"11\", \"name\": \"std.cpu1ram1\", \"vcpus\": |..., handler: ceilometer-metrics[socket]]
> [...]

Full log output is attached, with "dumpMessages" enabled in the SG configuration for increased verbosity.


Actual results:
Not exhaustive, but what has been observed so far:
- some metrics (e.g. cpu_ceilometer) are missing in Prometheus/Grafana for some overcloud compute nodes, so some dashboards (e.g. the Virtual Machine dashboard) work only partially (incomplete lists of projects and VMs).

Expected results:
All the metrics/events of all the overcloud compute nodes can be seen in Prometheus/Grafana.

Comment 8 Leif Madsen 2022-02-11 18:41:26 UTC
Verified this is working. Depends on changes tracked in RHBZ#2053681.

Comment 12 errata-xmlrpc 2022-02-21 13:50:38 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Service Telemetry Framework 1.4 (sg-core-container) security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0585

