
Bug 1975951

Summary: sg-core crashing under high log volume
Product: Service Telemetry Framework
Reporter: Chris Sibbitt <csibbitt>
Component: sg-core-container
Assignee: Martin Magr <mmagr>
Status: CLOSED ERRATA
QA Contact: Leonid Natapov <lnatapov>
Severity: low
Docs Contact: Joanne O'Flynn <joflynn>
Priority: low
Version: 1.3
CC: lmadsen
Target Milestone: GA
Keywords: Triaged
Target Release: 1.4 (STF)
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: sg-core-container-4.1.0-1
Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2022-01-26 17:01:45 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Chris Sibbitt 2021-06-24 19:44:59 UTC
Description of problem:

When sending large batches (100k+ lines) of logs to STF via the rsyslog amqp plugin, sg-core crashes and most of the log stream is lost.


Version-Release number of selected component (if applicable):

stf-1.3


How reproducible:

Very


Steps to Reproduce:

1. Set up STF server side

2. Set up an intermediate QDR without TLS, because the rsyslog omamqp1 module doesn't support TLS [1] but STF requires it (like https://github.com/infrawatch/service-telemetry-operator/blob/9cd36d9c8df810efc475a73c7d10d8211e09d73c/tests/performance-test/logging-events/deploy/qdrouterd.yaml )
[1] https://www.rsyslog.com/doc/master/configuration/modules/omamqp1.html#todo

3. Set up rsyslog to read from an input file and output to amqp (like https://github.com/infrawatch/sg-core/blob/f5c2bb6641e9f2a2b032d85c29f4b3ff199ef7a5/ci/service_configs/rsyslog/rsyslog_config.conf )

4. Set up a logging smart gateway (like https://github.com/infrawatch/service-telemetry-operator/blob/e6d2ddd3f9b19cf05310f23d1731fc0b07cd59d5/tests/performance-test/logging-events/deploy/logging-sg.yaml )

5. Confirm that logs flow into the bridge (like `oc logs -f perftest-cloud1-rsyslog-event-smartgateway-6854d5484d-9tzcr -c bridge`) by writing a few lines to the rsyslog input file

6. Start watching the sg-core logs (like `oc logs -f perftest-cloud1-rsyslog-event-smartgateway-6854d5484d-9tzcr -c sg-core`)

7. Write a couple million log lines to the rsyslog input file


Actual results:

After about 1M log lines, sg-core crashes with this output:

2021-06-24 16:53:00 [INFO] storing events and(or) logs to Elasticsearch. [plugin: elasticsearch, url: https://elasticsearch-es-http.service-telemetry.svc.cluster.local:9200]
2021-06-24 16:53:00 [INFO] socket listening on /tmp/smartgateway [plugin: socket]

panic: interface conversion: interface {} is nil, not []string

goroutine 175109 [running]:
github.com/infrawatch/sg-core/plugins/application/elasticsearch.(*Elasticsearch).ReceiveEvent(0xc0002d1110, 0xc00138cab0, 0x21, 0x41d8352e1bc00000, 0x2, 0xc0013f6280, 0xf, 0x2, 0xc0013b3260, 0x0, ...)
        /go/src/github.com/infrawatch/sg-core/plugins/application/elasticsearch/main.go:96 +0x87f
github.com/infrawatch/sg-core/pkg/bus.(*EventBus).Publish.func1(0xc00138cab0, 0x21, 0x41d8352e1bc00000, 0x2, 0xc0013f6280, 0xf, 0x2, 0xc0013b3260, 0x0, 0xc001390ca0, ...)
        /go/src/github.com/infrawatch/sg-core/pkg/bus/bus.go:35 +0x62
created by github.com/infrawatch/sg-core/pkg/bus.(*EventBus).Publish
        /go/src/github.com/infrawatch/sg-core/pkg/bus/bus.go:34 +0xbd
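
For reference, the panic message is what the Go runtime produces when a single-value type assertion to []string is applied to a nil interface{} value. The sketch below reproduces that class of failure and shows the two-value ("comma, ok") form that avoids it. It is not the sg-core code and not necessarily how the upstream fix works; the map layout, the "tags" key, and both function names are hypothetical.

```go
package main

import "fmt"

// uncheckedTags uses the single-value type assertion. If the key is missing
// (or holds nil), the assertion panics. The panic is recovered here only so
// the message can be shown without killing the process.
func uncheckedTags(fields map[string]interface{}, key string) (tags []string, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("%v", r)
		}
	}()
	tags = fields[key].([]string) // panics when fields[key] is nil
	return tags, nil
}

// checkedTags uses the two-value assertion, which reports failure through
// ok instead of panicking.
func checkedTags(fields map[string]interface{}, key string) ([]string, bool) {
	tags, ok := fields[key].([]string)
	return tags, ok
}

func main() {
	// Hypothetical event payload with no "tags" entry.
	event := map[string]interface{}{"message": "hello"}

	if _, err := uncheckedTags(event, "tags"); err != nil {
		fmt.Println("unchecked assertion failed:", err)
	}

	if tags, ok := checkedTags(event, "tags"); !ok {
		fmt.Println("tags missing or wrong type; skipping")
	} else {
		fmt.Println("tags:", tags)
	}
}
```

In a handler that runs in its own goroutine, as the trace shows for EventBus.Publish, an unrecovered panic like this terminates the whole process, which matches the observed crash.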


Expected results:

No crashing!

Comment 1 Leif Madsen 2021-09-15 21:16:46 UTC
Moving this to STF 1.4 since logging support is not part of STF 1.3.

Comment 2 Martin Magr 2021-09-27 15:28:08 UTC
Merged upstream, waiting for build.

Comment 10 errata-xmlrpc 2022-01-26 17:01:45 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Release of components for Service Telemetry Framework 1.4.0 - Container Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:0298