
Bug 2178590

Summary: Sensubility stops working after updating overcloud. error: amqp:internal-error EOF (connection aborted)
Product: Red Hat OpenStack
Reporter: Leonid Natapov <lnatapov>
Component: collectd-sensubility
Assignee: Martin Magr <mmagr>
Status: CLOSED ERRATA
QA Contact: Leonid Natapov <lnatapov>
Severity: high
Priority: high
Version: 17.1 (Wallaby)
CC: dhughes, mmagr, mrunge
Target Milestone: ga
Keywords: Triaged
Target Release: 17.1
Hardware: Unspecified
OS: Unspecified
Fixed In Version: collectd-sensubility-0.2.0-1.el9ost
Last Closed: 2023-08-16 01:14:22 UTC
Type: Bug

Description Leonid Natapov 2023-03-15 10:55:01 UTC
Sensubility stops working after an overcloud update, failing with error: amqp:internal-error.

Scenario:

1. Freshly deployed OSP 17.1: sensubility works without issues.

2. The STF (Service Telemetry Framework) host was updated in the stf-connectors template, and an overcloud update was performed.

3. After the update finished successfully, I can see that the metrics_qdr container was restarted.

4. The collectd container was not restarted.

5. Sensubility logs in the collectd container show the following error, and Grafana does not show API uptime status.

6. After restarting the collectd container, everything is back to normal.

~~~
[WARN] Failed to create AMQP1.0 sender on given connection, skipping processing message [connection: amqp://172.17.1.69:5666, message: {sensubility/osp17-telemetry {"labels":{"check":"check-container-health","client":"controller-0.redhat.local","severity":"OKAY"},"annotations":{"command":"/scripts/collectd_check_health.py","duration":6.826215605,"executed":1678709937,"issued":1678709937,"output":"[{\"service\": \"container-puppet-redis\", \"container\": \"1ace8dcd16f23937b79a34794ff88bd07117d992b805eea053c40f32b57039a2\", \"status\": \"stopped\", \"healthy\": 0}, {\"service\": \"gnocchi_init_lib\", \"container\": \"997b259cad2bbd4c0de55a185f7983e7df72c6a23bb216b46fba5805ea95a175\", \"status\": \"stopped\", \"healthy\": 0}, {\"service\": \"container-puppet-ceilometer\",
.
.
.
56f52cf30536edbb6c5200ad24a383ff91dd6448d860324d0890\\\", \\\"status\\\": \\\"healthy\\\", \\\"healthy\\\": 1}, {\\\"service\\\": \\\"nova_wait_for_api_service\\\", \\\"container\\\": \\\"af63e9165a7f6889468b9e9ae091c5cab4c36363ebaaee18104b00373164803c\\\", \\\"status\\\": \\\"stopped\\\", \\\"healthy\\\": 0}]\\n\",\"status\":\"0\"}}}"},"startsAt":"2023-03-13T12:19:04Z"} []}, error: amqp:internal-error: EOF (connection aborted)]
~~~

Comment 3 Leonid Natapov 2023-03-15 14:21:45 UTC
The sensubility version is collectd-sensubility-0.1.9-1.el9ost.x86_64

Comment 13 Leonid Natapov 2023-05-30 10:57:13 UTC
Failed QE: after an overcloud update, sensubility stops working.
~~~
[2023-05-29 06:45:33] amqp1 plugin: PN_TRANSPORT_CLOSED: amqp:connection:framing-error: connection aborted
[2023-05-29 06:45:34] amqp1 plugin: PN_TRANSPORT_CLOSED: proton:io: Connection refused - disconnected 172.17.1.45:5666
~~~

Comment 15 Matthias Runge 2023-06-06 09:21:47 UTC
Moving this to z1; a workaround exists (restart collectd).

Comment 16 Martin Magr 2023-06-08 11:57:09 UTC
I believe the compose you used for testing did not have the latest sensubility code pulled in. The log message does not come from the latest version of the service.

Comment 17 Leonid Natapov 2023-06-08 11:57:41 UTC
Tested with the latest OSP17.1 compose RHOS-17.1-RHEL-9-20230607.n.0

Updated the overcloud, and after the update sensubility continued to work, i.e. it kept sending container health status to STF.

Comment 26 errata-xmlrpc 2023-08-16 01:14:22 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Release of components for Red Hat OpenStack Platform 17.1 (Wallaby)), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2023:4577