Bug 2178590 - Sensubility stops working after updating overcloud. error: amqp:internal-error EOF (connection aborted)
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: collectd-sensubility
Version: 17.1 (Wallaby)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ga
Target Release: 17.1
Assignee: Martin Magr
QA Contact: Leonid Natapov
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2023-03-15 10:55 UTC by Leonid Natapov
Modified: 2023-08-16 01:15 UTC (History)

Fixed In Version: collectd-sensubility-0.2.0-1.el9ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-08-16 01:14:22 UTC
Target Upstream Version:
Embargoed:




Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Issue Tracker | OSP-23102 | 0 | None | None | None | 2023-03-15 10:56:33 UTC
Red Hat Product Errata | RHEA-2023:4577 | 0 | None | None | None | 2023-08-16 01:15:01 UTC

Description Leonid Natapov 2023-03-15 10:55:01 UTC
Sensubility stops working after updating the overcloud and logs the error amqp:internal-error.

Scenario:

1. Freshly deployed OSP 17.1: sensubility works without issues.

2. The STF host was changed in the stf-connectors template and an overcloud update was performed.

3. After the update finished successfully, I can see that the metrics_qdr container was restarted.

4. The collectd container was not restarted.

5. The sensubility logs in the collectd container show the error below, and Grafana does not show the API uptime status.

6. After restarting the collectd container, everything goes back to normal.

[WARN] Failed to create AMQP1.0 sender on given connection, skipping processing message [connection: amqp://172.17.1.69:5666, message: {sensubility/osp17-telemetry {"labels":{"check":"check-container-health","client":"controller-0.redhat.local","severity":"OKAY"},"annotations":{"command":"/scripts/collectd_check_health.py","duration":6.826215605,"executed":1678709937,"issued":1678709937,"output":"[{\"service\": \"container-puppet-redis\", \"container\": \"1ace8dcd16f23937b79a34794ff88bd07117d992b805eea053c40f32b57039a2\", \"status\": \"stopped\", \"healthy\": 0}, {\"service\": \"gnocchi_init_lib\", \"container\": \"997b259cad2bbd4c0de55a185f7983e7df72c6a23bb216b46fba5805ea95a175\", \"status\": \"stopped\", \"healthy\": 0}, {\"service\": \"container-puppet-ceilometer\",
.
.
.
56f52cf30536edbb6c5200ad24a383ff91dd6448d860324d0890\\\", \\\"status\\\": \\\"healthy\\\", \\\"healthy\\\": 1}, {\\\"service\\\": \\\"nova_wait_for_api_service\\\", \\\"container\\\": \\\"af63e9165a7f6889468b9e9ae091c5cab4c36363ebaaee18104b00373164803c\\\", \\\"status\\\": \\\"stopped\\\", \\\"healthy\\\": 0}]\\n\",\"status\":\"0\"}}}"},"startsAt":"2023-03-13T12:19:04Z"} []}, error: amqp:internal-error: EOF (connection aborted)]
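The scenario above is consistent with a sender that keeps using a connection opened before metrics_qdr was restarted. The following is a minimal sketch of the general reconnect-on-failure pattern, not the code shipped in collectd-sensubility. `FakeBroker`, `FakeConnection`, `ConnectionAborted`, and `ReconnectingSender` are all hypothetical names invented for illustration, and no real AMQP library is used.

```python
class ConnectionAborted(Exception):
    """Hypothetical stand-in for a transport error such as 'EOF (connection aborted)'."""


class FakeBroker:
    """Hypothetical stand-in for metrics_qdr; a restart invalidates old connections."""

    def __init__(self):
        self.generation = 0  # bumped on every simulated restart
        self.received = []

    def restart(self):
        self.generation += 1


class FakeConnection:
    """Hypothetical connection that only works for the broker generation it was opened on."""

    def __init__(self, broker):
        self.broker = broker
        self.generation = broker.generation

    def send(self, message):
        if self.generation != self.broker.generation:
            raise ConnectionAborted("amqp:internal-error: EOF (connection aborted)")
        self.broker.received.append(message)


class ReconnectingSender:
    """Drops a failed connection and opens a new one instead of skipping messages forever."""

    def __init__(self, connect, max_attempts=3):
        self._connect = connect          # callable returning a fresh connection
        self._max_attempts = max_attempts
        self._conn = None
        self.connections_opened = 0

    def send(self, message):
        for _ in range(self._max_attempts):
            try:
                if self._conn is None:
                    self._conn = self._connect()
                    self.connections_opened += 1
                self._conn.send(message)
                return True
            except ConnectionAborted:
                # Forget the stale connection; the next loop iteration reconnects.
                self._conn = None
        return False


broker = FakeBroker()
sender = ReconnectingSender(lambda: FakeConnection(broker))
sender.send("check-container-health #1")
broker.restart()  # simulates the metrics_qdr restart during the overcloud update
sender.send("check-container-health #2")
```

In this sketch the second `send` fails once on the stale connection, reconnects, and then succeeds. A real client would typically also wait between attempts (for example with exponential backoff), which is left out here to keep the example short.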

Comment 3 Leonid Natapov 2023-03-15 14:21:45 UTC
The sensubility version is collectd-sensubility-0.1.9-1.el9ost.x86_64

Comment 13 Leonid Natapov 2023-05-30 10:57:13 UTC
Failed QE. After overcloud update sensubility stops working.
[2023-05-29 06:45:33] amqp1 plugin: PN_TRANSPORT_CLOSED: amqp:connection:framing-error: connection aborted
[2023-05-29 06:45:34] amqp1 plugin: PN_TRANSPORT_CLOSED: proton:io: Connection refused - disconnected 172.17.1.45:5666
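As a rough aid for spotting this state, the error strings quoted in this report can be searched for in the collectd container logs. The sketch below assumes the log has already been read into a list of lines; the `SIGNATURES` tuple and the `find_disconnects` helper are hypothetical names, and the list only covers the messages seen in this bug, so it is not exhaustive.

```python
# Error strings taken from the log excerpts in this report (not an exhaustive list).
SIGNATURES = (
    "amqp:internal-error",
    "amqp:connection:framing-error",
    "proton:io: Connection refused",
)


def find_disconnects(lines):
    """Return (line_number, signature) for every line containing a known signature."""
    hits = []
    for number, line in enumerate(lines, start=1):
        for signature in SIGNATURES:
            if signature in line:
                hits.append((number, signature))
                break  # report each line at most once
    return hits


sample = [
    "[2023-05-29 06:45:33] amqp1 plugin: PN_TRANSPORT_CLOSED: "
    "amqp:connection:framing-error: connection aborted",
    "[2023-05-29 06:45:34] amqp1 plugin: PN_TRANSPORT_CLOSED: "
    "proton:io: Connection refused - disconnected 172.17.1.45:5666",
]
hits = find_disconnects(sample)
```

If `hits` is non-empty after an overcloud update, restarting the collectd container is the workaround reported in this bug.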

Comment 15 Matthias Runge 2023-06-06 09:21:47 UTC
Moving this to z1; a workaround exists (restart collectd).

Comment 16 Martin Magr 2023-06-08 11:57:09 UTC
I believe the compose you used for testing did not have the latest sensubility code pulled in. The log message does not come from the latest version of the service.

Comment 17 Leonid Natapov 2023-06-08 11:57:41 UTC
Tested with the latest OSP17.1 compose RHOS-17.1-RHEL-9-20230607.n.0

Updated the overcloud; after the update, sensubility continued to work, i.e. it kept sending container health status to STF.

Comment 26 errata-xmlrpc 2023-08-16 01:14:22 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Release of components for Red Hat OpenStack Platform 17.1 (Wallaby)), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2023:4577

