Sensubility stops working after updating the overcloud (error: amqp:internal-error)

Scenario:
1. Freshly deployed OSP17.1: sensubility works without issues.
2. The STF host was updated in the stf-connectors template and an overcloud update was performed.
3. After the update finished successfully, I can see that the metrics_qdr container was restarted.
4. The collectd container was not restarted.
5. The sensubility logs in the collectd container show the following error, and Grafana doesn't show the API uptime status.
6. After restarting the collectd container, everything goes back to normal.

~~~
[WARN] Failed to create AMQP1.0 sender on given connection, skipping processing message [connection: amqp://172.17.1.69:5666, message: {sensubility/osp17-telemetry {"labels":{"check":"check-container-health","client":"controller-0.redhat.local","severity":"OKAY"},"annotations":{"command":"/scripts/collectd_check_health.py","duration":6.826215605,"executed":1678709937,"issued":1678709937,"output":"[{\"service\": \"container-puppet-redis\", \"container\": \"1ace8dcd16f23937b79a34794ff88bd07117d992b805eea053c40f32b57039a2\", \"status\": \"stopped\", \"healthy\": 0}, {\"service\": \"gnocchi_init_lib\", \"container\": \"997b259cad2bbd4c0de55a185f7983e7df72c6a23bb216b46fba5805ea95a175\", \"status\": \"stopped\", \"healthy\": 0}, {\"service\": \"container-puppet-ceilometer\", . . . 56f52cf30536edbb6c5200ad24a383ff91dd6448d860324d0890\\\", \\\"status\\\": \\\"healthy\\\", \\\"healthy\\\": 1}, {\\\"service\\\": \\\"nova_wait_for_api_service\\\", \\\"container\\\": \\\"af63e9165a7f6889468b9e9ae091c5cab4c36363ebaaee18104b00373164803c\\\", \\\"status\\\": \\\"stopped\\\", \\\"healthy\\\": 0}]\\n\",\"status\":\"0\"}}}"},"startsAt":"2023-03-13T12:19:04Z"} []}, error: amqp:internal-error: EOF (connection aborted)]
~~~
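For readers unfamiliar with the payload in the log above: each sensubility check result is a JSON object with `labels` (check, client, severity) and `annotations`, and the `output` annotation is itself a JSON-encoded list of per-container records. The Python sketch below parses a message of that shape and lists the services reporting `healthy: 0`. It assumes only the field layout visible in the log; the function name `parse_health_message` and the sample records (including `example_service` and the short container IDs) are hypothetical and for illustration only.

```python
import json


def parse_health_message(raw: str) -> dict:
    """Parse one check result shaped like the sensubility message in the log (hypothetical helper)."""
    msg = json.loads(raw)
    labels = msg["labels"]
    annotations = msg["annotations"]
    # "output" is a string holding a JSON list of container records;
    # json.loads tolerates the trailing "\n" seen in the log.
    containers = json.loads(annotations["output"])
    unhealthy = [c["service"] for c in containers if c["healthy"] == 0]
    return {
        "check": labels["check"],
        "client": labels["client"],
        "severity": labels["severity"],
        "exit_status": annotations["status"],
        "unhealthy": unhealthy,
    }


# Illustrative sample built from the field layout in the log; values are made up.
sample = json.dumps({
    "labels": {
        "check": "check-container-health",
        "client": "controller-0.redhat.local",
        "severity": "OKAY",
    },
    "annotations": {
        "command": "/scripts/collectd_check_health.py",
        "status": "0",
        "output": json.dumps([
            {"service": "gnocchi_init_lib", "container": "aaa", "status": "stopped", "healthy": 0},
            {"service": "example_service", "container": "bbb", "status": "healthy", "healthy": 1},
            {"service": "nova_wait_for_api_service", "container": "ccc", "status": "stopped", "healthy": 0},
        ]) + "\n",
    },
    "startsAt": "2023-03-13T12:19:04Z",
})

result = parse_health_message(sample)
print(result["unhealthy"])  # ['gnocchi_init_lib', 'nova_wait_for_api_service']
```

Note that in this bug the payload itself was well formed; the failure was in creating the AMQP sender to deliver it.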
The sensubility version is collectd-sensubility-0.1.9-1.el9ost.x86_64
Failed QE. After the overcloud update, sensubility stops working:

~~~
[2023-05-29 06:45:33] amqp1 plugin: PN_TRANSPORT_CLOSED: amqp:connection:framing-error: connection aborted
[2023-05-29 06:45:34] amqp1 plugin: PN_TRANSPORT_CLOSED: proton:io: Connection refused - disconnected 172.17.1.45:5666
~~~
Moving this to z1; a workaround exists (restart the collectd container).
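The workaround presumably helps because restarting collectd also restarts sensubility, which then opens a fresh connection to the restarted metrics_qdr instead of reusing the dead one. The Python sketch below illustrates that idea in code: a sender that drops a failed connection and redials rather than skipping the message. It is a hypothetical illustration only, not sensubility's code and not the fix that shipped; `dial`, `ReconnectingSender`, and `FakeConn` are invented names, and `ConnectionError` stands in for whatever error an AMQP client raises on EOF.

```python
from __future__ import annotations

from typing import Callable, Protocol


class Connection(Protocol):
    """Assumed minimal connection interface (hypothetical)."""

    def send(self, address: str, body: str) -> None: ...


class ReconnectingSender:
    """Hypothetical sender that redials when the cached connection is dead."""

    def __init__(self, dial: Callable[[], Connection], max_attempts: int = 3):
        self._dial = dial  # assumed factory returning a new router connection
        self._max_attempts = max_attempts
        self._conn: Connection | None = None

    def send(self, address: str, body: str) -> None:
        last_error: Exception | None = None
        for _ in range(self._max_attempts):
            if self._conn is None:
                self._conn = self._dial()
            try:
                self._conn.send(address, body)
                return
            except ConnectionError as exc:
                # Router went away: forget the stale connection so the
                # next attempt dials a new one instead of reusing it.
                last_error = exc
                self._conn = None
        raise RuntimeError(f"giving up after {self._max_attempts} attempts") from last_error


class FakeConn:
    """Test double: a 'broken' connection fails like an aborted link."""

    def __init__(self, broken: bool):
        self.broken = broken
        self.sent: list[tuple[str, str]] = []

    def send(self, address: str, body: str) -> None:
        if self.broken:
            raise ConnectionError("EOF (connection aborted)")
        self.sent.append((address, body))


# First connection is dead (router restarted), second one works.
conns = [FakeConn(broken=True), FakeConn(broken=False)]
dials = iter(conns)
sender = ReconnectingSender(lambda: next(dials))
sender.send("sensubility/osp17-telemetry", "{}")
print(conns[1].sent)  # [('sensubility/osp17-telemetry', '{}')]
```

Bounding the retries keeps a permanently unreachable router from blocking the caller forever; a real client would likely also add a delay between attempts.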
I believe the compose you used for testing did not have the latest sensubility code pulled in. The log message does not come from the latest version of the service.
Tested with the latest OSP17.1 compose, RHOS-17.1-RHEL-9-20230607.n.0. After updating the overcloud, sensubility continued to work; for example, it kept sending container health status to STF.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Release of components for Red Hat OpenStack Platform 17.1 (Wallaby)), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2023:4577