Description of problem:
There are rabbit error logs in nova-metadata-api, but without apparent issues to the overcloud. Are they related to https://bugzilla.redhat.com/show_bug.cgi?id=1913177 ?

Version-Release number of selected component (if applicable):
OSP16.1.6

How reproducible:
Multiple times per hour

Steps to Reproduce:
Seeing this in all three seeing environments

Actual results:
~~~
nova/nova-metadata-api.log.1:2021-07-07 08:49:47.619 21 ERROR oslo.messaging._drivers.impl_rabbit [-] Connection failed: [Errno 104] Connection reset by peer (retrying in 0 seconds): ConnectionResetError: [Errno 104] Connection reset by peer
nova/nova-metadata-api.log.1:2021-07-07 08:53:52.088 29 INFO oslo.messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 104] Connection reset by peer
nova/nova-metadata-api.log.1:2021-07-07 09:01:57.128 29 ERROR oslo.messaging._drivers.impl_rabbit [-] Connection failed: [Errno 104] Connection reset by peer (retrying in 0 seconds): ConnectionResetError: [Errno 104] Connection reset by peer
nova/nova-metadata-api.log.1:2021-07-07 09:08:03.305 21 INFO oslo.messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 104] Connection reset by peer
nova/nova-metadata-api.log.1:2021-07-07 09:10:04.573 21 ERROR oslo.messaging._drivers.impl_rabbit [-] Connection failed: [Errno 104] Connection reset by peer (retrying in 0 seconds): ConnectionResetError: [Errno 104] Connection reset by peer
nova/nova-metadata-api.log.1:2021-07-07 09:46:32.235 29 INFO oslo.messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 104] Connection reset by peer
nova/nova-metadata-api.log.1:2021-07-07 10:02:45.992 21 INFO oslo.messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 104] Connection reset by peer
nova/nova-metadata-api.log.1:2021-07-07 10:08:49.377 21 ERROR oslo.messaging._drivers.impl_rabbit [-] Connection failed: [Errno 104] Connection reset by peer (retrying in 0 seconds): ConnectionResetError: [Errno 104] Connection reset by peer
nova/nova-metadata-api.log.1:2021-07-07 10:12:53.623 21 ERROR oslo.messaging._drivers.impl_rabbit [-] Connection failed: [Errno 104] Connection reset by peer (retrying in 0 seconds): ConnectionResetError: [Errno 104] Connection reset by peer
nova/nova-metadata-api.log.1:2021-07-07 10:35:09.030 29 ERROR oslo.messaging._drivers.impl_rabbit [-] Connection failed: [Errno 104] Connection reset by peer (retrying in 0 seconds): ConnectionResetError: [Errno 104] Connection reset by peer
nova/nova-metadata-api.log.1:2021-07-07 11:33:52.701 29 ERROR oslo.messaging._drivers.impl_rabbit [-] Connection failed: [Errno 104] Connection reset by peer (retrying in 0 seconds): ConnectionResetError: [Errno 104] Connection reset by peer
nova/nova-metadata-api.log.1:2021-07-07 12:20:29.037 29 INFO oslo.messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 104] Connection reset by peer
nova/nova-metadata-api.log.1:2021-07-07 12:40:44.477 21 ERROR oslo.messaging._drivers.impl_rabbit [-] Connection failed: [Errno 104] Connection reset by peer (retrying in 0 seconds): ConnectionResetError: [Errno 104] Connection reset by peer
nova/nova-metadata-api.log.1:2021-07-07 12:42:50.177 29 ERROR oslo.messaging._drivers.impl_rabbit [-] Connection failed: [Errno 104] Connection reset by peer (retrying in 0 seconds): ConnectionResetError: [Errno 104] Connection reset by peer
nova/nova-metadata-api.log.1:2021-07-07 13:31:24.052 29 INFO oslo.messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 104] Connection reset by peer
nova/nova-metadata-api.log.1:2021-07-07 14:20:04.552 29 ERROR oslo.messaging._drivers.impl_rabbit [-] Connection failed: [Errno 104] Connection reset by peer (retrying in 0 seconds): ConnectionResetError: [Errno 104] Connection reset by peer
nova/nova-metadata-api.log.1:2021-07-07 14:24:51.283 21 INFO oslo.messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 104] Connection reset by peer
nova/nova-metadata-api.log.1:2021-07-07 14:24:51.485 29 ERROR oslo.messaging._drivers.impl_rabbit [-] Connection failed: [Errno 104] Connection reset by peer (retrying in 0 seconds): ConnectionResetError: [Errno 104] Connection reset by peer
nova/nova-metadata-api.log.1:2021-07-07 14:34:11.725 21 ERROR oslo.messaging._drivers.impl_rabbit [-] Connection failed: [Errno 104] Connection reset by peer (retrying in 0 seconds): ConnectionResetError: [Errno 104] Connection reset by peer
nova/nova-metadata-api.log.1:2021-07-07 14:40:16.804 29 INFO oslo.messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 104] Connection reset by peer
nova/nova-metadata-api.log.1:2021-07-07 15:14:43.404 21 INFO oslo.messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 104] Connection reset by peer
nova/nova-metadata-api.log.1:2021-07-07 15:24:51.444 21 ERROR oslo.messaging._drivers.impl_rabbit [-] Connection failed: [Errno 104] Connection reset by peer (retrying in 0 seconds): ConnectionResetError: [Errno 104] Connection reset by peer
nova/nova-metadata-api.log.1:2021-07-07 15:49:12.133 21 ERROR oslo.messaging._drivers.impl_rabbit [-] Connection failed: [Errno 104] Connection reset by peer (retrying in 0 seconds): ConnectionResetError: [Errno 104] Connection reset by peer
nova/nova-metadata-api.log.1:2021-07-07 16:01:22.093 21 ERROR oslo.messaging._drivers.impl_rabbit [-] Connection failed: [Errno 104] Connection reset by peer (retrying in 0 seconds): ConnectionResetError: [Errno 104] Connection reset by peer
nova/nova-metadata-api.log.1:2021-07-07 17:16:17.011 29 ERROR oslo.messaging._drivers.impl_rabbit [-] Connection failed: [Errno 104] Connection reset by peer (retrying in 0 seconds): ConnectionResetError: [Errno 104] Connection reset by peer
nova/nova-metadata-api.log.1:2021-07-07 17:19:51.796 29 INFO oslo.messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 104] Connection reset by peer
nova/nova-metadata-api.log.1:2021-07-07 17:20:19.899 21 INFO oslo.messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 104] Connection reset by peer
nova/nova-metadata-api.log.1:2021-07-07 17:46:40.682 21 INFO oslo.messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 104] Connection reset by peer
nova/nova-metadata-api.log.1:2021-07-07 17:51:11.308 21 INFO oslo.messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 104] Connection reset by peer
nova/nova-metadata-api.log.1:2021-07-07 18:17:04.838 29 ERROR oslo.messaging._drivers.impl_rabbit [-] Connection failed: [Errno 104] Connection reset by peer (retrying in 0 seconds): ConnectionResetError: [Errno 104] Connection reset by peer
nova/nova-metadata-api.log.1:2021-07-07 18:23:08.921 21 ERROR oslo.messaging._drivers.impl_rabbit [-] Connection failed: [Errno 104] Connection reset by peer (retrying in 0 seconds): ConnectionResetError: [Errno 104] Connection reset by peer
nova/nova-metadata-api.log.1:2021-07-07 18:27:12.145 29 ERROR oslo.messaging._drivers.impl_rabbit [-] Connection failed: [Errno 104] Connection reset by peer (retrying in 0 seconds): ConnectionResetError: [Errno 104] Connection reset by peer
nova/nova-metadata-api.log.1:2021-07-07 18:27:12.685 21 INFO oslo.messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 104] Connection reset by peer
nova/nova-metadata-api.log.1:2021-07-07 18:31:15.022 21 ERROR oslo.messaging._drivers.impl_rabbit [-] Connection failed: [Errno 104] Connection reset by peer (retrying in 0 seconds): ConnectionResetError: [Errno 104] Connection reset by peer
nova/nova-metadata-api.log.1:2021-07-07 18:37:20.045 29 ERROR oslo.messaging._drivers.impl_rabbit [-] Connection failed: [Errno 104] Connection reset by peer (retrying in 0 seconds): ConnectionResetError: [Errno 104] Connection reset by peer
nova/nova-metadata-api.log.1:2021-07-07 19:15:50.332 21 ERROR oslo.messaging._drivers.impl_rabbit [-] Connection failed: [Errno 104] Connection reset by peer (retrying in 0 seconds): ConnectionResetError: [Errno 104] Connection reset by peer
nova/nova-metadata-api.log.1:2021-07-07 19:23:56.820 29 INFO oslo.messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 104] Connection reset by peer
~~~

Expected results:
No errors

Additional info:
I have not reviewed the sosreports yet, but this is expected behavior and I suspect it is not a real bug. Likely this is just the nova side of https://bugzilla.redhat.com/show_bug.cgi?id=1890037. The reason this happens is explained here: https://bugzilla.redhat.com/show_bug.cgi?id=1913177#c6

Effectively, the lifetime of the nova API and nova metadata API processes is managed by the WSGI server under which they are run, in this case mod_wsgi in the Apache process in the nova-metadata-api container. One possible way to work around this is to run the heartbeat in a real pthread rather than an eventlet green thread.

I'll bring this up in our triage call tomorrow, but I suspect we will close this as a duplicate of one of the preceding bugs. The connection resets and reconnects are real and, unlike the heartbeat message, we should not suppress them IMO, as the logging and connection lifetime are working as expected given the constraints imposed by running under Apache mod_wsgi.
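To gauge how often the resets actually occur, lines in the format quoted in the report could be tallied per hour and per worker process. This is only a rough sketch: the function name `tally_resets` and the `SAMPLE` list are made up for illustration (the sample lines are copied from the report), and the regular expression assumes the log layout shown above, with the number after the timestamp being the process ID.

```python
import re
from collections import Counter

# Assumed layout, taken from the lines quoted in this report:
# "<file>:<date> <time> <pid> <LEVEL> oslo.messaging._drivers.impl_rabbit [-] <message>"
LINE_RE = re.compile(
    r"(?P<date>\d{4}-\d{2}-\d{2}) (?P<hour>\d{2}):\d{2}:\d{2}\.\d{3} "
    r"(?P<pid>\d+) (?P<level>[A-Z]+) oslo\.messaging\._drivers\.impl_rabbit "
    r".*Connection reset by peer"
)


def tally_resets(lines):
    """Count connection-reset log lines per (date, hour) and per process ID."""
    per_hour = Counter()
    per_pid = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match is None:
            continue  # not a rabbit connection-reset line
        per_hour[(match["date"], match["hour"])] += 1
        per_pid[match["pid"]] += 1
    return per_hour, per_pid


# Three lines copied from the report, used only as sample input.
SAMPLE = [
    "nova/nova-metadata-api.log.1:2021-07-07 08:49:47.619 21 ERROR oslo.messaging._drivers.impl_rabbit [-] Connection failed: [Errno 104] Connection reset by peer (retrying in 0 seconds): ConnectionResetError: [Errno 104] Connection reset by peer",
    "nova/nova-metadata-api.log.1:2021-07-07 08:53:52.088 29 INFO oslo.messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 104] Connection reset by peer",
    "nova/nova-metadata-api.log.1:2021-07-07 09:01:57.128 29 ERROR oslo.messaging._drivers.impl_rabbit [-] Connection failed: [Errno 104] Connection reset by peer (retrying in 0 seconds): ConnectionResetError: [Errno 104] Connection reset by peer",
]

if __name__ == "__main__":
    per_hour, per_pid = tally_resets(SAMPLE)
    for (date, hour), count in sorted(per_hour.items()):
        print(f"{date} {hour}:00  {count} reset line(s)")
    for pid, count in sorted(per_pid.items()):
        print(f"pid {pid}: {count} reset line(s)")
```

Feeding it the full grep output instead of `SAMPLE` would show whether the resets are spread evenly across the worker processes, which is what one would expect if they follow the WSGI process lifecycle rather than a network fault.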
Reviewing their sosreports, I can confirm they have #heartbeat_in_pthread=false. I'm going to close this as a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1890037.

They can optionally set heartbeat_in_pthread=true in the nova.conf on their controller nodes to escape the life cycle model of the Apache server and ensure the heartbeat is not terminated. This will reduce or eliminate the log messages, as the connection should not reset unless there is a temporary network issue.

*** This bug has been marked as a duplicate of bug 1890037 ***
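For anyone checking their own nodes, the setting could be inspected with something like the sketch below. It is illustrative only: the helper name `heartbeat_in_pthread` and the two sample snippets are made up here, and it assumes the option lives in the `[oslo_messaging_rabbit]` section of nova.conf, which is where oslo.messaging documents its rabbit driver options but is not stated in this bug. A commented-out line such as `#heartbeat_in_pthread=false` is treated as unset, meaning the library default applies.

```python
import configparser


def heartbeat_in_pthread(conf_text):
    """Return True/False if the option is set explicitly, or None if unset.

    Assumes the option sits in [oslo_messaging_rabbit]; lines starting
    with '#' are comments and therefore count as unset.
    """
    # strict=False tolerates repeated keys, interpolation=None tolerates '%'.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(conf_text)
    return parser.getboolean(
        "oslo_messaging_rabbit", "heartbeat_in_pthread", fallback=None
    )


# Hypothetical nova.conf fragments for illustration.
COMMENTED_OUT = """
[oslo_messaging_rabbit]
#heartbeat_in_pthread=false
"""

ENABLED = """
[oslo_messaging_rabbit]
heartbeat_in_pthread=true
"""

if __name__ == "__main__":
    print("commented out:", heartbeat_in_pthread(COMMENTED_OUT))
    print("enabled:", heartbeat_in_pthread(ENABLED))
```

In practice the text would be read from the nova.conf used by the nova-metadata-api container on each controller; the service needs a restart before a changed value takes effect.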