Description of problem:

The Ironic agent fails to restart on the node. It looks like the old container is not removed, so every restart attempt fails because the container name is still in use. I think this is a continuation of https://github.com/openshift/image-customization-controller/pull/34

Log:

Jun 16 09:09:48 cnfdf05.telco5gran.eng.rdu2.redhat.com podman[1211421]: b619b974e5cb4330c628c8ee5f2bb3dadd8ef8cf9e56e48e300f7c6f8120d5ba
Jun 16 09:09:48 cnfdf05.telco5gran.eng.rdu2.redhat.com systemd[1]: Started Ironic Agent.
Jun 16 09:09:49 cnfdf05.telco5gran.eng.rdu2.redhat.com podman[1211472]: Error: error creating container storage: the container name "ironic-agent" is already in use by "6d87213ed9c456f20ae70dee343caf377dc0ba217a63f4a2e9d2b13a9faa06ca". You have to remove that container to be able to reuse that name.: that name is already in use
Jun 16 09:09:49 cnfdf05.telco5gran.eng.rdu2.redhat.com systemd[1]: ironic-agent.service: Main process exited, code=exited, status=125/n/a
Jun 16 09:09:49 cnfdf05.telco5gran.eng.rdu2.redhat.com systemd[1]: ironic-agent.service: Failed with result 'exit-code'.
Jun 16 09:09:54 cnfdf05.telco5gran.eng.rdu2.redhat.com systemd[1]: ironic-agent.service: Service RestartSec=5s expired, scheduling restart.
Jun 16 09:09:54 cnfdf05.telco5gran.eng.rdu2.redhat.com systemd[1]: ironic-agent.service: Scheduled restart job, restart counter is at 11096.
Jun 16 09:09:54 cnfdf05.telco5gran.eng.rdu2.redhat.com systemd[1]: Stopped Ironic Agent.
Jun 16 09:09:54 cnfdf05.telco5gran.eng.rdu2.redhat.com systemd[1]: Starting Ironic Agent...
Jun 16 09:09:54 cnfdf05.telco5gran.eng.rdu2.redhat.com podman[1211529]: Trying to pull quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:a2e56178af2677f1e825a1ede3d8ea80f34d6a89d203f083a5840ebf6abd3e17...
Jun 16 09:09:54 cnfdf05.telco5gran.eng.rdu2.redhat.com podman[1211529]: Getting image source signatures
Jun 16 09:09:54 cnfdf05.telco5gran.eng.rdu2.redhat.com podman[1211529]: Copying blob sha256:21a86dbf0e5a8583d4e1818a201dc0fc18e9ae20a1b98a71e43f9b60fc543466
Jun 16 09:09:54 cnfdf05.telco5gran.eng.rdu2.redhat.com podman[1211529]: Copying blob sha256:545277d800059b32cf03377a9301094e9ac8aa4bb42d809766d7355ca9aa8652
Jun 16 09:09:54 cnfdf05.telco5gran.eng.rdu2.redhat.com podman[1211529]: Copying blob sha256:de516cc59493b713e0b33a4954f7eb500383e59642e2897d02e63992d4576720
Jun 16 09:09:54 cnfdf05.telco5gran.eng.rdu2.redhat.com podman[1211529]: Copying blob sha256:f70d60810c69edad990aaf0977a87c6d2bcc9cd52904fa6825f08507a9b6e7bc
Jun 16 09:09:54 cnfdf05.telco5gran.eng.rdu2.redhat.com podman[1211529]: Copying config sha256:b619b974e5cb4330c628c8ee5f2bb3dadd8ef8cf9e56e48e300f7c6f8120d5ba
Jun 16 09:09:54 cnfdf05.telco5gran.eng.rdu2.redhat.com podman[1211529]: Writing manifest to image destination
Jun 16 09:09:54 cnfdf05.telco5gran.eng.rdu2.redhat.com podman[1211529]: Storing signatures
Jun 16 09:09:54 cnfdf05.telco5gran.eng.rdu2.redhat.com podman[1211529]: b619b974e5cb4330c628c8ee5f2bb3dadd8ef8cf9e56e48e300f7c6f8120d5ba
Jun 16 09:09:54 cnfdf05.telco5gran.eng.rdu2.redhat.com systemd[1]: Started Ironic Agent.
Jun 16 09:09:54 cnfdf05.telco5gran.eng.rdu2.redhat.com podman[1211581]: Error: error creating container storage: the container name "ironic-agent" is already in use by "6d87213ed9c456f20ae70dee343caf377dc0ba217a63f4a2e9d2b13a9faa06ca". You have to remove that container to be able to reuse that name.: that name is already in use
Jun 16 09:09:54 cnfdf05.telco5gran.eng.rdu2.redhat.com systemd[1]: ironic-agent.service: Main process exited, code=exited, status=125/n/a
Jun 16 09:09:54 cnfdf05.telco5gran.eng.rdu2.redhat.com systemd[1]: ironic-agent.service: Failed with result 'exit-code'.
Jun 16 09:09:59 cnfdf05.telco5gran.eng.rdu2.redhat.com systemd[1]: ironic-agent.service: Service RestartSec=5s expired, scheduling restart.
Jun 16 09:09:59 cnfdf05.telco5gran.eng.rdu2.redhat.com systemd[1]: ironic-agent.service: Scheduled restart job, restart counter is at 11097.
Jun 16 09:09:59 cnfdf05.telco5gran.eng.rdu2.redhat.com systemd[1]: Stopped Ironic Agent.
Jun 16 09:09:59 cnfdf05.telco5gran.eng.rdu2.redhat.com systemd[1]: Starting Ironic Agent...

And container:

$ sudo podman ps -a
CONTAINER ID  IMAGE  COMMAND  CREATED  STATUS  PORTS  NAMES
6d87213ed9c4  quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:a2e56178af2677f1e825a1ede3d8ea80f34d6a89d203f083a5840ebf6abd3e17  16 hours ago  Exited (1) 16 hours ago  ironic-agent

Version-Release number of selected component (if applicable):
4.11 nightly

How reproducible:
always

Steps to Reproduce:
1. Start introspection and make it fail; the ironic-agent service then keeps trying to restart.

Actual results:
ironic-agent cannot restart

Expected results:
ironic-agent restarts successfully
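
For anyone triaging similar journals, here is a minimal Python sketch that pulls the conflicting container name and ID out of the podman error line quoted above, and turns the restart counter into a rough lower bound on how long the loop has been running. The function names and regular expressions are illustrative assumptions, not part of podman or systemd, and the estimate assumes every restart cycle takes at least RestartSec.

```python
import re
from typing import Optional, Tuple

# Matches the podman "name is already in use" error as it appears in the log.
NAME_IN_USE = re.compile(
    r'the container name "(?P<name>[^"]+)" is already in use by "(?P<cid>[0-9a-f]+)"'
)

# Matches the systemd "restart counter is at N" message.
RESTART_COUNTER = re.compile(r"restart counter is at (?P<count>\d+)")


def find_name_conflict(line: str) -> Optional[Tuple[str, str]]:
    """Return (container_name, blocking_container_id), or None if no match."""
    m = NAME_IN_USE.search(line)
    return (m.group("name"), m.group("cid")) if m else None


def find_restart_count(line: str) -> Optional[int]:
    """Return the restart counter reported by systemd, or None if no match."""
    m = RESTART_COUNTER.search(line)
    return int(m.group("count")) if m else None


def min_loop_hours(restart_count: int, restart_sec: float = 5.0) -> float:
    """Lower bound on loop duration: each cycle waits at least restart_sec."""
    return restart_count * restart_sec / 3600.0
```

With the values from this report (counter 11096, RestartSec=5s) the lower bound is 11096 x 5 s = 55480 s, about 15.4 hours, which is consistent with the "16 hours ago" shown by podman ps -a.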
Issue verified:

(.venv) [kni@provisionhost-0-0 ocp-edge-auto]$ oc get clusterversion
NAME     VERSION                             AVAILABLE  PROGRESSING  SINCE  STATUS
version  4.11.0-0.nightly-2022-06-25-132614  True       False        59m    Cluster version is 4.11.0-0.nightly-2022-06-25-132614

How verified:
1) Manually scaled up worker openshift-worker-0-2/
2) Watched the logs for introspection and stopped the process
3) adding metal3 logs here.
4) ironic continued, as you can see from the logs; no leftover exited container

(.venv) [kni@provisionhost-0-0 ocp-edge-auto]$ sudo podman ps --all
CONTAINER ID  IMAGE  COMMAND  CREATED  STATUS  PORTS  NAMES
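
The "no leftover container" check in step 4 could be scripted roughly as follows. This is only a sketch: it assumes the default table output of podman ps --all, with a single header line and NAMES as the last column, and the helper name is hypothetical.

```python
def has_leftover_container(ps_output: str, name: str = "ironic-agent") -> bool:
    """Return True if any row of `podman ps --all` table output ends with `name`.

    Assumes the first non-empty line is the column header and that the
    container name is the last whitespace-separated field of each row.
    """
    rows = [ln for ln in ps_output.splitlines() if ln.strip()]
    for row in rows[1:]:  # skip the header line
        if row.split()[-1] == name:
            return True
    return False
```

Fed the header-only output from the verification run it returns False; fed the output from the original description, where the exited ironic-agent container is still listed, it returns True.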
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:5069