Bug 2098627 - Ironic-agent can't restart because of existing container
Summary: Ironic-agent can't restart because of existing container
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Bare Metal Hardware Provisioning
Version: 4.10
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.10.z
Assignee: Riccardo Pittau
QA Contact: Amit Ugol
URL:
Whiteboard:
Depends On: 2097685
Blocks:
Reported: 2022-06-20 08:30 UTC by Riccardo Pittau
Modified: 2022-07-20 07:46 UTC
CC List: 3 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 2097685
Environment:
Last Closed: 2022-07-20 07:46:10 UTC
Target Upstream Version:



Links
System                  ID                                                Private  Priority  Status  Summary                                    Last Updated
Github                  openshift image-customization-controller pull 56  0        None      open    Bug 2098627: Remove the container at exit  2022-06-22 09:50:34 UTC
Red Hat Product Errata  RHBA-2022:5568                                    0        None      None    None                                       2022-07-20 07:46:33 UTC

Description Riccardo Pittau 2022-06-20 08:30:23 UTC
+++ This bug was initially created as a clone of Bug #2097685 +++

Description of problem:

I think we have a bug with the Ironic agent restarting on the node: it may not be removing the old container, so each restart attempt fails.
I think it's a continuation of https://github.com/openshift/image-customization-controller/pull/34

Log:
Jun 16 09:09:48 cnfdf05.telco5gran.eng.rdu2.redhat.com podman[1211421]: b619b974e5cb4330c628c8ee5f2bb3dadd8ef8cf9e56e48e300f7c6f8120d5ba
Jun 16 09:09:48 cnfdf05.telco5gran.eng.rdu2.redhat.com systemd[1]: Started Ironic Agent.
Jun 16 09:09:49 cnfdf05.telco5gran.eng.rdu2.redhat.com podman[1211472]: Error: error creating container storage: the container name "ironic-agent" is already in use by "6d87213ed9c456f20ae70dee343caf377dc0ba217a63f4a2e9d2b13a9faa06ca". You have to remove that container to be able to reuse that name.: that name is already in use
Jun 16 09:09:49 cnfdf05.telco5gran.eng.rdu2.redhat.com systemd[1]: ironic-agent.service: Main process exited, code=exited, status=125/n/a
Jun 16 09:09:49 cnfdf05.telco5gran.eng.rdu2.redhat.com systemd[1]: ironic-agent.service: Failed with result 'exit-code'.
Jun 16 09:09:54 cnfdf05.telco5gran.eng.rdu2.redhat.com systemd[1]: ironic-agent.service: Service RestartSec=5s expired, scheduling restart.
Jun 16 09:09:54 cnfdf05.telco5gran.eng.rdu2.redhat.com systemd[1]: ironic-agent.service: Scheduled restart job, restart counter is at 11096.
Jun 16 09:09:54 cnfdf05.telco5gran.eng.rdu2.redhat.com systemd[1]: Stopped Ironic Agent.
Jun 16 09:09:54 cnfdf05.telco5gran.eng.rdu2.redhat.com systemd[1]: Starting Ironic Agent...
Jun 16 09:09:54 cnfdf05.telco5gran.eng.rdu2.redhat.com podman[1211529]: Trying to pull quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:a2e56178af2677f1e825a1ede3d8ea80f34d6a89d203f083a5840ebf6abd3e17...
Jun 16 09:09:54 cnfdf05.telco5gran.eng.rdu2.redhat.com podman[1211529]: Getting image source signatures
Jun 16 09:09:54 cnfdf05.telco5gran.eng.rdu2.redhat.com podman[1211529]: Copying blob sha256:21a86dbf0e5a8583d4e1818a201dc0fc18e9ae20a1b98a71e43f9b60fc543466
Jun 16 09:09:54 cnfdf05.telco5gran.eng.rdu2.redhat.com podman[1211529]: Copying blob sha256:545277d800059b32cf03377a9301094e9ac8aa4bb42d809766d7355ca9aa8652
Jun 16 09:09:54 cnfdf05.telco5gran.eng.rdu2.redhat.com podman[1211529]: Copying blob sha256:de516cc59493b713e0b33a4954f7eb500383e59642e2897d02e63992d4576720
Jun 16 09:09:54 cnfdf05.telco5gran.eng.rdu2.redhat.com podman[1211529]: Copying blob sha256:f70d60810c69edad990aaf0977a87c6d2bcc9cd52904fa6825f08507a9b6e7bc
Jun 16 09:09:54 cnfdf05.telco5gran.eng.rdu2.redhat.com podman[1211529]: Copying config sha256:b619b974e5cb4330c628c8ee5f2bb3dadd8ef8cf9e56e48e300f7c6f8120d5ba
Jun 16 09:09:54 cnfdf05.telco5gran.eng.rdu2.redhat.com podman[1211529]: Writing manifest to image destination
Jun 16 09:09:54 cnfdf05.telco5gran.eng.rdu2.redhat.com podman[1211529]: Storing signatures
Jun 16 09:09:54 cnfdf05.telco5gran.eng.rdu2.redhat.com podman[1211529]: b619b974e5cb4330c628c8ee5f2bb3dadd8ef8cf9e56e48e300f7c6f8120d5ba
Jun 16 09:09:54 cnfdf05.telco5gran.eng.rdu2.redhat.com systemd[1]: Started Ironic Agent.
Jun 16 09:09:54 cnfdf05.telco5gran.eng.rdu2.redhat.com podman[1211581]: Error: error creating container storage: the container name "ironic-agent" is already in use by "6d87213ed9c456f20ae70dee343caf377dc0ba217a63f4a2e9d2b13a9faa06ca". You have to remove that container to be able to reuse that name.: that name is already in use
Jun 16 09:09:54 cnfdf05.telco5gran.eng.rdu2.redhat.com systemd[1]: ironic-agent.service: Main process exited, code=exited, status=125/n/a
Jun 16 09:09:54 cnfdf05.telco5gran.eng.rdu2.redhat.com systemd[1]: ironic-agent.service: Failed with result 'exit-code'.
Jun 16 09:09:59 cnfdf05.telco5gran.eng.rdu2.redhat.com systemd[1]: ironic-agent.service: Service RestartSec=5s expired, scheduling restart.
Jun 16 09:09:59 cnfdf05.telco5gran.eng.rdu2.redhat.com systemd[1]: ironic-agent.service: Scheduled restart job, restart counter is at 11097.
Jun 16 09:09:59 cnfdf05.telco5gran.eng.rdu2.redhat.com systemd[1]: Stopped Ironic Agent.
Jun 16 09:09:59 cnfdf05.telco5gran.eng.rdu2.redhat.com systemd[1]: Starting Ironic Agent...

And the leftover container:

$ sudo podman ps -a
CONTAINER ID  IMAGE                                                                                                                   COMMAND     CREATED       STATUS                   PORTS       NAMES
6d87213ed9c4  quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:a2e56178af2677f1e825a1ede3d8ea80f34d6a89d203f083a5840ebf6abd3e17              16 hours ago  Exited (1) 16 hours ago              ironic-agent
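
The conflict shown above can also be picked out of the journal mechanically. The following is a minimal sketch, not part of the fix: the function names and the SAMPLE list are hypothetical, and the only inputs are the message formats quoted in the log above. It extracts the container name and the ID that holds it, reads the systemd restart counter, and estimates how long the loop has been running, assuming each cycle costs about RestartSec=5s and ignoring start-up time.

```python
import re

# Podman's name-conflict error, as it appears in the journal above.
CONFLICT_RE = re.compile(
    r'the container name "(?P<name>[^"]+)" is already in use by "(?P<cid>[0-9a-f]+)"'
)
# systemd's restart counter line.
COUNTER_RE = re.compile(r"restart counter is at (?P<count>\d+)")


def find_conflict(lines):
    """Return (container_name, container_id) for the first conflict, or None."""
    for line in lines:
        match = CONFLICT_RE.search(line)
        if match:
            return match.group("name"), match.group("cid")
    return None


def last_restart_count(lines):
    """Return the highest restart counter reported by systemd, or None."""
    counts = [int(m.group("count")) for m in map(COUNTER_RE.search, lines) if m]
    return max(counts) if counts else None


def estimated_loop_hours(restart_count, restart_sec=5.0):
    """Rough lower bound on loop duration: restarts times the restart delay."""
    return restart_count * restart_sec / 3600.0


# Two lines copied from the log above (hostname and timestamp dropped).
SAMPLE = [
    'podman[1211581]: Error: error creating container storage: the container '
    'name "ironic-agent" is already in use by '
    '"6d87213ed9c456f20ae70dee343caf377dc0ba217a63f4a2e9d2b13a9faa06ca". '
    "You have to remove that container to be able to reuse that name.: "
    "that name is already in use",
    "systemd[1]: ironic-agent.service: Scheduled restart job, "
    "restart counter is at 11097.",
]

if __name__ == "__main__":
    print(find_conflict(SAMPLE))
    print(last_restart_count(SAMPLE))
    print(round(estimated_loop_hours(last_restart_count(SAMPLE)), 1))
```

With a counter of 11097 the estimate comes out at roughly 15.4 hours, which is in line with the "Exited (1) 16 hours ago" status of container 6d87213ed9c4, so the service has probably been failing on the same leftover container since its first exit.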


Version-Release number of selected component (if applicable):
4.11 nightly

How reproducible:

always

Steps to Reproduce:
1. Start introspection and make it fail; the ironic-agent service then keeps trying to restart.


Actual results:
ironic-agent can't restart

Expected results:
ironic-agent restarts successfully

--- Additional comment from Riccardo Pittau on 2022-06-16 11:28:17 UTC ---

This was discussed on Slack: https://coreos.slack.com/archives/CFP6ST0A3/p1655371117596509

--- Additional comment from OpenShift Automated Release Tooling on 2022-06-17 20:11:01 UTC ---

Elliott changed bug status from MODIFIED to ON_QA.
This bug is expected to ship in the next 4.11 release.

Comment 5 Eldar Weiss 2022-07-12 09:50:59 UTC
The issue seems to be resolved:
(.venv) [kni@titan52 ocp-edge-auto]$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.10.0-0.nightly-2022-06-08-150219   True        False         25h     Cluster version is 4.10.0-0.nightly-2022-06-08-150219
(.venv) [kni@titan52 ocp-edge-auto]$ oc ^C
(.venv) [kni@titan52 ocp-edge-auto]$ sudo podman ps -a
CONTAINER ID  IMAGE                               COMMAND               CREATED       STATUS           PORTS   NAMES
d25a3e5af285  quay.io/ocp-edge-qe/nexus3:latest6  sh -c ${SONATYPE_...  26 hours ago  Up 26 hours ago          registry
ec4e821235dd  quay.io/ocp-edge-qe/httpd:latest    httpd-foreground      26 hours ago  Up 26 hours ago          image-cache


Ironic log inspection added.

Comment 9 errata-xmlrpc 2022-07-20 07:46:10 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.10.23 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:5568
