Bug 2097685 - Ironic-agent can't restart because of existing container
Summary: Ironic-agent can't restart because of existing container
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Bare Metal Hardware Provisioning
Version: 4.11
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.11.0
Assignee: Riccardo Pittau
QA Contact: Eldar Weiss
URL:
Whiteboard:
Depends On:
Blocks: 2098627
 
Reported: 2022-06-16 10:06 UTC by Sagi Shnaidman
Modified: 2022-08-10 11:18 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 2098627 (view as bug list)
Environment:
Last Closed: 2022-08-10 11:18:27 UTC
Target Upstream Version:
Embargoed:




Links
System                 | ID                                               | Private | Priority | Status | Summary                                   | Last Updated
Github                 | openshift image-customization-controller pull 55 | 0       | None     | open   | Bug 2097685: Remove the container at exit | 2022-06-16 11:33:12 UTC
Red Hat Product Errata | RHSA-2022:5069                                   | 0       | None     | None   | None                                      | 2022-08-10 11:18:38 UTC

Description Sagi Shnaidman 2022-06-16 10:06:16 UTC
Description of problem:

I think there is a bug in how the Ironic agent restarts on the node: the old container may not be removed, so the restart fails.
This looks like a follow-up to https://github.com/openshift/image-customization-controller/pull/34

Log:
Jun 16 09:09:48 cnfdf05.telco5gran.eng.rdu2.redhat.com podman[1211421]: b619b974e5cb4330c628c8ee5f2bb3dadd8ef8cf9e56e48e300f7c6f8120d5ba
Jun 16 09:09:48 cnfdf05.telco5gran.eng.rdu2.redhat.com systemd[1]: Started Ironic Agent.
Jun 16 09:09:49 cnfdf05.telco5gran.eng.rdu2.redhat.com podman[1211472]: Error: error creating container storage: the container name "ironic-agent" is already in use by "6d87213ed9c456f20ae70dee343caf377dc0ba217a63f4a2e9d2b13a9faa06ca". You have to remove that container to be able to reuse that name.: that name is already in use
Jun 16 09:09:49 cnfdf05.telco5gran.eng.rdu2.redhat.com systemd[1]: ironic-agent.service: Main process exited, code=exited, status=125/n/a
Jun 16 09:09:49 cnfdf05.telco5gran.eng.rdu2.redhat.com systemd[1]: ironic-agent.service: Failed with result 'exit-code'.
Jun 16 09:09:54 cnfdf05.telco5gran.eng.rdu2.redhat.com systemd[1]: ironic-agent.service: Service RestartSec=5s expired, scheduling restart.
Jun 16 09:09:54 cnfdf05.telco5gran.eng.rdu2.redhat.com systemd[1]: ironic-agent.service: Scheduled restart job, restart counter is at 11096.
Jun 16 09:09:54 cnfdf05.telco5gran.eng.rdu2.redhat.com systemd[1]: Stopped Ironic Agent.
Jun 16 09:09:54 cnfdf05.telco5gran.eng.rdu2.redhat.com systemd[1]: Starting Ironic Agent...
Jun 16 09:09:54 cnfdf05.telco5gran.eng.rdu2.redhat.com podman[1211529]: Trying to pull quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:a2e56178af2677f1e825a1ede3d8ea80f34d6a89d203f083a5840ebf6abd3e17...
Jun 16 09:09:54 cnfdf05.telco5gran.eng.rdu2.redhat.com podman[1211529]: Getting image source signatures
Jun 16 09:09:54 cnfdf05.telco5gran.eng.rdu2.redhat.com podman[1211529]: Copying blob sha256:21a86dbf0e5a8583d4e1818a201dc0fc18e9ae20a1b98a71e43f9b60fc543466
Jun 16 09:09:54 cnfdf05.telco5gran.eng.rdu2.redhat.com podman[1211529]: Copying blob sha256:545277d800059b32cf03377a9301094e9ac8aa4bb42d809766d7355ca9aa8652
Jun 16 09:09:54 cnfdf05.telco5gran.eng.rdu2.redhat.com podman[1211529]: Copying blob sha256:de516cc59493b713e0b33a4954f7eb500383e59642e2897d02e63992d4576720
Jun 16 09:09:54 cnfdf05.telco5gran.eng.rdu2.redhat.com podman[1211529]: Copying blob sha256:f70d60810c69edad990aaf0977a87c6d2bcc9cd52904fa6825f08507a9b6e7bc
Jun 16 09:09:54 cnfdf05.telco5gran.eng.rdu2.redhat.com podman[1211529]: Copying config sha256:b619b974e5cb4330c628c8ee5f2bb3dadd8ef8cf9e56e48e300f7c6f8120d5ba
Jun 16 09:09:54 cnfdf05.telco5gran.eng.rdu2.redhat.com podman[1211529]: Writing manifest to image destination
Jun 16 09:09:54 cnfdf05.telco5gran.eng.rdu2.redhat.com podman[1211529]: Storing signatures
Jun 16 09:09:54 cnfdf05.telco5gran.eng.rdu2.redhat.com podman[1211529]: b619b974e5cb4330c628c8ee5f2bb3dadd8ef8cf9e56e48e300f7c6f8120d5ba
Jun 16 09:09:54 cnfdf05.telco5gran.eng.rdu2.redhat.com systemd[1]: Started Ironic Agent.
Jun 16 09:09:54 cnfdf05.telco5gran.eng.rdu2.redhat.com podman[1211581]: Error: error creating container storage: the container name "ironic-agent" is already in use by "6d87213ed9c456f20ae70dee343caf377dc0ba217a63f4a2e9d2b13a9faa06ca". You have to remove that container to be able to reuse that name.: that name is already in use
Jun 16 09:09:54 cnfdf05.telco5gran.eng.rdu2.redhat.com systemd[1]: ironic-agent.service: Main process exited, code=exited, status=125/n/a
Jun 16 09:09:54 cnfdf05.telco5gran.eng.rdu2.redhat.com systemd[1]: ironic-agent.service: Failed with result 'exit-code'.
Jun 16 09:09:59 cnfdf05.telco5gran.eng.rdu2.redhat.com systemd[1]: ironic-agent.service: Service RestartSec=5s expired, scheduling restart.
Jun 16 09:09:59 cnfdf05.telco5gran.eng.rdu2.redhat.com systemd[1]: ironic-agent.service: Scheduled restart job, restart counter is at 11097.
Jun 16 09:09:59 cnfdf05.telco5gran.eng.rdu2.redhat.com systemd[1]: Stopped Ironic Agent.
Jun 16 09:09:59 cnfdf05.telco5gran.eng.rdu2.redhat.com systemd[1]: Starting Ironic Agent...
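
The loop in the log above can be picked apart mechanically. The following is a minimal sketch, not part of the original report, that extracts the conflicting container name and ID from the podman error and reads the systemd restart counter. The function names and the shortened sample lines are hypothetical; the regular expressions only assume the message wording shown in this log. The duration estimate assumes each restart cycle takes about the RestartSec=5s interval, which matches the timestamps above but is only an approximation.

```python
import re

# Wording taken from the podman error in the journal above.
NAME_IN_USE = re.compile(
    r'the container name "(?P<name>[^"]+)" is already in use by "(?P<cid>[0-9a-f]+)"'
)
# Wording taken from the systemd "Scheduled restart job" line above.
RESTART_COUNTER = re.compile(r"restart counter is at (?P<count>\d+)")


def find_name_conflict(line):
    """Return (container_name, container_id) if the line reports a name conflict."""
    match = NAME_IN_USE.search(line)
    return (match.group("name"), match.group("cid")) if match else None


def restart_count(line):
    """Return the systemd restart counter from the line, or None if absent."""
    match = RESTART_COUNTER.search(line)
    return int(match.group("count")) if match else None


def approx_hours_looping(count, restart_sec=5):
    """Rough estimate: assumes one restart every `restart_sec` seconds."""
    return count * restart_sec / 3600


# Sample lines shortened from the journal output (host name and PID dropped).
ERROR_LINE = (
    'Error: error creating container storage: the container name "ironic-agent" '
    'is already in use by '
    '"6d87213ed9c456f20ae70dee343caf377dc0ba217a63f4a2e9d2b13a9faa06ca". '
    "You have to remove that container to be able to reuse that name."
)
COUNTER_LINE = "ironic-agent.service: Scheduled restart job, restart counter is at 11096."

if __name__ == "__main__":
    print(find_name_conflict(ERROR_LINE))
    print(restart_count(COUNTER_LINE))
    print(round(approx_hours_looping(restart_count(COUNTER_LINE)), 1))
```

With a counter of 11096 and a 5 second interval, the estimate comes to roughly 15.4 hours of failed restarts.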

And the leftover container:

$ sudo podman ps -a
CONTAINER ID  IMAGE                                                                                                                   COMMAND     CREATED       STATUS                   PORTS       NAMES
6d87213ed9c4  quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:a2e56178af2677f1e825a1ede3d8ea80f34d6a89d203f083a5840ebf6abd3e17              16 hours ago  Exited (1) 16 hours ago              ironic-agent
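
The listing above shows an exited container still holding the name "ironic-agent", which is what blocks every new start. As a hedged illustration of a manual cleanup, and not the fix from pull 55 (which, per its title, removes the container at exit), the sketch below finds non-running containers with a given name in JSON produced by `podman ps --all --format json` and builds a `podman rm` command for each. The helper names are hypothetical, and the JSON field names ("Id", "Names", "State") are an assumption about podman's output that may differ between podman versions.

```python
import json


def stale_containers(ps_json, name):
    """Return IDs of containers that hold `name` but are not running.

    `ps_json` is assumed to be the output of `podman ps --all --format json`.
    """
    stale = []
    for container in json.loads(ps_json):
        names = container.get("Names") or []
        if isinstance(names, str):  # tolerate a single name given as a string
            names = [names]
        if name in names and container.get("State") != "running":
            stale.append(container["Id"])
    return stale


def removal_command(container_id):
    """Build the argv for removing one stopped container."""
    return ["podman", "rm", container_id]


# Hypothetical sample modelled on the listing above, plus one running container.
SAMPLE_PS_JSON = json.dumps([
    {"Id": "6d87213ed9c4", "Names": ["ironic-agent"], "State": "exited"},
    {"Id": "0123456789ab", "Names": ["other"], "State": "running"},
])

if __name__ == "__main__":
    for cid in stale_containers(SAMPLE_PS_JSON, "ironic-agent"):
        # Printed rather than executed, so the sketch has no side effects.
        print(" ".join(removal_command(cid)))
```

The commands are only printed; running them would need root on the affected node, as with the `sudo podman ps -a` call above.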


Version-Release number of selected component (if applicable):
4.11 nightly

How reproducible:

always

Steps to Reproduce:
1. Start introspection and make it fail; the ironic-agent service then keeps trying to restart.


Actual results:
ironic-agent cannot restart.

Expected results:
ironic-agent restarts successfully.

Comment 5 Eldar Weiss 2022-06-29 14:22:01 UTC
Issue verified:
(.venv) [kni@provisionhost-0-0 ocp-edge-auto]$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.11.0-0.nightly-2022-06-25-132614   True        False         59m     Cluster version is 4.11.0-0.nightly-2022-06-25-132614

How verified:
1) Manually scaled up worker openshift-worker-0-2.
2) Watched the logs for introspection and stopped the process.
3) Adding metal3 logs here.
4) Ironic continued, as the logs show.
No exited container was left over:

(.venv) [kni@provisionhost-0-0 ocp-edge-auto]$ sudo podman ps --all
CONTAINER ID  IMAGE   COMMAND  CREATED  STATUS  PORTS   NAMES

Comment 8 errata-xmlrpc 2022-08-10 11:18:27 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069

