Bug 2089683 - ZTP installation fails in case metal3 deployment moves to another host during the installation
Summary: ZTP installation fails in case metal3 deployment moves to another host during...
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Bare Metal Hardware Provisioning
Version: 4.11
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.11.0
Assignee: Dmitry Tantsur
QA Contact: Amit Ugol
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2022-05-24 08:58 UTC by Eran Cohen
Modified: 2022-07-14 10:01 UTC (History)

Fixed In Version:
Doc Type: Known Issue
Doc Text:
This bug means that the ZTP converged flow is experimental and disabled by default. Cause: metal3 uses the host IP of the master on which the metal3 deployment is running as the IP address of Ironic, so if the metal3 deployment moves, the Ironic IP address changes. Consequence: All registered agents are rendered useless, because the ironic agent on the node keeps trying to talk to the old IP address and can no longer reach Ironic. Workaround (if any): Result:
Clone Of:
Environment:
Last Closed: 2022-07-13 13:55:46 UTC
Target Upstream Version:
Embargoed:



Description Eran Cohen 2022-05-24 08:58:20 UTC
Description of problem:

When the Provisioning Network is not being used (this is the case in ZTP), IPA gets the host IP of the master on which the metal3 deployment is running as the IP address of Ironic.

When installing clusters with ZTP, the installation is performed by the assisted-installer-agent (the BMH has a CustomDeploy method: start_assisted_install). The ironic agent waits for the assisted-agent to complete the installation before rebooting the node.
 
Version-Release number of selected component (if applicable):
4.11

How reproducible:
100%

Steps to Reproduce:
1. Start ZTP installation with the converged flow (HUB cluster in version 4.11, and assisted-service master branch)
2. Once the installation of the nodes has started (you can check the Agent status), kill the metal3 pod until it moves to another node
3. 

Actual results:
Once the assisted-installer-agent finishes, the ironic agent can no longer talk to the Ironic service and the installation fails.

Expected results:
The ironic agent should be able to talk to the Ironic service regardless of which node the metal3 pod is running on.

Additional info:

Comment 2 tali@redhat.com 2022-06-29 14:21:33 UTC
I reproduced this issue today. I restarted the metal3 pod prior to deploying a spoke cluster, and the spoke cluster got stuck in the inspecting state.

The metal3 pod moved from master-1 to master-2 after the restart. The ironic-python-agent configuration that is built into the generated image was not updated. As a result, the agent failed to communicate with the Ironic services on the hub cluster.

On the spoke
Jun 29 13:44:55 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[2716]: 2022-06-29 13:44:55.522 1 ERROR ironic-python-agent requests.exceptions.ConnectionError: HTTPSConnectionPool(host='10.46.55.218', port=5050): Max retries exceeded with url: /v1/continue (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f3ee48ded30>: Failed to establish a new connection: [Errno 111] ECONNREFUSED',))


cat /etc/ironic-python-agent.conf

[DEFAULT]
api_url = https://10.46.55.218:6385
inspection_callback_url = https://10.46.55.218:5050/v1/continue
insecure = True


On the hub cluster
oc describe  pod -n openshift-machine-api metal3-5c95c996f-npmmc | grep Node:
Node:                 dhcp-55-185.lab.eng.tlv2.redhat.com/10.46.55.185
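
The mismatch between the two outputs above can also be checked mechanically. The following is a minimal Python sketch, not part of the original reproduction: it parses the agent configuration shown above and reports every URL option whose host differs from the IP of the node that currently runs the metal3 pod. The function name stale_endpoints and the way the values are passed in are assumptions made for illustration.

```python
import configparser
from urllib.parse import urlsplit


def stale_endpoints(conf_text, metal3_node_ip):
    """Return {option: host} for each Ironic URL in [DEFAULT] whose host
    is not the IP of the node currently running the metal3 pod."""
    # interpolation=None so that a literal '%' in a URL would not be
    # treated as configparser interpolation syntax.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    stale = {}
    for option in ("api_url", "inspection_callback_url"):
        url = parser["DEFAULT"].get(option)
        if url is None:
            continue
        host = urlsplit(url).hostname
        if host != metal3_node_ip:
            stale[option] = host
    return stale


# Contents of /etc/ironic-python-agent.conf as shown above.
AGENT_CONF = """\
[DEFAULT]
api_url = https://10.46.55.218:6385
inspection_callback_url = https://10.46.55.218:5050/v1/continue
insecure = True
"""

if __name__ == "__main__":
    # 10.46.55.185 is the node IP reported by `oc describe pod` above.
    print(stale_endpoints(AGENT_CONF, "10.46.55.185"))
```

With the values from this comment, both options are reported as stale, because they still point at 10.46.55.218 while the pod runs on 10.46.55.185.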

Comment 3 tali@redhat.com 2022-06-29 14:38:04 UTC
The spoke cluster can reach all master IP addresses, but the services are not listening on the designated ports on the old host.
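
A check of this kind can be scripted. The sketch below is an illustration only, not the command that was actually run: it attempts a plain TCP connection to each master IP on the two ports that appear in ironic-python-agent.conf (6385 and 5050). The helper name port_is_listening and the 3-second timeout are assumptions.

```python
import socket


def port_is_listening(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Old and new metal3 host IPs taken from the comments in this bug.
    for host in ("10.46.55.218", "10.46.55.185"):
        for port in (6385, 5050):
            print(host, port, port_is_listening(host, port))
```

A host that answers pings but returns False here matches the situation described: the node is reachable, but nothing is listening on the Ironic ports.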

Comment 5 Eran Cohen 2022-06-30 05:57:04 UTC
This is the error in the ironic agent log (on the node)

Jun 29 13:44:55 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[2716]: 2022-06-29 13:44:55.522 1 CRITICAL ironic-python-agent [-] Unhandled error: requests.exceptions.ConnectionError: HTTPSConnectionPool(host='10.46.55.218', port=5050): Max retries exceeded with url: /v1/continue (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object a>
Jun 29 13:44:55 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[2716]: 2022-06-29 13:44:55.522 1 ERROR ironic-python-agent Traceback (most recent call last):
Jun 29 13:44:55 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[2716]: 2022-06-29 13:44:55.522 1 ERROR ironic-python-agent   File "/usr/lib/python3.6/site-packages/urllib3/connection.py", line 162, in _new_conn
Jun 29 13:44:55 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[2716]: 2022-06-29 13:44:55.522 1 ERROR ironic-python-agent     (self._dns_host, self.port), self.timeout, **extra_kw)
Jun 29 13:44:55 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[2716]: 2022-06-29 13:44:55.522 1 ERROR ironic-python-agent   File "/usr/lib/python3.6/site-packages/urllib3/util/connection.py", line 80, in create_connection
Jun 29 13:44:55 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[2716]: 2022-06-29 13:44:55.522 1 ERROR ironic-python-agent     raise err
Jun 29 13:44:55 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[2716]: 2022-06-29 13:44:55.522 1 ERROR ironic-python-agent   File "/usr/lib/python3.6/site-packages/urllib3/util/connection.py", line 70, in create_connection
Jun 29 13:44:55 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[2716]: 2022-06-29 13:44:55.522 1 ERROR ironic-python-agent     sock.connect(sa)
Jun 29 13:44:55 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[2716]: 2022-06-29 13:44:55.522 1 ERROR ironic-python-agent   File "/usr/lib/python3.6/site-packages/eventlet/greenio/base.py", line 253, in connect
Jun 29 13:44:55 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[2716]: 2022-06-29 13:44:55.522 1 ERROR ironic-python-agent     socket_checkerr(fd)
Jun 29 13:44:55 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[2716]: 2022-06-29 13:44:55.522 1 ERROR ironic-python-agent   File "/usr/lib/python3.6/site-packages/eventlet/greenio/base.py", line 51, in socket_checkerr
Jun 29 13:44:55 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[2716]: 2022-06-29 13:44:55.522 1 ERROR ironic-python-agent     raise socket.error(err, errno.errorcode[err])
Jun 29 13:44:55 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[2716]: 2022-06-29 13:44:55.522 1 ERROR ironic-python-agent ConnectionRefusedError: [Errno 111] ECONNREFUSED
Jun 29 13:44:55 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[2716]: 2022-06-29 13:44:55.522 1 ERROR ironic-python-agent
Jun 29 13:44:55 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[2716]: 2022-06-29 13:44:55.522 1 ERROR ironic-python-agent During handling of the above exception, another exception occurred:
Jun 29 13:44:55 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[2716]: 2022-06-29 13:44:55.522 1 ERROR ironic-python-agent
Jun 29 13:44:55 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[2716]: 2022-06-29 13:44:55.522 1 ERROR ironic-python-agent Traceback (most recent call last):
Jun 29 13:44:55 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[2716]: 2022-06-29 13:44:55.522 1 ERROR ironic-python-agent   File "/usr/lib/python3.6/site-packages/urllib3/connectionpool.py", line 600, in urlopen
Jun 29 13:44:55 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[2716]: 2022-06-29 13:44:55.522 1 ERROR ironic-python-agent     chunked=chunked)
Jun 29 13:44:55 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[2716]: 2022-06-29 13:44:55.522 1 ERROR ironic-python-agent   File "/usr/lib/python3.6/site-packages/urllib3/connectionpool.py", line 343, in _make_request
Jun 29 13:44:55 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[2716]: 2022-06-29 13:44:55.522 1 ERROR ironic-python-agent     self._validate_conn(conn)
Jun 29 13:44:55 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[2716]: 2022-06-29 13:44:55.522 1 ERROR ironic-python-agent   File "/usr/lib/python3.6/site-packages/urllib3/connectionpool.py", line 839, in _validate_conn
Jun 29 13:44:55 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[2716]: 2022-06-29 13:44:55.522 1 ERROR ironic-python-agent     conn.connect()
Jun 29 13:44:55 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[2716]: 2022-06-29 13:44:55.522 1 ERROR ironic-python-agent   File "/usr/lib/python3.6/site-packages/urllib3/connection.py", line 315, in connect
Jun 29 13:44:55 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[2716]: 2022-06-29 13:44:55.522 1 ERROR ironic-python-agent     conn = self._new_conn()
Jun 29 13:44:55 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[2716]: 2022-06-29 13:44:55.522 1 ERROR ironic-python-agent   File "/usr/lib/python3.6/site-packages/urllib3/connection.py", line 171, in _new_conn
Jun 29 13:44:55 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[2716]: 2022-06-29 13:44:55.522 1 ERROR ironic-python-agent     self, "Failed to establish a new connection: %s" % e)
Jun 29 13:44:55 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[2716]: 2022-06-29 13:44:55.522 1 ERROR ironic-python-agent urllib3.exceptions.NewConnectionError: <urllib3.connection.VerifiedHTTPSConnection object at 0x7f3ee48ded30>: Failed to establish a new connection: [Errno 111] ECONNREFUSED
Jun 29 13:44:55 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[2716]: 2022-06-29 13:44:55.522 1 ERROR ironic-python-agent
Jun 29 13:44:55 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[2716]: 2022-06-29 13:44:55.522 1 ERROR ironic-python-agent During handling of the above exception, another exception occurred:
Jun 29 13:44:55 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[2716]: 2022-06-29 13:44:55.522 1 ERROR ironic-python-agent
Jun 29 13:44:55 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[2716]: 2022-06-29 13:44:55.522 1 ERROR ironic-python-agent Traceback (most recent call last):
Jun 29 13:44:55 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[2716]: 2022-06-29 13:44:55.522 1 ERROR ironic-python-agent   File "/usr/lib/python3.6/site-packages/requests/adapters.py", line 449, in send
Jun 29 13:44:55 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[2716]: 2022-06-29 13:44:55.522 1 ERROR ironic-python-agent     timeout=timeout
Jun 29 13:44:55 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[2716]: 2022-06-29 13:44:55.522 1 ERROR ironic-python-agent   File "/usr/lib/python3.6/site-packages/urllib3/connectionpool.py", line 638, in urlopen
Jun 29 13:44:55 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[2716]: 2022-06-29 13:44:55.522 1 ERROR ironic-python-agent     _stacktrace=sys.exc_info()[2])
Jun 29 13:44:55 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[2716]: 2022-06-29 13:44:55.522 1 ERROR ironic-python-agent   File "/usr/lib/python3.6/site-packages/urllib3/util/retry.py", line 399, in increment
Jun 29 13:44:55 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[2716]: 2022-06-29 13:44:55.522 1 ERROR ironic-python-agent     raise MaxRetryError(_pool, url, error or ResponseError(cause))
Jun 29 13:44:55 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[2716]: 2022-06-29 13:44:55.522 1 ERROR ironic-python-agent urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='10.46.55.218', port=5050): Max retries exceeded with url: /v1/continue (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f3ee48ded30>: Failed t>
Jun 29 13:44:55 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[2716]: 2022-06-29 13:44:55.522 1 ERROR ironic-python-agent
Jun 29 13:44:55 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[2716]: 2022-06-29 13:44:55.522 1 ERROR ironic-python-agent During handling of the above exception, another exception occurred:
Jun 29 13:44:55 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[2716]: 2022-06-29 13:44:55.522 1 ERROR ironic-python-agent
Jun 29 13:44:55 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[2716]: 2022-06-29 13:44:55.522 1 ERROR ironic-python-agent Traceback (most recent call last):
Jun 29 13:44:55 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[2716]: 2022-06-29 13:44:55.522 1 ERROR ironic-python-agent   File "/usr/bin/ironic-python-agent", line 10, in <module>
Jun 29 13:44:55 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[2716]: 2022-06-29 13:44:55.522 1 ERROR ironic-python-agent     sys.exit(run())
Jun 29 13:44:55 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[2716]: 2022-06-29 13:44:55.522 1 ERROR ironic-python-agent   File "/usr/lib/python3.6/site-packages/ironic_python_agent/cmd/agent.py", line 63, in run
Jun 29 13:44:55 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[2716]: 2022-06-29 13:44:55.522 1 ERROR ironic-python-agent     CONF.advertise_protocol).run()
Jun 29 13:44:55 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[2716]: 2022-06-29 13:44:55.522 1 ERROR ironic-python-agent   File "/usr/lib/python3.6/site-packages/ironic_python_agent/agent.py", line 471, in run
Jun 29 13:44:55 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[2716]: 2022-06-29 13:44:55.522 1 ERROR ironic-python-agent     uuid = inspector.inspect()
Jun 29 13:44:55 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[2716]: 2022-06-29 13:44:55.522 1 ERROR ironic-python-agent   File "/usr/lib/python3.6/site-packages/ironic_python_agent/inspector.py", line 106, in inspect
Jun 29 13:44:55 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[2716]: 2022-06-29 13:44:55.522 1 ERROR ironic-python-agent     resp = call_inspector(data, failures)
Jun 29 13:44:55 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[2716]: 2022-06-29 13:44:55.522 1 ERROR ironic-python-agent   File "/usr/lib/python3.6/site-packages/ironic_python_agent/inspector.py", line 145, in call_inspector
Jun 29 13:44:55 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[2716]: 2022-06-29 13:44:55.522 1 ERROR ironic-python-agent     resp = _post_to_inspector()
Jun 29 13:44:55 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[2716]: 2022-06-29 13:44:55.522 1 ERROR ironic-python-agent   File "/usr/lib/python3.6/site-packages/tenacity/__init__.py", line 329, in wrapped_f
Jun 29 13:44:55 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[2716]: 2022-06-29 13:44:55.522 1 ERROR ironic-python-agent     return self.call(f, *args, **kw)
Jun 29 13:44:55 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[2716]: 2022-06-29 13:44:55.522 1 ERROR ironic-python-agent   File "/usr/lib/python3.6/site-packages/tenacity/__init__.py", line 409, in call
Jun 29 13:44:55 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[2716]: 2022-06-29 13:44:55.522 1 ERROR ironic-python-agent     do = self.iter(retry_state=retry_state)
Jun 29 13:44:55 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[2716]: 2022-06-29 13:44:55.522 1 ERROR ironic-python-agent   File "/usr/lib/python3.6/site-packages/tenacity/__init__.py", line 368, in iter
Jun 29 13:44:55 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[2716]: 2022-06-29 13:44:55.522 1 ERROR ironic-python-agent     raise retry_exc.reraise()
Jun 29 13:44:55 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[2716]: 2022-06-29 13:44:55.522 1 ERROR ironic-python-agent   File "/usr/lib/python3.6/site-packages/tenacity/__init__.py", line 186, in reraise


After a few retries it ends up with this:

Jun 29 13:44:59 api.cnfde14.ptp.lab.eng.bos.redhat.com systemd[1]: Started Ironic Agent.
Jun 29 13:44:59 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[3601]: Error: error creating container storage: the container name "ironic-agent" is already in use by "b105987a980d874b5557f9dcb3ce5f6974f0175ff744e0673861f2fcdb9148f7". You have to remove that container to be able to reuse that name.: that name is alre>
Jun 29 13:44:59 api.cnfde14.ptp.lab.eng.bos.redhat.com systemd[1]: ironic-agent.service: Main process exited, code=exited, status=125/n/a
Jun 29 13:44:59 api.cnfde14.ptp.lab.eng.bos.redhat.com systemd[1]: ironic-agent.service: Failed with result 'exit-code'.
Jun 29 13:44:59 api.cnfde14.ptp.lab.eng.bos.redhat.com systemd[1]: ironic-agent.service: Service RestartSec=100ms expired, scheduling restart.
Jun 29 13:44:59 api.cnfde14.ptp.lab.eng.bos.redhat.com systemd[1]: ironic-agent.service: Scheduled restart job, restart counter is at 4.
Jun 29 13:44:59 api.cnfde14.ptp.lab.eng.bos.redhat.com systemd[1]: Stopped Ironic Agent.
Jun 29 13:44:59 api.cnfde14.ptp.lab.eng.bos.redhat.com systemd[1]: Starting Ironic Agent...
Jun 29 13:44:59 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[3660]: Trying to pull quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:d3f1d4d3cd5fbcf1b9249dd71d01be4b901d337fdc5f8f66569eb71df4d9d446...
Jun 29 13:45:00 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[3660]: Getting image source signatures
Jun 29 13:45:00 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[3660]: Copying blob sha256:cab00b7d011a94b724aca6be93f338a77f15620d2808dacab40a587843ec1c5b
Jun 29 13:45:00 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[3660]: Copying blob sha256:f70d60810c69edad990aaf0977a87c6d2bcc9cd52904fa6825f08507a9b6e7bc
Jun 29 13:45:00 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[3660]: Copying blob sha256:de516cc59493b713e0b33a4954f7eb500383e59642e2897d02e63992d4576720
Jun 29 13:45:00 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[3660]: Copying blob sha256:545277d800059b32cf03377a9301094e9ac8aa4bb42d809766d7355ca9aa8652
Jun 29 13:45:00 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[3660]: Copying config sha256:11f6a72ca80379a3087b32db2c2d364ae438bf0252fcd05de1a2371e7d642f15
Jun 29 13:45:00 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[3660]: Writing manifest to image destination
Jun 29 13:45:00 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[3660]: Storing signatures
Jun 29 13:45:00 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[3660]: 11f6a72ca80379a3087b32db2c2d364ae438bf0252fcd05de1a2371e7d642f15
Jun 29 13:45:00 api.cnfde14.ptp.lab.eng.bos.redhat.com systemd[1]: Started Ironic Agent.
Jun 29 13:45:00 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[3718]: Error: error creating container storage: the container name "ironic-agent" is already in use by "b105987a980d874b5557f9dcb3ce5f6974f0175ff744e0673861f2fcdb9148f7". You have to remove that container to be able to reuse that name.: that name is alre>
Jun 29 13:45:00 api.cnfde14.ptp.lab.eng.bos.redhat.com systemd[1]: ironic-agent.service: Main process exited, code=exited, status=125/n/a
Jun 29 13:45:00 api.cnfde14.ptp.lab.eng.bos.redhat.com systemd[1]: ironic-agent.service: Failed with result 'exit-code'.
Jun 29 13:45:00 api.cnfde14.ptp.lab.eng.bos.redhat.com systemd[1]: ironic-agent.service: Service RestartSec=100ms expired, scheduling restart.
Jun 29 13:45:00 api.cnfde14.ptp.lab.eng.bos.redhat.com systemd[1]: ironic-agent.service: Scheduled restart job, restart counter is at 5.
Jun 29 13:45:00 api.cnfde14.ptp.lab.eng.bos.redhat.com systemd[1]: Stopped Ironic Agent.
Jun 29 13:45:00 api.cnfde14.ptp.lab.eng.bos.redhat.com systemd[1]: Starting Ironic Agent...
Jun 29 13:45:01 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[3777]: Trying to pull quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:d3f1d4d3cd5fbcf1b9249dd71d01be4b901d337fdc5f8f66569eb71df4d9d446...
Jun 29 13:45:01 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[3777]: Getting image source signatures
Jun 29 13:45:01 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[3777]: Copying blob sha256:cab00b7d011a94b724aca6be93f338a77f15620d2808dacab40a587843ec1c5b
Jun 29 13:45:01 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[3777]: Copying blob sha256:de516cc59493b713e0b33a4954f7eb500383e59642e2897d02e63992d4576720
Jun 29 13:45:01 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[3777]: Copying blob sha256:f70d60810c69edad990aaf0977a87c6d2bcc9cd52904fa6825f08507a9b6e7bc
Jun 29 13:45:01 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[3777]: Copying blob sha256:545277d800059b32cf03377a9301094e9ac8aa4bb42d809766d7355ca9aa8652
Jun 29 13:45:01 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[3777]: Copying config sha256:11f6a72ca80379a3087b32db2c2d364ae438bf0252fcd05de1a2371e7d642f15
Jun 29 13:45:01 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[3777]: Writing manifest to image destination
Jun 29 13:45:01 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[3777]: Storing signatures
Jun 29 13:45:01 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[3777]: 11f6a72ca80379a3087b32db2c2d364ae438bf0252fcd05de1a2371e7d642f15
Jun 29 13:45:01 api.cnfde14.ptp.lab.eng.bos.redhat.com systemd[1]: Started Ironic Agent.
Jun 29 13:45:01 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[3834]: Error: error creating container storage: the container name "ironic-agent" is already in use by "b105987a980d874b5557f9dcb3ce5f6974f0175ff744e0673861f2fcdb9148f7". You have to remove that container to be able to reuse that name.: that name is alre>
Jun 29 13:45:01 api.cnfde14.ptp.lab.eng.bos.redhat.com systemd[1]: ironic-agent.service: Main process exited, code=exited, status=125/n/a
Jun 29 13:45:01 api.cnfde14.ptp.lab.eng.bos.redhat.com systemd[1]: ironic-agent.service: Failed with result 'exit-code'.
Jun 29 13:45:02 api.cnfde14.ptp.lab.eng.bos.redhat.com systemd[1]: ironic-agent.service: Service RestartSec=100ms expired, scheduling restart.
Jun 29 13:45:02 api.cnfde14.ptp.lab.eng.bos.redhat.com systemd[1]: ironic-agent.service: Scheduled restart job, restart counter is at 6.
Jun 29 13:45:02 api.cnfde14.ptp.lab.eng.bos.redhat.com systemd[1]: Stopped Ironic Agent.
Jun 29 13:45:02 api.cnfde14.ptp.lab.eng.bos.redhat.com systemd[1]: ironic-agent.service: Start request repeated too quickly.
Jun 29 13:45:02 api.cnfde14.ptp.lab.eng.bos.redhat.com systemd[1]: ironic-agent.service: Failed with result 'exit-code'.
Jun 29 13:45:02 api.cnfde14.ptp.lab.eng.bos.redhat.com systemd[1]: Failed to start Ironic Agent.

Comment 6 Eran Cohen 2022-06-30 06:00:48 UTC
It seems that when the ironic agent crashes, nothing cleans up its container, and a new instance can't start because the name is already in use...
[root@api core]# podman ps -a
CONTAINER ID  IMAGE                                                                                                                   COMMAND     CREATED       STATUS                   PORTS       NAMES
b105987a980d  quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:d3f1d4d3cd5fbcf1b9249dd71d01be4b901d337fdc5f8f66569eb71df4d9d446              16 hours ago  Exited (1) 16 hours ago              ironic-agent
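
The same conclusion can be drawn from the journal alone. As a rough sketch (the regular expression and the function name blocking_containers are assumptions, written against the podman error text quoted in the previous comment), the following Python snippet extracts which container ID is holding a given name:

```python
import re

# Matches the podman error text seen in the journal output above.
NAME_IN_USE = re.compile(
    r'the container name "(?P<name>[^"]+)" is already in use by "(?P<cid>[0-9a-f]+)"'
)


def blocking_containers(journal_lines):
    """Map each container name to the set of container IDs that hold it."""
    found = {}
    for line in journal_lines:
        match = NAME_IN_USE.search(line)
        if match:
            found.setdefault(match.group("name"), set()).add(match.group("cid"))
    return found


# One of the journal lines from the previous comment (truncated there as well).
SAMPLE = [
    'Jun 29 13:44:59 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[3601]: '
    'Error: error creating container storage: the container name "ironic-agent" '
    'is already in use by '
    '"b105987a980d874b5557f9dcb3ce5f6974f0175ff744e0673861f2fcdb9148f7". '
    'You have to remove that container to be able to reuse that name.',
]

if __name__ == "__main__":
    print(blocking_containers(SAMPLE))
```

The ID it reports starts with b105987a980d, the exited container listed by podman ps -a above. Removing that container by hand (for example with podman rm) would presumably free the name, but that was not verified here.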

Comment 7 Dmitry Tantsur 2022-07-04 12:27:21 UTC
> It seems that when the ironic agent crashes, nothing cleans up its container, and a new instance can't start because the name is already in use...

This is being fixed: https://github.com/openshift/image-customization-controller/pull/56

Comment 8 Dmitry Tantsur 2022-07-13 13:55:46 UTC
Thanks for the reproducer! Because of the amount of effort required, this will be tracked as an epic (https://issues.redhat.com/browse/METAL-256 for those who have access).

