Description of problem:
When the Provisioning Network is not being used (this is the case in ZTP), IPA gets the host IP of the master on which the metal3 deployment is running as the IP address of Ironic. When installing clusters with ZTP, the installation is performed by the assisted-installer-agent (the BMH has a CustomDeploy method: start_assisted_install), and the ironic agent waits for the assisted-agent to complete the installation before rebooting the node.

Version-Release number of selected component (if applicable):
4.11

How reproducible:
100%

Steps to Reproduce:
1. Start a ZTP installation with the converged flow (hub cluster at version 4.11, assisted-service on the master branch).
2. Once the node installation has started (you can check the Agent status), kill the metal3 pod until it moves to another node.

Actual results:
Once the assisted-installer-agent finishes, the ironic agent can no longer talk to the Ironic service and the installation fails.

Expected results:
The ironic agent should be able to talk to the Ironic service regardless of which node the pod is running on.

Additional info:
I reproduced this issue today. I restarted the metal3 pod prior to deploying a spoke cluster, and the spoke cluster got stuck in the inspecting state. The metal3 pod moved from master-1 to master-2 after the restart. The ironic-python-agent that is built into the generated image was not updated. As a result, the agent failed to communicate with the Ironic services on the hub cluster.

On the spoke:

Jun 29 13:44:55 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[2716]: 2022-06-29 13:44:55.522 1 ERROR ironic-python-agent requests.exceptions.ConnectionError: HTTPSConnectionPool(host='10.46.55.218', port=5050): Max retries exceeded with url: /v1/continue (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f3ee48ded30>: Failed to establish a new connection: [Errno 111] ECONNREFUSED',))

cat /etc/ironic-python-agent.conf
[DEFAULT]
api_url = https://10.46.55.218:6385
inspection_callback_url = https://10.46.55.218:5050/v1/continue
insecure = True

On the hub cluster:

oc describe pod -n openshift-machine-api metal3-5c95c996f-npmmc | grep Node:
Node: dhcp-55-185.lab.eng.tlv2.redhat.com/10.46.55.185
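The mismatch between the agent configuration and the current metal3 node can be checked mechanically. The following is a minimal sketch, assuming the configuration file has the format quoted in this comment and that the current node IP is read from the `oc describe pod` output; the function name `stale_endpoints` is hypothetical and not part of any Ironic tooling.

```python
import configparser
from urllib.parse import urlsplit

# Values copied from this report.
AGENT_CONF = """\
[DEFAULT]
api_url = https://10.46.55.218:6385
inspection_callback_url = https://10.46.55.218:5050/v1/continue
insecure = True
"""
METAL3_NODE_IP = "10.46.55.185"


def stale_endpoints(conf_text, current_ip):
    """Return {option: host} for every URL option whose host differs from current_ip."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    stale = {}
    for option, value in parser.defaults().items():
        # Non-URL values such as "True" have no hostname and are skipped.
        host = urlsplit(value).hostname
        if host is not None and host != current_ip:
            stale[option] = host
    return stale


if __name__ == "__main__":
    for option, host in stale_endpoints(AGENT_CONF, METAL3_NODE_IP).items():
        print(f"{option} points at {host}, but metal3 runs on {METAL3_NODE_IP}")
```

With the values from this report, both `api_url` and `inspection_callback_url` are flagged because they still reference 10.46.55.218.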
The spoke cluster can reach all master IP addresses, but the services are not listening on the designated ports on the old host.
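One way to confirm which master is actually serving Ironic is to probe the two ports from the spoke. This is a rough sketch, assuming the ports are the ones in the agent configuration quoted in this report (6385 for the API, 5050 for the inspection callback); the helper names `is_listening` and `probe` are made up for illustration.

```python
import socket

# Ports taken from the agent configuration quoted in this report.
IRONIC_API_PORT = 6385
INSPECTOR_PORT = 5050


def is_listening(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers ECONNREFUSED, timeouts and unreachable hosts alike.
        return False


def probe(hosts, ports=(IRONIC_API_PORT, INSPECTOR_PORT)):
    """Map each (host, port) pair to whether something accepts connections there."""
    return {(h, p): is_listening(h, p) for h in hosts for p in ports}


if __name__ == "__main__":
    # The two addresses that appear in this report; add the other masters as needed.
    for (host, port), ok in probe(["10.46.55.218", "10.46.55.185"]).items():
        print(f"{host}:{port} {'open' if ok else 'closed or unreachable'}")
```

A plain TCP connect is enough here because the failure in the agent log is ECONNREFUSED, which happens before any TLS or HTTP exchange.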
This is the error in the ironic agent log (on the node):

Jun 29 13:44:55 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[2716]: 2022-06-29 13:44:55.522 1 CRITICAL ironic-python-agent [-] Unhandled error: requests.exceptions.ConnectionError: HTTPSConnectionPool(host='10.46.55.218', port=5050): Max retries exceeded with url: /v1/continue (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object a>
Jun 29 13:44:55 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[2716]: 2022-06-29 13:44:55.522 1 ERROR ironic-python-agent Traceback (most recent call last):
Jun 29 13:44:55 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[2716]: 2022-06-29 13:44:55.522 1 ERROR ironic-python-agent File "/usr/lib/python3.6/site-packages/urllib3/connection.py", line 162, in _new_conn
Jun 29 13:44:55 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[2716]: 2022-06-29 13:44:55.522 1 ERROR ironic-python-agent (self._dns_host, self.port), self.timeout, **extra_kw)
Jun 29 13:44:55 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[2716]: 2022-06-29 13:44:55.522 1 ERROR ironic-python-agent File "/usr/lib/python3.6/site-packages/urllib3/util/connection.py", line 80, in create_connection
Jun 29 13:44:55 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[2716]: 2022-06-29 13:44:55.522 1 ERROR ironic-python-agent raise err
Jun 29 13:44:55 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[2716]: 2022-06-29 13:44:55.522 1 ERROR ironic-python-agent File "/usr/lib/python3.6/site-packages/urllib3/util/connection.py", line 70, in create_connection
Jun 29 13:44:55 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[2716]: 2022-06-29 13:44:55.522 1 ERROR ironic-python-agent sock.connect(sa)
Jun 29 13:44:55 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[2716]: 2022-06-29 13:44:55.522 1 ERROR ironic-python-agent File "/usr/lib/python3.6/site-packages/eventlet/greenio/base.py", line 253, in connect
Jun 29 13:44:55 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[2716]: 2022-06-29 13:44:55.522 1 ERROR ironic-python-agent socket_checkerr(fd)
Jun 29 13:44:55 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[2716]: 2022-06-29 13:44:55.522 1 ERROR ironic-python-agent File "/usr/lib/python3.6/site-packages/eventlet/greenio/base.py", line 51, in socket_checkerr
Jun 29 13:44:55 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[2716]: 2022-06-29 13:44:55.522 1 ERROR ironic-python-agent raise socket.error(err, errno.errorcode[err])
Jun 29 13:44:55 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[2716]: 2022-06-29 13:44:55.522 1 ERROR ironic-python-agent ConnectionRefusedError: [Errno 111] ECONNREFUSED
Jun 29 13:44:55 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[2716]: 2022-06-29 13:44:55.522 1 ERROR ironic-python-agent
Jun 29 13:44:55 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[2716]: 2022-06-29 13:44:55.522 1 ERROR ironic-python-agent During handling of the above exception, another exception occurred:
Jun 29 13:44:55 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[2716]: 2022-06-29 13:44:55.522 1 ERROR ironic-python-agent
Jun 29 13:44:55 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[2716]: 2022-06-29 13:44:55.522 1 ERROR ironic-python-agent Traceback (most recent call last):
Jun 29 13:44:55 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[2716]: 2022-06-29 13:44:55.522 1 ERROR ironic-python-agent File "/usr/lib/python3.6/site-packages/urllib3/connectionpool.py", line 600, in urlopen
Jun 29 13:44:55 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[2716]: 2022-06-29 13:44:55.522 1 ERROR ironic-python-agent chunked=chunked)
Jun 29 13:44:55 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[2716]: 2022-06-29 13:44:55.522 1 ERROR ironic-python-agent File "/usr/lib/python3.6/site-packages/urllib3/connectionpool.py", line 343, in _make_request
Jun 29 13:44:55 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[2716]: 2022-06-29 13:44:55.522 1 ERROR ironic-python-agent self._validate_conn(conn)
Jun 29 13:44:55 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[2716]: 2022-06-29 13:44:55.522 1 ERROR ironic-python-agent File "/usr/lib/python3.6/site-packages/urllib3/connectionpool.py", line 839, in _validate_conn
Jun 29 13:44:55 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[2716]: 2022-06-29 13:44:55.522 1 ERROR ironic-python-agent conn.connect()
Jun 29 13:44:55 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[2716]: 2022-06-29 13:44:55.522 1 ERROR ironic-python-agent File "/usr/lib/python3.6/site-packages/urllib3/connection.py", line 315, in connect
Jun 29 13:44:55 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[2716]: 2022-06-29 13:44:55.522 1 ERROR ironic-python-agent conn = self._new_conn()
Jun 29 13:44:55 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[2716]: 2022-06-29 13:44:55.522 1 ERROR ironic-python-agent File "/usr/lib/python3.6/site-packages/urllib3/connection.py", line 171, in _new_conn
Jun 29 13:44:55 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[2716]: 2022-06-29 13:44:55.522 1 ERROR ironic-python-agent self, "Failed to establish a new connection: %s" % e)
Jun 29 13:44:55 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[2716]: 2022-06-29 13:44:55.522 1 ERROR ironic-python-agent urllib3.exceptions.NewConnectionError: <urllib3.connection.VerifiedHTTPSConnection object at 0x7f3ee48ded30>: Failed to establish a new connection: [Errno 111] ECONNREFUSED
Jun 29 13:44:55 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[2716]: 2022-06-29 13:44:55.522 1 ERROR ironic-python-agent
Jun 29 13:44:55 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[2716]: 2022-06-29 13:44:55.522 1 ERROR ironic-python-agent During handling of the above exception, another exception occurred:
Jun 29 13:44:55 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[2716]: 2022-06-29 13:44:55.522 1 ERROR ironic-python-agent
Jun 29 13:44:55 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[2716]: 2022-06-29 13:44:55.522 1 ERROR ironic-python-agent Traceback (most recent call last):
Jun 29 13:44:55 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[2716]: 2022-06-29 13:44:55.522 1 ERROR ironic-python-agent File "/usr/lib/python3.6/site-packages/requests/adapters.py", line 449, in send
Jun 29 13:44:55 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[2716]: 2022-06-29 13:44:55.522 1 ERROR ironic-python-agent timeout=timeout
Jun 29 13:44:55 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[2716]: 2022-06-29 13:44:55.522 1 ERROR ironic-python-agent File "/usr/lib/python3.6/site-packages/urllib3/connectionpool.py", line 638, in urlopen
Jun 29 13:44:55 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[2716]: 2022-06-29 13:44:55.522 1 ERROR ironic-python-agent _stacktrace=sys.exc_info()[2])
Jun 29 13:44:55 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[2716]: 2022-06-29 13:44:55.522 1 ERROR ironic-python-agent File "/usr/lib/python3.6/site-packages/urllib3/util/retry.py", line 399, in increment
Jun 29 13:44:55 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[2716]: 2022-06-29 13:44:55.522 1 ERROR ironic-python-agent raise MaxRetryError(_pool, url, error or ResponseError(cause))
Jun 29 13:44:55 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[2716]: 2022-06-29 13:44:55.522 1 ERROR ironic-python-agent urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='10.46.55.218', port=5050): Max retries exceeded with url: /v1/continue (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f3ee48ded30>: Failed t>
Jun 29 13:44:55 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[2716]: 2022-06-29 13:44:55.522 1 ERROR ironic-python-agent
Jun 29 13:44:55 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[2716]: 2022-06-29 13:44:55.522 1 ERROR ironic-python-agent During handling of the above exception, another exception occurred:
Jun 29 13:44:55 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[2716]: 2022-06-29 13:44:55.522 1 ERROR ironic-python-agent
Jun 29 13:44:55 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[2716]: 2022-06-29 13:44:55.522 1 ERROR ironic-python-agent Traceback (most recent call last):
Jun 29 13:44:55 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[2716]: 2022-06-29 13:44:55.522 1 ERROR ironic-python-agent File "/usr/bin/ironic-python-agent", line 10, in <module>
Jun 29 13:44:55 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[2716]: 2022-06-29 13:44:55.522 1 ERROR ironic-python-agent sys.exit(run())
Jun 29 13:44:55 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[2716]: 2022-06-29 13:44:55.522 1 ERROR ironic-python-agent File "/usr/lib/python3.6/site-packages/ironic_python_agent/cmd/agent.py", line 63, in run
Jun 29 13:44:55 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[2716]: 2022-06-29 13:44:55.522 1 ERROR ironic-python-agent CONF.advertise_protocol).run()
Jun 29 13:44:55 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[2716]: 2022-06-29 13:44:55.522 1 ERROR ironic-python-agent File "/usr/lib/python3.6/site-packages/ironic_python_agent/agent.py", line 471, in run
Jun 29 13:44:55 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[2716]: 2022-06-29 13:44:55.522 1 ERROR ironic-python-agent uuid = inspector.inspect()
Jun 29 13:44:55 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[2716]: 2022-06-29 13:44:55.522 1 ERROR ironic-python-agent File "/usr/lib/python3.6/site-packages/ironic_python_agent/inspector.py", line 106, in inspect
Jun 29 13:44:55 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[2716]: 2022-06-29 13:44:55.522 1 ERROR ironic-python-agent resp = call_inspector(data, failures)
Jun 29 13:44:55 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[2716]: 2022-06-29 13:44:55.522 1 ERROR ironic-python-agent File "/usr/lib/python3.6/site-packages/ironic_python_agent/inspector.py", line 145, in call_inspector
Jun 29 13:44:55 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[2716]: 2022-06-29 13:44:55.522 1 ERROR ironic-python-agent resp = _post_to_inspector()
Jun 29 13:44:55 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[2716]: 2022-06-29 13:44:55.522 1 ERROR ironic-python-agent File "/usr/lib/python3.6/site-packages/tenacity/__init__.py", line 329, in wrapped_f
Jun 29 13:44:55 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[2716]: 2022-06-29 13:44:55.522 1 ERROR ironic-python-agent return self.call(f, *args, **kw)
Jun 29 13:44:55 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[2716]: 2022-06-29 13:44:55.522 1 ERROR ironic-python-agent File "/usr/lib/python3.6/site-packages/tenacity/__init__.py", line 409, in call
Jun 29 13:44:55 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[2716]: 2022-06-29 13:44:55.522 1 ERROR ironic-python-agent do = self.iter(retry_state=retry_state)
Jun 29 13:44:55 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[2716]: 2022-06-29 13:44:55.522 1 ERROR ironic-python-agent File "/usr/lib/python3.6/site-packages/tenacity/__init__.py", line 368, in iter
Jun 29 13:44:55 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[2716]: 2022-06-29 13:44:55.522 1 ERROR ironic-python-agent raise retry_exc.reraise()
Jun 29 13:44:55 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[2716]: 2022-06-29 13:44:55.522 1 ERROR ironic-python-agent File "/usr/lib/python3.6/site-packages/tenacity/__init__.py", line 186, in reraise

After a few retries it ends up with this:

Jun 29 13:44:59 api.cnfde14.ptp.lab.eng.bos.redhat.com systemd[1]: Started Ironic Agent.
Jun 29 13:44:59 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[3601]: Error: error creating container storage: the container name "ironic-agent" is already in use by "b105987a980d874b5557f9dcb3ce5f6974f0175ff744e0673861f2fcdb9148f7". You have to remove that container to be able to reuse that name.: that name is alre>
Jun 29 13:44:59 api.cnfde14.ptp.lab.eng.bos.redhat.com systemd[1]: ironic-agent.service: Main process exited, code=exited, status=125/n/a
Jun 29 13:44:59 api.cnfde14.ptp.lab.eng.bos.redhat.com systemd[1]: ironic-agent.service: Failed with result 'exit-code'.
Jun 29 13:44:59 api.cnfde14.ptp.lab.eng.bos.redhat.com systemd[1]: ironic-agent.service: Service RestartSec=100ms expired, scheduling restart.
Jun 29 13:44:59 api.cnfde14.ptp.lab.eng.bos.redhat.com systemd[1]: ironic-agent.service: Scheduled restart job, restart counter is at 4.
Jun 29 13:44:59 api.cnfde14.ptp.lab.eng.bos.redhat.com systemd[1]: Stopped Ironic Agent.
Jun 29 13:44:59 api.cnfde14.ptp.lab.eng.bos.redhat.com systemd[1]: Starting Ironic Agent...
Jun 29 13:44:59 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[3660]: Trying to pull quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:d3f1d4d3cd5fbcf1b9249dd71d01be4b901d337fdc5f8f66569eb71df4d9d446...
Jun 29 13:45:00 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[3660]: Getting image source signatures
Jun 29 13:45:00 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[3660]: Copying blob sha256:cab00b7d011a94b724aca6be93f338a77f15620d2808dacab40a587843ec1c5b
Jun 29 13:45:00 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[3660]: Copying blob sha256:f70d60810c69edad990aaf0977a87c6d2bcc9cd52904fa6825f08507a9b6e7bc
Jun 29 13:45:00 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[3660]: Copying blob sha256:de516cc59493b713e0b33a4954f7eb500383e59642e2897d02e63992d4576720
Jun 29 13:45:00 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[3660]: Copying blob sha256:545277d800059b32cf03377a9301094e9ac8aa4bb42d809766d7355ca9aa8652
Jun 29 13:45:00 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[3660]: Copying config sha256:11f6a72ca80379a3087b32db2c2d364ae438bf0252fcd05de1a2371e7d642f15
Jun 29 13:45:00 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[3660]: Writing manifest to image destination
Jun 29 13:45:00 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[3660]: Storing signatures
Jun 29 13:45:00 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[3660]: 11f6a72ca80379a3087b32db2c2d364ae438bf0252fcd05de1a2371e7d642f15
Jun 29 13:45:00 api.cnfde14.ptp.lab.eng.bos.redhat.com systemd[1]: Started Ironic Agent.
Jun 29 13:45:00 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[3718]: Error: error creating container storage: the container name "ironic-agent" is already in use by "b105987a980d874b5557f9dcb3ce5f6974f0175ff744e0673861f2fcdb9148f7". You have to remove that container to be able to reuse that name.: that name is alre>
Jun 29 13:45:00 api.cnfde14.ptp.lab.eng.bos.redhat.com systemd[1]: ironic-agent.service: Main process exited, code=exited, status=125/n/a
Jun 29 13:45:00 api.cnfde14.ptp.lab.eng.bos.redhat.com systemd[1]: ironic-agent.service: Failed with result 'exit-code'.
Jun 29 13:45:00 api.cnfde14.ptp.lab.eng.bos.redhat.com systemd[1]: ironic-agent.service: Service RestartSec=100ms expired, scheduling restart.
Jun 29 13:45:00 api.cnfde14.ptp.lab.eng.bos.redhat.com systemd[1]: ironic-agent.service: Scheduled restart job, restart counter is at 5.
Jun 29 13:45:00 api.cnfde14.ptp.lab.eng.bos.redhat.com systemd[1]: Stopped Ironic Agent.
Jun 29 13:45:00 api.cnfde14.ptp.lab.eng.bos.redhat.com systemd[1]: Starting Ironic Agent...
Jun 29 13:45:01 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[3777]: Trying to pull quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:d3f1d4d3cd5fbcf1b9249dd71d01be4b901d337fdc5f8f66569eb71df4d9d446...
Jun 29 13:45:01 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[3777]: Getting image source signatures
Jun 29 13:45:01 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[3777]: Copying blob sha256:cab00b7d011a94b724aca6be93f338a77f15620d2808dacab40a587843ec1c5b
Jun 29 13:45:01 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[3777]: Copying blob sha256:de516cc59493b713e0b33a4954f7eb500383e59642e2897d02e63992d4576720
Jun 29 13:45:01 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[3777]: Copying blob sha256:f70d60810c69edad990aaf0977a87c6d2bcc9cd52904fa6825f08507a9b6e7bc
Jun 29 13:45:01 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[3777]: Copying blob sha256:545277d800059b32cf03377a9301094e9ac8aa4bb42d809766d7355ca9aa8652
Jun 29 13:45:01 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[3777]: Copying config sha256:11f6a72ca80379a3087b32db2c2d364ae438bf0252fcd05de1a2371e7d642f15
Jun 29 13:45:01 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[3777]: Writing manifest to image destination
Jun 29 13:45:01 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[3777]: Storing signatures
Jun 29 13:45:01 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[3777]: 11f6a72ca80379a3087b32db2c2d364ae438bf0252fcd05de1a2371e7d642f15
Jun 29 13:45:01 api.cnfde14.ptp.lab.eng.bos.redhat.com systemd[1]: Started Ironic Agent.
Jun 29 13:45:01 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[3834]: Error: error creating container storage: the container name "ironic-agent" is already in use by "b105987a980d874b5557f9dcb3ce5f6974f0175ff744e0673861f2fcdb9148f7". You have to remove that container to be able to reuse that name.: that name is alre>
Jun 29 13:45:01 api.cnfde14.ptp.lab.eng.bos.redhat.com systemd[1]: ironic-agent.service: Main process exited, code=exited, status=125/n/a
Jun 29 13:45:01 api.cnfde14.ptp.lab.eng.bos.redhat.com systemd[1]: ironic-agent.service: Failed with result 'exit-code'.
Jun 29 13:45:02 api.cnfde14.ptp.lab.eng.bos.redhat.com systemd[1]: ironic-agent.service: Service RestartSec=100ms expired, scheduling restart.
Jun 29 13:45:02 api.cnfde14.ptp.lab.eng.bos.redhat.com systemd[1]: ironic-agent.service: Scheduled restart job, restart counter is at 6.
Jun 29 13:45:02 api.cnfde14.ptp.lab.eng.bos.redhat.com systemd[1]: Stopped Ironic Agent.
Jun 29 13:45:02 api.cnfde14.ptp.lab.eng.bos.redhat.com systemd[1]: ironic-agent.service: Start request repeated too quickly.
Jun 29 13:45:02 api.cnfde14.ptp.lab.eng.bos.redhat.com systemd[1]: ironic-agent.service: Failed with result 'exit-code'.
Jun 29 13:45:02 api.cnfde14.ptp.lab.eng.bos.redhat.com systemd[1]: Failed to start Ironic Agent.
It seems that if the ironic agent crashes, nothing cleans up the old container, and a new instance can't start because the name is already in use:

[root@api core]# podman ps -a
CONTAINER ID  IMAGE  COMMAND  CREATED  STATUS  PORTS  NAMES
b105987a980d  quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:d3f1d4d3cd5fbcf1b9249dd71d01be4b901d337fdc5f8f66569eb71df4d9d446  16 hours ago  Exited (1) 16 hours ago  ironic-agent
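As a manual workaround on an affected node, the stale container named in the podman error can be removed before the service is restarted. The sketch below only extracts the container ID from the error text and builds the corresponding `podman rm` command; the function name `cleanup_command` is hypothetical, and this is an illustration of the cleanup step, not a description of how any upstream fix is implemented.

```python
import re

# Error text as it appears in the journal (the truncated tail is left out).
PODMAN_ERROR = (
    'Error: error creating container storage: the container name "ironic-agent" '
    "is already in use by "
    '"b105987a980d874b5557f9dcb3ce5f6974f0175ff744e0673861f2fcdb9148f7". '
    "You have to remove that container to be able to reuse that name."
)

_NAME_IN_USE = re.compile(
    r'the container name "(?P<name>[^"]+)" is already in use by "(?P<cid>[0-9a-f]+)"'
)


def cleanup_command(error_text):
    """Return the podman command that removes the conflicting container, or None."""
    match = _NAME_IN_USE.search(error_text)
    if match is None:
        return None
    return ["podman", "rm", match.group("cid")]


if __name__ == "__main__":
    # Print the command instead of running it, so it can be reviewed first.
    print(" ".join(cleanup_command(PODMAN_ERROR)))
```

The command is returned as an argument list so it can be passed to `subprocess.run` without shell quoting, if running it automatically is desired.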
> Seems that in case the ironic agent crash no one cleans it up, and a new instance can't start because the name is already in use...

This is being fixed: https://github.com/openshift/image-customization-controller/pull/56
Thanks for the reproducer! Because of the amount of effort required, this will be tracked as an epic (https://issues.redhat.com/browse/METAL-256, for those who have access).