Bug 1957980

Summary: [Webscale] Ironic fails to deploy OpenShift Baremetal IPI 4.7.9 on
Product: OpenShift Container Platform Reporter: Marc Methot <mmethot>
Component: Bare Metal Hardware ProvisioningAssignee: Tomas Sedovic <tsedovic>
Bare Metal Hardware Provisioning sub component: ironic QA Contact: Amit Ugol <augol>
Status: CLOSED DUPLICATE Docs Contact:
Severity: unspecified    
Priority: unspecified CC: bfournie
Version: 4.7   
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2021-05-06 22:03:13 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Marc Methot 2021-05-06 21:48:09 UTC
The deployment fails with the following error:
~~~
Error: Agent API for node 32cbf50b-3cfd-49f9-a166-c5776a704529 returned HTTP status code 409 with error: Agent is busy: executing command execute_deploy_step.
~~~

The same deployment (same hosts, same install config) works when deploying 4.6.17

Version-Release number of selected component (if applicable):
4.6.9

How reproducible:
Fails every time, tested with both 4.7.6 and 4.7.9.


Actual results:
Checking on the half deployed host we eventually get to
~~~
May 06 13:48:21 master-0 ironic-python-agent[3208]: 2021-05-06 13:48:15.422 3208 DEBUG ironic_python_agent.ironic_api_client [-] Heartbeat: announcing callback URL https://172.22.0.15:9999, API version is 1.68 heartbeat /usr/lib/python3.6/site-packages/ironic_python_agent/ironic_api_client.py:150
May 06 13:48:21 master-0 ironic-python-agent[3208]: 2021-05-06 13:48:21.039 3208 INFO root [-] Asynchronous command execute_deploy_step completed: {'deploy_result': {'result': 'prepare_image: image (rhcos-47.83.202103251640-0-compressed.x86_64.qcow2) written to device /dev/sda root_uuid=0x00000000'}, 'deploy_step': 
{'interface': 'deploy', 'step': 'write_image', 'args': {'image_info': {'id': 'rhcos-47.83.202103251640-0-compressed.x86_64.qcow2', 'urls': ['http://172.22.0.2:80/images/rhcos-47.83.202103251640-0-openstack.x86_64.qcow2/rhcos-47.83.202103251640-0-compressed.x86_64.qcow2'], 'disk_format': None, 'container_format': Non
e, 'stream_raw_images': True, 'checksum': 'http://172.22.0.2:80/images/rhcos-47.83.202103251640-0-openstack.x86_64.qcow2/rhcos-47.83.202103251640-0-compressed.x86_64.qcow2.md5sum', 'node_uuid': 'a7046d77-3d11-400f-99cb-c1a20cb86678'}, 'configdrive': '<...>'}}}
May 06 13:48:21 master-0 rsyslogd[3212]: message too long (312102) with configured size 8096, begin of message is: 2021-05-06 13:47:18.864 3208 INFO root [-] Deploy step completed: {'interface':  [v8.1911.0-6.el8 try https://www.rsyslog.com/e/2445 ]
May 06 13:48:21 master-0 ironic-python-agent[3208]: 2021-05-06 13:48:21.154 3208 ERROR ironic_python_agent.agent [-] error sending heartbeat to http://172.22.0.2:6385: ironic_python_agent.errors.HeartbeatError: Error heartbeating to agent API: Error 400: Invalid or missing agent token received.
                                                    2021-05-06 13:48:21.154 3208 ERROR ironic_python_agent.agent Traceback (most recent call last):
                                                    2021-05-06 13:48:21.154 3208 ERROR ironic_python_agent.agent   File "/usr/lib/python3.6/site-packages/ironic_python_agent/agent.py", line 109, in do_heartbeat
                                                    2021-05-06 13:48:21.154 3208 ERROR ironic_python_agent.agent     generated_cert=self.agent.generated_cert,
                                                    2021-05-06 13:48:21.154 3208 ERROR ironic_python_agent.agent   File "/usr/lib/python3.6/site-packages/ironic_python_agent/ironic_api_client.py", line 163, in heartbeat
                                                    2021-05-06 13:48:21.154 3208 ERROR ironic_python_agent.agent     raise errors.HeartbeatError(error)
                                                    2021-05-06 13:48:21.154 3208 ERROR ironic_python_agent.agent ironic_python_agent.errors.HeartbeatError: Error heartbeating to agent API: Error 400: Invalid or missing agent token received.
                                                    2021-05-06 13:48:21.154 3208 ERROR ironic_python_agent.agent 
May 06 13:48:21 master-0 ironic-python-agent[3208]: 2021-05-06 13:48:21.237 3208 DEBUG ironic_python_agent.ironic_api_client [-] Heartbeat: announcing callback URL https://172.22.0.15:9999, API version is 1.68 heartbeat /usr/lib/python3.6/site-packages/ironic_python_agent/ironic_api_client.py:150
May 06 13:48:21 master-0 ironic-python-agent[3208]: 2021-05-06 13:48:21.500 3208 INFO ironic_python_agent.agent [-] sleeping before next heartbeat, interval: 156.01437867497847
May 06 13:48:21 master-0 ironic-python-agent[3208]: 2021-05-06 13:48:21.581 3208 ERROR ironic_python_agent.agent [-] error sending heartbeat to http://172.22.0.2:6385: ironic_python_agent.errors.HeartbeatError: Error heartbeating to agent API: Error 400: Invalid or missing agent token received.
                                                    2021-05-06 13:48:21.581 3208 ERROR ironic_python_agent.agent Traceback (most recent call last):
                                                    2021-05-06 13:48:21.581 3208 ERROR ironic_python_agent.agent   File "/usr/lib/python3.6/site-packages/ironic_python_agent/agent.py", line 109, in do_heartbeat
                                                    2021-05-06 13:48:21.581 3208 ERROR ironic_python_agent.agent     generated_cert=self.agent.generated_cert,
                                                    2021-05-06 13:48:21.581 3208 ERROR ironic_python_agent.agent   File "/usr/lib/python3.6/site-packages/ironic_python_agent/ironic_api_client.py", line 163, in heartbeat
                                                    2021-05-06 13:48:21.581 3208 ERROR ironic_python_agent.agent     raise errors.HeartbeatError(error)
                                                    2021-05-06 13:48:21.581 3208 ERROR ironic_python_agent.agent ironic_python_agent.errors.HeartbeatError: Error heartbeating to agent API: Error 400: Invalid or missing agent token received.
                                                    2021-05-06 13:48:21.581 3208 ERROR ironic_python_agent.agent 
May 06 13:48:21 master-0 ironic-python-agent[3208]: 2021-05-06 13:48:21.799 3208 INFO ironic_python_agent.agent [-] sleeping before next heartbeat, interval: 169.92832458453464
~~~

Expected results:
That it works just the same as 4.6 deployment.


Additional info:
bootstrap logs/data (`./openshift-install gather bootstrap --dir=<installation_directory>`) and while host is booted off ramdisk logs all attached to the case.

Comment 1 Marc Methot 2021-05-06 21:50:00 UTC
The hosts PXE boot normally using RedFish, then the cleanup is successful, and then it fails right after writing the image. Complains that the token is then no longer valid.

Comment 2 Bob Fournier 2021-05-06 21:59:54 UTC
we had actually just created a bug for same issue https://bugzilla.redhat.com/show_bug.cgi?id=1957976

Comment 3 Bob Fournier 2021-05-06 22:03:13 UTC
Closing this as a dup as same issue tracked in other one.

*** This bug has been marked as a duplicate of bug 1957976 ***