Bug 2053752
| Field | Value |
|---|---|
| Summary | [IPI] OCP-4.10 baremetal - boot partition is not mounted on temporary directory |
| Product | OpenShift Container Platform |
| Component | Bare Metal Hardware Provisioning |
| Sub component | ironic |
| Reporter | tonyg |
| Assignee | Derek Higgins <derekh> |
| QA Contact | Lubov <lshilin> |
| Status | CLOSED ERRATA |
| Severity | high |
| Priority | high |
| Version | 4.10 |
| Target Release | 4.10.z |
| Keywords | AutomationBlocker, Regression, Triaged |
| Flags | tsedovic: needinfo- |
| CC | asalvati, bfournie, bmuchiny, derekh, eglottma, fbaudin, fsoppels, gvillani, josearod, lshilin, manrodri, mcornea, openshift-bugs-escalate, rugouvei, shreepat, skrenger, snetting, tsedovic, yprokule |
| Hardware | Unspecified |
| OS | Unspecified |
| Type | Bug |
| Clones | 2061278 (view as bug list) |
| Bug Depends On | 2086759 |
| Bug Blocks | 2061278 |
| Last Closed | 2022-06-28 11:50:26 UTC |
Description (tonyg, 2022-02-11 22:56:59 UTC)
From the collected logs (a7985248-1cf3-494a-9771-de48e4500a62_master-0.cluster10.core.dfwt5g.lab_2022-02-11-18-06-10.tar.gz):

2022-02-11 18:06:04.093 1 DEBUG oslo_concurrency.processutils [-] CMD "partx -a /dev/sda" returned: 1 in 0.003s execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:423
2022-02-11 18:06:04.094 1 DEBUG oslo_concurrency.processutils [-] 'partx -a /dev/sda' failed. Not Retrying. execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:474
2022-02-11 18:06:04.094 1 DEBUG ironic_lib.utils [-] Command stdout is: "" _log /usr/lib/python3.6/site-packages/ironic_lib/utils.py:99
2022-02-11 18:06:04.094 1 DEBUG ironic_lib.utils [-] Command stderr is: "partx: /dev/sda: error adding partitions 1-4" _log /usr/lib/python3.6/site-packages/ironic_lib/utils.py:100

Okay, we're unable to tell the kernel about the new partitions. Unfortunately, we run partx without the -v flag, so it is hard to tell why.

*** Bug 2057668 has been marked as a duplicate of this bug. ***

Unmarking this as triaged. We don't know the root cause; the errors in comment 1 also happen on successful deployments. We can observe sda2 existing both before and after the failure. I think Derek is looking into adding retries.

We haven't been able to isolate/reproduce this to get to the root of the problem; in the meantime I've pushed a retry for the failing operation.
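As a rough illustration of that retry approach (a hypothetical helper for this report, not the actual ironic / ironic-python-agent patch; the function name and the attempts/delay parameters are made up), the failing rescan could be wrapped like this, with -v added so the reason for a failure shows up in the logs:

```python
import subprocess
import time


def rescan_partitions(device, attempts=5, delay=2.0):
    """Ask the kernel to re-read the partition table, retrying on failure.

    Hypothetical sketch of the retry described above; not the shipped fix.
    Note that "error adding partitions" can also be benign when the
    partitions are already registered with the kernel.
    """
    last_err = None
    for attempt in range(1, attempts + 1):
        result = subprocess.run(
            ["partx", "-a", "-v", device],  # -v so stderr explains a failure
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,
        )
        if result.returncode == 0:
            return
        last_err = result.stderr.strip()
        time.sleep(delay)
    raise RuntimeError("partx -a %s kept failing: %s" % (device, last_err))
```

A caller would invoke something like rescan_partitions("/dev/sda") before relying on the partition device nodes being present.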
I was also seeing this issue on ProLiant DL380 Gen10 machines, and it was reproducing pretty consistently with 4.10.0-rc.6.

@mcornea could you please try to deploy the latest 4.11 nightly on your setup? On our CI setup the deployment passed last night.

(In reply to Lubov from comment #7)
> @mcornea could you please try to deploy the latest 4.11 nightly on your setup?
> On our CI setup the deployment passed last night.

I can confirm the nodes were deployed as well on my environment with 4.11.0-0.nightly-2022-03-06-020555.

Encountered this issue today during an OCP-4.10 mgmt cluster installation:

time="2022-03-17T11:12:15-05:00" level=error msg="Error: cannot go from state 'deploy failed' to state 'manageable' , last error was 'Deploy step deploy.install_coreos failed on node 920b03ad-9311-492c-899d-dcf56b181d2c. Could not verify uefi on device /dev/sda, failed with Unexpected error while running command."
time="2022-03-17T11:12:15-05:00" level=error msg="Command: mount /dev/sda2 /tmp/tmpd66e7u6t/boot/efi"
time="2022-03-17T11:12:15-05:00" level=error msg="Exit code: 32"
time="2022-03-17T11:12:15-05:00" level=error msg="Stdout: ''"
time="2022-03-17T11:12:15-05:00" level=error msg="Stderr: 'mount: /tmp/tmpd66e7u6t/boot/efi: special device /dev/sda2 does not exist.\\n'.'

2022-03-17 16:11:09.089 1 ERROR ironic.conductor.utils [-] Deploy step deploy.install_coreos failed on node 920b03ad-9311-492c-899d-dcf56b181d2c. Could not verify uefi on device /dev/sda, failed with Unexpected error while running command.

DCI job: https://www.distributed-ci.io/jobs/4480dd4c-a4f8-41d3-9717-99a11393af61/jobStates

Verified on 4.11.0-0.nightly-2022-05-20-213928.

*** Bug 2061278 has been marked as a duplicate of this bug. ***

The cherry-pick was missing an import (it wasn't needed in origin/main): https://bugzilla.redhat.com/show_bug.cgi?id=2090631

*** Bug 2090631 has been marked as a duplicate of this bug. ***

Verified on 4.10.18; ran twice.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.10.20 bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:5172
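For reference, the "special device /dev/sda2 does not exist" failure quoted in the installation comment above is the same race seen from the mount side. A minimal, hypothetical guard (not the fix that shipped in the errata; the helper name and timeout values are illustrative only) would be to wait for the partition device node to appear before mounting it:

```python
import os
import subprocess
import time


def mount_when_ready(partition, mountpoint, timeout=30.0, poll=0.5):
    """Mount a partition once its device node exists.

    Hypothetical guard against the "special device /dev/sda2 does not
    exist" race quoted in the comments above; not the actual fix.
    """
    deadline = time.time() + timeout
    while not os.path.exists(partition):
        if time.time() >= deadline:
            raise RuntimeError("%s did not appear within %ss" % (partition, timeout))
        time.sleep(poll)
    # Equivalent to the failing call in the logs: mount /dev/sda2 <mountpoint>
    subprocess.run(["mount", partition, mountpoint], check=True)
```

For example, mount_when_ready("/dev/sda2", "/tmp/tmpd66e7u6t/boot/efi") in place of the bare mount call would tolerate the device node showing up a moment after the partition rescan.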