Bug 2070519

Summary: [OCP 4.8] Ironic inspector image fails to clean disks that are part of a multipath setup if they are passive paths

| Field | Value | Field | Value |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Mario Abajo <mabajodu> |
| Component: | Bare Metal Hardware Provisioning | Assignee: | Iury Gregory Melo Ferreira <imelofer> |
| Sub component: | ironic | QA Contact: | Amit Ugol <augol> |
| Status: | CLOSED ERRATA | Docs Contact: | Tomas 'Sheldon' Radej <tradej> |
| Severity: | urgent | Priority: | urgent |
| Version: | 4.8 | Keywords: | OtherQA, Triaged |
| Target Release: | 4.8.z | Hardware: | x86_64 |
| OS: | Unspecified | Type: | Bug |
| Clones: | 2076622, 2089309 (view as bug list) | Last Closed: | 2022-06-30 16:35:30 UTC |
| Bug Depends On: | 2089313 | Bug Blocks: | 2076622 |
| CC: | athomas, augol, awolff, ccrum, cgaynor, dhellmann, dmoessne, eglottma, iheim, imelofer, jkreger, jlebon, jsaucier, kurathod, lhh, lmurthy, lshilin, mcornea, nstielau, openshift-bugs-escalate, peasters, pprahlad, rlichti, rpittau, sasha, tsedovic, ykashtan | | |
Description
Mario Abajo
2022-03-31 10:41:26 UTC
For reference, we've encountered a similar issue in UPI-land related to non-optimized paths. In the end, we added full support for installing to multipath devices. Some links:

- https://docs.openshift.com/container-platform/4.8/installing/installing_bare_metal/installing-bare-metal.html#rhcos-enabling-multipath_installing-bare-metal
- https://github.com/coreos/fedora-coreos-config/pull/1011
- https://github.com/openshift/os/blob/master/docs/faq.md#q-does-rhcos-support-multipath

It seems like Ironic needs to learn to do the same thing here: assemble the multipath, and only manipulate the multipathed device itself, not the underlying individual paths. For hooking into RHCOS' support for first-boot multipath, it would also need to add the kernel arguments described in the OpenShift docs above (sketched below).

As a quick and dirty solution, can't we just skip devices that cannot be used? At least as a workaround.

(In reply to Mario Abajo from comment #2)
> As a quick and dirty solution, can't we just skip devices that cannot be
> used? at least as a workaround.

This is technically possible (with a code change to ironic-python-agent), but if we start ignoring errors while cleaning disks, other Ironic users could end up leaking data between customers whenever we fail to clean a disk and ignore it. My $0.02: just load up multipathd and hopefully it will recognize the SAN pathing and de-duplicate it. Unfortunately, SAN controllers often behave differently or need special configuration, which is why we have been shy about incorporating this into the ramdisks by default. I'd personally prefer that we not ignore failed devices, because the risk of data leakage is so high once that starts to happen.

Hello everyone, I've talked with Julia today and we think we need to add a new element to the ramdisk (to be able to identify multipath devices). I've pushed the upstream change already, and we will work on backporting it from 4.11 to 4.8.
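To make the multipathd suggestion and the "identify multipath devices" ramdisk element concrete, here is a minimal shell sketch of what "assemble the multipath and operate on the map, not on the individual paths" looks like. This is only an illustration of the approach, not the actual ironic-python-agent change; the device name is an example:

```
# Load the device-mapper multipath module and write a default config
modprobe dm-multipath
mpathconf --enable

# Start the daemon so it can coalesce the SAN paths into dm maps
multipathd            # or: systemctl start multipathd

# Show the assembled maps; each mpathX aggregates several sdX paths
multipath -ll

# Exit code 0 means the device is a member path of a multipath map,
# so cleaning should target /dev/mapper/mpathX rather than /dev/sda
if multipath -c /dev/sda; then
    echo "member path: skip it and clean the multipath device instead"
fi
```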
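For reference, the RHCOS first-boot multipath hook mentioned in the first comment comes down to two kernel arguments, per the 4.8 docs linked above. A hedged example of passing them at install time; the target device and Ignition URL are placeholders:

```
# Install RHCOS onto the assembled multipath device and enable
# multipath on first boot (device and URL are illustrative)
coreos-installer install /dev/mapper/mpatha \
    --append-karg rd.multipath=default \
    --append-karg root=/dev/disk/by-label/dm-mpath-root \
    --ignition-url http://<bootstrap-host>/worker.ign
```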
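One side note on the procedure below: the CVO override leaves the ConfigMap unmanaged indefinitely. A possible way to inspect the current image references first and, once testing is done, return the ConfigMap to CVO management, assuming no other overrides are in use (illustrative, not part of the original report):

```
# Inspect the image references the ConfigMap currently carries
oc get configmap cluster-baremetal-operator-images -n openshift-machine-api -o yaml

# After testing, drop the overrides so the CVO resumes management
# and reverts the ConfigMap to the release defaults
oc patch clusterversion version --type json \
    -p '[{"op": "remove", "path": "/spec/overrides"}]'
```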
Since a release image with the modified ironic-ipa-downloader image is available (it contains https://review.opendev.org/c/openstack/ironic-python-agent/+/837784), we can try manually updating the ironic-ipa-downloader image after the cluster is deployed.

The procedure I've tested locally, and which worked, is:

1. After your deployment is up, first check that there are no unmanaged resources in your cluster:

   ```
   $ oc get -o json clusterversion version | jq .spec.overrides
   ```

2. After verifying that, move the cluster-baremetal-operator-images ConfigMap to unmanaged by running:

   ```
   $ oc patch clusterversion version --namespace openshift-cluster-version --type merge -p '{"spec":{"overrides":[{"kind":"ConfigMap","group":"v1","name":"cluster-baremetal-operator-images","namespace":"openshift-machine-api","unmanaged":true}]}}'
   clusterversion.config.openshift.io/version patched
   ```

3. Check that the clusterversion now shows the resource as unmanaged:

   ```
   $ oc get -o json clusterversion version | jq .spec.overrides
   [
     {
       "group": "v1",
       "kind": "ConfigMap",
       "name": "cluster-baremetal-operator-images",
       "namespace": "openshift-machine-api",
       "unmanaged": true
     }
   ]
   ```

4. Edit the cluster-baremetal-operator-images ConfigMap and change the value of baremetalIpaDownloader to the new image, in our case quay.io/imelofer/ipa-multipath@sha256:cead7e5a6fe9ad2c5027282a8a74ec12224aafe6b4524fd879f25c4ecc996485:

   ```
   $ oc edit ConfigMap cluster-baremetal-operator-images
   configmap/cluster-baremetal-operator-images edited
   ```

5. Wait about 3 minutes and double-check that the ConfigMap still contains the right config:

   ```
   $ oc describe ConfigMap cluster-baremetal-operator-images | grep "imelofer"
   ```

6. Delete the CBO pod and wait for the CVO to bring it back (after the new CBO starts, you can try to add new nodes to your cluster):

   ```
   $ oc get pods -n openshift-machine-api
   NAME                                           READY   STATUS    RESTARTS   AGE
   cluster-autoscaler-operator-78dbcdbf85-hdp44   2/2     Running   0          78m
   cluster-baremetal-operator-58b9dd5c45-pfhwd    2/2     Running   1          78m
   machine-api-controllers-5bb58fb7bf-lp4fn       7/7     Running   1          73m
   machine-api-operator-658749fccf-rq6c8          2/2     Running   1          78m

   $ oc delete po cluster-baremetal-operator-58b9dd5c45-pfhwd -n openshift-machine-api
   pod "cluster-baremetal-operator-58b9dd5c45-pfhwd" deleted

   $ oc get pods -n openshift-machine-api
   NAME                                           READY   STATUS    RESTARTS   AGE
   cluster-autoscaler-operator-78dbcdbf85-hdp44   2/2     Running   0          79m
   cluster-baremetal-operator-58b9dd5c45-72rsp    2/2     Running   0          25s
   machine-api-controllers-5bb58fb7bf-lp4fn       7/7     Running   1          74m
   machine-api-operator-658749fccf-rq6c8          2/2     Running   1          79m
   ```

*** Bug 2077067 has been marked as a duplicate of this bug. ***

Setting the target release for this BZ to 4.8.z, since we already have the bugs for 4.11, 4.10, and 4.9.

Deployment of 4.8.0-0.nightly-2022-06-15-131405 passed successfully and sanity tests passed.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.8.45 bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:5167

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 365 days