Bug 2077059
| Summary: | Deleting provisioned bmh for spoke cluster got stuck in deprovisioning | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Alexander Chuzhoy <sasha> |
| Component: | Bare Metal Hardware Provisioning | Assignee: | Riccardo Pittau <rpittau> |
| Bare Metal Hardware Provisioning sub component: | ironic | QA Contact: | Amit Ugol <augol> |
| Status: | CLOSED INSUFFICIENT_DATA | Docs Contact: | |
| Severity: | low | ||
| Priority: | low | CC: | derekh, rpittau, tsedovic, zbitter |
| Version: | 4.10 | Keywords: | Triaged |
| Target Milestone: | --- | ||
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2022-11-11 16:58:05 UTC | Type: | Bug |
Note that if the BMH is deleted by removing the finalizer, the next time you create the BMH it gets stuck in preparing and subsequently in deleting:

[kni@provisionhost-0-0 ~]$ oc get bmh
NAME         STATE      CONSUMER   ONLINE   ERROR   AGE
master-1-0   deleting              true             17m
master-1-1   deleting              true             17m
master-1-2   deleting              true             17m

This appears to be the problem:
2022-04-20T02:51:25.471 creating new PreprovisioningImage {baremetalhost: 'spk-factory-0/master-1-0', provisioningState: 'deprovisioning'}
2022-04-20T02:51:25.473 Reconciler error {name: 'master-1-0', namespace: 'spk-factory-0'}
action "deprovisioning" failed: preprovisioningimages.metal3.io "master-1-0" is forbidden: unable to create new content in namespace spk-factory-0 because it is being terminated
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
/go/src/github.com/metal3-io/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:214
Because the state is deprovisioning, the BMO is trying to create a PreprovisioningImage (since this is required for cleaning). It's failing to do so, if I'm reading correctly, because the PreprovisioningImage already exists but has a DeletionTimestamp?
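The message in the log above names namespace termination as the reason for the rejection, which is a different failure from an object that already exists. A minimal Python sketch for telling the two apart when scanning operator logs, assuming only the message wording shown in this log; the function name `classify_create_failure` and the category labels are made up for illustration, and the "already exists" check is a plain substring match rather than a verified API server format:

```python
import re
from typing import Optional, Tuple

# Wording copied from the log line in this report; other Kubernetes
# versions may phrase the rejection differently (assumption).
NS_TERMINATING = re.compile(
    r"unable to create new content in namespace (?P<ns>\S+) "
    r"because it is being terminated"
)


def classify_create_failure(message: str) -> Tuple[str, Optional[str]]:
    """Return a (category, namespace) pair for a failed create message."""
    match = NS_TERMINATING.search(message)
    if match:
        # The whole namespace is being deleted, so no new object can be created.
        return ("namespace-terminating", match.group("ns"))
    if "already exists" in message:
        # An object with the same name is still present.
        return ("already-exists", None)
    return ("other", None)


line = (
    'action "deprovisioning" failed: preprovisioningimages.metal3.io '
    '"master-1-0" is forbidden: unable to create new content in namespace '
    "spk-factory-0 because it is being terminated"
)
print(classify_create_failure(line))
```

If the first category is reported, the thing to investigate is why the namespace is terminating while the host still carries its finalizer, rather than the PreprovisioningImage itself.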
If automatedCleaningMode: disabled is set then perhaps we shouldn't require a PreprovisioningImage for deprovisioning (what does Ironic even do to deprovision in that case?).
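The suggestion above can be written down as a small decision rule. This is a hypothetical Python illustration, not the baremetal-operator's actual logic (which is written in Go); the function name is an assumption, and only the `automatedCleaningMode` values `disabled` and `metadata` are taken from the BareMetalHost API:

```python
# Hypothetical sketch of the proposed rule, not the operator's real code.
CLEANING_DISABLED = "disabled"  # spec.automatedCleaningMode value seen in this report
CLEANING_METADATA = "metadata"  # the other value the BareMetalHost API accepts


def deprovisioning_needs_image(automated_cleaning_mode: str) -> bool:
    """Require a PreprovisioningImage only when cleaning will actually run."""
    return automated_cleaning_mode != CLEANING_DISABLED


for mode in (CLEANING_METADATA, CLEANING_DISABLED):
    print(mode, deprovisioning_needs_image(mode))
```

Under this rule the host in this report, which has cleaning disabled, would not have needed a new image to finish deprovisioning.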
It's not clear why the PreprovisioningImage is stuck, presumably with the finalizer set. We'd need to see the logs for the image-customization-controller, which are not attached (it's in a separate pod).
It could be due to https://github.com/openshift/image-customization-controller/pull/44 if the assisted side of that hasn't been implemented yet.

> If automatedCleaningMode: disabled is set then perhaps we shouldn't require a PreprovisioningImage for deprovisioning

This is already tracked in https://bugzilla.redhat.com/show_bug.cgi?id=2087213 for another reason. Mark as a duplicate?

> what does Ironic even do to deprovision in that case?

Cleans up its internal stuff: disconnect vmedia/clean up PXE scripts.

> unable to create new content in namespace spk-factory-0 because it is being terminated

I've never seen this before, but it looks like something is wrong at a higher level?

(In reply to Dmitry Tantsur from comment #5)
> > If automatedCleaningMode: disabled is set then perhaps we shouldn't require a PreprovisioningImage for deprovisioning
>
> This is already tracked in
> https://bugzilla.redhat.com/show_bug.cgi?id=2087213 for another reason. Mark
> as a duplicate?

Hi sasha, can this be marked as a dup?

(In reply to Derek Higgins from comment #6)
> (In reply to Dmitry Tantsur from comment #5)
> > > If automatedCleaningMode: disabled is set then perhaps we shouldn't require a PreprovisioningImage for deprovisioning
> >
> > This is already tracked in
> > https://bugzilla.redhat.com/show_bug.cgi?id=2087213 for another reason. Mark
> > as a duplicate?
>
> Hi sasha can this be marked as a dup?

Hi Derek, the other bug is on 4.11 and always reproduces. The one I reported was spotted in 4.10. I actually haven't hit it for some time now. Thanks.

(In reply to Dmitry Tantsur from comment #5)
> > If automatedCleaningMode: disabled is set then perhaps we shouldn't require a PreprovisioningImage for deprovisioning
>
> This is already tracked in
> https://bugzilla.redhat.com/show_bug.cgi?id=2087213 for another reason. Mark
> as a duplicate?

Bug 2087213 is tracking inspection disabled, not automatedCleaningMode disabled. I don't think it's a duplicate (even though they are related) because there's no reason to think that verification of that one would check whether this one is fixed.

If you're able to reproduce, we will need a full set of logs, at least from the image-customization-controller as asked by Zane, thanks!

Closing. The logs were not provided and we weren't able to reproduce. If you see this issue again, please provide the requested logs and feel free to reopen.

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days
After an attempt to deploy a spoke cluster with a wrong pull-secret/CA, I needed to delete the provisioned BMH. The BMH got stuck in deprovisioning:

spk-factory-0   master-1-0   deprovisioning   true   16h

oc describe bmh -n spk-factory-0 master-1-0

Name:         master-1-0
Namespace:    spk-factory-0
Labels:       infraenvs.agent-install.openshift.io=spk-factory-0
Annotations:  bmac.agent-install.openshift.io/hostname: master-1-0
              bmac.agent-install.openshift.io/role: master
              inspect.metal3.io: disabled
API Version:  metal3.io/v1alpha1
Kind:         BareMetalHost
Metadata:
  Creation Timestamp:             2022-04-19T20:49:57Z
  Deletion Grace Period Seconds:  0
  Deletion Timestamp:             2022-04-20T02:51:24Z
  Finalizers:
    baremetalhost.metal3.io
  Generation:  3
  Managed Fields:
    API Version:  metal3.io/v1alpha1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .:
          f:bmac.agent-install.openshift.io/hostname:
          f:bmac.agent-install.openshift.io/role:
          f:inspect.metal3.io:
        f:labels:
          .:
          f:infraenvs.agent-install.openshift.io:
      f:spec:
        .:
        f:automatedCleaningMode:
        f:bmc:
          .:
          f:address:
          f:credentialsName:
          f:disableCertificateVerification:
        f:bootMACAddress:
        f:online:
        f:rootDeviceHints:
          .:
          f:deviceName:
    Manager:      kubectl-create
    Operation:    Update
    Time:         2022-04-19T20:49:57Z
    API Version:  metal3.io/v1alpha1
    Fields Type:  FieldsV1
    fieldsV1:
      f:status:
        .:
        f:errorCount:
        f:errorMessage:
        f:goodCredentials:
          .:
          f:credentials:
            .:
            f:name:
            f:namespace:
          f:credentialsVersion:
        f:hardwareProfile:
        f:operationHistory:
          .:
          f:deprovision:
            .:
            f:end:
          f:inspect:
            .:
            f:end:
            f:start:
          f:provision:
          f:register:
            .:
            f:end:
            f:start:
        f:operationalStatus:
        f:poweredOn:
        f:provisioning:
          .:
          f:ID:
          f:bootMode:
          f:image:
          f:raid:
            .:
            f:hardwareRAIDVolumes:
            f:softwareRAIDVolumes:
          f:rootDeviceHints:
            .:
            f:deviceName:
        f:triedCredentials:
          .:
          f:credentials:
            .:
            f:name:
            f:namespace:
          f:credentialsVersion:
    Manager:      baremetal-operator
    Operation:    Update
    Time:         2022-04-19T20:50:09Z
    API Version:  metal3.io/v1alpha1
    Fields Type:  FieldsV1
    fieldsV1:
      f:spec:
        f:image:
          .:
          f:format:
          f:url:
    Manager:      assisted-service
    Operation:    Update
    Time:         2022-04-19T22:51:42Z
    API Version:  metal3.io/v1alpha1
    Fields Type:  FieldsV1
    fieldsV1:
      f:status:
        f:lastUpdated:
        f:operationHistory:
          f:deprovision:
            f:start:
          f:provision:
            f:end:
            f:start:
        f:provisioning:
          f:image:
            f:format:
            f:url:
          f:state:
    Manager:         baremetal-operator
    Operation:       Update
    Subresource:     status
    Time:            2022-04-20T02:51:24Z
  Resource Version:  1477170
  UID:               ce785247-ddc3-47cb-9a18-b10e25e19d45
Spec:
  Automated Cleaning Mode:  disabled
  Bmc:
    Address:                          redfish-virtualmedia+https://192.168.7.1:8000/redfish/v1/Systems/488904f8-6fc7-4665-b976-275e4ace571e
    Credentials Name:                 bmc-secret1
    Disable Certificate Verification: true
  Boot MAC Address:  52:54:00:3a:c5:62
  Image:
    Format:  live-iso
    URL:     https://assisted-image-service-open-cluster-management.apps.sno-0.qe.lab.redhat.com/images/6f7b2393-1940-4f5d-8a89-7a757e465e1d?api_key=eyJhbGciOiJFUzI1NiIsInR5cCI6IkpXVCJ9.eyJpbmZyYV9lbnZfaWQiOiI2ZjdiMjM5My0xOTQwLTRmNWQtOGE4OS03YTc1N2U0NjVlMWQifQ.umjVYTWdtY9tOHObfLkn7n89iIMeAnwYY1_jjd-rs4_CvTsFm3aorjHmcmOy6hrohWxeyMiFeOmssSj9kTml_A&arch=x86_64&type=minimal-iso&version=4.10
  Online:  true
  Root Device Hints:
    Device Name:  /dev/sda
Status:
  Error Count:    0
  Error Message:
  Good Credentials:
    Credentials:
      Name:               bmc-secret1
      Namespace:          spk-factory-0
    Credentials Version:  550741
  Hardware Profile:  unknown
  Last Updated:      2022-04-20T02:51:24Z
  Operation History:
    Deprovision:
      End:    <nil>
      Start:  2022-04-20T02:51:24Z
    Inspect:
      End:    2022-04-19T20:50:08Z
      Start:  2022-04-19T20:50:08Z
    Provision:
      End:    2022-04-19T22:52:03Z
      Start:  2022-04-19T22:51:42Z
    Register:
      End:    2022-04-19T20:50:08Z
      Start:  2022-04-19T20:49:57Z
  Operational Status:  OK
  Powered On:          true
  Provisioning:
    ID:         f8f85eb4-77e1-4a4c-8cbd-bb2e100e2ed2
    Boot Mode:  UEFI
    Image:
      Format:  live-iso
      URL:     https://assisted-image-service-open-cluster-management.apps.sno-0.qe.lab.redhat.com/images/6f7b2393-1940-4f5d-8a89-7a757e465e1d?api_key=eyJhbGciOiJFUzI1NiIsInR5cCI6IkpXVCJ9.eyJpbmZyYV9lbnZfaWQiOiI2ZjdiMjM5My0xOTQwLTRmNWQtOGE4OS03YTc1N2U0NjVlMWQifQ.umjVYTWdtY9tOHObfLkn7n89iIMeAnwYY1_jjd-rs4_CvTsFm3aorjHmcmOy6hrohWxeyMiFeOmssSj9kTml_A&arch=x86_64&type=minimal-iso&version=4.10
    Raid:
      Hardware RAID Volumes:  <nil>
      Software RAID Volumes:
    Root Device Hints:
      Device Name:  /dev/sda
    State:          deprovisioning
  Tried Credentials:
    Credentials:
      Name:               bmc-secret1
      Namespace:          spk-factory-0
    Credentials Version:  550741
Events:  <none>

To force delete, I manually removed the finalizer baremetalhost.metal3.io
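For reference, removing the finalizer amounts to a JSON merge patch on the host's metadata. A minimal Python sketch that only builds the `oc patch` command line and does not run it; the helper name `finalizer_removal_command` is hypothetical. As the comments on this bug note, forcing deletion this way skips the operator's cleanup and can leave the next BMH stuck, so treat it as a last resort:

```python
import json
import shlex
from typing import List


def finalizer_removal_command(name: str, namespace: str) -> List[str]:
    """Build (but do not run) an `oc patch` call that clears all finalizers."""
    # In a JSON merge patch, setting a field to null removes it.
    patch = {"metadata": {"finalizers": None}}
    return [
        "oc", "patch", "bmh", name,
        "-n", namespace,
        "--type=merge",
        "-p", json.dumps(patch),
    ]


cmd = finalizer_removal_command("master-1-0", "spk-factory-0")
# Print a shell-quoted form that can be reviewed before running it by hand.
print(shlex.join(cmd))
```

Building the argument list separately from executing it keeps the destructive step explicit and reviewable.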