Bug 2077059 - Deleting provisioned bmh for spoke cluster got stuck in deprovisioning
Summary: Deleting provisioned bmh for spoke cluster got stuck in deprovisioning
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Bare Metal Hardware Provisioning
Version: 4.10
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: low
Target Milestone: ---
Target Release: ---
Assignee: Riccardo Pittau
QA Contact: Amit Ugol
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2022-04-20 15:01 UTC by Alexander Chuzhoy
Modified: 2023-09-18 04:35 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-11-11 16:58:05 UTC
Target Upstream Version:
Embargoed:



Description Alexander Chuzhoy 2022-04-20 15:01:48 UTC
After an attempt to deploy a spoke cluster with a wrong pull-secret/CA, I needed to delete the provisioned BMH.

The BMH got stuck in deprovisioning:
spk-factory-0   master-1-0   deprovisioning              true             16h




oc describe bmh -n spk-factory-0 master-1-0
Name:         master-1-0
Namespace:    spk-factory-0
Labels:       infraenvs.agent-install.openshift.io=spk-factory-0
Annotations:  bmac.agent-install.openshift.io/hostname: master-1-0
              bmac.agent-install.openshift.io/role: master
              inspect.metal3.io: disabled
API Version:  metal3.io/v1alpha1
Kind:         BareMetalHost
Metadata:
  Creation Timestamp:             2022-04-19T20:49:57Z
  Deletion Grace Period Seconds:  0
  Deletion Timestamp:             2022-04-20T02:51:24Z
  Finalizers:
    baremetalhost.metal3.io
  Generation:  3
  Managed Fields:
    API Version:  metal3.io/v1alpha1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .:
          f:bmac.agent-install.openshift.io/hostname:
          f:bmac.agent-install.openshift.io/role:
          f:inspect.metal3.io:
        f:labels:
          .:
          f:infraenvs.agent-install.openshift.io:
      f:spec:
        .:
        f:automatedCleaningMode:
        f:bmc:
          .:
          f:address:
          f:credentialsName:
          f:disableCertificateVerification:
        f:bootMACAddress:
        f:online:
        f:rootDeviceHints:
          .:
          f:deviceName:
    Manager:      kubectl-create
    Operation:    Update
    Time:         2022-04-19T20:49:57Z
    API Version:  metal3.io/v1alpha1
    Fields Type:  FieldsV1
    fieldsV1:
      f:status:
        .:
        f:errorCount:
        f:errorMessage:
        f:goodCredentials:
          .:
          f:credentials:
            .:
            f:name:
            f:namespace:
          f:credentialsVersion:
        f:hardwareProfile:
        f:operationHistory:
          .:
          f:deprovision:
            .:
            f:end:
          f:inspect:
            .:
            f:end:
            f:start:
          f:provision:
          f:register:
            .:
            f:end:
            f:start:
        f:operationalStatus:
        f:poweredOn:
        f:provisioning:
          .:
          f:ID:
          f:bootMode:
          f:image:
          f:raid:
            .:
            f:hardwareRAIDVolumes:
            f:softwareRAIDVolumes:
          f:rootDeviceHints:
            .:
            f:deviceName:
        f:triedCredentials:
          .:
          f:credentials:
            .:
            f:name:
            f:namespace:
          f:credentialsVersion:
    Manager:      baremetal-operator
    Operation:    Update
    Time:         2022-04-19T20:50:09Z
    API Version:  metal3.io/v1alpha1
    Fields Type:  FieldsV1
    fieldsV1:
      f:spec:
        f:image:
          .:
          f:format:
          f:url:
    Manager:      assisted-service
    Operation:    Update
    Time:         2022-04-19T22:51:42Z
    API Version:  metal3.io/v1alpha1
    Fields Type:  FieldsV1
    fieldsV1:
      f:status:
        f:lastUpdated:
        f:operationHistory:
          f:deprovision:
            f:start:
          f:provision:
            f:end:
            f:start:
        f:provisioning:
          f:image:
            f:format:
            f:url:
          f:state:
    Manager:         baremetal-operator
    Operation:       Update
    Subresource:     status
    Time:            2022-04-20T02:51:24Z
  Resource Version:  1477170
  UID:               ce785247-ddc3-47cb-9a18-b10e25e19d45
Spec:
  Automated Cleaning Mode:  disabled
  Bmc:
    Address:                           redfish-virtualmedia+https://192.168.7.1:8000/redfish/v1/Systems/488904f8-6fc7-4665-b976-275e4ace571e
    Credentials Name:                  bmc-secret1
    Disable Certificate Verification:  true
  Boot MAC Address:                    52:54:00:3a:c5:62
  Image:
    Format:  live-iso
    URL:     https://assisted-image-service-open-cluster-management.apps.sno-0.qe.lab.redhat.com/images/6f7b2393-1940-4f5d-8a89-7a757e465e1d?api_key=eyJhbGciOiJFUzI1NiIsInR5cCI6IkpXVCJ9.eyJpbmZyYV9lbnZfaWQiOiI2ZjdiMjM5My0xOTQwLTRmNWQtOGE4OS03YTc1N2U0NjVlMWQifQ.umjVYTWdtY9tOHObfLkn7n89iIMeAnwYY1_jjd-rs4_CvTsFm3aorjHmcmOy6hrohWxeyMiFeOmssSj9kTml_A&arch=x86_64&type=minimal-iso&version=4.10
  Online:    true
  Root Device Hints:
    Device Name:  /dev/sda
Status:
  Error Count:    0
  Error Message:  
  Good Credentials:
    Credentials:
      Name:               bmc-secret1
      Namespace:          spk-factory-0
    Credentials Version:  550741
  Hardware Profile:       unknown
  Last Updated:           2022-04-20T02:51:24Z
  Operation History:
    Deprovision:
      End:    <nil>
      Start:  2022-04-20T02:51:24Z
    Inspect:
      End:    2022-04-19T20:50:08Z
      Start:  2022-04-19T20:50:08Z
    Provision:
      End:    2022-04-19T22:52:03Z
      Start:  2022-04-19T22:51:42Z
    Register:
      End:             2022-04-19T20:50:08Z
      Start:           2022-04-19T20:49:57Z
  Operational Status:  OK
  Powered On:          true
  Provisioning:
    ID:         f8f85eb4-77e1-4a4c-8cbd-bb2e100e2ed2
    Boot Mode:  UEFI
    Image:
      Format:  live-iso
      URL:     https://assisted-image-service-open-cluster-management.apps.sno-0.qe.lab.redhat.com/images/6f7b2393-1940-4f5d-8a89-7a757e465e1d?api_key=eyJhbGciOiJFUzI1NiIsInR5cCI6IkpXVCJ9.eyJpbmZyYV9lbnZfaWQiOiI2ZjdiMjM5My0xOTQwLTRmNWQtOGE4OS03YTc1N2U0NjVlMWQifQ.umjVYTWdtY9tOHObfLkn7n89iIMeAnwYY1_jjd-rs4_CvTsFm3aorjHmcmOy6hrohWxeyMiFeOmssSj9kTml_A&arch=x86_64&type=minimal-iso&version=4.10
    Raid:
      Hardware RAID Volumes:  <nil>
      Software RAID Volumes:
    Root Device Hints:
      Device Name:  /dev/sda
    State:          deprovisioning
  Tried Credentials:
    Credentials:
      Name:               bmc-secret1
      Namespace:          spk-factory-0
    Credentials Version:  550741
Events:                   <none>
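
The fields in the output above that point to a stuck deletion (a deletion timestamp, a remaining finalizer, the deprovisioning state, and a deprovision operation with a start but no end) can be checked programmatically. The following is a minimal sketch, assuming the BMH has been fetched as JSON (for example with `oc get bmh -o json`) and loaded into a dict. The function names and the one-hour threshold are hypothetical choices for illustration, not part of any tool.

```python
from datetime import datetime, timezone


def parse_ts(value):
    """Parse a timestamp such as 2022-04-20T02:51:24Z; empty values give None."""
    if not value:
        return None
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def stuck_deprovisioning(bmh, now, threshold_seconds=3600):
    """Return True if the BMH looks stuck in deprovisioning while being deleted.

    threshold_seconds is an arbitrary illustrative cutoff (assumption).
    """
    meta = bmh.get("metadata", {})
    status = bmh.get("status", {})

    deleting = meta.get("deletionTimestamp") is not None
    has_finalizer = bool(meta.get("finalizers"))
    state = status.get("provisioning", {}).get("state")

    deprovision = status.get("operationHistory", {}).get("deprovision", {})
    started = parse_ts(deprovision.get("start"))
    ended = parse_ts(deprovision.get("end"))

    if not (deleting and has_finalizer and state == "deprovisioning"):
        return False
    if started is None or ended is not None:
        return False
    return (now - started).total_seconds() > threshold_seconds


# Values copied from the describe output above.
example_bmh = {
    "metadata": {
        "deletionTimestamp": "2022-04-20T02:51:24Z",
        "finalizers": ["baremetalhost.metal3.io"],
    },
    "status": {
        "provisioning": {"state": "deprovisioning"},
        "operationHistory": {
            "deprovision": {"start": "2022-04-20T02:51:24Z", "end": None},
        },
    },
}
```

Called with the time this bug was reported as `now`, the check flags the host, since the deprovision operation had been open for roughly twelve hours.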




To force the deletion, I manually removed the finalizer baremetalhost.metal3.io.
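
The report does not say how the finalizer was removed. One common way is a JSON merge patch that sets `metadata.finalizers` to null, which clears every finalizer on the object (here there is only `baremetalhost.metal3.io`). The sketch below only builds the `oc patch` argument list; the helper name is hypothetical, and nothing is executed unless the caller passes the list to `subprocess.run`.

```python
import json
import shlex


def finalizer_removal_command(name, namespace, resource="bmh"):
    """Build an `oc patch` command that clears all finalizers on one object.

    A merge patch with a null value deletes the field, so this removes
    every finalizer, not only baremetalhost.metal3.io.
    """
    patch = json.dumps({"metadata": {"finalizers": None}})
    return ["oc", "patch", resource, name, "-n", namespace, "--type=merge", "-p", patch]


command = finalizer_removal_command("master-1-0", "spk-factory-0")
# Print a copy-pasteable form for review instead of running it.
print(shlex.join(command))
```

Because this bypasses the operator's cleanup, it is a last resort: the host may be left in an inconsistent state on the provisioning side.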

Comment 2 Alexander Chuzhoy 2022-04-20 19:04:48 UTC
Note that if the BMH is deleted by removing the finalizer, the next time you create a BMH it gets stuck in preparing and subsequently in deleting:

[kni@provisionhost-0-0 ~]$ oc get bmh 
NAME         STATE      CONSUMER   ONLINE   ERROR   AGE
master-1-0   deleting              true             17m
master-1-1   deleting              true             17m
master-1-2   deleting              true             17m

Comment 3 Zane Bitter 2022-05-04 14:46:35 UTC
This appears to be the problem:

2022-04-20T02:51:25.471 creating new PreprovisioningImage {baremetalhost: 'spk-factory-0/master-1-0', provisioningState: 'deprovisioning'}
2022-04-20T02:51:25.473 Reconciler error {name: 'master-1-0', namespace: 'spk-factory-0'}
action "deprovisioning" failed: preprovisioningimages.metal3.io "master-1-0" is forbidden: unable to create new content in namespace spk-factory-0 because it is being terminated
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
        /go/src/github.com/metal3-io/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:214

Because the state is deprovisioning, the BMO is trying to create a PreprovisioningImage (since this is required for cleaning). It's failing to do so, if I'm reading correctly, because the PreprovisioningImage already exists but has a DeletionTimestamp?

If automatedCleaningMode: disabled is set then perhaps we shouldn't require a PreprovisioningImage for deprovisioning (what does Ironic even do to deprovision in that case?).

It's not clear why the PreprovisioningImage is stuck, presumably with the finalizer set. We'd need to see the logs for the image-customization-controller, which are not attached (it's in a separate pod).
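
For triage, the reconciler error quoted above can be matched mechanically. The wording "unable to create new content in namespace ... because it is being terminated" is the message Kubernetes typically returns when the namespace itself is being deleted, so it may be worth checking the namespace's state as well as the PreprovisioningImage. The following is a rough sketch that extracts the resource, object name, and namespace from such a message; the pattern and function name are assumptions made for illustration and are not taken from the BMO code.

```python
import re

# Matches the "forbidden ... namespace is being terminated" wording seen in the log.
NAMESPACE_TERMINATING = re.compile(
    r'(?P<resource>\S+) "(?P<name>[^"]+)" is forbidden: '
    r"unable to create new content in namespace (?P<namespace>\S+) "
    r"because it is being terminated"
)


def terminating_namespace(message):
    """Return resource, name, and namespace if the message matches, else None."""
    match = NAMESPACE_TERMINATING.search(message)
    return match.groupdict() if match else None


# Error text copied from the log excerpt above.
log_message = (
    'action "deprovisioning" failed: preprovisioningimages.metal3.io "master-1-0" '
    "is forbidden: unable to create new content in namespace spk-factory-0 "
    "because it is being terminated"
)
details = terminating_namespace(log_message)
```

For the log line above, `details` names the `preprovisioningimages.metal3.io` resource, the object `master-1-0`, and the namespace `spk-factory-0`.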

Comment 4 Zane Bitter 2022-05-04 14:49:57 UTC
It could be due to https://github.com/openshift/image-customization-controller/pull/44 if the assisted side of that hasn't been implemented yet.

Comment 5 Dmitry Tantsur 2022-06-01 12:41:31 UTC
> If automatedCleaningMode: disabled is set then perhaps we shouldn't require a PreprovisioningImage for deprovisioning

This is already tracked in https://bugzilla.redhat.com/show_bug.cgi?id=2087213 for another reason. Mark as a duplicate?

> what does Ironic even do to deprovision in that case?

It cleans up its internal state: disconnects vmedia and cleans up PXE scripts.

> unable to create new content in namespace spk-factory-0 because it is being terminated

I've never seen this before, but it looks like something is wrong at a higher level?

Comment 6 Derek Higgins 2022-06-07 16:30:12 UTC
(In reply to Dmitry Tantsur from comment #5)
> > If automatedCleaningMode: disabled is set then perhaps we shouldn't require a PreprovisioningImage for deprovisioning
> 
> This is already tracked in
> https://bugzilla.redhat.com/show_bug.cgi?id=2087213 for another reason. Mark
> as a duplicate?

Hi Sasha, can this be marked as a dup?

Comment 7 Alexander Chuzhoy 2022-06-07 20:05:29 UTC
(In reply to Derek Higgins from comment #6)
> (In reply to Dmitry Tantsur from comment #5)
> > > If automatedCleaningMode: disabled is set then perhaps we shouldn't require a PreprovisioningImage for deprovisioning
> > 
> > This is already tracked in
> > https://bugzilla.redhat.com/show_bug.cgi?id=2087213 for another reason. Mark
> > as a duplicate?
> 
> Hi sasha can this be marked as a dup?

Hi Derek,
The other bug is on 4.11 and always reproduces. The one I reported was spotted in 4.10. I actually haven't hit it for some time now.
Thanks.

Comment 8 Zane Bitter 2022-06-07 21:01:33 UTC
(In reply to Dmitry Tantsur from comment #5)
> > If automatedCleaningMode: disabled is set then perhaps we shouldn't require a PreprovisioningImage for deprovisioning
> 
> This is already tracked in
> https://bugzilla.redhat.com/show_bug.cgi?id=2087213 for another reason. Mark
> as a duplicate?

Bug 2087213 is tracking inspection disabled, not automatedCleaningMode disabled. I don't think it's a duplicate (even though they are related) because there's no reason to think that verification of that one would check whether this one is fixed.

Comment 9 Riccardo Pittau 2022-06-14 09:39:59 UTC
If you're able to reproduce this, we will need a full set of logs, at least from the image-customization-controller as requested by Zane. Thanks!

Comment 12 Tomas Sedovic 2022-11-11 16:58:05 UTC
Closing. The logs were not provided and we weren't able to reproduce.

If you see this issue again, please provide the requested logs and feel free to reopen.

Comment 13 Red Hat Bugzilla 2023-09-18 04:35:53 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days.

