Created attachment 1866623 [details] metal3-ironic-conductor.log

Description of problem:
I'm having problems when scaling down a node in a HyperShift cluster: instead of booting assisted-installer's discovery ISO again, the BMH is shut down and put into maintenance mode.

Long explanation:
I set up a hub cluster using dev-scripts and installed assisted-service, hive and HyperShift on it. I create an InfraEnv, and create BMHs with the infraenvs.agent-install.openshift.io: myinfraenv label. Then I create an agent HyperShift cluster and scale it up. The problem appears when I try to scale it down. The expected behaviour would be that a worker node is removed from the cluster and the BMH is then reprovisioned using assisted's discovery ISO. What actually happens is that the node is shut down.

How reproducible:
Always

Steps to Reproduce:

Start a hub cluster using dev-scripts. This is my config_$USER.sh file:

--- config_$USER.sh ---
export WORKING_DIR=/home/dev-scripts
# Go to https://console-openshift-console.apps.ci.l2s4.p1.openshiftapps.com/, click on your name in the top right, and use the token from the login command
export CI_TOKEN='mysecrettoken'
export CLUSTER_NAME="hub"
export BASE_DOMAIN="redhat.com"
export OPENSHIFT_VERSION=4.9.17
export OPENSHIFT_RELEASE_TYPE=ga
export IP_STACK=v4
export BMC_DRIVER=redfish-virtualmedia
export REDFISH_EMULATOR_IGNORE_BOOT_DEVICE=True
export NUM_WORKERS=0
export MASTER_VCPU=8
export MASTER_MEMORY=20000
export MASTER_DISK=60
export NUM_EXTRA_WORKERS=2
export WORKER_VCPU=8
export WORKER_MEMORY=35000
export WORKER_DISK=150
export VM_EXTRADISKS=true
export VM_EXTRADISKS_LIST="vda vdb"
export VM_EXTRADISKS_SIZE=30G
--- END FILE ---

Then run:

NAMESPACE=mynamespace
CLUSTERNAME=mycluster
INFRAENV=myinfraenv
BASEDOMAIN=redhat.com
SSHKEY=~/.ssh/id_rsa.pub
HYPERSHIFT_IMAGE=quay.io/hypershift/hypershift-operator:latest

make all assisted
cp ocp/hub/auth/kubeconfig ~/.kube/config

oc patch storageclass assisted-service -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
oc patch provisioning provisioning-configuration --type merge -p '{"spec":{"watchAllNamespaces": true}}'

alias hypershift="podman run --net host --rm --entrypoint /usr/bin/hypershift -e KUBECONFIG=/working_dir/ocp/hub/auth/kubeconfig -v $HOME/.ssh:/root/.ssh -v $(pwd):/working_dir $HYPERSHIFT_IMAGE"
hypershift install --hypershift-image $HYPERSHIFT_IMAGE

oc create namespace $NAMESPACE

# Create pull secret
cat << EOF | oc apply -n $NAMESPACE -f -
apiVersion: v1
kind: Secret
type: kubernetes.io/dockerconfigjson
metadata:
  name: my-pull-secret
data:
  .dockerconfigjson: $(cat pull_secret.json | base64 -w0)
EOF

# Create infraenv
cat << EOF | oc apply -n $NAMESPACE -f -
apiVersion: agent-install.openshift.io/v1beta1
kind: InfraEnv
metadata:
  name: $INFRAENV
spec:
  pullSecretRef:
    name: my-pull-secret
  sshAuthorizedKey: $(cat $SSHKEY)
EOF

# Create BMH
cat ocp/hub/saved-assets/assisted-installer-manifests/06-extra-host-manifests.yaml | sed "s/assisted-installer/$NAMESPACE/g; s/myinfraenv/$INFRAENV/g" | oc apply -f -

# Create hypershift cluster
INFRAID=$(oc get infraenv $INFRAENV -n $NAMESPACE -o yaml | yq -r .status.isoDownloadURL | awk -F/ '{print $NF}' | cut -d \? -f 1)
hypershift create cluster agent --name $CLUSTERNAME --base-domain $BASEDOMAIN --pull-secret /working_dir/pull_secret.json --ssh-key $SSHKEY --agent-namespace $NAMESPACE --namespace $NAMESPACE --infra-id=$INFRAID --control-plane-operator-image=$HYPERSHIFT_IMAGE

KUBECONFIG=ocp/hub/auth/kubeconfig:<(hypershift create kubeconfig) kubectl config view --flatten > ~/.kube/config

# Scale up
oc scale nodepool/$CLUSTERNAME -n $NAMESPACE --replicas=2

# Force EFI next-boot to be CD
ssh core@extraworker-0 sudo efibootmgr -n1
ssh core@extraworker-1 sudo efibootmgr -n1

# Scale down
oc scale nodepool/$CLUSTERNAME -n $NAMESPACE --replicas=1

Actual results:
The scaled-down node is shut down, and in Ironic it appears in maintenance mode.

Expected results:
The scaled-down node is rebooted into assisted-service's discovery ISO.

Additional info:
Related Slack threads:
https://coreos.slack.com/archives/C02N0RJR170/p1647513139893809
https://coreos.slack.com/archives/CFP6ST0A3/p1647449263684629

I also attach logs from metal3-ironic-conductor, metal3-baremetal-operator, cluster-api-agent-provider and assisted-service. I'll also attach an affected BareMetalHost CR.
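A minimal way to observe the symptom from the hub cluster, reusing the variables from the steps above (these extra checks are a sketch, not part of the original flow):

# The scaled-down host's BMH stays powered off instead of rebooting into the discovery ISO
oc get bmh -n $NAMESPACE
oc get agents -n $NAMESPACE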
Created attachment 1866624 [details] metal3-baremetal-operator.log
Created attachment 1866625 [details] cluster-api-agent-provider.log
Created attachment 1866626 [details] assisted-service.log
Created attachment 1866627 [details] BareMetalHost.yaml CustomResource
Findings so far: when scaling down, BMO first goes down the Ironic node deletion path. For that, it sets the maintenance flag on the Ironic node and reschedules the reconciliation. On the next iteration, BMO goes down the deprovisioning path (as expected). Since the node is still in maintenance, provisioning actions are not allowed on it, and the process gets stuck.

1) I'm not sure why BMO tries the node deletion (but never completes it). My guess is that it has something to do with the removal of the detached annotation.
2) We need to enforce the maintenance mode (or the lack of it) on Ironic nodes in BMO.
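Until a proper fix lands, the stuck host can probably be unblocked by clearing the maintenance flag on the Ironic node by hand. A minimal sketch, assuming the baremetal CLI and its credentials are available inside the conductor container (deployment/container names are assumptions and may differ per release):

# Find the node and confirm it is in maintenance
oc exec -n openshift-machine-api deploy/metal3 -c metal3-ironic-conductor -- \
  baremetal node list --fields uuid name provision_state maintenance
# Clear the maintenance flag so BMO can proceed with deprovisioning on its next reconcile
oc exec -n openshift-machine-api deploy/metal3 -c metal3-ironic-conductor -- \
  baremetal node maintenance unset <node-uuid>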
Created attachment 1866654 [details] full metal3-baremetal-operator.log
So far I've been unable to artificially reproduce this problem by deploying a live ISO on a BMH, detaching it and then deprovisioning. Maybe it's something that is only present in 4.9 (I'm on master, i.e. 4.11), or my testing is missing some critical step. Or it's timing-sensitive. Anyway, https://github.com/metal3-io/baremetal-operator/pull/1101 should make the maintenance mode less of a problem at least. I still hope to get to the root cause eventually, but I'm also leaving for PTO soon.
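For reference, a rough sketch of that deploy/detach/deprovision sequence (the BMH name, namespace and ISO URL are placeholders; the annotation and field names are the upstream metal3 ones):

# 1. Deploy a live ISO on the host
oc patch bmh/extraworker-0 -n $NAMESPACE --type merge -p \
  '{"spec":{"online":true,"image":{"url":"http://example.com/discovery.iso","format":"live-iso"}}}'
# 2. Detach the host so BMO stops managing it in Ironic
oc annotate bmh/extraworker-0 -n $NAMESPACE baremetalhost.metal3.io/detached=""
# 3. Re-attach and trigger deprovisioning
oc annotate bmh/extraworker-0 -n $NAMESPACE baremetalhost.metal3.io/detached-
oc patch bmh/extraworker-0 -n $NAMESPACE --type merge -p '{"spec":{"image":null}}'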
I'll try to build a custom BMO image with your changes and spawn a new cluster to see if the usability improves :) Anyway, since I can easily reproduce this behaviour, if you want, when you're back from PTO I can give you access to an affected cluster so you can debug it better.
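A rough sketch of how the image swap can be done on a test cluster (deployment/container names and the image reference are assumptions; CVO and cluster-baremetal-operator normally revert manual edits, so they have to be stopped first):

# Prevent the managed operators from reverting the change (test clusters only)
oc scale deployment/cluster-version-operator -n openshift-cluster-version --replicas=0
oc scale deployment/cluster-baremetal-operator -n openshift-machine-api --replicas=0
# Point the metal3 deployment at the custom BMO build
oc set image deployment/metal3 -n openshift-machine-api \
  metal3-baremetal-operator=quay.io/<my-user>/baremetal-operator:bmo-1101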
After reproducing the behaviour, I replaced the baremetal-operator image with one containing Dmitry's patch, and now the nodes don't get stuck in maintenance mode :)
@dtantsur do you want to keep this open to get to the root issue, or is the workaround you wrote enough to close this? As a user I can no longer see the wrong behaviour.
The workaround that did the trick for this BZ is already present in PR #239 [1] via commit d46e118fe5a055d1d84a9b62850e7d1456acee8a [2]. So far so good, moving to VERIFIED.

[1] - https://github.com/openshift/baremetal-operator/pull/239
[2] - https://github.com/openshift/baremetal-operator/pull/239/commits/d46e118fe5a055d1d84a9b62850e7d1456acee8a
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.12.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:7399