Description: Machine is stuck in the Provisioned state, with no warning or error message in the machine-controller logs. This happens when using the default settings for a MachineSet.

Release: cluster version 4.8.0-0.nightly-2021-02-23-013453

Reproducibility: Always

Steps to reproduce:
1. Create a MachineSet using the YAML at https://privatebin-it-iso.int.open.paas.redhat.com/?fa8e071584ae0242#GvPoSkPYNwtJbG5FF5v3iGS2ia19Bej452daxsarjkZH (the MachineSet is created successfully).
2. Check the machine status.

Expected: Machine reaches the Running phase with a node attached; if it cannot, errors or warnings appear in the logs.

Actual: Machine is stuck in the Provisioned phase, with no errors in the logs: https://privatebin-it-iso.int.open.paas.redhat.com/?0ae78ca18f0e9838#3eAP5pFgt5VCXweXBfkicA3EB4t6L6SZvXLTYXMisfJP

[miyadav@miyadav aws]$ oc get machines
NAME                            PHASE         TYPE       REGION      ZONE         AGE
tenancy-dedicated-37132-f42zh   Provisioned   m5.large   us-east-2   us-east-2a   20m

Additional info: must-gather - https://drive.google.com/file/d/1qFvlpsKg3Rk0GzbsQCnbg1LWOLQ1n2O-/view?usp=sharing
Hi Milind, just starting to take a look at this. I am looking through the must-gather and I don't see any CertificateSigningRequests for the new machines you are creating. Would it be possible for you to check the cluster to see whether those machines ever generated a CSR (oc get csr)? If no CSRs were generated, then I think this is an issue with the kubelet or node startup process; see [0] for more details.

[0] https://github.com/openshift/machine-api-operator/blob/master/docs/user/TroubleShooting.md#machine-status-phase-provisioned
Thanks Michael. I found that the YAML needs to include the apiVersion and the iamInstanceProfile info:

  apiVersion: awsproviderconfig.openshift.io/v1beta1
  iamInstanceProfile:
    id: miyadav-oc48-2502-f5l7x-worker-profile

for a CSR to be generated. When we do not pass it, no CSR is generated, but we also get no error message or warning. After I added it to the MachineSet YAML and scaled the MachineSet, the new machine was provisioned successfully and a node was attached to it (in Ready state). Let me know if you wish to check anything more here.

So basically we hit this because we were trying to use default values, which do not include iamInstanceProfile.
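For reference, a minimal sketch of the relevant providerSpec excerpt from the reproducer (the profile ID, instance type, and placement values below are the ones from this bug and are cluster-specific; other required providerSpec fields are omitted):

```yaml
# Excerpt of a MachineSet's spec.template.spec.providerSpec.value.
# Without the iamInstanceProfile stanza, the instance boots but the
# kubelet cannot authenticate, so no CSR is ever generated.
apiVersion: awsproviderconfig.openshift.io/v1beta1
kind: AWSMachineProviderConfig
iamInstanceProfile:
  id: miyadav-oc48-2502-f5l7x-worker-profile   # cluster-specific worker profile
instanceType: m5.large
placement:
  region: us-east-2
  availabilityZone: us-east-2a
```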
Looks like we need to add a warning when iamInstanceProfile is missing, as we do for the service account on GCP. This would tell clients that the machine may not join the cluster if the instance profile is not provided, and it should be enough of a hint to users that they should include it.
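A minimal sketch of what such a check could look like. The type and function names below are illustrative assumptions, not the actual machine-api-operator symbols; the real change would live in the AWS providerSpec validation webhook and return the warning through the admission response:

```go
package main

import "fmt"

// Simplified stand-ins for the AWS provider config types
// (hypothetical names, for illustration only).
type AWSResourceReference struct {
	ID *string
}

type AWSMachineProviderConfig struct {
	IAMInstanceProfile *AWSResourceReference
}

// validateAWS returns admission warnings for the given providerSpec.
// A missing or empty iamInstanceProfile produces a warning rather than
// an error, since the machine can still be created.
func validateAWS(spec *AWSMachineProviderConfig) []string {
	var warnings []string
	if spec.IAMInstanceProfile == nil || spec.IAMInstanceProfile.ID == nil {
		warnings = append(warnings,
			"providerSpec.iamInstanceProfile: no IAM instance profile provided: nodes may be unable to join the cluster")
	}
	return warnings
}

func main() {
	// No profile set: the warning is emitted.
	for _, w := range validateAWS(&AWSMachineProviderConfig{}) {
		fmt.Println(w)
	}
}
```

The key design point is that this is a warning, not a validation error: the MachineSet is still created (matching the verified behavior below, where `oc create` prints the warning and then the created message).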
Sounds good, Joel. I wonder if there shouldn't be something in the product documentation about this too?
Just confirmed that these entries are in the product doc example. I'm going to work on adding the warning in a similar manner to the GCP provider.
Validated at:

[miyadav@miyadav ~]$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.8.0-0.nightly-2021-03-26-002831   True        False         4h50m   Error while reconciling 4.8.0-0.nightly-2021-03-26-002831: the cluster operator etcd is degraded
[miyadav@miyadav ~]$

Steps: created a MachineSet without iamInstanceProfile in the YAML.

Actual and expected results:

[miyadav@miyadav ~]$ oc create -f rhv/aws/bugval.yml
W0326 16:03:02.423741   45575 warnings.go:67] providerSpec.iamInstanceProfile: no IAM instance profile provided: nodes may be unable to join the cluster
machineset.machine.openshift.io/miyadav-aws-26-sq5k5-worker-bug created

Additional info: the warning message is displayed as expected. Moved to VERIFIED.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:2438