Bug 1932154 - [AWS] machine stuck in Provisioned phase, no warnings or errors
Summary: [AWS] machine stuck in Provisioned phase, no warnings or errors
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Cloud Compute
Version: 4.8
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: low
Target Milestone: ---
Target Release: 4.8.0
Assignee: Michael McCune
QA Contact: Milind Yadav
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-02-24 05:25 UTC by Milind Yadav
Modified: 2021-07-27 22:48 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: Missing iamInstanceProfile in awsproviderconfig.openshift.io resource of MachineSet. Consequence: Machine is not able to pass "Provisioning" phase and join the OpenShift cluster as a node. Fix: A warning has been added in cases where the iamInstanceProfile is not provided. Result: User has a clear indication of what has caused the Machine to fail to join the cluster.
Clone Of:
Environment:
Last Closed: 2021-07-27 22:48:06 UTC
Target Upstream Version:
Embargoed:



Links
System ID Private Priority Status Summary Last Updated
Github openshift machine-api-operator pull 824 0 None open BUG 1932154: add warning for missing IAMInstanceProfile in AWS 2021-03-08 19:55:28 UTC
Red Hat Product Errata RHSA-2021:2438 0 None None None 2021-07-27 22:48:17 UTC

Description Milind Yadav 2021-02-24 05:25:42 UTC
Machine stuck in Provisioned phase, with no warning or error message in the machine-controller logs. This happens when trying to use default settings for the machineset.


Release : Cluster version is 4.8.0-0.nightly-2021-02-23-013453

Always Reproducible

Steps :
1. Use the YAML below to create a machineset:
https://privatebin-it-iso.int.open.paas.redhat.com/?fa8e071584ae0242#GvPoSkPYNwtJbG5FF5v3iGS2ia19Bej452daxsarjkZH

Expected and actual : Machineset created successfully

2. Check the machine status
Expected : Machine should be in Running phase with a node attached, given that there are no errors or warnings in the logs
Actual : Machine stuck in Provisioned phase, no errors in the logs

https://privatebin-it-iso.int.open.paas.redhat.com/?0ae78ca18f0e9838#3eAP5pFgt5VCXweXBfkicA3EB4t6L6SZvXLTYXMisfJP


[miyadav@miyadav aws]$ oc get machines
.
.

NAME                                               PHASE         TYPE        REGION      ZONE         AGE
tenancy-dedicated-37132-f42zh                      Provisioned   m5.large    us-east-2   us-east-2a   20m

Additional info:
Must-gather: https://drive.google.com/file/d/1qFvlpsKg3Rk0GzbsQCnbg1LWOLQ1n2O-/view?usp=sharing

Comment 1 Michael McCune 2021-02-24 14:48:54 UTC
hi Milind, just starting to take a look at this.

i am looking through the must-gather and i don't see any CertificateSigningRequests for the new machines you are creating. would it be possible for you to check the cluster to see if those machines ever made a CSR (oc get csr)?

if no CSRs were generated, then i think this is an issue with the kubelet or node startup process, see [0] for more details.

[0] https://github.com/openshift/machine-api-operator/blob/master/docs/user/TroubleShooting.md#machine-status-phase-provisioned
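
To narrow down which machines are affected before looking for CSRs, the check can be sketched in Python. This is only a rough illustration, not part of any OpenShift tooling: the function names `stuck_provisioned` and `fetch_machines` are hypothetical, and the sketch assumes the machines live in the `openshift-machine-api` namespace and that a machine with an attached node carries a `status.nodeRef` field.

```python
import json
import subprocess


def stuck_provisioned(machine_list):
    """Return names of machines in phase Provisioned that have no node attached.

    `machine_list` is the parsed JSON of `oc get machines -o json`.
    """
    stuck = []
    for machine in machine_list.get("items", []):
        status = machine.get("status") or {}
        # A machine that joined the cluster has a nodeRef and moves to Running.
        if status.get("phase") == "Provisioned" and not status.get("nodeRef"):
            stuck.append(machine["metadata"]["name"])
    return stuck


def fetch_machines(namespace="openshift-machine-api"):
    """Hypothetical helper: shell out to `oc` (requires a logged-in session)."""
    result = subprocess.run(
        ["oc", "get", "machines", "-n", namespace, "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)
```

Calling `stuck_provisioned(fetch_machines())` against a live cluster would list the candidates; any name it returns is a machine whose instance exists but whose kubelet never registered, which is the situation where checking `oc get csr` is the next step.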

Comment 2 Milind Yadav 2021-02-25 04:44:01 UTC
Thanks Michael, I found that the YAML needs to include the following (apiVersion and iamInstanceProfile info):
          
          apiVersion: awsproviderconfig.openshift.io/v1beta1 
          iamInstanceProfile:
            id: miyadav-oc48-2502-f5l7x-worker-profile

This is needed for a CSR to be generated. When we do not pass it, no CSR is generated, but we don't get any error message or warning. After I added it to the machineset YAML and scaled the machineset, a new machine was provisioned successfully and a node was attached to it (in Ready state).

Let me know if you wish to check anything more on this.

So basically we encountered this because we were trying to use default values, which do not include iamInstanceProfile.

Comment 3 Joel Speed 2021-02-25 10:09:24 UTC
Looks like we need to add a warning if the iamInstanceProfile is missing, as we do for the service account on GCP.
This would tell clients that the machine may not join the cluster if the instance profile is not provided, and should be enough of a hint to users that they should include it.
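
The kind of check proposed here can be sketched as follows. This is an illustrative Python sketch only: the actual fix is tracked in the machine-api-operator pull request listed under Links and is not reproduced here. The function name `aws_provider_spec_warnings` is hypothetical, the manifest is assumed to keep its provider config under `spec.template.spec.providerSpec.value`, and the warning text is copied from the validation output in Comment 8.

```python
# Warning text as shown by `oc create` in the verification comment.
MISSING_PROFILE_WARNING = (
    "providerSpec.iamInstanceProfile: no IAM instance profile provided: "
    "nodes may be unable to join the cluster"
)


def _dig(mapping, *keys):
    """Walk nested dicts, returning {} as soon as a key is missing or empty."""
    current = mapping
    for key in keys:
        current = (current or {}).get(key) or {}
    return current


def aws_provider_spec_warnings(machineset):
    """Return non-fatal warnings for a parsed MachineSet manifest (a dict).

    The manifest is still accepted; the warning only hints that machines
    created from it may stay in the Provisioned phase.
    """
    provider_spec = _dig(machineset, "spec", "template", "spec", "providerSpec", "value")
    warnings = []
    if not provider_spec.get("iamInstanceProfile"):
        warnings.append(MISSING_PROFILE_WARNING)
    return warnings
```

Returning a warning rather than rejecting the manifest matches the intent stated above: the machineset is still created, but the user gets a hint about why its machines may not join the cluster.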

Comment 4 Michael McCune 2021-02-25 14:35:16 UTC
sounds good Joel, i wonder if there shouldn't be something in the product documentation about this too?

Comment 5 Michael McCune 2021-02-25 14:45:47 UTC
just confirmed that these entries are in the product doc example, i'm going to work on adding the warning in a similar manner as the gcp provider.

Comment 8 Milind Yadav 2021-03-26 10:38:36 UTC
Validated at :

[miyadav@miyadav ~]$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.8.0-0.nightly-2021-03-26-002831   True        False         4h50m   Error while reconciling 4.8.0-0.nightly-2021-03-26-002831: the cluster operator etcd is degraded
[miyadav@miyadav ~]$ 

Steps : created a machineset without iamInstanceProfile in the YAML

Actual and expected results :

[miyadav@miyadav ~]$ oc create -f rhv/aws/bugval.yml 
W0326 16:03:02.423741   45575 warnings.go:67] providerSpec.iamInstanceProfile: no IAM instance profile provided: nodes may be unable to join the cluster
machineset.machine.openshift.io/miyadav-aws-26-sq5k5-worker-bug created


Additional Info:

Warning message displayed as expected

moved to VERIFIED

Comment 11 errata-xmlrpc 2021-07-27 22:48:06 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438

