1932154 – [AWS] machine stuck in provisioned phase, no warnings or errors

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Cloud Compute
Sub Component:
Version:	4.8
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	low
Target Milestone:	---
Target Release:	4.8.0
Assignee:	Michael McCune
QA Contact:	Milind Yadav
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2021-02-24 05:25 UTC by Milind Yadav
Modified:	2021-07-27 22:48 UTC
CC List:	1 user
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:	Cause: Missing iamInstanceProfile in awsproviderconfig.openshift.io resource of MachineSet. Consequence: Machine is not able to pass "Provisioning" phase and join the OpenShift cluster as a node. Fix: A warning has been added in cases where the iamInstanceProfile is not provided. Result: User has a clear indication of what has caused the Machine to fail to join the cluster.
Clone Of:
Environment:
Last Closed:	2021-07-27 22:48:06 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Github	openshift machine-api-operator pull 824	0	None	open	BUG 1932154: add warning for missing IAMInstanceProfile in AWS	2021-03-08 19:55:28 UTC
Red Hat Product Errata	RHSA-2021:2438	0	None	None	None	2021-07-27 22:48:17 UTC

Description Milind Yadav 2021-02-24 05:25:42 UTC

The machine is stuck in the Provisioned phase, with no warning or error message in the machine-controller logs. This happens when creating a machineset with default settings.


Release : Cluster version is 4.8.0-0.nightly-2021-02-23-013453

Always Reproducible

Steps:

1. Use the YAML below to create a machineset:
https://privatebin-it-iso.int.open.paas.redhat.com/?fa8e071584ae0242#GvPoSkPYNwtJbG5FF5v3iGS2ia19Bej452daxsarjkZH

Actual and expected: Machineset created successfully.

2. Check the machine status.
Expected: Machine should be in Running status with a node attached, if there are no errors or warnings in the logs.
Actual: Machine stuck in Provisioned phase, no errors in the logs.

https://privatebin-it-iso.int.open.paas.redhat.com/?0ae78ca18f0e9838#3eAP5pFgt5VCXweXBfkicA3EB4t6L6SZvXLTYXMisfJP


[miyadav@miyadav aws]$ oc get machines
.
.

NAME                            PHASE         TYPE       REGION      ZONE         AGE
tenancy-dedicated-37132-f42zh   Provisioned   m5.large   us-east-2   us-east-2a   20m

Additional info:
Must-gather: https://drive.google.com/file/d/1qFvlpsKg3Rk0GzbsQCnbg1LWOLQ1n2O-/view?usp=sharing

Comment 1 Michael McCune 2021-02-24 14:48:54 UTC

hi Milind, just starting to take a look at this.

i am looking through the must-gather and i don't see any CertificateSigningRequests for the new machines you are creating. would it be possible for you to check the cluster to see if those machines ever made a CSR (oc get csr)?

if no CSRs were generated, then i think this is an issue with the kubelet or node startup process, see [0] for more details.

[0] https://github.com/openshift/machine-api-operator/blob/master/docs/user/TroubleShooting.md#machine-status-phase-provisioned

Comment 2 Milind Yadav 2021-02-25 04:44:01 UTC

Thanks Michael, I found that the YAML needs to include the following (apiVersion and iamInstanceProfile info):
          
          apiVersion: awsproviderconfig.openshift.io/v1beta1 
          iamInstanceProfile:
            id: miyadav-oc48-2502-f5l7x-worker-profile

These fields are needed for a CSR to be generated. When we do not pass them, the CSR is not generated, but we don't get any error message or warning. After I added them to the machineset YAML and scaled the machineset, a new machine was provisioned successfully and a node was attached to it (in Ready state).

Let me know if you wish to check anything more on this.

So basically we encountered this because we were trying to use default values, which do not include iamInstanceProfile.

Comment 3 Joel Speed 2021-02-25 10:09:24 UTC

Looks like we need to add a warning if the iamInstanceProfile is missing, as we do for the service account on GCP. 
This would tell clients that the machine may not join the cluster if the instance profile is not provided and should be enough of a hint to users that they should include this
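
As a rough illustration of the kind of check proposed above, here is a minimal Go sketch. It is not the actual machine-api-operator code: the type names `AWSProviderSpec` and `IAMInstanceProfileRef` and the function `validateAWSProviderSpec` are hypothetical stand-ins, and the warning text is copied from the validation output in Comment 8. The point is that a missing profile yields a non-fatal warning rather than rejecting the resource.

```go
package main

import "fmt"

// IAMInstanceProfileRef is a hypothetical, trimmed-down stand-in for the
// instance profile reference in the AWS provider config.
type IAMInstanceProfileRef struct {
	ID string
}

// AWSProviderSpec is a hypothetical stand-in that carries only the field
// relevant to this bug.
type AWSProviderSpec struct {
	IAMInstanceProfile *IAMInstanceProfileRef
}

// validateAWSProviderSpec returns non-fatal warnings for the given spec.
// A missing instance profile does not block creation; it only warns,
// because the machine may still be created but fail to join the cluster.
func validateAWSProviderSpec(spec AWSProviderSpec) []string {
	var warnings []string
	if spec.IAMInstanceProfile == nil {
		warnings = append(warnings,
			"providerSpec.iamInstanceProfile: no IAM instance profile provided: nodes may be unable to join the cluster")
	}
	return warnings
}

func main() {
	// Spec without an instance profile: one warning is expected.
	for _, w := range validateAWSProviderSpec(AWSProviderSpec{}) {
		fmt.Println("warning:", w)
	}

	// Spec with an instance profile (the ID is a made-up example value).
	withProfile := AWSProviderSpec{
		IAMInstanceProfile: &IAMInstanceProfileRef{ID: "example-worker-profile"},
	}
	fmt.Println("warnings with profile:", len(validateAWSProviderSpec(withProfile)))
}
```

Returning warnings as a separate slice, rather than as an error, keeps the resource creatable while still surfacing the hint to the client.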

Comment 4 Michael McCune 2021-02-25 14:35:16 UTC

sounds good Joel, i wonder if there shouldn't be something in the product documentation about this too?

Comment 5 Michael McCune 2021-02-25 14:45:47 UTC

just confirmed that these entries are in the product doc example, i'm going to work on adding the warning in a similar manner as the gcp provider.

Comment 8 Milind Yadav 2021-03-26 10:38:36 UTC

Validated at :

[miyadav@miyadav ~]$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.8.0-0.nightly-2021-03-26-002831   True        False         4h50m   Error while reconciling 4.8.0-0.nightly-2021-03-26-002831: the cluster operator etcd is degraded
[miyadav@miyadav ~]$ 

Steps: created a machineset without iamInstanceProfile in the YAML

Actual and expected results :

[miyadav@miyadav ~]$ oc create -f rhv/aws/bugval.yml 
W0326 16:03:02.423741   45575 warnings.go:67] providerSpec.iamInstanceProfile: no IAM instance profile provided: nodes may be unable to join the cluster
machineset.machine.openshift.io/miyadav-aws-26-sq5k5-worker-bug created


Additional Info:

Warning message displayed as expected

moved to VERIFIED

Comment 11 errata-xmlrpc 2021-07-27 22:48:06 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438
