Description of problem:
node does not join OCP cluster after scaleup playbook finished successfully

Version-Release number of the following components:
openshift-ansible-4.1.0-201904201251.git.148.6de1227.el7.noarch

$ ansible --version
ansible 2.7.9
  config file = /etc/ansible/ansible.cfg
  configured module search path = ['/home/wmeng/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/local/lib/python3.6/site-packages/ansible
  executable location = /usr/local/bin/ansible
  python version = 3.6.6 (default, Mar 29 2019, 00:03:27) [GCC 4.8.5 20150623 (Red Hat 4.8.5-36)]

How reproducible:
Always

Steps to Reproduce:
1. Run the scaleup playbook to add a new RHEL node to an existing OCP 4 cluster:
   $ ansible-playbook -vvv -i ~/hosts playbooks/scaleup.yml
2. Check the cluster status after the playbook finishes successfully.

Actual results:

### before scaleup:
$ oc get nodes
NAME                                                STATUS   ROLES    AGE   VERSION
ip-172-31-128-63.ap-northeast-1.compute.internal    Ready    worker   87m   v1.13.4+da48e8391
ip-172-31-130-205.ap-northeast-1.compute.internal   Ready    master   94m   v1.13.4+da48e8391
ip-172-31-149-11.ap-northeast-1.compute.internal    Ready    master   94m   v1.13.4+da48e8391
ip-172-31-154-138.ap-northeast-1.compute.internal   Ready    worker   87m   v1.13.4+da48e8391
ip-172-31-164-59.ap-northeast-1.compute.internal    Ready    worker   87m   v1.13.4+da48e8391
ip-172-31-169-48.ap-northeast-1.compute.internal    Ready    master   94m   v1.13.4+da48e8391

### after scaleup:
$ oc get nodes
NAME                                                STATUS   ROLES    AGE    VERSION
ip-172-31-128-63.ap-northeast-1.compute.internal    Ready    worker   178m   v1.13.4+da48e8391
ip-172-31-130-205.ap-northeast-1.compute.internal   Ready    master   3h5m   v1.13.4+da48e8391
ip-172-31-149-11.ap-northeast-1.compute.internal    Ready    master   3h5m   v1.13.4+da48e8391
ip-172-31-154-138.ap-northeast-1.compute.internal   Ready    worker   178m   v1.13.4+da48e8391
ip-172-31-164-59.ap-northeast-1.compute.internal    Ready    worker   178m   v1.13.4+da48e8391
ip-172-31-169-48.ap-northeast-1.compute.internal    Ready    master   3h5m   v1.13.4+da48e8391

Expected results:
The new RHEL node should show up in the cluster in the oc get nodes output.
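For context, the inventory passed with -i ~/hosts followed the layout documented for the openshift-ansible 4.1 scaleup playbook. The sketch below is illustrative only; the hostname, ansible_user, and kubeconfig path are placeholders rather than the actual file contents, and the variable names should be treated as an assumption if your openshift-ansible version differs:

$ cat ~/hosts
[all:vars]
ansible_user=root
openshift_kubeconfig_path="~/.kube/config"

[new_workers]
new-rhel7-worker.example.com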
The kubelet is failing to start due to an AWS issue:

Apr 22 11:20:08 ip-172-31-12-224.ap-northeast-1.compute.internal hyperkube[20023]: F0422 11:20:08.216862   20023 server.go:264] failed to run Kubelet: could not init cloud provider "aws": error finding instance i-0f03fe4975e10d1fa: "error listing AWS instances: \"NoCredentialProviders: no valid providers in chain. Deprecated.\\n\\tFor verbose messaging see aws.Config.CredentialsChainVerboseErrors\""

Is this a valid instance in the region? It could also be an IAM permissions issue in the region. Could you try again with a fresh node? This one will not bootstrap due to expired certs.
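One quick way to confirm whether the node has any instance credentials at all (an illustrative check, not something from the playbook run) is to query the EC2 instance metadata service from the failing node; an empty body or a 404 status means no IAM instance profile is attached, which matches the NoCredentialProviders error above:

$ curl -s -w '\n%{http_code}\n' http://169.254.169.254/latest/meta-data/iam/security-credentials/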
What kind of instance is a valid instance? I tried again with another fresh node and got the same result.

Steps to create the host: choose a RHEL 7 AMI in the same region as the existing cluster, launch it in the same VPC as the cluster in a public subnet, use the same security group as the other worker nodes, and open the SSH port for Ansible.
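For completeness, the equivalent launch through the AWS CLI would look roughly like the following; every identifier below is a placeholder and must match the existing cluster's region, VPC public subnet, worker security group, and key pair:

# Placeholders: the AMI, subnet, security group, and key pair name are illustrative.
$ aws ec2 run-instances \
    --image-id ami-0xxxxxxxxxxxxxxxx \
    --instance-type m4.xlarge \
    --subnet-id subnet-0xxxxxxxxxxxxxxxx \
    --security-group-ids sg-0xxxxxxxxxxxxxxxx \
    --key-name <keypair-name>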
In AWS, the provisioned RHEL7 node must have an IAM role attached with a policy that allows describing all EC2 instances. A new role can be created, or you can use the role created by the installer, which has the format <cluster_name>-xxxxx-worker-role. This role allows the kubelet to query the AWS API.
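If the installer-created worker role is reused, attaching it to the node is just a matter of associating its instance profile with the instance; if a new role is created, it needs at least the EC2 describe permissions the kubelet uses and must be wrapped in an instance profile before it can be attached. A minimal sketch follows; all role, profile, policy, and instance identifiers are illustrative placeholders, and the policy shown is only the describe subset relevant to this error:

$ cat > kubelet-ec2-describe.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["ec2:DescribeInstances", "ec2:DescribeRegions"],
      "Resource": "*"
    }
  ]
}
EOF
$ aws iam put-role-policy --role-name <worker-role> \
    --policy-name kubelet-ec2-describe \
    --policy-document file://kubelet-ec2-describe.json
$ aws iam create-instance-profile --instance-profile-name <worker-instance-profile>
$ aws iam add-role-to-instance-profile \
    --instance-profile-name <worker-instance-profile> \
    --role-name <worker-role>
$ aws ec2 associate-iam-instance-profile \
    --instance-id <instance-id> \
    --iam-instance-profile Name=<worker-instance-profile>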
The RHEL7 host joined the cluster successfully after the IAM role was assigned and the tag was added.

$ oc get node -o wide
NAME                                                STATUS   ROLES    AGE     VERSION             INTERNAL-IP      EXTERNAL-IP     OS-IMAGE                                                    KERNEL-VERSION               CONTAINER-RUNTIME
ip-172-31-135-71.ap-northeast-1.compute.internal    Ready    worker   3h30m   v1.13.4+da48e8391   172.31.135.71    <none>          Red Hat Enterprise Linux CoreOS 410.8.20190418.1 (Ootpa)   4.18.0-80.el8.x86_64         cri-o://1.13.6-4.rhaos4.1.gita4b40b7.el8
ip-172-31-143-235.ap-northeast-1.compute.internal   Ready    master   3h38m   v1.13.4+da48e8391   172.31.143.235   <none>          Red Hat Enterprise Linux CoreOS 410.8.20190418.1 (Ootpa)   4.18.0-80.el8.x86_64         cri-o://1.13.6-4.rhaos4.1.gita4b40b7.el8
ip-172-31-147-96.ap-northeast-1.compute.internal    Ready    worker   3h30m   v1.13.4+da48e8391   172.31.147.96    <none>          Red Hat Enterprise Linux CoreOS 410.8.20190418.1 (Ootpa)   4.18.0-80.el8.x86_64         cri-o://1.13.6-4.rhaos4.1.gita4b40b7.el8
ip-172-31-151-240.ap-northeast-1.compute.internal   Ready    master   3h38m   v1.13.4+da48e8391   172.31.151.240   <none>          Red Hat Enterprise Linux CoreOS 410.8.20190418.1 (Ootpa)   4.18.0-80.el8.x86_64         cri-o://1.13.6-4.rhaos4.1.gita4b40b7.el8
ip-172-31-169-106.ap-northeast-1.compute.internal   Ready    worker   3h30m   v1.13.4+da48e8391   172.31.169.106   <none>          Red Hat Enterprise Linux CoreOS 410.8.20190418.1 (Ootpa)   4.18.0-80.el8.x86_64         cri-o://1.13.6-4.rhaos4.1.gita4b40b7.el8
ip-172-31-175-155.ap-northeast-1.compute.internal   Ready    master   3h38m   v1.13.4+da48e8391   172.31.175.155   <none>          Red Hat Enterprise Linux CoreOS 410.8.20190418.1 (Ootpa)   4.18.0-80.el8.x86_64         cri-o://1.13.6-4.rhaos4.1.gita4b40b7.el8
ip-172-31-29-93.ap-northeast-1.compute.internal     Ready    worker   52m     v1.13.4+8730f3882   172.31.29.93     46.51.238.198   Red Hat Enterprise Linux Server 7.6 (Maipo)                 3.10.0-957.1.3.el7.x86_64    cri-o://1.13.6-1.dev.rhaos4.1.gitee2e748.el7-dev
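For reference, the tag mentioned above is the Kubernetes cluster ownership tag the AWS cloud provider looks for on member instances. It was added along the lines of the following; the instance ID and infrastructure ID are placeholders, and the value may be "owned" or "shared" depending on who manages the instance's lifecycle:

$ aws ec2 create-tags \
    --resources <instance-id> \
    --tags Key=kubernetes.io/cluster/<infrastructure_id>,Value=owned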
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0758