Bug 1614876 - Installer fails with "Wait for all control plane pods to become ready"
Summary: Installer fails with "Wait for all control plane pods to become ready"
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 3.11.0
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Target Release: 3.11.0
Assignee: Vadim Rutkovsky
QA Contact: Weihua Meng
URL:
Whiteboard: aos-scalability-311
Depends On:
Blocks:
 
Reported: 2018-08-10 15:24 UTC by Matt Bruzek
Modified: 2018-10-11 07:25 UTC
CC: 7 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-10-11 07:24:38 UTC
Target Upstream Version:
Embargoed:




Links:
Red Hat Product Errata RHBA-2018:2652 (last updated 2018-10-11 07:25:04 UTC)

Description Matt Bruzek 2018-08-10 15:24:21 UTC
Description of problem:

Running the installer:

source /home/cloud-user/keystonerc; ansible-playbook -vvv --user openshift -i inventory -i openshift-ansible/playbooks/openstack/inventory.py openshift-ansible/playbooks/openstack/openshift-cluster/install.yml 2>&1 >> /home/cloud-user/logs/openshift_install.log
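An aside on the logging command above: with `2>&1 >> file`, stderr is duplicated onto the terminal's stdout before stdout is pointed at the log, so the error output never reaches openshift_install.log. A minimal sketch of the difference (file names here are illustrative):

```shell
rm -f wrong.log right.log
# '2>&1 >> file': fd 2 is joined to the terminal's stdout first, and only
# then is stdout redirected, so just stdout lands in the log:
{ echo to-stdout; echo to-stderr >&2; } 2>&1 >> wrong.log
# '>> file 2>&1': stdout goes to the file first, then stderr follows it:
{ echo to-stdout; echo to-stderr >&2; } >> right.log 2>&1
```

With the second form, both streams end up in the log file.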

Failed with message:

  1. Hosts:    master-0.scale-ci.example.com, master-1.scale-ci.example.com, master-2.scale-ci.example.com
     Play:     Configure masters
     Task:     Wait for all control plane pods to become ready
     Message:  All items completed

It seems 2 of the 3 API server pods are in CrashLoopBackOff:

root@master-0: ~ # oc get pods --all-namespaces
NAMESPACE     NAME                                               READY     STATUS             RESTARTS   AGE
kube-system   master-api-master-0.scale-ci.example.com           0/1       CrashLoopBackOff   19         1h
kube-system   master-api-master-1.scale-ci.example.com           1/1       Running            0          54m
kube-system   master-api-master-2.scale-ci.example.com           0/1       CrashLoopBackOff   19         55m
kube-system   master-controllers-master-0.scale-ci.example.com   1/1       Running            0          1h
kube-system   master-controllers-master-1.scale-ci.example.com   1/1       Running            1          55m
kube-system   master-controllers-master-2.scale-ci.example.com   1/1       Running            0          54m
kube-system   master-etcd-master-0.scale-ci.example.com          1/1       Running            0          1h
kube-system   master-etcd-master-1.scale-ci.example.com          1/1       Running            0          54m
kube-system   master-etcd-master-2.scale-ci.example.com          1/1       Running            0          54m

Looking at the logs for the failing pods, they complain about "no RBAC policy matched":

# oc logs master-api-master-2.scale-ci.example.com -n kube-system

E0810 15:08:32.741192       1 reflector.go:136] k8s.io/client-go/informers/factory.go:130: Failed to list *v1.Service: services is forbidden: User "system:anonymous" cannot list services at the cluster scope: no RBAC policy matched
E0810 15:08:32.742424       1 reflector.go:136] k8s.io/client-go/informers/factory.go:130: Failed to list *v1.Endpoints: endpoints is forbidden: User "system:anonymous" cannot list endpoints at the cluster scope: no RBAC policy matched
E0810 15:08:32.743343       1 reflector.go:136] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:129: Failed to list *core.ServiceAccount: serviceaccounts is forbidden: User "system:anonymous" cannot list serviceaccounts at the cluster scope: no RBAC policy matched
E0810 15:08:32.744451       1 reflector.go:136] github.com/openshift/client-go/oauth/informers/externalversions/factory.go:101: Failed to list *v1.OAuthClient: oauthclients.oauth.openshift.io is forbidden: User "system:anonymous" cannot list oauthclients.oauth.openshift.io at the cluster scope: no RBAC policy matched
E0810 15:08:32.745902       1 reflector.go:136] k8s.io/client-go/informers/factory.go:130: Failed to list *v1.Namespace: namespaces is forbidden: User "system:anonymous" cannot list namespaces at the cluster scope: no RBAC policy matched
E0810 15:08:32.746846       1 reflector.go:136] k8s.io/client-go/informers/factory.go:130: Failed to list *v1.ClusterRole: clusterroles.rbac.authorization.k8s.io is forbidden: User "system:anonymous" cannot list clusterroles.rbac.authorization.k8s.io at the cluster scope: no RBAC policy matched
E0810 15:08:32.748059       1 reflector.go:136] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:129: Failed to list *core.Pod: pods is forbidden: User "system:anonymous" cannot list pods at the cluster scope: no RBAC policy matched
E0810 15:08:32.749296       1 reflector.go:136] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:129: Failed to list *core.Service: services is forbidden: User "system:anonymous" cannot list services at the cluster scope: no RBAC policy matched
E0810 15:08:32.750361       1 reflector.go:136] k8s.io/client-go/informers/factory.go:130: Failed to list *v1.Role: roles.rbac.authorization.k8s.io is forbidden: User "system:anonymous" cannot list roles.rbac.authorization.k8s.io at the cluster scope: no RBAC policy matched
E0810 15:08:32.751477       1 reflector.go:136] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:129: Failed to list *scheduling.PriorityClass: priorityclasses.scheduling.k8s.io is forbidden: User "system:anonymous" cannot list priorityclasses.scheduling.k8s.io at the cluster scope: no RBAC policy matched
E0810 15:08:32.752297       1 reflector.go:136] k8s.io/client-go/informers/factory.go:130: Failed to list *v1.RoleBinding: rolebindings.rbac.authorization.k8s.io is forbidden: User "system:anonymous" cannot list rolebindings.rbac.authorization.k8s.io at the cluster scope: no RBAC policy matched
E0810 15:08:32.753720       1 reflector.go:136] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:129: Failed to list *core.Endpoints: endpoints is forbidden: User "system:anonymous" cannot list endpoints at the cluster scope: no RBAC policy matched
E0810 15:08:32.754915       1 reflector.go:136] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:129: Failed to list *rbac.RoleBinding: rolebindings.rbac.authorization.k8s.io is forbidden: User "system:anonymous" cannot list rolebindings.rbac.authorization.k8s.io at the cluster scope: no RBAC policy matched
E0810 15:08:32.756068       1 reflector.go:136] github.com/openshift/origin/pkg/quota/generated/informers/internalversion/factory.go:101: Failed to list *quota.ClusterResourceQuota: clusterresourcequotas.quota.openshift.io is forbidden: User "system:anonymous" cannot list clusterresourcequotas.quota.openshift.io at the cluster scope: no RBAC policy matched
E0810 15:08:32.756945       1 reflector.go:136] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:129: Failed to list *core.PersistentVolumeClaim: persistentvolumeclaims is forbidden: User "system:anonymous" cannot list persistentvolumeclaims at the cluster scope: no RBAC policy matched
E0810 15:08:32.758118       1 reflector.go:136] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:129: Failed to list *core.LimitRange: limitranges is forbidden: User "system:anonymous" cannot list limitranges at the cluster scope: no RBAC policy matched
E0810 15:08:32.759126       1 reflector.go:136] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:129: Failed to list *core.PersistentVolume: persistentvolumes is forbidden: User "system:anonymous" cannot list persistentvolumes at the cluster scope: no RBAC policy matched
E0810 15:08:32.760209       1 reflector.go:136] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:129: Failed to list *core.ResourceQuota: resourcequotas is forbidden: User "system:anonymous" cannot list resourcequotas at the cluster scope: no RBAC policy matched
E0810 15:08:32.761333       1 reflector.go:136] k8s.io/client-go/informers/factory.go:130: Failed to list *v1.ClusterRoleBinding: clusterrolebindings.rbac.authorization.k8s.io is forbidden: User "system:anonymous" cannot list clusterrolebindings.rbac.authorization.k8s.io at the cluster scope: no RBAC policy matched
E0810 15:08:32.762965       1 reflector.go:136] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:129: Failed to list *core.Node: nodes is forbidden: User "system:anonymous" cannot list nodes at the cluster scope: no RBAC policy matched
E0810 15:08:32.763644       1 reflector.go:136] github.com/openshift/origin/pkg/security/generated/informers/internalversion/factory.go:101: Failed to list *security.SecurityContextConstraints: securitycontextconstraints.security.openshift.io is forbidden: User "system:anonymous" cannot list securitycontextconstraints.security.openshift.io at the cluster scope: no RBAC policy matched
E0810 15:08:32.764672       1 reflector.go:136] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:129: Failed to list *core.Namespace: namespaces is forbidden: User "system:anonymous" cannot list namespaces at the cluster scope: no RBAC policy matched
E0810 15:08:32.765711       1 reflector.go:136] github.com/openshift/client-go/user/informers/externalversions/factory.go:101: Failed to list *v1.Group: groups.user.openshift.io is forbidden: User "system:anonymous" cannot list groups.user.openshift.io at the cluster scope: no RBAC policy matched
E0810 15:08:32.766835       1 reflector.go:136] k8s.io/client-go/informers/factory.go:130: Failed to list *v1beta1.VolumeAttachment: volumeattachments.storage.k8s.io is forbidden: User "system:anonymous" cannot list volumeattachments.storage.k8s.io at the cluster scope: no RBAC policy matched
E0810 15:08:33.658089       1 oauth_apiserver.go:242] oauthclients.oauth.openshift.io is forbidden: User "system:anonymous" cannot create oauthclients.oauth.openshift.io at the cluster scope: no RBAC policy matched
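For context, one reading of the log above (my interpretation, not confirmed in this report): "system:anonymous" means the API server rejected the client credentials on those informer connections and fell back to anonymous authentication, which RBAC then denies. A hedged first check is whether the master certificates are present and valid; the path below is the usual 3.11 default and is an assumption for this environment:

```shell
# Inspect the master API serving certificate (default 3.11 location; treat
# the path as an assumption). Falls back to a message when it is absent.
CERT=/etc/origin/master/master.server.crt
if [ -f "$CERT" ]; then
  INFO=$(openssl x509 -in "$CERT" -noout -dates -subject)
else
  INFO="no certificate at $CERT (adjust the path for your install)"
fi
echo "$INFO"
```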


Version-Release number of the following components:
Linux 3.10.0-862.1.1.el7.x86_64 x86_64
NAME="Red Hat Enterprise Linux Server"
VERSION="7.5 (Maipo)"
ID="rhel"
ID_LIKE="fedora"
VARIANT="Server"
VARIANT_ID="server"
VERSION_ID="7.5"
PRETTY_NAME="Red Hat Enterprise Linux Server 7.5 (Maipo)"

REDHAT_BUGZILLA_PRODUCT="Red Hat Enterprise Linux 7"
REDHAT_BUGZILLA_PRODUCT_VERSION=7.5
REDHAT_SUPPORT_PRODUCT="Red Hat Enterprise Linux"
REDHAT_SUPPORT_PRODUCT_VERSION="7.5"
ansible 2.6.0
  config file = /etc/ansible/ansible.cfg
  configured module search path = [u'/home/cloud-user/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/lib/python2.7/site-packages/ansible
  executable location = /usr/bin/ansible
  python version = 2.7.5 (default, Jul 16 2018, 19:52:45) [GCC 4.8.5 20150623 (Red Hat 4.8.5-36)]

Using the openshift-ansible master git branch to install 3.11; here is the output of git describe:

openshift-ansible-3.11.0-0.13.0-13-g4b0bac1

How reproducible: Have only hit this once.

Steps to Reproduce:
1. Install OpenStack 
2. Install OpenShift on OpenStack
3. Notice the error in the log file

Actual results:

TASK [openshift_control_plane : Wait for all control plane pods to become ready] ***
task path: /home/cloud-user/openshift-ansible/roles/openshift_control_plane/tasks/main.yml:256

... retried some 50 times; with verbose on, the output was several hundred lines ...

Failure summary:


  1. Hosts:    master-0.scale-ci.example.com, master-1.scale-ci.example.com, master-2.scale-ci.example.com
     Play:     Configure masters
     Task:     Wait for all control plane pods to become ready
     Message:  All items completed

Expected results:

Expected the installer to complete successfully

Comment 2 Matt Bruzek 2018-08-10 16:07:03 UTC
Ran the install again and hit the same problem; different pods are failing, but with a similar error message:

root@master-0: ~ # oc get pods --all-namespaces
NAMESPACE     NAME                                               READY     STATUS             RESTARTS   AGE
kube-system   master-api-master-0.scale-ci.example.com           0/1       CrashLoopBackOff   7          4m
kube-system   master-api-master-1.scale-ci.example.com           0/1       CrashLoopBackOff   7          4m
kube-system   master-api-master-2.scale-ci.example.com           1/1       Running            0          3m
kube-system   master-controllers-master-0.scale-ci.example.com   1/1       Running            0          3m
kube-system   master-controllers-master-1.scale-ci.example.com   1/1       Running            0          3m
kube-system   master-controllers-master-2.scale-ci.example.com   1/1       Running            0          4m
kube-system   master-etcd-master-0.scale-ci.example.com          1/1       Running            0          3m
kube-system   master-etcd-master-1.scale-ci.example.com          1/1       Running            0          3m
kube-system   master-etcd-master-2.scale-ci.example.com          1/1       Running            0          3m

# oc logs master-api-master-0.scale-ci.example.com -n kube-system

E0810 16:04:47.114516       1 reflector.go:136] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:129: Failed to list *scheduling.PriorityClass: priorityclasses.scheduling.k8s.io is forbidden: User "system:anonymous" cannot list priorityclasses.scheduling.k8s.io at the cluster scope: no RBAC policy matched
E0810 16:04:47.115348       1 reflector.go:136] k8s.io/client-go/informers/factory.go:130: Failed to list *v1.Endpoints: endpoints is forbidden: User "system:anonymous" cannot list endpoints at the cluster scope: no RBAC policy matched
E0810 16:04:47.116387       1 reflector.go:136] github.com/openshift/origin/pkg/image/generated/informers/internalversion/factory.go:101: Failed to list *image.ImageStream: imagestreams.image.openshift.io is forbidden: User "system:anonymous" cannot list imagestreams.image.openshift.io at the cluster scope: no RBAC policy matched
E0810 16:04:47.117570       1 reflector.go:136] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:129: Failed to list *core.PersistentVolume: persistentvolumes is forbidden: User "system:anonymous" cannot list persistentvolumes at the cluster scope: no RBAC policy matched
E0810 16:04:47.118665       1 reflector.go:136] github.com/openshift/origin/pkg/quota/generated/informers/internalversion/factory.go:101: Failed to list *quota.ClusterResourceQuota: clusterresourcequotas.quota.openshift.io is forbidden: User "system:anonymous" cannot list clusterresourcequotas.quota.openshift.io at the cluster scope: no RBAC policy matched

Comment 5 Matt Bruzek 2018-08-10 20:50:07 UTC
I ran an install with the change https://github.com/openshift/openshift-ansible/pull/9528 and did not see the same error. I believe this fixes the problem; please merge that one to master.

Comment 7 Johnny Liu 2018-08-14 10:23:20 UTC
It seems I have also hit the same issue, starting with the openshift-ansible-3.11.0-0.13.0 installer.

Only one master API pod is running; the other masters went into a bad state.
# oc get po -n kube-system
NAME                                               READY     STATUS             RESTARTS   AGE
master-api-ip-172-18-1-52.ec2.internal             0/1       Running            0          4m
master-api-ip-172-18-10-238.ec2.internal           1/1       Running            0          2h
master-api-ip-172-18-7-51.ec2.internal             0/1       CrashLoopBackOff   37         2h
master-controllers-ip-172-18-1-52.ec2.internal     1/1       Running            0          2h
master-controllers-ip-172-18-10-238.ec2.internal   1/1       Running            0          2h
master-controllers-ip-172-18-7-51.ec2.internal     1/1       Running            0          1h
master-etcd-ip-172-18-1-52.ec2.internal            1/1       Running            0          2h
master-etcd-ip-172-18-10-238.ec2.internal          1/1       Running            0          2h
master-etcd-ip-172-18-7-51.ec2.internal            1/1       Running            0          1h

# oc get csr
NAME        AGE       REQUESTOR                                   CONDITION
csr-47pvf   5m        system:node:ip-172-18-7-51.ec2.internal     Pending
csr-4dkls   40m       system:node:ip-172-18-1-52.ec2.internal     Pending
csr-4px65   1h        system:admin                                Pending
csr-4pzdj   1h        system:node:ip-172-18-7-51.ec2.internal     Pending
csr-4r698   1m        system:node:ip-172-18-1-52.ec2.internal     Pending
csr-5678j   1h        system:node:ip-172-18-7-51.ec2.internal     Pending
csr-5zxmk   1h        system:admin                                Approved,Issued
csr-6dg65   1m        system:node:ip-172-18-10-238.ec2.internal   Pending
csr-6wvkl   27m       system:node:ip-172-18-10-238.ec2.internal   Pending
csr-6xk8g   1h        system:admin                                Pending
csr-75d4v   1h        system:admin                                Approved,Issued
csr-79mjs   27m       system:node:ip-172-18-1-52.ec2.internal     Pending
csr-85kjn   1h        system:admin                                Approved,Issued
csr-9mgnq   31m       system:node:ip-172-18-7-51.ec2.internal     Pending
csr-fvtj2   1h        system:node:ip-172-18-7-51.ec2.internal     Pending
csr-jgc7w   14m       system:node:ip-172-18-1-52.ec2.internal     Pending
csr-jxb76   14m       system:node:ip-172-18-10-238.ec2.internal   Pending
csr-ksg6p   1h        system:admin                                Pending
csr-n4svm   1h        system:node:ip-172-18-1-52.ec2.internal     Pending
csr-n98ns   1h        system:node:ip-172-18-10-238.ec2.internal   Pending
csr-nctbz   1h        system:node:ip-172-18-10-238.ec2.internal   Pending
csr-qdftt   1h        system:node:ip-172-18-1-52.ec2.internal     Pending
csr-qfzpv   44m       system:node:ip-172-18-7-51.ec2.internal     Pending
csr-r97xk   1h        system:node:ip-172-18-7-51.ec2.internal     Pending
csr-rp2ln   1h        system:node:ip-172-18-7-51.ec2.internal     Pending
csr-rrxsb   1h        system:node:ip-172-18-1-52.ec2.internal     Pending
csr-rtzrp   57m       system:node:ip-172-18-7-51.ec2.internal     Pending
csr-tjddg   1h        system:node:ip-172-18-1-52.ec2.internal     Pending
csr-tkbb8   52m       system:node:ip-172-18-1-52.ec2.internal     Pending
csr-vf8zh   18m       system:node:ip-172-18-7-51.ec2.internal     Pending
csr-vm8w9   1h        system:node:ip-172-18-10-238.ec2.internal   Pending
csr-w984s   40m       system:node:ip-172-18-10-238.ec2.internal   Pending
csr-xftt8   52m       system:node:ip-172-18-10-238.ec2.internal   Pending
csr-zj9k4   1h        system:node:ip-172-18-10-238.ec2.internal   Pending
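Those Pending node CSRs fit the symptom: until the node client CSRs are approved, those clients cannot present valid certificates and their requests show up as system:anonymous. As a manual workaround sketch (the installer change in pull request 9528 is the proper fix, and blanket approval is only reasonable on a disposable test cluster), pending CSRs can be approved in bulk:

```shell
# Approve every pending CSR in bulk. Guarded so it no-ops on hosts
# without the oc client; only appropriate on a throwaway test cluster.
if command -v oc >/dev/null 2>&1; then
  oc get csr -o name | xargs -r oc adm certificate approve
  RESULT="approval attempted"
else
  RESULT="skipped: oc not found on this host"
fi
echo "$RESULT"
```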

Comment 8 Scott Dodson 2018-08-14 21:25:10 UTC
Should be in openshift-ansible-3.11.0-0.15.0

Comment 9 Vadim Rutkovsky 2018-08-15 12:03:34 UTC
Fix is available in openshift-ansible-3.11.0-0.16.0

Comment 10 Weihua Meng 2018-08-16 05:10:23 UTC
Fixed.
openshift-ansible-3.11.0-0.16.0.git.0.e82689a.noarch
HA OCP installation succeeded.
The cluster is working well.

Comment 12 errata-xmlrpc 2018-10-11 07:24:38 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:2652

