Bug 1695522 - [OSP] All CSRs are pending due to machine controller not work well
Summary: [OSP] All CSRs are pending due to machine controller not work well
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 4.1.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.2.0
Assignee: Eric Duen
QA Contact: weiwei jiang
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-04-03 09:32 UTC by weiwei jiang
Modified: 2019-10-16 06:28 UTC (History)

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
CLOSED / CURRENTRELEASE
Clone Of:
Environment:
Last Closed: 2019-10-16 06:28:05 UTC
Target Upstream Version:
Embargoed:



Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2019:2922 0 None None None 2019-10-16 06:28:22 UTC

Description weiwei jiang 2019-04-03 09:32:53 UTC
Description of problem:
When trying to set up an OCP cluster, openshift-install exits with:
INFO Waiting up to 30m0s for the cluster at https://api.wjiang-ocp.shiftstack.com:6443 to initialize...
DEBUG Still waiting for the cluster to initialize: Working towards 4.0.0-0.8: 87% complete
DEBUG Still waiting for the cluster to initialize: Working towards 4.0.0-0.8: 89% complete
DEBUG Still waiting for the cluster to initialize: Working towards 4.0.0-0.8: 90% complete
DEBUG Still waiting for the cluster to initialize: Working towards 4.0.0-0.8: 92% complete
DEBUG Still waiting for the cluster to initialize: Working towards 4.0.0-0.8: 92% complete
DEBUG Still waiting for the cluster to initialize: Working towards 4.0.0-0.8: 92% complete
DEBUG Still waiting for the cluster to initialize: Could not update rolebinding "openshift-cluster-storage-operator/cluster-storage-operator" (231 of 310): the server has forbidden updates to this resource                                                                  
DEBUG Still waiting for the cluster to initialize: Working towards 4.0.0-0.8: 97% complete
DEBUG Still waiting for the cluster to initialize: Working towards 4.0.0-0.8: 98% complete
DEBUG Still waiting for the cluster to initialize: Working towards 4.0.0-0.8: 98% complete
FATAL failed to initialize the cluster: timed out waiting for the condition


[openshift@dhcp-140-70 installer]$ oc get csr 
NAME        AGE     REQUESTOR                                                                   CONDITION
csr-25kqd   45m     system:node:wjiang-ocp-66jxn-master-0.wjiang-ocp.shiftstack.com             Pending
csr-2qjbq   56m     system:node:wjiang-ocp-66jxn-master-1.wjiang-ocp.shiftstack.com             Pending
csr-42ztb   6m37s   system:node:wjiang-ocp-66jxn-master-1.wjiang-ocp.shiftstack.com             Pending
csr-4j5th   89m     system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Approved,Issued
csr-4t5qs   80m     system:node:host-192-168-0-21                                               Pending
csr-5kqzk   9m57s   system:node:host-192-168-0-21                                               Pending
csr-5xt7p   10m     system:node:host-192-168-0-5                                                Pending
csr-69jrw   81m     system:node:host-192-168-0-5                                                Pending
csr-6cv5z   79m     system:node:wjiang-ocp-66jxn-master-1.wjiang-ocp.shiftstack.com             Pending
csr-6kd74   6m10s   system:node:wjiang-ocp-66jxn-master-2.wjiang-ocp.shiftstack.com             Pending
csr-74zvh   67m     system:node:wjiang-ocp-66jxn-master-0.wjiang-ocp.shiftstack.com             Pending
csr-7twf2   67m     system:node:wjiang-ocp-66jxn-master-2.wjiang-ocp.shiftstack.com             Pending
csr-8wdfp   44m     system:node:wjiang-ocp-66jxn-master-2.wjiang-ocp.shiftstack.com             Pending
csr-8zfjr   89m     system:node:wjiang-ocp-66jxn-master-1.wjiang-ocp.shiftstack.com             Pending
csr-9v2qh   81m     system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Approved,Issued
csr-cc8l8   56m     system:node:host-192-168-0-21                                               Pending
csr-cf98z   89m     system:node:wjiang-ocp-66jxn-master-0.wjiang-ocp.shiftstack.com             Pending
csr-ckqwc   31m     system:node:wjiang-ocp-66jxn-master-2.wjiang-ocp.shiftstack.com             Pending
csr-fgt9r   49m     system:node:host-192-168-0-5                                                Pending
csr-fjsln   35m     system:node:host-192-168-0-21                                               Pending
csr-fknp5   18m     system:node:wjiang-ocp-66jxn-master-2.wjiang-ocp.shiftstack.com             Pending
csr-g45fh   85m     system:node:wjiang-ocp-66jxn-master-2.wjiang-ocp.shiftstack.com             Pending
csr-g9jnp   23m     system:node:host-192-168-0-5                                                Pending
csr-gmsgf   58m     system:node:host-192-168-0-18                                               Pending
csr-hgdgr   85m     system:node:wjiang-ocp-66jxn-master-1.wjiang-ocp.shiftstack.com             Pending
csr-hq7pv   89m     system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Approved,Issued
csr-hrn6z   89m     system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Approved,Issued
csr-jdrnm   56m     system:node:wjiang-ocp-66jxn-master-2.wjiang-ocp.shiftstack.com             Pending
csr-jvhbn   22m     system:node:host-192-168-0-21                                               Pending
csr-k597c   33m     system:node:host-192-168-0-18                                               Pending
csr-k5xkc   68m     system:node:host-192-168-0-21                                               Pending
csr-lhvnb   6m42s   system:node:wjiang-ocp-66jxn-master-0.wjiang-ocp.shiftstack.com             Pending
csr-m78th   20m     system:node:host-192-168-0-18                                               Pending
csr-mcvsw   56m     system:node:wjiang-ocp-66jxn-master-0.wjiang-ocp.shiftstack.com             Pending
csr-mghxk   19m     system:node:wjiang-ocp-66jxn-master-1.wjiang-ocp.shiftstack.com             Pending
csr-nfblq   59m     system:node:host-192-168-0-5                                                Pending
csr-nzvcw   45m     system:node:wjiang-ocp-66jxn-master-1.wjiang-ocp.shiftstack.com             Pending
csr-pgdrw   85m     system:node:wjiang-ocp-66jxn-master-0.wjiang-ocp.shiftstack.com             Pending
csr-pv49t   19m     system:node:wjiang-ocp-66jxn-master-0.wjiang-ocp.shiftstack.com             Pending
csr-q46fn   81m     system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Approved,Issued
csr-q65qz   79m     system:node:wjiang-ocp-66jxn-master-2.wjiang-ocp.shiftstack.com             Pending
csr-qdn28   72m     system:node:host-192-168-0-5                                                Pending
csr-qj7vf   80m     system:node:host-192-168-0-18                                               Pending
csr-qp5m4   36m     system:node:host-192-168-0-5                                                Pending
csr-qqhxc   48m     system:node:host-192-168-0-21                                               Pending
csr-qrdst   32m     system:node:wjiang-ocp-66jxn-master-1.wjiang-ocp.shiftstack.com             Pending
csr-rp6zr   80m     system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Approved,Issued
csr-ssld4   32m     system:node:wjiang-ocp-66jxn-master-0.wjiang-ocp.shiftstack.com             Pending
csr-tl6qb   46m     system:node:host-192-168-0-18                                               Pending
csr-tvgdx   80m     system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Approved,Issued
csr-wbgjj   79m     system:node:wjiang-ocp-66jxn-master-0.wjiang-ocp.shiftstack.com             Pending
csr-xqkbh   7m30s   system:node:host-192-168-0-18                                               Pending
csr-xqsrv   89m     system:node:wjiang-ocp-66jxn-master-2.wjiang-ocp.shiftstack.com             Pending
csr-z9k7d   71m     system:node:host-192-168-0-18                                               Pending
csr-zm6l2   67m     system:node:wjiang-ocp-66jxn-master-1.wjiang-ocp.shiftstack.com             Pending
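
To make the listing above easier to read, the Pending requests can be grouped by requestor. The following is a minimal sketch, not part of the original report: the function names are hypothetical, and it assumes the four-column layout (NAME, AGE, REQUESTOR, CONDITION) shown above, which other `oc` versions may not print.

```shell
# Count Pending CSRs per requestor from `oc get csr` output on stdin.
# Assumes the columns NAME AGE REQUESTOR CONDITION, as in the listing above.
pending_by_requestor() {
  awk '$4 == "Pending" { n[$3]++ }
       END { for (r in n) print n[r], r }' | sort -k2
}

# Print only the names of Pending CSRs, one per line.
pending_csr_names() {
  awk '$4 == "Pending" { print $1 }'
}

# Usage against a live cluster (not executed here):
#   oc get csr | pending_by_requestor
# Manual approval is a possible stopgap only after checking that each
# request is legitimate:
#   oc get csr | pending_csr_names | xargs oc adm certificate approve
```

Manual approval only hides the symptom; it does not address whatever prevents the requests from being approved automatically.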

[openshift@dhcp-140-70 installer]$ oc get nodes 
NAME                                                  STATUS   ROLES    AGE   VERSION
host-192-168-0-18                                     Ready    worker   81m   v1.12.4+30e6a0f55
host-192-168-0-21                                     Ready    worker   80m   v1.12.4+30e6a0f55
host-192-168-0-5                                      Ready    worker   81m   v1.12.4+30e6a0f55
wjiang-ocp-66jxn-master-0.wjiang-ocp.shiftstack.com   Ready    master   90m   v1.12.4+30e6a0f55
wjiang-ocp-66jxn-master-1.wjiang-ocp.shiftstack.com   Ready    master   90m   v1.12.4+30e6a0f55
wjiang-ocp-66jxn-master-2.wjiang-ocp.shiftstack.com   Ready    master   89m   v1.12.4+30e6a0f55
[openshift@dhcp-140-70 installer]$ oc get machine -n openshift-machine-api
NAME                            INSTANCE   STATE   TYPE   REGION   ZONE   AGE
wjiang-ocp-66jxn-master-0                                                 89m
wjiang-ocp-66jxn-master-1                                                 89m
wjiang-ocp-66jxn-master-2                                                 89m
wjiang-ocp-66jxn-worker-5p5sr                                             89m
wjiang-ocp-66jxn-worker-gxj2p                                             89m
wjiang-ocp-66jxn-worker-m7v2q                                             89m
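
One thing visible in the two listings above is that the worker nodes registered as host-192-168-0-*, which does not match any Machine name, while the master node names begin with their Machine names. Whether this contributes to the pending CSRs is not established in this report. The sketch below (hypothetical function name, prefix matching chosen only for illustration) lists nodes that have no Machine with a matching name prefix, given saved copies of both listings.

```shell
# Print node names that do not start with the name of any Machine.
#   $1: saved output of `oc get nodes`
#   $2: saved output of `oc get machine -n openshift-machine-api`
# The first column of each file is taken as the name; header rows are skipped.
nodes_without_machine() {
  awk 'FNR == 1 { next }                 # skip the header row of each file
       NF == 0  { next }                 # skip blank lines
       NR == FNR { machines[$1]; next }  # first file read: Machine names
       {
         found = 0
         for (m in machines) if (index($1, m) == 1) found = 1
         if (!found) print $1
       }' "$2" "$1"
}

# Usage (not executed here):
#   oc get nodes > nodes.txt
#   oc get machine -n openshift-machine-api > machines.txt
#   nodes_without_machine nodes.txt machines.txt
```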

[core@wjiang-ocp-66jxn-master-1 ~]$ sudo crictl ps |grep -i machine-controller
89073cc921495       9daf9179643571b80d722e20b62c58609dec36d3b94a12d5ddb957992cd39d97                                                         3 hours ago         Running             machine-controller                       0                   c5b5232d3f079
[core@wjiang-ocp-66jxn-master-1 ~]$ sudo crictl logs  89073cc921495 2>&1 |grep -i E0403 
E0403 06:35:44.249839       1 controller.go:169] Error checking existence of machine instance for machine object wjiang-ocp-66jxn-worker-gxj2p; Create providerClient err: Authentication failed
E0403 07:09:06.002854       1 controller.go:169] Error checking existence of machine instance for machine object wjiang-ocp-66jxn-master-2; Create providerClient err: Internal Server Error
E0403 07:18:25.513596       1 streamwatcher.go:109] Unable to decode an event from the watch stream: http2: server sent GOAWAY and closed the connection; LastStreamID=441, ErrCode=NO_ERROR, debug=""
E0403 07:18:25.517014       1 streamwatcher.go:109] Unable to decode an event from the watch stream: http2: server sent GOAWAY and closed the connection; LastStreamID=441, ErrCode=NO_ERROR, debug=""
E0403 07:19:11.299273       1 streamwatcher.go:109] Unable to decode an event from the watch stream: http2: server sent GOAWAY and closed the connection; LastStreamID=19, ErrCode=NO_ERROR, debug=""
E0403 07:19:11.299888       1 streamwatcher.go:109] Unable to decode an event from the watch stream: http2: server sent GOAWAY and closed the connection; LastStreamID=19, ErrCode=NO_ERROR, debug=""
E0403 07:32:09.901915       1 streamwatcher.go:109] Unable to decode an event from the watch stream: http2: server sent GOAWAY and closed the connection; LastStreamID=39, ErrCode=NO_ERROR, debug=""
E0403 07:32:09.901956       1 streamwatcher.go:109] Unable to decode an event from the watch stream: http2: server sent GOAWAY and closed the connection; LastStreamID=39, ErrCode=NO_ERROR, debug=""
E0403 07:32:58.257573       1 streamwatcher.go:109] Unable to decode an event from the watch stream: http2: server sent GOAWAY and closed the connection; LastStreamID=23, ErrCode=NO_ERROR, debug=""
E0403 07:32:58.259924       1 streamwatcher.go:109] Unable to decode an event from the watch stream: http2: server sent GOAWAY and closed the connection; LastStreamID=23, ErrCode=NO_ERROR, debug=""
E0403 08:33:21.974827       1 controller.go:169] Error checking existence of machine instance for machine object wjiang-ocp-66jxn-master-1; Create providerClient err: Authentication failed
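
The error lines above mix watch-stream noise with the provider client failures that matter here. As a rough aid, the following sketch (hypothetical function names, assuming the klog line layout seen above: severity and date, time, thread id, then `file:line]`) tallies errors by source location and by provider client error text.

```shell
# Tally klog error lines (prefix "E<mmdd>") by source location (file:line).
errors_by_source() {
  awk '/^E[0-9][0-9][0-9][0-9] / { sub(/\]$/, "", $4); n[$4]++ }
       END { for (s in n) print n[s], s }' | sort -rn
}

# Tally the text that follows "Create providerClient err: ".
provider_client_errors() {
  awk -F'Create providerClient err: ' 'NF > 1 { n[$2]++ }
       END { for (e in n) print n[e], e }' | sort -rn
}

# Usage on the node (not executed here), with the container ID from crictl ps:
#   sudo crictl logs 89073cc921495 2>&1 | errors_by_source
#   sudo crictl logs 89073cc921495 2>&1 | provider_client_errors
```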

OpenStack credentials for the installer:
cat ~/.config/openstack/clouds.yaml
clouds:
  openstack:
    auth:
      auth_url: https://rhos-d.infra.prod.upshift.rdu2.redhat.com:13000/v3
      username: "wjiang"
      password: "xxxxxxxxx"
      project_id: 542c6ebd48bf40fa857fc245c7572e30
      project_name: "openshift-qe-jenkins"
      user_domain_name: "redhat.com"
    region_name: "regionOne"
    interface: "public"
    identity_api_version: 3

Decoded from `oc get secret -n kube-system openstack-credentials`:
clouds:
  openstack:
    auth:
      application_credential_id: ""
      application_credential_name: ""
      application_credential_secret: ""
      auth_url: https://rhos-d.infra.prod.upshift.rdu2.redhat.com:13000/v3
      default_domain: ""
      domain_id: ""
      domain_name: ""
      password: xxxxxxxx
      project_domain_id: ""
      project_domain_name: ""
      project_id: 542c6ebd48bf40fa857fc245c7572e30
      project_name: openshift-qe-jenkins
      token: ""
      user_domain_id: ""
      user_domain_name: redhat.com
      user_id: ""
      username: wjiang
    auth_type: ""
    cacert: ""
    cert: ""
    cloud: ""
    identity_api_version: "3"
    key: ""
    profile: ""
    region_name: regionOne
    regions: null
    verify: true
    volume_api_version: ""
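
The secret carries many empty fields and unquoted values, which makes it hard to compare by eye with the installer's clouds.yaml. The sketch below is a line-oriented filter (not a YAML parser, and the function name is hypothetical) that drops empty or null keys and strips quotes so the two renderings can be diffed. The secret data key `clouds.yaml` used in the usage comment is an assumption, not something shown in this report.

```shell
# Drop keys whose value is "" or null and strip double quotes around scalar
# values, so two clouds.yaml renderings can be compared line by line.
normalize_clouds_yaml() {
  grep -v -E ':[[:space:]]*(""|null)[[:space:]]*$' |
    sed -E 's/^([[:space:]]*[^:]+:[[:space:]]*)"(.*)"[[:space:]]*$/\1\2/'
}

# Usage (not executed here; the data key name is assumed):
#   normalize_clouds_yaml < ~/.config/openstack/clouds.yaml | sort > installer.txt
#   oc get secret -n kube-system openstack-credentials \
#     -o jsonpath='{.data.clouds\.yaml}' | base64 -d |
#     normalize_clouds_yaml | sort > cluster.txt
#   diff installer.txt cluster.txt
```

Sorting discards the YAML nesting, so the diff is only a quick check for keys present on one side and not the other, such as `interface` and `verify` in the two dumps above.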


Version-Release number of the following components:
DEBUG OpenShift Installer v0.15.0-dirty                                                                   
DEBUG Built from commit 462d548ed561aab9b5866c9a612fc85d1a47419f
$ oc get clusterversion version -o template --template={{.status.desired}}
map[image:quay.io/openshift-release-dev/ocp-release@sha256:358585fa0d2e709ce3964a245474b49b4360d8946455ab5b0467a11b135a21df version:4.0.0-0.8]

How reproducible:
Always

Steps to Reproduce:
1. Try to set up an OCP cluster with the OSP provider.
2. Check whether openshift-install exits successfully.

Actual results:
2. openshift-install exits with "failed to initialize the cluster: timed out waiting for the condition", and all the CSRs requested by nodes (system:node:*) remain in Pending status.

Expected results:

Additional info:
Please attach logs from ansible-playbook with the -vvv flag

Comment 1 weiwei jiang 2019-04-08 02:29:00 UTC
Blocked by https://bugzilla.redhat.com/show_bug.cgi?id=1694303

Comment 5 weiwei jiang 2019-08-01 06:43:33 UTC
Checked with 4.2.0-0.nightly-2019-07-31-162901; this issue has been fixed.

➜  ~ oc get nodes -o wide 
NAME                                STATUS   ROLES    AGE   VERSION             INTERNAL-IP    EXTERNAL-IP   OS-IMAGE                                                   KERNEL-VERSION               CONTAINER-RUNTIME
wjiangosp0801d-9pkp4-master-0       Ready    master   46m   v1.14.0+c569285e9   192.168.0.14   <none>        Red Hat Enterprise Linux CoreOS 42.80.20190731.2 (Ootpa)   4.18.0-80.7.1.el8_0.x86_64   cri-o://1.14.10-0.5.dev.rhaos4.2.gitcf4220b.el8-dev
wjiangosp0801d-9pkp4-master-1       Ready    master   46m   v1.14.0+c569285e9   192.168.0.6    <none>        Red Hat Enterprise Linux CoreOS 42.80.20190731.2 (Ootpa)   4.18.0-80.7.1.el8_0.x86_64   cri-o://1.14.10-0.5.dev.rhaos4.2.gitcf4220b.el8-dev
wjiangosp0801d-9pkp4-master-2       Ready    master   46m   v1.14.0+c569285e9   192.168.0.5    <none>        Red Hat Enterprise Linux CoreOS 42.80.20190731.2 (Ootpa)   4.18.0-80.7.1.el8_0.x86_64   cri-o://1.14.10-0.5.dev.rhaos4.2.gitcf4220b.el8-dev
wjiangosp0801d-9pkp4-worker-7nxtm   Ready    worker   37m   v1.14.0+c569285e9   192.168.0.26   <none>        Red Hat Enterprise Linux CoreOS 42.80.20190731.2 (Ootpa)   4.18.0-80.7.1.el8_0.x86_64   cri-o://1.14.10-0.5.dev.rhaos4.2.gitcf4220b.el8-dev
wjiangosp0801d-9pkp4-worker-987f7   Ready    worker   35m   v1.14.0+c569285e9   192.168.0.7    <none>        Red Hat Enterprise Linux CoreOS 42.80.20190731.2 (Ootpa)   4.18.0-80.7.1.el8_0.x86_64   cri-o://1.14.10-0.5.dev.rhaos4.2.gitcf4220b.el8-dev
wjiangosp0801d-9pkp4-worker-w2x8k   Ready    worker   35m   v1.14.0+c569285e9   192.168.0.10   <none>        Red Hat Enterprise Linux CoreOS 42.80.20190731.2 (Ootpa)   4.18.0-80.7.1.el8_0.x86_64   cri-o://1.14.10-0.5.dev.rhaos4.2.gitcf4220b.el8-dev
➜  ~ oc get machine --all-namespaces
NAMESPACE               NAME                                INSTANCE   STATE   TYPE   REGION   ZONE   AGE
openshift-machine-api   wjiangosp0801d-9pkp4-master-0                                                 45m
openshift-machine-api   wjiangosp0801d-9pkp4-master-1                                                 45m
openshift-machine-api   wjiangosp0801d-9pkp4-master-2                                                 45m
openshift-machine-api   wjiangosp0801d-9pkp4-worker-7nxtm                                             41m
openshift-machine-api   wjiangosp0801d-9pkp4-worker-987f7                                             41m
openshift-machine-api   wjiangosp0801d-9pkp4-worker-w2x8k                                             41m
➜  ~ oc get machineset --all-namespaces
NAMESPACE               NAME                          DESIRED   CURRENT   READY   AVAILABLE   AGE
openshift-machine-api   wjiangosp0801d-9pkp4-worker   3         3         3       3           45m
➜  ~ oc get csr 
NAME        AGE   REQUESTOR                                                                   CONDITION
csr-7gqff   36m   system:node:wjiangosp0801d-9pkp4-worker-987f7                               Approved,Issued
csr-b62tw   37m   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Approved,Issued
csr-bpbw7   47m   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Approved,Issued
csr-cf5mw   38m   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Approved,Issued
csr-cl7vs   47m   system:node:wjiangosp0801d-9pkp4-master-1                                   Approved,Issued
csr-cp78k   37m   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Approved,Issued
csr-g29hd   37m   system:node:wjiangosp0801d-9pkp4-worker-7nxtm                               Approved,Issued
csr-nxfhs   47m   system:node:wjiangosp0801d-9pkp4-master-2                                   Approved,Issued
csr-nzp8s   47m   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Approved,Issued
csr-vd22k   36m   system:node:wjiangosp0801d-9pkp4-worker-w2x8k                               Approved,Issued
csr-w6rlb   47m   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Approved,Issued
csr-zdjfj   47m   system:node:wjiangosp0801d-9pkp4-master-0                                   Approved,Issued
➜  ~ oc get clusterversion 
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version             False       True          47m     Unable to apply 4.2.0-0.nightly-2019-07-31-162901: the cluster operator image-registry has not yet successfully rolled out

Comment 6 Eric Duen 2019-08-05 18:59:45 UTC
The problem of CSRs not being approved was recently fixed. Returning to QE to validate.

Comment 7 weiwei jiang 2019-08-06 08:56:15 UTC
Verified on 4.2.0-0.nightly-2019-08-05-223032; this is no longer an issue.

Comment 8 errata-xmlrpc 2019-10-16 06:28:05 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2922

