Bug 1597904 - OpenShift on OpenStack - pending CSRs on scaleup
Summary: OpenShift on OpenStack - pending CSRs on scaleup
Keywords:
Status: CLOSED DUPLICATE of bug 1597908
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: apiserver-auth
Version: 3.10.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: ---
Assignee: Simo Sorce
QA Contact: Chuan Yu
URL:
Whiteboard:
Duplicates: 1597902 1597903 1597906 1597907
Depends On:
Blocks:
Reported: 2018-07-03 21:36 UTC by Matt Bruzek
Modified: 2018-07-05 01:39 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-07-05 01:39:39 UTC
Target Upstream Version:
Embargoed:



Description Matt Bruzek 2018-07-03 21:36:41 UTC
Description of problem:

We have automation that installs OpenShift on OpenStack in a repeatable way. The recent 3.10 install completes successfully, but when we attempt to scale up to 250 nodes the install gets stuck on the approval step, and I see several hundred certificate signing requests (CSRs) in the Pending state.

The scaleup operation got to about 161 nodes and then failed to approve nodes. The failing task in the log was:

TASK [Approve bootstrap nodes] *************************************************
task path: /home/cloud-user/openshift-ansible/playbooks/openshift-node/private/join.yml:40

Version-Release number of selected component (if applicable):
$ oc version
oc v3.10.10
kubernetes v1.10.0+b81c8f8
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://lb-0.scale-ci.example.com:8443
openshift v3.10.10
kubernetes v1.10.0+b81c8f8

$ git describe
v3.10.0-rc.0-115-g1d59617

How reproducible: Often. We can frequently reproduce this CSR problem.


Steps to Reproduce:
1. Install OpenStack
2. Install OpenShift on OpenStack
3. Attempt to scale up to 250 nodes and notice the failure to approve nodes. 

Actual results:

The openshift-ansible playbook openshift-ansible/playbooks/openshift-node/scaleup.yml fails with the following error:


TASK [Approve bootstrap nodes] *************************************************
task path: /home/cloud-user/openshift-ansible/playbooks/openshift-node/private/join.yml:40
Tuesday 03 July 2018  12:56:29 -0400 (0:00:00.179)       0:08:23.501 **********
fatal: [master-1.scale-ci.example.com]: FAILED! => {"changed": true, "finished": false, "msg": "Timed out accepting certificate signing requests. Failing as requested.

When I checked the cluster, I saw just over 500 CSRs in the "Pending" state.

root@master-1: /home/openshift # oc get csr --all-namespaces | grep Pending | wc -l
507
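
For triage, a breakdown by condition may be more useful than a single grep count. The following is a minimal sketch, not part of the original report: it assumes the text output of `oc get csr` has a header row starting with NAME and that the condition (for example Pending or Approved,Issued) is the last whitespace-separated column. The function name and the sample rows are hypothetical.

```python
from collections import Counter


def count_csr_conditions(oc_output: str) -> Counter:
    """Tally the last column (assumed to be CONDITION) of `oc get csr` text output."""
    counts = Counter()
    for line in oc_output.splitlines():
        fields = line.split()
        # Skip blank lines and the header row.
        if not fields or fields[0] == "NAME":
            continue
        counts[fields[-1]] += 1
    return counts


# Hypothetical sample data; the column layout is an assumption, not taken from this cluster.
SAMPLE = """\
NAME        AGE       REQUESTOR                 CONDITION
csr-aaaaa   5m        system:node:node-1        Pending
csr-bbbbb   5m        system:node:node-2        Approved,Issued
csr-ccccc   4m        system:node:node-3        Pending
"""

if __name__ == "__main__":
    for condition, total in count_csr_conditions(SAMPLE).items():
        print(condition, total)
```

On a live cluster the input would be the captured output of `oc get csr` rather than the sample string.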

Expected results:
I expected the scale up to succeed.

Additional info:

I will attach the logs in further comments.

Comment 1 Xiaoli Tian 2018-07-05 01:36:39 UTC
*** Bug 1597907 has been marked as a duplicate of this bug. ***

Comment 2 Xiaoli Tian 2018-07-05 01:36:44 UTC
*** Bug 1597906 has been marked as a duplicate of this bug. ***

Comment 3 Xiaoli Tian 2018-07-05 01:37:34 UTC
*** Bug 1597903 has been marked as a duplicate of this bug. ***

Comment 4 Xiaoli Tian 2018-07-05 01:37:37 UTC
*** Bug 1597902 has been marked as a duplicate of this bug. ***

Comment 5 Xiaoli Tian 2018-07-05 01:39:39 UTC

*** This bug has been marked as a duplicate of bug 1597908 ***

