Bug 1625817 - [3.10] Installation stuck at TASK [Approve node certificates when bootstrapping]
Summary: [3.10] Installation stuck at TASK [Approve node certificates when bootstrapping]
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 3.10.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 3.10.z
Assignee: Michael Gugino
QA Contact: Weihua Meng
URL:
Whiteboard:
Depends On: 1622945
Blocks: 1479956 1565405 1623204 1623248
Reported: 2018-09-06 03:18 UTC by Wei Sun
Modified: 2019-01-03 17:34 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
The node CSR approval process has been refactored to address several process deficiencies. This process now approves certificates for relevant nodes and waits for the certificate to be verifiable via the API. In the event that this new process fails, the logs will include relevant debugging information required by support to diagnose any remaining issues. Please make sure you capture these logs and provide them to support in the event of a failure.
Clone Of: 1622945
Environment:
Last Closed: 2019-01-03 17:34:48 UTC
Target Upstream Version:
Embargoed:
sdodson: needinfo-
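
The Doc Text above says the refactored process approves certificates only for the relevant nodes. A minimal sketch of that selection step follows. It is not the installer's actual code: the function name `pending_node_csrs` and the sample data are hypothetical, and it assumes the input has the shape of `oc get csr -o json` output, with node-requested CSRs carrying a `spec.username` of the form `system:node:<node name>`. CSRs requested by a bootstrapper service account would need the node name decoded from the request itself, which this sketch skips.

```python
import json


def pending_node_csrs(csr_list, node_names):
    """Return names of CSRs that are still pending and were requested by one of node_names.

    csr_list is assumed to be the parsed JSON of `oc get csr -o json`.
    A CSR is treated as pending when it has no status conditions
    (neither Approved nor Denied).
    """
    wanted = {"system:node:" + name for name in node_names}
    pending = []
    for item in csr_list.get("items", []):
        conditions = (item.get("status") or {}).get("conditions") or []
        if conditions:
            continue  # already approved or denied
        if (item.get("spec") or {}).get("username") in wanted:
            pending.append(item["metadata"]["name"])
    return pending


# Hypothetical sample data; the CSR names are made up for illustration.
sample = json.loads("""
{
  "items": [
    {"metadata": {"name": "csr-aaaaa"},
     "spec": {"username": "system:node:rhel-ocpapp2"},
     "status": {}},
    {"metadata": {"name": "csr-bbbbb"},
     "spec": {"username": "system:node:rhel-ocpapp2"},
     "status": {"conditions": [{"type": "Approved"}]}},
    {"metadata": {"name": "csr-ccccc"},
     "spec": {"username": "system:node:other-node"},
     "status": {}}
  ]
}
""")

print(pending_node_csrs(sample, ["rhel-ocpapp2"]))
```

Names returned this way could then be passed to `oc adm certificate approve <name>`, followed by another `oc get csr` to confirm that the approval took effect.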


Comment 4 Michael Gugino 2018-09-18 17:29:26 UTC
Need output of `oc get nodes` and `oc get csr -o yaml`

Need ansible-playbook -vvv output (that's 3 v's)

Comment 5 Manoj Kumar 2018-09-21 18:15:52 UTC
I can hit this every time on a Power 8 bare-metal node with OCP 3.10

Comment 6 Manoj Kumar 2018-09-21 18:17:53 UTC
[root@rhel-ocpapp2 openshift-ansible]# oc project openshift-sdn
Now using project "openshift-sdn" on server "https://rhel-ocpapp2:8443".
[root@rhel-ocpapp2 openshift-ansible]# oc get all
NAME            READY     STATUS             RESTARTS   AGE
pod/ovs-j25wz   1/1       Running            0          5m
pod/sdn-h9c8k   0/1       CrashLoopBackOff   6          5m

NAME                 DESIRED   CURRENT   READY     UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
daemonset.apps/ovs   1         1         1         1            1           <none>          5m
daemonset.apps/sdn   1         1         0         1            0           <none>          5m

NAME                                  DOCKER REPO                                           TAGS      UPDATED
imagestream.image.openshift.io/node   docker-registry.default.svc:5000/openshift-sdn/node   v3.10     5 minutes ago
[root@rhel-ocpapp2 openshift-ansible]# oc logs -f pod/sdn-h9c8k
Error from server: Get https://rhel-ocpapp2:10250/containerLogs/openshift-sdn/sdn-h9c8k/sdn?follow=true: remote error: tls: internal error

Comment 7 Scott Dodson 2018-09-27 13:39:12 UTC
A number of CSR approval changes have been backported from 3.11 to 3.10 and may have addressed this. Can we please test with the latest 3.10 code?

Comment 8 Manoj Kumar 2018-09-27 13:51:47 UTC
Willing to test it out on Power, if you can drop me the changes.

Comment 9 Weihua Meng 2018-10-01 05:06:29 UTC
I tested the following configurations and did not hit this issue:
openshift-ansible-3.10.51-1.git.0.44a646c.el7.noarch
x86
EC2, GCP, OpenStack
docker, cri-o
HA, non-HA
with/without proxy
with/without system-container

