Bug 1798272 - Worker node deployment on bare metal with IPv6 control plane is blocked because worker node CSRs are not automatically approved
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 4.4
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: 4.4.0
Assignee: Derek Higgins
QA Contact: Amit Ugol
URL:
Whiteboard:
Depends On:
Blocks: 1771572 1804278
Reported: 2020-02-05 02:14 UTC by Marius Cornea
Modified: 2020-05-04 11:33 UTC
CC List: 6 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Clones: 1804278
Environment:
Last Closed: 2020-05-04 11:33:13 UTC
Target Upstream Version:
Embargoed:




Links:
Github openshift ironic-ipa-downloader pull 27 (Status: closed) - "Bug1798272 : DHCPv6 set hostname and predictable hostname" - Last Updated: 2020-10-27 15:08:33 UTC
Red Hat Product Errata RHBA-2020:0581 - Last Updated: 2020-05-04 11:33:48 UTC

Description Marius Cornea 2020-02-05 02:14:16 UTC
Description of problem:

Worker node deployment on bare metal with an IPv6 control plane is blocked because the openshift-machine-config-operator:node-bootstrapper CSRs are not automatically approved:


[kni@provisionhost-0 ~]$ oc -n openshift-machine-api get bmh
NAME                 STATUS   PROVISIONING STATUS      CONSUMER                          BMC                                    HARDWARE PROFILE   ONLINE   ERROR
openshift-master-0   OK       externally provisioned   ocp-edge-cluster-master-0         ipmi://[fd2e:6f44:5dd8:c956::1]:6230                      true     
openshift-master-1   OK       externally provisioned   ocp-edge-cluster-master-1         ipmi://[fd2e:6f44:5dd8:c956::1]:6231                      true     
openshift-master-2   OK       externally provisioned   ocp-edge-cluster-master-2         ipmi://[fd2e:6f44:5dd8:c956::1]:6232                      true     
openshift-worker-0   OK       provisioned              ocp-edge-cluster-worker-0-6qm56   ipmi://[fd2e:6f44:5dd8:c956::1]:6233   unknown            true     
openshift-worker-1   OK       provisioned              ocp-edge-cluster-worker-0-g8756   ipmi://[fd2e:6f44:5dd8:c956::1]:6234   unknown            true    

[kni@provisionhost-0 ~]$ oc -n openshift-machine-api get nodes
NAME                                          STATUS     ROLES    AGE   VERSION
master-0.ocp-edge-cluster.qe.lab.redhat.com   Ready      master   20m   v1.16.2
master-1.ocp-edge-cluster.qe.lab.redhat.com   Ready      master   20m   v1.16.2
master-2.ocp-edge-cluster.qe.lab.redhat.com   Ready      master   20m   v1.16.2

Checking the CSRs, we can see pending requests from openshift-machine-config-operator:node-bootstrapper:

[kni@provisionhost-0 ~]$ oc -n openshift-machine-api get csr
NAME        AGE     REQUESTOR                                                                   CONDITION
csr-kxk6f   20m     system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Approved,Issued
csr-lr9qq   20m     system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Approved,Issued
csr-mp552   20m     system:node:master-0.ocp-edge-cluster.qe.lab.redhat.com                     Approved,Issued
csr-nng8s   20m     system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Approved,Issued
csr-p24dr   20m     system:node:master-2.ocp-edge-cluster.qe.lab.redhat.com                     Approved,Issued
csr-qdrj5   2m42s   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Pending
csr-ss79l   2m48s   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Pending
csr-zv8q6   20m     system:node:master-1.ocp-edge-cluster.qe.lab.redhat.com                     Approved,Issued


After approving the pending node-bootstrapper CSRs, the worker node CSRs show up in Pending state:

for csr in $(oc -n openshift-machine-api get csr | awk '/Pending/ {print $1}'); do oc adm certificate approve $csr;done
certificatesigningrequest.certificates.k8s.io/csr-qdrj5 approved
certificatesigningrequest.certificates.k8s.io/csr-ss79l approved

[kni@provisionhost-0 ~]$ oc -n openshift-machine-api get csr
NAME        AGE     REQUESTOR                                                                   CONDITION
csr-bhfzn   13s     system:node:worker-1.ocp-edge-cluster.qe.lab.redhat.com                     Pending
csr-bztkq   6s      system:node:worker-0.ocp-edge-cluster.qe.lab.redhat.com                     Pending
csr-kxk6f   21m     system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Approved,Issued
csr-lr9qq   22m     system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Approved,Issued
csr-mp552   21m     system:node:master-0.ocp-edge-cluster.qe.lab.redhat.com                     Approved,Issued
csr-nng8s   22m     system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Approved,Issued
csr-p24dr   22m     system:node:master-2.ocp-edge-cluster.qe.lab.redhat.com                     Approved,Issued
csr-qdrj5   4m13s   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Approved,Issued
csr-ss79l   4m19s   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Approved,Issued
csr-zv8q6   22m     system:node:master-1.ocp-edge-cluster.qe.lab.redhat.com                     Approved,Issued

Next, we approve the pending worker CSRs, after which all CSRs show as Approved,Issued:

[kni@provisionhost-0 ~]$ for csr in $(oc -n openshift-machine-api get csr | awk '/Pending/ {print $1}'); do oc adm certificate approve $csr;done
certificatesigningrequest.certificates.k8s.io/csr-bhfzn approved
certificatesigningrequest.certificates.k8s.io/csr-bztkq approved

[kni@provisionhost-0 ~]$ oc -n openshift-machine-api get csr
NAME        AGE     REQUESTOR                                                                   CONDITION
csr-bhfzn   85s     system:node:worker-1.ocp-edge-cluster.qe.lab.redhat.com                     Approved,Issued
csr-bztkq   78s     system:node:worker-0.ocp-edge-cluster.qe.lab.redhat.com                     Approved,Issued
csr-kxk6f   23m     system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Approved,Issued
csr-lr9qq   23m     system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Approved,Issued
csr-mp552   22m     system:node:master-0.ocp-edge-cluster.qe.lab.redhat.com                     Approved,Issued
csr-nng8s   23m     system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Approved,Issued
csr-p24dr   23m     system:node:master-2.ocp-edge-cluster.qe.lab.redhat.com                     Approved,Issued
csr-qdrj5   5m25s   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Approved,Issued
csr-ss79l   5m31s   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Approved,Issued
csr-zv8q6   23m     system:node:master-1.ocp-edge-cluster.qe.lab.redhat.com                     Approved,Issued

After this, the worker nodes show up in the node list:

[kni@provisionhost-0 ~]$ oc get nodes
NAME                                          STATUS     ROLES    AGE    VERSION
master-0.ocp-edge-cluster.qe.lab.redhat.com   Ready      master   23m    v1.16.2
master-1.ocp-edge-cluster.qe.lab.redhat.com   Ready      master   23m    v1.16.2
master-2.ocp-edge-cluster.qe.lab.redhat.com   Ready      master   23m    v1.16.2
worker-0.ocp-edge-cluster.qe.lab.redhat.com   NotReady   worker   96s    v1.16.2
worker-1.ocp-edge-cluster.qe.lab.redhat.com   NotReady   worker   103s   v1.16.2
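The approval loop used twice above approves every CSR whose "oc get csr" line contains the string Pending, regardless of who requested it. As a manual workaround while auto-approval is broken, a somewhat safer variant selects only rows whose CONDITION column is exactly Pending and, optionally, whose REQUESTOR starts with a given prefix, so the node-bootstrapper round and the system:node: round seen above can be approved separately. This is a minimal sketch, not part of the fix tracked in this bug; the function names pending_csrs and approve_pending_csrs are hypothetical, and it assumes the default table output of "oc get csr" with a REQUESTOR header column and CONDITION as the last column.

```shell
#!/usr/bin/env bash
# Sketch of a more selective manual CSR approval (hypothetical helper names).

# pending_csrs [REQUESTOR_PREFIX]
# Reads "oc get csr" table output on stdin and prints the NAME of each row whose
# last column (CONDITION) is exactly "Pending" and whose REQUESTOR starts with
# REQUESTOR_PREFIX. With no prefix, every pending CSR is printed.
pending_csrs() {
  local prefix="${1:-}"
  awk -v prefix="$prefix" '
    NR == 1 {
      # Locate the REQUESTOR column from the header; default to column 3.
      col = 3
      for (i = 1; i <= NF; i++) if ($i == "REQUESTOR") col = i
      next
    }
    $NF == "Pending" && (prefix == "" || substr($col, 1, length(prefix)) == prefix) {
      print $1
    }
  '
}

# approve_pending_csrs [REQUESTOR_PREFIX]
# Approves the CSRs selected by pending_csrs. Requires a logged-in "oc" client.
approve_pending_csrs() {
  local name
  oc get csr | pending_csrs "${1:-}" | while read -r name; do
    oc adm certificate approve "$name"
  done
}
```

For example, running approve_pending_csrs system:serviceaccount:openshift-machine-config-operator:node-bootstrapper first, and approve_pending_csrs system:node: once the worker CSRs appear, mirrors the two approval rounds in the transcript. CSRs are cluster-scoped, so the -n openshift-machine-api flag used in the transcript has no effect and is omitted here.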

Version-Release number of selected component (if applicable):
4.3.0-0.nightly-2020-02-03-115336-ipv6.2

How reproducible:
100%

Steps to Reproduce:
1. Deploy a bare metal environment with 3 master nodes, 2 worker nodes and an IPv6 control plane.

Actual results:
Worker nodes do not join the cluster because their CSRs are not automatically approved.

Expected results:
Worker nodes join the cluster without any manual intervention.

Additional info:

Comment 1 Russell Bryant 2020-02-05 15:53:16 UTC
This issue is caused by the IPA image (from Ironic) not having its DHCPv6 client configured the same way as RHCOS, so it does not receive a lease from the DHCPv6 reservation set up for that host. As a result, introspection reports an incorrect IP address and hostname. That information is used later and compared against what the running Node reports; because it doesn't match, CSR approval is rejected. Derek Higgins is working on resolving this.

The fix will be in the IPA image, not the installer.
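Comment 1 describes the failure as an identity mismatch: the hostname and IP recorded for the host during introspection differ from what the running Node later presents, so its CSR is not auto-approved. The toy model below only illustrates that comparison; it is not the cluster-machine-approver's actual logic, and the function name sans_match_addresses and the 2001:db8:: documentation addresses used with it are hypothetical. It succeeds only when every identity requested in the CSR appears among the addresses recorded for the host.

```shell
#!/usr/bin/env bash
# Toy model of the identity comparison described in Comment 1.
# Not the cluster-machine-approver implementation; names are hypothetical.

# sans_match_addresses "RECORDED_ADDRESSES" "REQUESTED_IDENTITIES"
# Both arguments are whitespace-separated lists of hostnames and/or IPs.
# Returns 0 only if every requested identity is one of the recorded addresses
# (an empty request list trivially matches); returns 1 otherwise.
sans_match_addresses() {
  local recorded requested
  # Normalise to " a b c " so each entry can be matched as a whole word.
  recorded=" $(printf '%s ' $1)"
  for requested in $2; do
    case "$recorded" in
      *" $requested "*) ;;   # known identity: keep checking the rest
      *) return 1 ;;         # identity not recorded for this host: reject
    esac
  done
  return 0
}
```

In this bug the recorded side was wrong because the IPA ramdisk's DHCPv6 client did not obtain the reserved lease, which is why the fix went into the IPA image (ironic-ipa-downloader pull 27) rather than into the installer.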

Comment 2 Steven Hardy 2020-02-05 16:44:04 UTC
Note that if we fix this by modifying the container image rather than the IPA image RPM, this BZ will depend on https://bugzilla.redhat.com/show_bug.cgi?id=1798491

Comment 6 Marius Cornea 2020-02-27 18:20:13 UTC
Following up on this issue: with 4.3.0-0.nightly-2020-02-21-091838-ipv6.3, the deployment completes successfully, including the worker nodes, but after deployment the worker node CSRs are not automatically approved. That issue was reported in a different BZ, since the IPA image change tracked in this BZ is fixed:

https://bugzilla.redhat.com/show_bug.cgi?id=1807854

Comment 8 errata-xmlrpc 2020-05-04 11:33:13 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0581

