Bug 1832137 - Invalid bootstrap APIServer certificates - Azure
Summary: Invalid bootstrap APIServer certificates - Azure
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 4.3.z
Hardware: All
OS: All
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 4.4.z
Assignee: Abhinav Dahiya
QA Contact: Etienne Simard
URL:
Whiteboard:
Depends On: 1831760
Blocks: 1840238
Reported: 2020-05-06 07:29 UTC by OpenShift BugZilla Robot
Modified: 2020-05-26 17:53 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-05-26 16:50:55 UTC
Target Upstream Version:
Embargoed:




Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift installer pull 3554 | 0 | None | closed | Bug 1832137: Fix bootstrap certificate generation | 2020-05-26 15:36:04 UTC
Red Hat Product Errata | RHBA-2020:2180 | 0 | None | None | None | 2020-05-26 16:51:13 UTC

Description OpenShift BugZilla Robot 2020-05-06 07:29:59 UTC
This is a clone of Bug #1831760. This is the description of that bug:
Description of problem:

The Azure load balancer does not accept the API server certificates for HTTPS probes.

The load balancer considers the certificate invalid.
Error from the Azure side (not visible to normal users):
WINHTTP_CALLBACK_STATUS_FLAG_INVALID_CERT


According to https://tools.ietf.org/html/rfc3280#section-4.2.1.1 and https://tools.ietf.org/html/rfc3280#section-4.2.1.2, we are not using SubjectKeyId and AuthorityKeyId as specified.

Current certificate configuration:

CA:
```
...
CA:TRUE
 X509v3 Subject Key Identifier:  
   81:11:91:F6:17:0F:F7:1E:B0:E3:CB:72:22:FC:17:03:FD:C7:82:C8 
...
```

Certificate:
```
...
X509v3 Subject Key Identifier:  
   81:11:91:F6:17:0F:F7:1E:B0:E3:CB:72:22:FC:17:03:FD:C7:82:C8
X509v3 Authority Key Identifier:  
 keyid:81:11:91:F6:17:0F:F7:1E:B0:E3:CB:72:22:FC:17:03:FD:C7:82:C8
...
```

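These two fields should not be the same in a certificate signed by a CA. The Authority Key Identifier identifies the key that signed the certificate (the CA's key), while the Subject Key Identifier identifies the certificate's own key. Here the signed certificate reuses the CA's Subject Key Identifier as its own, which is considered an invalid configuration.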

How to reproduce:

1. Create 2 Azure VMs (2 VMs are needed because Azure does not allow "same leg" recursive calls via the load balancer), an internal load balancer, and a vnet.
2. Add an HTTPS probe and load-balancing rules for port 8443.
3. SSH into VM1 and run script: https://gist.github.com/mjudeikis/4c0fc47552897bf13e82414b7d8a9f28 
4. SSH into VM2 and try reaching VM1 via NODE IP (curl https://ip:8443/readyz -Ik). This should work.
5. Try reaching VM1 via Loadbalancer IP - This should fail.
If you run ssldump on VM1:
   ssldump -i eth0 port 8443 

you will see that the load balancer terminates the connection and never sends the ClientKeyExchange message.

6. Run the same script but change code behaviour for signed certificates (search for GOODCONFIG in the "gist").

Now calls via LB should work as the certificate is considered valid.

Comment 3 Etienne Simard 2020-05-19 02:43:32 UTC
Verified with:

./openshift-install version
./openshift-install unreleased-master-2655-g15eac3785998a5bc250c9f72101a4a9cb767e494-dirty
built from commit 15eac3785998a5bc250c9f72101a4a9cb767e494
release image registry.svc.ci.openshift.org/origin/release:4.3


I downloaded the installer source code and changed both azurerm_lb_probe templates (/data/data/azure/vnet/internal-lb.tf and /data/data/azure/vnet/public-lb.tf) to use the following configurations:

~~~
resource "azurerm_lb_probe" "internal_lb_probe_api_internal" {
  name                = "api-internal-probe"
  resource_group_name = var.resource_group_name
  interval_in_seconds = 10
  number_of_probes    = 3
  loadbalancer_id     = azurerm_lb.internal.id
  port                = 6443
  protocol            = "HTTPS"
  request_path        = "/readyz"
}

~~~
~~~
resource "azurerm_lb_probe" "public_lb_probe_api_internal" {
  count = var.private ? 0 : 1

  name                = "api-internal-probe"
  resource_group_name = var.resource_group_name
  interval_in_seconds = 10
  number_of_probes    = 3
  loadbalancer_id     = azurerm_lb.public.id
  port                = 6443
  protocol            = "HTTPS"
  request_path        = "/readyz"

}

~~~

After those changes, I compiled the `openshift-install` binary from the release-4.4 branch by running `hack/build.sh`:

./openshift-install version
./openshift-install unreleased-master-2655-g15eac3785998a5bc250c9f72101a4a9cb767e494-dirty
built from commit 15eac3785998a5bc250c9f72101a4a9cb767e494
release image registry.svc.ci.openshift.org/origin/release:4.3

After exporting the release image override, I was able to install the cluster (with the https health check).

$ export OPENSHIFT_INSTALL_RELEASE_IMAGE_OVERRIDE="registry.svc.ci.openshift.org/ocp/release:4.4"
$ ./openshift-install create cluster --dir test3
? SSH Public Key /home/qe/.ssh/id_rsa.pub
? Platform azure
INFO Credentials loaded from file "/home/qe/.azure/osServicePrincipal.json" 
? Region centralus
? Base Domain openshift.com
? Cluster Name qe
? Pull Secret [? for help] ********
WARNING Found override for release image. Please be warned, this is not advised 
INFO Creating infrastructure resources...         
INFO Waiting up to 20m0s for the Kubernetes API at https://api.qe.openshift.com:6443... 
INFO API v1.17.1 up                               
INFO Waiting up to 40m0s for bootstrapping to complete... 
INFO Destroying the bootstrap resources...        
INFO Waiting up to 30m0s for the cluster at https://api.qe.openshift.com:6443 to initialize... 
INFO Waiting up to 10m0s for the openshift-console route to be created... 
INFO Install complete!                            
INFO To access the cluster as the system:admin user when using 'oc', run 'export KUBECONFIG=/home/qe/openshift/auth/kubeconfig' 
INFO Access the OpenShift web-console here: https://console-openshift-console.apps.qe.openshift.com 
INFO Login to the console with user: kubeadmin, password:

No issue with basic health checks on the cluster.

Comment 5 errata-xmlrpc 2020-05-26 16:50:55 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2180

