Bug 1654558
| Summary: | The ca.crt created in pod by installer couldn't pass the SSL certificate verification | |||
|---|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Qiaoling Tang <qitang> | |
| Component: | Installer | Assignee: | Alex Crawford <crawford> | |
| Installer sub component: | openshift-installer | QA Contact: | Johnny Liu <jialiu> | |
| Status: | CLOSED CURRENTRELEASE | Docs Contact: | ||
| Severity: | high | |||
| Priority: | high | CC: | aalevy, anli, aos-bugs, jokerman, mmccomas, qitang, rmeggins, wking | |
| Version: | 4.1.0 | |||
| Target Milestone: | --- | |||
| Target Release: | 4.1.0 | |||
| Hardware: | Unspecified | |||
| OS: | Unspecified | |||
| Whiteboard: | ||||
| Fixed In Version: | Doc Type: | If docs needed, set a value | ||
| Doc Text: | Story Points: | --- | ||
| Clone Of: | ||||
| : | 1670282 (view as bug list) | Environment: | ||
| Last Closed: | 2019-01-07 02:15:41 UTC | Type: | Bug | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | Category: | --- | ||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Embargoed: | ||||
| Bug Depends On: | ||||
| Bug Blocks: | 1670282 | |||
Description
Qiaoling Tang
2018-11-29 05:34:23 UTC
@rich, It may be an installer/master bug. It seems the kube-ca certificate is not in /var/run/secrets/kubernetes.io/serviceaccount/ca.crt.

Moving to Installer team as logging doesn't control this cert. Maybe it should be master?

If the CA in the service account is incorrect - a lot more than logging would be failing. So I have my doubts the installer (or master) is the source of the issue.
I'm unsure of the behavior of piping the CA into openssl connect - but I have a feeling that if you were to specify `-CAfile` (similar to how the curl command is setting `--cacert`) the openssl connect command would work.
Along those lines, I do believe this might be something in the logging stack not properly specifying/using the cluster CA for trust.

tl;dr I would very, very strongly encourage the installer to provide the _entire_ cert chain in the ca.crt provided with the serviceaccount secrets. I've already had a lot of problems with fluentd, and for rsyslog I'm probably going to have to hack together a CA file containing the entire cert chain in the rsyslog configmap. Not to mention that there may be many other openssl, ruby, python, and other apps which will be affected. I still don't know if java apps will be affected, or what the remediation will be if so. See below for further analysis.

(In reply to Aaron Levy from comment #3)
> If the CA in the service account is incorrect - a lot more than logging
> would be failing. So I have my doubts the installer (or master) is the
> source of the issue.
>
> I'm unsure of the behavior of piping the CA into openssl connect - but I have a
> feeling that if you were to specify `-CAfile` (similar to how the curl
> command is setting `--cacert`) the openssl connect command would work.
No, it would not - verify error:num=2:unable to get issuer certificate
sh-4.4# openssl s_client -connect kubernetes.default.svc:443 -CAfile /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
CONNECTED(00000004)
depth=1 OU = bootkube, CN = kube-ca
verify error:num=2:unable to get issuer certificate
issuer= OU = openshift, CN = root-ca
---
Certificate chain
0 s:O = kube-master, CN = system:kube-apiserver
i:OU = bootkube, CN = kube-ca
1 s:OU = bootkube, CN = kube-ca
i:OU = openshift, CN = root-ca
---
Server certificate
-----BEGIN CERTIFICATE-----
...
Verification error: unable to get issuer certificate
...
Verify return code: 2 (unable to get issuer certificate)

In this case, this means "unable to get the root CA", i.e. the client has only the intermediate CA cert for "OU = bootkube, CN = kube-ca", not the root CA cert for "OU = openshift, CN = root-ca", so it cannot verify the whole chain. However, if you provide the `-partial_chain` flag, you can avoid the error:
depth=1 OU = bootkube, CN = kube-ca
verify return:1
depth=0 O = kube-master, CN = system:kube-apiserver
verify return:1
CONNECTED(00000004)
...
Verification: OK
... etc.

The problem is that openssl "sets" `-partial_chain` _off_ by default - by default, it requires the full chain. To change this behavior, the application using openssl must provide a way to set the X509_V_FLAG_PARTIAL_CHAIN flag on the underlying trust store. Note that NSS, gnutls, and golang tls/crypto appear so far to be unaffected - they "set" partial_chain "on" by default. I haven't done an exhaustive analysis, but the `curl` command works (NSS) and wget works (gnutls). python and ruby both fail, and it depends a great deal on the application's rest/http/ssl layers as to how easy it is to pass this flag in. Of course, you can usually resort to monkeypatching for dynamic languages.

>
> Along those lines, I do believe this might be something in the logging stack
> not properly specifying/using the cluster CA for trust.
It turns out we got unlucky with logging because we have clients that use openssl, such as fluentd and rsyslog. We did get fairly lucky with the ruby kubeclient, because it assumes openssl and allows passing the right flags all the way down into the crypto layer:
https://github.com/openshift/origin-aggregated-logging/pull/1483
and
https://github.com/fabric8io/fluent-plugin-kubernetes_metadata_filter/pull/164

For rsyslog, I'm going to have to hack the logging operator to dynamically construct the serviceaccount CA cert from the default one and the root ca in kube-system.

If the installer team is unwilling to change this, then there is still an installer bug, for the docs:

WARNING: YOUR APPS MAY BREAK IF USING THE /var/run/secrets/kubernetes.io/serviceaccount/ca.crt SINCE IT NOW CONTAINS ONLY AN INTERMEDIATE CA CERT AND NOT A ROOT CA CERT. OpenSSL and applications that use it (ruby, python, etc.) by default REQUIRE THE ENTIRE CERT CHAIN TO VERIFY SERVER CERTS.

You don't need the entire cert chain, you only need the root CA:
$ oc extract -n kube-system cm/root-ca --to=.
ca.crt
$ mv ca.crt root-ca.crt
$ # or any serviceaccount secret
$ oc extract -n openshift-logging secret/fluentd-token-jj7zq --to=.
namespace
token
ca.crt
$ oc get endpoints --all-namespaces
NAMESPACE   NAME         ENDPOINTS             AGE
...
default     kubernetes   192.168.126.11:6443
$ openssl s_client -connect 192.168.126.11:6443 -CAfile ca.crt
CONNECTED(00000003)
depth=1 OU = bootkube, CN = kube-ca
verify error:num=2:unable to get issuer certificate
issuer= OU = openshift, CN = root-ca
... failure case ...
$ openssl s_client -connect 192.168.126.11:6443 -CAfile root-ca.crt
CONNECTED(00000003)
depth=2 OU = openshift, CN = root-ca
verify return:1
depth=1 OU = bootkube, CN = kube-ca
verify return:1
depth=0 O = kube-master, CN = system:kube-apiserver
verify return:1
...
Verify return code: 0 (ok)
https://github.com/openshift/cluster-kube-controller-manager-operator/pull/110 has been merged. Can you verify that this has been fixed?

Fluentd pod can be in running status now.
$ oc get pod
NAME READY STATUS RESTARTS AGE
cluster-logging-operator-8866ff9c8-9bhnk 1/1 Running 0 15m
elasticsearch-clientdatamaster-0-1-84d764899d-wl4l5 1/1 Running 0 13m
elasticsearch-operator-86599f8849-544tn 1/1 Running 0 14m
fluentd-8wfqd 1/1 Running 0 14m
fluentd-dkbvx 1/1 Running 0 14m
fluentd-f27rf 1/1 Running 0 14m
kibana-675b587dfd-prf28 2/2 Running 0 14m
$ oc rsh fluentd-8wfqd
sh-4.2# openssl s_client -connect kubernetes.default.svc:443 -CAfile /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
CONNECTED(00000003)
depth=2 OU = openshift, CN = root-ca
verify return:1
depth=1 OU = bootkube, CN = kube-ca
verify return:1
depth=0 O = kube-master, CN = system:kube-apiserver
verify return:1
---
Certificate chain
0 s:/O=kube-master/CN=system:kube-apiserver
i:/OU=bootkube/CN=kube-ca
1 s:/OU=bootkube/CN=kube-ca
i:/OU=openshift/CN=root-ca
---
Server certificate
-----BEGIN CERTIFICATE-----
MIID+jCCAuKgAwIBAgIILPsNawY4PM0wDQYJKoZIhvcNAQELBQAwJTERMA8GA1UE
CxMIYm9vdGt1YmUxEDAOBgNVBAMTB2t1YmUtY2EwHhcNMTgxMjE3MDEwMjAwWhcN
MjgxMjE0MDEwMjAyWjA2MRQwEgYDVQQKEwtrdWJlLW1hc3RlcjEeMBwGA1UEAxMV
c3lzdGVtOmt1YmUtYXBpc2VydmVyMIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIB
CgKCAQEA6ouj44LNJfKuqQ1uz11gXPbQvb/aTVpfUk1wPyPw7C/NRQlkS6wrZw7U
tRk8rGoe6ADx2Yq4yWmGbQNXSjnm947YDBaMVDekoaadzWBCNs0AHNXzsjF06o+r
AbCKYPhh3ipFZcxqCpcak860LJ/ZpogDTqJRuqv0DIw4EeawXyM6gCGzWlHakVWg
bVISxRWRGX2039VCRMxTq3SBARoAEOKQGehMz/aKV0lLui9IG/KYPYR5VRow9PV/
oaeOSKpbhbCQUAqe1SyNcDNdfLct9ayt2CT3hdP79eGLFUNLxpo6V/uBwfL7YvNm
2BHR0Y6Geh/CxLaQEEycUJIHLrQLbwIDAQABo4IBGzCCARcwDgYDVR0PAQH/BAQD
AgWgMB0GA1UdJQQWMBQGCCsGAQUFBwMBBggrBgEFBQcDAjAMBgNVHRMBAf8EAjAA
MB0GA1UdDgQWBBRIjdrtpZbJ+IRuaLy0c3ejzF8MwzAfBgNVHSMEGDAWgBRyWPAN
UkBHoG6Z9xsBVxcl88VLAzCBlwYDVR0RBIGPMIGMghVxaXRhbmctYXBpLnR0LnRl
c3RpbmeCCmt1YmVybmV0ZXOCEmt1YmVybmV0ZXMuZGVmYXVsdIIWa3ViZXJuZXRl
cy5kZWZhdWx0LnN2Y4Ika3ViZXJuZXRlcy5kZWZhdWx0LnN2Yy5jbHVzdGVyLmxv
Y2Fsgglsb2NhbGhvc3SHBKweAAGHBH8AAAEwDQYJKoZIhvcNAQELBQADggEBAHGv
vSZFsi0QBStbWB+vaHxrRkaIBH+Y8Lwow14qEtltso638g/05f6FVlq8ILZaujFU
T8esPe2wWtZ9Ua5WN7kMppOn5SwlawGUtl1R71JJN8A9ZzPU9e37mPbm7fnRvDKa
2Yys52pR8Xb9lPkdVWBKl5GHl2FLbkWaDxY+CA6PWIB0QfSList/3mGmPwQ6c6aR
oPlJH/dt/2KUgbQEY7pNlS/iKiBO2UBv0nxF81USjPSaGwwIZ2cW1LzV5SmsbmOs
myY5FQ3GBbCtrJ7KswzuJ+3Gjwdv7GZyQ6PznLchg4zWu+wUcqH1giMSagylYB4F
/KEvF5l+2Cc+m2cGCAY=
-----END CERTIFICATE-----
subject=/O=kube-master/CN=system:kube-apiserver
issuer=/OU=bootkube/CN=kube-ca
---
Acceptable client certificate CA names
/OU=bootkube/CN=kube-ca
/OU=bootkube/CN=aggregator
Client Certificate Types: RSA sign, ECDSA sign
Requested Signature Algorithms: RSA+SHA256:ECDSA+SHA256:RSA+SHA384:ECDSA+SHA384:RSA+SHA512:ECDSA+SHA512:RSA+SHA1:ECDSA+SHA1
Shared Requested Signature Algorithms: RSA+SHA256:ECDSA+SHA256:RSA+SHA384:ECDSA+SHA384:RSA+SHA512:ECDSA+SHA512:RSA+SHA1:ECDSA+SHA1
Peer signing digest: SHA512
Server Temp Key: ECDH, P-256, 256 bits
---
SSL handshake has read 2568 bytes and written 427 bytes
---
New, TLSv1/SSLv3, Cipher is ECDHE-RSA-AES128-GCM-SHA256
Server public key is 2048 bit
Secure Renegotiation IS supported
Compression: NONE
Expansion: NONE
No ALPN negotiated
SSL-Session:
Protocol : TLSv1.2
Cipher : ECDHE-RSA-AES128-GCM-SHA256
Session-ID: 0FC52E41120AD73CDC8A81F74DE353D1F06987FF320204AA7941DF6CB3F6E29C
Session-ID-ctx:
Master-Key: 178CA609E42EC7EEE845D487A78FD63488554794DA99A0C2F9264091C3C28B3AF1E2A78BDDAB9AE51C5FD326EAEDA060
Key-Arg : None
Krb5 Principal: None
PSK identity: None
PSK identity hint: None
TLS session ticket:
0000 - 3d 10 af 5a 9b eb 60 d5-c7 d3 22 f8 e7 d8 f3 72 =..Z..`..."....r
0010 - a1 d9 4c 7f 6b 46 c1 68-3f 9a 13 97 89 fc 4d f2 ..L.kF.h?.....M.
0020 - 97 df 41 51 dc 98 12 42-62 c2 07 7f 0b d7 81 98 ..AQ...Bb.......
0030 - 3d aa f8 91 ac 75 26 25-02 45 a0 35 82 d5 3a fa =....u&%.E.5..:.
0040 - 55 f0 29 66 bc d7 1a b8-08 2d 1e 73 f9 97 3b 43 U.)f.....-.s..;C
0050 - c3 76 6f 5c 87 ff 16 00-87 39 b1 21 c5 cc 95 fa .vo\.....9.!....
0060 - c5 f6 77 d7 ff f0 4f fb-4a a8 69 12 bc bd 2f 47 ..w...O.J.i.../G
0070 - 0e b8 50 bb 94 9b 67 fd- ..P...g.
Start Time: 1545011294
Timeout : 300 (sec)
Verify return code: 0 (ok)
---
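
If this check needs to be repeated across several pods, the result line can be pulled out of the `openssl s_client` output with a small helper. The following is a minimal sketch; `parse_verify_result` is a hypothetical name, and it assumes the output contains a `Verify return code: N (message)` line in the form shown above.

```python
import re

# Matches lines such as "Verify return code: 0 (ok)", with optional indentation.
_VERIFY_RE = re.compile(
    r"^[ \t]*Verify return code:[ \t]*(\d+)[ \t]*\((.*)\)[ \t]*$",
    re.MULTILINE,
)


def parse_verify_result(s_client_output):
    """Return (code, message) from the last 'Verify return code' line.

    Returns None when the output has no such line, e.g. if the
    connection failed before the handshake completed.
    """
    matches = _VERIFY_RE.findall(s_client_output)
    if not matches:
        return None
    code, message = matches[-1]
    return int(code), message
```

A code of 0 corresponds to the fixed behavior shown in this comment; code 2 is the "unable to get issuer certificate" failure from the original report.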
Remove keyword "TestBlocker"
> Fluentd pod can be in running status now
So we can mark this CLOSED CURRENTRELEASE? Or is there anything left to do?