Bug 1654558
| Field | Value |
|---|---|
| Summary | The ca.crt created in pod by installer couldn't pass the SSL certificate verification |
| Product | OpenShift Container Platform |
| Component | Installer |
| Installer sub component | openshift-installer |
| Status | CLOSED CURRENTRELEASE |
| Severity | high |
| Priority | high |
| Version | 4.1.0 |
| Target Milestone | --- |
| Target Release | 4.1.0 |
| Hardware | Unspecified |
| OS | Unspecified |
| Reporter | Qiaoling Tang <qitang> |
| Assignee | Alex Crawford <crawford> |
| QA Contact | Johnny Liu <jialiu> |
| Docs Contact | |
| CC | aalevy, anli, aos-bugs, jokerman, mmccomas, qitang, rmeggins, wking |
| Whiteboard | |
| Fixed In Version | |
| Doc Type | If docs needed, set a value |
| Doc Text | |
| Story Points | --- |
| Clone Of | |
| | 1670282 (view as bug list) |
| Environment | |
| Last Closed | 2019-01-07 02:15:41 UTC |
| Type | Bug |
| Regression | --- |
| Mount Type | --- |
| Documentation | --- |
| CRM | |
| Verified Versions | |
| Category | --- |
| oVirt Team | --- |
| RHEL 7.3 requirements from Atomic Host | |
| Cloudforms Team | --- |
| Target Upstream Version | |
| Embargoed | |
| Bug Depends On | |
| Bug Blocks | 1670282 |
Description
Qiaoling Tang
2018-11-29 05:34:23 UTC
@rich, it may be an installer/master bug. It seems the kube-ca certificate is not in /var/run/secrets/kubernetes.io/serviceaccount/ca.crt.

Moving to the Installer team as logging doesn't control this cert. Maybe it should be master?

If the CA in the service account is incorrect, a lot more than logging would be failing. So I have my doubts that the installer (or master) is the source of the issue.

I'm unsure of the behavior of piping the CA into openssl connect, but I have a feeling that if you were to specify `-CAfile` (similar to how the curl command is setting `--cacert`) the openssl connect command would work.

Along those lines, I do believe this might be something in the logging stack not properly specifying/using the cluster CA for trust.

tl;dr: I would very, very strongly encourage the installer to provide the _entire_ cert chain in the ca.crt provided with the serviceaccount secrets. I've already had a lot of problems with fluentd, and for rsyslog I'm probably going to have to hack together a CA file containing the entire cert chain in the rsyslog configmap. Not to mention that there may be many other openssl, ruby, python, and other apps which will be affected. I still don't know if java apps will be affected, or what the remediation will be if so. See below for further analysis.

(In reply to Aaron Levy from comment #3)
> If the CA in the service account is incorrect - a lot more than logging
> would be failing. So I have my doubts the installer (or master) is the
> source of the issue.
>
> I'm unsure of the behavior of piping the CA into openssl connect - but I have a
> feeling that if you were to specify `-CAfile` (similar to how the curl
> command is setting `--cacert`) the openssl connect command would work.

No, it would not - verify error:num=2:unable to get issuer certificate:

    sh-4.4# openssl s_client -connect kubernetes.default.svc:443 -CAfile /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    CONNECTED(00000004)
    depth=1 OU = bootkube, CN = kube-ca
    verify error:num=2:unable to get issuer certificate
    issuer= OU = openshift, CN = root-ca
    ---
    Certificate chain
     0 s:O = kube-master, CN = system:kube-apiserver
       i:OU = bootkube, CN = kube-ca
     1 s:OU = bootkube, CN = kube-ca
       i:OU = openshift, CN = root-ca
    ---
    Server certificate
    -----BEGIN CERTIFICATE-----
    ...
    Verification error: unable to get issuer certificate
    ...
    Verify return code: 2 (unable to get issuer certificate)

In this case, "unable to get issuer certificate" means "unable to get the root CA": the client has only the intermediate CA cert for "OU = bootkube, CN = kube-ca", not the root CA cert for "OU = openshift, CN = root-ca", so it cannot verify the whole chain. However, if you provide the `-partial_chain` flag, you can avoid the error:

    depth=1 OU = bootkube, CN = kube-ca
    verify return:1
    depth=0 O = kube-master, CN = system:kube-apiserver
    verify return:1
    CONNECTED(00000004)
    ...
    Verification: OK
    ...

The problem is that openssl "sets" `-partial_chain` _off_ by default - by default it requires the full chain. To override this behavior, the openssl-based client application must provide a way to set the X509_V_FLAG_PARTIAL_CHAIN flag on the underlying trust store.

Note that NSS, gnutls, and golang tls/crypto appear so far to be unaffected - they "set" partial_chain "on" by default. I haven't done an exhaustive analysis, but the `curl` command works (NSS) and wget works (gnutls). python and ruby both fail, and it depends a great deal on the application's rest/http/ssl layers how easy it is to pass this flag in.
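(For reference, the successful output above comes from adding `-partial_chain` to the same command. A minimal sketch, assuming the pod's OpenSSL is new enough for s_client to accept the flag as a verification option:)

    # Sketch: accept a chain that terminates at the intermediate kube-ca present in -CAfile
    # (sets X509_V_FLAG_PARTIAL_CHAIN on the verify store) instead of requiring the root CA.
    sh-4.4# openssl s_client -connect kubernetes.default.svc:443 \
        -CAfile /var/run/secrets/kubernetes.io/serviceaccount/ca.crt \
        -partial_chain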
Of course, you can usually resort to monkeypatching for dynamic languages.

> Along those lines, I do believe this might be something in the logging stack
> not properly specifying/using the cluster CA for trust.

It turns out we got unlucky with logging because we have clients that use openssl, such as fluentd and rsyslog. It turns out we got fairly lucky with the ruby kubeclient because it assumes openssl and allows passing the right flags all the way down into the crypto layer. See https://github.com/openshift/origin-aggregated-logging/pull/1483 and https://github.com/fabric8io/fluent-plugin-kubernetes_metadata_filter/pull/164

For rsyslog, I'm going to have to hack the logging operator to dynamically construct the serviceaccount CA cert from the default one and the root ca in kube-system.

If the installer team is unwilling to change this, then there is still an installer bug, for the docs:

WARNING: YOUR APPS MAY BREAK IF USING THE /var/run/secrets/kubernetes.io/serviceaccount/ca.crt SINCE IT NOW CONTAINS ONLY AN INTERMEDIATE CA CERT AND NOT A ROOT CA CERT. OpenSSL and applications that use it (ruby, python, etc.) by default REQUIRE THE ENTIRE CERT CHAIN TO VERIFY SERVER CERTS.

You don't need the entire cert chain, you only need the root CA:

    $ oc extract -n kube-system cm/root-ca --to=. ca.crt
    $ mv ca.crt root-ca.crt
    $ # or any serviceaccount secret
    $ oc extract -n openshift-logging secret/fluentd-token-jj7zq --to=. namespace token ca.crt
    $ oc get endpoints --all-namespaces
    NAMESPACE   NAME         ENDPOINTS             AGE
    ...
    default     kubernetes   192.168.126.11:6443
    $ openssl s_client -connect 192.168.126.11:6443 -CAfile ca.crt
    CONNECTED(00000003)
    depth=1 OU = bootkube, CN = kube-ca
    verify error:num=2:unable to get issuer certificate
    issuer= OU = openshift, CN = root-ca
    ... failure case ...
    $ openssl s_client -connect 192.168.126.11:6443 -CAfile root-ca.crt
    CONNECTED(00000003)
    depth=2 OU = openshift, CN = root-ca
    verify return:1
    depth=1 OU = bootkube, CN = kube-ca
    verify return:1
    depth=0 O = kube-master, CN = system:kube-apiserver
    verify return:1
    ...
    Verify return code: 0 (ok)
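(As an aside: a rough sketch of the "CA file containing the entire cert chain" workaround mentioned earlier, building on the files extracted above; the name full-chain-ca.crt is just an example:)

    # Sketch: bundle the serviceaccount intermediate CA (ca.crt, extracted from the
    # fluentd token secret above) together with root-ca.crt so OpenSSL-based clients
    # can verify the full chain without -partial_chain.
    $ cat ca.crt root-ca.crt > full-chain-ca.crt
    $ openssl s_client -connect 192.168.126.11:6443 -CAfile full-chain-ca.crt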
https://github.com/openshift/cluster-kube-controller-manager-operator/pull/110 has been merged. Can you verify that this has been fixed?

Fluentd pod can be in running status now.

    $ oc get pod
    NAME                                                  READY   STATUS    RESTARTS   AGE
    cluster-logging-operator-8866ff9c8-9bhnk              1/1     Running   0          15m
    elasticsearch-clientdatamaster-0-1-84d764899d-wl4l5   1/1     Running   0          13m
    elasticsearch-operator-86599f8849-544tn               1/1     Running   0          14m
    fluentd-8wfqd                                         1/1     Running   0          14m
    fluentd-dkbvx                                         1/1     Running   0          14m
    fluentd-f27rf                                         1/1     Running   0          14m
    kibana-675b587dfd-prf28                               2/2     Running   0          14m

    $ oc rsh fluentd-8wfqd
    sh-4.2# openssl s_client -connect kubernetes.default.svc:443 -CAfile /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    CONNECTED(00000003)
    depth=2 OU = openshift, CN = root-ca
    verify return:1
    depth=1 OU = bootkube, CN = kube-ca
    verify return:1
    depth=0 O = kube-master, CN = system:kube-apiserver
    verify return:1
    ---
    Certificate chain
     0 s:/O=kube-master/CN=system:kube-apiserver
       i:/OU=bootkube/CN=kube-ca
     1 s:/OU=bootkube/CN=kube-ca
       i:/OU=openshift/CN=root-ca
    ---
    Server certificate
    -----BEGIN CERTIFICATE-----
    MIID+jCCAuKgAwIBAgIILPsNawY4PM0wDQYJKoZIhvcNAQELBQAwJTERMA8GA1UE
    CxMIYm9vdGt1YmUxEDAOBgNVBAMTB2t1YmUtY2EwHhcNMTgxMjE3MDEwMjAwWhcN
    MjgxMjE0MDEwMjAyWjA2MRQwEgYDVQQKEwtrdWJlLW1hc3RlcjEeMBwGA1UEAxMV
    c3lzdGVtOmt1YmUtYXBpc2VydmVyMIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIB
    CgKCAQEA6ouj44LNJfKuqQ1uz11gXPbQvb/aTVpfUk1wPyPw7C/NRQlkS6wrZw7U
    tRk8rGoe6ADx2Yq4yWmGbQNXSjnm947YDBaMVDekoaadzWBCNs0AHNXzsjF06o+r
    AbCKYPhh3ipFZcxqCpcak860LJ/ZpogDTqJRuqv0DIw4EeawXyM6gCGzWlHakVWg
    bVISxRWRGX2039VCRMxTq3SBARoAEOKQGehMz/aKV0lLui9IG/KYPYR5VRow9PV/
    oaeOSKpbhbCQUAqe1SyNcDNdfLct9ayt2CT3hdP79eGLFUNLxpo6V/uBwfL7YvNm
    2BHR0Y6Geh/CxLaQEEycUJIHLrQLbwIDAQABo4IBGzCCARcwDgYDVR0PAQH/BAQD
    AgWgMB0GA1UdJQQWMBQGCCsGAQUFBwMBBggrBgEFBQcDAjAMBgNVHRMBAf8EAjAA
    MB0GA1UdDgQWBBRIjdrtpZbJ+IRuaLy0c3ejzF8MwzAfBgNVHSMEGDAWgBRyWPAN
    UkBHoG6Z9xsBVxcl88VLAzCBlwYDVR0RBIGPMIGMghVxaXRhbmctYXBpLnR0LnRl
    c3RpbmeCCmt1YmVybmV0ZXOCEmt1YmVybmV0ZXMuZGVmYXVsdIIWa3ViZXJuZXRl
    cy5kZWZhdWx0LnN2Y4Ika3ViZXJuZXRlcy5kZWZhdWx0LnN2Yy5jbHVzdGVyLmxv
    Y2Fsgglsb2NhbGhvc3SHBKweAAGHBH8AAAEwDQYJKoZIhvcNAQELBQADggEBAHGv
    vSZFsi0QBStbWB+vaHxrRkaIBH+Y8Lwow14qEtltso638g/05f6FVlq8ILZaujFU
    T8esPe2wWtZ9Ua5WN7kMppOn5SwlawGUtl1R71JJN8A9ZzPU9e37mPbm7fnRvDKa
    2Yys52pR8Xb9lPkdVWBKl5GHl2FLbkWaDxY+CA6PWIB0QfSList/3mGmPwQ6c6aR
    oPlJH/dt/2KUgbQEY7pNlS/iKiBO2UBv0nxF81USjPSaGwwIZ2cW1LzV5SmsbmOs
    myY5FQ3GBbCtrJ7KswzuJ+3Gjwdv7GZyQ6PznLchg4zWu+wUcqH1giMSagylYB4F
    /KEvF5l+2Cc+m2cGCAY=
    -----END CERTIFICATE-----
    subject=/O=kube-master/CN=system:kube-apiserver
    issuer=/OU=bootkube/CN=kube-ca
    ---
    Acceptable client certificate CA names
    /OU=bootkube/CN=kube-ca
    /OU=bootkube/CN=aggregator
    Client Certificate Types: RSA sign, ECDSA sign
    Requested Signature Algorithms: RSA+SHA256:ECDSA+SHA256:RSA+SHA384:ECDSA+SHA384:RSA+SHA512:ECDSA+SHA512:RSA+SHA1:ECDSA+SHA1
    Shared Requested Signature Algorithms: RSA+SHA256:ECDSA+SHA256:RSA+SHA384:ECDSA+SHA384:RSA+SHA512:ECDSA+SHA512:RSA+SHA1:ECDSA+SHA1
    Peer signing digest: SHA512
    Server Temp Key: ECDH, P-256, 256 bits
    ---
    SSL handshake has read 2568 bytes and written 427 bytes
    ---
    New, TLSv1/SSLv3, Cipher is ECDHE-RSA-AES128-GCM-SHA256
    Server public key is 2048 bit
    Secure Renegotiation IS supported
    Compression: NONE
    Expansion: NONE
    No ALPN negotiated
    SSL-Session:
        Protocol  : TLSv1.2
        Cipher    : ECDHE-RSA-AES128-GCM-SHA256
        Session-ID: 0FC52E41120AD73CDC8A81F74DE353D1F06987FF320204AA7941DF6CB3F6E29C
        Session-ID-ctx:
        Master-Key: 178CA609E42EC7EEE845D487A78FD63488554794DA99A0C2F9264091C3C28B3AF1E2A78BDDAB9AE51C5FD326EAEDA060
        Key-Arg   : None
        Krb5 Principal: None
        PSK identity: None
        PSK identity hint: None
        TLS session ticket:
        0000 - 3d 10 af 5a 9b eb 60 d5-c7 d3 22 f8 e7 d8 f3 72   =..Z..`..."....r
        0010 - a1 d9 4c 7f 6b 46 c1 68-3f 9a 13 97 89 fc 4d f2   ..L.kF.h?.....M.
        0020 - 97 df 41 51 dc 98 12 42-62 c2 07 7f 0b d7 81 98   ..AQ...Bb.......
        0030 - 3d aa f8 91 ac 75 26 25-02 45 a0 35 82 d5 3a fa   =....u&%.E.5..:.
        0040 - 55 f0 29 66 bc d7 1a b8-08 2d 1e 73 f9 97 3b 43   U.)f.....-.s..;C
        0050 - c3 76 6f 5c 87 ff 16 00-87 39 b1 21 c5 cc 95 fa   .vo\.....9.!....
        0060 - c5 f6 77 d7 ff f0 4f fb-4a a8 69 12 bc bd 2f 47   ..w...O.J.i.../G
        0070 - 0e b8 50 bb 94 9b 67 fd-                          ..P...g.

        Start Time: 1545011294
        Timeout   : 300 (sec)
        Verify return code: 0 (ok)
    ---

Remove keyword "TestBlocker"
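(Side note, not part of the original verification: a quick way to confirm what the in-pod ca.crt bundle now contains is to list the certificates in it; after the fix it should include the "OU = openshift, CN = root-ca" certificate. A sketch using standard openssl commands:)

    # Sketch: print subject/issuer for every certificate in the serviceaccount CA bundle.
    sh-4.2# openssl crl2pkcs7 -nocrl -certfile /var/run/secrets/kubernetes.io/serviceaccount/ca.crt \
        | openssl pkcs7 -print_certs -noout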
> Fluentd pod can be in running status now
So we can mark this CLOSED CURRENTRELEASE? Or is there anything left to do?