Bug 1708640
Summary: | Not able to query jolokia on jboss images | ||||||
---|---|---|---|---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | mchoma | ||||
Component: | apiserver-auth | Assignee: | Stefan Schimanski <sttts> | ||||
Status: | CLOSED WONTFIX | QA Contact: | Chuan Yu <chuyu> | ||||
Severity: | medium | Docs Contact: | |||||
Priority: | medium | ||||||
Version: | 4.1.0 | CC: | anpicker, aos-bugs, astefanu, bparees, calfonso, ccoleman, decarr, dsimansk, erooth, evb, fshaikh, gblomqui, jokerman, kwills, maschmid, mfojtik, mloibl, mmccomas, pkrupa, sponnaga, sttts, surbania, xtian, xxia | ||||
Target Milestone: | --- | Keywords: | Reopened | ||||
Target Release: | --- | ||||||
Hardware: | Unspecified | ||||||
OS: | Unspecified | ||||||
Whiteboard: | |||||||
Fixed In Version: | Doc Type: | If docs needed, set a value | |||||
Doc Text: | Story Points: | --- | |||||
Clone Of: | Environment: | ||||||
Last Closed: | 2019-08-26 10:20:57 UTC | Type: | Bug | ||||
Description
mchoma
2019-05-10 12:48:04 UTC
I don't think this belongs to the OCP Monitoring component. We take care of the cluster monitoring stack and Prometheus itself. Component and product teams are responsible for their monitoring themselves. Who put that component there, and who's responsible for jolokia?

In that case I agree Monitoring was a bad choice. I was thinking about several components: Auth, Networking, Pod, Security ... Could someone help me redirect the issue to the proper component then? Which component is involved in accessing through the API https://api.$CLUSTER:6443/api/v1/namespaces/namespace/pods/https:$POD:8778/proxy/ ?

I'd say auth. Although it looks like this may just be a setup/permission issue: you may just not have enough permissions to use the proxy subresource of that pod. I'd check the audit log and/or apiserver log to see whether this was an RBAC failure.

I can check that on Monday. However, I was using kubeadmin, which is the most powerful user on our installation, so I doubt that could be the case. It is possible that our test, at the time it was created, was influenced by these resources [1] [2] [3], which describe that way of usage. If this is no longer a supported way, we can look into alternatives. Could you elaborate more? Are you talking about exposing pod:8778 through a service to get a public access point?

[1] https://access.redhat.com/documentation/en-us/red_hat_jboss_a-mq/6.3/html/red_hat_jboss_a-mq_for_openshift/tutorials#monitoring_a_mq
[2] https://developers.redhat.com/blog/2016/03/30/jolokia-jvm-monitoring-in-openshift/
[3] https://access.redhat.com/solutions/3756021

Hold on. The proxy is part of our public API. It must work. If we broke it in 4.1 it must be fixed.

Proxy is a V1, public, supported API. I deleted the token, so don't freak out. I tried this and it worked fine.
[deads@deads-02 installer]$ curl -v -k --oauth2-bearer N-tLptkeh71tpwHffvxRMStClh_Ym5d807YHYDPlC_g https://api.ci-ln-1nvz24k-d5d6b.origin-ci-int-aws.dev.rhcloud.com:6443/api/v1/namespaces/default/pods/https:jolokia-reproducer-1-v8xh2:8778/proxy/jolokia
* Trying 3.213.152.204...
* TCP_NODELAY set
* Connected to api.ci-ln-1nvz24k-d5d6b.origin-ci-int-aws.dev.rhcloud.com (3.213.152.204) port 6443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
* CAfile: /etc/pki/tls/certs/ca-bundle.crt
  CApath: none
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.2 (IN), TLS handshake, Certificate (11):
* TLSv1.2 (IN), TLS handshake, Server key exchange (12):
* TLSv1.2 (IN), TLS handshake, Request CERT (13):
* TLSv1.2 (IN), TLS handshake, Server finished (14):
* TLSv1.2 (OUT), TLS handshake, Certificate (11):
* TLSv1.2 (OUT), TLS handshake, Client key exchange (16):
* TLSv1.2 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.2 (OUT), TLS handshake, Finished (20):
* TLSv1.2 (IN), TLS handshake, Finished (20):
* SSL connection using TLSv1.2 / ECDHE-RSA-AES256-GCM-SHA384
* ALPN, server accepted to use h2
* Server certificate:
* subject: CN=api.ci-ln-1nvz24k-d5d6b.origin-ci-int-aws.dev.rhcloud.com
* start date: May 10 16:54:12 2019 GMT
* expire date: Jun 9 16:54:13 2019 GMT
* issuer: OU=openshift; CN=kube-apiserver-lb-signer
* SSL certificate verify result: self signed certificate in certificate chain (19), continuing anyway.
* Using HTTP2, server supports multi-use
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* Server auth using Bearer with user ''
* Using Stream ID: 1 (easy handle 0x562326501630)
> GET /api/v1/namespaces/default/pods/https:jolokia-reproducer-1-v8xh2:8778/proxy/jolokia HTTP/2
> Host: api.ci-ln-1nvz24k-d5d6b.origin-ci-int-aws.dev.rhcloud.com:6443
> Authorization: Bearer N-tLptkeh71tpwHffvxRMStClh_Ym5d807YHYDPlC_g
> User-Agent: curl/7.61.1
> Accept: */*
>
* Connection state changed (MAX_CONCURRENT_STREAMS == 2000)!
< HTTP/2 404
< audit-id: 74397f07-c82c-4f95-a3ef-9a2eb778b4f8
< cache-control: no-store
< content-type: text/html
< content-length: 50
< date: Fri, 10 May 2019 17:12:57 GMT
<
* Connection #0 to host api.ci-ln-1nvz24k-d5d6b.origin-ci-int-aws.dev.rhcloud.com left intact
<h1>404 Not Found</h1>No context found for request

Now as for the specific issue you saw. The 401 you saw in your curl was actually a 401 from your application, not from the kube-apiserver. You were authenticated to the kube-apiserver and, because this endpoint isn't a real proxy (as Ben pointed out above), you cannot pass a secondary token to it. Your app (jolokia, I guess) didn't get any authentication information and replied with a 401. It stands out because `www-authenticate: Basic realm="jolokia"` is a challenge header, and our kube-apiserver doesn't accept basic auth, so it doesn't reply with challenge headers. And it doesn't know anything about a jolokia realm. Notice the difference if you use a bad token:

curl -v -k --oauth2-bearer bad-token https://api.ci-ln-1nvz24k-d5d6b.origin-ci-int-aws.dev.rhcloud.com:6443/api/v1/namespaces/default/pods/https:jolokia-reproducer-1-v8xh2:8778/proxy/jolokia
* Trying 3.213.152.204...
* TCP_NODELAY set
* Connected to api.ci-ln-1nvz24k-d5d6b.origin-ci-int-aws.dev.rhcloud.com (3.213.152.204) port 6443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
* CAfile: /etc/pki/tls/certs/ca-bundle.crt
  CApath: none
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.2 (IN), TLS handshake, Certificate (11):
* TLSv1.2 (IN), TLS handshake, Server key exchange (12):
* TLSv1.2 (IN), TLS handshake, Request CERT (13):
* TLSv1.2 (IN), TLS handshake, Server finished (14):
* TLSv1.2 (OUT), TLS handshake, Certificate (11):
* TLSv1.2 (OUT), TLS handshake, Client key exchange (16):
* TLSv1.2 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.2 (OUT), TLS handshake, Finished (20):
* TLSv1.2 (IN), TLS handshake, Finished (20):
* SSL connection using TLSv1.2 / ECDHE-RSA-AES256-GCM-SHA384
* ALPN, server accepted to use h2
* Server certificate:
* subject: CN=api.ci-ln-1nvz24k-d5d6b.origin-ci-int-aws.dev.rhcloud.com
* start date: May 10 16:54:12 2019 GMT
* expire date: Jun 9 16:54:13 2019 GMT
* issuer: OU=openshift; CN=kube-apiserver-lb-signer
* SSL certificate verify result: self signed certificate in certificate chain (19), continuing anyway.
* Using HTTP2, server supports multi-use
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* Server auth using Bearer with user ''
* Using Stream ID: 1 (easy handle 0x55ab3c901630)
> GET /api/v1/namespaces/default/pods/https:jolokia-reproducer-1-v8xh2:8778/proxy/jolokia HTTP/2
> Host: api.ci-ln-1nvz24k-d5d6b.origin-ci-int-aws.dev.rhcloud.com:6443
> Authorization: Bearer bad-token
> User-Agent: curl/7.61.1
> Accept: */*
>
* Connection state changed (MAX_CONCURRENT_STREAMS == 2000)!
< HTTP/2 401
< audit-id: 8275b14e-076d-44aa-a768-a28ccb3e9fc9
< cache-control: no-store
< content-type: application/json
< content-length: 165
< date: Fri, 10 May 2019 17:21:06 GMT
<
{
"kind": "Status",
"apiVersion": "v1",
"metadata": { },
"status": "Failure",
"message": "Unauthorized",
"reason": "Unauthorized",
"code": 401
* Connection #0 to host api.ci-ln-1nvz24k-d5d6b.origin-ci-int-aws.dev.rhcloud.com left intact
}

There is no challenge header and our 401 messages have a JSON format in the body. If you want us to have another look, please use a different app, something simple that doesn't require secondary authentication.

Thanks for the clarification. But again, do you understand why this scenario works on OCP 3.11 and stopped working on OCP 4.1? Which behaviour is correct?

We are discussing this in parallel on https://issues.jboss.org/browse/JBEAP-16845. It seems to us that proxy -> jolokia communication is secured by CLIENT_CERT authentication. As jolokia is secured with a self-signed certificate, the theory is that OCP 4.1 does not trust such a certificate. Was the proxy on OCP 3.11 skipping cert validation? Or is the default client certificate for the kube-apiserver still cn=system:master-proxy? We are missing any log information to confirm our deductions. Could you advise how we can see additional logs from the proxy, ideally a TLS handshake log similar to -Djavax.net.debug=all?

Authentication in jolokia is configured like this:

sh-4.2$ cat /opt/jolokia/etc/jolokia.properties
host=*
port=8778
discoveryEnabled=false
user=jolokia
password=gTa34RrujeddqzxcT3ZnRCmoTrMfe3
protocol=https
useSslClientAuthentication=true
extraClientCheck=true
caCert=/var/run/secrets/kubernetes.io/serviceaccount/ca.crt
clientPrincipal=cn=system:master-proxy
sh-4.2$

I have managed to get a detailed TLS handshake log (pod.log) on the EAP side by setting the environment variable `JAVA_OPTS_APPEND=-Djavax.net.debug=all`.
From that log I see that the kube API server does not provide a client certificate:
{code}
07:57:39,050 INFO [stdout] (Thread-97) *** Certificate chain
07:57:39,050 INFO [stdout] (Thread-97) <Empty>
07:57:39,050 INFO [stdout] (Thread-97) ***
{code}
This could be caused by the fact that the kube API server client certificate does not comply with what jolokia is requesting:
{code}
07:57:38,860 INFO [stdout] (Thread-97) Cert Authorities:
07:57:38,860 INFO [stdout] (Thread-97) <CN=Jolokia Agent 1.5.0, OU=JVM, O=jolokia.org, L=Pegnitz, ST=Franconia, C=DE>
07:57:38,860 INFO [stdout] (Thread-97) <CN=kube-apiserver-lb-signer, OU=openshift>
{code}
I see kube-apiserver-lb-signer is from /var/run/secrets/kubernetes.io/serviceaccount/ca.crt, which is used by jolokia as the CA cert. The question now is what the kube API server uses as its default client certificate on OCP 4.1. Apparently it is not signed by kube-apiserver-lb-signer, so that is why it stopped working on OCP 4.1.

Created attachment 1568314 [details]
pod.log
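The empty-chain check described in the comment above can be tedious to do by eye in a full `-Djavax.net.debug=all` log. What follows is only a minimal sketch in Python, assuming the log uses the `*** Certificate chain` and `<Empty>` markers quoted in this report. The function name `client_chain_is_empty` and the regular expression are illustrative, not part of any OpenShift or Jolokia tooling. Note that it reports any empty chain it finds and does not tell which side of the handshake sent it.

```python
import re

# Matches ANSI colour sequences such as "\x1b[0m", and also the bare "[0m"
# text that remains when the escape byte was lost during copy/paste.
ANSI_RESIDUE = re.compile(r"\x1b?\[[0-9;]+m")


def client_chain_is_empty(log_text: str) -> bool:
    """Return True if a '*** Certificate chain' marker is followed by '<Empty>'."""
    lines = [ANSI_RESIDUE.sub("", line).strip() for line in log_text.splitlines()]
    for i, line in enumerate(lines):
        if line.endswith("*** Certificate chain"):
            # The chain content starts on the next non-blank line.
            for following in lines[i + 1:]:
                if following:
                    if following.endswith("<Empty>"):
                        return True
                    break
    return False
```

It could be used as `client_chain_is_empty(open("pod.log").read())`, where `pod.log` is the attachment name used in this report.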
Reopening as it seems to me this needs more attention. It seems to me that the certificate which the proxy (kube API server) uses for CLIENT_CERT authentication does not match the truststore /var/run/secrets/kubernetes.io/serviceaccount/ca.crt. Where can I find the proxy (kube API server) certificate?

I don't know if that changed in OpenShift 4.x, but in 3.x, the client certificate used by Jolokia is the one configured in master-config.yaml, which allows setting a client cert for the proxy to present when proxying using TLS:

kubernetesMasterConfig:
  ...
  certFile: master.proxy-client.crt
  keyFile: master.proxy-client.key

FWIW, I've been able to reproduce the issue on OpenShift 3.11.0, while the same test works with OpenShift 3.10.0:

$ minishift start --openshift-version=v3.10.0
$ kubectl get --raw /api/v1/namespaces/fuse/pods/https:s2i-fuse73-spring-boot-camel-2-rdqq8:8778/proxy/jolokia/ --v=9
I0524 16:44:14.764310 91448 loader.go:359] Config loaded from file /Users/astefanu/.kube/config
I0524 16:44:14.799425 91448 round_trippers.go:419] curl -k -v -XGET -H "Accept: application/json, */*" -H "User-Agent: kubectl/v1.14.2 (darwin/amd64) kubernetes/66049e3" -H "Authorization: Bearer cC7M7_vUiQrRxmAF-GhPwPa7d0txaV1QmIOhkTK5KMQ" 'https://192.168.64.71:8443/api/v1/namespaces/fuse/pods/https:s2i-fuse73-spring-boot-camel-2-rdqq8:8778/proxy/jolokia/'
I0524 16:44:15.394543 91448 round_trippers.go:438] GET https://192.168.64.71:8443/api/v1/namespaces/fuse/pods/https:s2i-fuse73-spring-boot-camel-2-rdqq8:8778/proxy/jolokia/ 200 OK in 595 milliseconds
I0524 16:44:15.394577 91448 round_trippers.go:444] Response Headers:
I0524 16:44:15.394587 91448 round_trippers.go:447] Cache-Control: no-store
I0524 16:44:15.394635 91448 round_trippers.go:447] Cache-Control: no-cache
I0524 16:44:15.394655 91448 round_trippers.go:447] Content-Type: text/plain; charset=utf-8
I0524 16:44:15.394662 91448 round_trippers.go:447] Date: Fri, 24 May 2019 14:44:30 GMT
I0524 16:44:15.394668 91448 round_trippers.go:447] Expires: Fri, 24 May 2019 13:44:30 GMT
I0524 16:44:15.394673 91448 round_trippers.go:447] Pragma: no-cache
I0524 16:44:15.394678 91448 round_trippers.go:447] Content-Length: 780
{"request":{"type":"version"},"value":{"agent":"1.5.0","protocol":"7.2","config":{"listenForHttpService":"true","maxCollectionSize":"0","authIgnoreCerts":"false","agentId":"172.17.0.8-1-6d6f6e28-jvm","agentType":"jvm","policyLocation":"classpath:\/jolokia-access.xml","agentContext":"\/jolokia","mimeType":"text\/plain","discoveryEnabled":"false","streaming":"true","password":"CxUUoVagLOAIGUTLi5FnYKn0gPNr0c","historyMaxEntries":"10","allowDnsReverseLookup":"true","maxObjects":"0","debug":"false","serializeException":"false","maxDepth":"15","authMode":"basic","canonicalNaming":"true","allowErrorDetails":"true","realm":"jolokia","includeStackTrace":"true","user":"jolokia","useRestrictorService":"false","debugMaxEntries":"100"},"info":{}},"timestamp":1558709070,"status":200}

With 3.11.0, the client certificate presented to Jolokia by the API proxy is empty, as reported with JAVA_OPTIONS=-Djavax.net.debug=all:

*** Certificate chain
<Empty>
***

With 3.10.0, the client certificate is presented as expected:

*** Certificate chain
chain [0] = [
[
Version: V3
Subject: CN=system:master-proxy
Signature Algorithm: SHA256withRSA, OID = 1.2.840.113549.1.1.11
Key: Sun RSA public key, 2048 bits
modulus: 24538824463296429476158344027594341222983297637389985686887239626224123943539183061884281786478584619787095828890492968667750261339004288466315046775944846036259902977051175812769295285797591920694908406110083775657031670360073456304933676685464354372471723029633162734612235584266478091096909553121514228176976984963272813650863854915581269997667245569268972451538541446020787147645568489329212986241076399956143008390668266104878254701988032248727491492858261198111842986990701298510005873073307331156245102739091290424276784359738653894539831173743994305380255566487472854693996012479501593989350720180343982383703
public exponent: 65537
Validity: [From: Fri May 24 08:45:58 UTC 2019, To: Sun May 23 08:45:59 UTC 2021]
Issuer: CN=openshift-signer@1558687558
SerialNumber: [ 05]
Certificate Extensions: 3
[1]: ObjectId: 2.5.29.19 Criticality=true
BasicConstraints:[
CA:false
PathLen: undefined
]
[2]: ObjectId: 2.5.29.37 Criticality=false
ExtendedKeyUsages [
clientAuth
]
[3]: ObjectId: 2.5.29.15 Criticality=true
KeyUsage [
DigitalSignature
Key_Encipherment
]
]

It is clear that, with an empty certificate being presented, Jolokia SSL client authentication will fail and may fall back to a username / password challenge if enabled.

The notes LGTM

It is OK to make the above documentation notes, but the issue should be fixed.

Changing bug fields.
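As a footnote to the 401 discussion earlier in this report: the two signals mentioned there (a `www-authenticate` challenge header coming from the application, versus a JSON `Status` body coming from the kube-apiserver) can be turned into a rough check. This is a hedged sketch in Python, not an official diagnostic. The function name `classify_401` and its return labels are invented for illustration, and the rule rests on the statement in this thread that this kube-apiserver does not send challenge headers, which may not hold for other cluster configurations.

```python
import json


def classify_401(headers: dict, body: str) -> str:
    """Guess whether a 401 came from the kube-apiserver or from the proxied app."""
    # Header names are case-insensitive, so normalise them before lookup.
    names = {key.lower(): value for key, value in headers.items()}
    if "www-authenticate" in names:
        # A challenge header points at the application behind the proxy.
        return "application"
    try:
        doc = json.loads(body)
    except ValueError:
        return "unknown"
    # The kube-apiserver answers with a JSON Status object carrying code 401.
    if isinstance(doc, dict) and doc.get("kind") == "Status" and doc.get("code") == 401:
        return "kube-apiserver"
    return "unknown"
```

For example, passing the headers and body of the `bad-token` request shown earlier would yield `"kube-apiserver"`, while a response carrying `www-authenticate: Basic realm="jolokia"` would yield `"application"`.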