Bug 1887639

Summary: [https_proxy] query console-operator metrics report certificate problem via oc exec
Product: OpenShift Container Platform Reporter: Yadan Pei <yapei>
Component: Management Console Assignee: Jakub Hadvig <jhadvig>
Status: CLOSED NOTABUG QA Contact: Yadan Pei <yapei>
Severity: low Docs Contact:
Priority: medium    
Version: 4.6 CC: aos-bugs, jokerman, spadgett
Target Milestone: --- Keywords: UpcomingSprint
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2021-01-21 17:29:32 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Yadan Pei 2020-10-13 03:56:02 UTC
Description of problem:
Based on our testing, on a cluster configured with https_proxy and IPv6, querying the metrics exposed by the console operator via the `oc exec` command reports a certificate problem.

Version-Release number of selected component (if applicable):
4.6.0-rc.2

How reproducible:
Always

Steps to Reproduce:
1. Query the metrics exposed by the console operator with `oc exec`:

# oc get pods -o wide
NAME                                READY   STATUS    RESTARTS   AGE   IP            NODE                             NOMINATED NODE   READINESS GATES
console-operator-757f85b94b-dj2s4   1/1     Running   0          24h   10.128.0.16   wsun1012-brdkd-control-plane-1   <none>           <none>
# export token=$(oc serviceaccounts get-token prometheus-k8s -n openshift-monitoring)

# oc exec console-operator-757f85b94b-dj2s4 -- curl -k -H "Authorization: Bearer $token" https://10.128.0.16:8443/metrics | grep console_url
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0curl: (60) SSL certificate problem: self signed certificate in certificate chain
More details here: https://curl.haxx.se/docs/sslcerts.html

curl failed to verify the legitimacy of the server and therefore could not
establish a secure connection to it. To learn more about this situation and
how to fix it, please visit the web page mentioned above.
command terminated with exit code 60

# oc get proxy cluster -o yaml
apiVersion: config.openshift.io/v1
kind: Proxy
metadata:
  creationTimestamp: "2020-10-12T02:59:21Z"
  generation: 1
  managedFields:
  - apiVersion: config.openshift.io/v1
    fieldsType: FieldsV1
    fieldsV1:
      f:spec:
        .: {}
        f:httpProxy: {}
        f:httpsProxy: {}
        f:noProxy: {}
        f:trustedCA:
          .: {}
          f:name: {}
      f:status:
        .: {}
        f:httpProxy: {}
        f:httpsProxy: {}
        f:noProxy: {}
    manager: cluster-bootstrap
    operation: Update
    time: "2020-10-12T02:59:22Z"
  name: cluster
  resourceVersion: "513"
  selfLink: /apis/config.openshift.io/v1/proxies/cluster
  uid: 4525c2e1-1350-4aec-a5d4-8f31ac5e9af8
spec:
  httpProxy: http://proxy-user1:JYgU8qRZV4DY4PXJbxJK@10.0.77.163:3128
  httpsProxy: https://proxy-user1:JYgU8qRZV4DY4PXJbxJK@10.0.77.163:3130
  noProxy: test.no-proxy.com
  trustedCA:
    name: user-ca-bundle
status:
  httpProxy: http://proxy-user1:JYgU8qRZV4DY4PXJbxJK@10.0.77.163:3128
  httpsProxy: https://proxy-user1:JYgU8qRZV4DY4PXJbxJK@10.0.77.163:3130
  noProxy: .cluster.local,.svc,10.0.0.0/16,10.128.0.0/14,127.0.0.1,172.30.0.0/16,api-int.wsun1012.qe.devcluster.openshift.com,etcd-0.wsun1012.qe.devcluster.openshift.com,etcd-1.wsun1012.qe.devcluster.openshift.com,etcd-2.wsun1012.qe.devcluster.openshift.com,localhost,test.no-proxy.com

2. View the metric in the console under Monitoring -> Metrics -> console_url; a correct result record is returned:
console_url	https	10.128.0.16:8443	metrics	openshift-console-operator	console-operator-757f85b94b-dj2s4	openshift-monitoring/k8s	metrics	https://console-openshift-console.apps.wsun1012.qe.devcluster.openshift.com	1

3. Run `oc exec` to query metrics via the Prometheus endpoint; results are returned:
# oc project openshift-monitoring
Now using project "openshift-monitoring" on server "https://api.wsun1012.qe.devcluster.openshift.com:6443".

# oc exec prometheus-k8s-0  -c prometheus -- curl -k -H "Authorization: Bearer $token" https://prometheus-k8s.openshift-monitoring.svc:9091/api/v1/query\?query\=ALERTS\%7Balertname\%3D\%22PodDisruptionBudgetAtLimit\%22\%7D
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100    63  100    63    0     0   1235      0 --:--:-- --:--:-- --:--:--  1260{"status":"success","data":{"resultType":"vector","result":[]}}



Actual results:
1. The curl command reports an error:
curl: (60) SSL certificate problem: self signed certificate in certificate chain 

Expected results:
1. Correct metrics data should be returned.

Additional info:
1. For comparison, on a normal cluster (without https_proxy), the same query command returns successfully:

# oc project
Using project "openshift-console-operator" on server "https://api.qe-ui46-1013.qe.devcluster.openshift.com:6443".

# oc get pods -o wide
NAME                                READY   STATUS    RESTARTS   AGE    IP            NODE                                         NOMINATED NODE   READINESS GATES
console-operator-6d7f7d464d-d48hd   1/1     Running   0          148m   10.129.0.15   ip-10-0-178-119.us-east-2.compute.internal   <none>           <none>

# export token=$(oc serviceaccounts get-token prometheus-k8s -n openshift-monitoring)

# oc exec console-operator-6d7f7d464d-d48hd -- curl -k -H "Authorization: Bearer $token" https://10.129.0.15:8443/metrics | grep console_url
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0# HELP console_url [ALPHA] URL of the console exposed on the cluster
# TYPE console_url gauge
console_url{url="https://console-openshift-console.apps.qe-ui46-1013.qe.devcluster.openshift.com"} 1

# oc get proxy cluster -o yaml
apiVersion: config.openshift.io/v1
kind: Proxy
metadata:
  creationTimestamp: "2020-10-13T00:35:24Z"
  generation: 1
  managedFields:
  - apiVersion: config.openshift.io/v1
    fieldsType: FieldsV1
    fieldsV1:
      f:spec:
        .: {}
        f:trustedCA:
          .: {}
          f:name: {}
      f:status: {}
    manager: cluster-bootstrap
    operation: Update
    time: "2020-10-13T00:35:24Z"
  name: cluster
  resourceVersion: "526"
  selfLink: /apis/config.openshift.io/v1/proxies/cluster
  uid: bbfe7319-3938-44e6-9654-bd7c0bf426a7
spec:
  trustedCA:
    name: ""
status: {}

Comment 4 Jakub Hadvig 2020-12-23 16:11:57 UTC
We did not have time to fix this issue this sprint. Will reevaluate and try to fix in next sprint.

Comment 5 Samuel Padgett 2021-01-21 17:29:32 UTC
Hi, Ya Dan. What is being tested here?

The metrics endpoint is using a service serving certificate, so I would expect the curl command to fail unless you pass the correct CA bundle. This is expected and doesn't indicate that there's a problem. This endpoint is unrelated to the cluster proxy settings (although the proxy settings could change the default CA bundle when exec'ing into the pod). Do you see any alert or error indicating that the metrics aren't being scraped?
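
A minimal sketch of passing the CA bundle explicitly, for illustration only. It assumes the service CA bundle is mounted in the pod at /var/run/secrets/kubernetes.io/serviceaccount/service-ca.crt and that the metrics service is reachable as metrics.openshift-console-operator.svc on port 443. The service name is inferred from the console_url labels in the description; the port, the mount path, and the helper name `query_operator_metrics` are assumptions that this bug does not confirm.

```shell
# Hypothetical helper: query the console operator metrics while verifying the
# serving certificate against the service CA instead of relying on -k.
# Assumed, not confirmed by this bug: the CA bundle path and the service port.
SERVICE_CA=/var/run/secrets/kubernetes.io/serviceaccount/service-ca.crt
METRICS_URL=https://metrics.openshift-console-operator.svc:443/metrics

query_operator_metrics() {
  pod="$1"    # name of the console-operator pod to exec into
  token="$2"  # bearer token, e.g. the prometheus-k8s token from the description
  oc exec -n openshift-console-operator "$pod" -- \
    curl -sS --cacert "$SERVICE_CA" \
    -H "Authorization: Bearer $token" "$METRICS_URL"
}
```

With the names from the description this could be called as `query_operator_metrics console-operator-757f85b94b-dj2s4 "$token" | grep console_url`. The sketch targets the service hostname rather than the pod IP because a service serving certificate is issued for the service DNS name, so hostname verification against the IP would be expected to fail.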

Based on the information in the description, I'm closing as NOTABUG. I believe this is expected. If you see any bad effects other than the curl command failing, let us know.
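
One way to check whether the metrics are being scraped is to ask Prometheus for the standard `up` series of the operator's scrape target, reusing the query endpoint from step 3 of the description. This is a sketch under assumptions: the helper name `check_operator_scrape` is hypothetical, and the label selector assumes the target carries the namespace label shown in the console_url record.

```shell
# Hypothetical helper: ask Prometheus whether the console operator target is up.
# The endpoint and pod/container names are taken from step 3 of the description;
# the label selector is an assumption based on the console_url record.
PROM_QUERY_URL=https://prometheus-k8s.openshift-monitoring.svc:9091/api/v1/query

check_operator_scrape() {
  token="$1"  # bearer token, e.g. the prometheus-k8s token from the description
  oc exec -n openshift-monitoring prometheus-k8s-0 -c prometheus -- \
    curl -sS -k -G -H "Authorization: Bearer $token" \
    --data-urlencode 'query=up{namespace="openshift-console-operator"}' \
    "$PROM_QUERY_URL"
}
```

Calling `check_operator_scrape "$token"` returns a JSON vector; a sample value of 1 for the operator target means the last scrape succeeded, while 0 or an empty result would point at a scraping problem. Using `-G` with `--data-urlencode` lets curl encode the PromQL expression, which avoids the hand-escaped query string used in step 3.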