Bug 1546033 - Prometheus ansible playbook install results in oauthproxy errors and 3 out of 5 kubernetes-service-endpoints DOWN
Summary: Prometheus ansible playbook install results in oauthproxy errors and 3 out of ...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 3.7.1
Hardware: Unspecified
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 3.10.0
Assignee: Paul Gier
QA Contact: Junqi Zhao
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2018-02-16 07:07 UTC by Diane Feddema
Modified: 2018-10-16 20:09 UTC (History)

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-07-30 19:09:51 UTC
Target Upstream Version:


Attachments
inventory file for ansible playbook install (4.96 KB, text/plain)
2018-02-16 07:07 UTC, Diane Feddema
prometheus pod logs (27.76 KB, text/plain)
2018-03-30 09:16 UTC, Junqi Zhao


Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2018:1816 None None None 2018-07-30 19:10:18 UTC
Red Hat Bugzilla 1638658 'unspecified' 'CLOSED' '[3.9] endpoint for alertmamager and alert-buffer gave HTTP response to HTTPS client' 2019-11-23 10:21:43 UTC
Red Hat Bugzilla 1639082 'unspecified' 'CLOSED' '[3.7] endpoint for alertmamager and alert-buffer gave HTTP response to HTTPS client' 2019-11-23 10:21:43 UTC

Internal Links: 1638658 1639082

Description Diane Feddema 2018-02-16 07:07:26 UTC
Created attachment 1396843
inventory file for ansible playbook install

Description of problem:
Running the Ansible playbook for Prometheus with the settings specified in "OpenShift Container Platform 3.7 Installation and Configuration" (URL https://access.redhat.com/documentation/en-us/openshift_container_platform/3.7/pdf/installation_and_configuration/OpenShift_Container_Platform-3.7-Installation_and_Configuration-en-US.pdf )

results in a Prometheus installation with
3 out of 5 kubernetes-service-endpoints DOWN
(the alerts-proxy, alert-buffer, and alertmanager endpoints are DOWN).

These endpoints are working (see attached image for more details):
kubernetes-apiservers(1/1 up)
kubernetes-cadvisor(2/2 up)
kubernetes-controllers(1/1 up)
kubernetes-nodes(2/2 up)
kubernetes-service-endpoints (2/5 up) 
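The per-job counts above can also be collected without the web UI. The following is a minimal sketch, assuming the Prometheus version in use exposes the `/api/v1/targets` HTTP API and that its JSON response has already been fetched and parsed. The function names and the sample data are hypothetical and are not taken from this cluster.

```python
from collections import defaultdict


def summarize_targets(targets_response):
    """Return {job: (up, total)} from a parsed /api/v1/targets response."""
    counts = defaultdict(lambda: [0, 0])
    for target in targets_response["data"]["activeTargets"]:
        job = target["labels"].get("job", "<unknown>")
        counts[job][1] += 1
        if target.get("health") == "up":
            counts[job][0] += 1
    return {job: (up, total) for job, (up, total) in counts.items()}


def down_targets(targets_response):
    """Return (job, scrapeUrl, lastError) for every target that is not up."""
    return [
        (t["labels"].get("job"), t.get("scrapeUrl"), t.get("lastError"))
        for t in targets_response["data"]["activeTargets"]
        if t.get("health") != "up"
    ]


if __name__ == "__main__":
    # Hypothetical sample shaped like the API response; not real cluster data.
    sample = {
        "status": "success",
        "data": {
            "activeTargets": [
                {"labels": {"job": "kubernetes-nodes"}, "health": "up",
                 "scrapeUrl": "https://node-a.example:10250/metrics", "lastError": ""},
                {"labels": {"job": "kubernetes-service-endpoints"}, "health": "down",
                 "scrapeUrl": "https://10.0.0.5:9443/metrics", "lastError": "example error"},
            ]
        },
    }
    for job, (up, total) in sorted(summarize_targets(sample).items()):
        print(f"{job} ({up}/{total} up)")
    for job, url, err in down_targets(sample):
        print(f"DOWN {job} {url}: {err}")
```

The `lastError` field of each DOWN target is usually the quickest pointer to why a scrape fails.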


Version-Release number of the following components:
rpm -q openshift-ansible
openshift-ansible-3.7.23-1.git.0.bc406aa.el7.noarch

rpm -q ansible
ansible-2.4.2.0-1.el7.noarch

ansible --version
ansible 2.4.2.0
  config file = /etc/ansible/ansible.cfg
  configured module search path = [u'/root/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/lib/python2.7/site-packages/ansible
  executable location = /usr/bin/ansible
  python version = 2.7.5 (default, May  3 2017, 07:55:04) [GCC 4.8.5 20150623 (Red Hat 4.8.5-14)]

How reproducible:
100%
Steps to Reproduce:
1. ansible-playbook -i /root/scripts/inventory.et9 /usr/share/ansible/openshift-ansible/playbooks/byo/config.yml

note: see attached inventory file, /root/scripts/inventory.et9

2. Log in to the OpenShift web UI as a user with the cluster-admin role
(e.g. oc adm policy add-cluster-role-to-user cluster-admin <user-name>)
3. Look at the logs for the prometheus pod

Actual results:
Please include the entire output from the last TASK line through the end of output if an error is generated

prometheus pod Logs show:
2018/02/16 05:50:00 provider.go:476: Performing OAuth discovery against https://172.30.0.1/.well-known/oauth-authorization-server
2018/02/16 05:50:00 provider.go:522: 200 GET https://172.30.0.1/.well-known/oauth-authorization-server {
  "issuer": "https://et9.et.eng.bos.redhat.com:8443",
  "authorization_endpoint": "https://et9.com:8443/oauth/authorize",
  "token_endpoint": "https://et9.et.eng.bos.redhat.com:8443/oauth/token",
  "scopes_supported": [
    "user:check-access",
    "user:full",
    "user:info",
    "user:list-projects",
    "user:list-scoped-projects"
  ],
  "response_types_supported": [
    "code",
    "token"
  ],
  "grant_types_supported": [
    "authorization_code",
    "implicit"
  ],
  "code_challenge_methods_supported": [
    "plain",
    "S256"
  ]
}
2018/02/16 05:50:00 provider.go:265: Delegation of authentication and authorization to OpenShift is enabled for bearer tokens and client certificates.
2018/02/16 05:50:00 oauthproxy.go:161: mapping path "/" => upstream "http://localhost:9090"
2018/02/16 05:50:00 oauthproxy.go:184: compiled skip-auth-regex => "^/metrics"
2018/02/16 05:50:00 oauthproxy.go:190: OAuthProxy configured for  Client ID: system:serviceaccount:openshift-metrics:prometheus
2018/02/16 05:50:00 oauthproxy.go:200: Cookie settings: name:_oauth_proxy secure(https):true httponly:true expiry:168h0m0s domain:<default> refresh:disabled
2018/02/16 05:50:00 http.go:96: HTTPS: listening on [::]:8443
2018/02/16 05:53:36 oauthproxy.go:657: 10.129.0.1:34596 Cookie Signature not valid
2018/02/16 05:53:36 oauthproxy.go:657: 10.129.0.1:34596 Cookie Signature not valid
Expected results:

Additional info:
Please attach logs from ansible-playbook with the -vvv flag

Comment 1 Paul Gier 2018-03-02 03:48:24 UTC
The issue with two of the service endpoints being down seems to be that prometheus is automatically discovering the containerPorts defined in the stateful set, and it probably shouldn't be trying to scrape those since they are also discovered via the exposed services.

The alert-buffer scrape is failing because its /metrics path requires authentication. It should probably be set up to skip auth for that path, similar to what the prom-proxy and alertmanager-proxy are doing.
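For illustration, the log in the description shows the proxy compiling skip-auth-regex "^/metrics". The following is a simplified model of how such a skip-auth list decides which request paths bypass authentication. It is a sketch under assumptions, not the oauth-proxy implementation; `requires_auth` is a hypothetical helper name, and the unanchored search is an assumption about how the patterns are applied.

```python
import re


def requires_auth(path, skip_auth_patterns):
    """Return True unless the path matches one of the skip-auth regexes.

    Assumption: patterns are applied as unanchored searches, so a pattern
    needs its own "^" to anchor at the start of the path.
    """
    return not any(re.search(pattern, path) for pattern in skip_auth_patterns)


if __name__ == "__main__":
    patterns = ["^/metrics"]
    for path in ("/metrics", "/graph", "/api/v1/targets"):
        state = "auth required" if requires_auth(path, patterns) else "auth skipped"
        print(f"{path}: {state}")
```

With an empty pattern list every path requires authentication, which matches the failure mode described for the alert-buffer scrape: an unauthenticated scrape of /metrics is rejected.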

PR for openshift-ansible: https://github.com/openshift/openshift-ansible/pull/7356
PR for origin: https://github.com/openshift/origin/pull/18802

Would you mind opening a separate issue for the oauth proxy errors?  That may need to be assigned to the security team.

Comment 2 Paul Gier 2018-03-21 20:05:40 UTC
The origin and openshift-ansible PRs have been merged to master, so the fix will be in 3.10.

Comment 4 Junqi Zhao 2018-03-30 09:16:19 UTC
Created attachment 1415043
prometheus pod logs

Comment 6 errata-xmlrpc 2018-07-30 19:09:51 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:1816

