1546033 – Promtheus ansible playbook install results in oauthproxy errors and 3 out of 5 kubernetes-service-endpoints DOWN

Bug 1546033 - Promtheus ansible playbook install results in oauthproxy errors and 3 out of 5 kubernetes-service-endpoints DOWN

Summary: Promtheus ansible playbook install results in oauthproxy errors and 3 out of ...

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Installer
Sub Component:
Version:	3.7.1
Hardware:	Unspecified
OS:	Linux
Priority:	unspecified
Severity:	high
Target Milestone:	---
Target Release:	3.10.0
Assignee:	Paul Gier
QA Contact:	Junqi Zhao
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+	depends on / blocked

Reported:	2018-02-16 07:07 UTC by Diane Feddema
Modified:	2018-10-16 20:09 UTC (History)
CC List:	3 users (show)
Fixed In Version:
Doc Type:	No Doc Update
Doc Text:	undefined
Clone Of:
Environment:
Last Closed:	2018-07-30 19:09:51 UTC
Target Upstream Version:
Embargoed:

Attachments	(Terms of Use)
inventory file for ansible playbook install (4.96 KB, text/plain) 2018-02-16 07:07 UTC, Diane Feddema	no flags	Details
prometheus pod logs (27.76 KB, text/plain) 2018-03-30 09:16 UTC, Junqi Zhao	no flags	Details
View All

Links
System	ID	Priority	Status	Summary	Last Updated
Red Hat Bugzilla	1638658	unspecified	CLOSED	[3.9] endpoint for alertmamager and alert-buffer gave HTTP response to HTTPS client	2021-02-22 00:41:40 UTC
Red Hat Bugzilla	1639082	unspecified	CLOSED	[3.7] endpoint for alertmamager and alert-buffer gave HTTP response to HTTPS client	2021-02-22 00:41:40 UTC
Red Hat Product Errata	RHBA-2018:1816	None	None	None	2018-07-30 19:10:18 UTC

Internal Links: 1638658 1639082

Description Diane Feddema 2018-02-16 07:07:26 UTC

Created attachment 1396843 [details]
inventory file for ansible playbook install

Description of problem:
Ansible playbook for prometheus with settings as specified in "Openshift Container Platform 3.7 Installation and Configuration" (URL https://access.redhat.com/documentation/en-us/openshift_container_platform/3.7/pdf/installation_and_configuration/OpenShift_Container_Platform-3.7-Installation_and_Configuration-en-US.pdf )

results in prometheus installation with 
3 out of 5 kubernetes-service-endpoints DOWN 
(alerts-proxy,alert-buffer & alertmanger endpoints are DOWN)

These endpoint are working (see attached image for more details)
kubernetes-apserver(1/1 up)
kubernetes-cadvisor(2/2 up)
kubernetes-controllers(1/1 up)
kubernetes-nodes(2/2 up)
kubernetes-service-endpoints (2/5 up) 


Version-Release number of the following components:
rpm -q openshift-ansible
openshift-ansible-3.7.23-1.git.0.bc406aa.el7.noarch

rpm -q ansible
ansible-2.4.2.0-1.el7.noarch

ansible --version
ansible 2.4.2.0
  config file = /etc/ansible/ansible.cfg
  configured module search path = [u'/root/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/lib/python2.7/site-packages/ansible
  executable location = /usr/bin/ansible
  python version = 2.7.5 (default, May  3 2017, 07:55:04) [GCC 4.8.5 20150623 (Red Hat 4.8.5-14)]

How reproducible:
100%
Steps to Reproduce:
1.ansible-playbook -i /root/scripts/inventory.et9 /usr/share/ansible/openshift-ansible/playbooks/byo/config.yml 

note: see attached inventory file, /root/scripts/inventory.et9

2. login to openshift webui as user with cluster-admin role
(e.g. oc policy add-role-to-user cluster-admin <user-name>)
3. Look at logs for pod prometheus 

Actual results:
Please include the entire output from the last TASK line through the end of output if an error is generated

prometheus pod Logs show:
2018/02/16 05:50:00 provider.go:476: Performing OAuth discovery against https://172.30.0.1/.well-known/oauth-authorization-server 
2018/02/16 05:50:00 provider.go:522: 200 GET https://172.30.0.1/.well-known/oauth-authorization-server  {
  "issuer": "https://et9.et.eng.bos.redhat.com:8443 ",
  "authorization_endpoint": "https://et9.com:8443/oauth/authorize ",
  "token_endpoint": "https://et9.et.eng.bos.redhat.com:8443/oauth/token ",
  "scopes_supported": [
    "user:check-access",
    "user:full",
    "user:info",
    "user:list-projects",
    "user:list-scoped-projects"
  ],
  "response_types_supported": [
    "code",
    "token"
  ],
  "grant_types_supported": [
    "authorization_code",
    "implicit"
  ],
  "code_challenge_methods_supported": [
    "plain",
    "S256"
  ]
}
2018/02/16 05:50:00 provider.go:265: Delegation of authentication and authorization to OpenShift is enabled for bearer tokens and client certificates.
2018/02/16 05:50:00 oauthproxy.go:161: mapping path "/" => upstream "http://localhost:9090 "
2018/02/16 05:50:00 oauthproxy.go:184: compiled skip-auth-regex => "^/metrics"
2018/02/16 05:50:00 oauthproxy.go:190: OAuthProxy configured for  Client ID: system:serviceaccount:openshift-metrics:prometheus
2018/02/16 05:50:00 oauthproxy.go:200: Cookie settings: name:_oauth_proxy secure(https):true httponly:true expiry:168h0m0s domain:<default> refresh:disabled
2018/02/16 05:50:00 http.go:96: HTTPS: listening on [::]:8443
2018/02/16 05:53:36 oauthproxy.go:657: 10.129.0.1:34596 Cookie Signature not valid
2018/02/16 05:53:36 oauthproxy.go:657: 10.129.0.1:34596 Cookie Signature not valid
Expected results:

Additional info:
Please attach logs from ansible-playbook with the -vvv flag

Comment 1 Paul Gier 2018-03-02 03:48:24 UTC

The issue with two of the service endpoints being down seems to be that prometheus is automatically discovering the containerPorts defined in the stateful set, and it probably shouldn't be trying to scrape those since they are also discovered via the exposed services.

The alertbuffer scrape is failing because the /metrics path requires authentication, and it should probably be set up to skip auth similar to what the prom-proxy and alertmanager-proxy are doing.

PR for openshift-ansible: https://github.com/openshift/openshift-ansible/pull/7356
PR for origin: https://github.com/openshift/origin/pull/18802

Would you mind opening a separate issue for the oauth proxy errors?  That may need to be assigned to the security team.

Comment 2 Paul Gier 2018-03-21 20:05:40 UTC

origin and openshift-ansible PRs have been merged to master, so the fix will be in 3.10

Comment 4 Junqi Zhao 2018-03-30 09:16:19 UTC

Created attachment 1415043 [details]
prometheus pod logs

Comment 6 errata-xmlrpc 2018-07-30 19:09:51 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:1816

Note You need to log in before you can comment on or make changes to this bug.