1552235 – Prometheus is unable to scrape hosted router components due to iptables rules from openshift-ansible

Bug 1552235 - Prometheus is unable to scrape hosted router components due to iptables rules from openshift-ansible

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Monitoring
Sub Component:
Version:	3.7.0
Hardware:	All
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Target Release:	3.10.z
Assignee:	Paul Gier
QA Contact:	Junqi Zhao
Docs Contact:
URL:
Whiteboard:
Duplicates (1):	1589023
Depends On:
Blocks:

Reported:	2018-03-06 18:50 UTC by David H
Modified:	2019-03-06 08:42 UTC
CC List:	17 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:	Cause: The default firewall settings block the router stats/metrics port. Consequence: This prevents prometheus from collecting the metrics from the openshift router. Fix: Open the firewall to allow connections to the router stats port. Result: Prometheus can now collect metrics from the router.
Clone Of:
Environment:
Last Closed:	2019-02-20 10:11:10 UTC
Target Upstream Version:
Embargoed:

Attachments
router targets are up (84.79 KB, image/png) 2019-02-12 05:50 UTC, Junqi Zhao

Links
System	ID	Priority	Status	Summary	Last Updated
Github	openshift openshift-ansible pull 6636	None	closed	Expose router stats port when prometheus is installed	2020-12-02 08:11:30 UTC
Red Hat Bugzilla	1625510	low	CLOSED	[3.11]router targets is down in prometheus targets UI	2022-08-04 22:20:47 UTC
Red Hat Product Errata	RHBA-2019:0328	None	None	None	2019-02-20 10:11:17 UTC

Internal Links: 1625510

Description David H 2018-03-06 18:50:45 UTC

Description of problem:

When deploying Prometheus (the same issue seems to be present in master), https://prometheus-openshift-metrics.<cluster fqdn>/targets shows "getsockopt: no route to host" when trying to scrape the /metrics endpoint on the OpenShift hosted routers.

Version-Release number of selected component (if applicable):
Seen in release-3.7, with no evident fundamental changes in master that would alter this behavior. The iptables rules on nodes where the hosted router is deployed are not updated to expose the router stats port (1936).

How reproducible:
Consistent

Steps to Reproduce:
1. Deploy openshift-ansible with prometheus and hosted router
2. Check prometheus target status

Actual results:

http://<node ip>:1936/metrics DOWN	instance="<node ip>:1936" ... Get http://<node ip>:1936/metrics: dial tcp <node ip>:1936: getsockopt: no route to host
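
Note: the "no route to host" error above can be told apart from a plain closed port by probing the router stats port directly from the node that runs Prometheus. The following is a minimal sketch and not part of the original report. `probe` and `classify_connect_errno` are hypothetical helper names, and the sketch assumes that the node firewall rejects the connection with an ICMP host-prohibited message, which the client sees as EHOSTUNREACH.

```python
import errno
import socket


def classify_connect_errno(err):
    """Map a failed connect() errno to a short diagnosis string."""
    if err == errno.EHOSTUNREACH:
        # "no route to host": assumed here to be a firewall reject
        # (ICMP host-prohibited), as seen in the scrape error above.
        return "rejected (no route to host)"
    if err == errno.ECONNREFUSED:
        # The port is reachable but nothing is listening on it.
        return "refused (nothing listening)"
    return "error: " + errno.errorcode.get(err, str(err))


def probe(host, port=1936, timeout=3.0):
    """Attempt one TCP connection to host:port and classify the outcome."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.connect((host, port))
            return "open"
        except socket.timeout:
            # No reply at all usually means packets are silently dropped.
            return "timed out (packets likely dropped)"
        except OSError as exc:
            return classify_connect_errno(exc.errno)
```

Calling `probe("<node ip>")` against each node that hosts a router would show which nodes reject the connection and which accept it.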

Expected results:

Expect that all "kubernetes-service-endpoints" scrape targets are Green
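
Note: the expected state can also be checked without the web UI. The sketch below filters a targets listing for unhealthy entries of one job. It assumes the JSON shape returned by the Prometheus HTTP API endpoint `/api/v1/targets` (`data.activeTargets` entries with `labels`, `scrapeUrl`, `health` and `lastError`), which may not be available in every Prometheus version. `down_targets` is a hypothetical helper, and `SAMPLE_PAYLOAD` is hand-written illustrative data using a documentation-range IP address, not output captured from a cluster.

```python
def down_targets(payload, job="kubernetes-service-endpoints"):
    """Return (scrapeUrl, lastError) pairs for targets of `job` that are not up."""
    targets = payload.get("data", {}).get("activeTargets", [])
    return [
        (t.get("scrapeUrl", ""), t.get("lastError", ""))
        for t in targets
        if t.get("labels", {}).get("job") == job and t.get("health") != "up"
    ]


# Illustrative payload only (assumed shape, made-up addresses).
SAMPLE_PAYLOAD = {
    "status": "success",
    "data": {
        "activeTargets": [
            {
                "labels": {"job": "kubernetes-service-endpoints",
                           "instance": "192.0.2.10:1936"},
                "scrapeUrl": "http://192.0.2.10:1936/metrics",
                "health": "down",
                "lastError": "dial tcp 192.0.2.10:1936: getsockopt: no route to host",
            },
            {
                "labels": {"job": "kubernetes-service-endpoints",
                           "instance": "192.0.2.11:1936"},
                "scrapeUrl": "http://192.0.2.11:1936/metrics",
                "health": "up",
                "lastError": "",
            },
        ]
    },
}

for url, error in down_targets(SAMPLE_PAYLOAD):
    print(url, "DOWN", error)
```

An empty result for the "kubernetes-service-endpoints" job corresponds to all of its targets being green.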

Additional info:

Initial PR proposed: https://github.com/openshift/openshift-ansible/pull/6636/files
Some concerns were raised about how the firewall module interacts with the actual hosts where the hosted router runs. Feedback is needed on how the deployment team wants this executed, or on any proposed alternative.

Comment 1 Junqi Zhao 2018-06-10 23:58:23 UTC

*** Bug 1589023 has been marked as a duplicate of this bug. ***

Comment 3 Frederic Branczyk 2018-09-07 08:02:55 UTC

*** Bug 1625510 has been marked as a duplicate of this bug. ***

Comment 6 Klaas Demter 2018-11-05 08:39:44 UTC

The upstream issue was closed, but this is not correct. I still can't access all routers in a multi-infrastructure-node setup. Prometheus can only access one router; my guess is that it is the one running on the same node as Prometheus.

Comment 12 kedar 2019-01-15 05:29:02 UTC

Hello Team,

Are there any updates on this issue?

Regards,
Kedar

Comment 13 Paul Gier 2019-01-22 21:47:31 UTC

I don't see an easy way to open the router metrics port (1936) during install for only the router nodes, since the node firewall configuration takes place mostly before anything is done with the routers. Even if we could do that, I'm not sure how it would work post-install: if, for example, you wanted to move a router to a different node, you'd still need to open that port manually. So I've created a PR against 3.10 to optionally open that port on all nodes during install.
https://github.com/openshift/openshift-ansible/pull/11052

Comment 15 Junqi Zhao 2019-02-12 05:50:05 UTC

Tested with 
openshift-ansible-3.10.110-1.git.0.1e03ab3.el7.noarch.rpm
openshift-ansible-docs-3.10.110-1.git.0.1e03ab3.el7.noarch.rpm
openshift-ansible-playbooks-3.10.110-1.git.0.1e03ab3.el7.noarch.rpm
openshift-ansible-roles-3.10.110-1.git.0.1e03ab3.el7.noarch.rpm
openshift-ansible-test-3.10.110-1.git.0.1e03ab3.el7.noarch.rpm

Port 1936 is open on all nodes after install:
# iptables-save | grep 1936
-A KUBE-SEP-DFSWOTRTOQBRAYA4 -s 10.0.77.74/32 -m comment --comment "default/router:1936-tcp" -j KUBE-MARK-MASQ
-A KUBE-SEP-DFSWOTRTOQBRAYA4 -p tcp -m comment --comment "default/router:1936-tcp" -m tcp -j DNAT --to-destination 10.0.77.74:1936
-A KUBE-SERVICES ! -s 10.128.0.0/14 -d 172.30.139.20/32 -p tcp -m comment --comment "default/router:1936-tcp cluster IP" -m tcp --dport 1936 -j KUBE-MARK-MASQ
-A KUBE-SERVICES -d 172.30.139.20/32 -p tcp -m comment --comment "default/router:1936-tcp cluster IP" -m tcp --dport 1936 -j KUBE-SVC-4JCRTMMYZAAYMIJ2
-A KUBE-SVC-4JCRTMMYZAAYMIJ2 -m comment --comment "default/router:1936-tcp" -j KUBE-SEP-DFSWOTRTOQBRAYA4
-A OS_FIREWALL_ALLOW -p tcp -m state --state NEW -m tcp --dport 1936 -j ACCEPT
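
Note: of the rules listed above, only the OS_FIREWALL_ALLOW entry opens the port on the node itself; the KUBE-* entries are service NAT rules. The check can be scripted as in the following sketch, which is not part of the original verification. `accepts_port` and `_option` are hypothetical helpers, and the chain name is taken from the output above.

```python
import shlex


def _option(tokens, flag):
    """Return the value following `flag` in a tokenized rule, or None."""
    if flag in tokens:
        i = tokens.index(flag)
        if i + 1 < len(tokens):
            return tokens[i + 1]
    return None


def accepts_port(iptables_save_text, port, chain="OS_FIREWALL_ALLOW"):
    """True if `chain` has an ACCEPT rule whose --dport equals `port`."""
    for line in iptables_save_text.splitlines():
        tokens = shlex.split(line)  # handles quoted --comment values
        if tokens[:2] != ["-A", chain]:
            continue
        if _option(tokens, "--dport") == str(port) and _option(tokens, "-j") == "ACCEPT":
            return True
    return False


# Two of the rules from the output above, used as sample input.
SAMPLE = """\
-A KUBE-SERVICES -d 172.30.139.20/32 -p tcp -m comment --comment "default/router:1936-tcp cluster IP" -m tcp --dport 1936 -j KUBE-SVC-4JCRTMMYZAAYMIJ2
-A OS_FIREWALL_ALLOW -p tcp -m state --state NEW -m tcp --dport 1936 -j ACCEPT
"""

print(accepts_port(SAMPLE, 1936))
```

On a node, the text passed to `accepts_port` would be the full output of `iptables-save`, which requires root privileges to collect.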

Comment 16 Junqi Zhao 2019-02-12 05:50:40 UTC

Created attachment 1533903
router targets are up

Comment 18 errata-xmlrpc 2019-02-20 10:11:10 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0328
