Red Hat Bugzilla – Bug 1563888
Installing prometheus should update iptable rules for node-exporter
Last modified: 2018-10-11 03:20:12 EDT
Description of problem:

In OCP 3.9, when you install prometheus it sets up the node-exporter as a daemonset listening on hostport 9100. The problem is that the iptables rules are not configured to allow 9100, and thus scraping fails with "No route to host". For example, this is what I see with debug logging on in prometheus:

level=debug ts=2018-04-05T00:49:22.480744133Z caller=scrape.go:676 component="scrape manager" scrape_pool=kubernetes-nodes-exporter target=http://10.0.1.76:9100/metrics msg="Scrape failed" err="Get http://10.0.1.76:9100/metrics: dial tcp 10.0.1.76:9100: getsockopt: no route to host"
level=debug ts=2018-04-05T00:49:27.506758234Z caller=scrape.go:676 component="scrape manager" scrape_pool=kubernetes-nodes-exporter target=http://10.0.1.65:9100/metrics msg="Scrape failed" err="Get http://10.0.1.65:9100/metrics: dial tcp 10.0.1.65:9100: getsockopt: no route to host"
...

Using the update_firewall.yml playbook from https://github.com/wkulhanek/openshift-prometheus/tree/master/node-exporter fixes the problem.

Version-Release number of selected component (if applicable):

How reproducible:
Always in AWS

Steps to Reproduce:
1. Install prometheus using advanced installer with openshift_hosted_prometheus_deploy=true in inventory
2.
3.

Actual results:
Scraping fails due to lack of an iptables rule for 9100

Expected results:
Installer configures an iptables rule for 9100, scraping works

Additional info:
I see other errors for scraping on port 1936, not sure if it's related:

level=debug ts=2018-04-05T01:04:27.626451171Z caller=scrape.go:676 component="scrape manager" scrape_pool=kubernetes-service-endpoints target=http://10.0.1.152:1936/metrics msg="Scrape failed" err="server returned HTTP status 403 Forbidden"
level=debug ts=2018-04-05T01:05:27.626622466Z caller=scrape.go:676 component="scrape manager" scrape_pool=kubernetes-service-endpoints target=http://10.0.1.152:1936/metrics msg="Scrape failed" err="server returned HTTP status 403 Forbidden"
...
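For anyone needing a manual workaround before an installer fix lands, the referenced update_firewall.yml playbook presumably adds an ACCEPT rule for TCP 9100 on each node. A minimal hand-rolled sketch of the same idea, run as root on a node, is below. It assumes the OS_FIREWALL_ALLOW chain that openshift-ansible manages; NODE_IP is a placeholder for one of your node IPs.

```shell
NODE_IP=10.0.1.76  # placeholder: substitute one of your node IPs

# Reproduce the failure: with no rule, this times out / "no route to host".
curl -s --max-time 5 "http://${NODE_IP}:9100/metrics" | head -n 3

# Add an ACCEPT rule for TCP 9100 to the chain openshift-ansible manages.
# The -C (check) guard keeps the command idempotent on reruns.
iptables -C OS_FIREWALL_ALLOW -p tcp -m state --state NEW -m tcp --dport 9100 -j ACCEPT 2>/dev/null \
  || iptables -A OS_FIREWALL_ALLOW -p tcp -m state --state NEW -m tcp --dport 9100 -j ACCEPT

# Persist across reboots on RHEL 7 nodes using iptables-services.
service iptables save
```

Note this is a stopgap sketch, not the installer's actual fix; rules added this way can be clobbered by subsequent openshift-ansible runs, which is why the fix belongs in the installer.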
I'm a Prometheus newbie; I see now there is a status page for the targets, and port 1936 is for the HAProxy router, which is already being discussed on sme-openshift.
(In reply to Gerald Nunn from comment #0)
> Description of problem:
>
> In OCP 3.9, when you install prometheus it sets up the node-exporter as a daemonset listening on hostport 9100. The problem is that the iptables rules are not configured to allow 9100 and thus scraping fails with "No route to host". For example, this is what I see with debug logging on in prometheus:
>
> level=debug ts=2018-04-05T00:49:22.480744133Z caller=scrape.go:676 component="scrape manager" scrape_pool=kubernetes-nodes-exporter target=http://10.0.1.76:9100/metrics msg="Scrape failed" err="Get http://10.0.1.76:9100/metrics: dial tcp 10.0.1.76:9100: getsockopt: no route to host"
> level=debug ts=2018-04-05T00:49:27.506758234Z caller=scrape.go:676 component="scrape manager" scrape_pool=kubernetes-nodes-exporter target=http://10.0.1.65:9100/metrics msg="Scrape failed" err="Get http://10.0.1.65:9100/metrics: dial tcp 10.0.1.65:9100: getsockopt: no route to host"
> ...
>
> Using the update_firewall.yml playbook from https://github.com/wkulhanek/openshift-prometheus/tree/master/node-exporter fixes the problem.
>
> Version-Release number of selected component (if applicable):
>
> How reproducible:
>
> Always in AWS

Just to clarify: the problem isn't specific to AWS, right? It's true that different infrastructure providers will require some specific network settings (see e.g. https://github.com/openshift/openshift-ansible/pull/6920), but the node exporter port will still need to be opened at the node level.

> Expected results:
>
> Installer configures iptables rule for 9100, scraping works

Submitted https://github.com/openshift/openshift-ansible/pull/7860 with a suggested fix.
> Additional info:
>
> level=debug ts=2018-04-05T01:05:27.626622466Z caller=scrape.go:676 component="scrape manager" scrape_pool=kubernetes-service-endpoints target=http://10.0.1.152:1936/metrics msg="Scrape failed" err="server returned HTTP status 403 Forbidden"

As you mentioned, the auth issue with the router metrics is unrelated to firewall ports; filed bug 1565095 to track that.
I do not believe it is specific to AWS, since the issue is with the node firewall/iptables rules and not AWS security groups.
*** Bug 1589023 has been marked as a duplicate of this bug. ***
There's a proposed fix in PR https://github.com/openshift/openshift-ansible/pull/7860.
https://github.com/openshift/openshift-ansible/pull/9072 has been merged, which opens up the 9000-10000 port range (including port 9100 for node_exporter).
Depends on Bug 1608288, node-exporter port has changed to 9101
Depends on Bug 1608288, node-exporter port has changed to 9102
prometheus-node-exporter target could be accessed; the 9000:10000 port range is added in iptables:

-A OS_FIREWALL_ALLOW -p tcp -m state --state NEW -m tcp --dport 9000:10000 -j ACCEPT

prometheus-node-exporter-v3.11.0-0.24.0.0

openshift-ansible version:
openshift-ansible-3.11.0-0.24.0.git.0.3cd1597None.noarch.rpm
openshift-ansible-docs-3.11.0-0.24.0.git.0.3cd1597None.noarch.rpm
openshift-ansible-playbooks-3.11.0-0.24.0.git.0.3cd1597None.noarch.rpm
openshift-ansible-roles-3.11.0-0.24.0.git.0.3cd1597None.noarch.rpm
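For reference, the verification above can be repeated by hand on any node. A hedged sketch (run as root; the node IP is just an example taken from the logs earlier in this bug, and the metric name checked depends on the node_exporter version):

```shell
# Confirm the port-range rule is present in the node firewall chain.
iptables -S OS_FIREWALL_ALLOW | grep -- '--dport 9000:10000'

# Scrape the node-exporter endpoint directly; a healthy target returns
# Prometheus text-format metrics (lines starting with "node_").
curl -s http://10.0.1.76:9100/metrics | grep -m1 '^node_'
```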
Created attachment 1479197 [details] prometheus-node-exporter target
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:2652