Bug 1563888 - Installing prometheus should update iptable rules for node-exporter
Status: CLOSED ERRATA
Product: OpenShift Container Platform
Classification: Red Hat
Component: Monitoring
Version: 3.9.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 3.11.0
Assigned To: Simon Pasquier
QA Contact: Junqi Zhao
Depends On:
Blocks: 1600562 1603144
Reported: 2018-04-04 21:09 EDT by Gerald Nunn
Modified: 2018-10-11 03:20 EDT
CC List: 11 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: port 9100 is blocked on all nodes by default. Consequence: Prometheus can't scrape the node_exporter service, which runs on the other nodes and listens on port 9100. Fix: the firewall configuration is modified to allow incoming TCP traffic for the 9000-10000 port range. Result: Prometheus can scrape the node_exporter services.
Story Points: ---
Clone Of:
Clones: 1600562 1603144
Environment:
Last Closed: 2018-10-11 03:19:09 EDT
Type: Bug


Attachments
prometheus-node-exporter target (132.80 KB, image/png)
2018-08-28 05:12 EDT, Junqi Zhao


External Trackers
Tracker ID (Last Updated)
Github openshift/openshift-ansible/issues/7999 (2018-06-13 09:11 EDT)
Github openshift/openshift-ansible/pull/9072 (2018-07-12 09:30 EDT)
Red Hat Product Errata RHBA-2018:2652 (2018-10-11 03:20 EDT)

Description Gerald Nunn 2018-04-04 21:09:51 EDT
Description of problem:

In OCP 3.9, when you install Prometheus, it sets up the node-exporter as a daemonset listening on hostport 9100. The problem is that the iptables rules are not configured to allow port 9100, so scraping fails with "No route to host". For example, this is what I see with debug logging enabled in Prometheus:

level=debug ts=2018-04-05T00:49:22.480744133Z caller=scrape.go:676 component="scrape manager" scrape_pool=kubernetes-nodes-exporter target=http://10.0.1.76:9100/metrics  msg="Scrape failed" err="Get http://10.0.1.76:9100/metrics:  dial tcp 10.0.1.76:9100: getsockopt: no route to host"
level=debug ts=2018-04-05T00:49:27.506758234Z caller=scrape.go:676 component="scrape manager" scrape_pool=kubernetes-nodes-exporter target=http://10.0.1.65:9100/metrics  msg="Scrape failed" err="Get http://10.0.1.65:9100/metrics:  dial tcp 10.0.1.65:9100: getsockopt: no route to host"
...

Using the update_firewall.yml playbook from https://github.com/wkulhanek/openshift-prometheus/tree/master/node-exporter fixes the problem.
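
At its core, that playbook just opens port 9100 on every node. The net effect is an iptables rule of roughly this shape (a sketch assuming the OS_FIREWALL_ALLOW chain that OpenShift's node firewall manages; the actual playbook may apply it differently):

-A OS_FIREWALL_ALLOW -p tcp -m state --state NEW -m tcp --dport 9100 -j ACCEPT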

Version-Release number of selected component (if applicable):


How reproducible:

Always in AWS

Steps to Reproduce:
1. Install Prometheus using the advanced installer with openshift_hosted_prometheus_deploy=true in the inventory (see the snippet after this list)
2.
3.
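
For reference, step 1 amounts to an inventory entry like this (the [OSEv3:vars] group is the usual openshift-ansible convention; the rest of the inventory is omitted):

[OSEv3:vars]
openshift_hosted_prometheus_deploy=true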

Actual results:

Scraping fails due to the missing iptables rule for port 9100

Expected results:

The installer configures an iptables rule for port 9100, and scraping works

Additional info:

I see other scrape errors for port 1936; not sure if they're related:

level=debug ts=2018-04-05T01:04:27.626451171Z caller=scrape.go:676 component="scrape manager" scrape_pool=kubernetes-service-endpoints target=http://10.0.1.152:1936/metrics  msg="Scrape failed" err="server returned HTTP status 403 Forbidden"
level=debug ts=2018-04-05T01:05:27.626622466Z caller=scrape.go:676 component="scrape manager" scrape_pool=kubernetes-service-endpoints target=http://10.0.1.152:1936/metrics  msg="Scrape failed" err="server returned HTTP status 403 Forbidden"
...
Comment 1 Gerald Nunn 2018-04-04 21:14:47 EDT
I'm a Prometheus newbie; I see now that there is a status page for the targets, and that port 1936 belongs to the haproxy router, which is already being discussed on sme-openshift.
Comment 2 Josep 'Pep' Turro Mauri 2018-04-09 11:27:18 EDT
(In reply to Gerald Nunn from comment #0)
> How reproducible:
> 
> Always in AWS

Just to clarify: the problem isn't specific to AWS, right? 

It's true that different infrastructure providers will require some specific network settings (see e.g. https://github.com/openshift/openshift-ansible/pull/6920 ) but the node exporter port will still need to be opened at the node level.

> Expected results:
> 
> Installer configures iptable rule for 9100, scraping works

Submitted https://github.com/openshift/openshift-ansible/pull/7860 with a suggested fix.

> Additional info:
> level=debug ts=2018-04-05T01:05:27.626622466Z caller=scrape.go:676
> component="scrape manager" scrape_pool=kubernetes-service-endpoints
> target=http://10.0.1.152:1936/metrics  msg="Scrape failed" err="server
> returned HTTP status 403 Forbidden"

As you mentioned, the auth issue with the router metrics is unrelated to firewall ports; filed bug 1565095 to track that.
Comment 3 Gerald Nunn 2018-04-12 08:42:03 EDT
I do not believe it is specific to AWS since the issue is with the node firewall/iptable and not AWS security groups.
Comment 8 Junqi Zhao 2018-06-10 19:56:14 EDT
*** Bug 1589023 has been marked as a duplicate of this bug. ***
Comment 9 Scott Dodson 2018-06-13 11:16:38 EDT
There's a proposed fix in this PR https://github.com/openshift/openshift-ansible/pull/7860
Comment 10 Simon Pasquier 2018-07-12 09:30:16 EDT
https://github.com/openshift/openshift-ansible/pull/9072 has been merged; it opens up the 9000-10000 port range (which includes port 9100 for node_exporter).
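
Once a build containing that change is deployed, the open range should be visible on each node with a standard iptables listing (the OS_FIREWALL_ALLOW chain is the one the installer manages; command shown as a sketch):

# iptables -L OS_FIREWALL_ALLOW -n | grep 9000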
Comment 12 Junqi Zhao 2018-08-23 04:23:56 EDT
Depends on Bug 1608288; the node-exporter port has changed to 9101.
Comment 13 Junqi Zhao 2018-08-26 20:45:05 EDT
Depends on Bug 1608288; the node-exporter port has changed to 9102.
Comment 14 Junqi Zhao 2018-08-28 05:12:27 EDT
The prometheus-node-exporter target can be accessed; the 9000:10000 port range is added in iptables:
-A OS_FIREWALL_ALLOW -p tcp -m state --state NEW -m tcp --dport 9000:10000 -j ACCEPT
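
For a quick manual check, any port in the opened range can be probed from the Prometheus host, e.g. the node_exporter port mentioned in comment 13 (<node-ip> is a placeholder):

$ curl -s http://<node-ip>:9102/metrics | head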


prometheus-node-exporter-v3.11.0-0.24.0.0

openshift-ansible version
openshift-ansible-3.11.0-0.24.0.git.0.3cd1597None.noarch.rpm
openshift-ansible-docs-3.11.0-0.24.0.git.0.3cd1597None.noarch.rpm
openshift-ansible-playbooks-3.11.0-0.24.0.git.0.3cd1597None.noarch.rpm
openshift-ansible-roles-3.11.0-0.24.0.git.0.3cd1597None.noarch.rpm
Comment 15 Junqi Zhao 2018-08-28 05:12 EDT
Created attachment 1479197 [details]
prometheus-node-exporter target
Comment 17 errata-xmlrpc 2018-10-11 03:19:09 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:2652
