Bug 1659441
Summary: | Port range from 9000-10000 need to be opened from iptables explicitly after upgrading from 3.10 to 3.11 | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Saurabh Sadhale <ssadhale> |
Component: | Monitoring | Assignee: | Simon Pasquier <spasquie> |
Status: | CLOSED ERRATA | QA Contact: | Junqi Zhao <juzhao> |
Severity: | high | Docs Contact: | |
Priority: | high | ||
Version: | 3.11.0 | CC: | adeshpan, aos-bugs, bbennett, cdc, ddelcian, fbranczy, gferrazs, jokerman, juzhao, lmeyer, minden, mmccomas, Nikolaus.Eppinger, pkanthal, spasquie, surbania, travi, vrutkovs, wmeng |
Target Milestone: | --- | Keywords: | Reopened |
Target Release: | 3.11.z | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: |
Cause: the 9000-10000 port range isn't opened after upgrading from 3.10 to 3.11.
Consequence: Prometheus can't scrape the infrastructure services such as node_exporter.
Fix: the upgrade playbook is modified to apply the updated firewall rules.
Result: Prometheus can scrape the infrastructure services such as node_exporter.
|
Last Closed: | 2019-02-20 14:11:02 UTC | Type: | Bug |
Description
Saurabh Sadhale
2018-12-14 11:37:47 UTC
Description of problem:
After upgrading from 3.10 to 3.11, port 9100 was not opened in iptables for cluster monitoring and had to be opened explicitly to make monitoring work.

Version-Release number of selected component (if applicable):
rpm -qa | grep openshift-ansible
openshift-ansible-playbooks-3.11.43-1.git.0.fa69a02.el7.noarch
openshift-ansible-docs-3.11.43-1.git.0.fa69a02.el7.noarch
openshift-ansible-3.11.43-1.git.0.fa69a02.el7.noarch
openshift-ansible-roles-3.11.43-1.git.0.fa69a02.el7.noarch

How reproducible:

Steps to Reproduce:
1. Upgrade from 3.10 to 3.11

Actual results:
Port 9100 was not opened.

Expected results:
Cluster monitoring should work without the need to open the port explicitly.

Additional info:

This is by design. We do not want to manipulate the node's iptables rules to allow this. The cluster admin has to choose what traffic to allow into the cluster and make the appropriate changes.

The node-exporter pods don't work without this port, and Prometheus needs this to be functional. I found BZ 1563888, but in this case the ports were not opened by the update.

Moving to the install team, since if the port is supposed to be opened when Prometheus is installed, then that would be the team to do it.

(In reply to Ben Bennett from comment #8)
> Moving to the install team since if the port is supposed to be opened when
> Prometheus is installed, then that would be the team to do it.

Shouldn't the Prometheus operator take care of that?

We updated from 3.7 to 3.11.

PR to the openshift-ansible repository to run the firewall task during upgrade: https://github.com/openshift/openshift-ansible/pull/11186

The upstream PR has been merged. @Junqi Zhao, can it be moved to ON_QA?

(In reply to Simon Pasquier from comment #32)
> The upstream PR has been merged. @Junqi Zhao, can it be moved to ON_QA?
ON_QA now; I will verify it today. However, we did not hit the original error in our environment, and the 9000-10000 port range was already open before the fix. For this bug, all I can do is upgrade OCP from 3.10 to 3.11, install cluster monitoring, and then check whether the 9000-10000 port range is open. Do you have any suggestions on how to verify it?

To test locally, I manually removed the iptables rule that opened ports 9000-10000, then applied the upgrade playbook and checked that the rule was added back.

Tested with the ose-ansible:3.11.82-5 image, which packages openshift-ansible-3.11.82-3.git.0.9718d0a.el7. The issue is fixed: ports 9000-10000 are opened in iptables.

# docker run -u root --rm -it --entrypoint=/bin/rpm brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/ose-ansible:v3.11.82-5 -qa | grep ansible
shell-init: error retrieving current directory: getcwd: cannot access parent directories: No such file or directory
ansible-2.6.13-1.el7ae.noarch
openshift-ansible-roles-3.11.82-3.git.0.9718d0a.el7.noarch
openshift-ansible-3.11.82-3.git.0.9718d0a.el7.noarch
openshift-ansible-playbooks-3.11.82-3.git.0.9718d0a.el7.noarch
openshift-ansible-docs-3.11.82-3.git.0.9718d0a.el7.noarch

Steps:
1. Install a 3.10 environment.
2. Manually remove the iptables rule.
3. Upgrade OCP from 3.10 to 3.11.
4. Check the port.

# iptables-save | grep 9000:10000
-A OS_FIREWALL_ALLOW -p tcp -m state --state NEW -m tcp --dport 9000:10000 -j ACCEPT

# netstat -nalp | grep '9101'
tcp 0 0 127.0.0.1:9101 0.0.0.0:* LISTEN 15582/node_exporter
tcp 0 0 127.0.0.1:52168 127.0.0.1:9101 ESTABLISHED 15703/kube-rbac-pro
tcp 0 0 127.0.0.1:9101 127.0.0.1:52168 ESTABLISHED 15582/node_exporter

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2019:0326
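
For anyone who wants to script the `iptables-save | grep 9000:10000` check from the verification steps instead of reading the output by eye, here is a minimal Python sketch. It is not part of openshift-ansible or the fix: the function name `port_is_accepted` and the constant `SAMPLE_RULES` are hypothetical, and the sample rule is copied from the verification output in this report. The check is deliberately simplified. It only understands a plain `--dport N` or `--dport N:M` on a TCP ACCEPT rule in the named chain, and it ignores `multiport` matches, rule ordering, and whether the chain is actually reached from INPUT.

```python
import re

# Matches "--dport 9100" or "--dport 9000:10000" in an iptables-save rule line.
DPORT_RE = re.compile(r"--dport\s+(\d+)(?::(\d+))?")

# Rule line as shown in the verification output of this bug report.
SAMPLE_RULES = (
    "-A OS_FIREWALL_ALLOW -p tcp -m state --state NEW "
    "-m tcp --dport 9000:10000 -j ACCEPT\n"
)


def port_is_accepted(iptables_save_output, port, chain="OS_FIREWALL_ALLOW"):
    """Return True if a TCP ACCEPT rule in `chain` has a --dport covering `port`.

    Simplified on purpose: no multiport support, no rule-order evaluation.
    """
    for line in iptables_save_output.splitlines():
        line = line.strip()
        # Only look at rules appended to the chain of interest.
        if not line.startswith("-A " + chain + " "):
            continue
        if "-p tcp" not in line or not line.endswith("-j ACCEPT"):
            continue
        match = DPORT_RE.search(line)
        if match is None:
            continue
        low = int(match.group(1))
        # A single port has no upper bound in the rule, so reuse the lower one.
        high = int(match.group(2) or match.group(1))
        if low <= port <= high:
            return True
    return False
```

To use it on a node, capture the output of `iptables-save` (run as root) into a string and call `port_is_accepted(output, 9100)`. A `False` result after the upgrade would suggest the firewall task did not re-apply the rule, subject to the simplifications listed above.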