Bug 1847082
| Summary: | [IPI baremetal] baremetal-runtimecfg k8s health-check use hardcoded IPv4 local address (127.0.0.1) | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Yossi Boaron <yboaron> |
| Component: | Installer | Assignee: | Ben Nemec <bnemec> |
| Installer sub component: | OpenShift on Bare Metal IPI | QA Contact: | Aleksandra Malykhin <amalykhi> |
| Status: | CLOSED CURRENTRELEASE | Docs Contact: | |
| Severity: | medium | | |
| Priority: | medium | CC: | asegurap, bnemec |
| Version: | 4.5 | Keywords: | Triaged |
| Target Milestone: | --- | Flags: | bnemec: needinfo- |
| Target Release: | 4.6.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | Cause: Use of ipv4 address in ipv6 deployment. Consequence: Fix: Result: | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2020-08-13 16:49:20 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Yossi Boaron
2020-06-15 15:46:58 UTC
*** Bug 1847083 has been marked as a duplicate of this bug. ***
*** Bug 1847086 has been marked as a duplicate of this bug. ***

This was fixed by https://github.com/openshift/baremetal-runtimecfg/pull/68

Verified on 4.6.0-0.nightly-2020-07-15-065024, see the details below:
On the first terminal:
[kni@provisionhost-0-0 ~]$ ssh core.qe.lab.redhat.com hostname
master-0-0.ocp-edge-cluster-0.qe.lab.redhat.com
[kni@provisionhost-0-0 ~]$ oc get pods -n openshift-apiserver -o wide
apiserver-6bbb844d98-hd924 1/1 Running 0 19m fd01:0:0:2::f master-0-0.ocp-edge-cluster-0.qe.lab.redhat.com <none> <none>
[kni@provisionhost-0-0 ~]$ oc rsh -n openshift-apiserver apiserver-6bbb844d98-hd924
sh-4.2# while true; do curl -k https://localhost:8443/readyz; done
okokokokokokokokokokokokokokokokokokokokokokokokokokokokokokokokoko...
On the second terminal:
[kni@provisionhost-0-0 ~]$ oc debug node/master-0-0.ocp-edge-cluster-0.qe.lab.redhat.com
...
sh-4.2# chroot /host
sh-4.4# bash
[root@master-0-0 /]# ps aux | grep "openshift-apiserver start"
root 139003 6.6 0.6 1761432 205092 ? Ssl 13:51 2:14 openshift-apiserver start --config=/var/run/configmaps/config/config.yaml -v=2
...
[root@master-0-0 /]# kill -INT 139003
First terminal output:
...okokokokokokokokokokokokokokokokokokokokokokokokokokokokokokokokokokokokcommand terminated with exit code 137
=============================================================================================
Related issues: on the first terminal I got the error:
[kni@provisionhost-0-0 ~]$ oc debug node/master-0-0.ocp-edge-cluster-0.qe.lab.redhat.com
Starting pod/master-0-0ocp-edge-cluster-0qelabredhatcom-debug ...
To use host binaries, run `chroot /host`
Removing debug pod ...
error: Back-off pulling image "registry.redhat.io/rhel7/support-tools"
A bug for this is already open here: https://bugzilla.redhat.com/show_bug.cgi?id=1782852
Used the workaround from there:
$ oc tag -d openshift/tools:latest
$ oc tag -n openshift $(oc get pods -n openshift-multus -l app=multus -o jsonpath='{.items[0].spec.containers[?(@.name=="kube-multus")].image}') tools:latest
$ oc get imagetag -n openshift tools:latest

The link for the workaround in the previous comment is incorrect. See here: https://bugzilla.redhat.com/show_bug.cgi?id=1728135#c32

Verified on 4.6.0-0.nightly-2020-07-15-065024, see the details below:
After shutting down all of the kube-apiservers, haproxy-monitor removed the firewall rule in less than 30 seconds (a delay greater than 30 seconds would suggest it is still using /healthz).
(verified that we didn't break anything with this fix)
[kni@provisionhost-0-0 ~]$ ssh core.qe.lab.redhat.com
[core@master-0-2 ~]$ while true; do sleep 1; sudo crictl rm -f $(sudo crictl ps --name haproxy | awk 'FNR==2{ print $1}'); done
[core@master-0-2 ~]$ date
Sun Jul 19 07:01:15 UTC 2020
[core@master-0-2 ~]$ sudo cat /var/log/pods/openshift-kni-infra_haproxy-master-0-2.ocp-edge-cluster-0.qe.lab.redhat.com_1bea838fdefc74b7bc393e1b9a638c96/haproxy-monitor/2.log
2020-07-19T07:01:22.602585718+00:00 stderr F time="2020-07-19T07:01:22Z" level=info msg="API is not reachable through HAProxy"
2020-07-19T07:01:22.633405741+00:00 stderr F time="2020-07-19T07:01:22Z" level=info msg="Config change detected" configChangeCtr=1 curConfig="{6443 9445 50000 [{master-0-0.ocp-edge-cluster-0.qe.lab.redhat.com fd2e:6f44:5dd8::138 6443} {master-0-2.ocp-edge-cluster-0.qe.lab.redhat.com fd2e:6f44:5dd8::13d 6443} {master-0-1.ocp-edge-cluster-0.qe.lab.redhat.com fd2e:6f44:5dd8::143 6443}] ::}"
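For reference, the "less than 30 seconds" check can be reproduced from the two timestamps in the transcript above. This is only a sketch: it assumes the `date` output marks roughly when the outage began, which the transcript does not state explicitly, and `seconds_between` is a hypothetical helper written for this comment, not part of any tool used here.

```python
from datetime import datetime

def seconds_between(date_output, log_time):
    """Return the seconds elapsed from a `date`-style UTC string to a log `time=` value."""
    # Format of the `date` output in the transcript, e.g. "Sun Jul 19 07:01:15 UTC 2020".
    start = datetime.strptime(date_output, "%a %b %d %H:%M:%S UTC %Y")
    # Format of the time= field in the haproxy-monitor log, e.g. "2020-07-19T07:01:22Z".
    end = datetime.strptime(log_time, "%Y-%m-%dT%H:%M:%SZ")
    return (end - start).total_seconds()

# Values copied from the transcript; both are UTC.
elapsed = seconds_between("Sun Jul 19 07:01:15 UTC 2020", "2020-07-19T07:01:22Z")
print(elapsed, elapsed < 30)
```

Under that assumption, the "API is not reachable through HAProxy" message appears 7 seconds after the `date` call, well inside the 30-second bound.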
Ben, please advise whether there is anything else I should check.
As discussed, the ticket is verified by https://bugzilla.redhat.com/show_bug.cgi?id=1847082#c4

The failing case below shows that localhost resolves to both IPv6 and IPv4 addresses and that both are tried automatically.
[kni@provisionhost-0-0 ~]$ oc rsh -n openshift-apiserver apiserver-6bbb844d98-pjxsg
sh-4.2# curl -k https://localhost:6443/readyz
curl: (7) Failed connect to localhost:6443; Connection refused
sh-4.2# curl -k -vvv https://localhost:6443/readyz
* About to connect() to localhost port 6443 (#0)
* Trying ::1...
* Connection refused
* Trying 127.0.0.1...
* Connection refused
* Failed connect to localhost:6443; Connection refused
* Closing connection 0
curl: (7) Failed connect to localhost:6443; Connection refused
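
The behaviour curl shows above (resolve localhost, then try each returned address in turn) can be sketched in a few lines. This is an illustration only and not the code from the baremetal-runtimecfg fix; `first_reachable` is a hypothetical helper, and which addresses localhost resolves to depends on the host's resolver configuration.

```python
import socket

def first_reachable(host, port, timeout=1.0):
    """Try every address `host` resolves to; return the first IP that accepts a TCP connection, else None."""
    try:
        # For "localhost" this typically yields ::1 and/or 127.0.0.1, depending on the host.
        candidates = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return None
    for family, socktype, proto, _canonname, sockaddr in candidates:
        try:
            with socket.socket(family, socktype, proto) as sock:
                sock.settimeout(timeout)
                sock.connect(sockaddr)
                return sockaddr[0]
        except OSError:
            # Connection refused, timed out, or address family unavailable: try the next one.
            continue
    return None
```

Calling `first_reachable("localhost", 6443)` returns whichever loopback address accepts the connection, or None when every attempt is refused as in the curl output above. A hardcoded 127.0.0.1 skips the `::1` attempt entirely, which is the problem on an IPv6 deployment.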