+++ This bug was initially created as a clone of Bug #1844384 +++

Description of problem:
The OpenStack keepalived health check only fails on connection errors:
https://github.com/openshift/machine-config-operator/blame/master/templates/master/00-master/openstack/files/openstack-keepalived-keepalived.yaml#L6

Background: `curl -s` does not fail on non-200 HTTP responses as long as the TCP connection succeeds.

--- Additional comment from Antonio Murdaca on 2020-06-05 11:53:18 CEST ---

Moving to the OpenStack owners.
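The background note in the description can be illustrated with a small, self-contained sketch. It is not the operator's code: the stub server, the 503 response and the function names below are all hypothetical. `check_connect_only` mimics the exit behaviour of `curl -s` (success whenever any HTTP response arrives), while `check_status` mimics `curl -fs` (failure on HTTP status 400 and above).

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Ignore any proxy settings from the environment so the demo stays local.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


class NotReadyHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for an unhealthy /readyz endpoint: always answers 503."""

    def do_GET(self):
        self.send_response(503)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, *args):
        pass  # keep the demo quiet


def start_server():
    """Start the stub server on a free local port and return (server, url)."""
    server = HTTPServer(("127.0.0.1", 0), NotReadyHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server, f"http://127.0.0.1:{server.server_port}/readyz"


def check_connect_only(url):
    """Like `curl -s`: passes whenever an HTTP response arrives, whatever its status."""
    try:
        _opener.open(url, timeout=2).close()
        return True
    except urllib.error.HTTPError as err:
        err.close()
        return True   # the server answered, so a connect-only check still passes
    except OSError:
        return False  # connection refused, timeout, and similar transport errors


def check_status(url):
    """Like `curl -fs`: additionally fails on HTTP status >= 400."""
    try:
        _opener.open(url, timeout=2).close()
        return True
    except OSError:   # HTTPError and URLError are both OSError subclasses
        return False


if __name__ == "__main__":
    server, url = start_server()
    # The connect-only check passes against the 503 endpoint; the status-aware one fails.
    print(check_connect_only(url), check_status(url))
    server.shutdown()
    server.server_close()
```

Against an endpoint that accepts connections but reports itself not ready, only the status-aware check flags the problem, which is the gap this bug describes.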
Checked with 4.6.0-0.nightly-2020-06-26-035408, moved to verified.

$ oc version
Client Version: 4.6.0-202006270004.p0-ad8b00f
Server Version: 4.6.0-0.nightly-2020-06-26-035408
Kubernetes Version: v1.18.3+8871b3d

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.0-0.nightly-2020-06-26-035408   True        False         17m     Cluster version is 4.6.0-0.nightly-2020-06-26-035408

$ oc get nodes -o wide
NAME                            STATUS  ROLES    AGE   VERSION           INTERNAL-IP     EXTERNAL-IP   OS-IMAGE                                                       KERNEL-VERSION          CONTAINER-RUNTIME
wj46ios629a-b4pbw-master-0      Ready   master   36m   v1.18.3+ba54539   192.168.2.202   <none>        Red Hat Enterprise Linux CoreOS 46.82.202006260140-0 (Ootpa)   4.18.0-211.el8.x86_64   cri-o://1.19.0-30.dev.rhaos4.6.git0a84af5.el8-dev
wj46ios629a-b4pbw-master-1      Ready   master   36m   v1.18.3+ba54539   192.168.1.6     <none>        Red Hat Enterprise Linux CoreOS 46.82.202006260140-0 (Ootpa)   4.18.0-211.el8.x86_64   cri-o://1.19.0-30.dev.rhaos4.6.git0a84af5.el8-dev
wj46ios629a-b4pbw-master-2      Ready   master   36m   v1.18.3+ba54539   192.168.1.184   <none>        Red Hat Enterprise Linux CoreOS 46.82.202006260140-0 (Ootpa)   4.18.0-211.el8.x86_64   cri-o://1.19.0-30.dev.rhaos4.6.git0a84af5.el8-dev
wj46ios629a-b4pbw-worker-j9zl8  Ready   worker   20m   v1.18.3+ba54539   192.168.2.27    <none>        Red Hat Enterprise Linux CoreOS 46.82.202006260140-0 (Ootpa)   4.18.0-211.el8.x86_64   cri-o://1.19.0-30.dev.rhaos4.6.git0a84af5.el8-dev
wj46ios629a-b4pbw-worker-mjrfc  Ready   worker   22m   v1.18.3+ba54539   192.168.3.56    <none>        Red Hat Enterprise Linux CoreOS 46.82.202006260140-0 (Ootpa)   4.18.0-211.el8.x86_64   cri-o://1.19.0-30.dev.rhaos4.6.git0a84af5.el8-dev
wj46ios629a-b4pbw-worker-mwdbk  Ready   worker   24m   v1.18.3+ba54539   192.168.2.119   <none>        Red Hat Enterprise Linux CoreOS 46.82.202006260140-0 (Ootpa)   4.18.0-211.el8.x86_64   cri-o://1.19.0-30.dev.rhaos4.6.git0a84af5.el8-dev

$ oc get pods -n openshift-openstack-infra -o wide
NAME                                           READY   STATUS    RESTARTS   AGE   IP              NODE                            NOMINATED NODE   READINESS GATES
coredns-wj46ios629a-b4pbw-master-0             1/1     Running   0          35m   192.168.2.202   wj46ios629a-b4pbw-master-0      <none>           <none>
coredns-wj46ios629a-b4pbw-master-1             1/1     Running   0          35m   192.168.1.6     wj46ios629a-b4pbw-master-1      <none>           <none>
coredns-wj46ios629a-b4pbw-master-2             1/1     Running   0          35m   192.168.1.184   wj46ios629a-b4pbw-master-2      <none>           <none>
coredns-wj46ios629a-b4pbw-worker-j9zl8         1/1     Running   0          21m   192.168.2.27    wj46ios629a-b4pbw-worker-j9zl8  <none>           <none>
coredns-wj46ios629a-b4pbw-worker-mjrfc         1/1     Running   0          21m   192.168.3.56    wj46ios629a-b4pbw-worker-mjrfc  <none>           <none>
coredns-wj46ios629a-b4pbw-worker-mwdbk         1/1     Running   0          23m   192.168.2.119   wj46ios629a-b4pbw-worker-mwdbk  <none>           <none>
haproxy-wj46ios629a-b4pbw-master-0             2/2     Running   0          35m   192.168.2.202   wj46ios629a-b4pbw-master-0      <none>           <none>
haproxy-wj46ios629a-b4pbw-master-1             2/2     Running   0          35m   192.168.1.6     wj46ios629a-b4pbw-master-1      <none>           <none>
haproxy-wj46ios629a-b4pbw-master-2             2/2     Running   0          35m   192.168.1.184   wj46ios629a-b4pbw-master-2      <none>           <none>
keepalived-wj46ios629a-b4pbw-master-0          1/1     Running   0          35m   192.168.2.202   wj46ios629a-b4pbw-master-0      <none>           <none>
keepalived-wj46ios629a-b4pbw-master-1          1/1     Running   0          35m   192.168.1.6     wj46ios629a-b4pbw-master-1      <none>           <none>
keepalived-wj46ios629a-b4pbw-master-2          1/1     Running   0          35m   192.168.1.184   wj46ios629a-b4pbw-master-2      <none>           <none>
keepalived-wj46ios629a-b4pbw-worker-j9zl8      1/1     Running   0          20m   192.168.2.27    wj46ios629a-b4pbw-worker-j9zl8  <none>           <none>
keepalived-wj46ios629a-b4pbw-worker-mjrfc      1/1     Running   0          21m   192.168.3.56    wj46ios629a-b4pbw-worker-mjrfc  <none>           <none>
keepalived-wj46ios629a-b4pbw-worker-mwdbk      1/1     Running   0          23m   192.168.2.119   wj46ios629a-b4pbw-worker-mwdbk  <none>           <none>
mdns-publisher-wj46ios629a-b4pbw-master-0      1/1     Running   0          35m   192.168.2.202   wj46ios629a-b4pbw-master-0      <none>           <none>
mdns-publisher-wj46ios629a-b4pbw-master-1      1/1     Running   0          35m   192.168.1.6     wj46ios629a-b4pbw-master-1      <none>           <none>
mdns-publisher-wj46ios629a-b4pbw-master-2      1/1     Running   0          35m   192.168.1.184   wj46ios629a-b4pbw-master-2      <none>           <none>
mdns-publisher-wj46ios629a-b4pbw-worker-j9zl8  1/1     Running   0          20m   192.168.2.27    wj46ios629a-b4pbw-worker-j9zl8  <none>           <none>
mdns-publisher-wj46ios629a-b4pbw-worker-mjrfc  1/1     Running   0          21m   192.168.3.56    wj46ios629a-b4pbw-worker-mjrfc  <none>           <none>
mdns-publisher-wj46ios629a-b4pbw-worker-mwdbk  1/1     Running   0          24m   192.168.2.119   wj46ios629a-b4pbw-worker-mwdbk  <none>           <none>

$ oc -n openshift-openstack-infra rsh keepalived-wj46ios629a-b4pbw-master-0
sh-4.2# ps aux
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  0.0  0.0 123020  6912 ?        Ss   01:33   0:00 /usr/sbin/keepalived -f /etc/keepalived/keepalived.conf --dont-fork --vrrp --log-detail --log-console
root         8  0.1  0.0 127288  6244 ?        S    01:33   0:03 /usr/sbin/keepalived -f /etc/keepalived/keepalived.conf --dont-fork --vrrp --log-detail --log-console
root     20500  0.0  0.0  11836  2792 pts/0    Ss   02:11   0:00 /bin/sh
root     20548  0.0  0.0  51768  3472 pts/0    R+   02:11   0:00 ps aux

sh-4.2# cat /etc/keepalived/keepalived.conf
vrrp_script chk_ocp {
    script "/usr/bin/curl -o /dev/null -kLfs https://localhost:6443/readyz && /usr/bin/curl -o /dev/null -kLfs http://localhost:50936/readyz"
    interval 1
    weight 50
}

# TODO: Improve this check. The port is assumed to be alive.
# Need to assess what is the ramification if the port is not there.
vrrp_script chk_ingress {
    script "/usr/bin/curl -o /dev/null -Lfs http://localhost:1936/healthz/ready"
    interval 1
    weight 50
}

vrrp_instance wj46ios629a_API {
    state BACKUP
    interface ens3
    virtual_router_id 197
    priority 40
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass wj46ios629a_api_vip
    }
    virtual_ipaddress {
        192.168.0.5/18
    }
    track_script {
        chk_ocp
    }
}

vrrp_instance wj46ios629a_INGRESS {
    state BACKUP
    interface ens3
    virtual_router_id 180
    priority 40
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass wj46ios629a_ingress_vip
    }
    virtual_ipaddress {
        192.168.0.7/18
    }
    track_script {
        chk_ingress
    }
}
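To see why the exit status of the check matters for VIP placement, the following sketch models how a tracked script's `weight` is commonly described as affecting a VRRP instance's priority in keepalived: a positive weight is added while the script succeeds, and a negative weight is applied while it fails. The function name is hypothetical, and the model deliberately ignores details such as priority clamping and the `rise`/`fall` counters. It uses the `priority 40` and `weight 50` values from the configuration above.

```python
def effective_priority(base, weight, script_ok):
    """Simplified model of keepalived's track_script weight handling.

    Assumption: a positive weight is added to the base priority while the
    script succeeds; a negative weight is added (lowering the priority)
    while the script fails. Clamping and rise/fall are not modelled.
    """
    if weight > 0:
        return base + weight if script_ok else base
    if weight < 0:
        return base if script_ok else base + weight
    return base  # a zero weight leaves the priority unchanged in this model


if __name__ == "__main__":
    # Values taken from the verified keepalived.conf: priority 40, weight 50.
    healthy = effective_priority(40, 50, script_ok=True)
    unhealthy = effective_priority(40, 50, script_ok=False)
    print(healthy, unhealthy)
```

Under this model a node whose check passes advertises a higher priority than a peer whose check fails, so a check that never fails on HTTP errors would leave an unhealthy node eligible to keep the VIP.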
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:4196