Description of problem:
All the Keepalived pods fail with the Liveness probe error.

openshift-openstack-infra 1h9m Warning Unhealthy pod/keepalived-ocp-xlkqn-master-0 Liveness probe failed: command timed out
openshift-openstack-infra 1h26m Warning Unhealthy pod/keepalived-ocp-xlkqn-master-1 Liveness probe failed: command timed out
openshift-openstack-infra 6m24s Warning Unhealthy pod/keepalived-ocp-xlkqn-master-2 Liveness probe failed: command timed out
openshift-openstack-infra 29m Warning Unhealthy pod/keepalived-ocp-xlkqn-worker-0-5zz4m Liveness probe failed: command timed out
openshift-openstack-infra 19m Warning Unhealthy pod/keepalived-ocp-xlkqn-worker-0-9c65w Liveness probe failed: command timed out
openshift-openstack-infra 4h47m Warning Unhealthy pod/keepalived-ocp-xlkqn-worker-0-cq99b Liveness probe failed: command timed out
openshift-openstack-infra 2m57s Warning Unhealthy pod/keepalived-ocp-xlkqn-worker-0-fjlfd Liveness probe failed: command timed out
openshift-openstack-infra 9m8s Warning Unhealthy pod/keepalived-ocp-xlkqn-worker-0-ksmp9 Liveness probe failed: command timed out
openshift-openstack-infra 1h40m Warning Unhealthy pod/keepalived-ocp-xlkqn-worker-0-lhnkw Liveness probe failed: command timed out
openshift-openstack-infra 14m Warning Unhealthy pod/keepalived-ocp-xlkqn-worker-0-ttt4m Liveness probe failed: command timed out
openshift-openstack-infra 4h14m Warning Unhealthy pod/keepalived-ocp-xlkqn-worker-0-zl8ff Liveness probe failed: command timed out
openshift-openstack-infra 5h7m Warning Unhealthy pod/mdns-publisher-ocp-xlkqn-worker-0-zl8ff Liveness probe failed: command timed out

How reproducible:
Install a cluster 4.7 on OpenStack using IPI

Steps to Reproduce:
Install a cluster 4.7 on OpenStack using IPI

Actual results:
The alerts are getting triggered with liveness probe failed for keepalived pods but there is no actual error seen in the cluster

Expected results:
Alerts should not be triggered if there is no issue.

Additional info:
Hmm, interesting. Possibly related to https://bugzilla.redhat.com/show_bug.cgi?id=1949664, but not quite the same thing. I've gone ahead and backported that fix to 4.7, but I think this one may require a different fix because I still see the same behavior reported here on my local 4.8 cluster.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:3759