Bug 1986757 - Keepalived fails with Liveness probe failed: command timed out
Summary: Keepalived fails with Liveness probe failed: command timed out
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.7
Hardware: x86_64
OS: All
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.9.0
Assignee: Ben Nemec
QA Contact: Victor Voronkov
URL:
Whiteboard:
Depends On:
Blocks: 1999531
Reported: 2021-07-28 09:47 UTC by Sonigra Saurab
Modified: 2021-10-20 13:50 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-10-18 17:42:49 UTC
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Github openshift machine-config-operator pull 2703 0 None None None 2021-08-04 16:15:42 UTC
Red Hat Knowledge Base (Solution) 6249711 0 None None None 2021-08-09 14:52:42 UTC
Red Hat Product Errata RHSA-2021:3759 0 None None None 2021-10-18 17:43:06 UTC

Description Sonigra Saurab 2021-07-28 09:47:49 UTC
Description of problem:

All keepalived pods are failing their liveness probes with "command timed out". One mdns-publisher pod reports the same event.

openshift-openstack-infra                         1h9m       Warning  Unhealthy                pod/keepalived-ocp-xlkqn-master-0                                    Liveness probe failed: command timed out
openshift-openstack-infra                         1h26m      Warning  Unhealthy                pod/keepalived-ocp-xlkqn-master-1                                    Liveness probe failed: command timed out
openshift-openstack-infra                         6m24s      Warning  Unhealthy                pod/keepalived-ocp-xlkqn-master-2                                    Liveness probe failed: command timed out
openshift-openstack-infra                         29m        Warning  Unhealthy                pod/keepalived-ocp-xlkqn-worker-0-5zz4m                              Liveness probe failed: command timed out
openshift-openstack-infra                         19m        Warning  Unhealthy                pod/keepalived-ocp-xlkqn-worker-0-9c65w                              Liveness probe failed: command timed out
openshift-openstack-infra                         4h47m      Warning  Unhealthy                pod/keepalived-ocp-xlkqn-worker-0-cq99b                              Liveness probe failed: command timed out
openshift-openstack-infra                         2m57s      Warning  Unhealthy                pod/keepalived-ocp-xlkqn-worker-0-fjlfd                              Liveness probe failed: command timed out
openshift-openstack-infra                         9m8s       Warning  Unhealthy                pod/keepalived-ocp-xlkqn-worker-0-ksmp9                              Liveness probe failed: command timed out
openshift-openstack-infra                         1h40m      Warning  Unhealthy                pod/keepalived-ocp-xlkqn-worker-0-lhnkw                              Liveness probe failed: command timed out
openshift-openstack-infra                         14m        Warning  Unhealthy                pod/keepalived-ocp-xlkqn-worker-0-ttt4m                              Liveness probe failed: command timed out
openshift-openstack-infra                         4h14m      Warning  Unhealthy                pod/keepalived-ocp-xlkqn-worker-0-zl8ff                              Liveness probe failed: command timed out
openshift-openstack-infra                         5h7m       Warning  Unhealthy                pod/mdns-publisher-ocp-xlkqn-worker-0-zl8ff                          Liveness probe failed: command timed out
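
To see at a glance which components are affected, the event lines above can be tallied per component. The following is a minimal sketch, not part of the original report: the function names are hypothetical, the sample holds only three of the lines above with whitespace collapsed, and it assumes the six-column layout shown (namespace, age, type, reason, object, message) and this cluster's "-ocp-" infix in pod names.

```python
from collections import Counter

# Three sample lines copied from the event listing (whitespace collapsed).
EVENTS = """\
openshift-openstack-infra 1h9m Warning Unhealthy pod/keepalived-ocp-xlkqn-master-0 Liveness probe failed: command timed out
openshift-openstack-infra 2m57s Warning Unhealthy pod/keepalived-ocp-xlkqn-worker-0-fjlfd Liveness probe failed: command timed out
openshift-openstack-infra 5h7m Warning Unhealthy pod/mdns-publisher-ocp-xlkqn-worker-0-zl8ff Liveness probe failed: command timed out
"""


def parse_event(line):
    """Split one event line into its six columns (assumed layout)."""
    namespace, age, etype, reason, obj, message = line.split(None, 5)
    return {
        "namespace": namespace,
        "age": age,
        "type": etype,
        "reason": reason,
        "object": obj,
        "message": message,
    }


def probe_failures_by_component(text):
    """Count liveness probe failures per component (keepalived, mdns-publisher, ...)."""
    counts = Counter()
    for line in text.splitlines():
        if not line.strip():
            continue
        event = parse_event(line)
        if event["reason"] != "Unhealthy":
            continue
        if "Liveness probe failed" not in event["message"]:
            continue
        # "pod/keepalived-ocp-xlkqn-master-0" -> "keepalived-ocp-xlkqn-master-0"
        pod = event["object"].split("/", 1)[-1]
        # Assumption: the component name is everything before "-ocp-",
        # which matches the cluster name used in this report only.
        component = pod.split("-ocp-", 1)[0]
        counts[component] += 1
    return counts


if __name__ == "__main__":
    for component, count in probe_failures_by_component(EVENTS).items():
        print(component, count)
```

Run against the full listing, the same tally would separate the keepalived events from the single mdns-publisher event.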

How reproducible:

Install a 4.7 cluster on OpenStack using IPI.

Steps to Reproduce:

Install a 4.7 cluster on OpenStack using IPI.

Actual results:

Alerts fire because the liveness probes of the keepalived pods fail, but no actual error is seen in the cluster.

Expected results:

Alerts should not be triggered if there is no issue.

Additional info:

Comment 3 Ben Nemec 2021-08-03 22:21:35 UTC
Hmm, interesting. Possibly related to https://bugzilla.redhat.com/show_bug.cgi?id=1949664, but not quite the same thing. I've gone ahead and backported that fix to 4.7, but I think this one may require a different fix because I still see the same behavior reported here on my local 4.8 cluster.

Comment 17 errata-xmlrpc 2021-10-18 17:42:49 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:3759

