Bug 1826053

Summary: [IPI][Baremetal] Keepalived container stopped working after applying node's networking configuration change
Product: OpenShift Container Platform
Reporter: Yossi Boaron <yboaron>
Component: Machine Config Operator
Assignee: Yossi Boaron <yboaron>
Status: CLOSED ERRATA
QA Contact: Victor Voronkov <vvoronko>
Severity: medium
Docs Contact:
Priority: medium
Version: 4.4
CC: augol, bperkins, eweiss, ncocker, smilner
Target Milestone: ---
Target Release: 4.5.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: The node's networking was updated post-deployment (for example, by using kind: MachineConfig) in such a way that the interface that originally carried the VIPs was attached to a bridge (OVS or Linux bridge).
Consequence: Although keepalived-monitor observed the change and rendered a new config, the Keepalived container failed with 'permanent error CONFIG' and was not restarted by Kubelet.
Fix: The liveness probe of the Keepalived container was updated to also check that the Keepalived process exists.
Result: If the Keepalived process exits for any reason, Kubelet detects this and restarts the Keepalived container.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2020-07-13 17:29:11 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Yossi Boaron 2020-04-20 19:02:09 UTC
Description of problem:

This use case was reported by Petr Horacek.

Use case details:
I reconfigured the host by running [A], so the interface that was originally carrying VIPs is attached to an OVS bridge.

Although keepalived-monitor observed the change and rendered a new config, the keepalived container failed with 'permanent error CONFIG' and was not restarted by Kubelet.

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. Stop/kill the keepalived process inside the keepalived container (use sudo crictl)
2. Using *sudo crictl ps* and the keepalived logs, verify the container was restarted properly by kubelet.
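
The check in step 2 could be scripted roughly as follows. This is only a sketch: it assumes two snapshots of `sudo crictl ps -a -o json` taken before and after killing the process, and it assumes that output has a top-level `containers` list whose entries carry `id` and `metadata.name`. Confirm that layout on your node before relying on it. The function names are hypothetical.

```python
import json


def container_ids(crictl_json, name="keepalived"):
    """Return the set of container IDs whose metadata.name equals `name`.

    Assumes the (unverified here) layout {"containers": [{"id": ..., "metadata": {"name": ...}}]}.
    """
    data = json.loads(crictl_json)
    return {
        c["id"]
        for c in data.get("containers", [])
        if c.get("metadata", {}).get("name") == name
    }


def was_restarted(before_json, after_json, name="keepalived"):
    """True if a container with this name appears after that was not there before.

    A restart by kubelet creates a new container, so a new ID is the signal.
    """
    return bool(container_ids(after_json, name) - container_ids(before_json, name))
```

If `was_restarted` returns False some time after the process was killed, the container was not recreated, which matches the failure described in this bug.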




Actual results:
Keepalived container stopped working (error CONFIG) and kubelet doesn't restart the container.


Expected results:
keepalived-monitor should render a new keepalived config file.
Keepalived should be restarted, and apply the new config.

[A] https://access.redhat.com/articles/4934131
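
For illustration, the idea behind the fix (a liveness probe that also checks that the keepalived process exists) can be sketched as below. This is not the probe command that shipped with the fix, which is not shown in this report. It is a minimal sketch that scans a /proc-style directory for a process whose `comm` is `keepalived` and returns a probe-style exit code. The function names and the `proc_root` parameter are hypothetical.

```python
import os


def process_running(name, proc_root="/proc"):
    """Return True if some numeric entry under proc_root has `comm` equal to name."""
    try:
        entries = os.listdir(proc_root)
    except OSError:
        return False
    for entry in entries:
        if not entry.isdigit():
            continue  # skip non-PID entries such as /proc/meminfo
        try:
            with open(os.path.join(proc_root, entry, "comm")) as f:
                if f.read().strip() == name:
                    return True
        except OSError:
            continue  # the process may have exited while we were scanning
    return False


def liveness_exit_code(name="keepalived", proc_root="/proc"):
    """Probe-style result: 0 means healthy, non-zero means the check failed."""
    return 0 if process_running(name, proc_root) else 1
```

A non-zero exit from an exec-style liveness probe is what lets Kubelet notice that the process is gone and restart the container, even when the container itself is still reported as running.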

Comment 2 Eldar Weiss 2020-07-05 07:58:25 UTC
Please add reproduction steps, as the [A] link is unavailable.

Comment 3 Yossi Boaron 2020-07-05 16:34:50 UTC
You can stop/kill the keepalived process inside the Keepalived container and verify that the container is restarted properly by Kubelet.

Comment 6 errata-xmlrpc 2020-07-13 17:29:11 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409

Comment 7 Yossi Boaron 2020-09-09 13:46:06 UTC
*** Bug 1851447 has been marked as a duplicate of this bug. ***