Bug 1775123

Summary: OSP Keepalived static pod crashlooping
Product: OpenShift Container Platform
Reporter: Tomas Sedovic <tsedovic>
Component: Installer
Assignee: Tomas Sedovic <tsedovic>
Installer sub component: OpenShift on OpenStack
QA Contact: David Sanz <dsanzmor>
Status: CLOSED ERRATA
Docs Contact:
Severity: unspecified
Priority: unspecified
CC: bperkins, dsanzmor
Version: 4.3.0
Target Milestone: ---
Target Release: 4.2.z
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1775119
Environment:
Last Closed: 2020-01-07 17:55:10 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1775119
Bug Blocks:

Description Tomas Sedovic 2019-11-21 12:28:40 UTC
+++ This bug was initially created as a clone of Bug #1775119 +++

Description of problem:

The keepalived static pod on the bootstrap machine keeps crashlooping. The VRID we calculate occasionally ends up being 0, which keepalived rejects as invalid: it only accepts values between 1 and 255.
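
To illustrate the failure mode, here is a minimal sketch of a VRID calculation that cannot produce 0. It is not the code from baremetal-runtimecfg; the function name `vrid_from_name` and the choice of CRC32 as the hash are assumptions made only for this example. The point is the final mapping step: taking the hash modulo 255 and adding 1 keeps the result in 1..255, whereas a plain modulo 256 would occasionally yield 0.

```python
import zlib


def vrid_from_name(name: str) -> int:
    """Map a name to a VRRP virtual router ID in the valid range 1..255.

    Hypothetical helper for illustration; the hash function is an assumption.
    """
    # zlib.crc32 returns an unsigned 32-bit integer in Python 3.
    checksum = zlib.crc32(name.encode("utf-8"))
    # checksum % 255 is in 0..254; adding 1 shifts it to 1..255,
    # so the result can never be the invalid value 0.
    return checksum % 255 + 1


if __name__ == "__main__":
    # Instance name taken from the keepalived log in this report.
    print(vrid_from_name("c3rs517m-90437_API"))
```

The trade-off is that two different names can still map to the same VRID; this sketch only guarantees the value is in the range keepalived accepts.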

How reproducible: random

Actual results:

The keepalived static pod does not start up properly: it exits with a configuration error and is restarted repeatedly.

Expected results:

The keepalived pod should always start up.

--- Additional comment from Tomas Sedovic on 2019-11-21 12:24:48 UTC ---

Keepalived logs on the bootstrap machine:

Starting Keepalived v1.3.5 (03/19,2017), git commit v1.3.5-6-g6fa32f2
Opening file '/etc/keepalived/keepalived.conf'.
Starting VRRP child process, pid=7
Registering Kernel netlink reflector
Registering Kernel netlink command channel
Registering gratuitous ARP shared channel
Opening file '/etc/keepalived/keepalived.conf'.
VRRP Error : VRID not valid - must be between 1 & 255. reconfigure !
Truncating auth_pass to 8 characters
Truncating auth_pass to 8 characters
VRRP_Instance(c3rs517m-90437_API) the virtual id must be set!
Stopped
Keepalived_vrrp exited with permanent error CONFIG. Terminating
Stopping
Stopped Keepalived v1.3.5 (03/19,2017), git commit v1.3.5-6-g6fa32f2
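
The error above can be caught before keepalived starts by checking the rendered configuration. The following is a rough sketch, assuming the config sets the ID with keepalived's `virtual_router_id` directive; the helper name `invalid_vrids` and the sample config text are made up for this example and are not taken from the affected machine.

```python
import re
from typing import List

# Matches lines such as "    virtual_router_id 51" and captures the value.
VRID_RE = re.compile(r"^\s*virtual_router_id\s+(\S+)", re.MULTILINE)


def invalid_vrids(conf_text: str) -> List[str]:
    """Return every virtual_router_id value that is outside 1..255."""
    bad = []
    for value in VRID_RE.findall(conf_text):
        # Anything that is not a plain decimal number is reported as invalid.
        if not re.fullmatch(r"[0-9]+", value) or not 1 <= int(value) <= 255:
            bad.append(value)
    return bad


if __name__ == "__main__":
    # Hypothetical config fragment for illustration only.
    sample = """
    vrrp_instance example_API {
        virtual_router_id 0
    }
    vrrp_instance example_DNS {
        virtual_router_id 17
    }
    """
    print(invalid_vrids(sample))
```

A non-empty result means keepalived would refuse the configuration with the "VRID not valid" error shown in the log.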

--- Additional comment from Tomas Sedovic on 2019-11-21 12:25:40 UTC ---

Github issue: https://github.com/openshift/baremetal-runtimecfg/issues/21

--- Additional comment from Tomas Sedovic on 2019-11-21 12:26:44 UTC ---

Fixed by: https://github.com/openshift/baremetal-runtimecfg/pull/23

--- Additional comment from Tomas Sedovic on 2019-11-21 12:27:07 UTC ---

We're no longer seeing these issues in our CI.

Comment 3 David Sanz 2019-12-17 10:23:27 UTC
Verified on 4.3.0-0.nightly-2019-12-13-180405

# crictl logs -f 6cc8c7a189023
Starting Keepalived v1.3.5 (03/19,2017), git commit v1.3.5-6-g6fa32f2
Opening file '/etc/keepalived/keepalived.conf'.
Starting VRRP child process, pid=7
Registering Kernel netlink reflector
Registering Kernel netlink command channel
Registering gratuitous ARP shared channel
Opening file '/etc/keepalived/keepalived.conf'.
Truncating auth_pass to 8 characters
Truncating auth_pass to 8 characters
VRRP_Instance(mrnd-13-43-no_API) removing protocol VIPs.
VRRP_Instance(mrnd-13-43-no_DNS) removing protocol VIPs.
Using LinkWatch kernel netlink reflector...
VRRP_Instance(mrnd-13-43-no_API) Entering BACKUP STATE
VRRP sockpool: [ifindex(2), proto(112), unicast(0), fd(9,10)]
VRRP_Instance(mrnd-13-43-no_DNS) Transition to MASTER STATE
VRRP_Instance(mrnd-13-43-no_DNS) Entering MASTER STATE
VRRP_Instance(mrnd-13-43-no_DNS) setting protocol VIPs.
Sending gratuitous ARP on ens3 for 192.168.0.6
VRRP_Instance(mrnd-13-43-no_DNS) Sending/queueing gratuitous ARPs on ens3 for 192.168.0.6
Sending gratuitous ARP on ens3 for 192.168.0.6
Sending gratuitous ARP on ens3 for 192.168.0.6
Sending gratuitous ARP on ens3 for 192.168.0.6
Sending gratuitous ARP on ens3 for 192.168.0.6
VRRP_Instance(mrnd-13-43-no_API) Transition to MASTER STATE
VRRP_Instance(mrnd-13-43-no_API) Entering MASTER STATE
VRRP_Instance(mrnd-13-43-no_API) setting protocol VIPs.
Sending gratuitous ARP on ens3 for 192.168.0.5
VRRP_Instance(mrnd-13-43-no_API) Sending/queueing gratuitous ARPs on ens3 for 192.168.0.5
Sending gratuitous ARP on ens3 for 192.168.0.5
Sending gratuitous ARP on ens3 for 192.168.0.5
Sending gratuitous ARP on ens3 for 192.168.0.5
Sending gratuitous ARP on ens3 for 192.168.0.5
Sending gratuitous ARP on ens3 for 192.168.0.6
VRRP_Instance(mrnd-13-43-no_DNS) Sending/queueing gratuitous ARPs on ens3 for 192.168.0.6
Sending gratuitous ARP on ens3 for 192.168.0.6
Sending gratuitous ARP on ens3 for 192.168.0.6
Sending gratuitous ARP on ens3 for 192.168.0.6
Sending gratuitous ARP on ens3 for 192.168.0.6
Sending gratuitous ARP on ens3 for 192.168.0.5
VRRP_Instance(mrnd-13-43-no_API) Sending/queueing gratuitous ARPs on ens3 for 192.168.0.5
Sending gratuitous ARP on ens3 for 192.168.0.5
Sending gratuitous ARP on ens3 for 192.168.0.5
Sending gratuitous ARP on ens3 for 192.168.0.5
Sending gratuitous ARP on ens3 for 192.168.0.5

Comment 5 errata-xmlrpc 2020-01-07 17:55:10 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0014