Bug 1630485

Summary: Undercloud keepalived instance collides with the keepalived deployed on the Openshift overcloud
Product: Red Hat OpenStack
Reporter: Marius Cornea <mcornea>
Component: openstack-tripleo-heat-templates
Assignee: Martin André <m.andre>
Status: CLOSED ERRATA
QA Contact: Marius Cornea <mcornea>
Severity: high
Docs Contact:
Priority: high
Version: 14.0 (Rocky)
CC: bperkins, dbecker, jtrowbri, m.andre, mburns, morazi, racedoro, tsedovic
Target Milestone: beta
Keywords: Triaged
Target Release: 14.0 (Rocky)
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: openstack-tripleo-heat-templates-9.0.1-0.20181013060862.ffbe879.el7ost
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2019-01-11 11:53:11 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Marius Cornea 2018-09-18 18:19:18 UTC
Description of problem:
Undercloud keepalived instance collides with the keepalived deployed on the Openshift overcloud:

Checking /var/log/messages on the undercloud and master nodes, we can see a large number of 'bogus VRRP packet received' messages:

[...]
Sep 18 14:13:56 undercloud-0 Keepalived_vrrp[11]: (52): ip address associated with VRID 52 not present in MASTER advert : 192.168.24.2
Sep 18 14:13:56 undercloud-0 Keepalived_vrrp[11]: bogus VRRP packet received on br-ctlplane !!!
Sep 18 14:13:56 undercloud-0 Keepalived_vrrp[11]: VRRP_Instance(52) Dropping received VRRP packet...
Sep 18 14:13:57 undercloud-0 Keepalived_vrrp[11]: (51): ip address associated with VRID 51 not present in MASTER advert : 192.168.24.3
Sep 18 14:13:57 undercloud-0 Keepalived_vrrp[11]: bogus VRRP packet received on br-ctlplane !!!
Sep 18 14:13:57 undercloud-0 Keepalived_vrrp[11]: VRRP_Instance(51) Dropping received VRRP packet...
[...]
Sep 18 14:15:37 openshift-openshiftmaster-0 Keepalived_vrrp[10]: (51): ip address associated with VRID 51 not present in MASTER advert : 192.168.24.9
Sep 18 14:15:37 openshift-openshiftmaster-0 Keepalived_vrrp[10]: bogus VRRP packet received on eth0 !!!
Sep 18 14:15:37 openshift-openshiftmaster-0 Keepalived_vrrp[10]: VRRP_Instance(51) Dropping received VRRP packet...
Sep 18 14:15:37 openshift-openshiftmaster-0 Keepalived_vrrp[10]: (52): ip address associated with VRID 52 not present in MASTER advert : 192.168.24.9
Sep 18 14:15:37 openshift-openshiftmaster-0 Keepalived_vrrp[10]: bogus VRRP packet received on eth0 !!!
Sep 18 14:15:37 openshift-openshiftmaster-0 Keepalived_vrrp[10]: VRRP_Instance(52) Dropping received VRRP packet...
[...]

Version-Release number of selected component (if applicable):
openstack-tripleo-heat-templates-9.0.0-0.20180831204457.17bb71e.0rc1.el7ost.noarch

How reproducible:
100%

Steps to Reproduce:
1. Deploy Openshift via OSPd
2. Check /var/log/messages on undercloud and master nodes
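
To make step 2 easier to repeat, the log check can be scripted. The following is a minimal sketch, not part of the original report: the function name `count_vrid_mismatches` and the regular expression are illustrative assumptions, written against the message format quoted above. It counts the "ip address associated with VRID ... not present" lines per host and VRID.

```python
import re
from collections import Counter

# Assumed line format, taken from the log excerpts quoted in this report:
# "<Mon> <day> <time> <host> Keepalived_vrrp[<pid>]: (<id>): ip address
#  associated with VRID <id> not present in MASTER advert : <ip>"
MISMATCH = re.compile(
    r"^\w{3}\s+\d+\s+[\d:]+\s+(?P<host>\S+)\s+Keepalived_vrrp\[\d+\]:\s+"
    r"\(\d+\): ip address associated with VRID (?P<vrid>\d+) not present"
)


def count_vrid_mismatches(lines):
    """Count VRID mismatch messages, keyed by (host, vrid)."""
    counts = Counter()
    for line in lines:
        match = MISMATCH.match(line)
        if match:
            counts[(match.group("host"), int(match.group("vrid")))] += 1
    return counts


# Sample lines copied from the excerpt above; on a real node the input
# would be an open handle on /var/log/messages instead.
sample = [
    "Sep 18 14:13:56 undercloud-0 Keepalived_vrrp[11]: (52): ip address associated with VRID 52 not present in MASTER advert : 192.168.24.2",
    "Sep 18 14:13:56 undercloud-0 Keepalived_vrrp[11]: bogus VRRP packet received on br-ctlplane !!!",
    "Sep 18 14:13:57 undercloud-0 Keepalived_vrrp[11]: (51): ip address associated with VRID 51 not present in MASTER advert : 192.168.24.3",
]

for (host, vrid), n in sorted(count_vrid_mismatches(sample).items()):
    print(f"{host} VRID {vrid}: {n}")
```

A non-empty result on either the undercloud or a master node indicates that the node is receiving adverts for a VRID it also owns.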

Actual results:
Keepalived on the undercloud and on the master nodes logs 'bogus VRRP packet received' messages because both configurations define VRRP instances with the same virtual_router_id values (51 and 52) on the same network:

on undercloud:

(undercloud) [stack@undercloud-0 ~]$ docker exec keepalived cat /etc/keepalived/keepalived.conf
# This file is managed by Puppet. DO NOT EDIT.
global_defs {
  notification_email {
    root
  }
  notification_email_from keepalived
  smtp_server localhost
  smtp_connect_timeout 30
  router_id undercloud-0
}

static_ipaddress {
  
}

vrrp_script haproxy {
  script   "test -S /var/lib/haproxy/stats && echo "show info" | socat /var/lib/haproxy/stats stdio"
  interval 2
  weight   2
}
vrrp_instance 51 {
  virtual_router_id 51

  # Advert interval
  advert_int 1

  # for electing MASTER, highest priority wins.
  priority  101
  state     MASTER

  interface br-ctlplane

  virtual_ipaddress {

      192.168.24.3 dev br-ctlplane
  }

  track_script {
  haproxy
  }




}
vrrp_instance 52 {
  virtual_router_id 52

  # Advert interval
  advert_int 1

  # for electing MASTER, highest priority wins.
  priority  101
  state     MASTER

  interface br-ctlplane

  virtual_ipaddress {

      192.168.24.2 dev br-ctlplane
  }

  track_script {
  haproxy
  }




}

on overcloud:
[root@openshift-openshiftmaster-0 heat-admin]# docker exec keepalived cat /etc/keepalived/keepalived.conf
# This file is managed by Puppet. DO NOT EDIT.
global_defs {
  notification_email {
    root@localdomain
  }
  notification_email_from keepalived@localdomain
  smtp_server localhost
  smtp_connect_timeout 30
  router_id openshift-openshiftmaster-0
}

static_ipaddress {
  
}

vrrp_script haproxy {
  script   "test -S /var/lib/haproxy/stats && echo "show info" | socat /var/lib/haproxy/stats stdio"
  interval 2
  weight   2
}
vrrp_instance 51 {
  virtual_router_id 51

  # Advert interval
  advert_int 1

  # for electing MASTER, highest priority wins.
  priority  101
  state     MASTER

  interface eth0

  virtual_ipaddress {

      192.168.24.9 dev eth0
  }

  track_script {
  haproxy
  }




}
vrrp_instance 52 {
  virtual_router_id 52

  # Advert interval
  advert_int 1

  # for electing MASTER, highest priority wins.
  priority  101
  state     MASTER

  interface eth0

  virtual_ipaddress {

      192.168.24.9 dev eth0
  }

  track_script {
  haproxy
  }




}
vrrp_instance 53 {
  virtual_router_id 53

  # Advert interval
  advert_int 1

  # for electing MASTER, highest priority wins.
  priority  101
  state     MASTER

  interface eth0

  virtual_ipaddress {

      192.168.24.6/32 dev eth0
  }

  track_script {
  haproxy
  }




}


Expected results:
The undercloud and overcloud keepalived configurations use different VRRP instances (distinct virtual_router_id values).
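
One way to check this expectation is to compare the virtual_router_id values in the two keepalived.conf files. The sketch below is only an illustration under stated assumptions: the helper names `router_ids` and `colliding_ids` are hypothetical, and the two configuration strings are abbreviated from the files shown above. An empty result would mean no two instances share a virtual_router_id.

```python
import re

# Matches "virtual_router_id <n>" lines as they appear in the
# Puppet-generated keepalived.conf files shown in this report.
VRID = re.compile(r"^\s*virtual_router_id\s+(\d+)\s*$", re.MULTILINE)


def router_ids(conf_text):
    """Return the set of virtual_router_id values found in a config."""
    return {int(value) for value in VRID.findall(conf_text)}


def colliding_ids(conf_a, conf_b):
    """Return the sorted virtual_router_id values present in both configs."""
    return sorted(router_ids(conf_a) & router_ids(conf_b))


# Abbreviated from the undercloud configuration above.
UNDERCLOUD_CONF = """
vrrp_instance 51 {
  virtual_router_id 51
  interface br-ctlplane
}
vrrp_instance 52 {
  virtual_router_id 52
  interface br-ctlplane
}
"""

# Abbreviated from the overcloud (OpenShift master) configuration above.
OVERCLOUD_CONF = """
vrrp_instance 51 {
  virtual_router_id 51
  interface eth0
}
vrrp_instance 52 {
  virtual_router_id 52
  interface eth0
}
vrrp_instance 53 {
  virtual_router_id 53
  interface eth0
}
"""

print(colliding_ids(UNDERCLOUD_CONF, OVERCLOUD_CONF))
```

The check only compares IDs; it assumes both hosts sit on the same layer 2 segment, as they do here on the 192.168.24.x control plane network.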

Additional info:

Comment 1 Martin André 2018-10-09 06:25:43 UTC
Fixing it upstream by using a different base virtual_router_id for openshift master nodes: https://review.openstack.org/#/c/608719/

Comment 10 errata-xmlrpc 2019-01-11 11:53:11 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2019:0045