Bug 1804793

Summary: [IPI baremetal]: during bootstrap, two dhcp servers could be active on the provisioning network
Product: OpenShift Container Platform Reporter: Stephen Benjamin <stbenjam>
Component: Installer Assignee: Stephen Benjamin <stbenjam>
Installer sub component: OpenShift on Bare Metal IPI QA Contact: Nataf Sharabi <nsharabi>
Status: CLOSED ERRATA Docs Contact:
Severity: unspecified    
Priority: unspecified CC: augol, rbartal, vvoronko
Version: 4.4   
Target Milestone: ---   
Target Release: 4.4.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1800746 Environment:
Last Closed: 2020-05-04 11:37:45 UTC Type: ---
Bug Depends On: 1800746    
Bug Blocks:    

Description Stephen Benjamin 2020-02-19 16:09:13 UTC
+++ This bug was initially created as a clone of Bug #1800746 +++

The bootstrap VM can now coexist with a running machine-api. That means instances of Ironic, dnsmasq, etc. could be running in both the cluster and the bootstrap VM at the same time. This causes problems because it is not deterministic which dnsmasq instance a worker provisioned by the machine-api will use. If the bootstrap instance responds first, the worker will not come online, because it will be pointing at the wrong place.

This is causing a percentage of baremetal installs to fail: the worker stays offline, so ingress and other operators never come up.

Comment 3 Nataf Sharabi 2020-03-24 16:18:06 UTC
In order to verify: 

1. During installation, check that the bootstrap machine has been created:
  virsh list --all
  Id    Name                               State
  ----------------------------------------------------
   219   provisionhost-0                    running
   220   ocp-edge-cluster-77jtp-bootstrap   running

2. From baremetal, run:
   virsh console ocp-edge-cluster-77jtp-bootstrap

3. You should see in the console:
   ens3: 192.168.123.126 fe80::9337:ec5a:fc32:16c1
   ens4: fd00:1101::2

4. From baremetal, run:
   ssh kni@provisionhost

5. From provisionhost, run:
   ssh core@192.168.123.126

6. From bootstrap, run:
   sudo ip6tables -t raw -L


Chain PREROUTING (policy ACCEPT)
target     prot opt source               destination
DHCP       udp      anywhere             anywhere             udp dpt:bootps
DHCP       udp      anywhere             anywhere             udp dpt:dhcpv6-server

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination

Chain DHCP (2 references)
target     prot opt source               destination
ACCEPT     all      anywhere             anywhere             MAC 52:54:00:2B:C2:2A
ACCEPT     all      anywhere             anywhere             MAC 52:54:00:07:5C:BA
ACCEPT     all      anywhere             anywhere             MAC 52:54:00:47:48:CB
DROP       all      anywhere             anywhere
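
The listing from step 6 can also be checked mechanically instead of by eye. The following is only a sketch: the helper names (parse_chain, dhcp_is_allowlisted) are hypothetical, not part of the installer, and it assumes the default column layout of `-L` output shown above. It checks that the DHCP chain holds only per-MAC ACCEPT rules followed by a final catch-all DROP.

```python
import re

# Listing copied from step 6 of this comment (trailing whitespace trimmed).
SAMPLE = """\
Chain PREROUTING (policy ACCEPT)
target     prot opt source               destination
DHCP       udp      anywhere             anywhere             udp dpt:bootps
DHCP       udp      anywhere             anywhere             udp dpt:dhcpv6-server

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination

Chain DHCP (2 references)
target     prot opt source               destination
ACCEPT     all      anywhere             anywhere             MAC 52:54:00:2B:C2:2A
ACCEPT     all      anywhere             anywhere             MAC 52:54:00:07:5C:BA
ACCEPT     all      anywhere             anywhere             MAC 52:54:00:47:48:CB
DROP       all      anywhere             anywhere
"""

MAC_RE = re.compile(r"MAC\s+((?:[0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2})")


def parse_chain(listing, chain):
    """Return (target, mac_or_None) for each rule of one chain in `-L` output."""
    rules = []
    in_chain = False
    for raw in listing.splitlines():
        line = raw.strip()
        if line.startswith("Chain "):
            # Header looks like "Chain DHCP (2 references)".
            in_chain = line.split()[1] == chain
            continue
        # Skip other chains, blank lines, and the column header row.
        if not in_chain or not line or line.startswith("target"):
            continue
        match = MAC_RE.search(line)
        rules.append((line.split()[0], match.group(1).upper() if match else None))
    return rules


def dhcp_is_allowlisted(listing, chain="DHCP"):
    """True if the chain is per-MAC ACCEPT rules followed by one catch-all DROP."""
    rules = parse_chain(listing, chain)
    if len(rules) < 2:
        return False
    *allow, last = rules
    return all(t == "ACCEPT" and m is not None for t, m in allow) and last == ("DROP", None)
```

Calling dhcp_is_allowlisted(SAMPLE) on the listing above should return True. To check a live system, the output of the command from step 6 could be captured and passed in the same way.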


The rules match the code in:
   https://github.com/openshift/installer/pull/3079/files
   https://github.com/openshift/installer/pull/3243/files
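
For readers who do not want to open the PRs, a rule set with the same shape as the listing in step 6 could be built with commands along the following lines. This is an assumption reconstructed from the listing, not code copied from the linked PRs. The function name dhcp_allowlist_commands is hypothetical, and ports 67 and 547 are the standard numbers behind the bootps and dhcpv6-server service names. The sketch only builds the argument lists and does not execute anything.

```python
from typing import List


def dhcp_allowlist_commands(macs: List[str], binary: str = "ip6tables") -> List[List[str]]:
    """Build argv lists for a raw-table DHCP allowlist (illustrative only)."""
    base = [binary, "-t", "raw"]
    cmds = [
        # Create a dedicated chain for DHCP traffic.
        base + ["-N", "DHCP"],
        # Send DHCPv4 (bootps, UDP 67) and DHCPv6 server (UDP 547) traffic to it.
        base + ["-A", "PREROUTING", "-p", "udp", "--dport", "67", "-j", "DHCP"],
        base + ["-A", "PREROUTING", "-p", "udp", "--dport", "547", "-j", "DHCP"],
    ]
    # Accept requests only from the known MAC addresses.
    for mac in macs:
        cmds.append(base + ["-A", "DHCP", "-m", "mac", "--mac-source", mac, "-j", "ACCEPT"])
    # Everything else reaching the DHCP chain is dropped.
    cmds.append(base + ["-A", "DHCP", "-j", "DROP"])
    return cmds
```

With the three MAC addresses from the listing, this yields seven commands: one chain creation, two PREROUTING jumps, three ACCEPT rules, and the final DROP. The effect is that the bootstrap DHCP server only sees requests from the listed MACs, so any other host on the provisioning network can only get an answer from the cluster's own DHCP server.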

Comment 5 errata-xmlrpc 2020-05-04 11:37:45 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0581